v0.3.0

Release Date: December 2024

Real-time streaming services and phone integration for voice agents.

Highlights

This release adds real-time WebSocket services and Twilio phone integration, enabling the SDK to power conversational AI voice agents.

New Services

WebSocket TTS (Real-Time Text-to-Speech)

Low-latency streaming text-to-speech over WebSocket. Text can be sent incrementally as it is generated, which suits streaming LLM output.

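```go
conn, err := client.WebSocketTTS().Connect(ctx, voiceID, &elevenlabs.WebSocketTTSOptions{
    ModelID:                  "eleven_turbo_v2_5",
    OutputFormat:             "pcm_16000",
    OptimizeStreamingLatency: 3,
})
if err != nil {
    return err
}
defer conn.Close()

// Stream text from the LLM in the background so audio playback
// can start while text is still being generated
go func() {
    for text := range llmOutput {
        conn.SendText(text)
    }
    conn.Flush() // force generation of any remaining buffered text
}()

// Receive audio chunks as they are synthesized
for audio := range conn.Audio() {
    player.Write(audio)
}
```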
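LLM tokens often arrive as word fragments. A common pattern, sketched here as an assumption rather than something the SDK requires, is to buffer fragments and forward them at punctuation boundaries so the TTS model receives natural phrases. The `chunkText` and `endsPhrase` helpers are hypothetical and use only the standard library; the channel `chunkText` returns could replace `llmOutput` in the `conn.SendText` loop above.

```go
package main

import (
	"fmt"
	"strings"
)

// endsPhrase reports whether s, ignoring trailing spaces and newlines,
// ends in sentence or clause punctuation.
func endsPhrase(s string) bool {
	t := strings.TrimRight(s, " \n")
	return t != "" && strings.ContainsAny(t[len(t)-1:], ".!?;:")
}

// chunkText is a hypothetical helper: it buffers streamed LLM fragments
// and emits a chunk whenever the buffer ends a phrase or reaches
// maxLen bytes. Any remaining text is emitted when in is closed.
func chunkText(in <-chan string, maxLen int) <-chan string {
	out := make(chan string)
	go func() {
		defer close(out)
		var buf strings.Builder
		for frag := range in {
			buf.WriteString(frag)
			if s := buf.String(); endsPhrase(s) || len(s) >= maxLen {
				out <- s
				buf.Reset()
			}
		}
		if buf.Len() > 0 {
			out <- buf.String()
		}
	}()
	return out
}

func main() {
	// Simulated LLM token stream
	tokens := make(chan string)
	go func() {
		defer close(tokens)
		for _, t := range []string{"Hel", "lo there.", " How ", "are", " you?", " Bye"} {
			tokens <- t
		}
	}()
	for chunk := range chunkText(tokens, 200) {
		fmt.Printf("%q\n", chunk)
	}
}
```

The `maxLen` cap bounds latency when the model produces long runs without punctuation; the value 200 is an arbitrary illustration.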

Features:

- Stream text as it arrives (well suited to incremental LLM output)
- Configurable latency optimization (levels 0-4)
- Word-level alignment timestamps
- SSML parsing support
- Multiple output formats (PCM recommended for real-time playback)
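For real-time playback it helps to know how much audio each chunk carries. Assuming `pcm_16000` means raw 16-bit signed mono PCM at 16,000 samples per second (an assumption here; confirm against the output-format documentation), one second of audio is 16,000 × 2 = 32,000 bytes. The hypothetical `pcmDuration` helper converts a chunk length into playback time.

```go
package main

import (
	"fmt"
	"time"
)

// pcmDuration returns the playback time of a raw PCM buffer of n bytes,
// given the sample rate, bytes per sample, and channel count.
func pcmDuration(n, sampleRate, bytesPerSample, channels int) time.Duration {
	bytesPerSecond := sampleRate * bytesPerSample * channels
	if bytesPerSecond == 0 {
		return 0
	}
	return time.Duration(n) * time.Second / time.Duration(bytesPerSecond)
}

func main() {
	// Assumed pcm_16000 layout: 16 kHz, 16-bit (2 bytes), mono
	fmt.Println(pcmDuration(3200, 16000, 2, 1))  // 100ms
	fmt.Println(pcmDuration(32000, 16000, 2, 1)) // 1s
}
```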

WebSocket STT (Real-Time Speech-to-Text)

Live audio transcription via WebSocket with partial results.

```go
conn, err := client.WebSocketSTT().Connect(ctx, &elevenlabs.WebSocketSTTOptions{
    SampleRate:           16000,
    EnablePartials:       true,
    EnableWordTimestamps: true,
})
if err != nil {
    return err
}
defer conn.Close()

// Send audio from the microphone in the background
go func() {
    for chunk := range micInput {
        conn.SendAudio(chunk)
    }
    conn.EndStream() // no more audio will be sent
}()

// Receive transcripts; this loop prints only final results
for transcript := range conn.Transcripts() {
    if transcript.IsFinal {
        fmt.Println("Final:", transcript.Text)
    }
}
```
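With `EnablePartials`, interim transcripts can drive a responsive UI: keep committed final segments and show only the most recent partial. The sketch below assumes each partial supersedes the previous one until a final result arrives (an assumption about server behavior). `Transcript` is a local stand-in modeling only the `Text` and `IsFinal` fields used above, and `transcriptView` is hypothetical, not an SDK type.

```go
package main

import (
	"fmt"
	"strings"
)

// Transcript is a local stand-in for the SDK's transcript value;
// only the Text and IsFinal fields used above are modeled.
type Transcript struct {
	Text    string
	IsFinal bool
}

// transcriptView accumulates final segments and tracks the latest
// partial, which is discarded once a final result arrives.
type transcriptView struct {
	finals  []string
	partial string
}

// Update applies one transcript event to the view.
func (v *transcriptView) Update(t Transcript) {
	if t.IsFinal {
		v.finals = append(v.finals, t.Text)
		v.partial = ""
		return
	}
	v.partial = t.Text
}

// String renders committed text followed by the in-progress partial.
func (v *transcriptView) String() string {
	parts := append([]string{}, v.finals...)
	if v.partial != "" {
		parts = append(parts, v.partial)
	}
	return strings.Join(parts, " ")
}

func main() {
	var v transcriptView
	for _, t := range []Transcript{
		{Text: "hel"},
		{Text: "hello wor"},
		{Text: "hello world", IsFinal: true},
		{Text: "how"},
	} {
		v.Update(t)
		fmt.Println(v.String())
	}
}
```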

Features:

- Partial (interim) results for responsive UIs
- Word-level timing with confidence scores
- Automatic language detection
- Multiple audio encodings (PCM, μ-law)
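Telephony audio is commonly 8-bit μ-law (ITU-T G.711); Twilio Media Streams, for example, carry 8 kHz μ-law audio. If a capture pipeline produces 16-bit linear PCM and μ-law is needed instead, the standard G.711 encoder is small enough to write directly. This sketch uses only the Go standard library and makes no assumption about whether the SDK converts encodings itself.

```go
package main

import "fmt"

const (
	muLawBias = 0x84  // 132, added before the segment lookup (G.711)
	muLawClip = 32635 // clip so that sample+bias fits in 15 bits
)

// linearToMuLaw encodes one 16-bit linear PCM sample as an 8-bit
// G.711 μ-law byte.
func linearToMuLaw(sample int16) byte {
	s := int(sample)
	sign := 0
	if s < 0 {
		s = -s
		sign = 0x80
	}
	if s > muLawClip {
		s = muLawClip
	}
	s += muLawBias

	// Exponent: position of the highest set bit among bits 7..14.
	exponent := 7
	for mask := 0x4000; s&mask == 0 && exponent > 0; mask >>= 1 {
		exponent--
	}
	mantissa := (s >> (exponent + 3)) & 0x0F
	// μ-law bytes are stored bit-inverted.
	return ^byte(sign | exponent<<4 | mantissa)
}

func main() {
	for _, s := range []int16{0, -1, 1000, 32767, -32768} {
		fmt.Printf("%6d -> 0x%02X\n", s, linearToMuLaw(s))
	}
}
```

To encode a buffer, apply `linearToMuLaw` to each sample; resampling from 16 kHz to 8 kHz is a separate step not shown here.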

Speech-to-Speech (Voice Conversion)

Transform speech from one voice to another while preserving content.

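```go
// Full request with options
resp, err := client.SpeechToSpeech().Convert(ctx, &elevenlabs.SpeechToSpeechRequest{
    VoiceID:               targetVoiceID,
    Audio:                 sourceAudio,
    RemoveBackgroundNoise: true,
})
if err != nil {
    return err
}

// Simple one-liner: target voice ID and source audio reader only
audio, err := client.SpeechToSpeech().Simple(ctx, voiceID, audioReader)
if err != nil {
    return err
}
```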

Features:

- Voice conversion with content preservation
- Background noise removal
- Streaming conversion support
- Seed audio for consistent style

Twilio Integration

Phone call integration for conversational AI agents.

```go
// Register an incoming call
resp, err := client.Twilio().RegisterCall(ctx, &elevenlabs.TwilioRegisterCallRequest{
    AgentID: "your-agent-id",
    DynamicVariables: map[string]string{
        "caller_name": callerInfo.Name,
    },
})
if err != nil {
    return err
}
// Return resp.TwiML to Twilio as the webhook response

// Make an outbound call
call, err := client.Twilio().OutboundCall(ctx, &elevenlabs.TwilioOutboundCallRequest{
    AgentID:            "your-agent-id",
    AgentPhoneNumberID: "phone-number-id",
    ToNumber:           "+1234567890",
})
if err != nil {
    return err
}
```

Features:

- Incoming call registration with TwiML response
- Outbound calls via Twilio
- SIP trunk integration
- Dynamic variables injected into agent prompts
- First message and system prompt overrides
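The outbound example passes `ToNumber` in E.164 form: `+`, then the country code and subscriber number, at most 15 digits with no leading zero. Checking that shape before calling `OutboundCall` gives a clearer local error than a failed API request. `validE164` is a hypothetical pre-check, not part of the SDK, and validates format only, not whether the number is reachable.

```go
package main

import (
	"fmt"
	"regexp"
)

// e164 matches "+" followed by 2 to 15 digits, the first non-zero.
var e164 = regexp.MustCompile(`^\+[1-9]\d{1,14}$`)

// validE164 reports whether s is formatted as an E.164 phone number.
func validE164(s string) bool {
	return e164.MatchString(s)
}

func main() {
	for _, n := range []string{"+1234567890", "1234567890", "+0123456", "+1 234 567 890"} {
		fmt.Printf("%-18q %v\n", n, validE164(n))
	}
}
```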

Phone Number Management

Manage phone numbers for voice agents.

```go
// List, fetch, update, and delete numbers (error handling omitted for brevity)
numbers, _ := client.PhoneNumbers().List(ctx)
number, _ := client.PhoneNumbers().Get(ctx, phoneID)
updated, _ := client.PhoneNumbers().Update(ctx, phoneID, &elevenlabs.UpdatePhoneNumberRequest{
    Label:   "Support Line",
    AgentID: agentID,
})
_ = client.PhoneNumbers().Delete(ctx, phoneID)
```

New Examples

Four new runnable examples demonstrating the real-time services:

| Example | Description |
|---|---|
| `examples/websocket-tts/` | Real-time TTS streaming with LLM integration patterns |
| `examples/websocket-stt/` | Live transcription with partial results and word timing |
| `examples/speech-to-speech/` | Voice conversion with streaming and seed audio |
| `examples/twilio/` | HTTP server for phone call handling |

All examples use structured logging with slog, passing a request-scoped logger through the context via `slogutil.ContextWithLogger()`.

See Examples for usage instructions.

New Dependencies

- github.com/gorilla/websocket v1.5.3: WebSocket support

Documentation

- Four new service documentation pages
- Updated API coverage (about 75 methods now covered)
- New "Real-Time" section in the documentation navigation
- README updated with real-time service examples

API Coverage Update

| Category | Status |
|---|---|
| WebSocket TTS | ✓ Full |
| WebSocket STT | ✓ Full |
| Speech-to-Speech | ✓ Full |
| Phone/Twilio | ✓ Partial (7 methods) |

Use Cases

This release enables building:

- Voice Agents: conversational AI with real-time TTS/STT
- Phone Bots: automated phone call handling
- LLM Voice Apps: LLM output streamed directly to speech
- Live Transcription: real-time audio transcription
- Voice Changers: real-time voice conversion

Installation

```shell
go get github.com/agentplexus/go-elevenlabs@v0.3.0
```

Upgrade Notes

This release is backward compatible; existing code requires no changes.

The new WebSocket services depend on gorilla/websocket, which `go get` fetches automatically as a module dependency.