v0.3.0
Release Date: December 2024
Real-time streaming services and phone integration for voice agents.
Highlights
This release adds real-time WebSocket services and Twilio phone integration, enabling the SDK to power conversational AI voice agents.
New Services
WebSocket TTS (Real-Time Text-to-Speech)
Low-latency streaming text-to-speech via WebSocket, ideal for LLM integration.
conn, _ := client.WebSocketTTS().Connect(ctx, voiceID, &elevenlabs.WebSocketTTSOptions{
    ModelID:                  "eleven_turbo_v2_5",
    OutputFormat:             "pcm_16000",
    OptimizeStreamingLatency: 3,
})
defer conn.Close()

// Stream text from LLM
for text := range llmOutput {
    conn.SendText(text)
}
conn.Flush()

// Receive audio chunks
for audio := range conn.Audio() {
    player.Write(audio)
}
Features:
- Stream text as it arrives (perfect for LLM output)
- Configurable latency optimization (0-4)
- Word-level alignment timestamps
- SSML parsing support
- Multiple output formats (PCM recommended for real-time)
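When LLM output arrives token by token, one common pattern is to buffer the tokens and forward text at sentence boundaries, so that each piece sent to the TTS connection is a natural unit of speech. The sketch below is independent of the SDK. `sentenceBuffer`, `Add`, and `Rest` are hypothetical names introduced for illustration, and the boundary detection is deliberately naive (it would also split on the dot in "3.14"). Whether the SDK performs its own buffering is not stated here, so treat this as optional.

```go
package main

import (
	"fmt"
	"strings"
)

// sentenceBuffer accumulates streamed tokens and releases complete sentences.
// It is an illustrative helper, not part of the SDK.
type sentenceBuffer struct {
	pending strings.Builder
}

// Add appends a token and returns any complete sentences now available.
// A sentence is assumed to end at '.', '!' or '?' (a naive rule).
func (b *sentenceBuffer) Add(token string) []string {
	b.pending.WriteString(token)
	text := b.pending.String()
	var out []string
	for {
		i := strings.IndexAny(text, ".!?")
		if i < 0 {
			break
		}
		out = append(out, strings.TrimSpace(text[:i+1]))
		text = text[i+1:]
	}
	// Keep the unfinished remainder for the next token.
	b.pending.Reset()
	b.pending.WriteString(text)
	return out
}

// Rest returns the unfinished remainder and clears the buffer.
// Call it once the token stream has ended.
func (b *sentenceBuffer) Rest() string {
	rest := strings.TrimSpace(b.pending.String())
	b.pending.Reset()
	return rest
}

func main() {
	tokens := []string{"Hello", " there", ". How", " are", " you", "?", " Fine"}
	buf := &sentenceBuffer{}
	for _, tok := range tokens {
		for _, sentence := range buf.Add(tok) {
			fmt.Println("send:", sentence) // in the snippet above: conn.SendText(sentence)
		}
	}
	if rest := buf.Rest(); rest != "" {
		fmt.Println("send:", rest)
	}
}
```

In the TTS snippet above, each released sentence would be passed to `conn.SendText`, followed by `conn.Flush()` once the remainder has been sent.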
WebSocket STT (Real-Time Speech-to-Text)
Live audio transcription via WebSocket with partial results.
conn, _ := client.WebSocketSTT().Connect(ctx, &elevenlabs.WebSocketSTTOptions{
    SampleRate:           16000,
    EnablePartials:       true,
    EnableWordTimestamps: true,
})
defer conn.Close()

// Send audio from microphone
go func() {
    for chunk := range micInput {
        conn.SendAudio(chunk)
    }
    conn.EndStream()
}()

// Receive transcripts
for transcript := range conn.Transcripts() {
    if transcript.IsFinal {
        fmt.Println("Final:", transcript.Text)
    }
}
Features:
- Partial (interim) results for responsive UIs
- Word-level timing with confidence scores
- Automatic language detection
- Multiple audio encodings (PCM, μ-law)
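The encodings listed above include μ-law, the companded 8-bit format defined by G.711 and widely used for telephony audio. Since the service is listed as accepting μ-law, no conversion should be needed just to transcribe it. If another stage of a pipeline needs linear PCM (for example a level meter or a resampler), the standard G.711 expansion can be sketched as follows. The function names are hypothetical and are not part of the SDK.

```go
package main

import "fmt"

// muLawToLinear expands one G.711 mu-law byte to a 16-bit linear PCM sample.
func muLawToLinear(u byte) int16 {
	u = ^u // mu-law bytes are transmitted bit-inverted
	sign := u & 0x80
	exponent := (u >> 4) & 0x07
	mantissa := u & 0x0F
	// Rebuild the magnitude: add the bias (0x84), scale by the segment, remove the bias.
	magnitude := ((int(mantissa) << 3) + 0x84) << exponent
	magnitude -= 0x84
	if sign != 0 {
		return int16(-magnitude)
	}
	return int16(magnitude)
}

// decodeMuLaw converts a buffer of mu-law bytes to linear PCM samples.
func decodeMuLaw(in []byte) []int16 {
	out := make([]int16, len(in))
	for i, b := range in {
		out[i] = muLawToLinear(b)
	}
	return out
}

func main() {
	// 0xFF is mu-law silence; 0x80 and 0x00 are the positive and negative extremes.
	fmt.Println(decodeMuLaw([]byte{0xFF, 0x80, 0x00}))
}
```

A lookup table with 256 precomputed entries is a common optimization when decoding long streams, since every input byte maps to a fixed output sample.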
Speech-to-Speech (Voice Conversion)
Transform speech from one voice to another while preserving content.
resp, _ := client.SpeechToSpeech().Convert(ctx, &elevenlabs.SpeechToSpeechRequest{
    VoiceID:               targetVoiceID,
    Audio:                 sourceAudio,
    RemoveBackgroundNoise: true,
})

// Simple one-liner
audio, _ := client.SpeechToSpeech().Simple(ctx, voiceID, audioReader)
Features:
- Voice conversion with content preservation
- Background noise removal
- Streaming conversion support
- Seed audio for consistent style
Twilio Integration
Phone call integration for conversational AI agents.
// Register incoming call
resp, _ := client.Twilio().RegisterCall(ctx, &elevenlabs.TwilioRegisterCallRequest{
    AgentID: "your-agent-id",
    DynamicVariables: map[string]string{
        "caller_name": callerInfo.Name,
    },
})
// Return resp.TwiML to Twilio

// Make outbound call
call, _ := client.Twilio().OutboundCall(ctx, &elevenlabs.TwilioOutboundCallRequest{
    AgentID:            "your-agent-id",
    AgentPhoneNumberID: "phone-number-id",
    ToNumber:           "+1234567890",
})
Features:
- Incoming call registration with TwiML response
- Outbound calls via Twilio
- SIP trunk integration
- Dynamic variables for prompt injection
- First message and system prompt overrides
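Returning TwiML to Twilio means answering its webhook request with an XML body. The sketch below shows one way such a handler could look using only `net/http`. The `registerFunc` type is a hypothetical stand-in for the code that calls `RegisterCall` and extracts the TwiML string, the route and the placeholder TwiML are assumptions, and `demoRequest` exists only to exercise the handler without a network.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// registerFunc is a hypothetical stand-in for the code that registers the
// incoming call with the SDK and returns the TwiML to hand back to Twilio.
type registerFunc func(r *http.Request) (twiml string, err error)

// incomingCallHandler answers Twilio's voice webhook with the TwiML
// produced by register, served as XML.
func incomingCallHandler(register registerFunc) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		twiml, err := register(r)
		if err != nil {
			http.Error(w, "could not register call", http.StatusBadGateway)
			return
		}
		w.Header().Set("Content-Type", "text/xml; charset=utf-8")
		fmt.Fprint(w, twiml)
	}
}

// demoRequest runs the handler against an in-memory request and reports
// the status code, content type, and body it produced.
func demoRequest() (status int, contentType, body string) {
	h := incomingCallHandler(func(r *http.Request) (string, error) {
		// Placeholder TwiML; real code would return the TwiML from the SDK response.
		return "<Response></Response>", nil
	})
	rec := httptest.NewRecorder()
	req := httptest.NewRequest(http.MethodPost, "/twilio/incoming", nil)
	h(rec, req)
	return rec.Code, rec.Header().Get("Content-Type"), rec.Body.String()
}

func main() {
	status, contentType, body := demoRequest()
	fmt.Println(status, contentType, body)
}
```

In a real server the handler would be mounted with `http.Handle` on the path configured as the voice webhook for the Twilio number, and the closure would call `client.Twilio().RegisterCall` as in the snippet above.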
Phone Number Management
Manage phone numbers for voice agents.
numbers, _ := client.PhoneNumbers().List(ctx)
number, _ := client.PhoneNumbers().Get(ctx, phoneID)
updated, _ := client.PhoneNumbers().Update(ctx, phoneID, &elevenlabs.UpdatePhoneNumberRequest{
    Label:   "Support Line",
    AgentID: agentID,
})
_ = client.PhoneNumbers().Delete(ctx, phoneID)
New Examples
Four new runnable examples demonstrating the real-time services:
| Example | Description |
|---|---|
| examples/websocket-tts/ | Real-time TTS streaming with LLM integration patterns |
| examples/websocket-stt/ | Live transcription with partial results and word timing |
| examples/speech-to-speech/ | Voice conversion with streaming and seed audio |
| examples/twilio/ | HTTP server for phone call handling |
All examples use structured, request-scoped logging with slog, carrying the logger in the context via slogutil.ContextWithLogger().
See Examples for usage instructions.
New Dependencies
- github.com/gorilla/websocket v1.5.3 - WebSocket support
Documentation
- 4 new service documentation pages
- Updated API coverage (now ~75 methods covered)
- New "Real-Time" section in documentation navigation
- Updated README with real-time service examples
API Coverage Update
| Category | Status |
|---|---|
| WebSocket TTS | ✓ Full |
| WebSocket STT | ✓ Full |
| Speech-to-Speech | ✓ Full |
| Phone/Twilio | ✓ Partial (7 methods) |
Use Cases
This release enables building:
- Voice Agents - Conversational AI with real-time TTS/STT
- Phone Bots - Automated phone call handling
- LLM Voice Apps - Stream LLM output directly to speech
- Live Transcription - Real-time audio transcription
- Voice Changers - Real-time voice conversion
Installation
Upgrade Notes
This release is backward compatible. No changes are required for existing code.
The new features require the gorilla/websocket dependency, which is installed automatically.