WebSocket TTS
Real-time text-to-speech streaming via WebSocket for low-latency voice synthesis.
Overview
The WebSocket TTS service converts text to speech as the text arrives, which makes it a good fit for:
- LLM Integration: Stream text from language models as it's generated
- Interactive Applications: Voice assistants, chatbots, real-time narration
- Low Latency: Get audio output before the full text is available
Basic Usage
// Connect to WebSocket TTS
conn, err := client.WebSocketTTS().Connect(ctx, voiceID, nil)
if err != nil {
	log.Fatal(err)
}
defer conn.Close()

// Send text
conn.SendText("Hello, ")
conn.SendText("this is streaming ")
conn.SendText("text to speech!")

// Flush to finalize
conn.Flush()

// Receive audio chunks
for audio := range conn.Audio() {
	// Play or save audio chunks
	player.Write(audio)
}
With Options
opts := &elevenlabs.WebSocketTTSOptions{
	// Use turbo model for lowest latency
	ModelID: "eleven_turbo_v2_5",
	// PCM format for real-time playback
	OutputFormat: "pcm_16000",
	// Latency optimization (0-4, higher = faster but lower quality)
	OptimizeStreamingLatency: 3,
	// Enable SSML parsing
	EnableSSMLParsing: true,
	// Voice settings
	VoiceSettings: &elevenlabs.VoiceSettings{
		Stability:       0.5,
		SimilarityBoost: 0.75,
	},
}
conn, err := client.WebSocketTTS().Connect(ctx, voiceID, opts)
Streaming from LLM
// Connect to TTS
conn, err := client.WebSocketTTS().Connect(ctx, voiceID, &elevenlabs.WebSocketTTSOptions{
	ModelID:                  "eleven_turbo_v2_5",
	OutputFormat:             "pcm_16000",
	OptimizeStreamingLatency: 3,
})
if err != nil {
	log.Fatal(err)
}
defer conn.Close()

// Stream LLM output to TTS
go func() {
	for chunk := range llmOutputStream {
		if err := conn.SendText(chunk); err != nil {
			log.Printf("send error: %v", err)
			return
		}
	}
	conn.Flush()
}()

// Play audio as it arrives
for audio := range conn.Audio() {
	audioPlayer.Write(audio)
}
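LLM output often arrives as very small fragments. If you prefer to control chunk boundaries on the client before calling SendText, one option is to batch fragments at sentence punctuation. The sketch below is a minimal illustration of that idea. `batchSentences` is a hypothetical helper, not part of the SDK, and whether client-side batching helps for a given model and voice is an assumption you would need to test.

```go
package main

import (
	"fmt"
	"strings"
)

// batchSentences is a hypothetical helper (not part of the SDK). It reads
// small text fragments from in and emits them grouped into pieces that end
// at sentence punctuation. Any trailing text is emitted when in is closed.
func batchSentences(in <-chan string) <-chan string {
	out := make(chan string)
	go func() {
		defer close(out)
		var buf strings.Builder
		for frag := range in {
			buf.WriteString(frag)
			s := buf.String()
			// Find the last sentence terminator in the buffered text.
			idx := strings.LastIndexAny(s, ".!?")
			if idx < 0 {
				continue // no complete sentence yet, keep buffering
			}
			out <- s[:idx+1]
			buf.Reset()
			buf.WriteString(s[idx+1:])
		}
		if buf.Len() > 0 {
			out <- buf.String()
		}
	}()
	return out
}

func main() {
	// Stand-in for an LLM token stream.
	tokens := make(chan string)
	go func() {
		defer close(tokens)
		for _, t := range []string{"Hel", "lo the", "re. How", " are you", "? Fine"} {
			tokens <- t
		}
	}()
	for piece := range batchSentences(tokens) {
		// In real code this is where conn.SendText(piece) would go.
		fmt.Printf("%q\n", piece)
	}
}
```

This simple split also breaks on periods inside numbers or abbreviations, so treat it as a starting point rather than a complete sentence detector.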
Using StreamText Helper
// Create a channel of text chunks
textStream := make(chan string)

// Start streaming (this handles flushing automatically)
audioOut, errOut := conn.StreamText(ctx, textStream)

// Send text chunks
go func() {
	defer close(textStream)
	textStream <- "Hello, "
	textStream <- "world!"
}()

// Receive audio
for audio := range audioOut {
	_ = audio // Replace with your own processing (an unused variable does not compile in Go)
}

// Check for errors
if err := <-errOut; err != nil {
	log.Printf("streaming error: %v", err)
}
Word Alignments
// Receive timing data; this loop prints per-character start and end times
go func() {
	for align := range conn.Alignments() {
		for i, char := range align.Characters {
			fmt.Printf("%s: %.3fs - %.3fs\n",
				char,
				align.CharacterStart[i],
				align.CharacterEnd[i])
		}
	}
}()
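The loop above prints one line per character. If you need word-level timings, one way to derive them is to group consecutive non-space characters. The sketch below assumes the alignment data is three parallel slices (characters as strings, start and end times as float64 seconds), which is inferred from how the loop above indexes and formats them. `wordTiming` and `wordsFromChars` are hypothetical names, not SDK types.

```go
package main

import (
	"fmt"
	"strings"
)

// wordTiming is a hypothetical type for this sketch.
type wordTiming struct {
	Text       string
	Start, End float64 // seconds
}

// wordsFromChars groups per-character timings into words by splitting on
// whitespace. It assumes chars, start, and end are parallel slices.
func wordsFromChars(chars []string, start, end []float64) []wordTiming {
	var words []wordTiming
	inWord := false
	for i, c := range chars {
		if i >= len(start) || i >= len(end) {
			break // defensive: stop if the slices are not parallel
		}
		if strings.TrimSpace(c) == "" {
			inWord = false // whitespace ends the current word
			continue
		}
		if !inWord {
			// First character of a new word sets its start time.
			words = append(words, wordTiming{Start: start[i]})
			inWord = true
		}
		w := &words[len(words)-1]
		w.Text += c
		w.End = end[i] // last character seen so far sets the end time
	}
	return words
}

func main() {
	// Made-up sample data for illustration only.
	chars := []string{"H", "i", " ", "y", "o", "u"}
	start := []float64{0.00, 0.10, 0.20, 0.25, 0.35, 0.45}
	end := []float64{0.10, 0.20, 0.25, 0.35, 0.45, 0.60}
	for _, w := range wordsFromChars(chars, start, end) {
		fmt.Printf("%-4s %.3fs - %.3fs\n", w.Text, w.Start, w.End)
	}
}
```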
Error Handling
// Monitor errors
go func() {
	for err := range conn.Errors() {
		log.Printf("WebSocket error: %v", err)
	}
}()
Stream Completion Behavior
ElevenLabs WebSocket TTS does not send an explicit "end of stream" signal. After calling Flush(), the server generates any remaining audio and then waits for more input. If no input arrives within the inactivity timeout (default 20 seconds), the server sends an input_timeout_exceeded error and closes the connection.
This behavior has implications for detecting when audio generation is complete:
Default Behavior
With the default 20-second timeout, your application will wait up to 20 seconds after the last audio chunk before the connection closes:
conn.Flush()

// This loop will block for up to 20 seconds after last audio
for audio := range conn.Audio() {
	player.Write(audio)
}
Faster Completion Detection
For applications that need faster stream completion, set a shorter InactivityTimeout and treat the timeout as successful completion:
opts := &elevenlabs.WebSocketTTSOptions{
	ModelID:           "eleven_turbo_v2_5",
	OutputFormat:      "pcm_16000",
	InactivityTimeout: 5, // 5 seconds instead of 20
}
conn, err := client.WebSocketTTS().Connect(ctx, voiceID, opts)
if err != nil {
	log.Fatal(err)
}
defer conn.Close()

// Send text and flush
conn.SendText("Hello, world!")
conn.Flush()

// Use Done() channel to detect completion
var receivedAudio bool
for {
	select {
	case audio, ok := <-conn.Audio():
		if !ok {
			return // Channel closed
		}
		receivedAudio = true
		player.Write(audio)
	case <-conn.Done():
		// All audio received after flush
		return
	case err := <-conn.Errors():
		// Treat timeout as success if we received audio
		if receivedAudio && strings.Contains(err.Error(), "input_timeout_exceeded") {
			return // Stream completed successfully
		}
		log.Printf("error: %v", err)
		return
	}
}
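To make the completion logic easier to test without a live connection, the select loop above can be factored into a function that takes plain channels. The sketch below is one possible shape, not SDK code. `drain` and `simulatedTimeout` are hypothetical names, and the channel types (in particular `<-chan struct{}` for the Done channel) are assumptions. It also guards against a closed error channel, which would otherwise yield a nil error.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// simulatedTimeout stands in for the server's inactivity error in this demo.
var simulatedTimeout = errors.New("input_timeout_exceeded: no input received")

// drain consumes audio until the stream finishes. It returns the number of
// bytes received and a non-nil error only for failures other than an
// inactivity timeout that arrives after some audio was received.
func drain(audio <-chan []byte, done <-chan struct{}, errs <-chan error, sink func([]byte)) (int, error) {
	total := 0
	for {
		select {
		case chunk, ok := <-audio:
			if !ok {
				return total, nil // audio channel closed
			}
			total += len(chunk)
			sink(chunk)
		case <-done:
			return total, nil // completion signalled
		case err, ok := <-errs:
			if !ok {
				errs = nil // closed error channel: stop selecting on it
				continue
			}
			if total > 0 && strings.Contains(err.Error(), "input_timeout_exceeded") {
				return total, nil // timeout after audio counts as success
			}
			return total, err
		}
	}
}

func main() {
	// Unbuffered channels keep the demo ordering deterministic.
	audio := make(chan []byte)
	errs := make(chan error)
	go func() {
		audio <- []byte{1, 2, 3, 4}
		audio <- []byte{5, 6}
		errs <- simulatedTimeout
	}()
	// A nil done channel never fires, so only audio and errs are active.
	n, err := drain(audio, nil, errs, func([]byte) {})
	fmt.Println(n, err)
}
```

In application code the three channels would come from conn.Audio(), conn.Done(), and conn.Errors(), and the sink would write to your player.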
OmniVoice Provider
The OmniVoice TTS provider handles this automatically by setting a 5-second inactivity timeout and treating the timeout as successful completion when audio was received after flush.
Options Reference
| Option | Type | Default | Description |
|---|---|---|---|
| ModelID | string | eleven_turbo_v2_5 | TTS model to use |
| OutputFormat | string | pcm_16000 | Audio format |
| VoiceSettings | *VoiceSettings | nil | Voice parameters |
| OptimizeStreamingLatency | int | 3 | Latency vs quality (0-4) |
| EnableSSMLParsing | bool | false | Parse SSML in text |
| LanguageCode | string | "" | ISO language code |
| ChunkLengthSchedule | []int | nil | Custom chunking |
| InactivityTimeout | int | 20 | Timeout in seconds |
Output Formats
For real-time playback, PCM formats are recommended:
- pcm_16000: 16kHz PCM (lowest latency)
- pcm_22050: 22.05kHz PCM
- pcm_24000: 24kHz PCM
- pcm_44100: 44.1kHz PCM (highest quality)
MP3 formats are also available but add encoding latency:
- mp3_44100_64: 64kbps MP3
- mp3_44100_128: 128kbps MP3
- mp3_44100_192: 192kbps MP3
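When playing or buffering PCM output, it can help to know how much playback time a chunk represents. The sketch below derives the sample rate from a PCM format name and converts a byte count to a duration. It assumes 16-bit mono samples, which this page does not state, so verify the sample layout for your setup. `pcmSampleRate` and `pcmDuration` are hypothetical helpers, not SDK functions.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
	"time"
)

// pcmSampleRate extracts the sample rate from a format name such as
// "pcm_16000". It returns an error for non-PCM format names.
func pcmSampleRate(format string) (int, error) {
	if !strings.HasPrefix(format, "pcm_") {
		return 0, fmt.Errorf("not a PCM format: %q", format)
	}
	rate, err := strconv.Atoi(strings.TrimPrefix(format, "pcm_"))
	if err != nil || rate <= 0 {
		return 0, fmt.Errorf("invalid sample rate in %q", format)
	}
	return rate, nil
}

// pcmDuration returns the playback time of n bytes of audio.
// Assumption: 16-bit (2-byte) mono samples.
func pcmDuration(n, sampleRate int) time.Duration {
	const bytesPerSample = 2
	samples := n / bytesPerSample
	return time.Duration(samples) * time.Second / time.Duration(sampleRate)
}

func main() {
	rate, err := pcmSampleRate("pcm_16000")
	if err != nil {
		fmt.Println(err)
		return
	}
	// 16000 bytes = 8000 samples = half a second at 16 kHz.
	fmt.Println(pcmDuration(16000, rate)) // prints 500ms
}
```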