v0.3.0

Release Date: December 2024

Real-time streaming services and phone integration for voice agents.

Highlights

This release adds real-time WebSocket services and Twilio phone integration, enabling the SDK to power conversational AI voice agents.

New Services

WebSocket TTS (Real-Time Text-to-Speech)

Low-latency streaming text-to-speech via WebSocket, ideal for LLM integration.

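conn, err := client.WebSocketTTS().Connect(ctx, voiceID, &elevenlabs.WebSocketTTSOptions{
    ModelID:                  "eleven_turbo_v2_5",
    OutputFormat:             "pcm_16000",
    OptimizeStreamingLatency: 3,
})
if err != nil {
    log.Fatal(err) // conn is unusable on error, so stop before deferring Close
}
defer conn.Close()

// Stream text from the LLM in the background so audio can be read concurrently
go func() {
    for text := range llmOutput {
        conn.SendText(text)
    }
    conn.Flush()
}()

// Receive audio chunks as they arrive
for audio := range conn.Audio() {
    player.Write(audio)
}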

Features:

  • Stream text as it arrives (perfect for LLM output)
  • Configurable latency optimization (0-4)
  • Word-level alignment timestamps
  • SSML parsing support
  • Multiple output formats (PCM recommended for real-time)
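One common pattern when feeding LLM output to a streaming TTS connection is to buffer tokens and forward text at sentence boundaries, so the synthesizer sees complete phrases. The sketch below is a minimal illustration of that idea. SentenceChunker, Push, and Rest are hypothetical names and are not part of the SDK, and the boundary detection is deliberately naive (it would also split on the period in "3.14"). Whether the service does its own buffering is not covered here.

```go
package main

import (
	"fmt"
	"strings"
)

// SentenceChunker is a hypothetical helper (not part of the SDK) that buffers
// LLM tokens and releases text at sentence boundaries.
type SentenceChunker struct {
	buf strings.Builder
}

// Push appends a token and returns all complete sentences buffered so far.
// It returns "" while no sentence terminator has been seen.
func (c *SentenceChunker) Push(token string) string {
	c.buf.WriteString(token)
	s := c.buf.String()
	i := strings.LastIndexAny(s, ".!?")
	if i < 0 {
		return ""
	}
	out, rest := s[:i+1], s[i+1:]
	c.buf.Reset()
	c.buf.WriteString(rest) // keep the unfinished tail for the next call
	return out
}

// Rest returns whatever is still buffered; call it once when the LLM stream ends.
func (c *SentenceChunker) Rest() string {
	s := c.buf.String()
	c.buf.Reset()
	return s
}

func main() {
	var c SentenceChunker
	for _, tok := range []string{"Hel", "lo the", "re. How", " are", " you"} {
		if chunk := c.Push(tok); chunk != "" {
			fmt.Printf("send %q\n", chunk) // here one would call conn.SendText(chunk)
		}
	}
	if tail := c.Rest(); tail != "" {
		fmt.Printf("send %q\n", tail)
	}
}
```

In the streaming loop shown earlier, each non-empty chunk would take the place of the raw token passed to conn.SendText, with Rest sent just before conn.Flush.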

WebSocket STT (Real-Time Speech-to-Text)

Live audio transcription via WebSocket with partial results.

conn, err := client.WebSocketSTT().Connect(ctx, &elevenlabs.WebSocketSTTOptions{
    SampleRate:           16000,
    EnablePartials:       true,
    EnableWordTimestamps: true,
})
if err != nil {
    log.Fatal(err) // conn is unusable on error, so stop before deferring Close
}
defer conn.Close()

// Send audio from microphone
go func() {
    for chunk := range micInput {
        conn.SendAudio(chunk)
    }
    conn.EndStream()
}()

// Receive transcripts
for transcript := range conn.Transcripts() {
    if transcript.IsFinal {
        fmt.Println("Final:", transcript.Text)
    }
}

Features:

  • Partial (interim) results for responsive UIs
  • Word-level timing with confidence scores
  • Automatic language detection
  • Multiple audio encodings (PCM, μ-law)
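For a responsive UI, partial results are usually shown immediately and then replaced once the final result arrives. The following sketch shows one way to fold transcript events into display text. It assumes each partial replaces the previous one and a final result supersedes any preceding partials; the release notes do not spell out those semantics. Transcript here is a local stand-in carrying only the Text and IsFinal fields used above, and Caption is a hypothetical helper.

```go
package main

import (
	"fmt"
	"strings"
)

// Transcript mirrors only the two fields used in the snippet above (Text and
// IsFinal); it is a local stand-in, not the SDK type.
type Transcript struct {
	Text    string
	IsFinal bool
}

// Caption tracks committed (final) text plus the latest partial hypothesis.
type Caption struct {
	final   []string
	partial string
}

// Apply folds one transcript event into the caption state.
func (c *Caption) Apply(t Transcript) {
	if t.IsFinal {
		c.final = append(c.final, strings.TrimSpace(t.Text))
		c.partial = "" // assumed: a final result supersedes interim text
		return
	}
	c.partial = t.Text // assumed: each partial replaces the previous one
}

// String renders committed text followed by the in-progress partial, if any.
func (c *Caption) String() string {
	parts := append([]string{}, c.final...)
	if p := strings.TrimSpace(c.partial); p != "" {
		parts = append(parts, p+"...")
	}
	return strings.Join(parts, " ")
}

func main() {
	events := []Transcript{
		{Text: "hello"},
		{Text: "hello wor"},
		{Text: "Hello world.", IsFinal: true},
		{Text: "how are"},
	}
	var c Caption
	for _, e := range events {
		c.Apply(e)
		fmt.Println(c.String())
	}
}
```

In real use, the loop over events would be the range over conn.Transcripts() shown above.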

Speech-to-Speech (Voice Conversion)

Transform speech from one voice to another while preserving content.

resp, _ := client.SpeechToSpeech().Convert(ctx, &elevenlabs.SpeechToSpeechRequest{
    VoiceID:               targetVoiceID,
    Audio:                 sourceAudio,
    RemoveBackgroundNoise: true,
})

// Simple one-liner
audio, _ := client.SpeechToSpeech().Simple(ctx, voiceID, audioReader)

Features:

  • Voice conversion with content preservation
  • Background noise removal
  • Streaming conversion support
  • Seed audio for consistent style

Twilio Integration

Phone call integration for conversational AI agents.

// Register incoming call
resp, _ := client.Twilio().RegisterCall(ctx, &elevenlabs.TwilioRegisterCallRequest{
    AgentID: "your-agent-id",
    DynamicVariables: map[string]string{
        "caller_name": callerInfo.Name,
    },
})
// Return resp.TwiML to Twilio

// Make outbound call
call, _ := client.Twilio().OutboundCall(ctx, &elevenlabs.TwilioOutboundCallRequest{
    AgentID:            "your-agent-id",
    AgentPhoneNumberID: "phone-number-id",
    ToNumber:           "+1234567890",
})

Features:

  • Incoming call registration with TwiML response
  • Outbound calls via Twilio
  • SIP trunk integration
  • Dynamic variables injected into the agent prompt
  • First message and system prompt overrides
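To make the incoming-call flow concrete, here is a minimal sketch of an HTTP handler that answers Twilio's voice webhook with TwiML. It makes several assumptions: RegisterFunc is a hypothetical stand-in for the RegisterCall step (so the sketch runs without the SDK), the stubbed TwiML is a placeholder rather than what the service returns, and the /incoming path, phone number, and CallSid are made up. Twilio does send form-encoded parameters such as From and CallSid, and it accepts TwiML served as text/xml.

```go
package main

import (
	"context"
	"fmt"
	"net/http"
	"net/http/httptest"
	"net/url"
	"strings"
)

// RegisterFunc stands in for the call-registration step; it takes the caller's
// number and returns the TwiML document to hand back to Twilio.
type RegisterFunc func(ctx context.Context, from string) (twiml string, err error)

// IncomingCallHandler answers Twilio's voice webhook with TwiML.
func IncomingCallHandler(register RegisterFunc) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		// Twilio posts form-encoded parameters such as From, To and CallSid.
		from := r.PostFormValue("From")
		twiml, err := register(r.Context(), from)
		if err != nil {
			http.Error(w, "call registration failed", http.StatusBadGateway)
			return
		}
		w.Header().Set("Content-Type", "text/xml")
		fmt.Fprint(w, twiml)
	}
}

// demo drives the handler with a fake webhook request and a stubbed register
// function, so the sketch runs without Twilio or network access.
func demo() (status int, contentType, body string) {
	stub := func(_ context.Context, from string) (string, error) {
		return "<Response><Say>Hello " + from + "</Say></Response>", nil
	}
	form := url.Values{"From": {"+15550100"}, "CallSid": {"CA123"}}
	req := httptest.NewRequest(http.MethodPost, "/incoming", strings.NewReader(form.Encode()))
	req.Header.Set("Content-Type", "application/x-www-form-urlencoded")
	rec := httptest.NewRecorder()
	IncomingCallHandler(stub)(rec, req)
	return rec.Code, rec.Header().Get("Content-Type"), rec.Body.String()
}

func main() {
	status, ct, body := demo()
	fmt.Println(status, ct)
	fmt.Println(body)
}
```

In a real server, the stub would be replaced by a function that calls client.Twilio().RegisterCall and returns resp.TwiML, and the handler would be mounted with http.Handle.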

Phone Number Management

Manage phone numbers for voice agents.

numbers, _ := client.PhoneNumbers().List(ctx)
number, _ := client.PhoneNumbers().Get(ctx, phoneID)
updated, _ := client.PhoneNumbers().Update(ctx, phoneID, &elevenlabs.UpdatePhoneNumberRequest{
    Label:   "Support Line",
    AgentID: agentID,
})
_ = client.PhoneNumbers().Delete(ctx, phoneID)

New Examples

Four new runnable examples demonstrating the real-time services:

Example                      Description
examples/websocket-tts/      Real-time TTS streaming with LLM integration patterns
examples/websocket-stt/      Live transcription with partial results and word timing
examples/speech-to-speech/   Voice conversion with streaming and seed audio
examples/twilio/             HTTP server for phone call handling

All examples use structured, request-scoped logging with slog: the logger is attached to the context via slogutil.ContextWithLogger() and retrieved from it downstream.
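The idea behind carrying a logger in the context can be sketched with the standard library alone. The helpers below (withLogger, loggerFrom, handleCall) are local, hypothetical stand-ins; the actual signature of slogutil.ContextWithLogger is not shown in these notes, so this is an illustration of the pattern rather than of that package.

```go
package main

import (
	"bytes"
	"context"
	"fmt"
	"log/slog"
)

// ctxKey is an unexported key type so other packages cannot collide with it.
type ctxKey struct{}

// withLogger returns a child context carrying the logger. It is a local
// stand-in for slogutil.ContextWithLogger, whose exact signature is assumed.
func withLogger(ctx context.Context, l *slog.Logger) context.Context {
	return context.WithValue(ctx, ctxKey{}, l)
}

// loggerFrom returns the logger stored in ctx, or slog.Default() if none.
func loggerFrom(ctx context.Context) *slog.Logger {
	if l, ok := ctx.Value(ctxKey{}).(*slog.Logger); ok {
		return l
	}
	return slog.Default()
}

// handleCall logs through whatever logger the caller attached to ctx.
func handleCall(ctx context.Context) {
	loggerFrom(ctx).Info("call registered")
}

// demo logs one request into a buffer and returns the output for inspection.
func demo() string {
	var buf bytes.Buffer
	base := slog.New(slog.NewTextHandler(&buf, &slog.HandlerOptions{
		// Drop the timestamp so the output is stable.
		ReplaceAttr: func(groups []string, a slog.Attr) slog.Attr {
			if a.Key == slog.TimeKey && len(groups) == 0 {
				return slog.Attr{}
			}
			return a
		},
	}))
	// Attach request-scoped fields once; every log line downstream inherits them.
	ctx := withLogger(context.Background(), base.With("call_sid", "CA123"))
	handleCall(ctx)
	return buf.String()
}

func main() {
	fmt.Print(demo())
}
```

The benefit of this design is that functions deep in the call chain need only a context.Context parameter to emit logs tagged with the current request's fields.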

See Examples for usage instructions.

New Dependencies

  • github.com/gorilla/websocket v1.5.3 - WebSocket support

Documentation

  • 4 new service documentation pages
  • Updated API coverage (now ~75 methods)
  • New "Real-Time" section in documentation navigation
  • Updated README with real-time service examples

API Coverage Update

Category           Status
WebSocket TTS      ✓ Full
WebSocket STT      ✓ Full
Speech-to-Speech   ✓ Full
Phone/Twilio       ✓ Partial (7 methods)

Use Cases

This release enables building:

  • Voice Agents - Conversational AI with real-time TTS/STT
  • Phone Bots - Automated phone call handling
  • LLM Voice Apps - Stream LLM output directly to speech
  • Live Transcription - Real-time audio transcription
  • Voice Changers - Real-time voice conversion

Installation

go get github.com/agentplexus/go-elevenlabs@v0.3.0

Upgrade Notes

This release is backward compatible. No changes required for existing code.

The new WebSocket services depend on gorilla/websocket, which go get installs automatically.