
v0.7.0

Release Date: 2026-01-24

WebSocket STT migration to scribe_v2_realtime and TTS stream completion detection.

Highlights

  • WebSocket STT migrated to scribe_v2_realtime with breaking changes to options and message formats
  • WebSocket TTS stream completion via Done() channel for reliable end-of-stream handling

Breaking Changes

This release contains breaking changes to the WebSocket STT API. See Upgrade Notes below for migration guidance.

Before (v0.6.x)          After (v0.7.0)
SampleRate + Encoding    AudioFormat (e.g., "pcm_16000")
EnableWordTimestamps     IncludeTimestamps
EnablePartials           Removed (partials always enabled)
EndStream()              Commit()
Model: scribe_v1         Model: scribe_v2_realtime

New Features

WebSocket STT: scribe_v2_realtime API

The real-time STT service now uses the ElevenLabs scribe_v2_realtime API with query-parameter-based configuration and commit semantics for transcript finalization.

conn, err := client.WebSocketSTT().Connect(ctx, &elevenlabs.WebSocketSTTOptions{
    ModelID:           "scribe_v2_realtime",
    AudioFormat:       "pcm_16000",
    IncludeTimestamps: true,
    CommitStrategy:    "manual", // or "vad"
})
if err != nil {
    log.Fatal(err)
}
defer conn.Close()

// Send audio
conn.SendAudio(audioChunk)

// Commit to finalize transcript
conn.Commit()

New options:

  • AudioFormat — unified format string (e.g., pcm_16000, pcm_44100, ulaw_8000)
  • IncludeTimestamps — word-level timing information
  • IncludeLanguageDetection — detected language in responses
  • CommitStrategy — "manual" (explicit commit) or "vad" (voice activity detection)
  • VAD settings: VADSilenceThresholdSecs, VADThreshold, MinSpeechDurationMs, MinSilenceDurationMs

New methods:

  • Commit() — finalize the current transcript segment
  • SendAudioWithCommit(audio, commit) — send audio with optional commit
  • SessionID() — retrieve server-assigned session ID
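When CommitStrategy is "manual", a segment is finalized only when the client calls Commit(). The sketch below shows that call order against a small interface. The committer interface, the sendUtterance helper, and the fake recorder are all hypothetical. The sketch also assumes that SendAudio and Commit both return an error, so check the real method signatures before adapting it.

```go
package main

import (
	"errors"
	"fmt"
)

// committer captures the two STT methods this sketch relies on.
// Assumption: both methods return an error; the real signatures may differ.
type committer interface {
	SendAudio(chunk []byte) error
	Commit() error
}

// sendUtterance streams one utterance chunk by chunk and then commits it,
// finalizing the segment under the "manual" commit strategy.
func sendUtterance(c committer, chunks [][]byte) error {
	for i, chunk := range chunks {
		if err := c.SendAudio(chunk); err != nil {
			return fmt.Errorf("send chunk %d: %w", i, err)
		}
	}
	return c.Commit()
}

// recorder is a fake connection that only records the calls it receives.
type recorder struct {
	calls []string
}

func (r *recorder) SendAudio(chunk []byte) error {
	if len(chunk) == 0 {
		return errors.New("empty chunk")
	}
	r.calls = append(r.calls, fmt.Sprintf("send(%d)", len(chunk)))
	return nil
}

func (r *recorder) Commit() error {
	r.calls = append(r.calls, "commit")
	return nil
}

func main() {
	r := &recorder{}
	chunks := [][]byte{make([]byte, 3200), make([]byte, 3200)}
	if err := sendUtterance(r, chunks); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(r.calls) // [send(3200) send(3200) commit]
}
```

Because the helper depends only on the interface, a real connection can be passed in wherever its method set matches.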

WebSocket TTS: Stream Completion Detection

The Done() channel signals when all audio has been received after Flush():

conn.SendText("Hello, world!")
conn.Flush()

// Wait for all audio to be received
for {
    select {
    case audio, ok := <-conn.Audio():
        if !ok {
            return
        }
        player.Write(audio)
    case <-conn.Done():
        // All audio received after flush
        for audio := range conn.Audio() {
            player.Write(audio)
        }
        return
    }
}

OmniVoice Conformance Tests

Provider conformance tests validate that ElevenLabs TTS and STT providers correctly implement the OmniVoice interfaces. Run with:

ELEVENLABS_API_KEY=your-key go test ./omnivoice/...

Installation

go get github.com/agentplexus/go-elevenlabs@v0.7.0

Upgrade Notes

STT Migration Guide

1. Update the options struct:

// Before
opts := &elevenlabs.WebSocketSTTOptions{
    ModelID:              "scribe_v1",
    SampleRate:           16000,
    Encoding:             "pcm_s16le",
    EnablePartials:       true,
    EnableWordTimestamps: true,
}

// After
opts := &elevenlabs.WebSocketSTTOptions{
    ModelID:           "scribe_v2_realtime",
    AudioFormat:       "pcm_16000",
    IncludeTimestamps: true,
    CommitStrategy:    "manual",
}

2. Replace EndStream() with Commit():

// Before
conn.EndStream()

// After
conn.Commit()

3. Update word access in transcripts:

// Before
word.Word
word.Confidence

// After
word.Text
// Confidence field removed

4. Audio format mapping:

Old (SampleRate + Encoding)    New (AudioFormat)
8000 + pcm_s16le               pcm_8000
16000 + pcm_s16le              pcm_16000
22050 + pcm_s16le              pcm_22050
24000 + pcm_s16le              pcm_24000
44100 + pcm_s16le              pcm_44100
48000 + pcm_s16le              pcm_48000
8000 + pcm_mulaw               ulaw_8000
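If old SampleRate and Encoding values come from stored configuration, the mapping table can be encoded as a small lookup. audioFormatFor is a hypothetical helper and is not part of the package. It covers only the combinations listed in the table and returns an error for anything else rather than guessing a format.

```go
package main

import "fmt"

// audioFormatFor (hypothetical helper) translates v0.6.x SampleRate and
// Encoding values into a v0.7.0 AudioFormat string, using only the
// combinations listed in the migration table.
func audioFormatFor(sampleRate int, encoding string) (string, error) {
	switch encoding {
	case "pcm_s16le":
		switch sampleRate {
		case 8000, 16000, 22050, 24000, 44100, 48000:
			return fmt.Sprintf("pcm_%d", sampleRate), nil
		}
	case "pcm_mulaw":
		if sampleRate == 8000 {
			return "ulaw_8000", nil
		}
	}
	// Anything outside the table has no documented equivalent.
	return "", fmt.Errorf("no AudioFormat mapping for %d Hz %s", sampleRate, encoding)
}

func main() {
	f, err := audioFormatFor(16000, "pcm_s16le")
	fmt.Println(f, err) // pcm_16000 <nil>

	_, err = audioFormatFor(11025, "pcm_s16le")
	fmt.Println(err) // no AudioFormat mapping for 11025 Hz pcm_s16le
}
```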

Dependencies

  • Bump github.com/agentplexus/omnivoice from v0.2.0 to v0.3.0