v0.7.0

Release Date: 2026-01-24

This release migrates WebSocket STT to scribe_v2_realtime and adds stream completion detection for WebSocket TTS.

Highlights

WebSocket STT migrated to scribe_v2_realtime with breaking changes to options and message formats
WebSocket TTS stream completion via Done() channel for reliable end-of-stream handling

Breaking Changes

This release contains breaking changes to the WebSocket STT API. See Upgrade Notes below for migration guidance.

| Before (v0.6.x) | After (v0.7.0) |
|---|---|
| `SampleRate` + `Encoding` | `AudioFormat` (e.g., `"pcm_16000"`) |
| `EnableWordTimestamps` | `IncludeTimestamps` |
| `EnablePartials` | Removed (partials always enabled) |
| `EndStream()` | `Commit()` |
| Model: `scribe_v1` | Model: `scribe_v2_realtime` |

New Features

WebSocket STT: scribe_v2_realtime API

The real-time STT service now uses the ElevenLabs scribe_v2_realtime API with query-parameter-based configuration and commit semantics for transcript finalization.

conn, err := client.WebSocketSTT().Connect(ctx, &elevenlabs.WebSocketSTTOptions{
    ModelID:           "scribe_v2_realtime",
    AudioFormat:       "pcm_16000",
    IncludeTimestamps: true,
    CommitStrategy:    "manual", // or "vad"
})
if err != nil {
    log.Fatal(err)
}
defer conn.Close()

// Send audio
conn.SendAudio(audioChunk)

// Commit to finalize transcript
conn.Commit()

New options:

AudioFormat — unified format string (e.g., pcm_16000, pcm_44100, ulaw_8000)
IncludeTimestamps — word-level timing information
IncludeLanguageDetection — detected language in responses
CommitStrategy — "manual" (explicit commit) or "vad" (voice activity detection)
VAD settings: VADSilenceThresholdSecs, VADThreshold, MinSpeechDurationMs, MinSilenceDurationMs
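
To make the query-parameter-based configuration more concrete, the sketch below encodes a few of the options above into a query string with `net/url`. The `sttQueryOptions` type, the `buildSTTQuery` helper, and the parameter names (`model_id`, `audio_format`, and so on) are assumptions made for illustration. The library builds the real connection URL internally and may use different names.

```go
package main

import (
	"net/url"
	"strconv"
)

// sttQueryOptions is a hypothetical stand-in for the options that end up in
// the connection URL. It is not the library's WebSocketSTTOptions type.
type sttQueryOptions struct {
	ModelID           string
	AudioFormat       string
	IncludeTimestamps bool
	CommitStrategy    string
}

// buildSTTQuery encodes the options as URL query parameters.
// The parameter names are assumptions made for this sketch.
func buildSTTQuery(o sttQueryOptions) string {
	q := url.Values{}
	q.Set("model_id", o.ModelID)
	q.Set("audio_format", o.AudioFormat)
	q.Set("include_timestamps", strconv.FormatBool(o.IncludeTimestamps))
	if o.CommitStrategy != "" {
		// Left out entirely when unset so the server default applies.
		q.Set("commit_strategy", o.CommitStrategy)
	}
	return q.Encode() // Encode sorts the keys alphabetically
}
```

Because `Encode` sorts keys, the resulting string is deterministic, which keeps this kind of helper easy to compare in tests.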

New methods:

Commit() — finalize the current transcript segment
SendAudioWithCommit(audio, commit) — send audio with optional commit
SessionID() — retrieve server-assigned session ID
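
One way to use SendAudioWithCommit with the "manual" strategy is to set the commit flag only on the final chunk of an utterance. The sketch below shows that pattern against a small interface. The `audioCommitter` interface, the `streamChunks` helper, and the assumed signature (a byte slice, a bool, and an error return) are illustrative and not taken from the library.

```go
package main

// audioCommitter is a hypothetical interface covering only the method used
// here. The signature (byte slice, bool, error return) is an assumption.
type audioCommitter interface {
	SendAudioWithCommit(audio []byte, commit bool) error
}

// streamChunks sends every chunk and sets commit=true only on the last one,
// so the final chunk also finalizes the transcript segment.
func streamChunks(conn audioCommitter, chunks [][]byte) error {
	for i, chunk := range chunks {
		last := i == len(chunks)-1
		if err := conn.SendAudioWithCommit(chunk, last); err != nil {
			return err // stop at the first failed send
		}
	}
	return nil
}

// recorder is a test double that records the commit flag of each call.
type recorder struct{ commits []bool }

func (r *recorder) SendAudioWithCommit(audio []byte, commit bool) error {
	r.commits = append(r.commits, commit)
	return nil
}
```

Writing the helper against an interface keeps it testable without a live connection. A real connection value could be passed in if its method matches the assumed signature.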

WebSocket TTS: Stream Completion Detection

The Done() channel signals when all audio has been received after Flush():

conn.SendText("Hello, world!")
conn.Flush()

// Wait for all audio to be received
for {
    select {
    case audio, ok := <-conn.Audio():
        if !ok {
            return
        }
        player.Write(audio)
    case <-conn.Done():
        // Done fired: drain the remaining audio. This range loop exits only once Audio() is closed.
        for audio := range conn.Audio() {
            player.Write(audio)
        }
        return
    }
}
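
The loop above can also be written as a helper that drains without blocking once Done() fires, which does not rely on the audio channel being closed. This is a minimal sketch with plain channels. The `collectAudio` name and the channel types (`<-chan []byte` and `<-chan struct{}`) are assumptions, and it further assumes that every chunk is already queued on the audio channel by the time the done channel fires.

```go
package main

// collectAudio reads chunks until done fires, then drains whatever is still
// buffered without blocking. Channel types are assumptions for this sketch.
func collectAudio(audio <-chan []byte, done <-chan struct{}) [][]byte {
	var chunks [][]byte
	for {
		select {
		case chunk, ok := <-audio:
			if !ok {
				return chunks // audio channel closed
			}
			chunks = append(chunks, chunk)
		case <-done:
			// Drain buffered chunks. The default case ends the loop as soon
			// as nothing is immediately available.
			for {
				select {
				case chunk, ok := <-audio:
					if !ok {
						return chunks
					}
					chunks = append(chunks, chunk)
				default:
					return chunks
				}
			}
		}
	}
}
```

The trade-off is that a non-blocking drain can miss chunks delivered after the done signal, so it only fits if that ordering assumption holds.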

OmniVoice Conformance Tests

Provider conformance tests validate that ElevenLabs TTS and STT providers correctly implement the OmniVoice interfaces. Run with:

ELEVENLABS_API_KEY=your-key go test ./omnivoice/...

Installation

go get github.com/agentplexus/go-elevenlabs@v0.7.0

Upgrade Notes

STT Migration Guide

1. Update options struct:

// Before
opts := &elevenlabs.WebSocketSTTOptions{
    ModelID:              "scribe_v1",
    SampleRate:           16000,
    Encoding:             "pcm_s16le",
    EnablePartials:       true,
    EnableWordTimestamps: true,
}

// After
opts := &elevenlabs.WebSocketSTTOptions{
    ModelID:           "scribe_v2_realtime",
    AudioFormat:       "pcm_16000",
    IncludeTimestamps: true,
    CommitStrategy:    "manual",
}

2. Replace EndStream() with Commit():

// Before
conn.EndStream()

// After
conn.Commit()

3. Update word access in transcripts:

// Before
word.Word
word.Confidence

// After
word.Text
// Confidence field removed

4. Audio format mapping:

| Old (`SampleRate` + `Encoding`) | New (`AudioFormat`) |
|---|---|
| 8000 + `pcm_s16le` | `pcm_8000` |
| 16000 + `pcm_s16le` | `pcm_16000` |
| 22050 + `pcm_s16le` | `pcm_22050` |
| 24000 + `pcm_s16le` | `pcm_24000` |
| 44100 + `pcm_s16le` | `pcm_44100` |
| 48000 + `pcm_s16le` | `pcm_48000` |
| 8000 + `pcm_mulaw` | `ulaw_8000` |
Dependencies

Bump github.com/agentplexus/omnivoice from v0.2.0 to v0.3.0