# v0.7.0
Release Date: 2026-01-24
WebSocket STT migration to scribe_v2_realtime and TTS stream completion detection.
## Highlights
- WebSocket STT migrated to scribe_v2_realtime with breaking changes to options and message formats
- WebSocket TTS stream completion via a `Done()` channel for reliable end-of-stream handling
## Breaking Changes
This release contains breaking changes to the WebSocket STT API. See Upgrade Notes below for migration guidance.
| Before (v0.6.x) | After (v0.7.0) |
|---|---|
| `SampleRate` + `Encoding` | `AudioFormat` (e.g., `"pcm_16000"`) |
| `EnableWordTimestamps` | `IncludeTimestamps` |
| `EnablePartials` | Removed (partials always enabled) |
| `EndStream()` | `Commit()` |
| Model: `scribe_v1` | Model: `scribe_v2_realtime` |
## New Features
### WebSocket STT: `scribe_v2_realtime` API
The real-time STT service now uses the ElevenLabs scribe_v2_realtime API with query-parameter-based configuration and commit semantics for transcript finalization.
```go
conn, err := client.WebSocketSTT().Connect(ctx, &elevenlabs.WebSocketSTTOptions{
	ModelID:           "scribe_v2_realtime",
	AudioFormat:       "pcm_16000",
	IncludeTimestamps: true,
	CommitStrategy:    "manual", // or "vad"
})
if err != nil {
	log.Fatal(err)
}
defer conn.Close()

// Send audio
conn.SendAudio(audioChunk)

// Commit to finalize the transcript
conn.Commit()
```
New options:

- `AudioFormat`: unified format string (e.g., `pcm_16000`, `pcm_44100`, `ulaw_8000`)
- `IncludeTimestamps`: word-level timing information
- `IncludeLanguageDetection`: detected language in responses
- `CommitStrategy`: `"manual"` (explicit commit) or `"vad"` (voice activity detection)
- VAD settings: `VADSilenceThresholdSecs`, `VADThreshold`, `MinSpeechDurationMs`, `MinSilenceDurationMs`
New methods:

- `Commit()`: finalize the current transcript segment
- `SendAudioWithCommit(audio, commit)`: send audio with an optional commit
- `SessionID()`: retrieve the server-assigned session ID
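One way to use `SendAudioWithCommit` when streaming a prerecorded buffer is to send fixed-duration chunks and set `commit` only on the last one. The sketch below assumes the method has the signature `SendAudioWithCommit(audio []byte, commit bool) error`, which these notes do not confirm. It uses a hypothetical `audioSender` interface and a `recorder` fake (neither is part of the SDK) so it runs without a connection. Chunk sizes assume 16-bit mono PCM, so `pcm_16000` works out to 32,000 bytes per second.

```go
package main

import (
	"errors"
	"fmt"
)

// audioSender is a hypothetical stand-in for the STT connection.
// ASSUMPTION: the real method returns an error; these notes only name it.
type audioSender interface {
	SendAudioWithCommit(audio []byte, commit bool) error
}

// chunkBytes returns the byte length of ms milliseconds of 16-bit mono PCM
// at the given sample rate (2 bytes per sample).
func chunkBytes(sampleRate, ms int) int {
	return sampleRate * 2 * ms / 1000
}

// streamAndCommit sends pcm in chunks of chunkSize bytes and sets commit only
// on the final chunk, so the segment is finalized after all audio is sent.
// An empty buffer sends nothing and therefore issues no commit.
func streamAndCommit(conn audioSender, pcm []byte, chunkSize int) error {
	if chunkSize <= 0 {
		return errors.New("chunkSize must be positive")
	}
	for start := 0; start < len(pcm); start += chunkSize {
		end := start + chunkSize
		if end > len(pcm) {
			end = len(pcm)
		}
		last := end == len(pcm)
		if err := conn.SendAudioWithCommit(pcm[start:end], last); err != nil {
			return fmt.Errorf("send chunk at offset %d: %w", start, err)
		}
	}
	return nil
}

// recorder is a hypothetical fake that records what would have been sent.
type recorder struct {
	sizes   []int
	commits []bool
}

func (r *recorder) SendAudioWithCommit(audio []byte, commit bool) error {
	r.sizes = append(r.sizes, len(audio))
	r.commits = append(r.commits, commit)
	return nil
}

func main() {
	size := chunkBytes(16000, 100) // 3200 bytes per 100 ms at pcm_16000
	pcm := make([]byte, 8000)      // 250 ms of silence
	rec := &recorder{}
	if err := streamAndCommit(rec, pcm, size); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(rec.sizes, rec.commits) // prints [3200 3200 1600] [false false true]
}
```

With a real connection you would pass `conn` in place of the fake. If the buffer may be empty and a commit is still required, calling `Commit()` directly covers that case.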
### WebSocket TTS: Stream Completion Detection
The `Done()` channel signals when all audio has been received after `Flush()`:
```go
conn.SendText("Hello, world!")
conn.Flush()

// Wait for all audio to be received
for {
	select {
	case audio, ok := <-conn.Audio():
		if !ok {
			return // Audio channel closed
		}
		player.Write(audio)
	case <-conn.Done():
		// All audio received after flush; play any chunks still buffered.
		// This range loop ends only once the Audio channel is closed.
		for audio := range conn.Audio() {
			player.Write(audio)
		}
		return
	}
}
```
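The select loop above depends on how the connection orders its channels. The self-contained sketch below reproduces the pattern against a hypothetical `fakeTTS` type (not part of the SDK), assuming that `Done()` is closed only after every chunk has already been queued on `Audio()`. Under that assumption it drains the remaining chunks with a non-blocking `select` rather than `range`, so it also returns if `Audio()` is never closed.

```go
package main

import "fmt"

// fakeTTS is a hypothetical stand-in for the TTS connection: chunks arrive
// on a buffered channel and done is closed after the last chunk is queued.
type fakeTTS struct {
	audio chan []byte
	done  chan struct{}
}

func (c *fakeTTS) Audio() <-chan []byte  { return c.audio }
func (c *fakeTTS) Done() <-chan struct{} { return c.done }

// collect reads chunks until Done fires, then drains whatever is still
// buffered without blocking. It also stops if the Audio channel is closed.
func collect(conn *fakeTTS) [][]byte {
	var out [][]byte
	for {
		select {
		case chunk, ok := <-conn.Audio():
			if !ok {
				return out
			}
			out = append(out, chunk)
		case <-conn.Done():
			// ASSUMPTION: every chunk is already buffered at this point.
			for {
				select {
				case chunk, ok := <-conn.Audio():
					if !ok {
						return out
					}
					out = append(out, chunk)
				default:
					return out // buffer empty: stream complete
				}
			}
		}
	}
}

func main() {
	conn := &fakeTTS{audio: make(chan []byte, 8), done: make(chan struct{})}
	for _, s := range []string{"a", "b", "c"} {
		conn.audio <- []byte(s)
	}
	close(conn.done) // every chunk is queued before Done fires
	fmt.Println(len(collect(conn))) // prints 3
}
```

Because `select` chooses randomly among ready cases, `Done()` can win while chunks are still buffered. That is why both this sketch and the example above keep reading `Audio()` after `Done()` fires.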
### OmniVoice Conformance Tests
Provider conformance tests validate that ElevenLabs TTS and STT providers correctly implement the OmniVoice interfaces. Run with:
## Installation
## Upgrade Notes
### STT Migration Guide
1. Update the options struct:

```go
// Before (v0.6.x)
opts := &elevenlabs.WebSocketSTTOptions{
	ModelID:              "scribe_v1",
	SampleRate:           16000,
	Encoding:             "pcm_s16le",
	EnablePartials:       true,
	EnableWordTimestamps: true,
}
```

```go
// After (v0.7.0)
opts := &elevenlabs.WebSocketSTTOptions{
	ModelID:           "scribe_v2_realtime",
	AudioFormat:       "pcm_16000",
	IncludeTimestamps: true,
	CommitStrategy:    "manual",
}
```
2. Replace `EndStream()` calls with `Commit()`.
3. Update word access in transcripts:
4. Audio format mapping:

| Old (`SampleRate` + `Encoding`) | New (`AudioFormat`) |
|---|---|
| 8000 + `pcm_s16le` | `pcm_8000` |
| 16000 + `pcm_s16le` | `pcm_16000` |
| 22050 + `pcm_s16le` | `pcm_22050` |
| 24000 + `pcm_s16le` | `pcm_24000` |
| 44100 + `pcm_s16le` | `pcm_44100` |
| 48000 + `pcm_s16le` | `pcm_48000` |
| 8000 + `pcm_mulaw` | `ulaw_8000` |
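Projects that load the old `SampleRate` and `Encoding` values from configuration may want to translate them mechanically. The helper below is a hypothetical sketch (the name `legacyAudioFormat` is not part of the SDK). It encodes exactly the rows of the audio format mapping table above and returns an error for any combination the table does not list, rather than guessing a format string.

```go
package main

import "fmt"

// legacyAudioFormat converts v0.6.x SampleRate + Encoding settings into the
// v0.7.0 AudioFormat string. Only the combinations in the migration table
// are accepted; anything else returns an error.
func legacyAudioFormat(sampleRate int, encoding string) (string, error) {
	switch encoding {
	case "pcm_s16le":
		switch sampleRate {
		case 8000, 16000, 22050, 24000, 44100, 48000:
			return fmt.Sprintf("pcm_%d", sampleRate), nil
		}
	case "pcm_mulaw":
		if sampleRate == 8000 {
			return "ulaw_8000", nil
		}
	}
	return "", fmt.Errorf("no AudioFormat mapping for %d Hz %q", sampleRate, encoding)
}

func main() {
	cases := []struct {
		rate int
		enc  string
	}{
		{16000, "pcm_s16le"},
		{8000, "pcm_mulaw"},
		{11025, "pcm_s16le"}, // not in the table: reported as an error
	}
	for _, c := range cases {
		format, err := legacyAudioFormat(c.rate, c.enc)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(format)
	}
}
```

Failing loudly on unlisted combinations keeps a misconfigured deployment from silently streaming audio in the wrong format.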
## Dependencies
- Bump `github.com/agentplexus/omnivoice` from v0.2.0 to v0.3.0