OmniVoice Capabilities¶

This page documents OmniVoice interface capabilities and their implementation status in go-elevenlabs.

Overview¶

OmniVoice provides vendor-agnostic interfaces for voice AI services. The go-elevenlabs SDK implements these interfaces, allowing ElevenLabs to be used as a drop-in provider.

TTS (Text-to-Speech) Provider¶

Interface: `tts.Provider`¶

Method	Description	Implemented	Conformance Test
`Name()`	Returns provider name	:white_check_mark:	:white_check_mark:
`Synthesize()`	Convert text to audio (batch)	:white_check_mark:	:white_check_mark:
`SynthesizeStream()`	Convert text to streaming audio	:white_check_mark:	:white_check_mark:
`ListVoices()`	List available voices	:white_check_mark:	:white_check_mark:
`GetVoice()`	Get voice by ID	:white_check_mark:	:white_check_mark:

Interface: `tts.StreamingProvider`¶

Method	Description	Implemented	Conformance Test
`SynthesizeFromReader()`	Stream text input to audio output	:white_check_mark:	:white_check_mark:

Synthesis Configuration¶

Config Field	Description	Supported
`VoiceID`	Voice identifier	:white_check_mark:
`Model`	TTS model (e.g., `eleven_turbo_v2_5`)	:white_check_mark:
`OutputFormat`	Audio format (`mp3`, `pcm`, `wav`, `opus`)	:white_check_mark:
`SampleRate`	Audio sample rate	:white_check_mark:
`Speed`	Speech speed multiplier	:white_check_mark:
`Pitch`	Voice pitch adjustment	:x: Not supported by ElevenLabs
`Stability`	Voice stability (0.0-1.0)	:white_check_mark:
`SimilarityBoost`	Voice similarity boost (0.0-1.0)	:white_check_mark:

STT (Speech-to-Text) Provider¶

Interface: `stt.Provider`¶

Method	Description	Implemented	Conformance Test
`Name()`	Returns provider name	:white_check_mark:	:white_check_mark:
`Transcribe()`	Transcribe audio bytes	:white_check_mark:	:white_check_mark:
`TranscribeFile()`	Transcribe from file path	:white_check_mark:	-
`TranscribeURL()`	Transcribe from URL	:white_check_mark:	-

Interface: `stt.StreamingProvider`¶

Method	Description	Implemented	Conformance Test
`TranscribeStream()`	Real-time streaming transcription	:white_check_mark:	:white_check_mark:

Transcription Configuration¶

Config Field	Description	Supported
`Language`	BCP-47 language code	:white_check_mark:
`Model`	STT model (`scribe_v2_realtime`)	:white_check_mark:
`SampleRate`	Audio sample rate	:white_check_mark: (via `AudioFormat`)
`Channels`	Audio channels	:white_check_mark: (mono only)
`Encoding`	Audio encoding (`pcm`, `mulaw`)	:white_check_mark: (via `AudioFormat`)
`EnablePunctuation`	Add punctuation	:white_check_mark: (always enabled)
`EnableWordTimestamps`	Word-level timing	:white_check_mark:
`EnableSpeakerDiarization`	Speaker identification	:white_check_mark: (batch API only)
`MaxSpeakers`	Maximum speakers to detect	:white_check_mark: (batch API only)
`Keywords`	Recognition hints	:x: Not supported
`VocabularyID`	Custom vocabulary	:x: Not supported

WebSocket STT Audio Formats¶

The WebSocket STT API supports these audio formats:

Format	Sample Rate	Use Case
`pcm_8000`	8 kHz	Telephony
`pcm_16000`	16 kHz	Standard (default)
`pcm_22050`	22.05 kHz	Higher quality
`pcm_24000`	24 kHz	High quality
`pcm_44100`	44.1 kHz	CD quality
`pcm_48000`	48 kHz	Professional
`ulaw_8000`	8 kHz	Twilio/telephony

Agent Provider¶

Interface: `agent.Provider`¶

Method	Description	Implemented	Conformance Test
`Name()`	Returns provider name	:white_check_mark:	-
`CreateSession()`	Create voice session	:white_check_mark:	-
`GetSession()`	Get session by ID	:white_check_mark:	-
`ListSessions()`	List active sessions	:white_check_mark:	-

Interface: `agent.Session`¶

Method	Description	Implemented	Conformance Test
`ID()`	Session identifier	:white_check_mark:	-
`Start()`	Begin voice session	:white_check_mark:	-
`Stop()`	End voice session	:white_check_mark:	-
`SendAudio()`	Send audio to agent	:white_check_mark:	-
`ReceiveAudio()`	Receive agent audio	:white_check_mark:	-
`SendText()`	Send text (bypass STT)	:white_check_mark:	-
`Events()`	Session event channel	:white_check_mark:	-
`Transcript()`	Conversation history	:white_check_mark:	-
`Metrics()`	Performance metrics	:white_check_mark:	-

Agent Configuration¶

Config Field	Description	Supported
`Name`	Agent name	:white_check_mark:
`SystemPrompt`	LLM system prompt	:x: (no LLM integration)
`VoiceID`	TTS voice	:white_check_mark:
`Language`	Primary language	:white_check_mark:
`STTProvider`	STT provider name	:x: (uses ElevenLabs)
`TTSProvider`	TTS provider name	:x: (uses ElevenLabs)
`LLMProvider`	LLM provider name	:x: (no LLM integration)
`InterruptionMode`	How to handle interruptions	:x: (not implemented)
`Tools`	Function calling	:x: (no LLM integration)

Agent Provider Limitations

The ElevenLabs agent provider combines WebSocket TTS and STT for bidirectional audio, but does not include LLM integration. You must handle:

Processing user transcripts
Generating agent responses
Calling SpeakText() to vocalize responses

For full conversational AI, integrate with an LLM provider separately.

Conformance Test Status¶

Running Tests¶

# Run all conformance tests (requires API key)
export ELEVENLABS_API_KEY="your-api-key"
go test -v ./omnivoice/...

Test Categories¶

Category	Description	TTS	STT	Agent
Interface	Basic interface compliance	:white_check_mark:	:white_check_mark:	-
Behavior	Edge cases (empty input, cancellation)	:white_check_mark:	:white_check_mark:	-
Integration	Real API calls	:white_check_mark:	:white_check_mark:	-

Test Results Summary¶

Provider	Interface	Behavior	Integration	Overall
TTS	:white_check_mark: Pass	:white_check_mark: Pass	:white_check_mark: Pass	Pass
STT	:white_check_mark: Pass	:white_check_mark: Pass	:white_check_mark: Pass	Pass
Agent	-	-	-	Not tested

Feature Comparison¶

vs Direct SDK Usage¶

Feature	OmniVoice Provider	Direct SDK
Vendor portability	:white_check_mark:	:x:
Consistent API	:white_check_mark:	:x:
Voice cloning	:x:	:white_check_mark:
Pronunciation dictionaries	:x:	:white_check_mark:
Projects (Studio)	:x:	:white_check_mark:
Audio isolation	:x:	:white_check_mark:
Sound effects	:x:	:white_check_mark:
Music generation	:x:	:white_check_mark:
Full API parameters	:x:	:white_check_mark:

When to Use OmniVoice

Use OmniVoice providers when you need vendor portability or a consistent API across providers. Use the SDK directly when you need ElevenLabs-specific features.

Version Compatibility¶

go-elevenlabs	OmniVoice	Notes
v0.7.0+	v0.2.0+	WebSocket STT uses scribe_v2_realtime
v0.5.0-v0.6.x	v0.1.0+	WebSocket STT uses deprecated scribe_v1