OmniVoice Capabilities
This page documents OmniVoice interface capabilities and their implementation status in go-elevenlabs.
Overview
OmniVoice provides vendor-agnostic interfaces for voice AI services. The go-elevenlabs SDK implements these interfaces, allowing ElevenLabs to be used as a drop-in provider.
TTS (Text-to-Speech) Provider
Interface: tts.Provider

| Method | Description | Implemented | Conformance Test |
|--------|-------------|-------------|------------------|
| Name() | Returns provider name | :white_check_mark: | :white_check_mark: |
| Synthesize() | Convert text to audio (batch) | :white_check_mark: | :white_check_mark: |
| SynthesizeStream() | Convert text to streaming audio | :white_check_mark: | :white_check_mark: |
| ListVoices() | List available voices | :white_check_mark: | :white_check_mark: |
| GetVoice() | Get voice by ID | :white_check_mark: | :white_check_mark: |
Interface: tts.StreamingProvider

| Method | Description | Implemented | Conformance Test |
|--------|-------------|-------------|------------------|
| SynthesizeFromReader() | Stream text input to audio output | :white_check_mark: | :white_check_mark: |
Synthesis Configuration

| Config Field | Description | Supported |
|--------------|-------------|-----------|
| VoiceID | Voice identifier | :white_check_mark: |
| Model | TTS model (e.g., eleven_turbo_v2_5) | :white_check_mark: |
| OutputFormat | Audio format (mp3, pcm, wav, opus) | :white_check_mark: |
| SampleRate | Audio sample rate | :white_check_mark: |
| Speed | Speech speed multiplier | :white_check_mark: |
| Pitch | Voice pitch adjustment | :x: Not supported by ElevenLabs |
| Stability | Voice stability (0.0-1.0) | :white_check_mark: |
| SimilarityBoost | Voice similarity boost (0.0-1.0) | :white_check_mark: |
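This table can drive client-side validation. The `SynthesisConfig` struct and `validate` function below are hypothetical, and their field names come from the table. This sketch rejects a non-zero `Pitch` outright instead of silently ignoring it. That is a design choice made here, not documented provider behavior.

```go
package main

import (
	"errors"
	"fmt"
)

// SynthesisConfig is a HYPOTHETICAL struct that mirrors the synthesis
// configuration table; the real OmniVoice type may use other names or types.
type SynthesisConfig struct {
	VoiceID         string
	Model           string
	OutputFormat    string
	SampleRate      int
	Speed           float64
	Pitch           float64 // not supported by ElevenLabs
	Stability       float64 // expected range 0.0-1.0
	SimilarityBoost float64 // expected range 0.0-1.0
}

// validate collects every problem instead of stopping at the first one.
// errors.Join (Go 1.20+) returns nil when the slice is empty.
func validate(c SynthesisConfig) error {
	var errs []error
	if c.Pitch != 0 {
		errs = append(errs, errors.New("config: Pitch is not supported by ElevenLabs"))
	}
	if c.Stability < 0 || c.Stability > 1 {
		errs = append(errs, fmt.Errorf("config: Stability %.2f outside 0.0-1.0", c.Stability))
	}
	if c.SimilarityBoost < 0 || c.SimilarityBoost > 1 {
		errs = append(errs, fmt.Errorf("config: SimilarityBoost %.2f outside 0.0-1.0", c.SimilarityBoost))
	}
	return errors.Join(errs...)
}

func main() {
	cfg := SynthesisConfig{VoiceID: "voice-id", Model: "eleven_turbo_v2_5", Stability: 0.5, SimilarityBoost: 0.75}
	fmt.Println("valid config:", validate(cfg))

	cfg.Pitch = 1.2
	cfg.Stability = 1.5
	fmt.Println("invalid config:", validate(cfg))
}
```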
STT (Speech-to-Text) Provider
Interface: stt.Provider

| Method | Description | Implemented | Conformance Test |
|--------|-------------|-------------|------------------|
| Name() | Returns provider name | :white_check_mark: | :white_check_mark: |
| Transcribe() | Transcribe audio bytes | :white_check_mark: | :white_check_mark: |
| TranscribeFile() | Transcribe from file path | :white_check_mark: | - |
| TranscribeURL() | Transcribe from URL | :white_check_mark: | - |
Interface: stt.StreamingProvider

| Method | Description | Implemented | Conformance Test |
|--------|-------------|-------------|------------------|
| TranscribeStream() | Real-time streaming transcription | :white_check_mark: | :white_check_mark: |
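Real-time transcription works by pushing audio in small frames instead of one large buffer. For 16-bit mono PCM, frame size = sample rate × 2 bytes × duration. At `pcm_16000`, 100 ms is 16000 × 2 × 0.1 = 3200 bytes. This sketch assumes 16-bit PCM and 1-byte μ-law samples. The 100 ms frame length is an arbitrary illustrative choice, not an ElevenLabs requirement.

```go
package main

import "fmt"

// frameBytes returns how many bytes hold `ms` milliseconds of mono audio.
// ASSUMPTION: PCM is 16-bit (2 bytes per sample); mu-law is 1 byte per sample.
func frameBytes(sampleRate, bytesPerSample, ms int) int {
	return sampleRate * bytesPerSample * ms / 1000
}

// splitFrames cuts an audio buffer into frames of at most frameSize bytes,
// which a real-time sender would push one at a time. The frames share the
// input's backing array; nothing is copied.
func splitFrames(audio []byte, frameSize int) [][]byte {
	if frameSize <= 0 {
		return nil // avoid an infinite loop on a bad frame size
	}
	var frames [][]byte
	for len(audio) > 0 {
		n := frameSize
		if n > len(audio) {
			n = len(audio)
		}
		frames = append(frames, audio[:n])
		audio = audio[n:]
	}
	return frames
}

func main() {
	size := frameBytes(16000, 2, 100) // 100 ms of pcm_16000
	fmt.Println("frame size:", size)

	frames := splitFrames(make([]byte, 7000), size)
	fmt.Println("frames:", len(frames), "last frame bytes:", len(frames[len(frames)-1]))
}
```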
Transcription Configuration

| Config Field | Description | Supported |
|--------------|-------------|-----------|
| Language | BCP-47 language code | :white_check_mark: |
| Model | STT model (scribe_v2_realtime) | :white_check_mark: |
| SampleRate | Audio sample rate | :white_check_mark: (via AudioFormat) |
| Channels | Audio channels | :white_check_mark: (mono only) |
| Encoding | Audio encoding (pcm, mulaw) | :white_check_mark: (via AudioFormat) |
| EnablePunctuation | Add punctuation | :white_check_mark: (always enabled) |
| EnableWordTimestamps | Word-level timing | :white_check_mark: |
| EnableSpeakerDiarization | Speaker identification | :white_check_mark: (batch API only) |
| MaxSpeakers | Maximum speakers to detect | :white_check_mark: (batch API only) |
| Keywords | Recognition hints | :x: Not supported |
| VocabularyID | Custom vocabulary | :x: Not supported |
The WebSocket STT API supports these audio formats:

| Format | Sample Rate | Use Case |
|--------|-------------|----------|
| pcm_8000 | 8 kHz | Telephony |
| pcm_16000 | 16 kHz | Standard (default) |
| pcm_22050 | 22.05 kHz | Higher quality |
| pcm_24000 | 24 kHz | High quality |
| pcm_44100 | 44.1 kHz | CD quality |
| pcm_48000 | 48 kHz | Professional |
| ulaw_8000 | 8 kHz | Twilio/telephony |
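`SampleRate` and `Encoding` are supported "via AudioFormat", so the two generic fields must collapse into one of the format names listed above. The `audioFormat` helper is a hypothetical sketch of that mapping. It assumes that the `mulaw` encoding corresponds to the `ulaw_` prefix, and that unset fields fall back to the `pcm_16000` default shown in the table.

```go
package main

import "fmt"

// supportedFormats mirrors the WebSocket STT audio format table.
var supportedFormats = map[string]bool{
	"pcm_8000": true, "pcm_16000": true, "pcm_22050": true,
	"pcm_24000": true, "pcm_44100": true, "pcm_48000": true,
	"ulaw_8000": true,
}

// audioFormat is a HYPOTHETICAL helper that folds generic Encoding and
// SampleRate values into a single format name, rejecting unknown combos.
func audioFormat(encoding string, sampleRate int) (string, error) {
	if encoding == "" {
		encoding = "pcm" // assumed default, per "pcm_16000 ... (default)"
	}
	if sampleRate == 0 {
		sampleRate = 16000
	}
	prefix := encoding
	if encoding == "mulaw" {
		prefix = "ulaw" // the format table spells mu-law as "ulaw"
	}
	f := fmt.Sprintf("%s_%d", prefix, sampleRate)
	if !supportedFormats[f] {
		return "", fmt.Errorf("unsupported STT audio format %q", f)
	}
	return f, nil
}

func main() {
	for _, in := range []struct {
		enc  string
		rate int
	}{{"pcm", 16000}, {"mulaw", 8000}, {"mulaw", 16000}, {"", 0}} {
		f, err := audioFormat(in.enc, in.rate)
		fmt.Printf("%q @ %d -> %q, err=%v\n", in.enc, in.rate, f, err)
	}
}
```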
Agent Provider
Interface: agent.Provider

| Method | Description | Implemented | Conformance Test |
|--------|-------------|-------------|------------------|
| Name() | Returns provider name | :white_check_mark: | - |
| CreateSession() | Create voice session | :white_check_mark: | - |
| GetSession() | Get session by ID | :white_check_mark: | - |
| ListSessions() | List active sessions | :white_check_mark: | - |
Interface: agent.Session

| Method | Description | Implemented | Conformance Test |
|--------|-------------|-------------|------------------|
| ID() | Session identifier | :white_check_mark: | - |
| Start() | Begin voice session | :white_check_mark: | - |
| Stop() | End voice session | :white_check_mark: | - |
| SendAudio() | Send audio to agent | :white_check_mark: | - |
| ReceiveAudio() | Receive agent audio | :white_check_mark: | - |
| SendText() | Send text (bypass STT) | :white_check_mark: | - |
| Events() | Session event channel | :white_check_mark: | - |
| Transcript() | Conversation history | :white_check_mark: | - |
| Metrics() | Performance metrics | :white_check_mark: | - |
Agent Configuration

| Config Field | Description | Supported |
|--------------|-------------|-----------|
| Name | Agent name | :white_check_mark: |
| SystemPrompt | LLM system prompt | :x: (no LLM integration) |
| VoiceID | TTS voice | :white_check_mark: |
| Language | Primary language | :white_check_mark: |
| STTProvider | STT provider name | :x: (uses ElevenLabs) |
| TTSProvider | TTS provider name | :x: (uses ElevenLabs) |
| LLMProvider | LLM provider name | :x: (no LLM integration) |
| InterruptionMode | How to handle interruptions | :x: (not implemented) |
| Tools | Function calling | :x: (no LLM integration) |
Agent Provider Limitations
The ElevenLabs agent provider combines WebSocket TTS and STT for bidirectional audio, but does not include LLM integration. You must handle:

- Processing user transcripts
- Generating agent responses
- Calling SpeakText() to vocalize responses

For full conversational AI, integrate with an LLM provider separately.
Running Tests
```bash
# Run all conformance tests (requires API key)
export ELEVENLABS_API_KEY="your-api-key"
go test -v ./omnivoice/...
```
Test Categories

| Category | Description | TTS | STT | Agent |
|----------|-------------|-----|-----|-------|
| Interface | Basic interface compliance | :white_check_mark: | :white_check_mark: | - |
| Behavior | Edge cases (empty input, cancellation) | :white_check_mark: | :white_check_mark: | - |
| Integration | Real API calls | :white_check_mark: | :white_check_mark: | - |
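The "Behavior" category exercises edge cases such as empty input and cancellation. A provider-agnostic check in that spirit might look like the code below. The `Synthesizer` interface and both fake providers are hypothetical. Expecting `context.Canceled` specifically is an assumption of this sketch, not a documented conformance rule.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// Synthesizer is a HYPOTHETICAL minimal provider interface used only to
// illustrate behavior checks; it is not the real tts.Provider signature.
type Synthesizer interface {
	Synthesize(ctx context.Context, text string) ([]byte, error)
}

// checkBehavior runs two edge-case checks: empty input must fail, and an
// already-cancelled context must fail with context.Canceled.
func checkBehavior(p Synthesizer) error {
	if _, err := p.Synthesize(context.Background(), ""); err == nil {
		return errors.New("empty input: expected an error, got nil")
	}
	ctx, cancel := context.WithCancel(context.Background())
	cancel() // cancel before the call so the provider should bail out
	if _, err := p.Synthesize(ctx, "hello"); !errors.Is(err, context.Canceled) {
		return fmt.Errorf("cancelled context: expected context.Canceled, got %v", err)
	}
	return nil
}

// goodProvider handles both edge cases.
type goodProvider struct{}

func (goodProvider) Synthesize(ctx context.Context, text string) ([]byte, error) {
	if err := ctx.Err(); err != nil {
		return nil, err
	}
	if text == "" {
		return nil, errors.New("text must not be empty")
	}
	return []byte(text), nil
}

// sloppyProvider ignores both edge cases, so it should fail the checks.
type sloppyProvider struct{}

func (sloppyProvider) Synthesize(_ context.Context, text string) ([]byte, error) {
	return []byte(text), nil
}

func main() {
	fmt.Println("good:", checkBehavior(goodProvider{}))
	fmt.Println("sloppy:", checkBehavior(sloppyProvider{}))
}
```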
Test Results Summary

| Provider | Interface | Behavior | Integration | Overall |
|----------|-----------|----------|-------------|---------|
| TTS | :white_check_mark: Pass | :white_check_mark: Pass | :white_check_mark: Pass | Pass |
| STT | :white_check_mark: Pass | :white_check_mark: Pass | :white_check_mark: Pass | Pass |
| Agent | - | - | - | Not tested |
Feature Comparison
vs Direct SDK Usage

| Feature | OmniVoice Provider | Direct SDK |
|---------|--------------------|------------|
| Vendor portability | :white_check_mark: | :x: |
| Consistent API | :white_check_mark: | :x: |
| Voice cloning | :x: | :white_check_mark: |
| Pronunciation dictionaries | :x: | :white_check_mark: |
| Projects (Studio) | :x: | :white_check_mark: |
| Audio isolation | :x: | :white_check_mark: |
| Sound effects | :x: | :white_check_mark: |
| Music generation | :x: | :white_check_mark: |
| Full API parameters | :x: | :white_check_mark: |
When to Use OmniVoice
Use OmniVoice providers when you need vendor portability or a consistent API across providers. Use the SDK directly when you need ElevenLabs-specific features.
Version Compatibility

| go-elevenlabs | OmniVoice | Notes |
|---------------|-----------|-------|
| v0.7.0+ | v0.2.0+ | WebSocket STT uses scribe_v2_realtime |
| v0.5.0-v0.6.x | v0.1.0+ | WebSocket STT uses deprecated scribe_v1 |
See Also