Speech-to-Text¶
Transcribe audio files with optional speaker diarization.
Basic Usage¶
// Transcribe from URL
result, err := client.SpeechToText().TranscribeURL(ctx, "https://example.com/audio.mp3")
if err != nil {
log.Fatal(err)
}
fmt.Printf("Text: %s\n", result.Text)
fmt.Printf("Language: %s\n", result.LanguageCode)
Transcribe with File Upload¶
file, _ := os.Open("audio.mp3")
defer file.Close()
result, err := client.SpeechToText().Transcribe(ctx, &elevenlabs.TranscriptionRequest{
File: file,
Filename: "audio.mp3",
ModelID: "scribe_v1",
})
Speaker Diarization¶
Identify different speakers in the audio:
result, err := client.SpeechToText().TranscribeWithDiarization(ctx, audioURL)
if err != nil {
log.Fatal(err)
}
for _, word := range result.Words {
fmt.Printf("[%s] %s (%.2fs - %.2fs)\n",
word.Speaker, word.Text, word.Start, word.End)
}
Full Options¶
result, err := client.SpeechToText().Transcribe(ctx, &elevenlabs.TranscriptionRequest{
File: file,
Filename: "interview.mp3",
ModelID: "scribe_v1",
LanguageCode: "en", // ISO 639-1 code
Diarize: true, // Enable speaker detection
TagAudioEvents: true, // Tag laughter, music, etc.
NumSpeakers: 2, // Expected number of speakers
})
Request Options¶
| Option | Type | Description |
|---|---|---|
File |
io.Reader | Audio file to transcribe |
Filename |
string | Name of the audio file |
AudioURL |
string | URL to audio (alternative to file) |
ModelID |
string | Transcription model (default: scribe_v1) |
LanguageCode |
string | ISO 639-1 language code |
Diarize |
bool | Enable speaker diarization |
TagAudioEvents |
bool | Tag non-speech audio events |
NumSpeakers |
int | Expected number of speakers |
Response Structure¶
type TranscriptionResponse struct {
Text string // Full transcription text
LanguageCode string // Detected language
Words []TranscriptionWord // Word-level timestamps
}
type TranscriptionWord struct {
Text string // The word
Start float64 // Start time in seconds
End float64 // End time in seconds
Speaker string // Speaker ID (if diarization enabled)
}
Use Cases¶
Meeting Transcription¶
result, err := client.SpeechToText().Transcribe(ctx, &elevenlabs.TranscriptionRequest{
File: meetingFile,
Filename: "meeting.mp3",
Diarize: true,
NumSpeakers: 4,
})
// Group by speaker
speakers := make(map[string][]string)
for _, word := range result.Words {
speakers[word.Speaker] = append(speakers[word.Speaker], word.Text)
}
Subtitle Generation¶
result, err := client.SpeechToText().TranscribeURL(ctx, videoAudioURL)
// Generate SRT format
for i, word := range result.Words {
fmt.Printf("%d\n", i+1)
fmt.Printf("%s --> %s\n", formatTime(word.Start), formatTime(word.End))
fmt.Printf("%s\n\n", word.Text)
}
Podcast Processing¶
// Transcribe podcast episode
result, err := client.SpeechToText().Transcribe(ctx, &elevenlabs.TranscriptionRequest{
File: podcastFile,
Filename: "episode.mp3",
Diarize: true,
TagAudioEvents: true, // Detect music, laughter, etc.
})
Supported Audio Formats¶
- MP3
- WAV
- M4A
- FLAC
- OGG
- WEBM
Best Practices¶
- Use diarization for multi-speaker content - Interviews, meetings, podcasts
- Specify language when known for better accuracy
- Set expected speaker count for more accurate diarization
- Enable audio event tagging for richer metadata