TTS Script Authoring¶

A guide to authoring multilingual TTS scripts using the ttsscript package.

Why Use ttsscript?¶

Instead of storing raw SSML (which is engine-specific and hard to edit), author your scripts in a structured JSON format that:

Supports multiple languages in a single file
Handles pronunciations separately from content
Can be compiled to any TTS engine format
Is easy to edit and version control

Quick Start¶

1. Create a Script JSON File¶

{
  "title": "My Course",
  "default_voices": {
    "en": "21m00Tcm4TlvDq8ikWAM",
    "es": "EXAVITQu4vr4xnSDxMaL"
  },
  "pronunciations": {
    "API": {"en": "A P I", "es": "A P I"},
    "SDK": {"en": "S D K", "es": "S D K"}
  },
  "slides": [
    {
      "title": "Introduction",
      "segments": [
        {
          "text": {
            "en": "Welcome to the API course.",
            "es": "Bienvenidos al curso de API."
          },
          "pause_after": "500ms"
        }
      ]
    }
  ]
}

2. Load and Compile¶

import "github.com/agentplexus/go-elevenlabs/ttsscript"

// Load script
script, err := ttsscript.LoadScript("script.json")
if err != nil {
    log.Fatal(err)
}

// Compile for English
compiler := ttsscript.NewCompiler()
segments, err := compiler.Compile(script, "en")

3. Generate Audio¶

import elevenlabs "github.com/agentplexus/go-elevenlabs"

client, _ := elevenlabs.NewClient()
formatter := ttsscript.NewElevenLabsFormatter()
jobs := formatter.Format(segments)

for _, job := range jobs {
    audio, _ := client.TextToSpeech().Simple(ctx, job.VoiceID, job.Text)
    // Save audio file...
}

Script Structure¶

Top-Level Fields¶

Field	Type	Description
`title`	string	Script title
`description`	string	Optional description
`default_language`	string	Primary language code
`default_voices`	map	Voice IDs by language
`pronunciations`	map	Global pronunciation rules
`slides`	array	Ordered list of slides

Slide Fields¶

Field	Type	Description
`title`	string	Slide title (for reference)
`notes`	string	Speaker notes (not rendered)
`segments`	array	Audio segments

Segment Fields¶

Field	Type	Description
`text`	map	Text by language code
`voice`	map	Voice override by language
`pause_before`	string	Pause before (e.g., "500ms")
`pause_after`	string	Pause after (e.g., "1s")
`emphasis`	string	"strong", "moderate", "reduced"
`rate`	string	"slow", "medium", "fast", or "80%"
`pitch`	string	"low", "medium", "high", or "+10%"
`pronunciations`	map	Segment-specific pronunciations

Pronunciations¶

Pronunciations are applied automatically during compilation:

{
  "pronunciations": {
    "API": {"en": "A P I", "es": "A P I"},
    "kubectl": {"en": "kube control"},
    "nginx": {"en": "engine X"}
  }
}

Priority Order¶

Compiler-level - Added via compiler.AddPronunciation()
Segment-level - In segment.pronunciations
Script-level - In script.pronunciations

Higher priority overrides lower.

Add Pronunciations at Runtime¶

compiler := ttsscript.NewCompiler()
compiler.AddPronunciation("goroutine", "en", "go routine")
compiler.AddPronunciations("en", map[string]string{
    "API": "A P I",
    "SDK": "S D K",
})

Output Formats¶

ElevenLabs¶

formatter := ttsscript.NewElevenLabsFormatter()
jobs := formatter.Format(segments)

for _, job := range jobs {
    fmt.Printf("Voice: %s\n", job.VoiceID)
    fmt.Printf("Text: %s\n", job.Text)
    fmt.Printf("Pause after: %dms\n", job.PauseAfterMs)
}

SSML (Google, Amazon, Azure)¶

formatter := ttsscript.NewSSMLFormatter()
ssml, err := formatter.FormatScript(script, "en")
// Use with Google Cloud TTS, Amazon Polly, or Azure TTS

Example SSML output:

<?xml version="1.0" encoding="UTF-8"?>
<speak version="1.1" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en">
  <!-- Slide 1: Introduction -->
  Welcome to the A P I course.
  <break time="500ms"/>
</speak>

Batch Processing¶

Generate Manifest¶

config := ttsscript.NewBatchConfig("./output")
config.IncludeLanguageInFilename = true

manifest := ttsscript.GenerateManifest(jobs, config, "en")
// Returns []ManifestEntry with output filenames

Group by Voice¶

groups := formatter.GroupByVoice(jobs)
for voiceID, voiceJobs := range groups {
    // Process all jobs for this voice together
}

Multilingual Workflow¶

1. Author Once¶

{
  "slides": [{
    "segments": [{
      "text": {
        "en": "Hello world",
        "es": "Hola mundo",
        "fr": "Bonjour le monde"
      }
    }]
  }]
}

2. Compile for Each Language¶

languages := script.Languages() // ["en", "es", "fr"]

for _, lang := range languages {
    segments, _ := compiler.Compile(script, lang)
    jobs := formatter.Format(segments)

    // Generate audio for this language
    for _, job := range jobs {
        audio, _ := client.TextToSpeech().Simple(ctx, job.VoiceID, job.Text)
        // Save with language suffix
    }
}

Best Practices¶

Version control your scripts - JSON is easy to diff and merge
Separate pronunciations - Keep them in the script, not embedded in text
Use meaningful slide titles - They appear in comments and manifests
Test with one language first - Verify before generating all languages
Use consistent pause durations - Create a style guide for your project

Integration with Marp¶

For presentations, you can embed TTS annotations in Marp comments:

---
marp: true
---

<!--
tts: Welcome to the presentation.
pause: 500ms
-->

# Slide Title

Content here...

Then parse and convert to ttsscript format for audio generation.