Changelog

All notable changes to this project will be documented in this file.

The format is based on Keep a Changelog, and this project adheres to Semantic Versioning. Commits follow Conventional Commits, and this changelog is generated by Structured Changelog.

Unreleased

v0.4.3 - 2026-02-15

Highlights

  • Comprehensive tests for English and Chinese subtitle generation

Tests

  • TestWordsToSubtitleCues_EnglishWordGrouping for word-based cue grouping (0ddb8bc)
  • TestWordsToSubtitleCues_ChineseCharacters for character-by-character tokenization (0ddb8bc)
  • TestWordsToSubtitleCues_MixedChineseEnglish for mixed language content (0ddb8bc)
  • TestWordsToSubtitleCues_LongChineseText for multi-cue splitting (0ddb8bc)

v0.4.2 - 2026-02-15

Highlights

  • Fixed subtitle word cutoff at line boundaries

Fixed

  • Subtitle cue chunking now checks actual wrapped line count instead of total character count, preventing words from being cut off when they would appear on a third line (a301897)
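
The difference between a character-count check and a wrapped-line-count check can be sketched as follows. The names `wrapLines` and `chunkByLineCount` are hypothetical, and the greedy wrapping rule is an assumption. The project's actual wrapping logic may differ.

```go
package main

import (
	"strings"
	"unicode/utf8"
)

// wrapLines greedily wraps words into lines of at most maxChars characters.
// A single word longer than maxChars is kept whole on its own line.
// Hypothetical helper for illustration only.
func wrapLines(words []string, maxChars int) []string {
	var lines []string
	cur := ""
	for _, w := range words {
		switch {
		case cur == "":
			cur = w
		case utf8.RuneCountInString(cur)+1+utf8.RuneCountInString(w) <= maxChars:
			cur += " " + w
		default:
			lines = append(lines, cur)
			cur = w
		}
	}
	if cur != "" {
		lines = append(lines, cur)
	}
	return lines
}

// chunkByLineCount groups words into cues and starts a new cue whenever
// adding the next word would wrap the cue onto more than maxLines lines.
func chunkByLineCount(words []string, maxChars, maxLines int) []string {
	var cues []string
	var cur []string
	for _, w := range words {
		candidate := append(append([]string{}, cur...), w)
		if len(cur) > 0 && len(wrapLines(candidate, maxChars)) > maxLines {
			cues = append(cues, strings.Join(wrapLines(cur, maxChars), "\n"))
			cur = []string{w}
			continue
		}
		cur = candidate
	}
	if len(cur) > 0 {
		cues = append(cues, strings.Join(wrapLines(cur, maxChars), "\n"))
	}
	return cues
}
```

With `maxChars` of 10 and `maxLines` of 2, the words `aaaaaa`, `bbbbbb`, `cccccc` total 20 characters including spaces, so a check against 2 × 10 characters would accept them as one cue. They nevertheless wrap onto three lines, and the line-count check moves `cccccc` into a second cue.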

Tests

  • TestWordsToSubtitleCues_LineCountLimit verifies cues split correctly at line boundaries (a301897)

v0.4.1 - 2026-02-14

Highlights

  • STT conformance tests for TranscribeFile and TranscribeURL batch transcription methods

Tests

  • TranscribeFile conformance test for local file transcription (c441944)
  • TranscribeURL conformance test for remote URL transcription (c441944)

v0.4.0 - 2026-02-14

Highlights

  • Subtitle generation from STT transcription results
  • Extensible config maps for provider-specific settings

Added

  • Subtitle package for SRT/VTT generation from transcription results (17730a7)
  • Configurable max characters per line and lines per cue for subtitles (17730a7)
  • Word-level timestamp-based cue splitting (17730a7)
  • Extensions map in TranscriptionConfig for provider-specific STT settings (84c37f5)
  • Extensions map in SynthesisConfig for provider-specific TTS settings (665c3be)
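
To show what SRT generation from timed cues involves, here is a minimal sketch of the SubRip layout: a 1-based index, a `HH:MM:SS,mmm --> HH:MM:SS,mmm` timing line, the cue text, and a blank line. The `Cue` type and both function names are hypothetical and do not reflect the subtitle package's actual API.

```go
package main

import (
	"fmt"
	"strings"
	"time"
)

// Cue is a hypothetical cue type used only for this illustration.
type Cue struct {
	Start, End time.Duration
	Text       string
}

// srtTimestamp formats a non-negative duration as HH:MM:SS,mmm, the
// timestamp form used by SubRip (.srt) files.
func srtTimestamp(d time.Duration) string {
	ms := d.Milliseconds()
	return fmt.Sprintf("%02d:%02d:%02d,%03d",
		ms/3600000, ms/60000%60, ms/1000%60, ms%1000)
}

// renderSRT writes cues as numbered SRT blocks separated by blank lines.
func renderSRT(cues []Cue) string {
	var b strings.Builder
	for i, c := range cues {
		fmt.Fprintf(&b, "%d\n%s --> %s\n%s\n\n",
			i+1, srtTimestamp(c.Start), srtTimestamp(c.End), c.Text)
	}
	return b.String()
}
```

WebVTT uses a similar layout but begins the file with a `WEBVTT` header line and separates seconds from milliseconds with a period instead of a comma.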

Fixed

  • Subtitle wrapText no longer clips words when text exceeds line limit (63144bb)

Documentation

  • Voice cloning guide with recording tips and phonetically balanced text (1f0cdd8)

Tests

  • Call system provider conformance tests (MakeCall, ListCalls, OnIncomingCall) (9683ca2)
  • Transport provider conformance tests (Listen, Connect, Protocol) (9683ca2)

v0.3.0 - 2026-01-24

Highlights

  • Provider conformance test suites for TTS and STT implementations

Added

  • TTS provider conformance test suite (Synthesize, SynthesizeStream, SynthesizeFromReader) (e3705c7)
  • Mock TTS provider for self-testing with configurable audio format responses (e3705c7)
  • STT provider conformance test suite (Transcribe, TranscribeStream) (69cfd20)
  • Mock STT provider with streaming transcription simulation (69cfd20)

Fixed

  • MCP session and tool handlers now log Close() errors instead of discarding them (6099072)

Documentation

  • Provider conformance testing TRD describing test categories and API design (58a9697)

v0.2.0 - 2026-01-18

Highlights

  • Audio codec package with PCM, mu-law, and a-law support for telephony
  • MCP server enabling Claude Code to make voice calls
  • Pipeline components connecting STT, TTS, and transport providers

Added

  • Audio codec package with PCM sample conversions (int16, float32, float64, bytes) (f64fe1e)
  • Mu-law encoding/decoding for Twilio Media Streams (f64fe1e)
  • A-law encoding/decoding for international telephony (f64fe1e)
  • Audio resampling, normalization, and analysis utilities (f64fe1e)
  • MCP server with stdio transport for voice interactions (721cbac)
  • Voice interaction tools: initiate_call, continue_call, speak_to_user, end_call (721cbac)
  • Session management for tracking active voice calls (721cbac)
  • TTSPipeline for streaming TTS output to transport connections (11c906d)
  • StreamingTTSPipeline for connecting streaming LLM text to TTS to transport (11c906d)
  • STTPipeline for streaming audio from transport to STT with event callbacks (11c906d)
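
For readers unfamiliar with mu-law, the following sketch shows the widely used G.711 mu-law companding scheme for 16-bit PCM samples. It is a generic illustration with hypothetical function names, not the codec package's actual code, and it makes no claims about that package's API.

```go
package main

const (
	muLawBias = 0x84  // bias added before the segment (exponent) search
	muLawClip = 32635 // largest magnitude that can be encoded after biasing
)

// encodeMuLaw compresses a 16-bit linear PCM sample into one mu-law byte:
// 1 sign bit, 3 exponent bits, 4 mantissa bits, all bit-inverted.
func encodeMuLaw(sample int16) byte {
	s := int(sample)
	sign := 0
	if s < 0 {
		s = -s
		sign = 0x80
	}
	if s > muLawClip {
		s = muLawClip
	}
	s += muLawBias

	// Find the segment: the position of the highest set bit among bits 14..7.
	exponent := 7
	for mask := 0x4000; exponent > 0 && s&mask == 0; mask >>= 1 {
		exponent--
	}
	mantissa := (s >> (exponent + 3)) & 0x0F
	return byte(^(sign | exponent<<4 | mantissa))
}

// decodeMuLaw expands one mu-law byte back into a 16-bit linear PCM sample.
func decodeMuLaw(b byte) int16 {
	u := int(^b)
	sign := u & 0x80
	exponent := (u >> 4) & 0x07
	mantissa := u & 0x0F
	s := ((mantissa<<3)+muLawBias)<<exponent - muLawBias
	if sign != 0 {
		s = -s
	}
	return int16(s)
}
```

The encoding is lossy: silence (sample 0) encodes to `0xFF` and decodes back to 0, while a sample of 1000 round-trips to 988, since larger magnitudes are quantized more coarsely.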

Documentation

  • Voice integration PRD outlining goals, user stories, and success metrics (fd86611)
  • Twilio integration TRD detailing Media Streams architecture (fd86611)

Tests

  • Comprehensive unit tests for audio codec functions (mu-law, a-law, PCM) (f64fe1e)

v0.1.0 - 2025-12-28

Highlights

  • Initial OmniVoice voice abstraction layer for multi-provider telephony

Added

  • Voice abstraction layer with provider-agnostic interfaces (8a54bc2)
  • STT (speech-to-text) provider interface with streaming support (8a54bc2)
  • TTS (text-to-speech) provider interface with streaming support (8a54bc2)
  • Transport interface for audio connections (Twilio, Zoom, etc.) (8a54bc2)
  • Exported CallOptions for provider implementations (7e1b52d)

Documentation

  • README with project overview and shields (4f298df)
  • Marp presentation for OmniVoice (d2d67cf)

Build

  • GitHub Actions CI workflow (4bad35d)
  • golangci-lint configuration and fixes (3693297)