Changelog
All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog, this project adheres to Semantic Versioning, commits follow Conventional Commits, and this changelog is generated by Structured Changelog.
Unreleased
v0.4.3 - 2026-02-15
Highlights
- Comprehensive tests for English and Chinese subtitle generation
Tests
- TestWordsToSubtitleCues_EnglishWordGrouping for word-based cue grouping (0ddb8bc)
- TestWordsToSubtitleCues_ChineseCharacters for character-by-character tokenization (0ddb8bc)
- TestWordsToSubtitleCues_MixedChineseEnglish for mixed language content (0ddb8bc)
- TestWordsToSubtitleCues_LongChineseText for multi-cue splitting (0ddb8bc)
v0.4.2 - 2026-02-15
Highlights
- Fixed subtitle word cutoff at line boundaries
Fixed
- Subtitle cue chunking now checks actual wrapped line count instead of total character count, preventing words from being cut off when they would appear on a third line (a301897)
Tests
- TestWordsToSubtitleCues_LineCountLimit verifies cues split correctly at line boundaries (a301897)
v0.4.1 - 2026-02-14
Highlights
- STT conformance tests for TranscribeFile and TranscribeURL batch transcription methods
Tests
- TranscribeFile conformance test for local file transcription (c441944)
- TranscribeURL conformance test for remote URL transcription (c441944)
v0.4.0 - 2026-02-14
Highlights
- Subtitle generation from STT transcription results
- Extensible config maps for provider-specific settings
Added
- Subtitle package for SRT/VTT generation from transcription results (17730a7)
- Configurable max characters per line and lines per cue for subtitles (17730a7)
- Word-level timestamp-based cue splitting (17730a7)
- Extensions map in TranscriptionConfig for provider-specific STT settings (84c37f5)
- Extensions map in SynthesisConfig for provider-specific TTS settings (665c3be)
Fixed
- Subtitle wrapText no longer clips words when text exceeds line limit (63144bb)
Documentation
- Voice cloning guide with recording tips and phonetically balanced text (1f0cdd8)
Tests
- Call system provider conformance tests (MakeCall, ListCalls, OnIncomingCall) (9683ca2)
- Transport provider conformance tests (Listen, Connect, Protocol) (9683ca2)
v0.3.0 - 2026-01-24
Highlights
- Provider conformance test suites for TTS and STT implementations
Added
- TTS provider conformance test suite (Synthesize, SynthesizeStream, SynthesizeFromReader) (e3705c7)
- Mock TTS provider for self-testing with configurable audio format responses (e3705c7)
- STT provider conformance test suite (Transcribe, TranscribeStream) (69cfd20)
- Mock STT provider with streaming transcription simulation (69cfd20)
Fixed
- MCP session and tool handlers now log Close() errors instead of discarding them (6099072)
Documentation
- Provider conformance testing TRD describing test categories and API design (58a9697)
Build
- MIT LICENSE file (f124dcf)
v0.2.0 - 2026-01-18
Highlights
- Audio codec package with PCM, mu-law, and a-law support for telephony
- MCP server enabling Claude Code to make voice calls
- Pipeline components connecting STT, TTS, and transport providers
Added
- Audio codec package with PCM sample conversions (int16, float32, float64, bytes) (f64fe1e)
- Mu-law encoding/decoding for Twilio Media Streams (f64fe1e)
- A-law encoding/decoding for international telephony (f64fe1e)
- Audio resampling, normalization, and analysis utilities (f64fe1e)
- MCP server with stdio transport for voice interactions (721cbac)
- Voice interaction tools: initiate_call, continue_call, speak_to_user, end_call (721cbac)
- Session management for tracking active voice calls (721cbac)
- TTSPipeline for streaming TTS output to transport connections (11c906d)
- StreamingTTSPipeline for connecting streaming LLM text to TTS to transport (11c906d)
- STTPipeline for streaming audio from transport to STT with event callbacks (11c906d)
Documentation
- Voice integration PRD outlining goals, user stories, and success metrics (fd86611)
- Twilio integration TRD detailing Media Streams architecture (fd86611)
Tests
- Comprehensive unit tests for audio codec functions (mu-law, a-law, PCM) (f64fe1e)
v0.1.0 - 2025-12-28
Highlights
- Initial OmniVoice voice abstraction layer for multi-provider telephony
Added
- Voice abstraction layer with provider-agnostic interfaces (8a54bc2)
- STT (speech-to-text) provider interface with streaming support (8a54bc2)
- TTS (text-to-speech) provider interface with streaming support (8a54bc2)
- Transport interface for audio connections (Twilio, Zoom, etc.) (8a54bc2)
- Export CallOptions for provider implementations (7e1b52d)