Changelog

All notable changes to this project will be documented in this file.

The format is based on Keep a Changelog, and this project adheres to Semantic Versioning. Commits follow Conventional Commits, and this changelog is generated by Structured Changelog.

Unreleased

v0.4.3 - 2026-02-15

Highlights

Comprehensive tests for English and Chinese subtitle generation

Tests

TestWordsToSubtitleCues_EnglishWordGrouping for word-based cue grouping (0ddb8bc)
TestWordsToSubtitleCues_ChineseCharacters for character-by-character tokenization (0ddb8bc)
TestWordsToSubtitleCues_MixedChineseEnglish for mixed language content (0ddb8bc)
TestWordsToSubtitleCues_LongChineseText for multi-cue splitting (0ddb8bc)
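
To illustrate what character-by-character tokenization of mixed Chinese and English text can look like, here is a minimal Go sketch. The function name `tokenize` and its behavior are assumptions made for illustration; they are not taken from the OmniVoice subtitle package, whose actual tokenizer may differ.

```go
package main

import (
	"fmt"
	"unicode"
)

// tokenize is a hypothetical helper, not OmniVoice's API. It emits each Han
// character as its own token and keeps runs of other non-space characters
// together as words.
func tokenize(text string) []string {
	var tokens []string
	var word []rune

	// flush emits the pending non-Han word, if any.
	flush := func() {
		if len(word) > 0 {
			tokens = append(tokens, string(word))
			word = word[:0]
		}
	}

	for _, r := range text {
		switch {
		case unicode.Is(unicode.Han, r):
			flush()
			tokens = append(tokens, string(r))
		case unicode.IsSpace(r):
			flush()
		default:
			word = append(word, r)
		}
	}
	flush()
	return tokens
}

func main() {
	fmt.Printf("%q\n", tokenize("Hello 世界 from Go"))
	// prints ["Hello" "世" "界" "from" "Go"]
}
```

Splitting Han text per character gives a cue builder natural break points in scripts that do not separate words with spaces.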

v0.4.2 - 2026-02-15

Highlights

Fixed subtitle word cutoff at line boundaries

Fixed

Subtitle cue chunking now checks actual wrapped line count instead of total character count, preventing words from being cut off when they would appear on a third line (a301897)
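
The difference between the two checks can be sketched in Go as follows. This is an illustrative reconstruction, not the project's code: `wrapLines` and `fitsInCue` are hypothetical names, and the greedy wrapping strategy is an assumption. The example shows three words whose total length fits a 2 x 10 character budget but which still wrap onto three lines.

```go
package main

import (
	"fmt"
	"strings"
	"unicode/utf8"
)

// wrapLines greedily packs words into lines of at most maxChars characters.
// A word longer than maxChars gets a line of its own rather than being cut.
// Hypothetical helper for this sketch.
func wrapLines(words []string, maxChars int) []string {
	var lines []string
	current := ""
	for _, w := range words {
		switch {
		case current == "":
			current = w
		case utf8.RuneCountInString(current)+1+utf8.RuneCountInString(w) <= maxChars:
			current += " " + w
		default:
			lines = append(lines, current)
			current = w
		}
	}
	if current != "" {
		lines = append(lines, current)
	}
	return lines
}

// fitsInCue reports whether the words wrap into at most maxLines lines.
func fitsInCue(words []string, maxChars, maxLines int) bool {
	return len(wrapLines(words, maxChars)) <= maxLines
}

func main() {
	words := []string{"aaaaaa", "bbbbbb", "cccccc"}

	// Character-count check: 20 characters fit a 2 x 10 budget.
	total := utf8.RuneCountInString(strings.Join(words, " "))
	fmt.Println(total <= 10*2) // true

	// Line-count check: each word needs its own line, so three lines.
	fmt.Println(fitsInCue(words, 10, 2)) // false
}
```

A cue builder using the line-count check would start a new cue before adding the third word instead of letting it spill onto a third line.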

Tests

TestWordsToSubtitleCues_LineCountLimit verifies cues split correctly at line boundaries (a301897)

v0.4.1 - 2026-02-14

Highlights

STT conformance tests for TranscribeFile and TranscribeURL batch transcription methods

Tests

TranscribeFile conformance test for local file transcription (c441944)
TranscribeURL conformance test for remote URL transcription (c441944)

v0.4.0 - 2026-02-14

Highlights

Subtitle generation from STT transcription results
Extensible config maps for provider-specific settings

Added

Subtitle package for SRT/VTT generation from transcription results (17730a7)
Configurable max characters per line and lines per cue for subtitles (17730a7)
Word-level timestamp-based cue splitting (17730a7)
Extensions map in TranscriptionConfig for provider-specific STT settings (84c37f5)
Extensions map in SynthesisConfig for provider-specific TTS settings (665c3be)
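
As a rough illustration of what SRT output from a timed cue involves, the Go sketch below formats one cue. The `cue` type and the `srtTimestamp` and `formatSRT` functions are hypothetical and are not the subtitle package's API; only the SRT layout itself (index line, `HH:MM:SS,mmm --> HH:MM:SS,mmm` timing line, text) is standard.

```go
package main

import (
	"fmt"
	"time"
)

// cue is a hypothetical type for this sketch: one subtitle entry with
// start and end offsets from the beginning of the audio.
type cue struct {
	Start, End time.Duration
	Text       string
}

// srtTimestamp formats an offset as HH:MM:SS,mmm, the form SRT uses.
// WebVTT uses a period instead of the comma before the milliseconds.
func srtTimestamp(d time.Duration) string {
	ms := d.Milliseconds()
	return fmt.Sprintf("%02d:%02d:%02d,%03d",
		ms/3600000, ms/60000%60, ms/1000%60, ms%1000)
}

// formatSRT renders one numbered SRT entry, without the blank line that
// separates entries in a file.
func formatSRT(index int, c cue) string {
	return fmt.Sprintf("%d\n%s --> %s\n%s\n",
		index, srtTimestamp(c.Start), srtTimestamp(c.End), c.Text)
}

func main() {
	c := cue{
		Start: 1500 * time.Millisecond,
		End:   3*time.Second + 250*time.Millisecond,
		Text:  "Hello, world",
	}
	fmt.Print(formatSRT(1, c))
}
```

With word-level timestamps, a cue's start and end would come from its first and last word.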

Fixed

Subtitle wrapText no longer clips words when text exceeds line limit (63144bb)

Documentation

Voice cloning guide with recording tips and phonetically balanced text (1f0cdd8)

Tests

Call system provider conformance tests (MakeCall, ListCalls, OnIncomingCall) (9683ca2)
Transport provider conformance tests (Listen, Connect, Protocol) (9683ca2)

v0.3.0 - 2026-01-24

Highlights

Provider conformance test suites for TTS and STT implementations

Added

TTS provider conformance test suite (Synthesize, SynthesizeStream, SynthesizeFromReader) (e3705c7)
Mock TTS provider for self-testing with configurable audio format responses (e3705c7)
STT provider conformance test suite (Transcribe, TranscribeStream) (69cfd20)
Mock STT provider with streaming transcription simulation (69cfd20)

Fixed

MCP session and tool handlers now log Close() errors instead of discarding (6099072)

Documentation

Provider conformance testing TRD describing test categories and API design (58a9697)

Build

MIT LICENSE file (f124dcf)

v0.2.0 - 2026-01-18

Highlights

Audio codec package with PCM, mu-law, and a-law support for telephony
MCP server enabling Claude Code to make voice calls
Pipeline components connecting STT, TTS, and transport providers

Added

Audio codec package with PCM sample conversions (int16, float32, float64, bytes) (f64fe1e)
Mu-law encoding/decoding for Twilio Media Streams (f64fe1e)
A-law encoding/decoding for international telephony (f64fe1e)
Audio resampling, normalization, and analysis utilities (f64fe1e)
MCP server with stdio transport for voice interactions (721cbac)
Voice interaction tools: initiate_call, continue_call, speak_to_user, end_call (721cbac)
Session management for tracking active voice calls (721cbac)
TTSPipeline for streaming TTS output to transport connections (11c906d)
StreamingTTSPipeline for connecting streaming LLM text to TTS to transport (11c906d)
STTPipeline for streaming audio from transport to STT with event callbacks (11c906d)
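
For readers unfamiliar with mu-law, the following Go sketch shows the commonly used G.711-style conversion between 16-bit linear PCM and 8-bit mu-law. It is an independent illustration, not the OmniVoice codec package's code; the names `encodeMuLaw` and `decodeMuLaw` are assumptions.

```go
package main

import "fmt"

const (
	muLawBias = 0x84  // added before segment search
	muLawClip = 32635 // largest magnitude that survives adding the bias
)

// encodeMuLaw compresses a 16-bit linear PCM sample into one mu-law byte.
func encodeMuLaw(sample int16) byte {
	s := int(sample)
	sign := 0
	if s < 0 {
		s = -s
		sign = 0x80
	}
	if s > muLawClip {
		s = muLawClip
	}
	s += muLawBias

	// Find the segment (exponent): the position of the highest set bit,
	// counted down from bit 14.
	exponent := 7
	for mask := 0x4000; exponent > 0 && s&mask == 0; mask >>= 1 {
		exponent--
	}
	mantissa := (s >> (exponent + 3)) & 0x0F

	// mu-law bytes are stored bit-inverted.
	return byte(^(sign | exponent<<4 | mantissa))
}

// decodeMuLaw expands one mu-law byte back to a 16-bit linear PCM sample.
func decodeMuLaw(b byte) int16 {
	u := int(^b)
	exponent := (u >> 4) & 0x07
	mantissa := u & 0x0F
	s := ((mantissa<<3)+muLawBias)<<exponent - muLawBias
	if u&0x80 != 0 {
		return int16(-s)
	}
	return int16(s)
}

func main() {
	for _, sample := range []int16{0, 1000, -1000, 32767} {
		encoded := encodeMuLaw(sample)
		fmt.Printf("%6d -> 0x%02X -> %6d\n", sample, encoded, decodeMuLaw(encoded))
	}
}
```

The round trip is lossy by design: mu-law keeps finer resolution for quiet samples and coarser resolution for loud ones, which suits 8-bit telephony audio.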

Documentation

Voice integration PRD outlining goals, user stories, and success metrics (fd86611)
Twilio integration TRD detailing Media Streams architecture (fd86611)

Tests

Comprehensive unit tests for audio codec functions (mu-law, a-law, PCM) (f64fe1e)

v0.1.0 - 2025-12-28

Highlights

Initial OmniVoice voice abstraction layer for multi-provider telephony

Added

Voice abstraction layer with provider-agnostic interfaces (8a54bc2)
STT (speech-to-text) provider interface with streaming support (8a54bc2)
TTS (text-to-speech) provider interface with streaming support (8a54bc2)
Transport interface for audio connections (Twilio, Zoom, etc.) (8a54bc2)
Export CallOptions for provider implementations (7e1b52d)

Documentation

README with project overview and shields (4f298df)
Marp presentation for OmniVoice (d2d67cf)

Build

GitHub Actions CI workflow (4bad35d)
golangci-lint configuration and fixes (3693297)