Release Notes - OmniLLM v0.11.0

Release Date: 2026-01-10
Base Version: v0.10.0

Overview

Version 0.11.0 is a major feature release that adds four key reliability and cost optimization features: Fallback Providers, Circuit Breaker, Token Estimation, and Response Caching. This release also includes extended sampling parameters for fine-grained control over model outputs.

⚠️ Breaking Change: ClientConfig API refactored to use unified Providers []ProviderConfig slice. See Upgrade Guide for migration instructions.

Highlights:

  • Unified Provider Configuration: Cleaner API with Providers slice (index 0 = primary, 1+ = fallbacks)
  • Fallback Providers: Automatic failover to backup providers when primary fails
  • Circuit Breaker: Prevent cascading failures by temporarily skipping unhealthy providers
  • Token Estimation: Pre-flight validation to avoid context window limit errors
  • Response Caching: Reduce API costs by caching identical requests
  • Extended Sampling Parameters: TopK, Seed, N, ResponseFormat, Logprobs support

New Features

1. Fallback Providers

Automatic failover to backup providers when the primary provider fails with retryable errors (rate limits, server errors, network issues).

// Providers[0] is primary, Providers[1+] are fallbacks
client, err := omnillm.NewClient(omnillm.ClientConfig{
    Providers: []omnillm.ProviderConfig{
        {Provider: omnillm.ProviderNameOpenAI, APIKey: "openai-key"},       // Primary
        {Provider: omnillm.ProviderNameAnthropic, APIKey: "anthropic-key"}, // Fallback 1
        {Provider: omnillm.ProviderNameGemini, APIKey: "gemini-key"},       // Fallback 2
    },
})

// If OpenAI fails, automatically tries Anthropic, then Gemini
response, err := client.CreateChatCompletion(ctx, request)

Key Features:

  • Intelligent error classification (only retries on retryable errors)
  • Auth errors (401/403) and invalid requests (400) do not trigger fallback
  • FallbackError type provides detailed attempt tracking
  • Works with both sync and streaming APIs

2. Circuit Breaker

Prevents cascading failures by temporarily skipping providers that are failing repeatedly.

client, err := omnillm.NewClient(omnillm.ClientConfig{
    Providers: []omnillm.ProviderConfig{
        {Provider: omnillm.ProviderNameOpenAI, APIKey: "openai-key"},
        {Provider: omnillm.ProviderNameAnthropic, APIKey: "anthropic-key"},
    },
    CircuitBreakerConfig: &omnillm.CircuitBreakerConfig{
        FailureThreshold: 5,               // Open after 5 consecutive failures
        SuccessThreshold: 2,               // Close after 2 successes in half-open
        Timeout:          30 * time.Second, // Wait before trying again
    },
})

Circuit States:

State      Description
Closed     Normal operation; requests flow through
Open       Provider is failing; requests are skipped immediately
Half-Open  Testing whether the provider has recovered
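
For illustration, here is a minimal sketch of the three-state transition logic the table describes. It is not OmniLLM's internal implementation; all names and details are hypothetical.

type cbState int

const (
    cbClosed cbState = iota
    cbOpen
    cbHalfOpen
)

type breaker struct {
    state            cbState
    failures         int       // consecutive failures while closed
    successes        int       // consecutive successes while half-open
    openedAt         time.Time // when the circuit last opened
    failureThreshold int
    successThreshold int
    timeout          time.Duration
}

// allow reports whether a request may be sent to the provider.
func (b *breaker) allow() bool {
    if b.state == cbOpen && time.Since(b.openedAt) >= b.timeout {
        b.state, b.successes = cbHalfOpen, 0 // timeout elapsed: probe again
    }
    return b.state != cbOpen
}

// record updates the state machine after a request completes.
func (b *breaker) record(err error) {
    switch {
    case err == nil && b.state == cbHalfOpen:
        if b.successes++; b.successes >= b.successThreshold {
            b.state, b.failures = cbClosed, 0 // provider recovered
        }
    case err == nil:
        b.failures = 0
    case b.state == cbHalfOpen:
        b.state, b.openedAt = cbOpen, time.Now() // probe failed: reopen
    default:
        if b.failures++; b.failures >= b.failureThreshold {
            b.state, b.openedAt = cbOpen, time.Now() // threshold hit: open
        }
    }
}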

3. Token Estimation

Pre-flight token counting to validate requests before sending to the API.

// Standalone estimation
estimator := omnillm.NewTokenEstimator(omnillm.DefaultTokenEstimatorConfig())
tokens, _ := estimator.EstimateTokens("gpt-4o", messages)
window := estimator.GetContextWindow("gpt-4o") // 128000

// Automatic validation in client
client, err := omnillm.NewClient(omnillm.ClientConfig{
    Providers: []omnillm.ProviderConfig{
        {Provider: omnillm.ProviderNameOpenAI, APIKey: "your-key"},
    },
    TokenEstimator: omnillm.NewTokenEstimator(omnillm.DefaultTokenEstimatorConfig()),
    ValidateTokens: true,
})

Built-in Context Windows:

  • 40+ models supported (OpenAI, Anthropic, Gemini, X.AI, Ollama)
  • Custom context windows via CustomContextWindows map
  • Configurable characters-per-token ratio (see the sketch below)
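
A hedged example of customizing the estimator: CustomContextWindows is documented above, but the CharsPerToken field name is an assumption about the config struct; check the package docs for the actual field.

// Sketch of a customized estimator. CustomContextWindows is documented
// above; CharsPerToken is an assumed name for the ratio setting.
cfg := omnillm.DefaultTokenEstimatorConfig()
cfg.CharsPerToken = 3.5 // assumed field: average characters per token
cfg.CustomContextWindows = map[string]int{
    "my-finetuned-model": 32000, // hypothetical model name
}
estimator := omnillm.NewTokenEstimator(cfg)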

4. Response Caching

Cache identical requests to reduce API costs with configurable TTL.

client, err := omnillm.NewClient(omnillm.ClientConfig{
    Providers: []omnillm.ProviderConfig{
        {Provider: omnillm.ProviderNameOpenAI, APIKey: "your-key"},
    },
    Cache: kvsClient, // Redis, DynamoDB, etc.
    CacheConfig: &omnillm.CacheConfig{
        TTL:       1 * time.Hour,
        KeyPrefix: "myapp:llm-cache",
    },
})

// Check if response was cached
if resp.ProviderMetadata["cache_hit"] == true {
    // Response came from cache
}

Cache Key Generation:

  • SHA-256 hash of model, messages, and parameters (sketched below)
  • Configurable inclusion of temperature and seed in cache key
  • Model allowlist for selective caching
  • Streaming requests skipped by default
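
To make the key derivation concrete, here is a minimal sketch of how such a SHA-256 key can be built. It assumes provider.Message has Role and Content string fields; the exact fields and encoding OmniLLM hashes may differ.

// Minimal sketch of SHA-256 cache key derivation; illustrative only.
func cacheKey(prefix, model string, messages []provider.Message, temperature *float32) string {
    h := sha256.New()
    h.Write([]byte(model))
    for _, m := range messages {
        h.Write([]byte(m.Role))    // assumes a Role string field
        h.Write([]byte(m.Content)) // assumes a Content string field
    }
    if temperature != nil { // parameter inclusion is configurable
        fmt.Fprintf(h, "temp=%v", *temperature)
    }
    return prefix + ":" + hex.EncodeToString(h.Sum(nil))
}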

5. Extended Sampling Parameters

New parameters for fine-grained control over model outputs:

Parameter       Type             Providers                  Description
TopK            *int             Anthropic, Gemini, Ollama  Top K token selection
Seed            *int             OpenAI, X.AI, Ollama       Reproducible outputs
N               *int             OpenAI                     Number of completions
ResponseFormat  *ResponseFormat  OpenAI, Gemini             JSON mode
Logprobs        *bool            OpenAI                     Return log probabilities
TopLogprobs     *int             OpenAI                     Top logprobs count
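
A hedged usage sketch follows; it assumes these parameters are pointer fields on the chat completion request struct, and the ChatCompletionRequest type name itself is an assumption.

// Sketch of a request using the new parameters; field placement assumed.
topK, seed, topLogprobs := 40, 42, 5
logprobs := true
request := omnillm.ChatCompletionRequest{ // type name assumed
    Model:       "gpt-4o",
    Messages:    messages,
    Seed:        &seed,        // OpenAI, X.AI, Ollama
    Logprobs:    &logprobs,    // OpenAI
    TopLogprobs: &topLogprobs, // OpenAI
    TopK:        &topK,        // Anthropic, Gemini, Ollama
}
response, err := client.CreateChatCompletion(ctx, request)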

New Types

Error Classification

type ErrorCategory int

const (
    ErrorCategoryUnknown ErrorCategory = iota
    ErrorCategoryRetryable    // Rate limits, server errors, network errors
    ErrorCategoryNonRetryable // Auth errors, invalid requests
)

func ClassifyError(err error) ErrorCategory
func IsRetryableError(err error) bool
func IsNonRetryableError(err error) bool
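
For example, a caller can use these helpers to decide whether its own retry logic should re-attempt a failed request:

// Using the classification helpers in caller-side retry logic.
_, err := client.CreateChatCompletion(ctx, request)
if err != nil {
    switch omnillm.ClassifyError(err) {
    case omnillm.ErrorCategoryRetryable:
        // rate limit, server error, or network error: safe to retry
    case omnillm.ErrorCategoryNonRetryable:
        // auth error or invalid request: fix the request, do not retry
    default:
        // unknown: treat conservatively
    }
}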

Token Types

type TokenEstimator interface {
    EstimateTokens(model string, messages []provider.Message) (int, error)
    GetContextWindow(model string) int
}

type TokenLimitError struct {
    EstimatedTokens int
    ContextWindow   int
    AvailableTokens int
    Model           string
}
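
When ValidateTokens is enabled, an oversized request can be caught before it is sent. A sketch of handling the failure, assuming TokenLimitError is returned in a form errors.As can unwrap as a pointer:

// Handling a token-limit failure from pre-flight validation.
_, err := client.CreateChatCompletion(ctx, request)
var tle *omnillm.TokenLimitError
if errors.As(err, &tle) {
    log.Printf("estimated %d tokens exceeds %d-token window for %s (%d available)",
        tle.EstimatedTokens, tle.ContextWindow, tle.Model, tle.AvailableTokens)
    // e.g. drop the oldest messages and retry
}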

Fallback Types

type FallbackError struct {
    Attempts  []FallbackAttempt
    LastError error
}

type FallbackAttempt struct {
    Provider string
    Error    error
    Duration time.Duration
}
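
When every provider in the chain fails, the returned error carries per-provider details. A sketch of inspecting them, assuming FallbackError is returned as a pointer that errors.As can unwrap:

// Inspecting which providers were tried when all fallbacks fail.
var fe *omnillm.FallbackError
if errors.As(err, &fe) {
    for _, a := range fe.Attempts {
        log.Printf("provider %s failed after %s: %v", a.Provider, a.Duration, a.Error)
    }
    log.Printf("all providers failed; last error: %v", fe.LastError)
}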

Updated ClientConfig

type ClientConfig struct {
    // Provider configuration (BREAKING CHANGE in v0.11.0)
    Providers []ProviderConfig  // Index 0 = primary, 1+ = fallbacks

    // Circuit Breaker (optional)
    CircuitBreakerConfig *CircuitBreakerConfig

    // Memory
    Memory       kvs.Client
    MemoryConfig *MemoryConfig

    // Observability
    ObservabilityHook ObservabilityHook
    Logger            *slog.Logger

    // Token Estimation
    TokenEstimator TokenEstimator
    ValidateTokens bool

    // Response Caching
    Cache       kvs.Client
    CacheConfig *CacheConfig
}

type ProviderConfig struct {
    Provider       ProviderName
    APIKey         string
    BaseURL        string
    Region         string
    Timeout        time.Duration
    HTTPClient     *http.Client
    Extra          map[string]any
    CustomProvider provider.Provider  // For third-party providers
}

Upgrade Guide

From v0.10.0

⚠️ Breaking Change: The ClientConfig API has been refactored to use a unified Providers slice.

go get github.com/agentplexus/omnillm@v0.11.0
go mod tidy

Migration: Basic Client

// Before (v0.10.0)
client, _ := omnillm.NewClient(omnillm.ClientConfig{
    Provider: omnillm.ProviderNameOpenAI,
    APIKey:   apiKey,
})

// After (v0.11.0)
client, _ := omnillm.NewClient(omnillm.ClientConfig{
    Providers: []omnillm.ProviderConfig{
        {Provider: omnillm.ProviderNameOpenAI, APIKey: apiKey},
    },
})

Migration: With Fallback Providers

// Before (v0.10.0)
client, _ := omnillm.NewClient(omnillm.ClientConfig{
    Provider: omnillm.ProviderNameOpenAI,
    APIKey:   apiKey,
    FallbackProviders: []omnillm.ProviderConfig{
        {Provider: omnillm.ProviderNameAnthropic, APIKey: anthropicKey},
    },
})

// After (v0.11.0) - Providers[0] is primary, Providers[1+] are fallbacks
client, _ := omnillm.NewClient(omnillm.ClientConfig{
    Providers: []omnillm.ProviderConfig{
        {Provider: omnillm.ProviderNameOpenAI, APIKey: apiKey},
        {Provider: omnillm.ProviderNameAnthropic, APIKey: anthropicKey},
    },
})

Migration: Custom HTTP Client / Timeout

// Before (v0.10.0)
client, _ := omnillm.NewClient(omnillm.ClientConfig{
    Provider:   omnillm.ProviderNameOpenAI,
    APIKey:     apiKey,
    Timeout:    5 * time.Minute,
    HTTPClient: customHTTPClient,
})

// After (v0.11.0) - Timeout/HTTPClient moved to ProviderConfig
client, _ := omnillm.NewClient(omnillm.ClientConfig{
    Providers: []omnillm.ProviderConfig{
        {
            Provider:   omnillm.ProviderNameOpenAI,
            APIKey:     apiKey,
            Timeout:    5 * time.Minute,
            HTTPClient: customHTTPClient,
        },
    },
})

Migration: Custom Provider

// Before (v0.10.0)
client, _ := omnillm.NewClient(omnillm.ClientConfig{
    CustomProvider: myCustomProvider,
})

// After (v0.11.0) - CustomProvider moved to ProviderConfig
client, _ := omnillm.NewClient(omnillm.ClientConfig{
    Providers: []omnillm.ProviderConfig{
        {CustomProvider: myCustomProvider},
    },
})

Enable Token Validation

client, _ := omnillm.NewClient(omnillm.ClientConfig{
    Providers: []omnillm.ProviderConfig{
        {Provider: omnillm.ProviderNameOpenAI, APIKey: apiKey},
    },
    TokenEstimator: omnillm.NewTokenEstimator(omnillm.DefaultTokenEstimatorConfig()),
    ValidateTokens: true,
})

Enable Response Caching

client, _ := omnillm.NewClient(omnillm.ClientConfig{
    Providers: []omnillm.ProviderConfig{
        {Provider: omnillm.ProviderNameOpenAI, APIKey: apiKey},
    },
    Cache: kvsClient, // Your KVS implementation
    CacheConfig: &omnillm.CacheConfig{
        TTL: 1 * time.Hour,
    },
})

New Files

File                    Description
circuitbreaker.go       Circuit breaker implementation
circuitbreaker_test.go  Circuit breaker tests
fallback.go             Fallback provider wrapper
fallback_test.go        Fallback tests
tokens.go               Token estimation
tokens_test.go          Token estimation tests
cache.go                Response caching
cache_test.go           Cache tests

Test Coverage

  • Main package: 72.7% coverage
  • New feature code: 78-95% coverage
  • 45+ unit tests

Performance Considerations

Fallback Providers

  • Fallback adds negligible latency when the primary provider succeeds
  • The circuit breaker avoids wasted attempts against providers that are known to be failing
  • Consider ordering providers by latency and cost, cheapest or fastest first

Token Estimation

  • Character-based estimation is fast but approximate
  • Actual token counts may differ from the estimate by 5-15%
  • For critical applications, pad the estimate conservatively, as sketched below
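
A minimal sketch of padding the estimate by 15% before comparing it to the context window:

// Applying a conservative 15% safety margin to the estimate.
tokens, err := estimator.EstimateTokens("gpt-4o", messages)
if err == nil {
    padded := tokens + tokens*15/100 // cover the variance noted above
    if padded > estimator.GetContextWindow("gpt-4o") {
        // trim or summarize older messages before sending
    }
}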

Response Caching

  • Cache lookups add ~1ms latency
  • TTL should match your freshness requirements
  • Consider memory/storage costs for cache backend