Response Caching

OmniLLM supports response caching, returning a stored response for repeated identical requests to reduce API costs.

Basic Usage

client, err := omnillm.NewClient(omnillm.ClientConfig{
    Providers: []omnillm.ProviderConfig{
        {Provider: omnillm.ProviderNameOpenAI, APIKey: "your-key"},
    },
    Cache: kvsClient, // Your KVS implementation (Redis, DynamoDB, etc.)
    CacheConfig: &omnillm.CacheConfig{
        TTL:       1 * time.Hour,
        KeyPrefix: "myapp:llm-cache",
    },
})

// First call hits the API
response1, _ := client.CreateChatCompletion(ctx, request)

// Second identical call returns cached response
response2, _ := client.CreateChatCompletion(ctx, request)

// Check if response was from cache
if response2.ProviderMetadata["cache_hit"] == true {
    fmt.Println("Response was cached!")
}

Configuration

cacheConfig := &omnillm.CacheConfig{
    TTL:                1 * time.Hour,       // Time-to-live
    KeyPrefix:          "omnillm:cache",     // Key prefix
    SkipStreaming:      true,                // Don't cache streaming responses (default: true)
    CacheableModels:    []string{"gpt-4o"},  // Only cache specific models (nil = all)
    IncludeTemperature: true,                // Temperature affects cache key
    IncludeSeed:        true,                // Seed affects cache key
}
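The CacheableModels option restricts caching to an allow-list. The check it implies can be sketched as follows; `cacheable` is a hypothetical helper for illustration, not part of the OmniLLM API:

```go
package main

import "fmt"

// cacheable sketches the allow-list semantics documented for
// CacheableModels: a nil slice caches every model, a non-nil slice
// caches only the models it lists. Hypothetical helper, not library code.
func cacheable(models []string, model string) bool {
	if models == nil {
		// nil means "cache all models"
		return true
	}
	for _, m := range models {
		if m == model {
			return true
		}
	}
	return false
}

func main() {
	allowed := []string{"gpt-4o"}
	fmt.Println(cacheable(allowed, "gpt-4o"))   // listed model is cacheable
	fmt.Println(cacheable(allowed, "claude-3")) // unlisted model is not
	fmt.Println(cacheable(nil, "claude-3"))     // nil allow-list caches everything
}
```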

Cache Key Generation

Cache keys are generated from a SHA-256 hash of:

  • Model name
  • Messages (role, content, name, tool_call_id)
  • MaxTokens, Temperature, TopP, TopK, Seed, Stop sequences

Any change to these parameters produces a different cache key, so a cached response is only returned for requests whose cacheable fields match exactly.
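The hashing scheme above can be sketched as follows. The struct fields mirror the documented inputs, but the type names and exact serialization are assumptions for illustration, not the library's internal code:

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
)

// message mirrors the documented per-message fields that feed the hash.
type message struct {
	Role       string `json:"role"`
	Content    string `json:"content"`
	Name       string `json:"name,omitempty"`
	ToolCallID string `json:"tool_call_id,omitempty"`
}

// cacheKeyInput collects the documented request fields; the real
// library's types and field set may differ.
type cacheKeyInput struct {
	Model       string    `json:"model"`
	Messages    []message `json:"messages"`
	MaxTokens   int       `json:"max_tokens"`
	Temperature float64   `json:"temperature"`
	TopP        float64   `json:"top_p"`
	TopK        int       `json:"top_k"`
	Seed        int       `json:"seed"`
	Stop        []string  `json:"stop"`
}

// cacheKey JSON-encodes the canonical fields (struct field order makes
// the encoding deterministic) and prefixes the SHA-256 digest.
func cacheKey(prefix string, in cacheKeyInput) string {
	b, _ := json.Marshal(in)
	sum := sha256.Sum256(b)
	return prefix + ":" + hex.EncodeToString(sum[:])
}

func main() {
	base := cacheKeyInput{
		Model:       "gpt-4o",
		Messages:    []message{{Role: "user", Content: "Hello"}},
		Temperature: 0.7,
	}
	hotter := base
	hotter.Temperature = 0.9

	k1 := cacheKey("myapp:llm-cache", base)
	k2 := cacheKey("myapp:llm-cache", base)
	k3 := cacheKey("myapp:llm-cache", hotter)

	fmt.Println(k1 == k2) // identical requests share a key
	fmt.Println(k1 == k3) // changing Temperature changes the key
}
```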

Cache Backends

Caching uses the same KVS backend as conversation memory:

  • Redis: High-performance distributed caching
  • DynamoDB: AWS-native caching
  • In-Memory: Development and testing
  • Custom: Any Sogo KVS implementation