Streaming

OmniLLM supports real-time response streaming for all providers.

Basic Streaming

stream, err := client.CreateChatCompletionStream(context.Background(), &omnillm.ChatCompletionRequest{
    Model: omnillm.ModelGPT4o,
    Messages: []omnillm.Message{
        {Role: omnillm.RoleUser, Content: "Tell me a short story about AI."},
    },
    MaxTokens:   &[]int{200}[0],
    Temperature: &[]float64{0.8}[0],
})
if err != nil {
    log.Fatal(err)
}
defer stream.Close()

fmt.Print("AI Response: ")
for {
    chunk, err := stream.Recv()
    if err == io.EOF {
        break
    }
    if err != nil {
        log.Fatal(err)
    }

    if len(chunk.Choices) > 0 && chunk.Choices[0].Delta != nil {
        fmt.Print(chunk.Choices[0].Delta.Content)
    }
}
fmt.Println()
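
The inline &[]int{200}[0] and &[]float64{0.8}[0] expressions are just one way to take the address of a literal. If you prefer, a small generic helper keeps the request more readable (ptr is shown only as an illustration; it is not part of OmniLLM):

// ptr returns a pointer to any literal value.
func ptr[T any](v T) *T {
    return &v
}

// Then in the request:
// MaxTokens:   ptr(200),
// Temperature: ptr(0.8),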

Stream Interface

type ChatCompletionStream interface {
    // Recv receives the next chunk from the stream
    Recv() (*ChatCompletionStreamResponse, error)

    // Close closes the stream
    Close() error
}
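
Because every provider returns the same ChatCompletionStream, helpers can be written once against this interface. Below is a minimal sketch of a collectStream helper (illustrative, not part of OmniLLM; it assumes the io and strings imports) that drains a stream into a single string:

// collectStream reads chunks until io.EOF and concatenates their content.
func collectStream(stream omnillm.ChatCompletionStream) (string, error) {
    defer stream.Close()

    var sb strings.Builder
    for {
        chunk, err := stream.Recv()
        if err == io.EOF {
            return sb.String(), nil
        }
        if err != nil {
            return sb.String(), err
        }
        if len(chunk.Choices) > 0 && chunk.Choices[0].Delta != nil {
            sb.WriteString(chunk.Choices[0].Delta.Content)
        }
    }
}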

Provider Support

Provider        Streaming
OpenAI          Yes
Anthropic       Yes (SSE)
Google Gemini   Yes
X.AI            Yes
Ollama          Yes
AWS Bedrock     Yes

Streaming with Observability

When using observability hooks, wrap the stream to track streaming metrics:

func (h *MyHook) WrapStream(ctx context.Context, info omnillm.LLMCallInfo, req *omnillm.ChatCompletionRequest, stream omnillm.ChatCompletionStream) omnillm.ChatCompletionStream {
    return &observableStream{
        stream:    stream,
        ctx:       ctx,
        info:      info,
        startTime: time.Now(),
    }
}
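
The observableStream type above is one you define yourself: it forwards Recv and Close to the wrapped stream while recording metrics. A minimal sketch follows (field and metric names are illustrative, and the actual reporting calls depend on your hook):

type observableStream struct {
    stream    omnillm.ChatCompletionStream
    ctx       context.Context
    info      omnillm.LLMCallInfo
    startTime time.Time
    gotFirst  bool
    chunks    int
}

// Recv forwards to the wrapped stream and counts chunks as they arrive.
func (s *observableStream) Recv() (*omnillm.ChatCompletionStreamResponse, error) {
    chunk, err := s.stream.Recv()
    if err == nil {
        s.chunks++
        if !s.gotFirst {
            s.gotFirst = true
            // Record time to first chunk for s.info here, e.g. time.Since(s.startTime).
        }
    }
    return chunk, err
}

// Close reports final stream metrics and closes the underlying stream.
func (s *observableStream) Close() error {
    // Record total chunks and overall stream duration for s.info here.
    return s.stream.Close()
}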