# Feedback Scores
Feedback scores allow you to attach quality metrics to traces and spans. Use them to:
- Record user feedback (thumbs up/down, ratings)
- Store evaluation results from automated metrics
- Track quality metrics over time
## Adding Feedback to Traces
```go
// Add a numeric score
trace.AddFeedbackScore(ctx, "accuracy", 0.95, "High accuracy response")

// Add multiple scores
trace.AddFeedbackScore(ctx, "relevance", 0.87, "Mostly relevant")
trace.AddFeedbackScore(ctx, "helpfulness", 0.92, "Very helpful")
```
## Adding Feedback to Spans
```go
// Add feedback to specific spans
span.AddFeedbackScore(ctx, "latency_score", 0.75, "Response time acceptable")
span.AddFeedbackScore(ctx, "quality", 0.90, "Good quality output")
```
## Score Parameters
| Parameter | Type | Description |
|---|---|---|
| `name` | `string` | Name of the score (e.g., `"accuracy"`, `"relevance"`) |
| `value` | `float64` | Score value (typically 0.0 to 1.0) |
| `reason` | `string` | Optional explanation for the score |
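Because `value` is an unconstrained `float64`, it can be worth clamping raw metrics to the conventional range before recording them. A small sketch (`clampScore` is a hypothetical helper):

```go
// clampScore restricts a raw metric to the conventional 0.0-1.0 range
// so that scores stay comparable across dashboards.
func clampScore(v float64) float64 {
	return math.Max(0, math.Min(1, v))
}
```

A call such as `trace.AddFeedbackScore(ctx, "bleu", clampScore(rawBLEU), "Clamped to 0-1")` then guarantees a well-formed value (`rawBLEU` here is an illustrative variable).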
## Use Cases
### User Feedback
```go
// handleFeedback converts a 1-5 user rating into a normalized
// feedback score and attaches it to the trace.
func handleFeedback(ctx context.Context, traceID string, rating int) {
	// Convert the 1-5 rating to a 0.0-1.0 score
	score := float64(rating-1) / 4.0

	trace := getTrace(traceID) // application-specific lookup of the trace handle
	trace.AddFeedbackScore(ctx, "user_rating", score,
		fmt.Sprintf("User rated %d/5", rating))
}
```
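In a web service, `handleFeedback` might sit behind a rating endpoint. A sketch using only the standard library (the route shape, parameter names, and `feedbackHandler` itself are illustrative):

```go
// feedbackHandler reads a trace ID and a 1-5 rating from query
// parameters and records the rating as a feedback score.
func feedbackHandler(w http.ResponseWriter, r *http.Request) {
	rating, err := strconv.Atoi(r.URL.Query().Get("rating"))
	if err != nil || rating < 1 || rating > 5 {
		http.Error(w, "rating must be an integer from 1 to 5", http.StatusBadRequest)
		return
	}
	handleFeedback(r.Context(), r.URL.Query().Get("trace_id"), rating)
	w.WriteHeader(http.StatusNoContent)
}
```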
### Automated Evaluation
```go
// evaluateResponse runs automated metrics over a model response and
// records each result as a feedback score on the trace.
func evaluateResponse(ctx context.Context, trace *opik.Trace, response string) {
	// Run evaluation metrics (application-defined, returning 0.0-1.0)
	relevanceScore := evaluateRelevance(response)
	factualScore := evaluateFactuality(response)

	// Record the results as feedback scores
	trace.AddFeedbackScore(ctx, "relevance", relevanceScore, "Automated relevance check")
	trace.AddFeedbackScore(ctx, "factuality", factualScore, "Automated fact check")
}
```
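`evaluateRelevance` and `evaluateFactuality` stand in for whatever metrics you run. As a deliberately simple illustration, a relevance check could be keyword overlap (a production evaluator would more likely use an LLM judge or embedding similarity):

```go
// evaluateRelevance is a toy metric: the fraction of expected keywords
// found in the response. The keyword list here is illustrative.
func evaluateRelevance(response string) float64 {
	keywords := []string{"refund", "policy", "deadline"}
	lower := strings.ToLower(response)
	hits := 0
	for _, k := range keywords {
		if strings.Contains(lower, k) {
			hits++
		}
	}
	return float64(hits) / float64(len(keywords))
}
```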
### A/B Testing
```go
// recordABResult records a conversion outcome for an A/B test variant
// as a binary feedback score.
func recordABResult(ctx context.Context, trace *opik.Trace, variant string, converted bool) {
	score := 0.0
	if converted {
		score = 1.0
	}

	trace.AddFeedbackScore(ctx, "conversion", score,
		fmt.Sprintf("Variant %s, converted: %v", variant, converted))
}
```
## Viewing Feedback Scores
Feedback scores are visible in the Opik UI:
- On the trace detail page
- In trace list summaries
- In experiment comparisons
- In analytics dashboards
## Best Practices
- **Use consistent names**: Standardize score names across your application (see the sketch after this list)
- **Normalize values**: Use the 0.0-1.0 range for easy comparison
- **Include reasons**: Add explanations for debugging and analysis
- **Score at the appropriate level**: Use trace-level scores for overall quality and span-level scores for specific operations
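A short sketch that applies the first three practices: score names are defined once as constants, raw values are normalized to 0.0-1.0, and every call carries a reason (the constants and `normalize` helper are illustrative, not part of the SDK):

```go
// Centralizing score names keeps them consistent across the application.
const (
	ScoreAccuracy   = "accuracy"
	ScoreRelevance  = "relevance"
	ScoreUserRating = "user_rating"
)

// normalize maps a value from [lo, hi] onto [0.0, 1.0].
func normalize(v, lo, hi float64) float64 {
	return (v - lo) / (hi - lo)
}
```

A 4/5 user rating would then be recorded as `trace.AddFeedbackScore(ctx, ScoreUserRating, normalize(4, 1, 5), "User rated 4/5")`, which yields 0.75.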