AgentPlexus

OpenTelemetry Semantic Conventions for Agentic AI

Modern AI systems are evolving from single-model interactions to multi-agent architectures. An orchestrator coordinates specialized agents—research, synthesis, verification, quality—each making LLM calls, invoking tools, and passing context to other agents. When something goes wrong, where do you look?

OpenTelemetry’s GenAI semantic conventions cover model identification, token usage, and basic agent identity. But they don’t address the unique challenges of multi-agent systems: tracking workflows across agents, measuring handoff latency, understanding task dependencies, and attributing costs to specific agents and operations.

We built semantic conventions that extend OpenTelemetry’s gen_ai.agent.* namespace to fill this gap.

The Observability Gap

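OpenTelemetry GenAI covers: model identification, token usage, and basic agent identity.

What’s missing: workflow tracking across agents, handoff latency measurement, task dependency visibility, and cost attribution to specific agents and operations.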

Extending the Namespace

We extend gen_ai.agent.* with four new concept areas:

gen_ai.agent.*              # Agent identity (aligned with OTel)
gen_ai.agent.workflow.*     # Workflow/session tracking
gen_ai.agent.task.*         # Task execution
gen_ai.agent.handoff.*      # Agent-to-agent communication
gen_ai.agent.tool_call.*    # Tool invocations
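
As a rough illustration of how the namespace layout can be used, the sketch below classifies an attribute key into its concept area by prefix. The function name ConceptArea and the "agent" fallback label are assumptions made for this example; they are not part of the conventions or of any published package.

```go
package main

import "strings"

// agentNS is the shared namespace root taken from the listing above.
const agentNS = "gen_ai.agent."

// conceptAreas lists the four extension areas named in the listing.
var conceptAreas = []string{"workflow", "task", "handoff", "tool_call"}

// ConceptArea reports which concept area an attribute key belongs to.
// Keys directly under gen_ai.agent.* (such as gen_ai.agent.id) map to
// "agent"; keys outside the namespace return the empty string.
// Hypothetical helper, for illustration only.
func ConceptArea(key string) string {
	if !strings.HasPrefix(key, agentNS) {
		return ""
	}
	rest := strings.TrimPrefix(key, agentNS)
	for _, area := range conceptAreas {
		if strings.HasPrefix(rest, area+".") {
			return area
		}
	}
	return "agent"
}
```

A key such as gen_ai.agent.workflow.task.count resolves to the workflow area because only the first segment after the namespace root is inspected.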

The design principles:

Workflow Tracking

Workflows represent end-to-end processing sessions:

gen_ai.agent.workflow.id: "wf-550e8400-e29b-41d4-a716-446655440000"
gen_ai.agent.workflow.name: "statistics-extraction"
gen_ai.agent.workflow.status: "completed"
gen_ai.agent.workflow.task.count: 5
gen_ai.agent.workflow.task.completed_count: 5
gen_ai.agent.workflow.duration: 45000
gen_ai.usage.total_tokens: 15420
gen_ai.usage.cost: 0.0847

At a glance: workflow health, task success rate, duration, and cost.
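
To make the "at a glance" summary concrete, here is a minimal sketch that derives a task success rate and a cost per thousand tokens from the attributes above. The Attrs type and the helper names are hypothetical, and the example assumes the attributes are available as a flat map; a real backend may store them differently.

```go
package main

// Attrs is a hypothetical flat attribute set keyed by convention name.
type Attrs map[string]any

// SampleWorkflow returns the example workflow attributes shown above.
func SampleWorkflow() Attrs {
	return Attrs{
		"gen_ai.agent.workflow.name":                 "statistics-extraction",
		"gen_ai.agent.workflow.status":               "completed",
		"gen_ai.agent.workflow.task.count":           5,
		"gen_ai.agent.workflow.task.completed_count": 5,
		"gen_ai.usage.total_tokens":                  15420,
		"gen_ai.usage.cost":                          0.0847,
	}
}

// TaskSuccessRate returns completed tasks divided by total tasks.
// ok is false when either count is missing or the total is not positive.
func TaskSuccessRate(a Attrs) (rate float64, ok bool) {
	total, okTotal := a["gen_ai.agent.workflow.task.count"].(int)
	done, okDone := a["gen_ai.agent.workflow.task.completed_count"].(int)
	if !okTotal || !okDone || total <= 0 {
		return 0, false
	}
	return float64(done) / float64(total), true
}

// CostPerThousandTokens normalizes workflow cost by token usage.
// ok is false when either value is missing or no tokens were used.
func CostPerThousandTokens(a Attrs) (cost float64, ok bool) {
	tokens, okTokens := a["gen_ai.usage.total_tokens"].(int)
	total, okCost := a["gen_ai.usage.cost"].(float64)
	if !okTokens || !okCost || tokens <= 0 {
		return 0, false
	}
	return total / float64(tokens) * 1000, true
}
```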

Task Execution

Tasks represent individual units of work:

gen_ai.agent.task.id: "task-research-001"
gen_ai.agent.task.name: "extract_gdp_statistics"
gen_ai.agent.task.type: "extraction"
gen_ai.agent.id: "research-agent-1"
gen_ai.agent.task.llm.call_count: 3
gen_ai.agent.task.tool_call.count: 7
gen_ai.agent.task.duration: 12500
gen_ai.agent.task.error.type: "rate_limit"
gen_ai.agent.task.error.message: "OpenAI rate limit exceeded"

Immediate visibility: which agent failed, what it was doing, and why.
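
One way to produce these attributes is to keep a small task record and emit the error attributes only when the task failed. The sketch below assumes exactly that; the Task struct, its field names, and the choice to omit error keys on success are illustrative assumptions, and the duration unit is not stated in the example above.

```go
package main

// Task is a hypothetical in-memory record of one unit of work.
type Task struct {
	ID, Name, Type string
	AgentID        string
	LLMCalls       int
	ToolCalls      int
	Duration       int64 // unit not stated in the example above
	ErrType        string
	ErrMessage     string
}

// Failed reports whether the task recorded an error type.
func (t Task) Failed() bool { return t.ErrType != "" }

// Attributes maps the record onto the task attribute names.
// Error attributes are included only for failed tasks (an assumption).
func (t Task) Attributes() map[string]any {
	attrs := map[string]any{
		"gen_ai.agent.task.id":              t.ID,
		"gen_ai.agent.task.name":            t.Name,
		"gen_ai.agent.task.type":            t.Type,
		"gen_ai.agent.id":                   t.AgentID,
		"gen_ai.agent.task.llm.call_count":  t.LLMCalls,
		"gen_ai.agent.task.tool_call.count": t.ToolCalls,
		"gen_ai.agent.task.duration":        t.Duration,
	}
	if t.Failed() {
		attrs["gen_ai.agent.task.error.type"] = t.ErrType
		attrs["gen_ai.agent.task.error.message"] = t.ErrMessage
	}
	return attrs
}
```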

Agent Handoffs

Handoffs track agent-to-agent communication:

gen_ai.agent.handoff.id: "ho-789"
gen_ai.agent.handoff.type: "delegate"
gen_ai.agent.handoff.from.agent.id: "orchestrator"
gen_ai.agent.handoff.to.agent.id: "synthesis-agent"
gen_ai.agent.handoff.payload.size: 4096
gen_ai.agent.handoff.latency: 23
gen_ai.agent.handoff.status: "completed"

Use these attributes to understand communication patterns, measure latency, and identify bottlenecks between agents.
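
The handoff attributes could be captured by timing the call that delivers the payload. The following is a minimal sketch under several assumptions: TimeHandoff, the Handoff struct, and ErrHandoffRejected are hypothetical names, "failed" is an assumed status value, and latency is reported in milliseconds even though the example above does not state a unit.

```go
package main

import (
	"errors"
	"time"
)

// ErrHandoffRejected is a hypothetical error a receiving agent might return.
var ErrHandoffRejected = errors.New("handoff rejected by receiving agent")

// Handoff is a hypothetical record of one agent-to-agent transfer.
type Handoff struct {
	FromAgentID string
	ToAgentID   string
	PayloadSize int
	Latency     time.Duration
	Status      string
}

// TimeHandoff runs send, measures how long it took, and records the result.
func TimeHandoff(from, to string, payload []byte, send func([]byte) error) (Handoff, error) {
	start := time.Now()
	err := send(payload)
	h := Handoff{
		FromAgentID: from,
		ToAgentID:   to,
		PayloadSize: len(payload),
		Latency:     time.Since(start),
		Status:      "completed",
	}
	if err != nil {
		h.Status = "failed" // assumed status value
	}
	return h, err
}

// Attributes maps the record onto the handoff attribute names.
func (h Handoff) Attributes() map[string]any {
	return map[string]any{
		"gen_ai.agent.handoff.from.agent.id": h.FromAgentID,
		"gen_ai.agent.handoff.to.agent.id":   h.ToAgentID,
		"gen_ai.agent.handoff.payload.size":  h.PayloadSize,
		"gen_ai.agent.handoff.latency":       h.Latency.Milliseconds(), // ms assumed
		"gen_ai.agent.handoff.status":        h.Status,
	}
}
```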

Tool Call Invocations

Tool calls track actual function invocations (distinct from OTel’s gen_ai.tool.*, which describes tool definitions):

gen_ai.agent.tool_call.id: "tc-search-042"
gen_ai.agent.tool_call.name: "web_search"
gen_ai.agent.tool_call.type: "search"
gen_ai.agent.tool_call.duration: 850
gen_ai.agent.tool_call.http.status_code: 200
gen_ai.agent.tool_call.response.size: 15360
gen_ai.agent.tool_call.retry_count: 1
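
To show where duration and retry_count might come from, here is a small generic wrapper that times a tool function and counts retries. It is a sketch only: CallTool, ToolCallRecord, and ErrTransient are hypothetical names, the retry policy (immediate retry, no backoff) is an assumption, and this is not the middleware's actual implementation. It requires Go 1.18 or later for generics.

```go
package main

import (
	"errors"
	"time"
)

// ErrTransient is a hypothetical retryable error used for illustration.
var ErrTransient = errors.New("transient tool failure")

// ToolCallRecord is a hypothetical record of one tool invocation.
type ToolCallRecord struct {
	Name       string
	Type       string
	Duration   time.Duration // total time across all attempts
	RetryCount int           // attempts made after the first one
	Err        error         // error from the final attempt, if any
}

// CallTool invokes fn, retrying up to maxRetries times on error,
// and records total duration and the number of retries performed.
func CallTool[T any](name, toolType string, maxRetries int, fn func() (T, error)) (T, ToolCallRecord) {
	rec := ToolCallRecord{Name: name, Type: toolType}
	start := time.Now()
	var result T
	for attempt := 0; ; attempt++ {
		result, rec.Err = fn()
		if rec.Err == nil || attempt >= maxRetries {
			rec.RetryCount = attempt
			break
		}
	}
	rec.Duration = time.Since(start)
	return result, rec
}
```

A production wrapper would likely add backoff between attempts and distinguish retryable from permanent errors; both are left out here to keep the bookkeeping visible.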

Implementation: Middleware Approach

The conventions are implemented as middleware that minimizes code changes:

import (
    "net/http" // needed for http.DefaultClient in step 4

    "github.com/agentplexus/omniobserve/agentops"
    "github.com/agentplexus/omniobserve/agentops/middleware"
)

// 1. Create a store
store, _ := agentops.Open("postgres", agentops.WithDSN(dsn))

// 2. Start a workflow
ctx, workflow, _ := middleware.StartWorkflow(ctx, store,
    "statistics-extraction",
    middleware.WithInitiator("user:123"),
)
defer middleware.CompleteWorkflow(ctx)

// 3. Wrap agent HTTP handlers (automatic task creation)
handler := middleware.AgentHandler(middleware.AgentHandlerConfig{
    AgentID:   "synthesis-agent",
    AgentType: "synthesis",
    Store:     store,
})(yourHandler)

// 4. Use instrumented client (automatic handoff tracking)
client := middleware.NewAgentClient(http.DefaultClient,
    middleware.AgentClientConfig{
        FromAgentID: "orchestrator",
        Store:       store,
    },
)

// 5. Wrap tool calls (automatic timing and error tracking)
results, _ := middleware.ToolCall(ctx, "web_search",
    func() ([]Result, error) {
        return searchService.Search(query)
    },
    middleware.WithToolType("search"),
)

What the Middleware Tracks

Component          What It Tracks                          Code Changes
StartWorkflow()    Lifecycle, duration, task counts        ~3 lines
AgentHandler()     Task timing, HTTP status, errors        ~5 lines/agent
NewAgentClient()   Handoff latency, payload size           ~5 lines shared
ToolCall()         Execution time, request/response size   ~3 lines/call

Automatic Context Propagation

Context flows automatically across boundaries:

Within a process: workflow, task, and agent info is attached to context.Context.

Across services (HTTP headers):

X-AgentOps-Workflow-ID: wf-550e8400-...
X-AgentOps-Task-ID: task-123
X-AgentOps-Agent-ID: orchestrator

No manual ID passing required.
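
The propagation described above can be sketched with the standard library alone. The example below stores the IDs in a context.Context, writes them to the three headers on the way out, and restores them on the way in. Only the header names come from the text; AgentContext, Inject, Extract, and RoundTrip are hypothetical names, and the real middleware may work differently.

```go
package main

import (
	"context"
	"net/http"
)

// Header names as listed above.
const (
	HeaderWorkflowID = "X-AgentOps-Workflow-ID"
	HeaderTaskID     = "X-AgentOps-Task-ID"
	HeaderAgentID    = "X-AgentOps-Agent-ID"
)

// AgentContext is a hypothetical bundle of the propagated IDs.
type AgentContext struct {
	WorkflowID, TaskID, AgentID string
}

// ctxKey is an unexported key type, which avoids collisions in context values.
type ctxKey struct{}

// WithAgentContext attaches the IDs to ctx for in-process propagation.
func WithAgentContext(ctx context.Context, ac AgentContext) context.Context {
	return context.WithValue(ctx, ctxKey{}, ac)
}

// FromContext returns the IDs stored in ctx, if any.
func FromContext(ctx context.Context) (AgentContext, bool) {
	ac, ok := ctx.Value(ctxKey{}).(AgentContext)
	return ac, ok
}

// Inject copies the IDs from ctx into outgoing HTTP headers.
func Inject(ctx context.Context, h http.Header) {
	ac, ok := FromContext(ctx)
	if !ok {
		return
	}
	h.Set(HeaderWorkflowID, ac.WorkflowID)
	h.Set(HeaderTaskID, ac.TaskID)
	h.Set(HeaderAgentID, ac.AgentID)
}

// Extract reads the IDs from incoming headers into a new context.
// If no header carries a value, ctx is returned unchanged.
func Extract(ctx context.Context, h http.Header) context.Context {
	ac := AgentContext{
		WorkflowID: h.Get(HeaderWorkflowID),
		TaskID:     h.Get(HeaderTaskID),
		AgentID:    h.Get(HeaderAgentID),
	}
	if ac == (AgentContext{}) {
		return ctx
	}
	return WithAgentContext(ctx, ac)
}

// RoundTrip simulates one service boundary: inject on the sender side,
// extract on the receiver side, and return what the receiver sees.
func RoundTrip(ac AgentContext) (AgentContext, bool) {
	h := http.Header{}
	Inject(WithAgentContext(context.Background(), ac), h)
	return FromContext(Extract(context.Background(), h))
}
```

Note that http.Header canonicalizes key casing on Set and Get, so the lookup succeeds regardless of how the header name is capitalized.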

Practical Use Cases

Debugging Failed Workflows:

SELECT task_name, agent_id, error_type, error_message
FROM tasks
WHERE workflow_id = 'wf-550e8400-...'
  AND status = 'failed';

Cost Attribution by Agent (using token counts as the cost proxy):

SELECT agent_type, SUM(total_tokens) as tokens
FROM tasks
GROUP BY agent_type;

Finding Bottlenecks:

SELECT from_agent_id, to_agent_id, AVG(latency) as avg_latency
FROM handoffs
GROUP BY from_agent_id, to_agent_id;
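
When handoff records are already in memory (for example in a test or an exporter), the same bottleneck analysis can be done without SQL. The sketch below mirrors the GROUP BY query and sorts the slowest agent pair first. HandoffRow and PairLatency are hypothetical types; the field names follow the column names used in the query.

```go
package main

import "sort"

// HandoffRow is a hypothetical in-memory row matching the handoffs table.
type HandoffRow struct {
	FromAgentID string
	ToAgentID   string
	Latency     float64
}

// PairLatency holds the average latency for one (from, to) agent pair.
type PairLatency struct {
	From, To   string
	AvgLatency float64
	Count      int
}

// AvgLatencyByPair groups rows by agent pair and averages their latency,
// returning pairs ordered from slowest to fastest.
func AvgLatencyByPair(rows []HandoffRow) []PairLatency {
	type key struct{ from, to string }
	groups := map[key]*PairLatency{}
	for _, r := range rows {
		k := key{r.FromAgentID, r.ToAgentID}
		p, ok := groups[k]
		if !ok {
			p = &PairLatency{From: r.FromAgentID, To: r.ToAgentID}
			groups[k] = p
		}
		p.AvgLatency += r.Latency // holds the running sum until averaged below
		p.Count++
	}
	out := make([]PairLatency, 0, len(groups))
	for _, p := range groups {
		p.AvgLatency /= float64(p.Count)
		out = append(out, *p)
	}
	// Slowest first; ties are broken by agent IDs for a stable order.
	sort.Slice(out, func(i, j int) bool {
		if out[i].AvgLatency != out[j].AvgLatency {
			return out[i].AvgLatency > out[j].AvgLatency
		}
		if out[i].From != out[j].From {
			return out[i].From < out[j].From
		}
		return out[i].To < out[j].To
	})
	return out
}
```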

YAML Model Definitions

Following OTel’s approach, conventions are defined in YAML:

model/
├── registry.yaml   # Attribute definitions
├── spans.yaml      # Span type definitions
└── events.yaml     # Event type definitions

These serve as the source of truth for code generation, documentation, and validation.

Compatibility

The conventions are designed to work with:

What’s Next

These conventions are part of OmniObserve’s AgentOps module. We’re actively developing:

Multi-agent AI systems deserve first-class observability. Standard semantic conventions make that possible.