What are traces?

A trace represents a single execution of your AI agent from start to finish. It captures the complete workflow of every LLM call, tool invocation, and function execution as a hierarchical tree of operations called spans.
[Image: Cascade trace visualization showing hierarchical execution flow]

Spans: The building blocks

A span is a single unit of work within a trace. Each span represents one operation in your agent’s execution.

  • LLM API call: Interaction with language models
  • Tool execution: Function and tool invocations
  • Reasoning step: Decision and extraction operations

Span types

Function spans

Created by the trace_run() context manager to mark the entry point of your agent execution.
  • Span name: Agent or function name
  • Custom metadata: Task IDs, user IDs, or any contextual data
  • Total execution duration: End-to-end timing
  • Success/error status: Whether the execution completed successfully
Example:
with trace_run("CustomerSupportAgent", metadata={"ticket_id": "TKT-789"}):
    # Everything inside becomes child spans
    ...
The trace_run() context manager creates the root span—all other operations inside become child spans automatically.

LLM spans

Created automatically by wrap_llm_client() to track every interaction with language models. Captured data:
  • Model name: e.g., claude-3-5-sonnet-20241022
  • Provider: e.g., anthropic, openai
  • Prompt text: Complete prompt and system messages
  • Completion: Full response text
  • Token counts: Input, output, and total tokens
  • Estimated cost: Calculated cost in USD
  • Latency: Response time in milliseconds
  • Streaming status: Whether the response was streamed
  • Extracted reasoning: Reasoning steps if present in the completion
LLM spans work with both messages.create() and messages.stream() methods automatically.
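A minimal sketch of how this fits together, assuming the SDK exposes trace_run() and wrap_llm_client() from a cascade module (the import path is illustrative) and that the wrapped client keeps the standard Anthropic interface:
from anthropic import Anthropic
from cascade import trace_run, wrap_llm_client  # illustrative import path

# Wrap the client once; every call made through it is recorded as an LLM span.
client = wrap_llm_client(Anthropic())

with trace_run("SummarizerAgent"):
    # Recorded with model, provider, prompt, completion, token counts, cost, and latency.
    response = client.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=512,
        messages=[{"role": "user", "content": "Summarize this support ticket."}],
    )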

Tool spans

Created by the @tool decorator to track function and tool executions. Captured data:
  • Tool name: Function identifier
  • Tool description: Extracted from docstring
  • Serialized input parameters: All arguments passed to the function
  • Serialized output: Return value from the function
  • Execution duration: Time taken in milliseconds
  • Error details: Exception information if execution failed
Example:
@tool
def search_database(query: str, limit: int = 10) -> dict:
    """Search the database for matching records."""
    # Input: {"query": "...", "limit": 10}
    results = db.search(query, limit)
    # Output: {"results": [...], "count": 5}
    return results
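Calling the decorated function inside a trace then records the span automatically; a short sketch, with search_database and its db dependency as the illustrative pieces above:
with trace_run("SupportSearchAgent"):
    # Recorded as a tool span under the root span, with the serialized
    # input and output shown in the comments above.
    records = search_database("overdue invoices", limit=5)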

Trace context propagation

Cascade SDK uses OpenTelemetry’s context propagation to maintain parent-child relationships automatically. How it works:
1. Root span creation: When you call trace_run(), a root span is created for your agent execution.
2. Tool span nesting: When a @tool decorated function is called inside a trace, the tool span becomes a child of the root span.
3. LLM span nesting: When the tool makes an LLM call with a wrapped client, the LLM span becomes a child of the tool span.
4. Automatic propagation: Context propagates through both sync and async function calls, with no manual wiring needed.
The SDK handles context propagation automatically. You never need to pass context objects around manually.
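Putting these steps together, the nesting might look like this (same illustrative imports and wrapped client as above):
@tool
def answer_question(question: str) -> str:
    """Answer a question with a single LLM call."""
    # LLM span: child of the answer_question tool span.
    response = client.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=256,
        messages=[{"role": "user", "content": question}],
    )
    return response.content[0].text

with trace_run("QAAgent"):               # root span
    answer_question("What is a trace?")  # tool span, with the LLM span nested beneath it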

Technical details

Data size limits

  • Text values truncated at 10,000 characters by default
  • Large objects serialized efficiently to JSON
  • Binary data excluded from capture

Async support

Full support for async/await patterns:
  • Async tool decorators propagate context correctly
  • Streaming LLM responses tracked incrementally
  • No blocking of async event loops
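As an illustration, an async tool under the same assumptions; the parent-child relationship holds across the await boundary without any manual context passing:
import asyncio

@tool
async def fetch_profile(user_id: str) -> dict:
    """Fetch a user profile from an async data source."""
    await asyncio.sleep(0.1)  # stand-in for awaited I/O
    return {"user_id": user_id, "plan": "pro"}

async def main():
    with trace_run("ProfileAgent"):
        # The async tool span is still recorded as a child of the root span.
        await fetch_profile("user-123")

asyncio.run(main())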

Error handling

When an operation fails, the span automatically captures detailed error information. For example:
import json

@tool
def risky_operation(data: str):
    result = json.loads(data)  # May raise json.JSONDecodeError
    return result
If this fails, the span will contain:
  • status_code: ERROR
  • tool.error: Exception message
  • Exception event with full stack trace
All exceptions are captured automatically, but capturing does not suppress them: the exception still propagates to your code, so handle errors as you normally would.
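Because the exception still propagates, wrap the call as you normally would; the span records the failure either way:
with trace_run("ImportAgent"):
    try:
        risky_operation("{not valid json")
    except json.JSONDecodeError:
        # The tool span is already marked ERROR with the exception event;
        # the application decides how to recover.
        ...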