Why Failure Modes?
Standard observability captures execution details: tool calls, latency metrics, and outputs. Failure Modes analyzes trace data to classify failures by root cause, using a structured taxonomy of agent system issues.
Traditional observability
- What happened?
- Which tool was called?
- Where did latency spike?
Failure Modes
- Why did this agent system fail?
- Design or coordination issue?
- Systematic or incidental failure?
What gets analyzed
Failure Modes operates on completed DeepStream traces using data already captured by the Cascade SDK.

| Data Source | Usage |
|---|---|
| Agent messages | Analyze inputs and outputs for patterns |
| System prompts and roles | Detect role violations and drift |
| Tool calls | Identify errors, loops, and retries |
| Agent handoffs | Track coordination and communication |
| Session structure | Detect ordering issues and loops |
| Execution metadata | Consider latency, cost, and retries |
No additional instrumentation is required beyond standard Cascade SDK tracing.
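To make the table concrete, here is a minimal sketch of how the listed data sources could be modeled as one session record. Every class and field name below (`ToolCall`, `AgentStep`, `SessionTrace`, `handoff_to`, and so on) is an assumption made for illustration; none of it is the Cascade SDK's actual schema.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical shapes for illustration only; not the Cascade SDK schema.


@dataclass
class ToolCall:
    name: str
    arguments: dict
    error: Optional[str] = None  # populated when the tool call failed
    retries: int = 0


@dataclass
class AgentStep:
    index: int                   # position within the session
    agent: str                   # agent that produced this step
    role: str                    # role declared in the system prompt
    message: str                 # agent input or output text
    tool_calls: list = field(default_factory=list)
    handoff_to: Optional[str] = None  # set when control passes to another agent
    latency_ms: float = 0.0
    cost_usd: float = 0.0


@dataclass
class SessionTrace:
    session_id: str
    steps: list = field(default_factory=list)

    def handoffs(self):
        """Return (from_agent, to_agent) pairs in execution order."""
        return [(s.agent, s.handoff_to) for s in self.steps if s.handoff_to]

    def failed_tool_calls(self):
        """Return every tool call that recorded an error."""
        return [c for s in self.steps for c in s.tool_calls if c.error]


trace = SessionTrace("s-1", [
    AgentStep(0, "planner", "planner", "Plan the task", handoff_to="coder"),
    AgentStep(1, "coder", "coder", "Run the tests", tool_calls=[
        ToolCall("run_tests", {"path": "tests/"}, error="timeout", retries=2),
    ]),
])

print(trace.handoffs())                # [('planner', 'coder')]
print(len(trace.failed_tool_calls()))  # 1
```

Keeping messages, roles, tool calls, handoffs, and execution metadata on the same step record is one way to let an analysis job correlate them without extra instrumentation.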
Failure categories
Failures are grouped into three high-level categories, each representing a different class of system risk.
- System Design Failures
- Inter-Agent Misalignment
- Verification Failures
System Design Failures are caused by unclear or unstable agent design.
Common signals:
- Agent violates its declared role
- Agent changes task scope mid-run
- Repeated loops without progress
- Premature termination without evidence
- Context drift across steps
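As an illustration of one signal from the list, "repeated loops without progress", the sketch below flags runs of identical consecutive tool calls. This is a simple heuristic, not a description of how Failure Modes detects loops; the function names and the `min_repeats` threshold are hypothetical.

```python
import json


def _key(call):
    """Build a comparable key from a (name, arguments) tuple."""
    name, arguments = call
    # Serialize with sorted keys so dict ordering does not affect equality.
    return name, json.dumps(arguments, sort_keys=True)


def find_repeated_calls(tool_calls, min_repeats=3):
    """Return (start_index, end_index, name) for each run of identical calls.

    tool_calls: list of (name, arguments) tuples in execution order.
    A run of `min_repeats` or more identical consecutive calls is reported.
    """
    runs = []
    start = 0
    for i in range(1, len(tool_calls) + 1):
        # Close the current run when the sequence ends or the call changes.
        if i == len(tool_calls) or _key(tool_calls[i]) != _key(tool_calls[start]):
            if i - start >= min_repeats:
                runs.append((start, i - 1, tool_calls[start][0]))
            start = i
    return runs


calls = [
    ("search", {"q": "refund policy"}),
    ("search", {"q": "refund policy"}),
    ("search", {"q": "refund policy"}),
    ("read", {"id": 1}),
    ("search", {"q": "refund policy"}),
]
print(find_repeated_calls(calls))  # [(0, 2, 'search')]
```

The returned indices point at the exact steps involved, which is the kind of evidence a reviewer needs to decide whether the loop reflects a design problem or a one-off retry.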
How it works
1. Trace ingestion
DeepStream reconstructs a canonical session trace from distributed spans, including agent steps, tool calls, message routing, and artifacts.
2. Failure analysis
A post-processing analysis job inspects the trace and emits a structured Failure Report with failure categories, evidence spans, and explanations.
3. Storage and query
Failure reports are stored as analysis artifacts attached to sessions and can be queried, aggregated, and compared across runs.
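The three steps above can be sketched end to end as follows. This is a toy model under stated assumptions, not the DeepStream implementation: the span dictionaries, the `Finding` and `FailureReport` classes, the `ReportStore`, and the single "no verification step" rule are all hypothetical. The report fields mirror what the text names: categories, evidence spans, and explanations.

```python
from collections import defaultdict
from dataclasses import dataclass, field

# All names are hypothetical; this is not the DeepStream or Cascade SDK API.


@dataclass
class Finding:
    category: str            # e.g. "verification"
    evidence_span_ids: list  # spans that support the finding
    explanation: str


@dataclass
class FailureReport:
    session_id: str
    findings: list = field(default_factory=list)


def reconstruct_sessions(spans):
    """Step 1: group spans by session and order each group by start time."""
    sessions = defaultdict(list)
    for span in spans:
        sessions[span["session_id"]].append(span)
    return {
        sid: sorted(group, key=lambda s: s["start_time"])
        for sid, group in sessions.items()
    }


def analyze(session_id, ordered_spans):
    """Step 2: inspect one trace and emit a report. One toy rule is shown."""
    report = FailureReport(session_id)
    has_verification = any(s["kind"] == "verification" for s in ordered_spans)
    if ordered_spans and not has_verification:
        report.findings.append(Finding(
            category="verification",
            evidence_span_ids=[ordered_spans[-1]["span_id"]],
            explanation="Session ended without a verification step.",
        ))
    return report


class ReportStore:
    """Step 3: keep reports keyed by session so they can be queried later."""

    def __init__(self):
        self._reports = {}

    def save(self, report):
        self._reports[report.session_id] = report

    def sessions_with(self, category):
        return sorted(
            sid for sid, report in self._reports.items()
            if any(f.category == category for f in report.findings)
        )


spans = [
    {"session_id": "a", "span_id": "a2", "start_time": 2, "kind": "tool_call"},
    {"session_id": "a", "span_id": "a1", "start_time": 1, "kind": "agent_step"},
    {"session_id": "b", "span_id": "b1", "start_time": 1, "kind": "agent_step"},
    {"session_id": "b", "span_id": "b2", "start_time": 2, "kind": "verification"},
]

store = ReportStore()
for session_id, ordered in reconstruct_sessions(spans).items():
    store.save(analyze(session_id, ordered))

print(store.sessions_with("verification"))  # ['a']
```

Separating reconstruction, analysis, and storage keeps the analysis step a pure function of the trace, so it can be re-run on stored sessions when rules change.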
Failure report structure
Each analyzed session produces a structured result showing what failed and why.
Using Failure Modes in practice
Session debugging
Use Failure Modes to jump directly to problematic steps and understand why a session failed, not just where.
What you can identify
- Differentiate between model errors and system design errors
- Find exact trace locations where failures occurred
- Understand the context and evidence for each failure
Trend analysis
Aggregate failures across many runs to identify patterns and measure system health. Use cases:
- Identify dominant failure categories across your agent fleet
- Detect regressions after prompt or tool changes
- Measure system maturity over time
“After deploying v1.4, inter-agent failures increased 3×, mostly from ignored reviewer feedback.”
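A finding like the one quoted above could come from comparing per-category failure rates between two sets of runs. The sketch below assumes each report has been reduced to a dictionary with a `categories` list; that shape, the function names, and the `min_ratio` threshold are illustrative assumptions rather than a documented query API.

```python
from collections import Counter


def category_rates(reports):
    """Return failures per session for each category.

    reports: list of dicts such as {"session_id": "s1", "categories": [...]}.
    """
    if not reports:
        return {}
    counts = Counter(c for r in reports for c in r["categories"])
    return {category: n / len(reports) for category, n in counts.items()}


def regressions(baseline, candidate, min_ratio=2.0):
    """Return categories whose rate grew by at least `min_ratio`."""
    before = category_rates(baseline)
    after = category_rates(candidate)
    flagged = {}
    for category, rate in after.items():
        previous = before.get(category, 0.0)
        if previous == 0.0:
            flagged[category] = float("inf")  # category absent from baseline
        elif rate / previous >= min_ratio:
            flagged[category] = rate / previous
    return flagged


# Made-up sample data for two releases, four sessions each.
baseline = [
    {"session_id": "b1", "categories": ["inter_agent"]},
    {"session_id": "b2", "categories": ["system_design"]},
    {"session_id": "b3", "categories": ["system_design"]},
    {"session_id": "b4", "categories": []},
]
candidate = [
    {"session_id": "c1", "categories": ["inter_agent"]},
    {"session_id": "c2", "categories": ["inter_agent", "system_design"]},
    {"session_id": "c3", "categories": ["inter_agent"]},
    {"session_id": "c4", "categories": ["system_design"]},
]

print(regressions(baseline, candidate))  # {'inter_agent': 3.0}
```

Rates are normalized by session count so that releases with different traffic volumes remain comparable.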
Pre-production validation
Run Failure Modes on staging or canary traffic before production deployment.
- Catch verification gaps: identify missing validation steps before they reach production.
- Detect silent failures: find cases where outputs look correct but lack evidence.
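One way to act on these checks is a release gate over a batch of staging or canary sessions. The sketch below treats a "silent failure" as a session that reported success yet carries a verification finding; that definition, the session dictionary shape, and the `max_silent_rate` threshold are assumptions chosen for illustration.

```python
def silent_failures(sessions):
    """Return IDs of sessions that reported success yet have a verification finding."""
    return [
        s["session_id"]
        for s in sessions
        if s["status"] == "success" and "verification" in s["categories"]
    ]


def gate(sessions, max_silent_rate=0.05):
    """Return (passed, rate) for a batch of staging or canary sessions."""
    if not sessions:
        return True, 0.0
    rate = len(silent_failures(sessions)) / len(sessions)
    return rate <= max_silent_rate, rate


# Made-up canary batch of four sessions.
canary = [
    {"session_id": "c1", "status": "success", "categories": []},
    {"session_id": "c2", "status": "success", "categories": ["verification"]},
    {"session_id": "c3", "status": "failed", "categories": ["verification"]},
    {"session_id": "c4", "status": "success", "categories": ["system_design"]},
]

print(silent_failures(canary))  # ['c2']
print(gate(canary))             # (False, 0.25)
```

Sessions that already reported failure are excluded because ordinary monitoring catches them; the gate targets only the cases that would otherwise pass unnoticed.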