Failure Modes is a DeepStream analysis layer that converts raw agent traces into structured, system-level failure signals. Instead of reading long, unstructured multi-agent logs, teams can understand why an agentic workflow failed, or nearly failed, using a consistent taxonomy.
Failure Modes does not modify agent behavior or enforce controls. It is a diagnostic layer built entirely on top of Cascade’s observability pipeline.

Why Failure Modes?

Standard observability captures execution details: tool calls, latency metrics, and outputs. Failure Modes analyzes that trace data to classify failures by root cause, using a structured taxonomy of agent system issues.

Traditional observability

  • What happened?
  • Which tool was called?
  • Where did latency spike?

Failure Modes

  • Why did this agent system fail?
  • Is it a design issue or a coordination issue?
  • Is the failure systematic or incidental?

What gets analyzed

Failure Modes operates on completed DeepStream traces using data already captured by the Cascade SDK.
| Data Source | Usage |
| --- | --- |
| Agent messages | Analyze inputs and outputs for patterns |
| System prompts and roles | Detect role violations and drift |
| Tool calls | Identify errors, loops, and retries |
| Agent handoffs | Track coordination and communication |
| Session structure | Detect ordering issues and loops |
| Execution metadata | Consider latency, cost, and retries |
No additional instrumentation is required beyond standard Cascade SDK tracing.

Failure categories

Failures are grouped into three high-level categories (system design, inter-agent, and verification), each representing a different class of system risk.
System design: failures caused by unclear or unstable agent design. Common signals:
  • Agent violates its declared role
  • Agent changes task scope mid-run
  • Repeated loops without progress
  • Premature termination without evidence
  • Context drift across steps
Example: A “Coder” agent starts explaining concepts instead of writing code, or an agent declares success without completing required steps.
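As an illustration of one signal in this list, "repeated loops without progress", the sketch below flags tool calls that recur with identical arguments. It is a deliberately naive heuristic written for this page, not the detection logic Failure Modes uses; the `ToolCall` shape, the `find_repeated_calls` helper, and the threshold of 3 are all assumptions.

```python
from collections import Counter
from typing import Iterable, NamedTuple


class ToolCall(NamedTuple):
    """Hypothetical, simplified view of one tool call in a trace."""
    agent: str
    tool: str
    arguments: str  # serialized arguments, compared as plain text


def find_repeated_calls(calls: Iterable[ToolCall], threshold: int = 3) -> list:
    """Return calls that occur at least `threshold` times with identical arguments."""
    counts = Counter(calls)
    return [call for call, n in counts.items() if n >= threshold]


# Made-up trace fragment: the same test run is requested three times.
calls = [
    ToolCall("Coder", "run_tests", '{"path": "tests/"}'),
    ToolCall("Coder", "run_tests", '{"path": "tests/"}'),
    ToolCall("Coder", "read_file", '{"path": "main.py"}'),
    ToolCall("Coder", "run_tests", '{"path": "tests/"}'),
]
repeated = find_repeated_calls(calls)
print(repeated)
```

Identical repeated calls are only a hint. A real loop check would also need to consider whether anything changed between the calls, which this sketch ignores.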

How it works

1. Trace ingestion: DeepStream reconstructs a canonical session trace from distributed spans, including agent steps, tool calls, message routing, and artifacts.

2. Failure analysis: A post-processing analysis job inspects the trace and emits a structured Failure Report with failure categories, evidence spans, and explanations.

3. Storage and query: Failure reports are stored as analysis artifacts attached to sessions and can be queried, aggregated, and compared across runs.
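The three stages can be pictured with a small, self-contained sketch. Everything in it is hypothetical: the `Span` and `FailureReport` classes, the single placeholder rule in `analyze`, the tool name `verifier`, and the in-memory `reports` dictionary stand in for internals that this page does not describe.

```python
from dataclasses import dataclass, field


@dataclass
class Span:
    """Hypothetical span record; real spans carry far more detail."""
    session_id: str
    start_time: float
    kind: str  # e.g. "agent_step", "tool_call", "message"
    name: str


@dataclass
class FailureReport:
    session_id: str
    failure_modes: list = field(default_factory=list)


def reconstruct_trace(spans, session_id):
    """Stage 1: collect one session's spans and order them by start time."""
    own = (s for s in spans if s.session_id == session_id)
    return sorted(own, key=lambda s: s.start_time)


def analyze(session_id, trace):
    """Stage 2: placeholder rule that flags a session with no verifier call."""
    report = FailureReport(session_id)
    verified = any(s.kind == "tool_call" and s.name == "verifier" for s in trace)
    if not verified:
        report.failure_modes.append(
            {"category": "verification", "type": "NO_EVIDENCE_FOR_COMPLETION"}
        )
    return report


# Stage 3: an in-memory stand-in for report storage, keyed by session.
reports = {}

# Made-up spans arriving out of order from two sessions.
spans = [
    Span("s1", 2.0, "tool_call", "write_file"),
    Span("s2", 1.0, "tool_call", "verifier"),
    Span("s1", 1.0, "agent_step", "Coder"),
]
for sid in ("s1", "s2"):
    reports[sid] = analyze(sid, reconstruct_trace(spans, sid))

print(reports)
```

The point of the sketch is the separation of concerns: analysis runs after the trace is complete and only produces a report, so it never sits in the execution path of the agents.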

Failure report structure

Each analyzed session produces a structured result showing what failed and why.
{
  "task_completion": false,
  "total_failures": 6,
  "categories": {
    "system_design": 2,
    "inter_agent": 3,
    "verification": 1
  },
  "failure_modes": [
    {
      "category": "inter_agent",
      "type": "IGNORED_AGENT_INPUT",
      "node_id": "agent:Reviewer:step_12",
      "evidence": [
        "Reviewer flagged missing edge cases",
        "Coder proceeded without modification"
      ]
    },
    {
      "category": "verification",
      "type": "NO_EVIDENCE_FOR_COMPLETION",
      "node_id": "agent:Coder:step_19",
      "evidence": [
        "Completion claimed",
        "No tests executed",
        "No verifier invoked"
      ]
    }
  ]
}
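A report in this shape is straightforward to consume with standard JSON tooling. The sketch below parses a trimmed copy of the example above and groups failures by category so each one can be traced back to its `node_id`. How a report is actually retrieved is not covered here, so the JSON is inlined; the observation that the per-category counts sum to `total_failures` holds for this example and is otherwise an assumption.

```python
import json
from collections import defaultdict

# Trimmed copy of the example report shown above.
report_json = """
{
  "task_completion": false,
  "total_failures": 6,
  "categories": {"system_design": 2, "inter_agent": 3, "verification": 1},
  "failure_modes": [
    {"category": "inter_agent", "type": "IGNORED_AGENT_INPUT",
     "node_id": "agent:Reviewer:step_12",
     "evidence": ["Reviewer flagged missing edge cases",
                  "Coder proceeded without modification"]},
    {"category": "verification", "type": "NO_EVIDENCE_FOR_COMPLETION",
     "node_id": "agent:Coder:step_19",
     "evidence": ["Completion claimed", "No tests executed",
                  "No verifier invoked"]}
  ]
}
"""
report = json.loads(report_json)

# Check that the category counts agree with the reported total.
counts_consistent = sum(report["categories"].values()) == report["total_failures"]

# Group (node_id, type) pairs by category for quick navigation.
by_category = defaultdict(list)
for failure in report["failure_modes"]:
    by_category[failure["category"]].append((failure["node_id"], failure["type"]))

for category, entries in by_category.items():
    for node_id, failure_type in entries:
        print(f"{category}: {failure_type} at {node_id}")
```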

Using Failure Modes in practice

Session debugging

Use Failure Modes to jump directly to problematic steps and understand why a session failed, not just where.
  • Differentiate between model errors and system design errors
  • Find exact trace locations where failures occurred
  • Understand the context and evidence for each failure

Trend analysis

Aggregate failures across many runs to identify patterns and measure system health. Use cases:
  • Identify dominant failure categories across your agent fleet
  • Detect regressions after prompt or tool changes
  • Measure system maturity over time
Example insight:
“After deploying v1.4, inter-agent failures increased 3×, mostly from ignored reviewer feedback.”
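An insight like this can be derived by aggregating the `categories` counts of many reports. The sketch below is a minimal illustration, assuming reports are available as parsed dictionaries and already grouped by release; the version labels and counts are made-up sample data, and `category_totals` and `mean_per_run` are hypothetical helpers, not part of any Cascade API.

```python
from collections import Counter


def category_totals(reports):
    """Sum per-category failure counts across a list of parsed reports."""
    totals = Counter()
    for report in reports:
        totals.update(report["categories"])  # adds counts key by key
    return totals


def mean_per_run(reports, category):
    """Average number of failures in `category` per analyzed run."""
    return category_totals(reports)[category] / len(reports)


# Made-up sample data: two runs per release.
reports_by_version = {
    "v1.3": [
        {"categories": {"system_design": 1, "inter_agent": 1, "verification": 0}},
        {"categories": {"system_design": 0, "inter_agent": 1, "verification": 1}},
    ],
    "v1.4": [
        {"categories": {"system_design": 1, "inter_agent": 3, "verification": 0}},
        {"categories": {"system_design": 0, "inter_agent": 3, "verification": 1}},
    ],
}

before = mean_per_run(reports_by_version["v1.3"], "inter_agent")
after = mean_per_run(reports_by_version["v1.4"], "inter_agent")
ratio = after / before
print(f"inter_agent failures per run: {before} -> {after} ({ratio:.1f}x)")
```

Normalizing by run count keeps the comparison fair when the two releases were exercised by different amounts of traffic.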

Pre-production validation

Run Failure Modes on staging or canary traffic before production deployment.

  • Catch verification gaps: identify missing validation steps before they reach production.
  • Detect silent failures: find cases where outputs look correct but lack evidence.

Failure Modes is diagnostic only. It does not block execution, enforce policies, or automatically modify agent behavior.
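Because Failure Modes itself never blocks anything, any release gate has to live in your own deployment tooling. The sketch below shows one way such a check might look when run over reports from staging traffic. The `blocking_failures` helper, the session IDs, and the choice to gate on the `verification` category are assumptions made for illustration.

```python
def blocking_failures(reports, blocked_categories=("verification",)):
    """Return (session_id, type) pairs for failures in the gated categories."""
    found = []
    for session_id, report in reports.items():
        for failure in report["failure_modes"]:
            if failure["category"] in blocked_categories:
                found.append((session_id, failure["type"]))
    return found


# Made-up staging reports, keyed by a hypothetical session ID.
staging_reports = {
    "staging-001": {
        "failure_modes": [
            {"category": "verification", "type": "NO_EVIDENCE_FOR_COMPLETION"}
        ]
    },
    "staging-002": {"failure_modes": []},
}

found = blocking_failures(staging_reports)
# A CI job could pass this value to sys.exit() to fail the pipeline.
exit_code = 1 if found else 0
for session_id, failure_type in found:
    print(f"{session_id}: {failure_type}")
```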