Failure Modes

Failure Modes is a DeepStream analysis layer that converts raw agent traces into structured, system-level failure signals. Instead of reading long, unstructured multi-agent logs, teams can understand why an agentic workflow failed, or nearly failed, using a consistent taxonomy.

Failure Modes does not modify agent behavior or enforce controls. It is a diagnostic layer built entirely on top of Cascade’s observability pipeline.

Why Failure Modes?

Standard observability captures execution details: tool calls, latency metrics, and outputs. Failure Modes analyzes the same trace data and classifies failures by root cause, using a structured taxonomy of agent system issues.

Traditional observability

What happened?
Which tool was called?
Where did latency spike?

Failure Modes

Why did this agent system fail?
Design or coordination issue?
Systematic or incidental failure?

What gets analyzed

Failure Modes operates on completed DeepStream traces using data already captured by the Cascade SDK.

Data Source	Usage
Agent messages	Analyze inputs and outputs for patterns
System prompts and roles	Detect role violations and drift
Tool calls	Identify errors, loops, and retries
Agent handoffs	Track coordination and communication
Session structure	Detect ordering issues and loops
Execution metadata	Consider latency, cost, and retries

No additional instrumentation is required beyond standard Cascade SDK tracing.

Failure categories

Failures are grouped into three high-level categories, each representing a different class of system risk.

System Design Failures
Inter-Agent Misalignment
Verification Failures

System Design Failures are caused by unclear or unstable agent design. Common signals:

Agent violates its declared role
Agent changes task scope mid-run
Repeated loops without progress
Premature termination without evidence
Context drift across steps

Example: A “Coder” agent starts explaining concepts instead of writing code, or an agent declares success without completing required steps.
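One of the signals above, repeated loops without progress, can be approximated with a simple check over a session's tool calls. The sketch below is illustrative only: the tool-call record shape (`tool`, `args`), the function name `find_repeated_calls`, and the threshold of three are assumptions made for this example, not the Cascade SDK's trace format or the heuristic Failure Modes actually applies.

```python
from typing import Any


def find_repeated_calls(
    tool_calls: list[dict[str, Any]], threshold: int = 3
) -> list[tuple[int, int]]:
    """Return (start_index, run_length) for each run of identical consecutive calls.

    A run is reported only when it is at least `threshold` calls long.
    """
    runs = []
    start = 0
    for i in range(1, len(tool_calls) + 1):
        # A run ends at the end of the list or when the call differs
        # from the first call of the current run.
        if i == len(tool_calls) or tool_calls[i] != tool_calls[start]:
            if i - start >= threshold:
                runs.append((start, i - start))
            start = i
    return runs


# Hypothetical tool-call sequence: the agent re-runs the same call three times.
calls = [
    {"tool": "search", "args": {"q": "retry policy"}},
    {"tool": "run_tests", "args": {}},
    {"tool": "run_tests", "args": {}},
    {"tool": "run_tests", "args": {}},
    {"tool": "write_file", "args": {"path": "a.py"}},
]
loops = find_repeated_calls(calls)
print(loops)  # [(1, 3)]
```

Identical consecutive calls are only a rough proxy for "no progress"; a real classifier would presumably also look at whether outputs or state changed between calls.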

How it works

Trace ingestion

DeepStream reconstructs a canonical session trace from distributed spans, including agent steps, tool calls, message routing, and artifacts.

Failure analysis

A post-processing analysis job inspects the trace and emits a structured Failure Report with failure categories, evidence spans, and explanations.

Storage and query

Failure reports are stored as analysis artifacts attached to sessions and can be queried, aggregated, and compared across runs.

Failure report structure

Each analyzed session produces a structured result showing what failed and why.

{
  "task_completion": false,
  "total_failures": 6,
  "categories": {
    "system_design": 2,
    "inter_agent": 3,
    "verification": 1
  },
  "failure_modes": [
    {
      "category": "inter_agent",
      "type": "IGNORED_AGENT_INPUT",
      "node_id": "agent:Reviewer:step_12",
      "evidence": [
        "Reviewer flagged missing edge cases",
        "Coder proceeded without modification"
      ]
    },
    {
      "category": "verification",
      "type": "NO_EVIDENCE_FOR_COMPLETION",
      "node_id": "agent:Coder:step_19",
      "evidence": [
        "Completion claimed",
        "No tests executed",
        "No verifier invoked"
      ]
    }
  ]
}
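A report in this shape is straightforward to consume programmatically. The following sketch parses the sample above with the standard `json` module, checks that the per-category counts add up to `total_failures`, and groups the listed failure types by category. The helper names `counts_are_consistent` and `group_by_category` are hypothetical and not part of any Cascade or DeepStream API; only the field names come from the sample.

```python
import json
from collections import defaultdict

# The sample report from above, condensed onto fewer lines.
REPORT = json.loads("""
{
  "task_completion": false,
  "total_failures": 6,
  "categories": {"system_design": 2, "inter_agent": 3, "verification": 1},
  "failure_modes": [
    {"category": "inter_agent", "type": "IGNORED_AGENT_INPUT",
     "node_id": "agent:Reviewer:step_12",
     "evidence": ["Reviewer flagged missing edge cases",
                  "Coder proceeded without modification"]},
    {"category": "verification", "type": "NO_EVIDENCE_FOR_COMPLETION",
     "node_id": "agent:Coder:step_19",
     "evidence": ["Completion claimed", "No tests executed",
                  "No verifier invoked"]}
  ]
}
""")


def counts_are_consistent(report: dict) -> bool:
    """Check that the per-category counts sum to total_failures."""
    return sum(report["categories"].values()) == report["total_failures"]


def group_by_category(report: dict) -> dict[str, list[str]]:
    """Map each category to the failure types listed under it."""
    grouped = defaultdict(list)
    for mode in report["failure_modes"]:
        grouped[mode["category"]].append(mode["type"])
    return dict(grouped)


print(counts_are_consistent(REPORT))  # True
print(group_by_category(REPORT))
```

Note that the sample lists only two entries under `failure_modes` while `total_failures` is 6, so the grouping covers only the entries shown.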

Using Failure Modes in practice

Session debugging

Use Failure Modes to jump directly to problematic steps and understand why a session failed, not just where.

What you can identify

Differentiate between model errors and system design errors
Find exact trace locations where failures occurred
Understand the context and evidence for each failure
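To jump to the step a failure points at, the `node_id` from a failure report can be split into its parts. This is a minimal sketch that assumes every `node_id` follows the `agent:<name>:step_<n>` pattern seen in the sample report; that pattern is inferred from two examples, and `NodeRef` and `parse_node_id` are hypothetical names introduced here.

```python
from typing import NamedTuple


class NodeRef(NamedTuple):
    kind: str   # e.g. "agent"
    agent: str  # e.g. "Reviewer"
    step: int   # e.g. 12


def parse_node_id(node_id: str) -> NodeRef:
    """Split 'agent:Reviewer:step_12' into its kind, agent name, and step number."""
    # Unpacking raises ValueError if there are not exactly three parts.
    kind, agent, step_label = node_id.split(":")
    prefix, _, number = step_label.partition("_")
    if prefix != "step" or not number.isdigit():
        raise ValueError(f"unexpected step label: {step_label!r}")
    return NodeRef(kind, agent, int(number))


# Failure entries taken from the sample report, deliberately out of step order.
failure_modes = [
    {"type": "NO_EVIDENCE_FOR_COMPLETION", "node_id": "agent:Coder:step_19"},
    {"type": "IGNORED_AGENT_INPUT", "node_id": "agent:Reviewer:step_12"},
]

# Order failures by step so they can be reviewed in the sequence they occurred.
ordered = sorted(failure_modes, key=lambda m: parse_node_id(m["node_id"]).step)
for mode in ordered:
    ref = parse_node_id(mode["node_id"])
    print(f"step {ref.step} ({ref.agent}): {mode['type']}")
```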

Trend analysis

Aggregate failures across many runs to identify patterns and measure system health.

Use cases:

Identify dominant failure categories across your agent fleet
Detect regressions after prompt or tool changes
Measure system maturity over time

Example insight:

“After deploying v1.4, inter-agent failures increased 3×, mostly from ignored reviewer feedback.”
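An insight like this one can be derived by averaging the `categories` counts per session for each release and comparing the results. The sketch below assumes each report has been paired with a version label by the caller; the report sample does not include a version field, so that pairing, the function name `mean_failures_per_session`, and the numbers are all made up for illustration.

```python
from collections import defaultdict


def mean_failures_per_session(runs: list[tuple[str, dict]]) -> dict[str, dict[str, float]]:
    """Average the per-category failure counts over the sessions of each version."""
    totals: dict[str, dict[str, int]] = defaultdict(lambda: defaultdict(int))
    sessions: dict[str, int] = defaultdict(int)
    for version, report in runs:
        sessions[version] += 1
        for category, count in report["categories"].items():
            totals[version][category] += count
    return {
        version: {cat: n / sessions[version] for cat, n in cats.items()}
        for version, cats in totals.items()
    }


# Illustrative data only: (version label, failure report) pairs.
runs = [
    ("v1.3", {"categories": {"system_design": 1, "inter_agent": 1, "verification": 0}}),
    ("v1.3", {"categories": {"system_design": 1, "inter_agent": 1, "verification": 1}}),
    ("v1.4", {"categories": {"system_design": 1, "inter_agent": 3, "verification": 0}}),
    ("v1.4", {"categories": {"system_design": 1, "inter_agent": 3, "verification": 1}}),
]

rates = mean_failures_per_session(runs)
ratio = rates["v1.4"]["inter_agent"] / rates["v1.3"]["inter_agent"]
print(ratio)  # 3.0
```

Normalizing by session count keeps the comparison fair when one release has handled more traffic than the other.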

Pre-production validation

Run Failure Modes on staging or canary traffic before production deployment.

Catch verification gaps

Identify missing validation steps before they reach production

Detect silent failures

Find cases where outputs look correct but lack evidence
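As a rough illustration of the silent-failure check, the sketch below scans a batch of failure reports for sessions that are marked complete but still carry a verification-category failure. It assumes `task_completion: true` means the session reported success and that reports are keyed by a session ID; both are assumptions for this example, and `find_silent_failures` is a hypothetical helper, not a product feature.

```python
def find_silent_failures(reports: dict[str, dict]) -> list[str]:
    """Return IDs of sessions marked complete that still have a verification failure."""
    flagged = []
    for session_id, report in reports.items():
        has_verification_failure = any(
            mode["category"] == "verification" for mode in report["failure_modes"]
        )
        if report["task_completion"] and has_verification_failure:
            flagged.append(session_id)
    return sorted(flagged)


# Illustrative staging reports keyed by a made-up session ID.
staging_reports = {
    "sess-a": {
        "task_completion": True,
        "failure_modes": [
            {"category": "verification", "type": "NO_EVIDENCE_FOR_COMPLETION"}
        ],
    },
    "sess-b": {"task_completion": True, "failure_modes": []},
    "sess-c": {
        "task_completion": False,
        "failure_modes": [
            {"category": "inter_agent", "type": "IGNORED_AGENT_INPUT"}
        ],
    },
}

print(find_silent_failures(staging_reports))  # ['sess-a']
```

Any action taken on the flagged sessions, such as holding a release, would live in your own tooling.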

Failure Modes is diagnostic only. It does not block execution, enforce policies, or automatically modify agent behavior.
