Safety in Cascade is the process of ensuring that AI agents behave within explicitly defined boundaries during execution. Rather than prescribing a fixed definition of “safe”, Cascade allows teams to define their own safety policies that are continuously evaluated against agent behavior at runtime. Policies are evaluated using traces captured from LLM calls, tool usage, and reasoning steps, producing structured signals that inform enforcement decisions. This approach gives teams precise, context-aware control over what their agents are allowed to do and what behavior must be prevented.

How Safety Works in Cascade

Safety in Cascade is policy-driven and signal-based. As an agent executes, Cascade’s DeepStream system captures detailed traces of its behavior. These traces are evaluated in real time against user-defined safety policies. The evaluation produces Safety Signals that indicate policy compliance, behavioral drift, and semantic classification of agent actions. These signals enable teams to monitor agent behavior, identify violations, and take appropriate action based on the configured enforcement mode.

Safety Policies

Cascade supports three policy types, each designed to address a different class of agent behavior.

Tool Policies

Tool policies control which tools an agent can use. They are evaluated at invocation time to prevent access to destructive or sensitive operations.
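
For illustration, a tool policy file might look something like the sketch below. The actual schema is not documented in this section, so every field name here (name, type, rules, tool, action) is an assumption rather than Cascade's published format.

```json
{
  "name": "restrict-destructive-tools",
  "type": "tool",
  "description": "Deny destructive operations; allow all other tools.",
  "rules": [
    { "tool": "delete_file", "action": "deny" },
    { "tool": "drop_table", "action": "deny" },
    { "tool": "*", "action": "allow" }
  ]
}
```

An explicit allow-all fallback keeps the deny rules easy to audit; a stricter posture would invert the default and enumerate allowed tools instead.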

Categorization Policies

Categorization policies classify agent outputs into semantic categories and block out-of-policy content such as harmful or unsafe responses.
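
A categorization policy might pair a set of semantic categories with the subset that should be blocked. As with the tool policy sketch above, this is hypothetical; the field names are invented for illustration.

```json
{
  "name": "block-unsafe-output",
  "type": "categorization",
  "description": "Classify agent outputs and block unsafe categories.",
  "categories": ["safe", "harmful", "pii-disclosure"],
  "blocked_categories": ["harmful", "pii-disclosure"]
}
```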

Semantic Policies

Semantic policies define behavioral constraints in natural language, enforcing complex, context-dependent rules through semantic reasoning.
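
Because semantic policies are expressed in natural language, the body of such a policy is essentially a constraint string evaluated in context. Again, the exact schema is assumed here, not documented:

```json
{
  "name": "no-unconfirmed-refunds",
  "type": "semantic",
  "description": "Context-dependent behavioral constraint.",
  "constraint": "Never issue a refund without first confirming the order ID with the customer."
}
```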
Policies are defined as JSON configuration files and can be attached globally or per workflow. They can be updated without changing agent code, enabling rapid iteration and safe rollout.
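
How attachment is expressed is likewise not specified in this section, but a global versus per-workflow scope could plausibly be declared alongside the policy list. In the sketch below, the scope, workflows, policies, and enforcement_mode fields are all assumptions, with enforcement_mode echoing the configured enforcement mode mentioned earlier:

```json
{
  "scope": "workflow",
  "workflows": ["checkout-agent", "support-triage"],
  "policies": ["restrict-destructive-tools", "block-unsafe-output"],
  "enforcement_mode": "block"
}
```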

Safety Signals

Safety Signals are the runtime indicators Cascade uses to evaluate agent behavior. There are three primary signal types:
  • Policy Violation Signals: Indicate when agent behavior does not comply with active safety policies. These signals are generated in real time and drive enforcement actions.
  • Drift Signals: Surface statistically significant changes in agent behavior across executions. These signals help teams understand how changes in prompts, tools, or models affect behavior over time.
  • Classification Signals: Label agent behavior into semantic categories. These classifications provide structured inputs that categorization policies act upon.
Safety Signals are aggregated into metrics that provide visibility into agent behavior and policy effectiveness across time and workflows.
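
To make the signal types concrete, a policy violation signal emitted at runtime might carry a payload along these lines. The structure and field names are illustrative assumptions, not Cascade's published schema:

```json
{
  "signal_type": "policy_violation",
  "policy": "restrict-destructive-tools",
  "trace_id": "tr_0123",
  "timestamp": "2026-01-15T09:42:07Z",
  "details": {
    "tool": "delete_file",
    "arguments": { "path": "/data/reports" },
    "action_taken": "block"
  }
}
```

A drift signal would presumably follow the same shape, with details describing the historical baseline and the observed deviation rather than a tool call.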

Runtime Evaluation

Safety evaluation happens continuously while an agent runs. As traces are generated, Cascade:
  • Extracts relevant behavioral signals from tool calls, reasoning traces, and outputs
  • Evaluates all active policies against the current execution context
  • Classifies agent behavior into semantic categories
  • Detects drift against historical baselines
  • Produces safety findings tied to the trace
This enables safety decisions to be made using full execution context rather than static prompts or offline checks.
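
The findings produced by this loop might bundle the extracted signals with the trace they were derived from, in the spirit of the sketch below. As with the earlier examples, the schema is assumed for illustration only:

```json
{
  "finding_id": "sf_001",
  "trace_id": "tr_0123",
  "severity": "high",
  "signals": [
    { "signal_type": "policy_violation", "policy": "restrict-destructive-tools" },
    { "signal_type": "classification", "category": "destructive-operation" }
  ],
  "context": { "step": 7, "tool_call": "delete_file", "enforcement_mode": "block" }
}
```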

Who Safety Is For

The Safety system is designed for:
  • Developers building agent workflows
  • Platform teams managing agent behavior at scale
  • Product teams defining acceptable outcomes
  • Compliance and security teams monitoring behavioral risk

Next steps