Safety Signals

Overview

Safety Signals are the runtime indicators Cascade uses to evaluate agent behavior against defined safety policies and historical baselines. As agents execute, Cascade continuously analyzes tool calls, reasoning traces, and outputs to identify policy violations, behavioral drift, and anomalous patterns. These signals provide both immediate feedback for enforcement and longer-term insight into how agent behavior evolves over time. Safety Signals are designed to support enforcement, debugging, and iteration without relying on static checks or offline evaluation.

Policy Violation Signals

Policy violation signals are generated when agent behavior does not comply with the active set of safety policies. Each tool call and reasoning trace is evaluated against every policy selected for the agent or workflow. When a violation is identified, Cascade records the signal and applies an action based on the configured enforcement mode. Evaluation happens in real time, and each signal is interpreted in the context of the full execution trace rather than as an isolated event.
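The following is a minimal sketch of per-event evaluation with trace context. It is not Cascade's API. `Event`, `Policy`, `ViolationSignal`, `evaluate_trace`, the `MONITOR`/`BLOCK` mode names, and the example policy are all hypothetical. The sketch shows two things. Every selected policy is checked against every event, and each check also receives the events that came before it. As a result, the same tool call can be compliant in one trace and a violation in another.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, List


class EnforcementMode(Enum):
    # Hypothetical mode names for illustration only.
    MONITOR = "monitor"  # the signal is recorded only
    BLOCK = "block"      # the signal is recorded with a "blocked" action


@dataclass
class Event:
    kind: str     # "tool_call" or "reasoning"
    content: str


@dataclass
class Policy:
    name: str
    # Receives the current event plus every event that preceded it in the trace,
    # so a rule can depend on context rather than on the event in isolation.
    violates: Callable[[Event, List[Event]], bool]


@dataclass
class ViolationSignal:
    policy: str
    event_index: int
    action: str


def evaluate_trace(events: List[Event], policies: List[Policy],
                   mode: EnforcementMode) -> List[ViolationSignal]:
    signals: List[ViolationSignal] = []
    for i, event in enumerate(events):
        context = events[:i]  # everything observed before this event
        for policy in policies:  # every selected policy is checked against every event
            if policy.violates(event, context):
                action = "blocked" if mode is EnforcementMode.BLOCK else "recorded"
                signals.append(ViolationSignal(policy.name, i, action))
    return signals


# Hypothetical example policy: a delete tool call is a violation unless an
# earlier step in the same trace recorded user confirmation.
def unconfirmed_delete(event: Event, context: List[Event]) -> bool:
    if event.kind != "tool_call" or not event.content.startswith("delete"):
        return False
    return not any(e.content == "user confirmed" for e in context)


policies = [Policy("no_unconfirmed_delete", unconfirmed_delete)]
trace = [Event("reasoning", "plan cleanup"), Event("tool_call", "delete /tmp/report")]
signals = evaluate_trace(trace, policies, EnforcementMode.BLOCK)
```

Adding an earlier `Event("reasoning", "user confirmed")` to the trace makes the same delete call pass. This is the practical difference between contextual evaluation and checking events one at a time.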

Drift Signals

Drift signals surface longer-term changes in agent behavior across executions. In addition to evaluating individual events, Cascade maintains a rolling baseline derived from recently completed agent runs. At the end of each run, Cascade compares the run's behavior against this baseline to detect statistically significant deviations. Only deviations that exceed significance thresholds are surfaced as drift signals.

Key Characteristics

Baselines are built from recent agent executions
Comparisons occur after run completion
Only statistically significant deviations are surfaced
Drift signals do not trigger alerts or enforcement actions

Drift signals are intended to help teams understand how changes in prompts, tools, models, or infrastructure affect agent behavior over time.
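This page does not specify which statistical test Cascade applies. The following sketch therefore substitutes a simple per-feature z-score, computed against a rolling window of recently completed runs. `DriftDetector`, its parameters, and the `tool_calls` feature are hypothetical. The sketch mirrors the characteristics listed above in three ways. The comparison happens only when a run completes. Only deviations past the threshold are returned. The result is plain data rather than an alert or enforcement action.

```python
from collections import deque
from statistics import mean, stdev
from typing import Deque, Dict, List, Tuple


class DriftDetector:
    """Illustrative drift check: per-feature z-scores against a rolling baseline."""

    def __init__(self, window: int = 50, z_threshold: float = 3.0, min_runs: int = 10):
        self.window = window              # how many recent completed runs form the baseline
        self.z_threshold = z_threshold    # stand-in for a significance threshold
        self.min_runs = max(min_runs, 2)  # stdev needs at least two baseline runs
        self.baseline: Dict[str, Deque[float]] = {}

    def complete_run(self, features: Dict[str, float]) -> List[Tuple[str, float]]:
        """Called once a run finishes: compare it to the baseline, then add it."""
        signals: List[Tuple[str, float]] = []
        for name, value in features.items():
            past = self.baseline.setdefault(name, deque(maxlen=self.window))
            if len(past) >= self.min_runs:
                mu, sigma = mean(past), stdev(past)
                # A constant baseline has no spread to test against; skipped for simplicity.
                if sigma > 0:
                    z = (value - mu) / sigma
                    if abs(z) > self.z_threshold:
                        signals.append((name, z))
            past.append(value)  # the finished run joins the rolling baseline
        return signals


detector = DriftDetector(window=20, z_threshold=3.0, min_runs=5)
for calls in [4, 5] * 5:  # ten ordinary runs build the baseline
    detector.complete_run({"tool_calls": calls})
drift = detector.complete_run({"tool_calls": 20})  # a run with far more tool calls
```

Because the baseline is a bounded `deque`, older runs age out automatically. Sustained behavioral changes therefore stop being reported once they become the new normal.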

Classification Signals

Classification signals label agent behavior using structured semantic categories. Cascade uses internal categorization models to classify model events into one of 12 categories:

Informational
Out of Scope
Unsafe
Harmful
Hateful
Sexual Content
Violence
Self Harm
Deceptive
Privacy Risk
Criminal
None

Each classification is associated with a confidence score. These confidence scores are used during categorization policy evaluation to determine whether behavior is considered out of policy. Classification signals do not enforce behavior directly. They provide structured inputs that policies and enforcement modes act upon.
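The following is a minimal sketch of how a categorization policy might consume classification signals. It assumes each signal carries one of the 12 categories above and a confidence in [0, 1]. `Classification`, `CategorizationPolicy`, and the per-category thresholds are hypothetical. The policy only decides whether behavior is out of policy. Acting on that decision is left to the enforcement mode.

```python
from dataclasses import dataclass
from typing import Dict

# The 12 categories listed above.
CATEGORIES = frozenset({
    "Informational", "Out of Scope", "Unsafe", "Harmful", "Hateful",
    "Sexual Content", "Violence", "Self Harm", "Deceptive",
    "Privacy Risk", "Criminal", "None",
})


@dataclass(frozen=True)
class Classification:
    category: str
    confidence: float  # assumed to lie in [0, 1]

    def __post_init__(self) -> None:
        if self.category not in CATEGORIES:
            raise ValueError(f"unknown category: {self.category!r}")
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be in [0, 1]")


@dataclass
class CategorizationPolicy:
    # Hypothetical: minimum confidence at which a disallowed category
    # counts as out of policy. Categories not listed are always allowed.
    thresholds: Dict[str, float]

    def is_out_of_policy(self, c: Classification) -> bool:
        threshold = self.thresholds.get(c.category)
        return threshold is not None and c.confidence >= threshold


policy = CategorizationPolicy({"Privacy Risk": 0.8, "Self Harm": 0.5})
policy.is_out_of_policy(Classification("Privacy Risk", 0.92))   # True
policy.is_out_of_policy(Classification("Privacy Risk", 0.60))   # False: below threshold
policy.is_out_of_policy(Classification("Informational", 0.99))  # False: category not restricted
```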

Safety Metrics

Safety Signals are aggregated into metrics that provide visibility into agent behavior and policy effectiveness. Collected metrics include:

Policy violations by type and severity
Actions taken in response to violations
Distribution of classification categories
Statistically significant drift signals
Trends over time by agent or workflow

These metrics are surfaced in the Cascade dashboard and are used to monitor safety posture, tune policies, and identify high-risk agents or workflows.
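The aggregation step can be sketched with `collections.Counter`, assuming a hypothetical flat `SignalRecord` per event. The record fields and metric names are illustrative, not the dashboard's actual schema. The sketch covers four of the metrics listed above: violations by type and severity, actions taken, category distribution, and a per-agent daily trend. Drift signal counts could be tallied the same way.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass(frozen=True)
class SignalRecord:
    # Hypothetical flat record: a violation and/or a classification for one event.
    agent: str
    day: str                        # ISO date string used for trend buckets
    violation_type: Optional[str] = None
    severity: Optional[str] = None
    action: Optional[str] = None    # action taken under the enforcement mode
    category: Optional[str] = None  # classification category, if any


def summarize(records: Iterable[SignalRecord]) -> Dict[str, Counter]:
    violations: Counter = Counter()  # (type, severity) -> count
    actions: Counter = Counter()     # action -> count
    categories: Counter = Counter()  # category -> count
    trend: Counter = Counter()       # (agent, day) -> violation count
    for r in records:
        if r.violation_type is not None:
            violations[(r.violation_type, r.severity)] += 1
            trend[(r.agent, r.day)] += 1
        if r.action is not None:
            actions[r.action] += 1
        if r.category is not None:
            categories[r.category] += 1
    return {
        "violations_by_type_severity": violations,
        "actions": actions,
        "category_distribution": categories,
        "violations_by_agent_day": trend,
    }


records = [
    SignalRecord("agent-a", "2024-05-01", "pii_exposure", "high", "blocked"),
    SignalRecord("agent-a", "2024-05-01", category="Informational"),
    SignalRecord("agent-b", "2024-05-02", "pii_exposure", "high", "recorded", "Privacy Risk"),
]
metrics = summarize(records)
```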
