Threat Detection

Overview

Threat Detection in Cascade is designed to identify and contain adversarial attempts to manipulate, compromise, or abuse AI agents at runtime. Cascade uses proprietary evaluation models and security policies to analyze agent behavior across inputs, reasoning traces, tool usage, and outputs. These systems are purpose-built to detect threats that traditional application security controls do not cover, such as instruction manipulation and alignment drift. For security reasons, Cascade does not publicly disclose the internal structure or weights of its evaluation models. The system is designed to provide strong guarantees without exposing mechanisms that could be reverse engineered or bypassed.

How Threat Detection Works

Threat Detection operates continuously during agent execution. As agents run, Cascade’s DeepStream system captures detailed traces of behavior. These traces are evaluated in real time against security-specific detection models and internal security policies. When a threat is identified, Cascade records the event and applies a response based on the configured enforcement mode. Threat Detection is designed not only to identify attacks, but to prevent adversarial behavior from propagating into real-world actions.

Prompt Injection Detection

Prompt injection attacks attempt to override or manipulate an agent’s instructions to produce unintended behavior. Cascade detects prompt injection by evaluating the full execution path of an agent, including:

User inputs
Reasoning traces
Tool calls
Model outputs

Rather than relying on static prompt inspection, Cascade evaluates whether injected instructions influence downstream behavior in ways that violate expected intent or security constraints. If compromise is detected, Cascade ensures that injected behavior cannot propagate into tool execution, external actions, or final outputs, even if the injection attempt partially succeeds.

Data Exfiltration Detection

Data exfiltration attacks attempt to extract sensitive or unintended data through agent behavior. Mitigation approach: Cascade mitigates data exfiltration by sanitizing all database reads before they are introduced into the agent context. Sanitization occurs during streaming to prevent stale-data prompt attacks and instruction manipulation. Protection guarantees:

The underlying database remains unaffected
Embedded instructions cannot influence agent behavior
Downstream reasoning remains isolated from untrusted data

This approach protects both the agent and the data layer without introducing persistence risk.

Context Poisoning and Alignment Manipulation

Context poisoning attacks aim to gradually shift an agent’s behavior or alignment over time. How Cascade detects context poisoning: Cascade analyzes historical execution data across agent runs, translating behavioral traces into context and performance gradients compared over time.

Detection Capability	Benefit
Gradual manipulation detection	Identify coordinated attempts to shift behavior
Timeline establishment	Pinpoint when poisoning may have begun
Alignment degradation alerts	Surface issues before they result in failure

Response capability: By exposing these timelines, Cascade enables teams to retrain or roll back agents to a known-good point instead of performing full retraining.

Configuration

Threat Detection follows the same enforcement model as the rest of the Cascade platform, but can be configured independently when needed. By default, security enforcement uses the agent or project’s enforcement_mode. Teams may optionally define a separate enforcement mode for security-specific detections.

Default Behavior

If no security-specific enforcement mode is defined, Threat Detection uses the configured enforcement_mode. This ensures consistent behavior across safety and security by default.

Security-Specific Enforcement Configuration

Teams may override enforcement behavior for security detections by specifying security_enforcement_mode.

Python

from cascade import init_tracing

init_tracing(
    project="my_project",
    enforcement_mode="observe",
    security_enforcement_mode="detect",
    endpoint="https://api.runcascade.com/v1/traces",
    api_key="your-api-key"
)

In this configuration:

Safety policies run in Observe mode
Security threats trigger Detect-mode behavior and alerts

Security enforcement can also be configured per workflow or agent run using metadata.

Python

from cascade import trace_run

with trace_run(
    "PaymentAgent",
    metadata={
        "agent_id": "payment-agent",
        "security_enforcement_mode": "enforce"
    }
):
    agent.run("issue_refund")

This allows teams to apply stricter enforcement to high-risk agents without changing global settings.

Visibility and Response

While teams cannot modify the internal security frameworks or detection models, Threat Detection fully respects the configured enforcement mode. Detected threats are:

Evaluated using Cascade’s internal security models
Recorded with full execution context
Handled according to the active enforcement mode

Depending on configuration, Cascade may observe, alert on, or actively block adversarial behavior. All security events are surfaced in the Cascade dashboard, where teams can inspect threat details, understand propagation paths, and correlate incidents with agent changes or deployments.

Getting started

Observability

Enforcement Modes

Safety

Security

Threat Detection

Overview

How Threat Detection Works

Prompt Injection Detection

Data Exfiltration Detection

Context Poisoning and Alignment Manipulation

Configuration

Default Behavior

Security-Specific Enforcement Configuration

Visibility and Response

Getting started

Observability

Enforcement Modes

Safety

Security

​Overview

​How Threat Detection Works

​Prompt Injection Detection

​Data Exfiltration Detection

​Context Poisoning and Alignment Manipulation

​Configuration

​Default Behavior

​Security-Specific Enforcement Configuration

​Visibility and Response

Overview

How Threat Detection Works

Prompt Injection Detection

Data Exfiltration Detection

Context Poisoning and Alignment Manipulation

Configuration

Default Behavior

Security-Specific Enforcement Configuration

Visibility and Response