Skip to main content

Overview

Threat Detection in Cascade is designed to identify and contain adversarial attempts to manipulate, compromise, or abuse AI agents at runtime. Cascade uses proprietary evaluation models and security policies to analyze agent behavior across inputs, reasoning traces, tool usage, and outputs. These systems are purpose-built to detect threats that traditional application security controls do not cover, such as instruction manipulation and alignment drift. For security reasons, Cascade does not publicly disclose the internal structure or weights of its evaluation models. The system is designed to provide strong guarantees without exposing mechanisms that could be reverse engineered or bypassed.

How Threat Detection Works

Threat Detection operates continuously during agent execution. As agents run, Cascade’s DeepStream system captures detailed traces of behavior. These traces are evaluated in real time against security-specific detection models and internal security policies. When a threat is identified, Cascade records the event and applies a response based on the configured enforcement mode. Threat Detection is designed not only to identify attacks, but to prevent adversarial behavior from propagating into real-world actions.

Prompt Injection Detection

Prompt injection attacks attempt to override or manipulate an agent’s instructions to produce unintended behavior. Cascade detects prompt injection by evaluating the full execution path of an agent, including:
  • User inputs
  • Reasoning traces
  • Tool calls
  • Model outputs
Rather than relying on static prompt inspection, Cascade evaluates whether injected instructions influence downstream behavior in ways that violate expected intent or security constraints. If compromise is detected, Cascade ensures that injected behavior cannot propagate into tool execution, external actions, or final outputs, even if the injection attempt partially succeeds.

Data Exfiltration Detection

Data exfiltration attacks attempt to extract sensitive or unintended data through agent behavior. Mitigation approach: Cascade mitigates data exfiltration by sanitizing all database reads before they are introduced into the agent context. Sanitization occurs during streaming to prevent stale-data prompt attacks and instruction manipulation. Protection guarantees:
  • The underlying database remains unaffected
  • Embedded instructions cannot influence agent behavior
  • Downstream reasoning remains isolated from untrusted data
This approach protects both the agent and the data layer without introducing persistence risk.

Context Poisoning and Alignment Manipulation

Context poisoning attacks aim to gradually shift an agent’s behavior or alignment over time. How Cascade detects context poisoning: Cascade analyzes historical execution data across agent runs, translating behavioral traces into context and performance gradients compared over time.
Detection CapabilityBenefit
Gradual manipulation detectionIdentify coordinated attempts to shift behavior
Timeline establishmentPinpoint when poisoning may have begun
Alignment degradation alertsSurface issues before they result in failure
Response capability: By exposing these timelines, Cascade enables teams to retrain or roll back agents to a known-good point instead of performing full retraining.

Configuration

Threat Detection follows the same enforcement model as the rest of the Cascade platform, but can be configured independently when needed. By default, security enforcement uses the agent or project’s enforcement_mode. Teams may optionally define a separate enforcement mode for security-specific detections.

Default Behavior

If no security-specific enforcement mode is defined, Threat Detection uses the configured enforcement_mode. This ensures consistent behavior across safety and security by default.

Security-Specific Enforcement Configuration

Teams may override enforcement behavior for security detections by specifying security_enforcement_mode.
Python
from cascade import init_tracing

init_tracing(
    project="my_project",
    enforcement_mode="observe",
    security_enforcement_mode="detect",
    endpoint="https://api.runcascade.com/v1/traces",
    api_key="your-api-key"
)
In this configuration:
  • Safety policies run in Observe mode
  • Security threats trigger Detect-mode behavior and alerts
Security enforcement can also be configured per workflow or agent run using metadata.
Python
from cascade import trace_run

with trace_run(
    "PaymentAgent",
    metadata={
        "agent_id": "payment-agent",
        "security_enforcement_mode": "enforce"
    }
):
    agent.run("issue_refund")
This allows teams to apply stricter enforcement to high-risk agents without changing global settings.

Visibility and Response

While teams cannot modify the internal security frameworks or detection models, Threat Detection fully respects the configured enforcement mode. Detected threats are:
  • Evaluated using Cascade’s internal security models
  • Recorded with full execution context
  • Handled according to the active enforcement mode
Depending on configuration, Cascade may observe, alert on, or actively block adversarial behavior. All security events are surfaced in the Cascade dashboard, where teams can inspect threat details, understand propagation paths, and correlate incidents with agent changes or deployments.