Overview
Threat Detection in Cascade is designed to identify and contain adversarial attempts to manipulate, compromise, or abuse AI agents at runtime. Cascade uses proprietary evaluation models and security policies to analyze agent behavior across inputs, reasoning traces, tool usage, and outputs. These systems are purpose-built to detect threats that traditional application security controls do not cover, such as instruction manipulation and alignment drift. For security reasons, Cascade does not publicly disclose the internal structure or weights of its evaluation models. The system is designed to provide strong guarantees without exposing mechanisms that could be reverse engineered or bypassed.How Threat Detection Works
Threat Detection operates continuously during agent execution. As agents run, Cascade’s DeepStream system captures detailed traces of behavior. These traces are evaluated in real time against security-specific detection models and internal security policies. When a threat is identified, Cascade records the event and applies a response based on the configured enforcement mode. Threat Detection is designed not only to identify attacks, but to prevent adversarial behavior from propagating into real-world actions.Prompt Injection Detection
Prompt injection attacks attempt to override or manipulate an agent’s instructions to produce unintended behavior. Cascade detects prompt injection by evaluating the full execution path of an agent, including:- User inputs
- Reasoning traces
- Tool calls
- Model outputs
Data Exfiltration Detection
Data exfiltration attacks attempt to extract sensitive or unintended data through agent behavior. Mitigation approach: Cascade mitigates data exfiltration by sanitizing all database reads before they are introduced into the agent context. Sanitization occurs during streaming to prevent stale-data prompt attacks and instruction manipulation. Protection guarantees:- The underlying database remains unaffected
- Embedded instructions cannot influence agent behavior
- Downstream reasoning remains isolated from untrusted data
Context Poisoning and Alignment Manipulation
Context poisoning attacks aim to gradually shift an agent’s behavior or alignment over time. How Cascade detects context poisoning: Cascade analyzes historical execution data across agent runs, translating behavioral traces into context and performance gradients compared over time.| Detection Capability | Benefit |
|---|---|
| Gradual manipulation detection | Identify coordinated attempts to shift behavior |
| Timeline establishment | Pinpoint when poisoning may have begun |
| Alignment degradation alerts | Surface issues before they result in failure |
Configuration
Threat Detection follows the same enforcement model as the rest of the Cascade platform, but can be configured independently when needed. By default, security enforcement uses the agent or project’senforcement_mode. Teams may optionally define a separate enforcement mode for security-specific detections.
Default Behavior
If no security-specific enforcement mode is defined, Threat Detection uses the configuredenforcement_mode.
This ensures consistent behavior across safety and security by default.
Security-Specific Enforcement Configuration
Teams may override enforcement behavior for security detections by specifyingsecurity_enforcement_mode.
Python
- Safety policies run in Observe mode
- Security threats trigger Detect-mode behavior and alerts
Python
Visibility and Response
While teams cannot modify the internal security frameworks or detection models, Threat Detection fully respects the configured enforcement mode. Detected threats are:- Evaluated using Cascade’s internal security models
- Recorded with full execution context
- Handled according to the active enforcement mode