# Production monitoring

> Run evaluations at scale with Backtest and Active Evals

Production monitoring lets you run your rubrics at scale across real agent traffic. Two modes cover different needs: **Backtest** runs rubrics retroactively over historical traces (for auditing, regression testing, or backfilling), while **Active Evals** run automatically on every new trace as it completes (for real-time quality monitoring). Use Backtest to answer "how has my agent been performing?" and Active Evals to catch regressions and failures as they happen.

<Frame>
  <img src="https://mintcdn.com/cascade-e69b1028/ehHq1Qh2c0Vu4bCS/images/task_creation_part.png?fit=max&auto=format&n=ehHq1Qh2c0Vu4bCS&q=85&s=8f66589c17b119f43694cba7265a7252" alt="Create Evaluation Task form showing task name, project selection, and Backtest or Active Evals mode options" width="3000" height="1914" data-path="images/task_creation_part.png" />
</Frame>

## Evaluation scope

You can run rubrics at the **trace level**, **span level**, or **session level** to evaluate full executions, individual steps, or multi-turn sessions, respectively. You can also scope an evaluation to a specific agent or tool by selecting which one it runs on. This lets you target exactly what you care about: for example, a hallucination rubric only on LLM spans, or a tool-usage rubric only on a particular tool.

<Frame>
  <img src="https://mintcdn.com/cascade-e69b1028/ehHq1Qh2c0Vu4bCS/images/run_on_particular_tool.png?fit=max&auto=format&n=ehHq1Qh2c0Vu4bCS&q=85&s=ac9ca6fd8cf12971e6c3319a8956b230" alt="Evaluation task configuration showing scope options for trace, span, session level and agent/tool selection" width="75%" data-path="images/run_on_particular_tool.png" />
</Frame>
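To make the scoping described above concrete, here is a minimal Python sketch of span-level filtering. It is an illustration only, not the Cascade API: the `Span` and `Scope` classes, the `spans_in_scope` helper, and the span type labels (`"llm"`, `"tool"`) are all hypothetical names chosen for this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Span:
    # Hypothetical span record; field names are assumptions for illustration.
    span_type: str  # e.g. "llm" or "tool"
    name: str       # name of the agent or tool that produced the span


@dataclass
class Scope:
    # Hypothetical scope description mirroring the options in the task form.
    level: str                         # "trace", "span", or "session"
    span_type: Optional[str] = None    # restrict to one span type, if set
    target_name: Optional[str] = None  # restrict to one agent or tool, if set


def spans_in_scope(spans: list[Span], scope: Scope) -> list[Span]:
    """Return the spans that a span-level rubric would be applied to."""
    if scope.level != "span":
        raise ValueError("this helper only models span-level scoping")
    return [
        s for s in spans
        if (scope.span_type is None or s.span_type == scope.span_type)
        and (scope.target_name is None or s.name == scope.target_name)
    ]


spans = [
    Span("llm", "planner"),
    Span("tool", "web_search"),
    Span("tool", "calculator"),
    Span("llm", "summarizer"),
]

# A hallucination rubric scoped to LLM spans only.
llm_spans = spans_in_scope(spans, Scope(level="span", span_type="llm"))

# A tool-usage rubric scoped to one particular tool.
search_spans = spans_in_scope(
    spans, Scope(level="span", span_type="tool", target_name="web_search")
)
```

In this sketch, leaving `span_type` and `target_name` unset matches every span, which corresponds to running a rubric without any agent or tool restriction.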

## Backtest evaluations

Backtest evaluations let you run a set of rubrics against traces you've already collected.

Select the rubrics you want to run, then choose your target: a specific set of traces, all traces from a project, or traces within a date range. You can also narrow the evaluation to specific span types: for example, run a hallucination rubric only against LLM calls, or a tool-usage rubric only against tool spans.

Common uses include running rubrics on recent traces to catch regressions, running them before a release to audit quality, and running them over your full history to backfill results.

From the **Tasks** page, create a new Backtest task, select your rubrics and target traces, and hit run. Results populate as each trace is evaluated.

<Frame>
  <img src="https://mintcdn.com/cascade-e69b1028/ehHq1Qh2c0Vu4bCS/images/backtest_photo.png?fit=max&auto=format&n=ehHq1Qh2c0Vu4bCS&q=85&s=96a397ea9727d25e82a9357df1e5b803" alt="Backtest task creation form showing rubric selection, trace filtering options, and the run button" width="2998" height="1912" data-path="images/backtest_photo.png" />
</Frame>
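The target selection described in this section (a specific set of traces, a whole project, or a date range) can be pictured as a filter over stored traces. The following Python sketch is a conceptual model under stated assumptions, not Cascade's implementation: the `Trace` class, its fields, and `select_backtest_targets` are hypothetical, and treating the date range as half-open (`start` inclusive, `end` exclusive) is a choice made for this example.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class Trace:
    # Hypothetical trace record; field names are assumptions for illustration.
    trace_id: str
    project: str
    completed_at: datetime


def select_backtest_targets(
    traces: list[Trace],
    project: Optional[str] = None,
    start: Optional[datetime] = None,
    end: Optional[datetime] = None,
    trace_ids: Optional[set[str]] = None,
) -> list[Trace]:
    """Keep traces matching every filter that is set (unset filters match all)."""
    selected = []
    for t in traces:
        if trace_ids is not None and t.trace_id not in trace_ids:
            continue
        if project is not None and t.project != project:
            continue
        if start is not None and t.completed_at < start:
            continue
        if end is not None and t.completed_at >= end:  # end is exclusive
            continue
        selected.append(t)
    return selected


def utc(year: int, month: int, day: int) -> datetime:
    return datetime(year, month, day, tzinfo=timezone.utc)


history = [
    Trace("t1", "support-bot", utc(2024, 1, 5)),
    Trace("t2", "support-bot", utc(2024, 2, 10)),
    Trace("t3", "sales-bot", utc(2024, 2, 12)),
]

# All traces from one project.
project_targets = select_backtest_targets(history, project="support-bot")

# Traces from that project completed during February 2024.
february_targets = select_backtest_targets(
    history, project="support-bot", start=utc(2024, 2, 1), end=utc(2024, 3, 1)
)
```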

## Active Evals

Active Evals run your rubrics automatically on every new trace. No manual triggering required.

Pick a project, select the rubrics you want active, and Cascade evaluates each incoming trace as it completes. Results flow into the **Failures** page in real time, so you see problems as they happen rather than discovering them days later.

<Note>
  If your agent starts hallucinating more frequently, calls tools it shouldn't, or produces lower-quality outputs, you'll know immediately through failing scores and trend data on the dashboard.
</Note>

You can pause and resume Active Evals at any time, add or remove rubrics as your agent evolves, and scope them to specific projects.
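As a rough mental model of the behavior described above, the Python sketch below evaluates each completed trace against the active rubrics and records failures. It is not Cascade's API: `ActiveEvalTask`, `on_trace_completed`, the dict-based trace shape, and the example rubric are hypothetical. It also assumes that traces arriving while the task is paused are skipped rather than queued, which the text above does not specify.

```python
from dataclasses import dataclass, field
from typing import Callable

# A rubric is modeled here as a function that returns True when a trace passes.
Rubric = Callable[[dict], bool]


@dataclass
class ActiveEvalTask:
    # Hypothetical model of an Active Evals task scoped to one project.
    project: str
    rubrics: dict[str, Rubric] = field(default_factory=dict)
    paused: bool = False
    failures: list[tuple[str, str]] = field(default_factory=list)

    def on_trace_completed(self, trace: dict) -> None:
        """Run every active rubric on a newly completed trace."""
        # Assumption: paused tasks skip traces instead of queueing them.
        if self.paused or trace["project"] != self.project:
            return
        for name, rubric in self.rubrics.items():
            if not rubric(trace):
                self.failures.append((trace["id"], name))


def calls_only_allowed_tools(trace: dict) -> bool:
    # Example rubric: the agent may only call the web_search tool.
    return set(trace["tools_called"]) <= {"web_search"}


task = ActiveEvalTask(
    project="support-bot",
    rubrics={"tool_usage": calls_only_allowed_tools},
)

task.on_trace_completed(
    {"id": "t1", "project": "support-bot", "tools_called": ["web_search"]}
)
task.on_trace_completed(
    {"id": "t2", "project": "support-bot", "tools_called": ["delete_account"]}
)

task.paused = True  # pause: the next trace is not evaluated
task.on_trace_completed(
    {"id": "t3", "project": "support-bot", "tools_called": ["delete_account"]}
)

task.paused = False  # resume: traces from other projects are still ignored
task.on_trace_completed(
    {"id": "t4", "project": "other-bot", "tools_called": ["delete_account"]}
)
```

Adding or removing a rubric in this model is just an update to the `rubrics` dictionary, which takes effect on the next completed trace.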

<Frame>
  <img src="https://mintcdn.com/cascade-e69b1028/ehHq1Qh2c0Vu4bCS/images/active_evals_photo.png?fit=max&auto=format&n=ehHq1Qh2c0Vu4bCS&q=85&s=091236ef53896c90b56cc2a9de156c6e" alt="Tasks page showing an active evaluation task with rubric list, project filter, and pause/resume controls" width="3012" height="1906" data-path="images/active_evals_photo.png" />
</Frame>
