
# Rubrics & Evaluation

> Define what good looks like and automatically score every agent execution

Cascade Evals let you systematically score every agent execution. Instead of manually reviewing traces, you define what "good" looks like, and Cascade tells you when your agent drifts.

Once you've created rubrics, Cascade reports pass rates, failure modes, and where quality slips across every execution, so you get a clear picture of agent behavior over time.

## Evaluation scope

When you create a rubric, you choose its **evaluation scope**: trace-level, span-level, or session-level. The scope determines what data the rubric can access and what it evaluates.

<CardGroup cols={3}>
  <Card title="Trace-level" icon="route">
    Evaluates the entire agent execution from start to finish. Has access to all LLM calls, tool calls, and the full trajectory.
    **Use for:** Overall quality, hallucination detection, efficiency
  </Card>

  <Card title="Span-level" icon="link">
    Evaluates individual steps—one LLM call or one tool call. Only has access to that specific span's data.
    **Use for:** LLM response quality, tool correctness, individual step validation
  </Card>

  <Card title="Session-level" icon="layers">
    Evaluates the full session trajectory across multiple traces/turns in one rubric run.
    **Use for:** Multi-turn continuity, cross-trace consistency, overall journey quality
  </Card>
</CardGroup>

When you run a rubric, evaluation results appear in the trace view. The example below shows trace-level evaluation output—including pass/fail status and detailed reasoning for each rubric.

<Frame>
  <img src="https://mintcdn.com/cascade-e69b1028/ehHq1Qh2c0Vu4bCS/images/evaluation_display.png?fit=max&auto=format&n=ehHq1Qh2c0Vu4bCS&q=85&s=6f79a0e8522cdcf869cf5d138dc2a90c" alt="Trace view showing trace-level evaluation results with pass/fail status and detailed failure reasoning" width="1560" height="1824" data-path="images/evaluation_display.png" />
</Frame>

### Template variables

Inside your Evaluation Template (the prompt field you fill in when creating a rubric), use **double curly braces**, as in `{{variable_name}}`, to mark placeholders. Cascade replaces each placeholder with the actual value for the trace, span, or session being evaluated. The variables available depend on the rubric's scope.

<Tabs>
  <Tab title="Trace-level">
    <ParamField path="input" type="string">
      Initial input to the agent / trace entry
    </ParamField>

    <ParamField path="actual_output" type="string">
      Final output of the trace
    </ParamField>

    <ParamField path="context" type="string">
      Retrieval context (if available)
    </ParamField>

    <ParamField path="trajectory" type="string">
      Full execution trajectory (all LLM & tool calls grouped by agent)
    </ParamField>

    <ParamField path="span_count" type="number">
      Total number of spans in the trace
    </ParamField>

    <ParamField path="tool_calls" type="string">
      All tool calls (name, input, output per call)
    </ParamField>

    <ParamField path="llm_calls" type="string">
      All LLM calls (model, prompt, completion per call)
    </ParamField>

    <ParamField path="duration" type="string">
      Total trace duration
    </ParamField>

    <ParamField path="has_error" type="boolean">
      Whether the trace has errors
    </ParamField>
  </Tab>

  <Tab title="Span-level">
    <ParamField path="input" type="string">
      The input to the span (LLM prompt or tool input)
    </ParamField>

    <ParamField path="actual_output" type="string">
      The output of the span (LLM completion or tool output)
    </ParamField>

    **Tool span variables:**

    <ParamField path="tool_name" type="string">
      Name of the tool
    </ParamField>

    <ParamField path="tool_input" type="string">
      Input passed to the tool
    </ParamField>

    <ParamField path="tool_output" type="string">
      Output returned by the tool
    </ParamField>

    **LLM span variables:**

    <ParamField path="prompt" type="string">
      The LLM prompt
    </ParamField>

    <ParamField path="completion" type="string">
      The LLM completion / response
    </ParamField>

    <ParamField path="model" type="string">
      The LLM model used
    </ParamField>
  </Tab>

  <Tab title="Session-level">
    <ParamField path="trajectory" type="string">
      Combined trajectory across all traces in the session
    </ParamField>

    <ParamField path="trace_count" type="number">
      Number of traces included in the session trajectory
    </ParamField>

    <ParamField path="span_count" type="number">
      Total spans across the session
    </ParamField>

    <ParamField path="tool_calls" type="string">
      All tool calls across traces
    </ParamField>

    <ParamField path="llm_calls" type="string">
      All LLM calls across traces
    </ParamField>

    <ParamField path="duration" type="string">
      Session duration
    </ParamField>

    <ParamField path="has_error" type="boolean">
      Whether any trace in session had errors
    </ParamField>

    **Compatibility variables:**

    <ParamField path="input" type="string">
      First known input in the session
    </ParamField>

    <ParamField path="actual_output" type="string">
      Last known output in the session
    </ParamField>

    <ParamField path="context" type="string">
      Retrieval context fallback
    </ParamField>
  </Tab>
</Tabs>
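To make the placeholder substitution concrete, here is a minimal Python sketch of how a trace-level Evaluation Template could be filled in. The `render_template` helper, the example template text, and the sample values are all hypothetical and written only for illustration. Cascade's actual substitution logic is not documented on this page, and the choice to leave unknown placeholders untouched is an assumption.

```python
import re

# Hypothetical helper: this is NOT Cascade's implementation, only an
# illustration of replacing {{variable_name}} placeholders with values.
PLACEHOLDER = re.compile(r"\{\{(\w+)\}\}")


def render_template(template: str, variables: dict) -> str:
    """Replace each {{name}} placeholder with the matching value."""

    def substitute(match: re.Match) -> str:
        name = match.group(1)
        if name not in variables:
            # Assumption: unknown variables are left as-is so they are easy to spot.
            return match.group(0)
        return str(variables[name])

    return PLACEHOLDER.sub(substitute, template)


# Example Evaluation Template using trace-level variables from the list above.
template = (
    "User question: {{input}}\n"
    "Agent answer: {{actual_output}}\n"
    "The trace contained {{span_count}} spans.\n"
    "Does the answer fully address the question?"
)

# Made-up values standing in for a real trace.
trace_values = {
    "input": "What is the refund policy?",
    "actual_output": "Refunds are available within 30 days.",
    "span_count": 7,
}

rendered = render_template(template, trace_values)
print(rendered)
```

Because the available variables depend on scope, a template written for one scope may reference names that do not exist in another. For example, `tool_name` is listed only for span-level rubrics.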

## Pre-built rubrics

Cascade ships with a library of ready-to-use rubrics covering general and use-case-specific checks, such as helpfulness, hallucination, and tool usage efficiency.

<Steps>
  <Step title="Browse rubrics">
    Go to **Rubrics** in the sidebar and browse the built-in templates. Each rubric comes with a pre-configured rubric prompt, threshold, and output type.
  </Step>

  <Step title="Activate">
    Select the rubrics relevant to your agent and activate them.
  </Step>

  <Step title="Customize">
    Every pre-built rubric can be customized after activation. Adjust the prompt template, change the threshold, switch the model, or modify the scoring criteria to match your domain.
  </Step>
</Steps>

<Frame>
  <img src="https://mintcdn.com/cascade-e69b1028/ehHq1Qh2c0Vu4bCS/images/Rubric%20Page.png?fit=max&auto=format&n=ehHq1Qh2c0Vu4bCS&q=85&s=1de6c66ac3aa1397824139c3e2c52089" alt="Rubrics page showing the list of built-in rubric templates with their descriptions and activate buttons" width="3456" height="1918" data-path="images/Rubric Page.png" />
</Frame>

## Custom rubrics

When pre-built rubrics don't cover your use case, create your own.

From the **Rubrics** page, click **Create Rubric** and choose between:

<CardGroup cols={2}>
  <Card title="Eval Model" icon="brain">
    Write a natural-language prompt that our model uses to evaluate your agent's behavior. Supports binary (pass/fail), scale (0-1), or classification outputs.
  </Card>

  <Card title="Code Rubric" icon="code">
    Write a Python function that programmatically checks agent outputs against your own logic.
    <Badge variant="coming-soon">Coming soon</Badge>
  </Card>
</CardGroup>

Define what to evaluate (the full trace, a specific agent, or individual spans), set your pass/fail threshold, and the rubric is ready to run.
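As a rough illustration of how a pass/fail threshold relates to a scale (0-1) score, the sketch below converts a few scores into pass/fail results and a pass rate. The `passes` function is hypothetical, the scores are made up, and the inclusive comparison (`score >= threshold`) is an assumption rather than documented Cascade behavior.

```python
def passes(score: float, threshold: float) -> bool:
    """Return True when a scale score meets the threshold.

    Assumption: a score equal to the threshold counts as a pass.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError("scale scores are expected in the range 0-1")
    return score >= threshold


# Made-up scores from four executions of the same rubric.
scores = [0.92, 0.40, 0.75, 0.81]
threshold = 0.75

results = [passes(score, threshold) for score in scores]
pass_rate = sum(results) / len(results)

print(results)    # [True, False, True, True]
print(pass_rate)  # 0.75
```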

<Frame>
  <img src="https://mintcdn.com/cascade-e69b1028/ehHq1Qh2c0Vu4bCS/images/scorer_creation_showcase.png?fit=max&auto=format&n=ehHq1Qh2c0Vu4bCS&q=85&s=08b5a9fe33209ad9d7bc77230bb338b7" alt="Create Rubric form showing the evaluation scope, prompt template editor, and threshold configuration" width="1600" height="1762" data-path="images/scorer_creation_showcase.png" />
</Frame>

## Auto evals

Auto Evals analyze your trace data to surface what matters before you define a single rubric. To use them, click the **Generate Rubric** button while viewing a trace.

### How it works

<Steps>
  <Step title="Collect patterns">
    Cascade collects trace data across your agent's executions (tool call patterns, LLM outputs, decision paths, error rates) and identifies recurring behavioral patterns.
  </Step>

  <Step title="Detect critical paths">
    From these patterns, Cascade detects which parts of the execution are most critical to your agent's success and where failures are most likely to occur.
  </Step>

  <Step title="Review and activate">
    The result is a curated set of suggested rubrics tailored to your agent's actual behavior. You review the suggestions and select which ones to activate. No manual prompt engineering required.
  </Step>
</Steps>
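The pattern-collection step above can be pictured with a small Python sketch that groups traces by their tool-call sequence and computes an error rate for each sequence. This is not how Cascade implements Auto Evals. The `summarize_patterns` function and the trace shape (a `tools` list and a `has_error` flag) are assumptions chosen to keep the example self-contained.

```python
from collections import Counter


def summarize_patterns(traces: list) -> dict:
    """Count each tool-call sequence and compute its error rate.

    Assumed trace shape (hypothetical): {"tools": [...], "has_error": bool}.
    """
    sequence_counts = Counter()
    error_counts = Counter()
    for trace in traces:
        sequence = tuple(trace["tools"])
        sequence_counts[sequence] += 1
        if trace["has_error"]:
            error_counts[sequence] += 1
    return {
        sequence: {
            "count": count,
            "error_rate": error_counts[sequence] / count,
        }
        for sequence, count in sequence_counts.items()
    }


# Made-up traces for illustration.
traces = [
    {"tools": ["search", "summarize"], "has_error": False},
    {"tools": ["search", "summarize"], "has_error": True},
    {"tools": ["search", "summarize"], "has_error": False},
    {"tools": ["search"], "has_error": True},
]

summary = summarize_patterns(traces)
for sequence, stats in summary.items():
    print(sequence, stats)
```

A sequence that is both frequent and error-prone is the kind of recurring pattern a suggested rubric might target.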

<Frame>
  <img src="https://mintcdn.com/cascade-e69b1028/ehHq1Qh2c0Vu4bCS/images/auto_rubric_generation.png?fit=max&auto=format&n=ehHq1Qh2c0Vu4bCS&q=85&s=844e7741e0d0d678336507b767ead6e4" alt="Auto Eval failure mode analysis showing suggested rubrics based on trace behavior" width="3456" height="1922" data-path="images/auto_rubric_generation.png" />
</Frame>

## Rubric generation from human comments

The fastest way to create a rubric is to describe what went wrong.

While reviewing a trace, you can leave comments directly on any span: a tool call that returned bad data, an LLM response that missed the point, or an agent that took an unnecessary detour. Use `@handles` to reference specific spans in your comment.

Cascade takes your comment, analyzes the trace context, and automatically generates a rubric that captures the issue. That rubric then runs against future traces, catching the same class of problem before it reaches users.

<Tip>
  **Example:** You notice a trace where the agent calls the weather API three times in a row. You comment: *"The agent should not make redundant tool calls."* Cascade generates a trajectory rubric that flags any trace exhibiting repeated tool calls, and applies it going forward.
</Tip>
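To show the kind of behavior such a rubric targets, here is a minimal Python sketch that flags consecutive identical tool calls. It is a deterministic stand-in for illustration only, not the rubric Cascade generates. The `find_redundant_calls` function, the tool names, and the shape of each call record (`name` and `input` keys) are hypothetical.

```python
def find_redundant_calls(tool_calls: list) -> list:
    """Return indexes of calls that repeat the immediately preceding call.

    Assumed record shape (hypothetical): {"name": str, "input": dict}.
    """
    redundant = []
    for index in range(1, len(tool_calls)):
        previous, current = tool_calls[index - 1], tool_calls[index]
        same_name = previous["name"] == current["name"]
        same_input = previous["input"] == current["input"]
        if same_name and same_input:
            redundant.append(index)
    return redundant


# Made-up trace: the weather tool is called three times in a row.
calls = [
    {"name": "get_weather", "input": {"city": "Paris"}},
    {"name": "get_weather", "input": {"city": "Paris"}},
    {"name": "get_weather", "input": {"city": "Paris"}},
    {"name": "send_reply", "input": {"text": "Sunny"}},
]

flagged = find_redundant_calls(calls)
print(flagged)  # [1, 2]
```

The second and third weather calls are flagged because each repeats the call directly before it with the same input.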

<Frame>
  <img src="https://mintcdn.com/cascade-e69b1028/ehHq1Qh2c0Vu4bCS/images/Comment_create_scorer.png?fit=max&auto=format&n=ehHq1Qh2c0Vu4bCS&q=85&s=cbdf2201f9e5458d9bab55c99792b2e5" alt="Trace detail view showing the Comment and Create Rubric panel with span input, output, and create rubric button" width="1566" height="1810" data-path="images/Comment_create_scorer.png" />
</Frame>
