Agent Traces

Recording and viewing agent execution traces

When you run tasks with agents, cua-bench automatically records detailed traces of agent execution. These traces capture screenshots, actions, reasoning, and metadata, all of which are essential for debugging, evaluation, and dataset creation.

Trace Output Locations

After running a task with an agent, traces are saved to ~/.local/share/cua-bench/runs/<run_id>/<task>_v<variant>/:

~/.local/share/cua-bench/runs/96d41b51/slack_env_v0/
├── run.log                     # Execution logs
├── task_0_agent_logs/          # Cua Agent SDK trajectories (if using cua-agent)
│   └── trajectories/
│       └── 2026-01-07_claudesonnet4_210819_2a23/
│           ├── metadata.json   # Agent turns, model responses, usage
│           ├── turn_000/       # Screenshots and data per turn
│           ├── turn_001/
│           └── ...
└── task_0_trace/               # cua-bench trace dataset
    ├── data-00000-of-00001.arrow  # HuggingFace Dataset format
    ├── dataset_info.json
    └── state.json
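
The layout above can be walked with the standard library. The following is a minimal sketch, assuming trace directories follow the task_<index>_trace naming shown in the example tree; find_trace_dirs is a hypothetical helper written for illustration, not part of cua-bench.

```python
from pathlib import Path


def find_trace_dirs(run_dir):
    """Return every task_*_trace directory under a run, sorted by path.

    Assumes the <run_id>/<task>_v<variant>/task_<n>_trace layout shown above.
    """
    run_dir = Path(run_dir).expanduser()
    return sorted(p for p in run_dir.glob("*/task_*_trace") if p.is_dir())


if __name__ == "__main__":
    # Run ID taken from the example tree; replace it with your own.
    runs_root = Path("~/.local/share/cua-bench/runs").expanduser()
    for trace_dir in find_trace_dirs(runs_root / "96d41b51"):
        print(trace_dir)
```

Each returned path can be passed to load_from_disk. If the naming convention differs in your version, only the glob pattern needs to change.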

cua-bench Trace Format

The cua-bench trace format uses Hugging Face Datasets with Apache Arrow for efficient storage and loading. Each trace is a dataset in which every row is one recorded event.

Dataset Schema

{
  "event_name": str,          # Event type: "reset", "agent_step", "agent_action", etc.
  "data_json": str,           # JSON-encoded event metadata
  "data_images": [Image],     # List of screenshots (PIL Images)
  "trajectory_id": str,       # Unique trajectory identifier
  "timestamp": str            # ISO 8601 timestamp
}

Event Types

| Event          | Description              | Images                  | Metadata                         |
|----------------|--------------------------|-------------------------|----------------------------------|
| reset          | Task setup complete      | Initial screenshot      | Task config, index               |
| agent_step     | Agent reasoning step     | Screenshot              | Step number, model usage, output |
| agent_action   | Agent performed action   | Screenshot after action | Action type, arguments           |
| agent_thinking | Model thinking/response  | Screenshot              | Thinking text (truncated)        |
| solve          | Oracle solution complete | Final screenshot        | Completion status                |
| evaluate       | Task evaluation          | -                       | Evaluation result                |
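
Because every row carries an event_name and a JSON-encoded payload, a trace can be summarized or filtered by event type with plain Python. This is a small sketch, assuming only the schema fields listed above; count_events and decode_events are hypothetical helper names, and the keys inside data_json vary by event, so the code does not assume any of them.

```python
import json
from collections import Counter


def count_events(events):
    """Count how many times each event_name occurs in a trace."""
    return Counter(event["event_name"] for event in events)


def decode_events(events, event_name):
    """Return the decoded data_json payloads for one event type, in order."""
    return [
        json.loads(event["data_json"])
        for event in events
        if event["event_name"] == event_name
    ]
```

Both functions accept any iterable of rows, so they work on a loaded trace dataset as well as on a list of dictionaries in a test.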

Loading Traces Programmatically

import json
import os

from datasets import load_from_disk

# Load a trace dataset (expand "~" explicitly instead of relying on the loader)
trace_path = os.path.expanduser(
    "~/.local/share/cua-bench/runs/96d41b51/slack_env_v0/task_0_trace"
)
trace = load_from_disk(trace_path)

# Iterate through events
for event in trace:
    print(f"Event: {event['event_name']}")
    print(f"Time: {event['timestamp']}")

    # Parse the JSON-encoded metadata
    data = json.loads(event['data_json'])
    print(f"Data: {data}")

    # Access screenshots (each call opens an external image viewer)
    for img in event['data_images']:
        img.show()  # PIL Image
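
For traces with many steps, writing the screenshots to disk is often more practical than opening a viewer window for each one. The sketch below assumes rows with the schema described earlier and images that expose a PIL-style save(path) method; export_screenshots and its file naming scheme are hypothetical choices made for illustration.

```python
from pathlib import Path


def export_screenshots(events, out_dir):
    """Save every screenshot as <row index>_<event_name>_<image index>.png.

    `events` is any iterable of rows with "event_name" and "data_images";
    rows with no images (for example "evaluate") are skipped.
    """
    out_dir = Path(out_dir).expanduser()
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for index, event in enumerate(events):
        for n, img in enumerate(event["data_images"] or []):
            path = out_dir / f"{index:04d}_{event['event_name']}_{n}.png"
            img.save(path)  # PIL infers the PNG format from the extension
            written.append(path)
    return written
```

Calling export_screenshots(trace, "screenshots") on a loaded trace returns the list of files it wrote, in event order.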

Viewing Traces

Use the cua-bench trace viewer to visualize traces in your browser:

View Single Trace

# View a specific task trace
cb trace view <run_id>/<task>_v<variant>

# Example
cb trace view 96d41b51/slack_env_v0

Opens an interactive viewer showing:

  • Timeline of events
  • Screenshots at each step
  • Action details and metadata
  • Agent reasoning (if available)

View Agent Trajectories from a Run

# Open agent trajectories in the cua.ai trajectory viewer
cb trace traj <run_id>

# Example
cb trace traj 96d41b51

This command zips and serves the cua-agent trajectory recordings from the run artifacts folder, then opens cua.ai/trajectory-viewer. For runs with multiple tasks, it shows an index page so you can open each session individually.

Recording Custom Events

When building custom agents, you can record your own events to the trace:

from cua_bench.agents.base import BaseAgent, AgentResult

class MyAgent(BaseAgent):
    async def perform_task(
        self,
        task_description: str,
        session,
        logging_dir=None,
        tracer=None,  # Tracer is passed automatically
    ) -> AgentResult:
        # Record custom event
        if tracer:
            screenshot = await session.screenshot()
            tracer.record(
                "my_custom_event",
                {
                    "step": 1,
                    "reasoning": "Analyzing the screen...",
                    "confidence": 0.95,
                },
                [screenshot]
            )

        # Your agent logic...
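
To exercise an agent's tracing calls without running a full benchmark, one option is to pass in a stand-in object. The class below is a hypothetical test double, not the cua-bench tracer: it only mirrors the record(event_name, data, images) call shape used above and stores rows shaped like the dataset schema. How the real tracer serializes events internally is an assumption here.

```python
import json
from datetime import datetime, timezone


class InMemoryTracer:
    """Hypothetical stand-in for tests; NOT the cua-bench tracer."""

    def __init__(self, trajectory_id="test-trajectory"):
        self.trajectory_id = trajectory_id
        self.rows = []

    def record(self, event_name, data, images=None):
        # Mirror the documented schema: metadata is stored as a JSON string.
        self.rows.append({
            "event_name": event_name,
            "data_json": json.dumps(data),
            "data_images": list(images or []),
            "trajectory_id": self.trajectory_id,
            "timestamp": datetime.now(timezone.utc).isoformat(),
        })
```

Passing InMemoryTracer() as the tracer argument in a unit test lets you assert on tracer.rows after perform_task returns.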
