CLI Reference

v0.2.3

Image Commands

Command	Description
`cb image create <platform>`	Create a base image
`cb image list`	List all images
`cb image info <name>`	Show image details
`cb image clone <src> <dst>`	Clone/fork an image
`cb image delete <name>`	Delete an image
`cb image shell <name>`	Open an interactive shell in an image (golden image protected by default)

Create Options

Option	Description
`--download-iso`	Download Windows 11 ISO (~6GB)
`--iso <path>`	Use existing Windows ISO
`--name <name>`	Image name (default: same as platform)
`--winarena-apps`	Install WinArena benchmark apps (Chrome, LibreOffice, VLC, etc.)
`--memory <size>`	Memory (default: 8G)
`--cpus <count>`	CPU cores (default: 8)
`--disk <size>`	Disk size (default: 64G)
`--detach`, `-d`	Run in background
`--force`	Force recreation
`--skip-pull`	Don't pull Docker image (use local)
`--vnc-port <port>`	VNC port (default: auto-allocate from 8006)
`--api-port <port>`	API port (default: auto-allocate from 5000)
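
To illustrate how the create options combine, the sketch below wraps a Windows image build in a shell function. The default image name `win11-arena` and the `CB` override variable are assumptions made for this example and are not part of the CLI. The platform name `windows-qemu` is taken from the platform types listed under Task Options; pairing it with `--download-iso` is assumed rather than confirmed by this page.

```shell
# Sketch only. CB is a hypothetical override so the command can be previewed
# with CB=echo instead of being executed; it is not a cb feature.
create_windows_image() {
  # $1: image name (hypothetical default: win11-arena)
  "${CB:-cb}" image create windows-qemu \
    --download-iso \
    --winarena-apps \
    --name "${1:-win11-arena}" \
    --memory 8G \
    --cpus 8 \
    --disk 64G \
    --detach
}
```

Calling `CB=echo create_windows_image my-win` prints the arguments that would be passed to `cb` without starting a build.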

Shell Options

Option	Description
`--writable`	Modify golden image directly (dangerous!)
`--vnc-port <port>`	VNC port (default: auto-allocate from 8006)
`--api-port <port>`	API port (default: auto-allocate from 5000)
`--memory <size>`	Memory (default: 8G)
`--cpus <count>`	CPU cores (default: 8)
`--no-kvm`	Disable KVM acceleration
`--detach`, `-d`	Run in background

By default, `cb image shell` mounts the golden image read-only and writes any changes to an overlay, which protects base images from accidental modification. Use `--writable` only when you intend to modify the golden image itself.
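
One cautious way to make persistent changes without touching the original is to clone the image first and open only the clone with `--writable`. The sketch below assumes that workflow. The function name and the `CB` override variable are hypothetical, and placing `--writable` after the image name is an assumption about argument order.

```shell
# Sketch only. CB is a hypothetical override (CB=echo previews the commands).
edit_image_copy() {
  # $1: source image, $2: name for the writable copy
  "${CB:-cb}" image clone "$1" "$2" &&
    "${CB:-cb}" image shell "$2" --writable
}
```

For example, `edit_image_copy windows-qemu windows-qemu-dev` would leave `windows-qemu` untouched and apply any changes to `windows-qemu-dev`.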

Platform Commands

View available platforms.

cb platform list [options]

Option	Description
`--format <fmt>`	Output format: table, json

Status Commands

Show system status.

cb status

Shows Docker status, KVM availability, running shells, and active runs.
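
The status and platform commands can be chained into a quick pre-flight check before starting runs. This is a minimal sketch: the function name and the `CB` override variable are hypothetical, and it assumes `cb status` exits with a non-zero code on failure, which this page does not state.

```shell
# Sketch only. CB is a hypothetical override (CB=echo previews the commands).
preflight() {
  # Stop at the first failing command (assumes conventional exit codes).
  "${CB:-cb}" status &&
    "${CB:-cb}" platform list --format json
}
```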

Run Commands

Run tasks using the two-container architecture: one container for the agent and one for the environment.

cb run <subcommand> [options]

Command	Description
`cb run task <path>`	Run a single task
`cb run dataset <path>`	Run all tasks in a dataset (parallel)
`cb run list`	List all runs with status
`cb run status <id>`	Show detailed run status
`cb run watch <id>`	Watch a run in real-time
`cb run stop <id>`	Stop/cancel a run
`cb run logs <id>`	View combined logs from a run or session

Task Options

By default, `cb run task` starts the task asynchronously and returns immediately with a run ID. Use `--wait` to block until the task completes.

cb run task <path> [options]

Option	Description
`--wait`, `-w`	Wait for task completion (default: run async)
`--debug`	Auto-allocate debug ports (VNC + API)
`--variant-id <n>`	Task variant index (default: 0)
`--agent <name>`	Agent to use (from `.cua/agents.yaml`)
`--agent-import-path <path>`	Import path for custom agent
`--model <model>`	Model to use (e.g., `anthropic/claude-sonnet-4-20250514`)
`--oracle`	Run with oracle solution (no agent)
`--max-steps <n>`	Maximum steps per task (default: 100)
`--image <name>`	Use specific base image
`--platform <type>`	Platform type: `linux-docker`, `windows-qemu`, `android-qemu`
`--vnc-port <port>`	Expose VNC on host port (auto-allocated if not specified)
`--api-port <port>`	Expose API on host port (auto-allocated if not specified)
`--output-dir <dir>`	Output directory (default: `~/.local/share/cua-bench/runs/<run_id>/`)

Dataset Options

cb run dataset <path> [options]

Option	Description
`--max-parallel <n>`	Maximum parallel task runners (default: 4)
`--max-variants <n>`	Maximum variants per task (default: all)
`--task-filter <pattern>`	Filter tasks by name pattern (glob)
`--agent <name>`	Agent to use (from `.cua/agents.yaml`)
`--model <model>`	Model to use
`--oracle`	Run with oracle solution (no agent)
`--max-steps <n>`	Maximum steps per task (default: 100)
`--image <name>`	Use specific base image
`--platform <type>`	Platform type
`--output-dir <dir>`	Output directory for results

Examples

# Run single task (async by default - returns immediately)
cb run task tasks/click_button --agent cua-agent --model anthropic/claude-sonnet-4-20250514

# Wait for task completion (synchronous)
cb run task tasks/click_button --agent cua-agent --wait

# Run with oracle solution
cb run task tasks/click_button --variant-id 0 --oracle --wait

# Auto-allocate debug ports for live viewing
cb run task tasks/click_button --agent cua-agent --debug
# Output shows: Auto-allocated debug ports: VNC=8006, API=5000

# Run Windows task with specific image
cb run task tasks/winarena --image windows-qemu --agent cua-agent

# Run all tasks in a dataset (4 parallel)
cb run dataset datasets/cua-bench-basic --agent cua-agent --max-parallel 4

# Run dataset with filtering
cb run dataset datasets/cua-bench-basic --task-filter "click*" --max-variants 1

# Monitor runs (async tasks)
cb run list                    # List all runs
cb run watch <run_id>          # Watch progress in real-time
cb run status <run_id>         # Check status
cb run logs <run_id>           # View logs
cb run logs <run_id> --tail 100  # View last 100 lines
cb run stop <run_id>           # Stop a run
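
The dataset options above also lend themselves to comparing several models on the same dataset. The sketch below loops over model names and gives each run its own output directory. The function name, the `results/` directory layout, and the `CB` override variable are assumptions for illustration. This page does not state whether `cb run dataset` blocks until completion, so the runs may overlap.

```shell
# Sketch only. CB is a hypothetical override (CB=echo previews the commands).
sweep_models() {
  # $1: dataset path, $2: agent name, remaining arguments: model names
  dataset="$1"
  agent="$2"
  shift 2
  for model in "$@"; do
    # Replace "/" so the model name can be used as a single directory name.
    out_dir="results/$(printf '%s' "$model" | tr '/' '_')"
    "${CB:-cb}" run dataset "$dataset" \
      --agent "$agent" \
      --model "$model" \
      --max-parallel 4 \
      --output-dir "$out_dir"
  done
}
```

A call such as `sweep_models datasets/cua-bench-basic cua-agent anthropic/claude-sonnet-4-20250514` starts one dataset run per model listed.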

Interact Commands

Run a task interactively with a visible browser.

cb interact <task> [options]

Option	Description
`--variant-id <id>`	Task variant index (default: 0)
`--dataset <name>`	Dataset name to resolve task from registry
`--dataset-path <path>`	Path to dataset directory containing multiple tasks
`--oracle`	Run the solution after setup
`--max-steps <n>`	Maximum number of env.step() calls before stopping
`--screenshot <path>`	Save screenshot to file
`--trace-out <path>`	If set, start tracing and save dataset to this path on exit
`--view`	Open a trace viewer when done
`--no-wait`	Skip the interactive prompt (useful for SSH/CI testing)
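
As an example of combining these options, the sketch below runs a task's oracle solution non-interactively and saves a screenshot, which may be useful as a smoke test over SSH or in CI. The function name, the default screenshot filename, and the `CB` override variable are hypothetical.

```shell
# Sketch only. CB is a hypothetical override (CB=echo previews the command).
smoke_test_task() {
  # $1: task path, $2: screenshot path (hypothetical default: screenshot.png)
  "${CB:-cb}" interact "$1" \
    --oracle \
    --screenshot "${2:-screenshot.png}" \
    --no-wait
}
```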

Task Commands

Manage task environments.

cb task <action> [options]

Command	Description
`cb task info <path>`	Show task details (provider, variants)
`cb task list [path]`	List tasks in a directory
`cb task create [path]`	Scaffold a new task environment
`cb task generate "<prompt>"`	Generate a task using Claude

Generate Options

cb task generate "<description>" [options]

Option	Description
`--output <path>`, `-o`	Output directory path. If not provided, auto-generates from prompt.
`--no-interaction`	Skip prompts and run Claude non-interactively
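
A generated task can be checked straight away by inspecting it and running its oracle solution. The sketch below chains those steps. It assumes the generated task is written to the `--output` path in a form that `cb task info` and `cb run task` accept, which this page does not confirm. The function name and the `CB` override variable are hypothetical.

```shell
# Sketch only. CB is a hypothetical override (CB=echo previews the commands).
generate_and_verify() {
  # $1: task description prompt, $2: output directory for the new task
  "${CB:-cb}" task generate "$1" --output "$2" --no-interaction &&
    "${CB:-cb}" task info "$2" &&
    "${CB:-cb}" run task "$2" --oracle --wait
}
```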

Trace Commands

View and manage trace datasets.

cb trace <action> <id>

Command	Description
`cb trace view <id>`	View a single trace in browser (accepts run_id or session_id)
`cb trace grid <id>`	View all traces in a run as a grid (accepts run_id)
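
Since `cb trace grid` accepts a run ID, it pairs naturally with `cb run status` when reviewing a finished run. A minimal sketch, with a hypothetical function name and `CB` override variable:

```shell
# Sketch only. CB is a hypothetical override (CB=echo previews the commands).
review_run() {
  # $1: run ID as reported by `cb run task` or `cb run list`
  "${CB:-cb}" run status "$1" &&
    "${CB:-cb}" trace grid "$1"
}
```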

Dataset Commands

List datasets from the registry and build new datasets from batch outputs.

cb dataset <action> [options]

Command	Description
`cb dataset list`	List available datasets from registry
`cb dataset build <outputs>`	Build a dataset from batch outputs

Build Options

cb dataset build <outputs> [options]

Option	Description
`--save-dir <dir>`	Output directory for datasets
`--mode <mode>`	Processor mode: aguvis-stage-1, gui-r1
`--push-to-hub`	Push to Hugging Face Hub
`--repo-id <id>`	Hugging Face repo ID
`--private`	Create private repo on Hub
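
The build options can be combined to convert batch outputs and publish the result in one step. The sketch below assumes the `aguvis-stage-1` mode and a private Hub repository. The function name, the default save directory, and the `CB` override variable are hypothetical, and any Hugging Face authentication the push requires is outside the scope of this example.

```shell
# Sketch only. CB is a hypothetical override (CB=echo previews the command).
build_and_publish() {
  # $1: batch outputs path, $2: Hugging Face repo ID,
  # $3: save directory (hypothetical default: ./datasets)
  "${CB:-cb}" dataset build "$1" \
    --save-dir "${3:-./datasets}" \
    --mode aguvis-stage-1 \
    --push-to-hub \
    --repo-id "$2" \
    --private
}
```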

Prune Commands

Clean up cua-bench data, images, and Docker resources.

cb prune [options]

When called without options, shows an interactive overview of storage usage. Use flags to clean specific resources.

Option	Description
`--all`, `-a`	Remove everything (images, overlays, runs, docker)
`--images`	Remove only stored images
`--overlays`	Remove only task overlays
`--runs`	Remove only run logs and registry
`--docker`	Remove only docker containers/images
`--dry-run`	Show what would be deleted without deleting
`--force`, `-f`	Skip confirmation prompts

Examples

# Show storage overview
cb prune

# Remove all data with confirmation
cb prune --all

# Remove all data without confirmation
cb prune --all --force

# Preview what would be deleted
cb prune --all --dry-run

# Remove only run logs
cb prune --runs

# Remove docker resources
cb prune --docker
