Base Images

Base images are reproducible environments for running benchmarks. Each image type provides a consistent, isolated desktop or mobile environment. Images are protected by overlays during benchmark runs, ensuring each task starts from a clean state.

Image Types

Type	Command	Description	Requirements
`linux-docker`	`cb image create linux-docker`	Linux GUI container (XFCE)	Docker
`linux-qemu`	`cb image create linux-qemu`	Linux VM with QEMU/KVM	Docker + KVM
`windows-qemu`	`cb image create windows-qemu --download-iso`	Windows 11 VM	Docker + KVM
`android-qemu`	`cb image create android-qemu`	Android VM	Docker + KVM
`macos-lume`	`cb image create macos-lume`	macOS VM via Lume	Apple Silicon

Quick Setup

linux-docker (Fastest)

Container-based Linux GUI. No KVM required, works anywhere Docker runs.

# Create image (~1 minute)
cb image create linux-docker

# Start interactive shell (protected by overlay)
cb image shell linux-docker

# Access via browser at http://localhost:8006
# Press Ctrl+C to stop

windows-qemu (Windows Arena)

Full Windows 11 VM with QEMU/KVM. Required for Windows Arena benchmarks.

# Create image (~1-2 hours, downloads Windows 11 ISO)
cb image create windows-qemu --download-iso

# With WinArena benchmark apps pre-installed (Chrome, LibreOffice, VLC, etc.)
cb image create windows-qemu --download-iso --winarena-apps

# With custom resources
cb image create windows-qemu --download-iso --memory 16G --cpus 8 --disk 128G

# Run in background (detached)
cb image create windows-qemu --download-iso --detach

Requirements

Windows QEMU requires an x86_64 Linux host with KVM support. For cloud VMs, ensure nested virtualization is enabled.
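
A preflight check along these lines can confirm the host before starting a long image build. This is a minimal sketch: `check_kvm_host` is a hypothetical helper, not part of the `cb` CLI, and it only tests the host OS, the CPU architecture, and access to `/dev/kvm`.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not a cb command): report whether this host looks
# able to run the QEMU/KVM images described above.
check_kvm_host() {
  local kvm_dev="${1:-/dev/kvm}"
  if [ "$(uname -s)" != "Linux" ] || [ "$(uname -m)" != "x86_64" ]; then
    echo "unsupported-host"
    return 1
  fi
  # The device must exist and be readable and writable by the current user.
  if [ -r "$kvm_dev" ] && [ -w "$kvm_dev" ]; then
    echo "kvm-ok"
    return 0
  fi
  echo "kvm-missing"
  return 1
}

check_kvm_host || echo "Consider linux-docker, which does not need KVM."
```

A `kvm-missing` result on a cloud VM usually points at nested virtualization being disabled, or at the current user lacking permission on the device.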

macos-lume (Apple Silicon only)

macOS VM using Apple's Virtualization framework via Lume.

# Install Lume first
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/trycua/cua/main/libs/lume/scripts/install.sh)"

# Create macOS image
cb image create macos-lume --version sonoma

Interactive Shells

Use `cb image shell` to interactively explore or debug images. By default, shells run on an overlay that protects the golden image (the stored base image).

Protected Mode (Default)

# Start shell with overlay protection
cb image shell windows-qemu

# Access VNC at http://localhost:8006
# Any changes are discarded when you exit

Writable Mode (Image Authoring)

Use `--writable` when you need to permanently modify an image (e.g., installing apps, configuring settings):

# Clone image first (recommended workflow)
cb image clone windows-qemu windows-custom

# Modify the clone
cb image shell windows-custom --writable
# Install apps, configure settings...
# Changes persist to the image

# Use your custom image in benchmarks
cb run task my-task --image windows-custom

Warning

`--writable` mode directly modifies the golden image. Always clone first to preserve your original base image.

Shell Options

cb image shell <name> [options]

Option	Description
`--writable`	Modify golden image directly (dangerous!)
`--vnc-port <port>`	VNC port (auto-allocated if not specified)
`--api-port <port>`	API port (auto-allocated if not specified)
`--memory <size>`	Memory (default: 8G)
`--cpus <count>`	CPU cores (default: 8)
`--no-kvm`	Disable KVM acceleration
`--detach`, `-d`	Run in background

Auto Port Allocation

When a port is not specified, the CLI allocates one automatically to avoid conflicts. It looks for an available port starting from the default (8006 for VNC, 5000 for API).
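
The scan described above can be approximated in a few lines of shell. This is only a sketch of the idea, not the CLI's actual implementation: `first_free_port` is a hypothetical helper, the 100-port scan window is arbitrary, and the connect test (which relies on bash's `/dev/tcp`) only detects listeners on 127.0.0.1.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the first port at or above $1 with no local listener.
first_free_port() {
  local port="$1"
  local limit=$(( port + 100 ))   # arbitrary scan window for this sketch
  while [ "$port" -lt "$limit" ]; do
    # A successful connect means something is already listening on the port.
    if ! (exec 3<>"/dev/tcp/127.0.0.1/$port") 2>/dev/null; then
      echo "$port"
      return 0
    fi
    port=$(( port + 1 ))
  done
  return 1
}

vnc_port=$(first_free_port 8006)
api_port=$(first_free_port 5000)
# Print the resulting command instead of running it.
echo "cb image shell linux-docker --vnc-port $vnc_port --api-port $api_port"
```

Picking ports up front like this is mainly useful when another tool needs to know them before the shell starts; otherwise leaving both options unset is simpler.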

Image Management

Images are stored in `~/.local/share/cua-bench/images/`. Use the `cb image` commands to manage them:

# List all images
cb image list
cb image list --format json

# Show detailed info
cb image info windows-qemu

# Clone for customization
cb image clone windows-qemu windows-custom

# Delete an image
cb image delete windows-custom

Port Mappings

Internal ports depend on the image type. The QEMU-based images share one pair of ports, while `linux-docker` uses its own:

Image Type	VNC Port (internal)	API Port (internal)
`linux-docker`	6901	8000
`windows-qemu`	8006	5000
`linux-qemu`	8006	5000
`android-qemu`	8006	5000

When starting shells, you can map these to custom external ports:

cb image shell linux-docker --vnc-port 8080 --api-port 9000

Storage Structure

Cua-Bench follows the XDG Base Directory specification:

~/.local/share/cua-bench/           # Data directory
├── images/                         # Base images (golden)
│   ├── linux-docker/
│   ├── windows-qemu/
│   │   ├── windows.boot            # Marker file (setup complete)
│   │   └── data/                   # VM disk files
│   └── ...
├── runs/                           # Task run outputs (default location)
│   └── <run_id>/                   # Each run gets its own directory
│       ├── run.log                 # Execution logs
│       └── ...                     # Results, traces, etc.
├── overlays/                       # Task overlays (ephemeral)
│   ├── shell-windows-qemu/         # Shell overlay
│   └── task-abc123/                # Task run overlay
└── windows.iso                     # Downloaded Windows ISO

~/.local/state/cua-bench/           # State directory
├── images.json                     # Image registry
└── runs.json                       # Run session tracking
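
Scripts that need these locations can derive them the same way. The sketch below assumes cua-bench honors the standard `XDG_DATA_HOME` and `XDG_STATE_HOME` overrides, which this page does not confirm; the helper names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: resolve the directories shown in the tree above.
# Assumption: the XDG_* environment overrides are respected by cua-bench.
cua_data_dir()  { echo "${XDG_DATA_HOME:-$HOME/.local/share}/cua-bench"; }
cua_state_dir() { echo "${XDG_STATE_HOME:-$HOME/.local/state}/cua-bench"; }

# Returns 0 when the windows-qemu setup marker from the tree above exists.
windows_image_ready() {
  [ -f "$(cua_data_dir)/images/windows-qemu/windows.boot" ]
}

if windows_image_ready; then
  echo "windows-qemu: setup complete"
else
  echo "windows-qemu: no marker under $(cua_data_dir)/images"
fi
```

For anything beyond a quick check, `cb image list` and `cb image info` remain the supported way to inspect images.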

Architecture

┌─────────────────────────────────────────────────────────────────────────┐
│                              cua-bench                                   │
├─────────────────────────────────┬───────────────────────────────────────┤
│           LOCAL                 │              CLOUD                     │
│   (Development & Authoring)     │         (Execution & Scale)            │
├─────────────────────────────────┼───────────────────────────────────────┤
│                                 │                                        │
│  cb image create windows-qemu   │  cb run dataset --on cloud             │
│  cb image shell                 │  Parallel execution                    │
│  cb interact                    │  Agent orchestration                   │
│  cb image clone                 │  Result aggregation                    │
│                                 │                                        │
│  ↓                              │  ↑                                     │
│  Create base images             │  Run evals at scale                    │
│  Develop benchmark tasks        │  100s of parallel VMs                  │
│  Debug & iterate                │  Automatic infrastructure              │
│                                 │                                        │
└─────────────────────────────────┴───────────────────────────────────────┘

Local is the authoring environment for images and tasks. Cloud handles scale.

Image Protection

During benchmark runs, golden images are protected by an overlay system:

1. The golden image is mounted read-only at /golden.
2. The worker overlay is mounted writable at /storage.
3. QEMU creates a QCOW2 overlay on top.

This ensures:

- Each task starts from a pristine state.
- Golden images are never accidentally modified.
- Failed tasks don't contaminate subsequent runs.
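
The copy-on-write layering can be tried in isolation with `qemu-img`. The sketch below works in a scratch directory with hypothetical file names; how cua-bench itself names and places disks under /golden and /storage is not shown on this page.

```shell
#!/usr/bin/env bash
# Illustration only: build a throwaway QCOW2 overlay on a read-only base disk.
demo_overlay() {
  local work golden overlay
  work=$(mktemp -d)
  golden="$work/golden.qcow2"     # stand-in for the golden disk
  overlay="$work/overlay.qcow2"   # stand-in for the per-task overlay
  if ! command -v qemu-img >/dev/null 2>&1; then
    echo "skipped: qemu-img not installed"
    rm -rf "$work"
    return 0
  fi
  # Sparse 1G base image, then drop write permission to mimic the read-only mount.
  qemu-img create -q -f qcow2 "$golden" 1G
  chmod a-w "$golden"
  # Overlay: guest writes land here, reads fall through to the backing file.
  qemu-img create -q -f qcow2 -b "$golden" -F qcow2 "$overlay"
  qemu-img info "$overlay" | grep -q "backing file" && echo "overlay created"
  rm -rf "$work"
}

demo_overlay
```

Deleting the overlay file is all it takes to return to the base state, which is why a failed task cannot leak changes into the next one.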

Troubleshooting

KVM not available

# Check if KVM is available
ls -la /dev/kvm

# If not available on cloud VM, enable nested virtualization
# or use linux-docker which doesn't require KVM

Docker not running

# Start Docker
sudo systemctl start docker

# Or on macOS
open -a Docker
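
To tell a stopped daemon apart from a missing installation before choosing a fix, a small check can help. `docker_status` is a hypothetical helper for illustration, not a `cb` subcommand.

```shell
#!/usr/bin/env bash
# Hypothetical helper: classify the local Docker setup.
docker_status() {
  if ! command -v docker >/dev/null 2>&1; then
    echo "not-installed"
  elif docker info >/dev/null 2>&1; then
    echo "running"
  else
    # Daemon stopped, or this user cannot reach the Docker socket.
    echo "not-running"
  fi
}

docker_status
```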

Port already in use

Ports are auto-allocated by default, so conflicts are rare. To pin specific ports:

# Use custom ports
cb image shell linux-docker --vnc-port 9000 --api-port 9001
