Base Images
Setting up reproducible VM environments for benchmarks
Base images are reproducible environments for running benchmarks. Each image type provides a consistent, isolated desktop or mobile environment. Images are protected by overlays during benchmark runs, ensuring each task starts from a clean state.
Image Types
| Type | Command | Description | Requirements |
|---|---|---|---|
| linux-docker | cb image create linux-docker | Linux GUI container (XFCE) | Docker |
| linux-qemu | cb image create linux-qemu | Linux VM with QEMU/KVM | Docker + KVM |
| windows-qemu | cb image create windows-qemu --download-iso | Windows 11 VM | Docker + KVM |
| android-qemu | cb image create android-qemu | Android VM | Docker + KVM |
| macos-lume | cb image create macos-lume | macOS VM via Lume | Apple Silicon |
Quick Setup
linux-docker (Fastest)
Container-based Linux GUI. No KVM required, works anywhere Docker runs.
# Create image (~1 minute)
cb image create linux-docker
# Start interactive shell (protected by overlay)
cb image shell linux-docker
# Access via browser at http://localhost:8006
# Press Ctrl+C to stop
windows-qemu (Windows Arena)
Full Windows 11 VM with QEMU/KVM. Required for Windows Arena benchmarks.
# Create image (~1-2 hours, downloads Windows 11 ISO)
cb image create windows-qemu --download-iso
# With WinArena benchmark apps pre-installed (Chrome, LibreOffice, VLC, etc.)
cb image create windows-qemu --download-iso --winarena-apps
# With custom resources
cb image create windows-qemu --download-iso --memory 16G --cpus 8 --disk 128G
# Run in background (detached)
cb image create windows-qemu --download-iso --detach
Requirements
Windows QEMU requires an x86_64 Linux host with KVM support. For cloud VMs, ensure nested virtualization is enabled.
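The host requirements above can be checked before starting a long image build. The following is a minimal sketch, not part of the cb CLI: the function name `host_supports_windows_qemu` is hypothetical, and it assumes that read and write access to `/dev/kvm` is a reasonable proxy for working KVM support.

```shell
# Preflight sketch for windows-qemu hosts (hypothetical helper, not a cb command).

# Returns 0 when the OS, architecture, and KVM device path look usable.
# Arguments default to the current host, which also makes the function easy to test.
host_supports_windows_qemu() {
  local os="${1:-$(uname -s)}"
  local arch="${2:-$(uname -m)}"
  local kvm_dev="${3:-/dev/kvm}"

  [ "$os" = "Linux" ] || { echo "need a Linux host, found $os" >&2; return 1; }
  [ "$arch" = "x86_64" ] || { echo "need x86_64, found $arch" >&2; return 1; }
  # Assumption: the KVM device must be readable and writable by the current user.
  if [ ! -r "$kvm_dev" ] || [ ! -w "$kvm_dev" ]; then
    echo "$kvm_dev is missing or not accessible" >&2
    return 1
  fi
  return 0
}

if host_supports_windows_qemu; then
  echo "host looks ready for windows-qemu"
else
  echo "host is not ready for windows-qemu"
fi
```

On a cloud VM, a failure on the last check usually points at nested virtualization being disabled.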
macos-lume (Apple Silicon only)
macOS VM using Apple's Virtualization framework via Lume.
# Install Lume first
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/trycua/cua/main/libs/lume/scripts/install.sh)"
# Create macOS image
cb image create macos-lume --version sonoma
Interactive Shells
Use cb image shell to interactively explore or debug images. By default, shells use an overlay to protect the golden image.
Protected Mode (Default)
# Start shell with overlay protection
cb image shell windows-qemu
# Access VNC at http://localhost:8006
# Any changes are discarded when you exit
Writable Mode (Image Authoring)
Use --writable when you need to permanently modify an image (e.g., installing apps, configuring settings):
# Clone image first (recommended workflow)
cb image clone windows-qemu windows-custom
# Modify the clone
cb image shell windows-custom --writable
# Install apps, configure settings...
# Changes persist to the image
# Use your custom image in benchmarks
cb run task my-task --image windows-custom
Warning
--writable mode directly modifies the golden image. Always clone first to preserve your original base image.
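One way to enforce the clone-first workflow is a small wrapper that refuses `--writable` on the stock image names. This is a sketch under assumptions: `is_base_image` and `writable_shell` are hypothetical helpers, and the list of base names is simply the five image types documented on this page.

```shell
# Guard sketch: refuse --writable on the stock base image names.

# Returns 0 if the name is one of the base image types listed on this page.
is_base_image() {
  case "$1" in
    linux-docker|linux-qemu|windows-qemu|android-qemu|macos-lume) return 0 ;;
    *) return 1 ;;
  esac
}

# Hypothetical wrapper: opens a writable shell only for non-base (cloned) images.
writable_shell() {
  local name="$1"
  if is_base_image "$name"; then
    echo "refusing --writable on base image '$name'; clone it first:" >&2
    # Suggest a clone name such as windows-custom for windows-qemu.
    echo "  cb image clone $name ${name%%-*}-custom" >&2
    return 1
  fi
  cb image shell "$name" --writable
}
```

Calling `writable_shell windows-custom` would then run the same command as in the workflow above, while `writable_shell windows-qemu` stops with a hint.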
Shell Options
cb image shell <name> [options]
| Option | Description |
|---|---|
| --writable | Modify golden image directly (dangerous!) |
| --vnc-port <port> | VNC port (auto-allocated if not specified) |
| --api-port <port> | API port (auto-allocated if not specified) |
| --memory <size> | Memory (default: 8G) |
| --cpus <count> | CPU cores (default: 8) |
| --no-kvm | Disable KVM acceleration |
| --detach, -d | Run in background |
Auto Port Allocation
When you do not specify ports, the CLI allocates them automatically to avoid conflicts. It searches for available ports starting from the defaults (8006 for VNC, 5000 for API).
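To illustrate the idea of searching upward from a default port, here is a rough sketch. It is not the CLI's actual implementation: `port_in_use` and `first_free_port` are hypothetical helpers, the probe relies on bash's `/dev/tcp` redirection (so it needs bash), and it only detects listeners on 127.0.0.1.

```shell
# Sketch of "first free port at or above a default" (assumed behavior, not the CLI's code).

# Returns 0 if something accepts TCP connections on 127.0.0.1:<port>.
port_in_use() {
  (exec 3<>"/dev/tcp/127.0.0.1/$1") 2>/dev/null
}

# Prints the first port >= the given start that nothing is listening on.
first_free_port() {
  local port="$1"
  while [ "$port" -le 65535 ]; do
    if ! port_in_use "$port"; then
      echo "$port"
      return 0
    fi
    port=$((port + 1))
  done
  return 1
}

vnc_port=$(first_free_port 8006)
api_port=$(first_free_port 5000)
# Print the equivalent explicit command instead of running it.
echo "cb image shell windows-qemu --vnc-port $vnc_port --api-port $api_port"
```

A probe like this is inherently racy, since another process can take the port between the check and the actual bind.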
Image Management
Images are stored in ~/.local/share/cua-bench/images/. Use the cb image commands to manage them:
# List all images
cb image list
cb image list --format json
# Show detailed info
cb image info windows-qemu
# Clone for customization
cb image clone windows-qemu windows-custom
# Delete an image
cb image delete windows-custom
Port Mappings
Different image types use different internal ports:
| Image Type | VNC Port (internal) | API Port (internal) |
|---|---|---|
| linux-docker | 6901 | 8000 |
| windows-qemu | 8006 | 5000 |
| linux-qemu | 8006 | 5000 |
| android-qemu | 8006 | 5000 |
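For scripts that need these values, the table can be captured as a small lookup. This is only a convenience sketch: `internal_ports` is a hypothetical helper, and it covers just the four image types listed in the table.

```shell
# Prints "<vnc_port> <api_port>" for an image type, per the table above.
internal_ports() {
  case "$1" in
    linux-docker) echo "6901 8000" ;;
    linux-qemu|windows-qemu|android-qemu) echo "8006 5000" ;;
    *) echo "unknown image type: $1" >&2; return 1 ;;
  esac
}

# Split the two values into positional parameters and report them.
set -- $(internal_ports linux-docker)
echo "linux-docker: internal VNC port $1, internal API port $2"
```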
When starting shells, you can map these to custom external ports:
cb image shell linux-docker --vnc-port 8080 --api-port 9000
Storage Structure
Cua-Bench follows the XDG Base Directory specification:
~/.local/share/cua-bench/ # Data directory
├── images/ # Base images (golden)
│ ├── linux-docker/
│ ├── windows-qemu/
│ │ ├── windows.boot # Marker file (setup complete)
│ │ └── data/ # VM disk files
│ └── ...
├── runs/ # Task run outputs (default location)
│ └── <run_id>/ # Each run gets its own directory
│ ├── run.log # Execution logs
│ └── ... # Results, traces, etc.
├── overlays/ # Task overlays (ephemeral)
│ ├── shell-windows-qemu/ # Shell overlay
│ └── task-abc123/ # Task run overlay
└── windows.iso # Downloaded Windows ISO
~/.local/state/cua-bench/ # State directory
├── images.json # Image registry
└── runs.json # Run session tracking
Architecture
┌─────────────────────────────────────────────────────────────────────────┐
│ cua-bench │
├─────────────────────────────────┬───────────────────────────────────────┤
│ LOCAL │ CLOUD │
│ (Development & Authoring) │ (Execution & Scale) │
├─────────────────────────────────┼───────────────────────────────────────┤
│ │ │
│ cb image create windows-qemu │ cb run dataset --on cloud │
│ cb image shell │ Parallel execution │
│ cb interact │ Agent orchestration │
│ cb image clone │ Result aggregation │
│ │ │
│ ↓ │ ↑ │
│ Create base images │ Run evals at scale │
│ Develop benchmark tasks │ 100s of parallel VMs │
│ Debug & iterate │ Automatic infrastructure │
│ │ │
└─────────────────────────────────┴───────────────────────────────────────┘
Local is the authoring environment for images and tasks. Cloud handles scale.
Image Protection
During benchmark runs, golden images are protected by an overlay system:
- Golden image mounted read-only at /golden
- Worker overlay mounted writable at /storage
- QEMU creates a QCOW2 overlay on top
This ensures:
- Each task starts from a pristine state
- Golden images are never accidentally modified
- Failed tasks don't contaminate subsequent runs
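The QCOW2 layer described above corresponds to a standard copy-on-write overlay with a backing file. The sketch below shows how such an overlay can be created with `qemu-img`. It is not necessarily how cua-bench does it: the `create_overlay` helper, the file names under /golden and /storage, and the assumption that the golden disk is itself in QCOW2 format are all illustrative.

```shell
# Sketch of a copy-on-write overlay with qemu-img; paths and file names are hypothetical.

# Creates a QCOW2 overlay whose backing file is the read-only golden disk.
# With DRY_RUN=1 it only prints the command instead of running it.
create_overlay() {
  golden="$1"
  overlay="$2"
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "qemu-img create -f qcow2 -F qcow2 -b $golden $overlay"
  else
    # -b sets the backing file, -F states its format (assumed qcow2 here).
    qemu-img create -f qcow2 -F qcow2 -b "$golden" "$overlay"
  fi
}

DRY_RUN=1
create_overlay /golden/disk.qcow2 /storage/task-abc123.qcow2
```

Writes from the VM land in the overlay file only, so deleting the overlay returns the next task to the pristine golden state.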
Troubleshooting
KVM not available
# Check if KVM is available
ls -la /dev/kvm
# If not available on cloud VM, enable nested virtualization
# or use linux-docker which doesn't require KVM
Docker not running
# Start Docker
sudo systemctl start docker
# Or on macOS
open -a Docker
Port already in use
Ports are now auto-allocated by default, so conflicts are rare. If you need specific ports:
# Use custom ports
cb image shell linux-docker --vnc-port 9000 --api-port 9001