
Base Images

Setting up reproducible VM environments for benchmarks

Base images are reproducible environments for running benchmarks. Each image type provides a consistent, isolated desktop or mobile environment. Images are protected by overlays during benchmark runs, ensuring each task starts from a clean state.

Image Types

| Type | Command | Description | Requirements |
|---|---|---|---|
| linux-docker | cb image create linux-docker | Linux GUI container (XFCE) | Docker |
| linux-qemu | cb image create linux-qemu | Linux VM with QEMU/KVM | Docker + KVM |
| windows-qemu | cb image create windows-qemu --download-iso | Windows 11 VM | Docker + KVM |
| android-qemu | cb image create android-qemu | Android VM | Docker + KVM |
| macos-lume | cb image create macos-lume | macOS VM via Lume | Apple Silicon |
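To see which rows of this table apply to a given machine, the sketch below turns the Requirements column into shell checks. It is an approximation, not something cb runs: the function name `usable_image_types` is hypothetical, Docker counts as available when `docker info` succeeds, and KVM counts as available when `/dev/kvm` is readable and writable by the current user. The x86_64 condition for windows-qemu comes from the Requirements note in the windows-qemu section.

```bash
#!/usr/bin/env bash
# Sketch: map host capabilities to the image types in the table above.
# Heuristic only; this is not cb's own requirement check.

# usable_image_types OS ARCH HAS_DOCKER HAS_KVM
#   OS/ARCH as printed by `uname -s` / `uname -m`; HAS_* are "yes" or "no".
usable_image_types() {
  local os="$1" arch="$2" has_docker="$3" has_kvm="$4"
  if [ "$has_docker" = yes ]; then
    echo linux-docker                 # Docker only, no KVM needed
    if [ "$has_kvm" = yes ]; then     # QEMU types need Docker + KVM
      echo linux-qemu
      # windows-qemu additionally needs an x86_64 host
      if [ "$arch" = x86_64 ]; then echo windows-qemu; fi
      echo android-qemu
    fi
  fi
  # Lume needs an Apple Silicon Mac
  if [ "$os" = Darwin ] && [ "$arch" = arm64 ]; then
    echo macos-lume
  fi
}

# Probe the current host and print the usable types, one per line
has_docker=no; docker info >/dev/null 2>&1 && has_docker=yes
has_kvm=no; [ -r /dev/kvm ] && [ -w /dev/kvm ] && has_kvm=yes
usable_image_types "$(uname -s)" "$(uname -m)" "$has_docker" "$has_kvm"
```

On an Apple Silicon Mac with Docker Desktop running, this would typically print linux-docker and macos-lume, because the macOS host has no `/dev/kvm`.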

Quick Setup

linux-docker (Fastest)

Container-based Linux GUI. No KVM is required, so it works anywhere Docker runs.

# Create image (~1 minute)
cb image create linux-docker

# Start interactive shell (protected by overlay)
cb image shell linux-docker

# Access via browser at http://localhost:8006
# Press Ctrl+C to stop
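If you would rather not keep the terminal attached, one option is to start the shell with `--detach` (listed under Shell Options) and poll the browser URL until it responds. This sketch assumes curl is installed and that the VNC page is on port 8006; if that port was busy, auto-allocation may have chosen another one, so pass `--vnc-port` to fix it in advance. `wait_for_url` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Sketch: wait until the browser VNC page of a detached shell answers.

# wait_for_url URL ATTEMPTS -> 0 once URL returns an HTTP success status,
# 1 after ATTEMPTS failed polls (roughly one poll per second)
wait_for_url() {
  local url="$1" attempts="$2" n=0
  while [ "$n" -lt "$attempts" ]; do
    if curl -fsS -o /dev/null --max-time 2 "$url" 2>/dev/null; then
      return 0
    fi
    sleep 1
    n=$((n + 1))
  done
  return 1
}

# Usage (requires cb and Docker):
#   cb image shell linux-docker --detach --vnc-port 8006
#   wait_for_url http://localhost:8006 120 && echo "open http://localhost:8006"
```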

windows-qemu (Windows Arena)

Full Windows 11 VM with QEMU/KVM. Required for Windows Arena benchmarks.

# Create image (~1-2 hours, downloads Windows 11 ISO)
cb image create windows-qemu --download-iso

# With WinArena benchmark apps pre-installed (Chrome, LibreOffice, VLC, etc.)
cb image create windows-qemu --download-iso --winarena-apps

# With custom resources
cb image create windows-qemu --download-iso --memory 16G --cpus 8 --disk 128G

# Run in background (detached)
cb image create windows-qemu --download-iso --detach

Requirements

Windows QEMU requires an x86_64 Linux host with KVM support. For cloud VMs, ensure nested virtualization is enabled.
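Because the Windows build can take hours, it can be worth checking these requirements before starting it. This is a generic Linux preflight, not part of cb: it assumes that a `vmx` (Intel VT-x) or `svm` (AMD-V) flag in `/proc/cpuinfo` indicates usable hardware virtualization, which inside a cloud VM generally appears only once nested virtualization is enabled. The function names are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: preflight checks for windows-qemu (x86_64 Linux host with KVM).

# cpu_has_virt CPUINFO_FILE -> 0 if the vmx or svm CPU flag is present
cpu_has_virt() {
  grep -Eqw 'vmx|svm' "$1"
}

windows_qemu_preflight() {
  if [ "$(uname -s)" != Linux ] || [ "$(uname -m)" != x86_64 ]; then
    echo "windows-qemu needs an x86_64 Linux host" >&2; return 1
  fi
  if ! cpu_has_virt /proc/cpuinfo; then
    echo "no vmx/svm flag: enable VT-x/AMD-V, or nested virtualization on a cloud VM" >&2
    return 1
  fi
  if [ ! -e /dev/kvm ]; then
    echo "/dev/kvm is missing: is the kvm module loaded?" >&2; return 1
  fi
  echo "host looks ready for windows-qemu"
}

# Usage:
#   windows_qemu_preflight && cb image create windows-qemu --download-iso --detach
```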

macos-lume (Apple Silicon only)

macOS VM using Apple's Virtualization framework via Lume.

# Install Lume first
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/trycua/cua/main/libs/lume/scripts/install.sh)"

# Create macOS image
cb image create macos-lume --version sonoma

Interactive Shells

Use cb image shell to interactively explore or debug images. By default, shells use an overlay to protect the golden image.

Protected Mode (Default)

# Start shell with overlay protection
cb image shell windows-qemu

# Access VNC at http://localhost:8006
# Any changes are discarded when you exit

Writable Mode (Image Authoring)

Use --writable when you need to permanently modify an image (e.g., installing apps, configuring settings):

# Clone image first (recommended workflow)
cb image clone windows-qemu windows-custom

# Modify the clone
cb image shell windows-custom --writable
# Install apps, configure settings...
# Changes persist to the image

# Use your custom image in benchmarks
cb run task my-task --image windows-custom

Warning

--writable mode directly modifies the golden image. Always clone first to preserve your original base image.
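One way to make the clone-first workflow hard to skip is a small wrapper that refuses to open the base image in `--writable` mode. It only calls the `cb image clone` and `cb image shell --writable` commands shown above; the `author_image` name and the `CB` override (set `CB=echo` to print the commands instead of running them) are illustrative additions, not cb features.

```bash
#!/usr/bin/env bash
# Sketch: enforce "clone first, then --writable" when authoring images.
CB="${CB:-cb}"   # override with CB=echo for a dry run

# author_image BASE CUSTOM -> clones BASE to CUSTOM, then opens CUSTOM writable
author_image() {
  local base="$1" custom="$2"
  if [ -z "$base" ] || [ -z "$custom" ]; then
    echo "usage: author_image BASE CUSTOM" >&2; return 2
  fi
  if [ "$base" = "$custom" ]; then
    echo "refusing to open $base with --writable; clone it first" >&2; return 1
  fi
  # Copy the golden image, then modify only the copy
  $CB image clone "$base" "$custom" || return 1
  $CB image shell "$custom" --writable
}
```

For example, run `author_image windows-qemu windows-custom`, make your changes, exit, and then use `cb run task my-task --image windows-custom`.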

Shell Options

cb image shell <name> [options]
| Option | Description |
|---|---|
| --writable | Modify golden image directly (dangerous!) |
| --vnc-port <port> | VNC port (auto-allocated if not specified) |
| --api-port <port> | API port (auto-allocated if not specified) |
| --memory <size> | Memory (default: 8G) |
| --cpus <count> | CPU cores (default: 8) |
| --no-kvm | Disable KVM acceleration |
| --detach, -d | Run in background |

Auto Port Allocation

When a port is not specified, it is allocated automatically to avoid conflicts: the CLI searches for an available port starting from the default (8006 for VNC, 5000 for API).
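The search can be pictured as the loop below, which probes each port upward from the default until nothing answers on it. This illustrates the idea and is not cb's implementation; it relies on bash's `/dev/tcp` pseudo-device, only detects listeners on 127.0.0.1, and `port_in_use`/`next_free_port` are hypothetical names.

```bash
#!/usr/bin/env bash
# Illustration of upward port probing (not cb's actual code). Run with bash.

# port_in_use PORT -> 0 if something accepts TCP connections on 127.0.0.1:PORT
port_in_use() {
  (exec 3<>"/dev/tcp/127.0.0.1/$1") 2>/dev/null
}

# next_free_port START -> prints the first port >= START with no listener
next_free_port() {
  local port="$1"
  while [ "$port" -le 65535 ]; do
    if ! port_in_use "$port"; then
      echo "$port"; return 0
    fi
    port=$((port + 1))
  done
  return 1
}

echo "VNC: $(next_free_port 8006)  API: $(next_free_port 5000)"
```

A probe like this is inherently racy, since another process can bind the port between the check and its use; passing `--vnc-port`/`--api-port` explicitly avoids relying on it when fixed ports matter.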

Image Management

Images are stored in ~/.local/share/cua-bench/images/. Use the cb image commands to manage them:

# List all images
cb image list
cb image list --format json

# Show detailed info
cb image info windows-qemu

# Clone for customization
cb image clone windows-qemu windows-custom

# Delete an image
cb image delete windows-custom

Port Mappings

Different image types use different internal ports:

| Image Type | VNC Port (internal) | API Port (internal) |
|---|---|---|
| linux-docker | 6901 | 8000 |
| windows-qemu | 8006 | 5000 |
| linux-qemu | 8006 | 5000 |
| android-qemu | 8006 | 5000 |

When starting shells, you can map these to custom external ports:

cb image shell linux-docker --vnc-port 8080 --api-port 9000
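The table can also be read as a lookup from image type to the ports inside the container or VM, while `--vnc-port` and `--api-port` choose the host side of the mapping. The `internal_ports` helper below is a hypothetical encoding of the table for use in scripts; it deliberately fails for macos-lume because the table does not list it.

```bash
#!/usr/bin/env bash
# Sketch: the internal port table as a lookup function.

# internal_ports TYPE -> prints "VNC_PORT API_PORT", or fails for unlisted types
internal_ports() {
  case "$1" in
    linux-docker) echo "6901 8000" ;;
    windows-qemu|linux-qemu|android-qemu) echo "8006 5000" ;;
    *) echo "no internal ports documented for '$1'" >&2; return 1 ;;
  esac
}

# Show where the external ports from the command above are forwarded
read -r vnc api <<EOF
$(internal_ports linux-docker)
EOF
echo "host 8080 -> $vnc (VNC), host 9000 -> $api (API)"
```

For linux-docker this prints `host 8080 -> 6901 (VNC), host 9000 -> 8000 (API)`.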

Storage Structure

Cua-Bench follows the XDG Base Directory specification:

~/.local/share/cua-bench/           # Data directory
├── images/                         # Base images (golden)
│   ├── linux-docker/
│   ├── windows-qemu/
│   │   ├── windows.boot            # Marker file (setup complete)
│   │   └── data/                   # VM disk files
│   └── ...
├── runs/                           # Task run outputs (default location)
│   └── <run_id>/                   # Each run gets its own directory
│       ├── run.log                 # Execution logs
│       └── ...                     # Results, traces, etc.
├── overlays/                       # Task overlays (ephemeral)
│   ├── shell-windows-qemu/         # Shell overlay
│   └── task-abc123/                # Task run overlay
└── windows.iso                     # Downloaded Windows ISO

~/.local/state/cua-bench/           # State directory
├── images.json                     # Image registry
└── runs.json                       # Run session tracking
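Scripts that need these locations can resolve them the XDG way. Beyond what the tree shows, this sketch assumes that cua-bench honors `XDG_DATA_HOME` and `XDG_STATE_HOME` when they are set (the spec's defaults are `~/.local/share` and `~/.local/state`). It treats the `windows.boot` marker as the sign that Windows setup finished, per the tree; the helper names are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: resolve cua-bench directories per the XDG Base Directory spec.
# ASSUMPTION: cua-bench honors XDG_DATA_HOME / XDG_STATE_HOME when set.

cb_data_dir()  { echo "${XDG_DATA_HOME:-$HOME/.local/share}/cua-bench"; }
cb_state_dir() { echo "${XDG_STATE_HOME:-$HOME/.local/state}/cua-bench"; }

# windows_setup_complete NAME -> 0 if the image dir has the windows.boot marker
windows_setup_complete() {
  [ -f "$(cb_data_dir)/images/$1/windows.boot" ]
}

echo "images:   $(cb_data_dir)/images"
echo "runs:     $(cb_data_dir)/runs"
echo "registry: $(cb_state_dir)/images.json"
if windows_setup_complete windows-qemu; then
  echo "windows-qemu: setup complete"
else
  echo "windows-qemu: no windows.boot marker yet"
fi
```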

Architecture

┌─────────────────────────────────────────────────────────────────────────┐
│                              cua-bench                                   │
├─────────────────────────────────┬───────────────────────────────────────┤
│           LOCAL                 │              CLOUD                     │
│   (Development & Authoring)     │         (Execution & Scale)            │
├─────────────────────────────────┼───────────────────────────────────────┤
│                                 │                                        │
│  cb image create windows-qemu   │  cb run dataset --on cloud             │
│  cb image shell                 │  Parallel execution                    │
│  cb interact                    │  Agent orchestration                   │
│  cb image clone                 │  Result aggregation                    │
│                                 │                                        │
│  ↓                              │  ↑                                     │
│  Create base images             │  Run evals at scale                    │
│  Develop benchmark tasks        │  100s of parallel VMs                  │
│  Debug & iterate                │  Automatic infrastructure              │
│                                 │                                        │
└─────────────────────────────────┴───────────────────────────────────────┘

Local is the authoring environment for images and tasks. Cloud handles scale.

Image Protection

During benchmark runs, golden images are protected by an overlay system:

  1. Golden image mounted read-only at /golden
  2. Worker overlay mounted writable at /storage
  3. QEMU creates a QCOW2 overlay on top

This ensures:

  • Each task starts from a pristine state
  • Golden images are never accidentally modified
  • Failed tasks don't contaminate subsequent runs
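Step 3 is the standard QEMU copy-on-write pattern. The sketch reproduces it with plain `qemu-img` to show the mechanism; it is not the exact set of commands cua-bench runs. A task's writes land in the overlay file, the golden disk is only read as its backing file, and deleting the overlay resets the task. `make_overlay` is a hypothetical helper and requires `qemu-img` on the PATH.

```bash
#!/usr/bin/env bash
# Illustration only: copy-on-write layering with plain qemu-img.

# make_overlay GOLDEN OVERLAY -> creates a qcow2 overlay backed by GOLDEN
make_overlay() {
  local golden="$1" overlay="$2"
  # -b names the backing (golden) file and -F its format; the virtual size is
  # inherited from it. Use absolute paths: a relative -b is resolved against
  # the overlay's directory. QEMU opens backing files read-only.
  qemu-img create -f qcow2 -b "$golden" -F qcow2 "$overlay"
}

# Example (requires qemu-img):
#   dir=$(mktemp -d)
#   qemu-img create -f qcow2 "$dir/golden.qcow2" 1G
#   make_overlay "$dir/golden.qcow2" "$dir/task.qcow2"
#   qemu-img info "$dir/task.qcow2"   # lists golden.qcow2 as the backing file
#   rm "$dir/task.qcow2"              # discard the task's writes; golden untouched
```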

Troubleshooting

KVM not available

# Check if KVM is available
ls -la /dev/kvm

# If not available on cloud VM, enable nested virtualization
# or use linux-docker which doesn't require KVM
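When `/dev/kvm` is missing or unusable, the checks below help distinguish the usual causes. They are generic Linux/KVM diagnostics rather than cb features, and `kvm_diagnose` is a hypothetical name. The suggested fixes (joining the `kvm` group, loading `kvm_intel` or `kvm_amd`) assume a typical distribution where `/dev/kvm` belongs to the `kvm` group.

```bash
#!/usr/bin/env bash
# Sketch: narrow down why KVM is not usable on this host.

kvm_diagnose() {
  if [ -e /dev/kvm ]; then
    if [ -r /dev/kvm ] && [ -w /dev/kvm ]; then
      echo "ok: /dev/kvm is accessible"; return 0
    fi
    echo "/dev/kvm exists but $(id -un) cannot open it;" \
         "try: sudo usermod -aG kvm $(id -un), then log in again"
    return 1
  fi
  if ! grep -Eqw 'vmx|svm' /proc/cpuinfo 2>/dev/null; then
    echo "CPU exposes no vmx/svm flag: enable virtualization in firmware," \
         "or nested virtualization on the cloud VM"
  elif [ ! -d /sys/module/kvm ]; then
    echo "kvm module not loaded; try: sudo modprobe kvm_intel (Intel) or sudo modprobe kvm_amd (AMD)"
  else
    echo "kvm is loaded but /dev/kvm is missing; check dmesg for KVM errors"
  fi
  return 1
}

# Usage: kvm_diagnose
```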

Docker not running

# Start Docker
sudo systemctl start docker

# Or on macOS
open -a Docker
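`open -a Docker` returns as soon as Docker Desktop is launched, before the daemon is ready, so a script that runs cb right afterwards can fail. This sketch picks the start command shown above by OS and then polls `docker info` until the daemon answers. `ensure_docker` and `docker_start_cmd` are hypothetical names, and the Linux branch assumes a systemd host.

```bash
#!/usr/bin/env bash
# Sketch: start Docker if needed and wait for the daemon before running cb.

# docker_start_cmd OS -> prints the start command for `uname -s` output OS
docker_start_cmd() {
  case "$1" in
    Linux)  echo "sudo systemctl start docker" ;;
    Darwin) echo "open -a Docker" ;;
    *) echo "unsupported OS: $1" >&2; return 1 ;;
  esac
}

ensure_docker() {
  docker info >/dev/null 2>&1 && return 0
  local cmd tries=0
  cmd=$(docker_start_cmd "$(uname -s)") || return 1
  $cmd || return 1   # unquoted on purpose: split into command and arguments
  # Poll roughly once per second, giving up after 60 attempts
  while ! docker info >/dev/null 2>&1; do
    tries=$((tries + 1))
    [ "$tries" -ge 60 ] && return 1
    sleep 1
  done
}

# Usage: ensure_docker && cb image shell linux-docker
```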

Port already in use

Ports are auto-allocated by default, so conflicts are rare. If you need specific ports, set them explicitly:

# Use custom ports
cb image shell linux-docker --vnc-port 9000 --api-port 9001
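If a specific port you chose is busy, it helps to see what holds it before picking another. This sketch uses `ss` from iproute2 when available and falls back to `lsof` (as on macOS). `who_uses_port` is a hypothetical name, and process details for sockets owned by other users may only appear when run with sudo.

```bash
#!/usr/bin/env bash
# Sketch: show which process is listening on a TCP port.

who_uses_port() {
  local port="$1" out=""
  if command -v ss >/dev/null 2>&1; then
    # -l listening, -t TCP, -n numeric, -p process; drop the header line
    out=$(ss -ltnp "sport = :$port" 2>/dev/null | tail -n +2)
  elif command -v lsof >/dev/null 2>&1; then
    # lsof exits 1 when nothing matches, which is not an error here
    out=$(lsof -nP -iTCP:"$port" -sTCP:LISTEN 2>/dev/null || true)
  fi
  if [ -n "$out" ]; then
    printf '%s\n' "$out"
  else
    echo "no listener found on port $port"
  fi
}

# Usage: who_uses_port 8006
```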
