
Data · Benchmarks · Training

100K examples for $5 — then benchmark and train.

HTML simulators across 10+ OS themes, gym-style RL with shaped rewards, deterministic verifiers, and exportable trajectory data.

from cua_bench import GymEnvironment

env = GymEnvironment("messaging-app", provider="simulated")
obs, info = env.reset()

# Generate data across OS themes
for theme in ["macos", "windows", "ubuntu", "android"]:
    env.set_theme(theme)
    # `action` is not defined in this snippet; supply one from your agent or policy
    obs, reward, done, info = env.step(action)

CuaBench 2.0 results

Rank  Model              Score
1     Claude Haiku 4.5   68.4%
2     Claude Sonnet 4.5  59.1%
3     UI-TARS-2          58.0%
4     GPT-5.2            57.8%
5     Gemini CUA         54.2%

Generate data, benchmark, and train

10+ OS themes

HTML simulators render pixel-perfect UIs across macOS, Windows, Ubuntu, Android, and more.

$5 per 100K examples

Record a trajectory once, then replay it across every theme. 10,000x cheaper than manual annotation.
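
To make the pricing claim concrete, here is the arithmetic as a short sketch. The per-example cost of manual annotation is not stated on this page; the figure below is only back-derived from the "10,000x cheaper" claim, so treat it as implied, not quoted.

```python
# Figures taken from the page: $5 buys 100K examples, and the claimed
# saving over manual annotation is 10,000x.
PRICE_USD = 5.0
EXAMPLES = 100_000
CLAIMED_RATIO = 10_000

cost_per_example = PRICE_USD / EXAMPLES                 # USD per generated example
implied_manual_cost = cost_per_example * CLAIMED_RATIO  # back-derived, not a quoted price

print(f"generated: ${cost_per_example:.5f} per example")
print(f"implied manual annotation: ${implied_manual_cost:.2f} per example")
```

In other words, the claim implies roughly fifty cents per manually annotated example against a small fraction of a cent per generated one.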

Gym-style RL

OpenAI Gym interface with shaped rewards — 10x faster than sparse-reward setups.
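
The difference between shaped and sparse rewards can be sketched with a toy environment. Everything below is hypothetical: `ToyClickEnv` is a 1-D stand-in for a UI task and is not part of cua_bench. It only mirrors the classic Gym convention of `reset()` and a four-value `step()` used in the snippet at the top of the page.

```python
class ToyClickEnv:
    """Hypothetical 1-D stand-in for a UI task: move a cursor onto a target."""

    def __init__(self, target=5, start=0, max_steps=20, shaped=True):
        self.target, self.start = target, start
        self.max_steps, self.shaped = max_steps, shaped

    def reset(self):
        self.pos, self.steps = self.start, 0
        return self.pos, {}

    def step(self, action):
        # action is -1 (move left) or +1 (move right)
        before = abs(self.target - self.pos)
        self.pos += action
        self.steps += 1
        after = abs(self.target - self.pos)
        success = after == 0
        done = success or self.steps >= self.max_steps
        if self.shaped:
            # Shaped: reward progress toward the target on every step,
            # plus a bonus when the task is solved.
            reward = float(before - after) + (10.0 if success else 0.0)
        else:
            # Sparse: no signal until the task is solved.
            reward = 1.0 if success else 0.0
        return self.pos, reward, done, {"success": success}


def rollout(env, policy):
    """Run one episode and return the per-step rewards."""
    obs, _ = env.reset()
    rewards, done = [], False
    while not done:
        obs, reward, done, _ = env.step(policy(obs))
        rewards.append(reward)
    return rewards


def always_right(obs):
    return 1


shaped_rewards = rollout(ToyClickEnv(shaped=True), always_right)
sparse_rewards = rollout(ToyClickEnv(shaped=False), always_right)
print(shaped_rewards)  # [1.0, 1.0, 1.0, 1.0, 11.0]
print(sparse_rewards)  # [0.0, 0.0, 0.0, 0.0, 1.0]
```

The shaped variant returns a signal on every step, which is the usual motivation for reward shaping. This toy does not measure or reproduce the 10x figure quoted above.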

Deterministic verifiers

Programmatic reward functions with oracle solutions for reproducible evals.
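
As a minimal sketch of what a deterministic verifier paired with an oracle solution might look like: the state class, action format, and task below are invented for illustration and are not the cua_bench API. The idea is that the reward depends only on the final state, and a known-good action sequence confirms the verifier can actually be satisfied.

```python
from dataclasses import dataclass, field


@dataclass
class MessagingState:
    """Hypothetical final state of a simulated messaging app."""
    sent: list = field(default_factory=list)  # (recipient, text) pairs


def apply(state, action):
    """Apply one action dict to the state (toy simulator, no randomness)."""
    if action["type"] == "send":
        state.sent.append((action["to"], action["text"]))
    return state


def verify(state):
    """Deterministic verifier: 1.0 if exactly the requested message was sent."""
    return 1.0 if state.sent == [("alice", "hello")] else 0.0


# Oracle solution: a known-good action sequence that should score 1.0.
ORACLE = [{"type": "send", "to": "alice", "text": "hello"}]


def run(actions):
    """Replay actions from a fresh state and score the result."""
    state = MessagingState()
    for action in actions:
        apply(state, action)
    return verify(state)


oracle_score = run(ORACLE)
wrong_score = run([{"type": "send", "to": "bob", "text": "hello"}])
print(oracle_score, wrong_score)  # 1.0 0.0
```

Because nothing in `run` depends on randomness or wall-clock time, replaying the same actions always yields the same score, which is what makes an eval reproducible.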

Trajectory export

Export runs as Parquet with screenshots, actions, rewards, and traces.
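
One way to picture the export is one record per step, pivoted into columns before writing. The schema below is an assumption for illustration only; the page lists the kinds of data exported (screenshots, actions, rewards, traces) but not the actual field names, and `StepRecord` and `to_columns` are hypothetical helpers.

```python
from dataclasses import dataclass, asdict


@dataclass
class StepRecord:
    """Hypothetical per-step row; field names are illustrative."""
    episode_id: str
    step: int
    theme: str
    screenshot_png: bytes  # raw image bytes; a path or URI would also work
    action: str            # for example, a JSON-encoded action
    reward: float
    trace: str             # free-form log for this step


def to_columns(records):
    """Pivot row records into a column-name -> list mapping (columnar layout)."""
    rows = [asdict(r) for r in records]
    if not rows:
        return {}
    return {name: [row[name] for row in rows] for name in rows[0]}


PLACEHOLDER_PNG = b"\x89PNG"  # stand-in bytes, not a real image

records = [
    StepRecord("ep-0", 0, "macos", PLACEHOLDER_PNG,
               '{"type": "click", "x": 10, "y": 20}', 0.0, "opened app"),
    StepRecord("ep-0", 1, "macos", PLACEHOLDER_PNG,
               '{"type": "type", "text": "hello"}', 1.0, "message sent"),
]
columns = to_columns(records)
print(columns["reward"])  # [0.0, 1.0]
```

With pandas and a Parquet engine such as pyarrow installed, `pandas.DataFrame(columns).to_parquet("trajectory.parquet")` would write this mapping to disk.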

4 benchmark datasets

cua-bench-basic, cua-bench-real, OSWorld, and WindowsAgentArena.