
Create a Universal Task

Build a cross-platform Minesweeper game that works on QEMU, simulated desktop, and Docker

In this tutorial, you'll create a Minesweeper game task using the Universal GUI API. The task is universal: the same code runs in QEMU, simulated desktop, and Docker environments without modification.

Time: ~15 minutes
Prerequisites: cua-bench installed

What You'll Build

A fully playable Minesweeper game where agents must click cells to reveal them and flag mines. You'll learn:

  • Using launch_window() to create GUI apps
  • Communicating with your GUI via JavaScript
  • Creating task variants with different difficulty levels
  • Evaluating game state for rewards

Step 1: Create Task Directory

mkdir -p tasks/minesweeper/gui
cd tasks/minesweeper

Step 2: Create the Task Configuration

Create main.py:

import cua_bench as cb
from pathlib import Path

pid = None

@cb.tasks_config(split="train")
def load():
    """Generate Minesweeper variants with different difficulty levels."""
    game_configs = [
        (8, 8, 10),   # Easy
        (10, 10, 15), # Medium
        (12, 12, 20), # Hard
    ]

    return [
        cb.Task(
            description=f'Play Minesweeper on a {rows}x{cols} grid with {mines} mines. Click cells to reveal them, right-click to flag mines. Win by revealing all non-mine cells.',
            metadata={
                "rows": rows,
                "cols": cols,
                "mines": mines,
            },
            computer={
                "provider": "simulated",  # Use simulated desktop for preview
                "setup_config": {
                    "os_type": "win11",
                    "width": 800,
                    "height": 800,
                }
            }
        )
        for rows, cols, mines in game_configs
    ]

@cb.setup_task(split="train")
async def start(task_cfg: cb.Task, session: cb.DesktopSession):
    """Launch the Minesweeper GUI window."""
    global pid

    # Get game parameters from task config
    rows = task_cfg.metadata["rows"]
    cols = task_cfg.metadata["cols"]
    mines = task_cfg.metadata["mines"]

    # Calculate window size based on grid size
    window_width = cols * 30 + 100
    window_height = rows * 30 + 200

    # Launch window with the game GUI
    pid = await session.launch_window(
        html=(Path(__file__).parent / "gui/index.html").read_text('utf-8'),
        title="Minesweeper",
        width=window_width,
        height=window_height,
    )

    # Initialize the game with the task configuration
    if pid is not None:
        await session.execute_javascript(pid, f"window.initGame({rows}, {cols}, {mines})")

@cb.evaluate_task(split="train")
async def evaluate(task_cfg: cb.Task, session: cb.DesktopSession) -> list[float]:
    """Check if the agent won or provide partial credit."""
    global pid

    if pid is None:
        return [0.0]

    # Check game state via JavaScript
    game_won = await session.execute_javascript(pid, "window.__gameWon")
    game_lost = await session.execute_javascript(pid, "window.__gameLost")

    # Win: 1.0, Loss: 0.0
    if game_won is True:
        return [1.0]
    elif game_lost is True:
        return [0.0]
    else:
        # Partial credit based on revealed cells
        rows = task_cfg.metadata["rows"]
        cols = task_cfg.metadata["cols"]
        mines = task_cfg.metadata["mines"]

        revealed_count = await session.execute_javascript(pid, "window.game.revealedCount")
        total_non_mines = rows * cols - mines

        if revealed_count and total_non_mines > 0:
            return [revealed_count / total_non_mines]
        return [0.0]

@cb.solve_task(split="train")
async def solve(task_cfg: cb.Task, session: cb.DesktopSession):
    """Demonstrate a simple solution strategy."""
    global pid
    import asyncio

    if pid is None:
        return

    rows = task_cfg.metadata["rows"]
    cols = task_cfg.metadata["cols"]

    # Simple strategy: click corner first, then scan left-to-right
    await session.click_element(pid, '[data-row="0"][data-col="0"]')
    await asyncio.sleep(0.5)

    # Continue clicking unrevealed cells
    for r in range(rows):
        for c in range(cols):
            game_state = await session.execute_javascript(pid, "window.__gameState")
            if game_state != "playing":
                break

            is_revealed = await session.execute_javascript(
                pid, f"window.game.revealed[{r}][{c}]"
            )
            is_flagged = await session.execute_javascript(
                pid, f"window.game.flagged[{r}][{c}]"
            )

            if not is_revealed and not is_flagged:
                await session.click_element(pid, f'[data-row="{r}"][data-col="{c}"]')
                await asyncio.sleep(0.2)

        if game_state != "playing":
            break

Step 3: Create the HTML GUI

Create gui/index.html with the Minesweeper game:

<div class="p-4 flex flex-col items-center">
    <div class="mb-4 text-center">
        <h1 class="text-2xl font-bold mb-2">Minesweeper</h1>
        <div class="flex gap-4 justify-center items-center mb-2">
            <div class="text-lg font-mono">🚩 <span id="flag-count">0</span></div>
            <button id="reset-btn" class="px-4 py-2 bg-gray-200 hover:bg-gray-300 rounded">😊 New Game</button>
            <div class="text-lg font-mono">⏱️ <span id="timer">0</span></div>
        </div>
        <div id="status" class="text-sm font-semibold"></div>
    </div>
    <div id="game-board" class="inline-block border-4 border-gray-400 bg-gray-300"></div>

    <style>
        .cell {
            width: 30px;
            height: 30px;
            border: 2px outset #999;
            background: #c0c0c0;
            display: inline-flex;
            align-items: center;
            justify-content: center;
            font-weight: bold;
            font-size: 16px;
            cursor: pointer;
            user-select: none;
        }
        .cell.revealed {
            border: 1px solid #999;
            background: #e0e0e0;
            cursor: default;
        }
        .cell.mine { background: #ff6b6b; }
        .cell.num-1 { color: blue; }
        .cell.num-2 { color: green; }
        .cell.num-3 { color: red; }
    </style>

    <script>
        class Minesweeper {
            constructor(rows, cols, mines) {
                this.rows = rows;
                this.cols = cols;
                this.totalMines = mines;
                this.board = [];
                this.revealed = [];
                this.flagged = [];
                this.revealedCount = 0;
                this.firstClickDone = false;  // mines are placed on the first reveal
                this.initBoard();
                this.render();
                window.__gameState = 'playing';
                window.__gameWon = false;
                window.__gameLost = false;
            }

            initBoard() {
                for (let r = 0; r < this.rows; r++) {
                    this.board[r] = [];
                    this.revealed[r] = [];
                    this.flagged[r] = [];
                    for (let c = 0; c < this.cols; c++) {
                        this.board[r][c] = 0;
                        this.revealed[r][c] = false;
                        this.flagged[r][c] = false;
                    }
                }
            }

            placeMines(excludeRow, excludeCol) {
                let placed = 0;
                while (placed < this.totalMines) {
                    const r = Math.floor(Math.random() * this.rows);
                    const c = Math.floor(Math.random() * this.cols);
                    if ((r === excludeRow && c === excludeCol) || this.board[r][c] === -1) {
                        continue;
                    }
                    this.board[r][c] = -1;
                    placed++;
                }

                // Calculate numbers
                for (let r = 0; r < this.rows; r++) {
                    for (let c = 0; c < this.cols; c++) {
                        if (this.board[r][c] !== -1) {
                            this.board[r][c] = this.countAdjacentMines(r, c);
                        }
                    }
                }
            }

            countAdjacentMines(row, col) {
                let count = 0;
                for (let dr = -1; dr <= 1; dr++) {
                    for (let dc = -1; dc <= 1; dc++) {
                        if (dr === 0 && dc === 0) continue;
                        const r = row + dr, c = col + dc;
                        if (r >= 0 && r < this.rows && c >= 0 && c < this.cols && this.board[r][c] === -1) {
                            count++;
                        }
                    }
                }
                return count;
            }

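            reveal(row, col) {
                // Ignore clicks once the game is over, and on revealed or flagged cells
                if (window.__gameState !== 'playing') return;
                if (this.revealed[row][col] || this.flagged[row][col]) return;

                // Place mines on the first click so that click is never a mine
                if (!this.firstClickDone) {
                    this.placeMines(row, col);
                    this.firstClickDone = true;
                }

                this.revealed[row][col] = true;

                if (this.board[row][col] === -1) {
                    this.lose();
                    return;
                }

                // Count only safe cells so revealedCount matches the win condition
                this.revealedCount++;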

                if (this.board[row][col] === 0) {
                    for (let dr = -1; dr <= 1; dr++) {
                        for (let dc = -1; dc <= 1; dc++) {
                            const r = row + dr, c = col + dc;
                            if (r >= 0 && r < this.rows && c >= 0 && c < this.cols && !this.revealed[r][c]) {
                                this.reveal(r, c);
                            }
                        }
                    }
                }

                this.checkWin();
                this.render();
            }

            checkWin() {
                if (this.revealedCount === this.rows * this.cols - this.totalMines) {
                    window.__gameState = 'won';
                    window.__gameWon = true;
                    document.getElementById('status').textContent = '🎉 You Won!';
                }
            }

            lose() {
                window.__gameState = 'lost';
                window.__gameLost = true;
                document.getElementById('status').textContent = '💥 Game Over!';
                this.render();  // redraw so the mine that was hit is shown
            }

            render() {
                const board = document.getElementById('game-board');
                board.innerHTML = '';
                board.style.display = 'grid';
                board.style.gridTemplateColumns = `repeat(${this.cols}, 30px)`;

                for (let r = 0; r < this.rows; r++) {
                    for (let c = 0; c < this.cols; c++) {
                        const cell = document.createElement('div');
                        cell.className = 'cell';
                        cell.dataset.row = r;
                        cell.dataset.col = c;

                        if (this.revealed[r][c]) {
                            cell.classList.add('revealed');
                            if (this.board[r][c] === -1) {
                                cell.textContent = '💣';
                                cell.classList.add('mine');
                            } else if (this.board[r][c] > 0) {
                                cell.textContent = this.board[r][c];
                                cell.classList.add(`num-${this.board[r][c]}`);
                            }
                        } else if (this.flagged[r][c]) {
                            cell.textContent = '🚩';
                        }

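                        cell.addEventListener('click', () => this.reveal(r, c));
                        // Right-click toggles a flag, as the task description states
                        cell.addEventListener('contextmenu', (e) => {
                            e.preventDefault();
                            if (this.revealed[r][c] || window.__gameState !== 'playing') return;
                            this.flagged[r][c] = !this.flagged[r][c];
                            document.getElementById('flag-count').textContent =
                                this.flagged.flat().filter(Boolean).length;
                            this.render();
                        });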
                        board.appendChild(cell);
                    }
                }
            }
        }

        window.initGame = function(rows, cols, mines) {
            window.game = new Minesweeper(rows, cols, mines);
        };
    </script>
</div>
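
The mine placement and neighbor counting in the script above can be mirrored in Python, for example to sanity-check board invariants offline. This is only a sketch: the function names are hypothetical, and it draws mine positions with random.sample instead of the retry loop used in placeMines().

```python
import random


def count_adjacent_mines(board, row, col):
    """Count mines (-1) in the up-to-eight cells around (row, col)."""
    rows, cols = len(board), len(board[0])
    count = 0
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if dr == 0 and dc == 0:
                continue
            r, c = row + dr, col + dc
            if 0 <= r < rows and 0 <= c < cols and board[r][c] == -1:
                count += 1
    return count


def place_mines(rows, cols, mines, exclude, rng=None):
    """Build a board with `mines` mines, never on the `exclude` cell."""
    rng = rng or random.Random()
    candidates = [
        (r, c) for r in range(rows) for c in range(cols) if (r, c) != exclude
    ]
    if mines > len(candidates):
        raise ValueError("too many mines for this grid")

    board = [[0] * cols for _ in range(rows)]
    for r, c in rng.sample(candidates, mines):
        board[r][c] = -1

    # Fill in the neighbor counts for every safe cell
    for r in range(rows):
        for c in range(cols):
            if board[r][c] != -1:
                board[r][c] = count_adjacent_mines(board, r, c)
    return board
```

Unlike the JavaScript retry loop, which would never terminate if asked for more mines than there are free cells, this version raises a ValueError.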

Step 4: Run the Task

Preview Interactively

cb interact tasks/minesweeper --variant-id 0

A browser window will open showing the Minesweeper game in a simulated desktop environment.

Run with Oracle

cb interact tasks/minesweeper --oracle --variant-id 0

Run with Agent

export ANTHROPIC_API_KEY=sk-...
cb run task tasks/minesweeper --agent cua-agent --model anthropic/claude-sonnet-4-20250514

Key Concepts

Window Management: launch_window() creates GUI apps in your tasks

JavaScript Communication: Use execute_javascript() to read/write game state

Task Variants: One codebase generates multiple difficulty levels

Partial Rewards: Reward based on progress, not just win/loss
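
When adding more variants, it can help to check each (rows, cols, mines) tuple before it reaches the GUI. The sketch below reuses the window-size formula from start() and the 800x800 screen from setup_config; check_variant and window_size are hypothetical helper names, not cua-bench functions.

```python
SCREEN_WIDTH, SCREEN_HEIGHT = 800, 800  # matches setup_config in main.py
CELL_PX = 30  # matches the .cell size in gui/index.html


def window_size(rows, cols):
    """Same formula as start(): grid size plus padding for the header."""
    return cols * CELL_PX + 100, rows * CELL_PX + 200


def check_variant(rows, cols, mines):
    """Return a list of problems with a variant; empty means it looks fine."""
    problems = []
    # placeMines() skips the first-clicked cell, so one cell must stay free
    if mines >= rows * cols:
        problems.append("mines must leave at least one safe cell")
    width, height = window_size(rows, cols)
    if width > SCREEN_WIDTH or height > SCREEN_HEIGHT:
        problems.append(
            f"window {width}x{height} exceeds the "
            f"{SCREEN_WIDTH}x{SCREEN_HEIGHT} screen"
        )
    return problems
```

For the three variants in load(), check_variant returns an empty list; the largest one (12x12) needs a 460x560 window.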

Why Universal?

This task works across all platforms:

  • Simulated Desktop: Set provider: "simulated" for lightweight browser preview
  • Docker: Use real Linux desktop environments
  • QEMU: Run on actual Windows/Linux VMs

The same HTML GUI and task code runs everywhere.
