Generate Tasks with AI

In this tutorial, you'll use AI to generate complete task environments from natural language descriptions. This is the fastest way to create new tasks.

Time: ~10 minutes
Prerequisites: Claude Code OAuth token

What You'll Build

An AI-generated color picker task where the agent must select a specific color. You'll learn how to:

Use cb task generate to create tasks
Review and understand generated code
Iterate on AI-generated tasks
Refine generated tasks manually

How It Works

Running cb task generate kicks off four steps:

The command creates a starter task template
It launches Claude Code in interactive mode
Claude modifies the template based on your description
You review, test, and refine the result

┌──────────────────────────────────────────────────────────────┐
│  cb task generate "Create a color picker..."                 │
│                           │                                  │
│                           ▼                                  │
│  ┌────────────────────────────────────────────────────────┐ │
│  │  Starter Template                                       │ │
│  │  ├── main.py (decorators)                              │ │
│  │  ├── gui/index.html                                    │ │
│  │  └── pyproject.toml                                    │ │
│  └────────────────────────────────────────────────────────┘ │
│                           │                                  │
│                           ▼                                  │
│  ┌────────────────────────────────────────────────────────┐ │
│  │  Claude Code                                            │ │
│  │  Modifies template based on your prompt                │ │
│  └────────────────────────────────────────────────────────┘ │
│                           │                                  │
│                           ▼                                  │
│  ┌────────────────────────────────────────────────────────┐ │
│  │  Generated Task                                         │ │
│  │  ├── main.py (customized)                              │ │
│  │  ├── gui/index.html (customized)                       │ │
│  │  └── pyproject.toml (metadata)                         │ │
│  └────────────────────────────────────────────────────────┘ │
└──────────────────────────────────────────────────────────────┘

Step 1: Set Up Authentication

Get your Claude Code OAuth token and set it as an environment variable:

export CLAUDE_CODE_OAUTH_TOKEN=your-token-here

Getting a Token

Contact your administrator or visit the Claude Code documentation to obtain an OAuth token.
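
Before running the generator, it can help to confirm that the variable is actually visible to child processes. The following is a minimal sketch using only the standard library; token_is_set is a hypothetical helper name, and the check only tests that the variable is non-blank, not that the token is valid.

```python
import os
from typing import Mapping

TOKEN_VAR = "CLAUDE_CODE_OAUTH_TOKEN"


def token_is_set(env: Mapping[str, str] = os.environ) -> bool:
    """Return True when the token variable exists and is not blank."""
    return bool(env.get(TOKEN_VAR, "").strip())


if __name__ == "__main__":
    # Reports presence only; it never prints the token itself.
    print("token found" if token_is_set() else f"{TOKEN_VAR} is not set")
```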

Step 2: Generate a Task

Run the generate command with a description:

cb task generate "Create a color picker task where the user must select the color red from a palette of 6 colors"

Claude Code will start an interactive session. You'll see it:

Read the starter template
Modify main.py with task logic
Customize gui/index.html with the color picker UI
Update pyproject.toml with metadata

Step 3: Review Generated Files

After generation, examine the created files:

ls tasks/color-picker-env/
# main.py  gui/  pyproject.toml

Generated main.py

The generated main.py should look similar to this:

import cua_bench as cb
from pathlib import Path

pid = None

@cb.tasks_config(split="train")
def load():
    return [
        cb.Task(
            description="Select the red color from the color palette.",
            metadata={"target_color": "red"},
            computer={
                "provider": "simulated",
                "setup_config": {
                    "os_type": "macos",
                    "width": 512,
                    "height": 512,
                }
            }
        )
    ]


@cb.setup_task(split="train")
async def start(task_cfg: cb.Task, session: cb.DesktopSession):
    global pid
    html_content = (Path(__file__).parent / "gui/index.html").read_text()
    pid = await session.launch_window(
        html=html_content,
        title="Color Picker",
        width=400,
        height=300,
    )


@cb.evaluate_task(split="train")
async def evaluate(task_cfg: cb.Task, session: cb.DesktopSession) -> list[float]:
    global pid
    if pid is None:
        return [0.0]

    selected = await session.execute_javascript(pid, "window.__selectedColor")
    target = task_cfg.metadata.get("target_color", "red")

    return [1.0] if selected == target else [0.0]


@cb.solve_task(split="train")
async def solve(task_cfg: cb.Task, session: cb.DesktopSession):
    global pid
    if pid is None:
        return

    await session.click_element(pid, '[data-color="red"]')
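
The pass/fail comparison inside evaluate can be pulled out into a pure function, which makes it testable without a desktop session. The sketch below is one way to do that; score_selection is a hypothetical helper name and is not part of cua_bench.

```python
from typing import Optional


def score_selection(selected: Optional[str], target: str) -> list[float]:
    """Return [1.0] when the selected color matches the target, else [0.0].

    `selected` is None when nothing has been clicked yet, mirroring the
    initial value of window.__selectedColor in the generated page.
    """
    return [1.0] if selected == target else [0.0]


if __name__ == "__main__":
    print(score_selection("red", "red"))
    print(score_selection(None, "red"))
```

Inside evaluate, the last line would then become a call to this helper with the value read from the page and the target from task_cfg.metadata.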

Generated gui/index.html

<!DOCTYPE html>
<html>
<head>
    <style>
        body {
            font-family: system-ui;
            display: flex;
            flex-direction: column;
            align-items: center;
            padding: 20px;
        }

        h2 { margin-bottom: 20px; }

        .palette {
            display: grid;
            grid-template-columns: repeat(3, 1fr);
            gap: 10px;
        }

        .color-box {
            width: 80px;
            height: 80px;
            border: 3px solid #ccc;
            border-radius: 8px;
            cursor: pointer;
        }

        .color-box:hover { transform: scale(1.05); }
        .color-box.selected { border-color: #000; }
    </style>
</head>
<body>
    <h2>Select the red color</h2>
    <div class="palette">
        <div class="color-box" data-color="red" style="background: #e74c3c;"></div>
        <div class="color-box" data-color="blue" style="background: #3498db;"></div>
        <div class="color-box" data-color="green" style="background: #2ecc71;"></div>
        <div class="color-box" data-color="yellow" style="background: #f1c40f;"></div>
        <div class="color-box" data-color="purple" style="background: #9b59b6;"></div>
        <div class="color-box" data-color="orange" style="background: #e67e22;"></div>
    </div>

    <script>
        window.__selectedColor = null;

        document.querySelectorAll('.color-box').forEach(box => {
            box.addEventListener('click', function() {
                document.querySelectorAll('.color-box').forEach(b =>
                    b.classList.remove('selected')
                );
                this.classList.add('selected');
                window.__selectedColor = this.dataset.color;
            });
        });
    </script>
</body>
</html>
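
When reviewing generated HTML, one quick check is that every color the task can target actually appears in the palette. The following is a minimal sketch using only the standard library; palette_colors and ColorBoxCollector are hypothetical names, and the sample markup is trimmed from the page above.

```python
from html.parser import HTMLParser


class ColorBoxCollector(HTMLParser):
    """Collects the data-color attribute of every element that has one."""

    def __init__(self) -> None:
        super().__init__()
        self.colors: list[str] = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "data-color" and value:
                self.colors.append(value)


def palette_colors(html: str) -> list[str]:
    """Return the data-color values in document order."""
    collector = ColorBoxCollector()
    collector.feed(html)
    return collector.colors


SAMPLE = """
<div class="palette">
    <div class="color-box" data-color="red"></div>
    <div class="color-box" data-color="blue"></div>
</div>
"""

if __name__ == "__main__":
    print(palette_colors(SAMPLE))
```

To check a real task, pass the contents of gui/index.html instead of SAMPLE and confirm that the target color from the task metadata is in the returned list.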

Step 4: Test the Generated Task

Run Interactively

cb interact tasks/color-picker-env --variant-id 0

Try clicking different colors and see how the UI responds.

Run with Oracle Solution

cb interact tasks/color-picker-env --oracle --no-wait

Note: For headless servers, use xvfb-run:
xvfb-run -a cb interact tasks/color-picker-env --oracle --no-wait
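
If you script these runs from Python, the headless wrapper can be applied conditionally. The sketch below only builds the argument list from the commands shown above; oracle_command is a hypothetical helper name.

```python
def oracle_command(task_dir: str, headless: bool = False) -> list[str]:
    """Build the argv for an oracle run, optionally wrapped in xvfb-run."""
    command = ["cb", "interact", task_dir, "--oracle", "--no-wait"]
    if headless:
        # Prefix with the same xvfb-run wrapper used in the note above.
        command = ["xvfb-run", "-a", *command]
    return command


if __name__ == "__main__":
    print(" ".join(oracle_command("tasks/color-picker-env", headless=True)))
```

Pass the returned list to subprocess.run(..., check=True) to execute it.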

Run with Agent

cb interact tasks/color-picker-env --variant-id 0
# Agent will be prompted interactively

Step 5: Iterate on the Task

If the generated task isn't quite right, you can:

Option A: Regenerate with Better Prompt

cb task generate "Create a color picker with 9 colors in a 3x3 grid. The target color should be randomly selected from the palette. Show the target color name prominently above the grid."

Option B: Continue in Claude Code

Run the generate command again and ask Claude to modify the existing task:

cb task generate "Modify the existing color picker to add a timer that gives bonus points for fast selections"

Option C: Manual Refinement

Edit the generated files directly. For example, to add more task variants:

# Add more task variants
@cb.tasks_config(split="train")
def load():
    colors = ["red", "blue", "green", "yellow", "purple", "orange"]

    return [
        cb.Task(
            description=f"Select the {color} color from the palette.",
            metadata={"target_color": color},
            computer={...}
        )
        for color in colors
    ]
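
The variant-building pattern in load can be tried out without cua_bench installed. The sketch below uses FakeTask, a hypothetical stand-in for cb.Task that carries only the two fields used here; it is not the real class, and it omits the computer configuration entirely.

```python
from dataclasses import dataclass, field


@dataclass
class FakeTask:
    """Hypothetical stand-in for cb.Task, for illustration only."""

    description: str
    metadata: dict = field(default_factory=dict)


def build_variants(colors: list[str]) -> list[FakeTask]:
    """Create one task variant per color, as the load function above does."""
    return [
        FakeTask(
            description=f"Select the {color} color from the palette.",
            metadata={"target_color": color},
        )
        for color in colors
    ]


if __name__ == "__main__":
    for index, task in enumerate(build_variants(["red", "blue", "green"])):
        print(index, task.description)
```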

Step 6: Generate More Complex Tasks

Multi-Step Task

cb task generate "Create a todo list app where the user must:
1. Add a task called 'Buy groceries'
2. Mark it as complete
3. Delete the task
Track which steps are completed for partial rewards."
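
One plausible way for the generated evaluate function to score this prompt is to return the fraction of steps completed. The sketch below assumes the page exposes a flag per step; the step names and partial_reward are hypothetical, and Claude may choose a different scheme.

```python
STEPS = ("added", "completed", "deleted")  # hypothetical step names


def partial_reward(done: dict[str, bool]) -> list[float]:
    """Return the fraction of tracked steps that are marked complete."""
    finished = sum(1 for step in STEPS if done.get(step, False))
    return [finished / len(STEPS)]


if __name__ == "__main__":
    print(partial_reward({"added": True, "completed": True, "deleted": False}))
```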

Form-Based Task

cb task generate "Create a registration form with fields for name, email, and password.
The agent must fill in the form with valid data and submit it.
Validate that email contains @ and password is at least 8 characters."
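
The validation rules in this prompt translate into a short check that is handy when reviewing the generated evaluation code. This is a sketch under stated assumptions: is_valid_submission is a hypothetical helper, and the requirement that the name be non-empty is an addition the prompt does not spell out.

```python
def is_valid_submission(name: str, email: str, password: str) -> bool:
    """Apply the prompt's rules: email contains '@', password has 8+ chars."""
    has_name = bool(name.strip())  # assumption: a blank name is rejected
    return has_name and "@" in email and len(password) >= 8


if __name__ == "__main__":
    print(is_valid_submission("Ada", "ada@example.com", "correcthorse"))
```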

Game Task

cb task generate "Create a simple memory card matching game with 8 cards (4 pairs).
The agent must match all pairs. Give partial reward based on number of pairs found."

Tips for Good Prompts

Be Specific

# Good
cb task generate "Create a dropdown menu with 5 options (Apple, Banana, Cherry, Date, Elderberry). The agent must select 'Cherry'."

# Too vague
cb task generate "Create a dropdown task"

Include Success Criteria

# Good
cb task generate "Create a slider that goes from 0 to 100. The agent must set it to exactly 75. Give reward of 1.0 for exact match, 0.5 for within 5 points."

# Missing criteria
cb task generate "Create a slider task"
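
Stating the criteria this precisely means the expected scoring can be written down and compared against whatever Claude generates. The sketch below encodes the tiers from the good prompt; slider_reward is a hypothetical name, and treating "within 5 points" as inclusive is an assumption.

```python
def slider_reward(value: int, target: int = 75, tolerance: int = 5) -> list[float]:
    """Return 1.0 for an exact match, 0.5 within the tolerance, else 0.0."""
    if value == target:
        return [1.0]
    if abs(value - target) <= tolerance:  # inclusive bound (assumption)
        return [0.5]
    return [0.0]


if __name__ == "__main__":
    for value in (75, 72, 60):
        print(value, slider_reward(value))
```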

Specify Visual Requirements

# Good
cb task generate "Create a dark-themed calculator with large buttons (at least 50x50 pixels) and clear labels."

# No visual guidance
cb task generate "Create a calculator"

What Claude Can and Can't Do

Claude CAN:

Create HTML/CSS/JavaScript UIs
Implement game logic
Write evaluation functions
Create multiple task variants
Add partial reward calculations

Claude CANNOT:

Access external APIs at generation time
Create native application tasks (use templates for those)
Generate images (use placeholder colors/text)
Test the task (you need to do this)

Complete Workflow Example

# 1. Generate initial task
cb task generate "Create a simple quiz with 3 multiple choice questions about Python"

# 2. Test interactively
cb interact tasks/python-quiz-env --variant-id 0

# 3. Verify oracle works
cb interact tasks/python-quiz-env --oracle --no-wait
# On headless servers: xvfb-run -a cb interact tasks/python-quiz-env --oracle --no-wait

# 4. Test with agent
cb interact tasks/python-quiz-env --variant-id 0

# 5. Manually refine if needed
# Edit tasks/python-quiz-env/main.py

# 6. Verify changes still work
cb interact tasks/python-quiz-env --oracle --no-wait

Next Steps

Learn about Task Structure in depth
Explore Simulated Desktop for visual variants
See Creating Private Tasks for advanced patterns
