Generate Tasks with AI
Use Claude to create tasks from natural language descriptions
In this tutorial, you'll use AI to generate complete task environments from natural language descriptions. This is the fastest way to create new tasks.
Time: ~10 minutes
Prerequisites: Claude Code OAuth token
What You'll Build
An AI-generated color picker task where the agent must select a specific color. You'll learn:
- Using cb task generate to create tasks
- Reviewing and understanding generated code
- Iterating on AI-generated tasks
- Manual refinement techniques
How It Works
The cb task generate command:
- Creates a starter task template
- Launches Claude Code in interactive mode
- Claude modifies the template based on your description
- You review, test, and refine the result
┌──────────────────────────────────────────────────────────────┐
│ cb task generate "Create a color picker..." │
│ │ │
│ ▼ │
│ ┌────────────────────────────────────────────────────────┐ │
│ │ Starter Template │ │
│ │ ├── main.py (decorators) │ │
│ │ ├── gui/index.html │ │
│ │ └── pyproject.toml │ │
│ └────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌────────────────────────────────────────────────────────┐ │
│ │ Claude Code │ │
│ │ Modifies template based on your prompt │ │
│ └────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌────────────────────────────────────────────────────────┐ │
│ │ Generated Task │ │
│ │ ├── main.py (customized) │ │
│ │ ├── gui/index.html (customized) │ │
│ │ └── pyproject.toml (metadata) │ │
│ └────────────────────────────────────────────────────────┘ │
└──────────────────────────────────────────────────────────────┘
Step 1: Set Up Authentication
Get your Claude Code OAuth token and set it as an environment variable:
export CLAUDE_CODE_OAUTH_TOKEN=your-token-here
Getting a Token
Contact your administrator or visit the Claude Code documentation to obtain an OAuth token.
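Before moving on, it can help to confirm that the variable is actually set in the environment you will run cb from. The following is a minimal Python sketch, not part of the cb tooling; the helper name has_oauth_token is hypothetical, and it only checks that the variable is set and non-empty, not that the token is valid.

```python
import os

TOKEN_VAR = "CLAUDE_CODE_OAUTH_TOKEN"  # variable name from the export command above


def has_oauth_token(env=None):
    """Return True if the token variable is set to a non-empty value.

    Hypothetical helper: it does not contact any service or validate the token.
    """
    env = os.environ if env is None else env
    return bool(env.get(TOKEN_VAR, "").strip())


if __name__ == "__main__":
    if has_oauth_token():
        print(f"{TOKEN_VAR} is set")
    else:
        print(f"{TOKEN_VAR} is missing; export it before running cb task generate")
```

Run it from the same shell session in which you ran the export command, since the variable is not visible to other sessions.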
Step 2: Generate a Task
Run the generate command with a description:
cb task generate "Create a color picker task where the user must select the color red from a palette of 6 colors"
Claude Code will start an interactive session. You'll see it:
- Read the starter template
- Modify main.py with task logic
- Create gui/index.html with the color picker UI
- Update pyproject.toml with metadata
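Once the session ends, a quick check that the three files from the list above exist can catch an incomplete generation early. This is a sketch under assumptions: missing_task_files is a hypothetical helper, and tasks/color-picker-env is the output directory this tutorial uses, so adjust the path if your task landed elsewhere.

```python
from pathlib import Path

# Files the tutorial says a generated task contains.
EXPECTED_FILES = ("main.py", "gui/index.html", "pyproject.toml")


def missing_task_files(task_dir):
    """Return the expected files that are absent from task_dir (hypothetical helper)."""
    root = Path(task_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    # Directory name taken from this tutorial; adjust it for your own task.
    missing = missing_task_files("tasks/color-picker-env")
    print("all files present" if not missing else f"missing: {missing}")
```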
Step 3: Review Generated Files
After generation, examine the created files:
ls tasks/color-picker-env/
# main.py gui/ pyproject.toml
Generated main.py
The generated main.py should look something like:
import cua_bench as cb
from pathlib import Path

pid = None


@cb.tasks_config(split="train")
def load():
    return [
        cb.Task(
            description="Select the red color from the color palette.",
            metadata={"target_color": "red"},
            computer={
                "provider": "simulated",
                "setup_config": {
                    "os_type": "macos",
                    "width": 512,
                    "height": 512,
                }
            }
        )
    ]


@cb.setup_task(split="train")
async def start(task_cfg: cb.Task, session: cb.DesktopSession):
    global pid
    html_content = (Path(__file__).parent / "gui/index.html").read_text()
    pid = await session.launch_window(
        html=html_content,
        title="Color Picker",
        width=400,
        height=300,
    )


@cb.evaluate_task(split="train")
async def evaluate(task_cfg: cb.Task, session: cb.DesktopSession) -> list[float]:
    global pid
    if pid is None:
        return [0.0]
    selected = await session.execute_javascript(pid, "window.__selectedColor")
    target = task_cfg.metadata.get("target_color", "red")
    return [1.0] if selected == target else [0.0]


@cb.solve_task(split="train")
async def solve(task_cfg: cb.Task, session: cb.DesktopSession):
    global pid
    if pid is None:
        return
    await session.click_element(pid, '[data-color="red"]')
Generated gui/index.html
<!DOCTYPE html>
<html>
<head>
<style>
body {
font-family: system-ui;
display: flex;
flex-direction: column;
align-items: center;
padding: 20px;
}
h2 { margin-bottom: 20px; }
.palette {
display: grid;
grid-template-columns: repeat(3, 1fr);
gap: 10px;
}
.color-box {
width: 80px;
height: 80px;
border: 3px solid #ccc;
border-radius: 8px;
cursor: pointer;
}
.color-box:hover { transform: scale(1.05); }
.color-box.selected { border-color: #000; }
</style>
</head>
<body>
<h2>Select the red color</h2>
<div class="palette">
<div class="color-box" data-color="red" style="background: #e74c3c;"></div>
<div class="color-box" data-color="blue" style="background: #3498db;"></div>
<div class="color-box" data-color="green" style="background: #2ecc71;"></div>
<div class="color-box" data-color="yellow" style="background: #f1c40f;"></div>
<div class="color-box" data-color="purple" style="background: #9b59b6;"></div>
<div class="color-box" data-color="orange" style="background: #e67e22;"></div>
</div>
<script>
window.__selectedColor = null;
document.querySelectorAll('.color-box').forEach(box => {
box.addEventListener('click', function() {
document.querySelectorAll('.color-box').forEach(b =>
b.classList.remove('selected')
);
this.classList.add('selected');
window.__selectedColor = this.dataset.color;
});
});
</script>
</body>
</html>
Step 4: Test the Generated Task
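Before launching anything, the scoring rule can be checked on its own. The sketch below mirrors the comparison in the generated evaluate function as a plain Python function with no cua_bench dependency; score_selection is a hypothetical name used only for illustration, and it assumes that the JavaScript null in window.__selectedColor reaches Python as None.

```python
def score_selection(selected, target="red"):
    """Mirror the generated evaluate() comparison: [1.0] on an exact match, else [0.0]."""
    return [1.0] if selected == target else [0.0]


# Nothing clicked yet: assumed to arrive as None.
print(score_selection(None))    # [0.0]
# Wrong color clicked.
print(score_selection("blue"))  # [0.0]
# Target color clicked.
print(score_selection("red"))   # [1.0]
```

The comparison is exact and case-sensitive, so the data-color values in gui/index.html need to match the target_color value in the task metadata.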
Run Interactively
cb interact tasks/color-picker-env --variant-id 0
Try clicking different colors and see how the UI responds.
Run with Oracle Solution
cb interact tasks/color-picker-env --oracle --no-wait
Note: For headless servers, use xvfb-run:
xvfb-run -a cb interact tasks/color-picker-env --oracle --no-wait
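If the same script has to run on both desktops and headless servers, the choice between the two commands above can be made programmatically. The sketch below is one possible approach, not part of cb: oracle_command and looks_headless are hypothetical helpers, and treating a missing DISPLAY variable as "headless" is an assumption that holds for typical X11 setups only.

```python
import os


def oracle_command(task_dir, headless):
    """Build the oracle run command from this tutorial as an argument list."""
    cmd = ["cb", "interact", task_dir, "--oracle", "--no-wait"]
    # Prefix with xvfb-run -a when no display is available.
    return ["xvfb-run", "-a", *cmd] if headless else cmd


def looks_headless(env=None):
    """Heuristic (assumption): a missing or empty DISPLAY variable means headless."""
    env = os.environ if env is None else env
    return not env.get("DISPLAY")


if __name__ == "__main__":
    # Print the command instead of running it, so it can be inspected first.
    print(" ".join(oracle_command("tasks/color-picker-env", looks_headless())))
```

The argument list can be passed to subprocess.run as-is once the printed command looks right.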
Run with Agent
cb interact tasks/color-picker-env --variant-id 0
# Agent will be prompted interactively
Step 5: Iterate on the Task
If the generated task isn't quite right, you can:
Option A: Regenerate with Better Prompt
cb task generate "Create a color picker with 9 colors in a 3x3 grid. The target color should be randomly selected from the palette. Show the target color name prominently above the grid."
Option B: Continue in Claude Code
Run the generate command again and ask Claude to modify:
cb task generate "Modify the existing color picker to add a timer that gives bonus points for fast selections"
Option C: Manual Refinement
Edit the generated files directly:
# Add more task variants
@cb.tasks_config(split="train")
def load():
    colors = ["red", "blue", "green", "yellow", "purple", "orange"]
    return [
        cb.Task(
            description=f"Select the {color} color from the palette.",
            metadata={"target_color": color},
            computer={...}
        )
        for color in colors
    ]
Step 6: Generate More Complex Tasks
Multi-Step Task
cb task generate "Create a todo list app where the user must:
1. Add a task called 'Buy groceries'
2. Mark it as complete
3. Delete the task
Track which steps are completed for partial rewards."
Form-Based Task
cb task generate "Create a registration form with fields for name, email, and password.
The agent must fill in the form with valid data and submit it.
Validate that email contains @ and password is at least 8 characters."
Game Task
cb task generate "Create a simple memory card matching game with 8 cards (4 pairs).
The agent must match all pairs. Give partial reward based on number of pairs found."
Tips for Good Prompts
Be Specific
# Good
cb task generate "Create a dropdown menu with 5 options (Apple, Banana, Cherry, Date, Elderberry). The agent must select 'Cherry'."
# Too vague
cb task generate "Create a dropdown task"
Include Success Criteria
# Good
cb task generate "Create a slider that goes from 0 to 100. The agent must set it to exactly 75. Give reward of 1.0 for exact match, 0.5 for within 5 points."
# Missing criteria
cb task generate "Create a slider task"
Specify Visual Requirements
# Good
cb task generate "Create a dark-themed calculator with large buttons (at least 50x50 pixels) and clear labels."
# No visual guidance
cb task generate "Create a calculator"
What Claude Can and Can't Do
Claude CAN:
- Create HTML/CSS/JavaScript UIs
- Implement game logic
- Write evaluation functions
- Create multiple task variants
- Add partial reward calculations
Claude CANNOT:
- Access external APIs at generation time
- Create native application tasks (use templates for those)
- Generate images (use placeholder colors/text)
- Test the task (you need to do this)
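Since testing is left to you, it helps to know what a partial reward calculation might look like when you review generated evaluation code. The sketch below shows two plain-Python scoring rules based on prompts from this tutorial: the slider prompt (1.0 for an exact match, 0.5 within 5 points) and the todo prompt (credit per completed step). The function names and the step names are hypothetical, "within 5 points" is assumed to be inclusive, and the code Claude generates may be structured differently.

```python
# Hypothetical step names for the todo list prompt (add, mark complete, delete).
TODO_STEPS = ("add", "complete", "delete")


def slider_reward(value, target=75, tolerance=5):
    """Return 1.0 for an exact match, 0.5 within `tolerance` points, else 0.0."""
    if value == target:
        return 1.0
    if abs(value - target) <= tolerance:  # inclusive bound (assumption)
        return 0.5
    return 0.0


def step_reward(completed, required=TODO_STEPS):
    """Return the fraction of required steps found in `completed`."""
    done = sum(1 for step in required if step in completed)
    return done / len(required)


print(slider_reward(75))   # 1.0
print(slider_reward(72))   # 0.5
print(slider_reward(90))   # 0.0
print(step_reward({"add", "complete", "delete"}))  # 1.0
```

The evaluate function in the generated main.py returns list[float], so a value computed this way would be wrapped in a list before being returned.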
Complete Workflow Example
# 1. Generate initial task
cb task generate "Create a simple quiz with 3 multiple choice questions about Python"
# 2. Test interactively
cb interact tasks/python-quiz-env --variant-id 0
# 3. Verify oracle works
cb interact tasks/python-quiz-env --oracle --no-wait
# On headless servers: xvfb-run -a cb interact tasks/python-quiz-env --oracle --no-wait
# 4. Test with agent
cb interact tasks/python-quiz-env --variant-id 0
# 5. Manually refine if needed
# Edit tasks/python-quiz-env/main.py
# 6. Verify changes still work
cb interact tasks/python-quiz-env --oracle --no-wait
Next Steps
- Learn about Task Structure in depth
- Explore Simulated Desktop for visual variants
- See Creating Private Tasks for advanced patterns