Create a Native Task
Build a file-based question-answering task using native OS commands and file manipulation
In this tutorial, you'll create a task that uses native OS capabilities to set up files and evaluate completion. The task uses the native provider, so it runs in real Docker/QEMU environments with full shell access.
Time: ~10 minutes
Prerequisites: cua-bench installed, Docker running
What You'll Build
A question-answering task where agents must:
- Read a question from a text file on the Desktop
- Write their answer to another text file
- Get evaluated based on file contents
You'll learn:
- Using run_command() for OS-level operations
- Setting up files and directories
- Evaluating tasks by reading files
- Working with native Linux/Windows environments
When to Use Native Tasks
Use native tasks when:
- You need to test real OS interactions (file system, registry, processes)
- The task requires installing or using real applications
- You need shell access for complex setup
- You're building WinArena-style tasks
Use Universal GUI instead when:
- You're building custom interfaces or games
- You want cross-platform compatibility
- You don't need real OS features
- You want faster, lightweight environments
Step 1: Create Task Directory
mkdir -p tasks/question_answer
cd tasks/question_answer
Step 2: Create the Task Configuration
Create main.py:
import cua_bench as cb
from pathlib import Path


@cb.tasks_config(split="train")
def load():
    """Define task variants with different questions."""
    qa_pairs = [
        {
            "question": "What is the capital of France?",
            "answer": "Paris"
        },
        {
            "question": "What is 2 + 2?",
            "answer": "4"
        },
        {
            "question": "What color is the sky on a clear day?",
            "answer": "blue"
        },
        {
            "question": "What is the largest planet in our solar system?",
            "answer": "Jupiter"
        }
    ]
    return [
        cb.Task(
            description=f'Open the question.txt file on the Desktop, read the question, and write your answer in answer.txt on the Desktop. Question: {qa["question"]}',
            metadata={
                "question": qa["question"],
                "answer": qa["answer"]
            },
            computer={
                "provider": "native",  # Requires Docker/QEMU
                "setup_config": {
                    "os_type": "linux",  # or "windows"
                    "width": 1024,
                    "height": 768
                }
            }
        )
        for qa in qa_pairs
    ]


@cb.setup_task(split="train")
async def start(task_cfg: cb.Task, session: cb.DesktopSession):
    """Initialize the task environment with files."""
    # Get the question from metadata
    question = task_cfg.metadata["question"]
    # Create Desktop directory if it doesn't exist
    await session.run_command("mkdir -p ~/Desktop")
    # Write the question to a text file on the Desktop
    await session.run_command(f'echo "{question}" > ~/Desktop/question.txt')
    # Ensure answer.txt doesn't exist yet
    await session.run_command("rm -f ~/Desktop/answer.txt")


@cb.evaluate_task(split="train")
async def evaluate(task_cfg: cb.Task, session: cb.DesktopSession) -> list[float]:
    """Return reward based on task completion."""
    expected_answer = task_cfg.metadata["answer"].strip().lower()
    # Check if answer.txt exists and read it
    result = await session.run_command("cat ~/Desktop/answer.txt 2>/dev/null || echo ''")
    # Extract the actual answer from the command result
    user_answer = result["stdout"].strip().lower() if result else ""
    # Check if the answer matches (case-insensitive, stripped)
    if user_answer == expected_answer:
        return [1.0]
    else:
        return [0.0]


@cb.solve_task(split="train")
async def solve(task_cfg: cb.Task, session: cb.DesktopSession):
    """Demonstrate the solution."""
    answer = task_cfg.metadata["answer"]
    # Open the question file to read it (demonstration)
    await session.run_command("cat ~/Desktop/question.txt")
    # Write the answer to answer.txt
    await session.run_command(f'echo "{answer}" > ~/Desktop/answer.txt')

Step 3: Understanding run_command()
The run_command() method executes shell commands and returns a dict:
result = await session.run_command("cat ~/Desktop/answer.txt")
# Result structure:
{
"stdout": str, # Command output
"stderr": str, # Error output
"return_code": int, # Exit code (0 = success)
"success": bool # True if return_code == 0
}
# Access the output:
output = result["stdout"]
errors = result["stderr"]
exit_code = result["return_code"]
succeeded = result["success"]
Common Patterns
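Quote interpolated text (Linux):
The setup step in main.py interpolates the question directly into an echo command, which would break if a question ever contained quotes or a dollar sign. The following is a minimal sketch of one way to avoid that, assuming the environment runs a POSIX shell. write_text_command is a hypothetical helper, not part of cua-bench.

```python
import shlex


def write_text_command(path: str, text: str) -> str:
    """Build a POSIX shell command that writes `text` plus a newline to `path`.

    Hypothetical helper (not a cua-bench API). Only the text is quoted;
    the path is left as-is so that a leading ~ still expands.
    """
    # printf '%s\n' prints its argument verbatim, unlike echo, whose
    # handling of backslashes and leading dashes varies between shells.
    return f"printf '%s\\n' {shlex.quote(text)} > {path}"


# Example: a question containing an apostrophe survives quoting intact.
command = write_text_command("~/Desktop/question.txt", "What's 2 + 2?")
```

In the setup function this could be used as await session.run_command(write_text_command("~/Desktop/question.txt", question)). Because the path is not quoted, it should come from your own code rather than from untrusted input.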
Create directories:
await session.run_command("mkdir -p ~/Desktop/MyFolder")
Write files:
await session.run_command('echo "content" > ~/Desktop/file.txt')
Read files:
result = await session.run_command("cat ~/Desktop/file.txt")
content = result["stdout"]
Check if file exists:
result = await session.run_command("test -f ~/Desktop/file.txt && echo 'yes' || echo 'no'")
exists = result["stdout"].strip() == "yes"
Install applications (Linux):
await session.run_command("sudo apt update && sudo apt install -y firefox")
Launch applications:
await session.run_command("firefox &")  # Background process

Step 4: Run the Task
Preview Interactively
cb interact tasks/question_answer --variant-id 0
A VNC viewer will open showing the native Linux desktop environment.
Run with Oracle
cb interact tasks/question_answer --oracle --variant-id 0
Run with Agent
export ANTHROPIC_API_KEY=sk-...
cb run task tasks/question_answer --agent cua-agent --model anthropic/claude-sonnet-4-20250514

Advanced: Windows Tasks
For Windows-specific tasks, use os_type: "windows":
@cb.tasks_config(split="train")
def load():
    return [
        cb.Task(
            description='Open Notepad and write "Hello World" to a file',
            computer={
                "provider": "native",
                "setup_config": {
                    "os_type": "windows",  # Windows VM
                    "width": 1920,
                    "height": 1080
                }
            }
        )
    ]


@cb.setup_task(split="train")
async def start(task_cfg: cb.Task, session: cb.DesktopSession):
    # Windows commands use PowerShell/CMD syntax
    await session.run_command('powershell -Command "New-Item -Path C:\\Users\\Public\\Desktop -ItemType Directory -Force"')
    # Launch Notepad
    await session.run_command("notepad.exe")

Windows Command Patterns
PowerShell commands:
result = await session.run_command('powershell -Command "Get-ChildItem C:\\Users"')
Registry operations:
await session.run_command('reg add "HKCU\\Software\\MyApp" /v Setting /t REG_SZ /d "value"')
File operations:
await session.run_command('powershell -Command "Get-Content C:\\path\\file.txt"')

File API Alternative
For simple file operations, use the dedicated file methods instead of run_command():
# Read file
content = await session.read_file("/path/to/file.txt")
# Write file
await session.write_file("/path/to/file.txt", "content")
# Check if file exists
exists = await session.file_exists("/path/to/file.txt")
# List directory
files = await session.list_dir("/path/to/dir")
These methods are cleaner than shell commands for basic file operations.
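As an illustration, the answer read from this tutorial's evaluate function could be expressed with these methods. This is a sketch under stated assumptions: it assumes file_exists() returns a bool and read_file() returns the file's text as a str, which the calls above suggest but do not confirm. FakeSession and read_answer are hypothetical names, and FakeSession is only an in-memory stand-in so the example runs without a real environment.

```python
import asyncio


class FakeSession:
    """Hypothetical in-memory stand-in for cb.DesktopSession (local testing only)."""

    def __init__(self, files: dict[str, str]):
        self.files = dict(files)

    async def file_exists(self, path: str) -> bool:
        return path in self.files

    async def read_file(self, path: str) -> str:
        return self.files[path]


async def read_answer(session, path: str) -> str:
    """Return the normalized file contents, or "" if the file is missing."""
    if not await session.file_exists(path):
        return ""
    content = await session.read_file(path)
    return content.strip().lower()


def demo() -> tuple[str, str]:
    # One file that exists and one that does not.
    session = FakeSession({"/home/user/Desktop/answer.txt": " Paris\n"})
    found = asyncio.run(read_answer(session, "/home/user/Desktop/answer.txt"))
    missing = asyncio.run(read_answer(session, "/home/user/Desktop/missing.txt"))
    return found, missing
```

The example uses absolute paths because the snippets above do; whether these methods expand ~ is not stated here, so check the Session API Reference before relying on it.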
Key Concepts
Native Provider: Runs on real Docker (Linux) or QEMU (Windows) environments
Shell Commands: Use run_command() for OS-level operations
Return Type: run_command() always returns a dict with stdout, stderr, return_code, and success
File Setup: Use setup to prepare the environment before agents interact
Evaluation: Read files/check state to determine task completion
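The evaluation in this tutorial compares the file contents to the expected answer after stripping whitespace and lowercasing. Agents often add a trailing period or extra spaces, so a slightly more tolerant comparison can be useful. The sketch below is one possible approach, not something cua-bench provides; normalize_answer and score are hypothetical helper names.

```python
import string


def normalize_answer(text: str) -> str:
    """Collapse whitespace, trim surrounding punctuation, and lowercase."""
    collapsed = " ".join(text.split())
    return collapsed.strip(string.punctuation + " ").lower()


def score(user_answer: str, expected_answer: str) -> list[float]:
    """Return [1.0] when the normalized answers match, else [0.0]."""
    if normalize_answer(user_answer) == normalize_answer(expected_answer):
        return [1.0]
    return [0.0]
```

Inside evaluate, this could replace the final comparison as return score(result["stdout"], task_cfg.metadata["answer"]). Trimming punctuation is a trade-off: it would also turn an answer such as "-4" into "4", so it only suits tasks where leading or trailing symbols carry no meaning.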
Performance Considerations
Native tasks are slower than Universal GUI tasks because they:
- Require Docker/QEMU startup time
- Run full OS environments
- Execute real shell commands
Optimization tips:
- Use Universal GUI when possible
- Batch multiple commands with &&
- Avoid unnecessary file operations
- Use file API methods for simple reads/writes
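To illustrate the batching tip, the separate setup commands from this tutorial can be joined into a single run_command() call. This is a minimal sketch assuming a POSIX shell in the environment; batch is a hypothetical helper, not part of cua-bench.

```python
def batch(*commands: str) -> str:
    """Join shell commands with && so each runs only if the previous one succeeded."""
    return " && ".join(cmd.strip() for cmd in commands if cmd.strip())


# Two of the setup commands from main.py, combined into one shell invocation.
setup_command = batch(
    "mkdir -p ~/Desktop",
    "rm -f ~/Desktop/answer.txt",
)
```

The result can be passed to await session.run_command(setup_command). Since && stops at the first failing command, the returned return_code reflects the first failure, which makes a broken setup easier to notice than a series of unchecked calls.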
Troubleshooting
Docker not running:
Error: Cannot connect to Docker daemon
Solution: Start Docker Desktop
Permission denied:
# Add sudo for operations requiring permissions
await session.run_command("sudo apt install -y package")
Command not found:
# Install the package first
await session.run_command("sudo apt update && sudo apt install -y package-name")
Windows path escaping:
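# Use double backslashes in a normal string, or a raw string with single backslashes
await session.run_command('powershell -Command "Get-Item C:\\Users"')
await session.run_command(r'powershell -Command "Get-Item C:\Users"')

Next Steps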
- Learn about Native Provider for advanced OS features
- See Session API Reference for all available methods
- Build a Custom Agent to solve tasks
- Compare with Universal Tasks to choose the right approach