Build a Custom Agent
Add your own agent to cua-bench
Add a custom agent to cua-bench by extending BaseAgent and registering it.
1. Create your agent file
Create cua_bench/agents/my_agent.py:
import base64
from pathlib import Path
from typing import TYPE_CHECKING

from . import register_agent
from .base import AgentResult, BaseAgent, FailureMode

if TYPE_CHECKING:
    from ..computers import DesktopSession


@register_agent("my-agent")
class MyAgent(BaseAgent):
    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self.model = kwargs.get("model", "claude-sonnet-4-20250514")
        self.max_steps = int(kwargs.get("max_steps", 50))

    @staticmethod
    def name() -> str:
        return "my-agent"

    async def perform_task(
        self,
        task_description: str,
        session: "DesktopSession",
        logging_dir: Path | None = None,
        tracer=None,
    ) -> AgentResult:
        instruction = self._render_instruction(task_description)
        total_input = 0
        total_output = 0

        for step in range(self.max_steps):
            screenshot_bytes = await session.screenshot()
            if logging_dir:
                (logging_dir / f"step_{step}.png").write_bytes(screenshot_bytes)

            # Your logic: send screenshot to an LLM, parse action, etc.
            # action = await self._decide(instruction, screenshot_bytes)
            # if action is None:
            #     break
            # await session.execute_action(action)

        return AgentResult(
            total_input_tokens=total_input,
            total_output_tokens=total_output,
            failure_mode=FailureMode.NONE,
        )

2. Register the import
Add one line to cua_bench/agents/__init__.py:
from .my_agent import MyAgent  # noqa: E402

This triggers @register_agent("my-agent") at import time.
3. Run it
cb run task tasks/my-task --agent my-agent
cb run task tasks/my-task --agent my-agent --agent-kwarg model=gpt-4o --agent-kwarg max_steps=100

4. Develop without rebuilding the Docker image
Use --with to mount your local cua-bench (or any local package) into the agent container:
cb run task tasks/my-task --agent my-agent --with ../libs/cua-bench

This mounts the path into the container and runs pip install on it before the agent starts. You can repeat --with for multiple packages.
Reference
BaseAgent interface
| Method / Property | Description |
|---|---|
| name() (static, abstract) | Return the agent name string |
| perform_task(task_description, session, logging_dir, tracer) (abstract) | Run the agent loop, return AgentResult |
| version | Optional version string (set via kwargs) |
| _render_instruction(text) | Render through a Jinja2 prompt_template if one was provided |
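As a rough illustration of the perform_task contract, the sketch below shows one way an agent loop might choose the failure mode it reports. `FailureMode` and `AgentResult` here are local stand-ins that mirror the names documented on this page, not imports from cua-bench, and their enum values are made up. `run_loop` and its `decide` callback are hypothetical, and mapping any exception to UNKNOWN is an assumption about what that value means.

```python
import asyncio
from dataclasses import dataclass
from enum import Enum
from typing import Awaitable, Callable, Optional


class FailureMode(Enum):  # stand-in for cua-bench's FailureMode
    UNSET = "unset"
    NONE = "none"
    UNKNOWN = "unknown"
    MAX_STEPS_EXCEEDED = "max_steps_exceeded"


@dataclass
class AgentResult:  # stand-in for cua-bench's AgentResult
    total_input_tokens: int = 0
    total_output_tokens: int = 0
    failure_mode: FailureMode = FailureMode.UNSET


async def run_loop(
    decide: Callable[[int], Awaitable[Optional[str]]],
    max_steps: int,
) -> AgentResult:
    """Call `decide` once per step; None means the agent considers itself done."""
    try:
        for step in range(max_steps):
            action = await decide(step)
            if action is None:
                return AgentResult(failure_mode=FailureMode.NONE)
            # A real agent would execute the action against the session here.
    except Exception:
        return AgentResult(failure_mode=FailureMode.UNKNOWN)
    # The loop ran out of steps without the agent signalling completion.
    return AgentResult(failure_mode=FailureMode.MAX_STEPS_EXCEEDED)


async def finishes_at_step_two(step: int) -> Optional[str]:
    return None if step == 2 else "click"


async def never_finishes(step: int) -> Optional[str]:
    return "click"


ok = asyncio.run(run_loop(finishes_at_step_two, max_steps=5))
exhausted = asyncio.run(run_loop(never_finishes, max_steps=5))
```

Returning early on completion keeps the step-limit case distinct from a normal finish, so the two outcomes can be told apart in the result.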
AgentResult
@dataclass
class AgentResult:
    total_input_tokens: int = 0
    total_output_tokens: int = 0
    failure_mode: FailureMode = FailureMode.UNSET  # NONE, UNKNOWN, MAX_STEPS_EXCEEDED

Available actions
All actions are importable from cua_bench (e.g. from cua_bench import ClickAction):
| Action | Fields |
|---|---|
| ClickAction | x, y |
| RightClickAction | x, y |
| DoubleClickAction | x, y |
| MiddleClickAction | x, y |
| DragAction | from_x, from_y, to_x, to_y, duration=1.0 |
| MoveToAction | x, y, duration=0.0 |
| ScrollAction | direction="up", amount=100 |
| TypeAction | text |
| KeyAction | key |
| HotkeyAction | keys (list) |
| WaitAction | seconds=1.0 |
| DoneAction | (none) |
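One possible way to turn a model reply into one of these actions is sketched below. The JSON reply format, the `parse_action` helper, and the `_ACTION_TYPES` mapping are all hypothetical. The three dataclasses are local stand-ins that mirror rows of the table so the example runs on its own; a real agent would import the action classes from cua_bench, and their field types may differ from the `int` and `str` assumed here.

```python
import json
from dataclasses import dataclass
from typing import Union


# Local stand-ins mirroring three rows of the table above; in a real agent
# these would be imported from cua_bench instead.
@dataclass
class ClickAction:
    x: int
    y: int


@dataclass
class TypeAction:
    text: str


@dataclass
class DoneAction:
    pass


Action = Union[ClickAction, TypeAction, DoneAction]

# Hypothetical mapping from a "type" string chosen for this sketch.
_ACTION_TYPES = {"click": ClickAction, "type": TypeAction, "done": DoneAction}


def parse_action(reply: str) -> Action:
    """Turn a JSON reply such as {"type": "click", "x": 1, "y": 2} into an action."""
    data = json.loads(reply)
    kind = data.pop("type")
    try:
        cls = _ACTION_TYPES[kind]
    except KeyError:
        raise ValueError(f"unsupported action type {kind!r}") from None
    # Remaining keys must match the action's fields, else TypeError is raised.
    return cls(**data)
```

Rejecting unknown action types up front means a malformed reply fails in the parser rather than when the action is executed against the session.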
kwargs
All --agent-kwarg values from the CLI arrive as strings in **kwargs. Cast as needed (e.g. int(kwargs.get("max_steps", 50))). Built-in kwargs handled by BaseAgent: version, prompt_template.
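The sketch below shows why the casts matter. `parse_agent_kwargs` is a hypothetical stand-in for whatever the CLI does internally, `to_bool` is an illustrative helper, and the `verbose` kwarg is invented for the example. Booleans are the usual trap, because any non-empty string, including "false", is truthy in Python.

```python
from typing import Dict, List


def parse_agent_kwargs(pairs: List[str]) -> Dict[str, str]:
    """Split 'key=value' strings into a dict; values stay strings."""
    parsed: Dict[str, str] = {}
    for pair in pairs:
        key, sep, value = pair.partition("=")
        if not sep or not key:
            raise ValueError(f"expected key=value, got {pair!r}")
        parsed[key] = value
    return parsed


def to_bool(value: str) -> bool:
    """Interpret common truthy spellings; bool('false') would be True."""
    return value.strip().lower() in {"1", "true", "yes", "on"}


# "verbose" is a made-up kwarg used only to demonstrate boolean handling.
kwargs = parse_agent_kwargs(["model=gpt-4o", "max_steps=100", "verbose=false"])
max_steps = int(kwargs.get("max_steps", 50))
verbose = to_bool(kwargs.get("verbose", "false"))
```

Doing the casts once in `__init__`, as the example agent does for max_steps, keeps the rest of the agent free of string handling.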