Build a Custom Agent

Add your own agent to cua-bench

Add a custom agent to cua-bench by extending BaseAgent and registering it.

1. Create your agent file

Create cua_bench/agents/my_agent.py:

import base64  # unused in this skeleton; handy for encoding screenshots for an LLM API
from pathlib import Path
from typing import TYPE_CHECKING

from . import register_agent
from .base import AgentResult, BaseAgent, FailureMode

if TYPE_CHECKING:
    from ..computers import DesktopSession


@register_agent("my-agent")
class MyAgent(BaseAgent):
    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self.model = kwargs.get("model", "claude-sonnet-4-20250514")
        self.max_steps = int(kwargs.get("max_steps", 50))

    @staticmethod
    def name() -> str:
        return "my-agent"

    async def perform_task(
        self,
        task_description: str,
        session: "DesktopSession",
        logging_dir: Path | None = None,
        tracer=None,
    ) -> AgentResult:
        instruction = self._render_instruction(task_description)

        total_input = 0
        total_output = 0

        for step in range(self.max_steps):
            screenshot_bytes = await session.screenshot()

            if logging_dir:
                (logging_dir / f"step_{step}.png").write_bytes(screenshot_bytes)

            # Your logic: send screenshot to an LLM, parse action, etc.
            # action = await self._decide(instruction, screenshot_bytes)

            # if action is None:
            #     break
            # await session.execute_action(action)

        return AgentResult(
            total_input_tokens=total_input,
            total_output_tokens=total_output,
            failure_mode=FailureMode.NONE,
        )

2. Register the import

Add one line to cua_bench/agents/__init__.py:

from .my_agent import MyAgent  # noqa: E402

This triggers @register_agent("my-agent") at import time.

3. Run it

cb run task tasks/my-task --agent my-agent
cb run task tasks/my-task --agent my-agent --agent-kwarg model=gpt-4o --agent-kwarg max_steps=100

4. Develop without rebuilding the Docker image

Use --with to mount your local cua-bench (or any local package) into the agent container:

cb run task tasks/my-task --agent my-agent --with ../libs/cua-bench

This mounts the path into the container and runs pip install on it before the agent starts. You can repeat --with for multiple packages.

Reference

BaseAgent interface

| Method / Property | Description |
| --- | --- |
| name() (static, abstract) | Return the agent name string |
| perform_task(task_description, session, logging_dir, tracer) (abstract) | Run the agent loop, return AgentResult |
| version | Optional version string (set via kwargs) |
| _render_instruction(text) | Render through a Jinja2 prompt_template if one was provided |

AgentResult

@dataclass
class AgentResult:
    total_input_tokens: int = 0
    total_output_tokens: int = 0
    failure_mode: FailureMode = FailureMode.UNSET  # other values: NONE, UNKNOWN, MAX_STEPS_EXCEEDED
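The skeleton in step 1 always returns FailureMode.NONE, even when the loop uses up every step. One way to tell the outcomes apart is Python's for/else, sketched below. The `FailureMode` and `AgentResult` classes here are stand-ins so the example runs without cua-bench: only the member and field names come from this page, the enum values are made up, and the reading of NONE as "no failure" is an assumption. `run_loop` and its fake token counts are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class FailureMode(Enum):
    # Stand-in enum; member names are from this page, values are invented.
    UNSET = "unset"
    NONE = "none"
    UNKNOWN = "unknown"
    MAX_STEPS_EXCEEDED = "max_steps_exceeded"


@dataclass
class AgentResult:
    # Stand-in mirroring the dataclass shown above.
    total_input_tokens: int = 0
    total_output_tokens: int = 0
    failure_mode: FailureMode = FailureMode.UNSET


def run_loop(steps_needed: int, max_steps: int) -> AgentResult:
    """Simulate an agent that finishes its task after `steps_needed` steps."""
    result = AgentResult()
    try:
        for step in range(max_steps):
            result.total_input_tokens += 100  # pretend per-step usage
            result.total_output_tokens += 10
            if step + 1 >= steps_needed:
                result.failure_mode = FailureMode.NONE  # finished cleanly
                break
        else:
            # The loop ended without a break: the step budget ran out.
            result.failure_mode = FailureMode.MAX_STEPS_EXCEEDED
    except Exception:
        # Anything unexpected is reported rather than left as UNSET.
        result.failure_mode = FailureMode.UNKNOWN
    return result


print(run_loop(steps_needed=3, max_steps=50).failure_mode)
print(run_loop(steps_needed=10, max_steps=5).failure_mode)
```

In a real perform_task, the `break` would sit where the agent decides it is done, for example when the model returns no further action.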

Available actions

All actions are importable from cua_bench (e.g. from cua_bench import ClickAction):

| Action | Fields |
| --- | --- |
| ClickAction | x, y |
| RightClickAction | x, y |
| DoubleClickAction | x, y |
| MiddleClickAction | x, y |
| DragAction | from_x, from_y, to_x, to_y, duration=1.0 |
| MoveToAction | x, y, duration=0.0 |
| ScrollAction | direction="up", amount=100 |
| TypeAction | text |
| KeyAction | key |
| HotkeyAction | keys (list) |
| WaitAction | seconds=1.0 |
| DoneAction | (none) |
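An agent usually has to turn model output into one of these action objects. The sketch below assumes the model replies with a small JSON object such as {"action": "click", "x": 10, "y": 20}. That schema, the `ACTION_TYPES` mapping, and `parse_action` are all hypothetical. The three dataclasses are stand-ins that copy the field names from the table so the example runs without cua-bench, and the field types are assumptions. In a real agent, import the classes from cua_bench instead.

```python
import json
from dataclasses import dataclass


# Stand-ins for the cua_bench classes; replace with real imports in an agent.
@dataclass
class ClickAction:
    x: int
    y: int


@dataclass
class TypeAction:
    text: str


@dataclass
class DoneAction:
    pass


# Hypothetical mapping from the model's "action" string to an action class.
ACTION_TYPES = {"click": ClickAction, "type": TypeAction, "done": DoneAction}


def parse_action(raw: str):
    """Build an action object from a JSON string produced by the model."""
    data = json.loads(raw)
    kind = data.pop("action")  # remaining keys become constructor arguments
    try:
        cls = ACTION_TYPES[kind]
    except KeyError:
        raise ValueError(f"unsupported action: {kind!r}") from None
    return cls(**data)


print(parse_action('{"action": "click", "x": 10, "y": 20}'))
print(parse_action('{"action": "type", "text": "hello"}'))
print(parse_action('{"action": "done"}'))
```

Rejecting unknown action names up front keeps a malformed model reply from reaching the session.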

kwargs

All --agent-kwarg values from the CLI arrive as strings in **kwargs. Cast as needed (e.g. int(kwargs.get("max_steps", 50))). Built-in kwargs handled by BaseAgent: version, prompt_template.
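Because every value arrives as a string, booleans need particular care: bool("false") is True in Python, since any non-empty string is truthy. A minimal sketch of one way to coerce values follows. The `as_bool` and `read_kwargs` helpers and the `temperature` and `verbose` kwargs are hypothetical, not part of cua-bench. Only `model` and `max_steps` appear in the example agent above.

```python
def as_bool(value, default: bool = False) -> bool:
    """Interpret CLI strings such as "true" or "0" as booleans."""
    if value is None:
        return default
    if isinstance(value, bool):
        return value
    text = str(value).strip().lower()
    if text in {"1", "true", "yes", "on"}:
        return True
    if text in {"0", "false", "no", "off"}:
        return False
    raise ValueError(f"not a boolean: {value!r}")


def read_kwargs(kwargs: dict) -> dict:
    """Cast string kwargs to the types the agent expects."""
    return {
        "model": kwargs.get("model", "claude-sonnet-4-20250514"),
        "max_steps": int(kwargs.get("max_steps", 50)),
        "temperature": float(kwargs.get("temperature", 0.0)),  # hypothetical kwarg
        "verbose": as_bool(kwargs.get("verbose")),  # hypothetical kwarg
    }


# Values as they would arrive from repeated --agent-kwarg flags.
print(read_kwargs({"model": "gpt-4o", "max_steps": "100", "verbose": "false"}))
```

Casting in one place, typically `__init__`, means a bad value fails at startup rather than partway through a run.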
