Build a Custom Agent

Add a custom agent to cua-bench by extending BaseAgent and registering it.

1. Create your agent file

Create cua_bench/agents/my_agent.py:

import base64
from pathlib import Path
from typing import TYPE_CHECKING

from . import register_agent
from .base import AgentResult, BaseAgent, FailureMode

if TYPE_CHECKING:
    from ..computers import DesktopSession


@register_agent("my-agent")
class MyAgent(BaseAgent):
    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self.model = kwargs.get("model", "claude-sonnet-4-20250514")
        self.max_steps = int(kwargs.get("max_steps", 50))

    @staticmethod
    def name() -> str:
        return "my-agent"

    async def perform_task(
        self,
        task_description: str,
        session: "DesktopSession",
        logging_dir: Path | None = None,
        tracer=None,
    ) -> AgentResult:
        instruction = self._render_instruction(task_description)

        total_input = 0
        total_output = 0

        for step in range(self.max_steps):
            screenshot_bytes = await session.screenshot()

            if logging_dir:
                (logging_dir / f"step_{step}.png").write_bytes(screenshot_bytes)

            # Your logic: send screenshot to an LLM, parse action, etc.
            # action = await self._decide(instruction, screenshot_bytes)

            # if action is None:
            #     break
            # await session.execute_action(action)

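        # This skeleton always reports success. Once the loop above can stop
        # early, return FailureMode.MAX_STEPS_EXCEEDED when it runs out of steps.
        return AgentResult(
            total_input_tokens=total_input,
            total_output_tokens=total_output,
            failure_mode=FailureMode.NONE,
        )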

2. Register the import

Add one line to cua_bench/agents/__init__.py:

from .my_agent import MyAgent  # noqa: E402

This triggers @register_agent("my-agent") at import time.

3. Run it

cb run task tasks/my-task --agent my-agent
cb run task tasks/my-task --agent my-agent --agent-kwarg model=gpt-4o --agent-kwarg max_steps=100

4. Develop without rebuilding the Docker image

Use --with to mount your local cua-bench (or any local package) into the agent container:

cb run task tasks/my-task --agent my-agent --with ../libs/cua-bench

This mounts the path into the container and runs pip install on it before the agent starts. You can repeat --with for multiple packages.

Reference

BaseAgent interface

Method / Property	Description
`name()` (static, abstract)	Return the agent name string
`perform_task(task_description, session, logging_dir, tracer)` (abstract)	Run the agent loop, return `AgentResult`
`version`	Optional version string (set via `kwargs`)
`_render_instruction(text)`	Render `text` through the Jinja2 `prompt_template` if one was provided

AgentResult

@dataclass
class AgentResult:
    total_input_tokens: int = 0
    total_output_tokens: int = 0
    failure_mode: FailureMode = FailureMode.UNSET  # other values: NONE, UNKNOWN, MAX_STEPS_EXCEEDED
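
One way to fill in an `AgentResult` is to accumulate token counts per step and set the failure mode based on how the loop ended. The sketch below uses local stand-ins for `AgentResult` and `FailureMode` so it runs without cua-bench installed. The enum's string values, the `run_loop` helper, and the `step_fn` callback are assumptions for illustration, as is the choice to report `MAX_STEPS_EXCEEDED` when the step budget runs out.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Tuple


class FailureMode(Enum):
    # Stand-in enum: member names come from this page, string values are assumed.
    UNSET = "unset"
    NONE = "none"
    UNKNOWN = "unknown"
    MAX_STEPS_EXCEEDED = "max_steps_exceeded"


@dataclass
class AgentResult:
    total_input_tokens: int = 0
    total_output_tokens: int = 0
    failure_mode: FailureMode = FailureMode.UNSET


# A step returns (input_tokens, output_tokens, done) for one model call.
StepFn = Callable[[int], Tuple[int, int, bool]]


def run_loop(step_fn: StepFn, max_steps: int) -> AgentResult:
    """Run up to `max_steps` steps, summing token usage along the way."""
    result = AgentResult()
    for step in range(max_steps):
        tokens_in, tokens_out, done = step_fn(step)
        result.total_input_tokens += tokens_in
        result.total_output_tokens += tokens_out
        if done:
            # The agent finished on its own within the budget.
            result.failure_mode = FailureMode.NONE
            return result
    # The budget ran out before any step reported completion.
    result.failure_mode = FailureMode.MAX_STEPS_EXCEEDED
    return result
```

In a real agent, `step_fn` would be replaced by the body of the `perform_task` loop: take a screenshot, call the model, execute the action, and read token usage from the model response.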

Available actions

All importable from cua_bench (e.g. from cua_bench import ClickAction):

Action	Fields
`ClickAction`	`x, y`
`RightClickAction`	`x, y`
`DoubleClickAction`	`x, y`
`MiddleClickAction`	`x, y`
`DragAction`	`from_x, from_y, to_x, to_y, duration=1.0`
`MoveToAction`	`x, y, duration=0.0`
`ScrollAction`	`direction="up", amount=100`
`TypeAction`	`text`
`KeyAction`	`key`
`HotkeyAction`	`keys` (list)
`WaitAction`	`seconds=1.0`
`DoneAction`	(none)
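
A common pattern is to have the model reply with a small JSON object and map it onto one of these actions. The sketch below is illustrative only: the dataclasses are local stand-ins that mirror a few rows of the table so the example runs on its own (a real agent would import the classes from `cua_bench`), the integer field types are assumed, and the JSON shape, `_ACTION_TYPES`, and `parse_action` are hypothetical.

```python
import json
from dataclasses import dataclass
from typing import Union


# Stand-ins mirroring the table above; not the real cua_bench classes.
@dataclass
class ClickAction:
    x: int
    y: int


@dataclass
class ScrollAction:
    direction: str = "up"
    amount: int = 100


@dataclass
class TypeAction:
    text: str


@dataclass
class DoneAction:
    pass


Action = Union[ClickAction, ScrollAction, TypeAction, DoneAction]

# Hypothetical mapping from a model-facing action name to an action class.
_ACTION_TYPES = {
    "click": ClickAction,
    "scroll": ScrollAction,
    "type": TypeAction,
    "done": DoneAction,
}


def parse_action(raw: str) -> Action:
    """Turn a reply like {"action": "click", "x": 1, "y": 2} into an action."""
    payload = json.loads(raw)
    if not isinstance(payload, dict):
        raise ValueError("expected a JSON object")
    kind = payload.pop("action", None)
    if not isinstance(kind, str) or kind not in _ACTION_TYPES:
        raise ValueError(f"unsupported action: {kind!r}")
    try:
        # Remaining keys become constructor arguments; defaults fill any gaps.
        return _ACTION_TYPES[kind](**payload)
    except TypeError as exc:  # missing or unexpected fields
        raise ValueError(f"bad fields for {kind!r}: {exc}") from exc
```

Rejecting unknown action names and malformed fields with `ValueError` gives the agent loop a single exception type to catch when the model produces an unusable reply.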

Values passed with --agent-kwarg on the CLI arrive as strings in **kwargs, so cast them as needed (e.g. int(kwargs.get("max_steps", 50))). BaseAgent itself handles two built-in kwargs: version and prompt_template.
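
Because every CLI value is a string, integer casts are straightforward but boolean flags need care: `bool("false")` is `True` in Python. The helpers below are a small sketch of one way to coerce such values. `kwarg_int`, `kwarg_bool`, and the `headless` option are hypothetical names, not part of cua-bench.

```python
from typing import Any, Mapping


def kwarg_int(kwargs: Mapping[str, Any], key: str, default: int) -> int:
    """Read an integer option that may arrive as a string such as "100"."""
    return int(kwargs.get(key, default))


def kwarg_bool(kwargs: Mapping[str, Any], key: str, default: bool) -> bool:
    """Read a boolean option; plain bool() would treat "false" as True."""
    value = kwargs.get(key, default)
    if isinstance(value, bool):
        return value
    text = str(value).strip().lower()
    if text in {"1", "true", "yes", "on"}:
        return True
    if text in {"0", "false", "no", "off"}:
        return False
    raise ValueError(f"cannot interpret {key}={value!r} as a boolean")


# Values shaped the way CLI kwargs arrive: everything is a string.
cli_kwargs = {"max_steps": "100", "headless": "false"}
max_steps = kwarg_int(cli_kwargs, "max_steps", 50)
headless = kwarg_bool(cli_kwargs, "headless", True)
```

Inside `__init__`, the same helpers can be called on `kwargs` directly, keeping the defaults next to the option names.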
