
Agent SDK

Python API reference for the Agent SDK

The Agent SDK (cua-agent) provides the Python interface for building computer-use agents. This reference covers the ComputerAgent class, callbacks, tools, and response types.

Installation

pip install cua-agent

ComputerAgent

The main class for creating agents that can autonomously operate computers.

from agent import ComputerAgent

# `computer` is a Computer tool instance (see Tools below)
agent = ComputerAgent(
    model="anthropic/claude-sonnet-4-5-20250929",
    tools=[computer]
)

async for result in agent.run("Open Firefox and search for Cua"):
    print(result.text)

Constructor Parameters

Parameter | Type | Default | Description
model | str | Required | Model identifier (see VLMs)
tools | list | Required | Tools the agent can use
api_key | str | Provider env var | API key for the model provider
callbacks | list | [] | List of callback handlers
instructions | str | None | Custom instructions prepended to prompts
verbosity | int | None | Logging level (e.g., logging.INFO)
max_trajectory_budget | float | None | Maximum cost in dollars
only_n_most_recent_images | int | None | Limit retained screenshots
trajectory_dir | str or dict | None | Directory for saving trajectories
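
As a minimal sketch, here is a constructor call using several of the optional parameters above; the specific values are illustrative, and `computer` is assumed to be a configured Computer tool instance (see Tools):

import logging
from agent import ComputerAgent

agent = ComputerAgent(
    model="anthropic/claude-sonnet-4-5-20250929",
    tools=[computer],                   # a configured Computer instance (see Tools)
    instructions="Always confirm before clicking",
    verbosity=logging.INFO,             # logging level for agent output
    max_trajectory_budget=5.0,          # cap model spend at $5 for the trajectory
    only_n_most_recent_images=3,        # keep only the 3 most recent screenshots
    trajectory_dir="trajectories",      # save trajectories to this directory
)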

Methods

run(task, chat_history)

Execute a task autonomously. Returns an async generator of results.

async for result in agent.run("Click the submit button"):
    if result.text:
        print(result.text)
    if result.action:
        print(f"Action: {result.action}")

Parameter | Type | Default | Description
task | str | Required | Natural language task description
chat_history | list | [] | Previous conversation messages

Yields: AgentResult objects containing model responses and actions.

run_to_completion(task, chat_history)

Execute a task and return only the final result.

result = await agent.run_to_completion("Open the calculator app")
print(result.text)

Returns: AgentResult - The final result after task completion.


AgentResult

Returned by agent.run() for each iteration of the agent loop.

async for result in agent.run("Search for documentation"):
    # Check what happened this iteration
    if result.text:
        print(f"Agent said: {result.text}")

    if result.action:
        print(f"Action type: {result.action.type}")

    if result.usage:
        print(f"Tokens used: {result.usage.total_tokens}")

Properties

Property | Type | Description
text | str or None | Text response from the model
action | Action or None | Computer action taken
screenshot | PIL.Image or None | Screenshot after action
usage | Usage or None | Token and cost information
error | str or None | Error message if action failed
done | bool | True if the agent considers the task complete
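
As a small illustration (reusing the agent and task from the example above), the error and done properties can drive loop control:

async for result in agent.run("Search for documentation"):
    if result.error:
        print(f"Action failed: {result.error}")
    if result.done:
        print("Agent reports the task as complete")
        break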

Action Types

The action property contains details about computer actions:

result.action.type          # "click", "type", "key", "scroll", etc.
result.action.coordinate    # [x, y] for click actions
result.action.text          # Text for type actions
result.action.key           # Key for key press actions

Usage Tracking

The Usage object tracks token consumption and costs.

async for result in agent.run("Complete this form"):
    if result.usage:
        print(f"Input tokens: {result.usage.input_tokens}")
        print(f"Output tokens: {result.usage.output_tokens}")
        print(f"Total tokens: {result.usage.total_tokens}")
        print(f"Cost: ${result.usage.response_cost:.4f}")

Properties

Property | Type | Description
input_tokens | int | Tokens in the prompt
output_tokens | int | Tokens in the response
total_tokens | int | Total tokens used
response_cost | float | Cost in dollars
cache_creation_input_tokens | int | Tokens written to cache
cache_read_input_tokens | int | Tokens read from cache

Callbacks

Callbacks hook into the agent lifecycle for logging, cost tracking, and custom behavior.

Built-in Callbacks

LoggingCallback

Log agent events with configurable verbosity.

from agent.callbacks import LoggingCallback
import logging

agent = ComputerAgent(
    model="...",
    tools=[computer],
    callbacks=[LoggingCallback(level=logging.DEBUG)]
)

Parameter | Type | Default | Description
level | int | logging.INFO | Logging level

BudgetManagerCallback

Track costs and stop when budget is exceeded.

from agent.callbacks import BudgetManagerCallback

agent = ComputerAgent(
    model="...",
    tools=[computer],
    callbacks=[BudgetManagerCallback(
        max_budget=10.0,
        reset_after_each_run=True,
        raise_error=False
    )]
)

Parameter | Type | Default | Description
max_budget | float | Required | Maximum cost in dollars
reset_after_each_run | bool | True | Reset budget per run
raise_error | bool | False | Raise exception vs. graceful stop

ImageRetentionCallback

Limit screenshot history to prevent context overflow.

from agent.callbacks import ImageRetentionCallback

agent = ComputerAgent(
    model="...",
    tools=[computer],
    callbacks=[ImageRetentionCallback(only_n_most_recent_images=3)]
)

Parameter | Type | Default | Description
only_n_most_recent_images | int | Required | Max screenshots to retain

TrajectorySaverCallback

Save complete agent conversations for debugging.

from agent.callbacks import TrajectorySaverCallback

agent = ComputerAgent(
    model="...",
    tools=[computer],
    callbacks=[TrajectorySaverCallback(
        trajectory_dir="trajectories",
        reset_on_run=True,
        screenshot_dir="screenshots"
    )]
)

Parameter | Type | Default | Description
trajectory_dir | str | Required | Base directory for trajectories
reset_on_run | bool | True | Create new trajectory per run
screenshot_dir | str | None | Separate directory for screenshots

PromptInstructionsCallback

Prepend custom instructions to every LLM call.

from agent.callbacks import PromptInstructionsCallback

agent = ComputerAgent(
    model="...",
    tools=[computer],
    callbacks=[PromptInstructionsCallback("Always confirm before clicking")]
)

Creating Custom Callbacks

Extend AsyncCallbackHandler to create custom callbacks:

from agent.callbacks.base import AsyncCallbackHandler

class MyCallback(AsyncCallbackHandler):
    async def on_run_start(self, kwargs, old_items):
        """Called when agent.run() begins"""
        print("Starting run...")

    async def on_run_continue(self, kwargs, old_items, new_items) -> bool:
        """Called before each iteration. Return False to stop."""
        return True

    async def on_llm_start(self, messages):
        """Preprocess messages before LLM call."""
        return messages

    async def on_llm_end(self, messages):
        """Postprocess messages after LLM call."""
        return messages

    async def on_usage(self, usage):
        """Called with usage stats after each LLM call."""
        print(f"Cost: ${usage.response_cost:.4f}")

    async def on_computer_call_start(self, item):
        """Called before a computer action."""
        pass

    async def on_computer_call_end(self, item, result):
        """Called after a computer action."""
        pass

    async def on_screenshot(self, screenshot, name):
        """Called when a screenshot is taken."""
        pass

    async def on_run_end(self, kwargs, old_items, new_items):
        """Called when agent.run() completes."""
        print("Run complete!")

Callback Lifecycle Order

  1. on_run_start - Once at the beginning
  2. For each iteration:
    • on_run_continue - Check if should continue
    • on_llm_start - Before LLM call
    • on_api_start - Before API request
    • on_api_end - After API response
    • on_usage - With usage stats
    • on_llm_end - After LLM processing
    • on_responses - With model responses
    • on_text / on_computer_call_start / on_computer_call_end - Per response item
    • on_screenshot - When screenshots are taken
  3. on_run_end - Once at the end
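
As an illustration of the loop hooks, here is a minimal sketch of a callback that uses on_run_continue to stop after a fixed number of iterations (the limit of 5 is arbitrary):

from agent.callbacks.base import AsyncCallbackHandler

class MaxIterationsCallback(AsyncCallbackHandler):
    """Stop the agent loop after a fixed number of iterations."""

    def __init__(self, max_iterations: int = 5):
        self.max_iterations = max_iterations
        self.iterations = 0

    async def on_run_start(self, kwargs, old_items):
        # Reset the counter at the start of each run
        self.iterations = 0

    async def on_run_continue(self, kwargs, old_items, new_items) -> bool:
        # Returning False ends the run before the next iteration
        self.iterations += 1
        return self.iterations <= self.max_iterations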

Tools

Built-in Tools

Computer

The primary tool for full computer control.

from computer import Computer

computer = Computer(
    os_type="linux",
    provider_type="docker",
    image="trycua/cua-xfce:latest"
)
await computer.run()

agent = ComputerAgent(
    model="anthropic/claude-sonnet-4-5-20250929",
    tools=[computer]
)

BrowserTool

Specialized tool for web automation with browser-only models.

from agent.tools import BrowserTool

browser = BrowserTool(interface=computer)

agent = ComputerAgent(
    model="google/gemini-2.5-flash",
    tools=[browser]
)

See Browser Tool for available actions.

Custom Function Tools

Add Python functions as tools:

import httpx

def calculate(a: int, b: int) -> int:
    """Calculate the sum of two integers"""
    return a + b

async def fetch_data(url: str) -> str:
    """Fetch data from a URL"""
    async with httpx.AsyncClient() as client:
        response = await client.get(url)
        return response.text

agent = ComputerAgent(
    model="anthropic/claude-sonnet-4-5-20250929",
    tools=[computer, calculate, fetch_data]
)

Sandboxed Tools

Run tools inside the sandbox with the @sandboxed decorator:

from computer.helpers import sandboxed

@sandboxed()
def read_sandbox_file(path: str) -> str:
    """Read a file from inside the sandbox"""
    with open(path, 'r') as f:
        return f.read()

agent = ComputerAgent(
    model="anthropic/claude-sonnet-4-5-20250929",
    tools=[computer, read_sandbox_file]
)

BaseTool Class

For full control over tool schema:

from agent.tools import BaseTool, register_tool

@register_tool("database_query")
class DatabaseQueryTool(BaseTool):
    def __init__(self, connection_string: str):
        self.connection = connection_string

    @property
    def description(self) -> str:
        return "Execute a read-only SQL query"

    @property
    def parameters(self) -> dict:
        return {
            "type": "object",
            "properties": {
                "query": {
                    "type": "string",
                    "description": "SQL SELECT query to execute"
                }
            },
            "required": ["query"]
        }

    def call(self, params, **kwargs):
        query = params["query"] if isinstance(params, dict) else params
        # Execute and return results
        return {"rows": [...]}

ToolError

Raise ToolError for recoverable errors:

from agent.tools import ToolError

def divide(a: float, b: float) -> float:
    """Divide a by b"""
    if b == 0:
        raise ToolError("Cannot divide by zero")
    return a / b

The model sees the error message and can adjust its approach.


Model Providers

Model Format

Models are specified as provider/model-name:

# Anthropic
agent = ComputerAgent(model="anthropic/claude-sonnet-4-5-20250929", ...)

# OpenAI
agent = ComputerAgent(model="openai/computer-use-preview", ...)

# Google
agent = ComputerAgent(model="google/gemini-2.5-flash", ...)

# Local models via Ollama
agent = ComputerAgent(model="ollama/ui-tars:latest", ...)

Composed Models

Use + to combine models for different capabilities:

# UI-TARS for grounding, Claude for planning
agent = ComputerAgent(
    model="ollama/ui-tars:latest+anthropic/claude-sonnet-4-5-20250929",
    tools=[computer]
)

# Qwen for grounding, GPT-4o for planning
agent = ComputerAgent(
    model="ollama/qwen2.5-vl:latest+openai/gpt-4o",
    tools=[computer]
)

Environment Variables

Variable | Description
ANTHROPIC_API_KEY | API key for Anthropic models
OPENAI_API_KEY | API key for OpenAI models
GOOGLE_API_KEY | API key for Google models
OLLAMA_HOST | Host for Ollama (default: localhost:11434)
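
As a sketch, a key can be supplied either through the provider's environment variable or explicitly via the api_key constructor parameter; the key value below is a placeholder:

import os

# Option 1: set the provider's environment variable before constructing the agent
os.environ["ANTHROPIC_API_KEY"] = "<your-anthropic-key>"  # placeholder

# Option 2: pass the key explicitly
agent = ComputerAgent(
    model="anthropic/claude-sonnet-4-5-20250929",
    tools=[computer],
    api_key=os.environ["ANTHROPIC_API_KEY"],
)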

Chat History

Pass previous messages to maintain context:

from agent.types import Message

history = [
    Message(role="user", content="Open the browser"),
    Message(role="assistant", content="I've opened Firefox."),
]

async for result in agent.run(
    "Now search for Python tutorials",
    chat_history=history
):
    print(result.text)

Message Format

from agent.types import Message

# User message
user_msg = Message(role="user", content="Click the button")

# Assistant message
assistant_msg = Message(role="assistant", content="I clicked the submit button.")

# System message (for instructions)
system_msg = Message(role="system", content="Be concise in responses.")

Error Handling

Handle errors during agent execution:

from agent import ComputerAgent
from agent.errors import AgentError, BudgetExceededError

try:
    async for result in agent.run("Complete the form"):
        if result.error:
            print(f"Action failed: {result.error}")
        print(result.text)
except BudgetExceededError:
    print("Budget limit reached")
except AgentError as e:
    print(f"Agent error: {e}")

Error Types

Error | Description
AgentError | Base class for agent errors
BudgetExceededError | Cost limit exceeded
ModelError | Model API error
ToolError | Tool execution error

Trajectory Configuration

Configure trajectory saving with a dict:

agent = ComputerAgent(
    model="anthropic/claude-sonnet-4-5-20250929",
    tools=[computer],
    trajectory_dir={
        "trajectory_dir": "trajectories",
        "reset_on_run": False,         # Continue same trajectory across runs
        "screenshot_dir": "screenshots" # Save screenshots separately
    }
)

Option | Default | Description
trajectory_dir | Required | Base directory
reset_on_run | True | Create new ID per run
screenshot_dir | None | Separate screenshot directory
