Agent Loops

A corresponding Jupyter Notebook is available for this documentation.

An agent loop runs iteratively until a task is complete:

Observe → Take a screenshot
Reason → VLM decides the next action
Act → Execute (click, type, scroll)
Repeat → Until done
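
The four steps above can be sketched as a plain control loop. This is only an illustration of the pattern, not how ComputerAgent is implemented: run_loop and the fake_* functions are hypothetical names, and the "screen" is a simple counter standing in for a real screenshot.

```python
from typing import Callable, Optional


def run_loop(
    observe: Callable[[], str],
    reason: Callable[[str], Optional[str]],
    act: Callable[[str], None],
    max_steps: int = 10,
) -> int:
    """Run observe -> reason -> act until reason() returns None or max_steps is hit.

    Returns the number of actions executed.
    """
    steps = 0
    while steps < max_steps:
        observation = observe()        # Observe: e.g. take a screenshot
        action = reason(observation)   # Reason: the model picks the next action
        if action is None:             # The model signals that the task is done
            break
        act(action)                    # Act: click, type, scroll
        steps += 1
    return steps


# Toy stand-ins: the task counts as done once three clicks have happened.
state = {"clicks": 0}


def fake_observe() -> str:
    return f"clicks={state['clicks']}"


def fake_reason(observation: str) -> Optional[str]:
    return None if observation == "clicks=3" else "click"


def fake_act(action: str) -> None:
    state["clicks"] += 1


steps_taken = run_loop(fake_observe, fake_reason, fake_act)
print(steps_taken)
```

The max_steps cap in this sketch plays the same protective role as a budget limit: it guarantees the loop ends even if the model never reports completion.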

To run an agent loop, create a Computer, pass it to a ComputerAgent as a tool, and iterate over agent.run():

from agent import ComputerAgent
import asyncio
from computer import Computer


async def take_screenshot():
    async with Computer(
        os_type="linux",
        provider_type="cloud",
        name="your-sandbox-name",
        api_key="your-api-key"
    ) as computer:

        agent = ComputerAgent(
            model="anthropic/claude-sonnet-4-5-20250929",
            tools=[computer],
            max_trajectory_budget=5.0
        )

        messages = [{"role": "user", "content": "Take a screenshot and tell me what you see"}]

        async for result in agent.run(messages):
            for item in result["output"]:
                if item["type"] == "message":
                    print(item["content"][0]["text"])


if __name__ == "__main__":
    asyncio.run(take_screenshot())

For a list of supported models and configurations, see the Vision Language Models page.

Configuration

Before running an agent, set the appropriate API key as an environment variable:

# For cloud sandboxes and Cua VLM Router
export CUA_API_KEY="your-cua-api-key"

# For direct provider access (BYOK - Bring Your Own Key)
export ANTHROPIC_API_KEY="your-anthropic-key"
export OPENAI_API_KEY="your-openai-key"

Set CUA_API_KEY if you use Cua's cloud infrastructure or VLM Router. Set the provider-specific key instead if you connect directly to Anthropic or OpenAI.
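
If you want a script to fail early when the right key is missing, a small check along these lines may help. It is a sketch under assumptions: pick_api_key_var and require_key are hypothetical helpers, not part of the agent or computer packages, and the only variable names used are the three listed above.

```python
import os
from typing import Mapping, Optional


def pick_api_key_var(use_cua: bool, provider: str = "anthropic") -> str:
    """Return the name of the environment variable to read for this setup."""
    if use_cua:
        # Cloud sandboxes and the Cua VLM Router share one key.
        return "CUA_API_KEY"
    provider_vars = {
        "anthropic": "ANTHROPIC_API_KEY",
        "openai": "OPENAI_API_KEY",
    }
    if provider not in provider_vars:
        raise ValueError(f"unknown provider: {provider}")
    return provider_vars[provider]


def require_key(var_name: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Return the key's value, or raise a clear error if it is unset or empty."""
    env = os.environ if env is None else env
    value = env.get(var_name)
    if not value:
        raise RuntimeError(f"{var_name} is not set")
    return value
```

Calling require_key(pick_api_key_var(use_cua=True)) before constructing the agent surfaces a missing key as a readable error instead of a failed request later on.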

Sending Tasks

You can send tasks to the agent as a simple string or as a message list for multi-turn conversations:

# Single task (process() stands in for your own result handler)
async for result in agent.run("Open Firefox and go to google.com"):
    process(result)

# Multi-turn conversation (continues from previous context)
messages = [
    {"role": "user", "content": "Take a screenshot"},
    {"role": "assistant", "content": "Done. I can see a desktop with Firefox open."},
    {"role": "user", "content": "Now click the search bar"}
]
async for result in agent.run(messages):
    process(result)

Multi-turn conversations are useful when you need the agent to build on previous actions or when implementing human-in-the-loop workflows.
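
For a human-in-the-loop workflow, the message list has to grow between runs. The sketch below shows one way to build it, using the same role/content shape as the example above. next_turn is a hypothetical helper, and passing the agent's reply back as a plain assistant string is an assumption taken from that example.

```python
from typing import Dict, List

Message = Dict[str, str]


def next_turn(history: List[Message], assistant_text: str, user_text: str) -> List[Message]:
    """Return a new message list with the agent's reply and the next user request appended."""
    return history + [
        {"role": "assistant", "content": assistant_text},
        {"role": "user", "content": user_text},
    ]


# Rebuild the three-message conversation from the example above.
messages = [{"role": "user", "content": "Take a screenshot"}]
messages = next_turn(
    messages,
    assistant_text="Done. I can see a desktop with Firefox open.",
    user_text="Now click the search bar",
)
print(len(messages))
```

Returning a new list rather than mutating the old one keeps earlier turns available if you need to retry a step with a different instruction.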

Processing Results

Each iteration yields a result containing the agent's outputs and token usage:

async for result in agent.run(messages):
    for item in result["output"]:
        # Structural pattern matching requires Python 3.10 or later
        match item["type"]:
            case "message":
                # Agent's text response
                print(item["content"][0]["text"])
            case "computer_call":
                # Action the agent is taking (screenshot, click, type)
                print(f"Action: {item['action']['type']}")
            case "computer_call_output":
                # Result of the action (e.g., screenshot image)
                print(f"Output received for {item['call_id']}")

    # Track costs
    print(f"Cost so far: ${result['usage']['response_cost']:.4f}")
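
When you want to log or test result handling without a live sandbox, it can help to reduce each result to plain data. The following is a sketch only: summarize is a hypothetical helper, and the sample dictionary is hand-written to mirror the fields used in the snippet above rather than captured from a real run.

```python
from typing import Any, Dict, List


def summarize(result: Dict[str, Any]) -> Dict[str, List[str]]:
    """Collect message texts and action types from one yielded result."""
    texts: List[str] = []
    actions: List[str] = []
    for item in result["output"]:
        if item["type"] == "message":
            texts.append(item["content"][0]["text"])
        elif item["type"] == "computer_call":
            actions.append(item["action"]["type"])
        # Other item types, such as computer_call_output, are skipped here.
    return {"texts": texts, "actions": actions}


# Hand-written sample shaped like the items handled above.
sample = {
    "output": [
        {"type": "computer_call", "call_id": "call_1", "action": {"type": "screenshot"}},
        {"type": "computer_call_output", "call_id": "call_1"},
        {"type": "message", "content": [{"text": "I see a desktop."}]},
    ],
    "usage": {"response_cost": 0.0123},
}

summary = summarize(sample)
print(summary)
```

The if/elif chain is used instead of match so the helper also runs on Python versions earlier than 3.10.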

Streaming Responses

For real-time feedback, enable streaming to receive partial results as they're generated:

async for result in agent.run(messages, stream=True):
    for item in result["output"]:
        if item["type"] == "message":
            # Print text as it arrives
            print(item["content"][0]["text"], end="", flush=True)

Handling Errors

The agent respects the max_trajectory_budget you set. When the budget is exceeded, the agent raises an exception:

from agent.exceptions import BudgetExceededException

agent = ComputerAgent(
    model="anthropic/claude-sonnet-4-5-20250929",
    tools=[computer],
    max_trajectory_budget=5.0  # Max $5 per run
)

try:
    async for result in agent.run(messages):
        process(result)
except BudgetExceededException:
    print("Task exceeded budget limit")

This prevents runaway costs when agents get stuck in loops or take longer than expected.
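
To make the mechanism concrete, here is a toy model of a spending cap. It is not the library's implementation: BudgetTracker and BudgetExceeded are hypothetical names chosen so they are not confused with BudgetExceededException, and the per-step costs are made up.

```python
class BudgetExceeded(Exception):
    """Hypothetical stand-in for illustration, not the library's exception."""


class BudgetTracker:
    """Accumulates cost and raises once the running total passes the limit."""

    def __init__(self, limit: float) -> None:
        self.limit = limit
        self.spent = 0.0

    def charge(self, cost: float) -> None:
        self.spent += cost
        if self.spent > self.limit:
            raise BudgetExceeded(f"spent ${self.spent:.2f} of ${self.limit:.2f}")


tracker = BudgetTracker(limit=5.0)
completed_steps = 0
try:
    for step_cost in [2.0, 2.0, 2.0, 2.0]:  # made-up per-step costs
        tracker.charge(step_cost)
        completed_steps += 1
except BudgetExceeded:
    stopped_early = True
else:
    stopped_early = False

print(completed_steps, stopped_early)
```

In this sketch the third charge pushes the total past the limit, so the loop stops after two completed steps. Work done before the exception is not rolled back, which is worth keeping in mind when you decide how to resume or report a task that hit its budget.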
