Build Modular Computer-Use Agents with Cua in 2025

Computer-use agents are on the verge of transforming how we interact with technology. These autonomous AI agents can observe a screen, reason about a task, and use a mouse and keyboard to control graphical user interfaces (GUIs), from desktops and browsers to complex software. They act as intelligent partners: because they work through the same interface a person uses, they can automate digital work even in applications that offer no API.
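
The observe, reason, act cycle described above can be sketched as a loop. This is a minimal illustration with stand-in parts, not how Cua implements it. Action, Screen, FakeEnvironment, and scripted_policy are hypothetical names. A real agent would send screenshots to a model rather than follow a scripted plan.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Action:
    """One GUI action (hypothetical schema)."""
    kind: str       # "click", "type", or "done"
    x: int = 0
    y: int = 0
    text: str = ""


@dataclass
class Screen:
    """Stand-in for a screenshot; here it only records how many actions have run."""
    actions_so_far: int


class FakeEnvironment:
    """Hypothetical environment that logs actions instead of driving a real mouse and keyboard."""

    def __init__(self) -> None:
        self.log: List[Action] = []

    def observe(self) -> Screen:
        return Screen(actions_so_far=len(self.log))

    def act(self, action: Action) -> None:
        self.log.append(action)


Policy = Callable[[str, Screen], Optional[Action]]


def run_agent(env: FakeEnvironment, policy: Policy, task: str, max_steps: int = 10) -> List[Action]:
    """Observe, reason, act until the policy signals completion or the step budget runs out."""
    for _ in range(max_steps):
        screen = env.observe()          # observe the current screen
        action = policy(task, screen)   # reason about the next step
        if action is None or action.kind == "done":
            break
        env.act(action)                 # act on the GUI
    return env.log


def scripted_policy(task: str, screen: Screen) -> Optional[Action]:
    """Toy policy: click a search box, type the task, then report done."""
    plan = [Action("click", x=640, y=80), Action("type", text=task), Action("done")]
    step = screen.actions_so_far
    return plan[step] if step < len(plan) else None


if __name__ == "__main__":
    for action in run_agent(FakeEnvironment(), scripted_policy, "weather in Paris"):
        print(action)
```

The max_steps budget matters in practice. A model that never declares the task done would otherwise keep clicking indefinitely.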

As we stand in November 2025, the next great leap isn't just about building more powerful, monolithic models. It's about building smarter, more reliable frameworks. At Cua, we're leading this charge by providing an open, modular, and scalable framework designed to build and deploy sophisticated Cua Agents. Our tools empower you to move beyond single-model limitations and orchestrate teams of specialized AI agents that are robust, efficient, and ready for production.

Why a Modular Approach is Essential for AI Agents

For years, the pursuit of artificial general intelligence (AGI) was dominated by the idea of a single, all-knowing model. However, the agentic AI space is rapidly moving away from this monolithic approach toward modular systems [4]. Relying on one model for everything—planning, vision, and action—is like hiring a single person to be your company's CEO, accountant, and lead engineer. It doesn't scale, and it's prone to failure.

Modular design, inspired by proven software engineering principles like microservices, offers a more powerful path forward [2]. By breaking down complex problems, we can assign different parts of a task to specialized models, each excelling at its specific function. This architectural evolution brings several key advantages:

  • Improved Reliability: Isolate failures to individual components instead of having the entire system crash.
  • Easier Debugging: Pinpoint issues within specific modules, making agents significantly easier to debug and improve [1].
  • Greater Flexibility: Swap models in and out as new, better ones become available without rewriting your entire system [3].
  • Enhanced Performance: Pair a top-tier planning model with a highly accurate vision model so each handles the part of the task it does best. The combination can outperform either model used alone.
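
The following is a minimal sketch of the idea. It assumes a two-module split into a planner and a grounder. Each module sits behind a small interface (a typing.Protocol), so you can swap implementations independently. Every failure is tagged with the module that caused it, which is what makes a modular agent easier to debug. All class names here are hypothetical and unrelated to the Cua SDK.

```python
from dataclasses import dataclass
from typing import Dict, Protocol, Tuple


class Planner(Protocol):
    def next_step(self, task: str) -> str:
        """Return a natural-language description of the next UI target."""
        ...


class Grounder(Protocol):
    def locate(self, target: str) -> Tuple[int, int]:
        """Return screen coordinates for a described UI element."""
        ...


class ModuleError(RuntimeError):
    """Wraps a failure together with the name of the module that raised it."""

    def __init__(self, module: str, cause: Exception) -> None:
        super().__init__(f"{module} failed: {cause}")
        self.module = module


@dataclass
class ModularAgent:
    planner: Planner
    grounder: Grounder

    def step(self, task: str) -> Tuple[str, Tuple[int, int]]:
        """Plan the next target, then ground it to coordinates."""
        try:
            target = self.planner.next_step(task)
        except Exception as exc:
            raise ModuleError("planner", exc) from exc
        try:
            point = self.grounder.locate(target)
        except Exception as exc:
            raise ModuleError("grounder", exc) from exc
        return target, point


class KeywordPlanner:
    """Toy planner that picks a target from keywords in the task."""

    def next_step(self, task: str) -> str:
        return "search box" if "search" in task else "submit button"


class TableGrounder:
    """Toy grounder that looks targets up in a fixed table."""

    def __init__(self, table: Dict[str, Tuple[int, int]]) -> None:
        self.table = table

    def locate(self, target: str) -> Tuple[int, int]:
        return self.table[target]  # raises KeyError for unknown targets


if __name__ == "__main__":
    agent = ModularAgent(KeywordPlanner(), TableGrounder({"search box": (640, 80)}))
    print(agent.step("search for cua"))  # ('search box', (640, 80))
    # Swap the grounder without touching the planner
    agent.grounder = TableGrounder({"search box": (600, 90)})
    print(agent.step("search again"))    # ('search box', (600, 90))
    try:
        agent.step("press submit")
    except ModuleError as err:
        print(err.module)                # grounder
```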

This modular philosophy is the foundation of the Cua framework, enabling developers to build the next generation of truly effective Computer-use agents.

The Cua Framework: Actionable Steps for Building Modular Agents

With our open-source Agent and Computer SDKs, we've made building modular agents not just possible, but practical. Here's how you can start building with our framework today.

Step 1: Unify Your Models with the Agent SDK

One of the biggest headaches in building agents is the sheer variety of model APIs. Each model expects different inputs, produces different outputs, and has its own unique way of representing coordinates or actions.

Our Agent library solves this. The ComputerAgent class provides a unified interface that abstracts away these complexities. You simply choose a model from our list of All-in-one CUA Models—or any model compatible with LiteLLM—and our framework handles the rest. No more custom parsers or input formatters.

```python
# This works the same for Anthropic, OpenAI, or a local Hugging Face model
from agent import ComputerAgent

# `computer` is a Computer instance, created as in the full examples later in this post
agent = ComputerAgent(
    model="anthropic/claude-sonnet-4-5-20250929",  # or any other supported model
    tools=[computer],
)
```
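
The sketch below gives a sense of the per-model glue code this abstraction spares you. It maps three made-up output formats onto one Click type: absolute pixels, a 0-1000 grid, and 0-1 fractions of the screen. The formats, the model keys, and the default 1024x768 screen size are hypothetical. They are not tied to any real model or to Cua's internals.

```python
import re
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Click:
    """Unified action: a click at absolute screen pixels."""
    x: int
    y: int


def from_pixels(raw: dict, width: int, height: int) -> Click:
    # Hypothetical format A: {"x": 512, "y": 192} already in screen pixels
    return Click(int(raw["x"]), int(raw["y"]))


def from_thousandths(raw: str, width: int, height: int) -> Click:
    # Hypothetical format B: "click(500, 250)" on a 0-1000 grid
    match = re.fullmatch(r"click\((\d+),\s*(\d+)\)", raw.strip())
    if match is None:
        raise ValueError(f"unparseable action: {raw!r}")
    gx, gy = int(match.group(1)), int(match.group(2))
    return Click(round(gx / 1000 * width), round(gy / 1000 * height))


def from_fractions(raw: list, width: int, height: int) -> Click:
    # Hypothetical format C: [0.5, 0.25] as fractions of the screen size
    fx, fy = raw
    return Click(round(fx * width), round(fy * height))


# One adapter per (hypothetical) model family
ADAPTERS = {
    "model-a": from_pixels,
    "model-b": from_thousandths,
    "model-c": from_fractions,
}


def normalize(model: str, raw: Any, width: int = 1024, height: int = 768) -> Click:
    """Convert a model-specific action into a unified Click."""
    return ADAPTERS[model](raw, width, height)


if __name__ == "__main__":
    print(normalize("model-a", {"x": 512, "y": 192}))  # Click(x=512, y=192)
    print(normalize("model-b", "click(500, 250)"))     # Click(x=512, y=192)
    print(normalize("model-c", [0.5, 0.25]))           # Click(x=512, y=192)
```

Multiply this by every model and action type you want to support, and the appeal of a single interface becomes clear.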

Step 2: Create Composite Agents for Specialized Roles

This is where the magic of modularity truly shines. We realized that you don't need one model to do everything. As we detailed in our Composite Agents blog post, you can combine a "planning" model (great at reasoning and task breakdown) with a "grounding" model (excellent at visual perception and UI interaction).

With Cua, creating a Composite Agent is as simple as joining two model names with a + sign: the grounding model goes first and the planning model second.

```python
from agent import ComputerAgent

# Create a composite agent:
# the grounding model (GTA1-7B) handles vision,
# the planning model (GPT-4o) handles reasoning
agent = ComputerAgent(
    model="huggingface-local/HelloKKMe/GTA1-7B+openai/gpt-4o",
    tools=[computer],
)
```

This feature lets you pair a large, general-purpose reasoning model like GPT-4o with a highly specialized vision model. The vision model supplies precise UI grounding (locating the exact element to click), which GPT-4o lacks on its own.
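
The hypothetical parser below makes the + convention concrete. It splits a composite model string into its two roles, following the grounding-first order of the example above. Each part is split only on its first "/", so Hugging Face repository IDs such as HelloKKMe/GTA1-7B stay intact. This illustrates the naming convention only; Cua's own parsing may differ.

```python
from typing import NamedTuple


class ModelRef(NamedTuple):
    provider: str  # e.g. "openai" or "huggingface-local"
    name: str      # e.g. "gpt-4o" or "HelloKKMe/GTA1-7B"


class CompositeSpec(NamedTuple):
    grounding: ModelRef
    planning: ModelRef


def parse_model_ref(ref: str) -> ModelRef:
    """Split 'provider/model' on the first slash only."""
    provider, sep, name = ref.partition("/")
    if not sep or not name:
        raise ValueError(f"expected 'provider/model', got {ref!r}")
    return ModelRef(provider, name)


def parse_composite(model: str) -> CompositeSpec:
    """Split 'grounding+planning' into its two roles (hypothetical helper)."""
    parts = model.split("+")
    if len(parts) != 2:
        raise ValueError(f"expected 'grounding+planning', got {model!r}")
    grounding, planning = (parse_model_ref(part.strip()) for part in parts)
    return CompositeSpec(grounding, planning)


if __name__ == "__main__":
    spec = parse_composite("huggingface-local/HelloKKMe/GTA1-7B+openai/gpt-4o")
    print(spec.grounding)  # ModelRef(provider='huggingface-local', name='HelloKKMe/GTA1-7B')
    print(spec.planning)   # ModelRef(provider='openai', name='gpt-4o')
```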

Step 3: Deploy and Scale with Cua Containers

Running AI agents that have full control of a mouse and keyboard directly on your local machine is risky: a single mistaken click or command can delete files or change settings on the computer you work on. To build production-ready agents, you need isolation and scalability.

That's why we built Cua Containers. Think of our Cua Cloud Sandbox as Docker for Computer-use agents. With a simple API call, you can spin up a fully configured Computer instance in the cloud, complete with a desktop environment, browser, and all the tools your agent needs.

  • Isolation: Each agent runs in a secure, sandboxed environment.
  • Scalability: Run one agent or a hundred in parallel without managing any infrastructure.
  • Portability: Your workflow runs the same way every time, regardless of your local machine's hardware.
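
Even when the infrastructure is managed for you, you will usually want to cap how many agents are active at the same time, for example to stay within a model provider's rate limits. A standard asyncio pattern for this is a semaphore. In the sketch below, run_task is a hypothetical placeholder for one agent run. In practice it would create a Computer and a ComputerAgent as in the examples in this post.

```python
import asyncio
from typing import List, Tuple


async def run_task(task_id: int) -> str:
    """Hypothetical stand-in for one agent run in its own sandbox."""
    await asyncio.sleep(0.01)  # simulate the agent doing work
    return f"task-{task_id} done"


async def run_all(n_tasks: int, max_concurrent: int) -> Tuple[List[str], int]:
    """Run n_tasks agents with at most max_concurrent active at any moment.

    Returns the results (in submission order) and the peak concurrency observed.
    """
    limit = asyncio.Semaphore(max_concurrent)
    active = 0
    peak = 0

    async def guarded(task_id: int) -> str:
        nonlocal active, peak
        async with limit:  # waits here while max_concurrent runs are active
            active += 1
            peak = max(peak, active)
            try:
                return await run_task(task_id)
            finally:
                active -= 1

    results = await asyncio.gather(*(guarded(i) for i in range(n_tasks)))
    return list(results), peak


if __name__ == "__main__":
    results, peak = asyncio.run(run_all(n_tasks=100, max_concurrent=10))
    print(len(results), "runs finished; peak concurrency:", peak)
```

asyncio.gather returns results in the order the coroutines were passed in, so each result can be matched back to its task even though they finish in arbitrary order.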

Cua Agents in Action: From Simple Tasks to Parallel Workflows

The modularity of the Cua framework allows agents to tackle a wide range of tasks. Here are a few examples of what you can build.

Example 1: Automating a GitHub Workflow

A Cua Agent can perform a complete developer workflow, from identifying an issue to creating a pull request. This example uses a single agent that controls both a web browser and a terminal inside a Cua Container.

Tasks include:

  1. Navigate to a GitHub repository.
  2. Read the title of the most recent open issue.
  3. Clone the repository into the Computer instance.
  4. Create a new branch named after the issue.
  5. Make a simple code change to address the issue.
  6. Commit and push the changes, then create a pull request.
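
You may prefer to fix the branch-naming convention yourself rather than leave it to the model, or to build the task prompt programmatically. A couple of small helpers are enough for either. Both functions below are hypothetical and not part of the Cua SDK, and OWNER/REPO is a placeholder. You could pass the resulting string to ComputerAgent.run the same way the other examples in this post pass their tasks.

```python
import re
from typing import List


def branch_name_for_issue(title: str, max_len: int = 50) -> str:
    """Turn an issue title into a git-friendly branch name (hypothetical helper)."""
    # Lowercase, replace every run of non-alphanumerics with "-", trim edge dashes
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    if not slug:
        slug = "issue-fix"  # fallback for titles with no letters or digits
    return slug[:max_len].rstrip("-")


def github_workflow_task(repo_url: str) -> str:
    """Join the workflow steps into one numbered natural-language task."""
    steps: List[str] = [
        f"Navigate to the GitHub repository at {repo_url}.",
        "Read the title of the most recent open issue.",
        "Clone the repository into this computer.",
        "Create a new branch named after the issue.",
        "Make a simple code change to address the issue.",
        "Commit and push the changes, then create a pull request.",
    ]
    return "\n".join(f"{i}. {step}" for i, step in enumerate(steps, start=1))


if __name__ == "__main__":
    print(branch_name_for_issue("Fix crash when config file is missing!"))
    # fix-crash-when-config-file-is-missing
    print(github_workflow_task("https://github.com/OWNER/REPO"))
```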

Example 2: Running Parallel Web Scraping Agents

With Cua Cloud Sandbox, you can run multiple agents in parallel. The following snippet starts three agents, each with its own cloud Computer instance, to collect the main headlines from three different websites at the same time.

```python
import asyncio
import os

from agent import ComputerAgent
from computer import Computer, VMProviderType


async def scrape_website(site_name, url):
    """Scrape a website using a cloud agent."""
    computer = Computer(
        os_type="linux",
        api_key=os.getenv("CUA_API_KEY"),
        name=f"scraper-{site_name}",
        provider_type=VMProviderType.CLOUD,
    )
    agent = ComputerAgent(
        model="openai/gpt-4o",
        save_trajectory=True,
        tools=[computer],
    )
    task = f"Navigate to {url} and extract the main headlines."
    async for result in agent.run(task):
        print(f"[{site_name}] Response: {result.get('text')}")


async def parallel_scraping():
    """Scrape multiple websites in parallel."""
    sites = [
        ("ArXiv", "https://arxiv.org"),
        ("HackerNews", "https://news.ycombinator.com"),
        ("TechCrunch", "https://techcrunch.com"),
    ]
    # Run all scraping tasks in parallel
    tasks = [scrape_website(name, url) for name, url in sites]
    await asyncio.gather(*tasks)


# Run parallel scraping
asyncio.run(parallel_scraping())
```
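
This snippet calls asyncio.gather with its default settings. If any one agent raises (a page that never loads, say), the exception propagates out of asyncio.run, and the other runs are cancelled as the event loop shuts down. One variation is to pass return_exceptions=True and report failures per site. The scrape coroutine below is a hypothetical stand-in for scrape_website that returns a value, so the pattern runs on its own. Its "unreachable" URL check exists only to simulate a failure.

```python
import asyncio
from typing import Dict, List, Tuple


async def scrape(site_name: str, url: str) -> str:
    """Hypothetical stand-in for scrape_website that returns a summary."""
    await asyncio.sleep(0)
    if "unreachable" in url:  # simulated failure for illustration only
        raise ConnectionError(f"could not load {url}")
    return f"headlines from {site_name}"


async def scrape_all(sites: List[Tuple[str, str]]) -> Dict[str, str]:
    """Run every scrape concurrently; a failure is reported without losing the other results."""
    results = await asyncio.gather(
        *(scrape(name, url) for name, url in sites),
        return_exceptions=True,  # failures come back as exception objects
    )
    report: Dict[str, str] = {}
    for (name, _), outcome in zip(sites, results):
        if isinstance(outcome, Exception):
            report[name] = f"FAILED: {outcome}"
        else:
            report[name] = outcome
    return report


if __name__ == "__main__":
    sites = [("ArXiv", "https://arxiv.org"), ("Broken", "https://unreachable.example")]
    for name, outcome in asyncio.run(scrape_all(sites)).items():
        print(f"[{name}] {outcome}")
```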

Get Started with Your First Modular Agent in 5 Minutes

Ready to build? You can launch your first modular agent in just a few minutes.

  1. Get Your API Key: Sign up for a free account at trycua.com to get your Cua Cloud API key, then export it as the CUA_API_KEY environment variable, which the code below reads.

  2. Install the SDK: Install our open-source Python libraries.

```bash
pip install cua-agent cua-computer
```

  3. Write and Run Your Agent: The following code creates a Composite Agent and runs it in a Cua Cloud Sandbox to perform a simple web search.
```python
import asyncio
import os

from agent import ComputerAgent
from computer import Computer, VMProviderType


async def run_modular_agent():
    # Step 1: Create a remote Computer instance with Cua Cloud
    computer = Computer(
        os_type="linux",
        api_key=os.getenv("CUA_API_KEY"),
        name="my-modular-agent-container",
        provider_type=VMProviderType.CLOUD,
    )

    # Step 2: Create a Composite Agent
    # (UI-TARS-1.5-7B handles grounding, Claude Sonnet 4.5 handles planning)
    agent = ComputerAgent(
        model="huggingface-local/ByteDance-Seed/UI-TARS-1.5-7B+anthropic/claude-sonnet-4-5-20250929",
        tools=[computer],
    )

    # Step 3: Run a task
    task = "Open a web browser, search for 'latest news in AI agent frameworks', and list the top 3 results."
    async for result in agent.run(task):
        print(result.get('text'))


# Run the agent
asyncio.run(run_modular_agent())
```
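
Both examples pass os.getenv("CUA_API_KEY") directly to Computer. If the variable is missing, os.getenv returns None without complaint, so the problem only surfaces later, in whatever form the SDK reports it. A small fail-fast check makes the cause obvious. require_env and MissingConfigError are hypothetical helpers, not part of the Cua SDK.

```python
import os


class MissingConfigError(RuntimeError):
    """Raised when a required environment variable is absent or empty (hypothetical)."""


def require_env(name: str) -> str:
    """Return an environment variable's value, failing fast with a clear message."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise MissingConfigError(
            f"{name} is not set; export it before running the agent "
            f"(e.g. export {name}=<your key>)"
        )
    return value
```

You would then write api_key=require_env("CUA_API_KEY") in place of os.getenv("CUA_API_KEY") when creating the Computer.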

Join the Future of Computer Automation

The era of modular, scalable, and reliable Computer-use agents is here. With Cua's open framework, you have the tools to build powerful automations that were previously out of reach.

Ready to dive deeper? Explore our documentation on customizing your ComputerAgent.

Citations

[1] https://dev.to/exploredataaiml/building-intelligent-ai-agents-with-modular-reinforcement-learning-323c

[2] https://vectorize.io/blog/designing-agentic-ai-systems-part-2-modularity

[3] https://linkedin.com/posts/brijpandeyji_building-truly-modular-agentic-ai-systems-activity-7356924814625751040-WQOp

[4] https://databricks.com/blog/ai-agent-systems