Cua: Open Source Framework for Automating Desktop Tasks

The Next Leap in AI: From Answering to Acting

For years, AI has excelled at answering our questions. We ask, and it provides information. But what if AI could go a step further? What if it could act on those answers? Imagine an AI that doesn't just tell you how to complete a task on your computer but actually does it for you—seeing your screen, clicking buttons, and typing text just like a person would.

This new frontier is the realm of "computer-use agents," or CUAs. These intelligent agents are designed to handle complex digital workflows on your behalf. To make building these agents fast, simple, and safe, we created Cua, a powerful open source computer use agent framework.

What is a Computer-Use Agent?

A computer-use agent (CUA) is an AI system built to interact with software through its graphical user interface (GUI)—the same way you do. Think of it as an AI that can see and use any application on your desktop or in your browser.

This is a major evolution from traditional automation tools like scripts or macros. While those tools are effective for repetitive, unchanging tasks, they are rigid and often break with the smallest update to a user interface. CUAs, on the other hand, use advanced vision-language models to perceive and understand the screen. This gives them a human-like ability to adapt to changes, locate buttons even if they move, and navigate dynamic digital environments. This capability is revolutionizing automation, with major players in the AI space actively developing their own agents [5].

The Core Challenge: Why Building Agents Can Be Risky

Giving an AI direct control over your computer sounds powerful, but it also comes with significant security and stability risks. What happens if an untrained or buggy agent misunderstands a command? It could accidentally delete important files, change critical system settings, or completely break your development environment.

One developer shared their experience, which perfectly illustrates the problem: "VM for Agents. Just today, my agent setup broke my computer, preventing disk writing (which also means most programs won't start)." This risk is a major barrier holding back the widespread development and adoption of computer-use agents.

Introducing Cua: A Secure Sandbox for AI Agents

We designed Cua to solve this safety problem head-on. Cua, which stands for Containers for Computer-Use AI Agents, is a complete, open-source framework for developing, containerizing, and deploying CUAs safely across Windows, macOS, and Linux.

The core of our platform is the use of secure, isolated containers (like Docker and virtual machines) that act as a "sandbox" for the agent. This means the AI operates in a controlled environment, completely separate from your main operating system. It can click, type, and browse freely inside the sandbox without any risk of affecting your personal files or system settings. Cua provides the secure foundation you need to build with confidence. The entire project is open source and available for you to explore on GitHub [3].

Get Started with Cua in Minutes

We believe that building powerful AI agents shouldn't be complicated. Cua is designed for a fast and intuitive developer experience, allowing you to create your first open source computer use agent quickly in just a few simple steps.

Step 1: Install the Cua Agent SDK

Getting started is as simple as installing our Python package. Open your terminal and run the following command:

bash
pip install cua-agent

Step 2: Write Your First Agent

With just a few lines of Python, you can initialize and run your first agent. This code sets up a sandboxed computer instance, assigns an AI model to control it, and gives it a simple task.

python
from cua_agent import Computer, ComputerAgent from cua_agent.llm import ask_llm # Initialize a sandboxed computer computer = Computer() # Give it a task task = "Take a screenshot of the desktop." # Create the agent and run the task agent = ComputerAgent(computer, ask_llm) agent.run(task)

In this example, the agent will launch in its secure environment and perform the requested action—in this case, taking a screenshot.

Step 3: Set Up Your Computer Environment

Cua offers several options for running your sandboxed computer environment, so you can choose the one that best fits your needs. You can find detailed setup instructions in our quickstart guide.

  • Cua Cloud (Recommended): The easiest way to get started. No local setup is required.

  • Local Docker: Run a container on any OS (Windows, macOS, Linux).

  • Windows Sandbox: Use the native Windows sandboxing feature.

  • Local macOS VM: Run a virtual machine directly on your Mac.

Step 4: Choose Your AI Model

Cua is model-agnostic, giving you the freedom to choose the best AI for your project. Our framework integrates with LiteLLM, which provides a universal interface for a wide range of Large Language Models (LLMs). This means you can easily use models from:

  • Anthropic (Claude series)

  • Google (Gemini series)

  • OpenAI (GPT series)

  • Various open-source models

Cua's Key Features for Building Robust Agents

Beyond the basics, Cua is a full-featured platform with tools designed to help you build, scale, and deploy sophisticated AI agents.

Unified and Scalable Infrastructure

We provide all the core components you need as open source, including our Agent SDK, Computer SDK, macOS virtualization tools, and Docker images. For projects that require unlimited scale, our cloud-powered sandbox offers a simple API, zero infrastructure management, and pay-as-you-go billing. You can choose between Linux and Windows sandboxes to match your target environment.

Model Context Protocol (MCP) Integration

For developers who want to integrate Cua with existing clients, our platform supports the Model Context Protocol (MCP). This standard allows different AI applications to communicate seamlessly. For example, you could run Cua agents directly through an application like the Claude Desktop app, helping to streamline complex workflows and unlock new possibilities for desktop AI [4].

App-Use: Focused Control for Specialized Tasks

We are also developing an experimental feature called "App-Use," which restricts an agent's control to a single application window. This allows you to create lightweight, highly specialized agents that can run in parallel without interfering with each other. Imagine one agent researching information in a web browser while another simultaneously drafts a report in a word processor—all managed safely and efficiently.

Cua in the Open-Source Ecosystem

The developer community's excitement around computer-use agents is growing rapidly, and we were thrilled when Cua recently became the #1 trending repository on GitHub. We are proud to be a key player in this vibrant ecosystem.

We believe in the power of open collaboration and encourage developers to explore the landscape to make informed decisions. Other fantastic projects are also pushing the boundaries of what's possible, such as OpenCUA, which is building open foundations for agents [1], and LangGraph's CUA implementation [2].

Conclusion: Build the Future of Automation with Cua

Cua offers a powerful, secure, and easy-to-use open-source framework for building the next generation of AI automation. By providing secure sandboxing, rapid development tools, and complete model flexibility, we empower developers to create robust automation solutions safely and efficiently.

The future of automation is here, and it's interactive, intelligent, and open. We invite you to join our community and start building with Cua today.

Ready to create your first agent? Dive into our documentation to learn more and get started.

Citations

[1] https://github.com/xlang-ai/OpenCUA

[2] https://github.com/langchain-ai/langgraph-cua-py

[3] https://github.com/trycua/cua

[4] https://skywork.ai/skypage/en/unlocking-desktop-ai-cua-mcp-servers/1981181639448776704

[5] https://workos.com/blog/anthropics-computer-use-versus-openais-computer-using-agent-cua