Introduction
Cua is an open-source framework for building Computer-Use Agents - AI systems that see, understand, and interact with desktop applications through vision and action, just like humans do.
Why Cua?
Cua gives you everything you need to automate any desktop application without relying on brittle selectors or application-specific APIs.
Some highlights include:
- Model flexibility - Connect to 100+ LLM providers through LiteLLM's standard interface. Use models from Anthropic, OpenAI, Google, and more - or run them locally with Ollama, Hugging Face, or MLX.
- Composed agents - Mix and match grounding models with planning models for optimal performance. Use specialized models like GTA, OpenCUA, or OmniParser for UI element detection paired with powerful reasoning models like Claude or GPT-4.
- Cross-platform sandboxes - Run agents safely in isolated environments. Choose from Docker containers, macOS VMs with Lume, Windows Sandbox, or deploy to Cua Cloud with production-ready infrastructure.
- Computer SDK - Control any application with a PyAutoGUI-like API. Click, type, scroll, take screenshots, manage windows, read/write files - everything you need for desktop automation.
- Agent SDK - Build autonomous agents with trajectory tracing, prompt caching, cost tracking, and budget controls. Test agents on industry-standard benchmarks like OSWorld-Verified with one line of code.
- Human-in-the-loop - Pause agent execution and await user input or approval before continuing. Use the human/human model string to let humans control the agent directly.
- Production essentials - Ship reliable agents with built-in PII anonymization, cost tracking, trajectory logging, and integration with observability platforms like Laminar and HUD.
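To make the composed-agents idea above concrete, here is a minimal sketch of how a planning model and a grounding model might be paired. It does not use the Cua SDK: `Action`, `compose`, `stub_planner`, and `stub_grounder` are hypothetical names invented for illustration, and the stubs stand in for real model calls. The planner decides what to do in terms of a described UI element, and the grounder turns that description into screen coordinates.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


# Hypothetical types for illustration only; none of these names come from Cua.
@dataclass
class Action:
    kind: str    # what to do, e.g. "click"
    target: str  # natural-language description of the UI element


Planner = Callable[[str], Action]            # task -> next action
Grounder = Callable[[str], Tuple[int, int]]  # element description -> (x, y)


def compose(planner: Planner, grounder: Grounder) -> Callable[[str], dict]:
    """Pair a planning model with a grounding model into one step function."""

    def step(task: str) -> dict:
        action = planner(task)          # reasoning model picks the action
        x, y = grounder(action.target)  # grounding model locates the element
        return {"kind": action.kind, "x": x, "y": y}

    return step


# Stub models standing in for real LLM calls (assumptions, not real models).
def stub_planner(task: str) -> Action:
    return Action(kind="click", target="Submit button")


def stub_grounder(description: str) -> Tuple[int, int]:
    known = {"Submit button": (640, 480)}
    return known[description]


step = compose(stub_planner, stub_grounder)
print(step("Submit the form"))
```

Keeping the two roles behind separate callables is what would let either model be swapped independently, which is the point of mixing a specialized grounding model with a general reasoning model.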
What can you build?
- RPA automation that works with any application - even legacy software without APIs.
- Form-filling agents that handle complex multi-step web workflows.
- Testing automation that adapts to UI changes without brittle selectors.
- Data extraction from desktop applications and document processing.
- Cross-application workflows that combine multiple tools and services.
- Research agents that browse, read, and synthesize information from the web.
Explore real-world examples in our blog posts.
Get started
Follow the Quickstart guide for step-by-step setup with Python or TypeScript.
If you're new to computer-use agents, check out our tutorials, examples, and notebooks to start building with Cua today.
Quickstart
Get up and running in 3 steps with Python or TypeScript.
Agent Loops
Learn how agents work and how to build your own.
Computer SDK
Control desktop applications with the Computer SDK.
Example Use Cases
See Cua in action with real-world examples.
We can't wait to see what you build with Cua ✨