Architecture
How Cua Driver and Cua Sandbox give an external agent a computer-use interface: one on a real machine, one in a disposable VM or container.
Cua has two separate ways to expose a computer to an external agent. In both cases, the agent sends computer-use operations and receives observations such as screenshots, window lists, accessibility trees, and command results. The operating-system details stay behind a driver or server. The agent, model, or orchestration code remains outside that boundary and decides what to do next.
Two paths to a computer
The real-machine path uses Cua Driver. The computer is an existing macOS, Windows, or Linux machine. Cua Driver runs on that machine and exposes OS-level operations through MCP over stdio and through a CLI. This path is for operating real native applications in the user's environment.
The disposable-computer path uses Cua Sandbox. The computer is a cloud or local VM or container created for the task. Cua Sandbox starts and controls that environment, then talks to a computer-server running inside it over HTTP and WebSocket. This path is for isolated workspaces that can be created, used, and discarded.
These paths are not two backends behind one shared SDK. Cua Driver and Cua Sandbox are separate products. They both provide a computer-use surface, but they use different transports, runtimes, and deployment targets.
Cua Driver: the real machine
Cua Driver is a single Rust binary that operates a real machine. It exposes operations for window discovery, screenshots, accessibility-tree walking, input dispatch, and background operation that does not take over the user's active workspace. Tools can call it through MCP over stdio, or users can call it directly through the CLI.
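As a rough illustration of what "MCP over stdio" looks like on the wire: MCP messages are JSON-RPC 2.0 objects, and the stdio transport carries one JSON object per line. The sketch below only builds and parses such lines. The tool name `list_windows` and its `on_screen_only` argument are hypothetical placeholders, not names taken from Cua Driver's actual tool list.

```python
import json
from itertools import count

_ids = count(1)  # JSON-RPC request ids; any value unique per request works


def frame_tool_call(tool_name: str, arguments: dict) -> str:
    """Build one newline-terminated JSON-RPC 2.0 request for MCP's tools/call."""
    request = {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    # The stdio transport is newline-delimited, so the JSON stays on one line.
    return json.dumps(request, separators=(",", ":")) + "\n"


def parse_response(line: str) -> dict:
    """Return the result of a JSON-RPC response line, or raise on an error reply."""
    message = json.loads(line)
    if "error" in message:
        raise RuntimeError(message["error"].get("message", "unknown error"))
    return message["result"]


# Hypothetical tool name and arguments, for illustration only.
line = frame_tool_call("list_windows", {"on_screen_only": True})
```

In practice an MCP client library would normally handle this framing, along with the `initialize` handshake that precedes any tool call, so callers rarely write these lines by hand.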
Under the hood, Cua Driver uses the native platform APIs for each operating system. On macOS it uses the Accessibility API (AX) plus CoreGraphics. On Windows it uses UI Automation (UIA). On Linux it uses AT-SPI. Those APIs provide the bridge from a model-agnostic computer-use request to concrete OS behavior, such as finding windows, reading accessibility state, capturing pixels, and sending input.
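The per-platform mapping can be summarized as a small lookup. This is only a Python restatement of the paragraph above, keyed by the prefixes that `sys.platform` reports. The function name `native_apis` is made up for this sketch and says nothing about how the Rust binary is organized internally.

```python
import sys

# API families named in the text, keyed by sys.platform prefix.
PLATFORM_APIS = {
    "darwin": ("AX", "CoreGraphics"),
    "win32": ("UIA",),
    "linux": ("AT-SPI",),
}


def native_apis(platform: str = sys.platform) -> tuple:
    """Return the API families the text associates with a platform."""
    for prefix, apis in PLATFORM_APIS.items():
        if platform.startswith(prefix):
            return apis
    raise ValueError(f"unsupported platform: {platform}")
```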
Because it runs on the real machine, Cua Driver inherits that machine's installed applications, user session, permissions, displays, and files. That is the main reason to use it. The tradeoff is that the computer is not disposable, so the caller is responsible for choosing operations that are appropriate for the user's active environment.
Cua Sandbox: a disposable computer
Cua Sandbox creates a disposable cloud or local computer. The environment can be an isolated VM or container, depending on the backend. The Python Sandbox SDK creates and controls that sandbox, then communicates with the computer-server running inside it over REST and WebSocket APIs.
The computer-server is the component inside the sandbox that exposes the computer-use surface for that disposable machine. It receives operations over HTTP and WebSocket, performs them inside the VM or container, and returns observations. The Python sandbox stack talks to this server only over those two transports. It does not invoke Cua Driver, and it does not use MCP.
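To make the HTTP side of that exchange concrete, the sketch below builds, but does not send, a POST request carrying one operation as JSON. Everything specific here is an assumption for illustration: the `/operations` path, the `localhost:8000` address, and the payload shape are placeholders and do not describe computer-server's actual routes or schema.

```python
import json
from urllib.request import Request


def build_operation_request(base_url: str, operation: dict) -> Request:
    """Build (but do not send) an HTTP POST carrying one operation as JSON."""
    body = json.dumps(operation).encode("utf-8")
    return Request(
        url=f"{base_url.rstrip('/')}/operations",  # hypothetical path
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical address and operation payload, for illustration only.
request = build_operation_request("http://localhost:8000", {"type": "screenshot"})
```

The point of the sketch is the transport boundary: the caller needs only an address and a serialized operation, with no MCP session and no driver binary involved.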
Cua Sandbox can target several backends, including Lume and Lumier for macOS VMs, cloud environments, Windows Sandbox, QEMU, Hyper-V, and Docker. The backend determines where the disposable computer runs and what isolation boundary it uses, but the external shape remains the same: create a sandbox, connect to its computer-server, operate the computer, then tear it down when the task is finished.
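The create, connect, operate, tear down shape described above maps naturally onto a context manager. The sketch below uses a hypothetical `FakeSandbox` class as a stand-in, so none of its names (`start`, `connect`, `run`, `stop`) should be read as the Python Sandbox SDK's real API. It only shows why teardown belongs in a `finally` block.

```python
from contextlib import contextmanager
from dataclasses import dataclass, field


@dataclass
class FakeSandbox:
    """Hypothetical stand-in for a disposable computer; not the real SDK."""
    backend: str
    running: bool = False
    log: list = field(default_factory=list)

    def start(self) -> None:
        self.running = True
        self.log.append("created")

    def connect(self) -> None:
        self.log.append("connected")

    def run(self, operation: str) -> str:
        if not self.running:
            raise RuntimeError("sandbox is not running")
        self.log.append(f"operate:{operation}")
        return f"observation for {operation}"

    def stop(self) -> None:
        self.running = False
        self.log.append("torn down")


@contextmanager
def sandbox(backend: str):
    """Create a sandbox, yield it, and always tear it down afterwards."""
    box = FakeSandbox(backend)
    box.start()
    try:
        box.connect()
        yield box
    finally:
        box.stop()  # runs even if connecting or the task raises


with sandbox("docker") as box:
    observation = box.run("screenshot")
```

Because the environment is disposable, guaranteeing teardown matters more than preserving state: a failed task should still leave no sandbox running.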
The model boundary
Cua Driver and computer-server are model-agnostic. They receive operations and return observations. They do not call language models, plan tasks, decide intent, or choose the next action. Cua does not ship a model.
That boundary keeps OS interaction separate from reasoning. The user brings an agent or model, and that system decides what operation to send next based on the observations it receives. Cua provides the computer-use interface either on a real machine through Cua Driver or on a disposable machine through Cua Sandbox and computer-server.
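The boundary can be sketched as a loop in which two callables never cross roles: `decide` stands for the user's agent or model, and `execute` stands for whatever sits behind Cua Driver or computer-server. Both callables, the dictionary-shaped operations, and the operation types (`screenshot`, `click`, `type`) are assumptions made for this illustration, not a Cua interface.

```python
from typing import Callable, Optional

Operation = dict    # e.g. {"type": "click", "x": 10, "y": 20}
Observation = dict  # e.g. {"ok": True, ...}


def run_agent(
    execute: Callable[[Operation], Observation],
    decide: Callable[[Observation], Optional[Operation]],
    max_steps: int = 10,
) -> list:
    """Alternate between the agent's decision and the computer's execution."""
    history = []
    observation = execute({"type": "screenshot"})  # initial observation
    for _ in range(max_steps):
        operation = decide(observation)   # reasoning: outside the boundary
        if operation is None:             # the agent considers the task done
            break
        observation = execute(operation)  # OS interaction: inside the boundary
        history.append((operation, observation))
    return history


# Stubs for illustration: a fake computer and a scripted "agent".
def fake_execute(operation: Operation) -> Observation:
    return {"ok": True, "echo": operation["type"]}


script = iter([{"type": "click", "x": 10, "y": 20}, {"type": "type", "text": "hi"}])


def scripted_decide(observation: Observation) -> Optional[Operation]:
    return next(script, None)


history = run_agent(fake_execute, scripted_decide)
```

Swapping `fake_execute` for a real driver or sandbox client, or `scripted_decide` for a model call, would not change the loop itself, which is the practical meaning of keeping the interface model-agnostic.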