Computer SDK
Python API reference for the Computer SDK
The Computer SDK (cua-computer) provides the Python interface for creating and controlling sandboxed desktop environments. This reference covers the core classes, methods, and types you'll use when working with computers programmatically.
Installation
pip install cua-computer
Core Classes
Computer
The main class for creating and managing sandboxed desktop environments.
from computer import Computer
computer = Computer(
    os_type="linux",
    provider_type="docker",
    image="trycua/cua-xfce:latest"
)
await computer.run()
Constructor Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| os_type | str | Required | Operating system type: "linux", "macos", or "windows" |
| provider_type | str | Required | Provider type: "docker", "lume", "cloud", "qemu", "windows-sandbox", or "host" |
| image | str | None | Container/VM image to use (provider-specific) |
| name | str | "" | Optional name for the computer instance |
| display | str \| dict | "1024x768" | Display resolution (can be string like "1920x1080" or dict like {"width": 1920, "height": 1080}) |
| memory | str | "8GB" | Memory allocation |
| cpu | str | "4" | Number of CPU cores |
| shared_directories | list[str] | None | List of host directories to share with the computer |
| storage | str | None | Path to persistent storage |
| ephemeral | bool | False | Use ephemeral storage (data lost on stop) |
| api_key | str | None | API key for cloud provider (defaults to CUA_API_KEY env var) |
| host | str | "localhost" | Host address for provider connection |
| timeout | int | 100 | Connection timeout in seconds |
| telemetry_enabled | bool | True | Enable telemetry |
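Since display accepts either a "WIDTHxHEIGHT" string or a width/height dict, it can be handy to normalize both forms in your own code (for logging or comparing configurations). The following is a minimal sketch; normalize_display is a hypothetical helper written for illustration and is not part of cua-computer.

```python
import re


def normalize_display(display):
    """Return {"width": int, "height": int} for a "WxH" string or a dict.

    Hypothetical helper: mirrors the two forms the display parameter is
    documented to accept, but is not an SDK function.
    """
    if isinstance(display, dict):
        return {"width": int(display["width"]), "height": int(display["height"])}
    match = re.fullmatch(r"\s*(\d+)\s*x\s*(\d+)\s*", display, flags=re.IGNORECASE)
    if match is None:
        raise ValueError(f"unrecognized display value: {display!r}")
    return {"width": int(match.group(1)), "height": int(match.group(2))}
```

For example, normalize_display("1920x1080") and normalize_display({"width": 1920, "height": 1080}) produce the same dict.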
Methods
run()
Start the computer and establish connection.
await computer.run()
Returns once the computer is ready to accept commands.
stop()
Stop the computer and release resources.
await computer.stop()
restart()
Restart the computer.
await computer.restart()
disconnect()
Disconnect from the computer without stopping it.
await computer.disconnect()
get_ip()
Get the IP address of the computer.
ip = await computer.get_ip()
Returns: str - IP address
update(cpu, memory)
Update computer resources (cloud provider only).
await computer.update(cpu="8", memory="16GB")
Python Execution Methods
python_exec(func, *args, **kwargs)
Execute a Python function in the computer's Python environment.
def calculate(x, y):
    return x + y
result = await computer.python_exec(calculate, 5, 10)
# result = 15
python_exec_background(func, *args, requirements=None, **kwargs)
Execute a Python function in the background.
def long_running_task():
    import time
    time.sleep(60)
    return "done"
task_id = await computer.python_exec_background(long_running_task)
Returns: int - Task ID for tracking
pip_install(requirements)
Install Python packages in the computer.
await computer.pip_install(["requests", "pandas==2.0.0"])
Virtual Environment Methods
venv_install(venv_name, requirements)
Install packages in a virtual environment.
await computer.venv_install("my_env", ["requests", "pandas"])
venv_cmd(venv_name, command)
Run a shell command in a virtual environment.
result = await computer.venv_cmd("my_env", "pip list")
print(result.stdout)
Returns: CommandResult with stdout, stderr, returncode
venv_exec(venv_name, func, *args, **kwargs)
Execute a Python function in a virtual environment.
def process_data(x):
    import pandas as pd
    return pd.DataFrame(x).to_dict()
result = await computer.venv_exec("my_env", process_data, [1, 2, 3])
venv_exec_background(venv_name, func, *args, requirements=None, **kwargs)
Execute a Python function in a virtual environment in the background.
task_id = await computer.venv_exec_background("my_env", long_task)
Returns: int - Task ID
See Sandboxed Python for detailed usage.
Browser Automation
playwright_exec(command, params=None)
Execute Playwright browser automation commands.
result = await computer.playwright_exec("goto", {"url": "https://example.com"})
ComputerInterface
The interface for interacting with the computer's display, keyboard, and mouse. Accessed via computer.interface.
interface = computer.interface
The mouse and keyboard methods documented below accept an optional delay parameter (in seconds) that adds a pause after the action:
await computer.interface.left_click(500, 300, delay=0.5)
Mouse Actions
left_click(x=None, y=None, delay=None)
Perform a left mouse click. If coordinates are omitted, clicks at current cursor position.
await computer.interface.left_click(500, 300)
await computer.interface.left_click()  # Click at current position
| Parameter | Type | Description |
|---|---|---|
| x | int \| None | X coordinate (optional) |
| y | int \| None | Y coordinate (optional) |
| delay | float \| None | Delay in seconds after action |
right_click(x=None, y=None, delay=None)
Perform a right mouse click.
await computer.interface.right_click(500, 300)
double_click(x=None, y=None, delay=None)
Perform a double-click.
await computer.interface.double_click(500, 300)
mouse_down(x=None, y=None, button="left", delay=None)
Press and hold a mouse button.
await computer.interface.mouse_down(100, 100, button="left")
| Parameter | Type | Description |
|---|---|---|
| button | str | Mouse button: "left", "right", or "middle" |
mouse_up(x=None, y=None, button="left", delay=None)
Release a mouse button.
await computer.interface.mouse_up(500, 500, button="left")
move_cursor(x, y, delay=None)
Move the mouse cursor to the specified coordinates.
await computer.interface.move_cursor(500, 300)
drag_to(x, y, button="left", duration=0.5, delay=None)
Drag from the current cursor position to the specified coordinates.
# Move cursor to start position first
await computer.interface.move_cursor(100, 100)
# Then drag to end position
await computer.interface.drag_to(500, 500, duration=1.0)
| Parameter | Type | Description |
|---|---|---|
| x | int | Ending X coordinate |
| y | int | Ending Y coordinate |
| button | str | Mouse button to use: "left", "right", or "middle" |
| duration | float | Duration of drag in seconds |
drag(path, button="left", duration=0.5, delay=None)
Drag along a path of coordinates.
path = [(100, 100), (200, 150), (300, 200), (400, 250)]
await computer.interface.drag(path, duration=2.0)
| Parameter | Type | Description |
|---|---|---|
| path | list[tuple[int, int]] | List of (x, y) coordinate tuples |
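When the waypoints for drag are not known in advance, a path can be generated programmatically. The sketch below builds evenly spaced points on a straight line between two coordinates; linear_path is a hypothetical helper for illustration, not an SDK function.

```python
def linear_path(start, end, steps=10):
    """Return steps + 1 evenly spaced (x, y) integer points from start to end.

    Hypothetical helper: produces a list in the shape documented for the
    path argument of drag (a list of (x, y) tuples).
    """
    if steps < 1:
        raise ValueError("steps must be at least 1")
    (x0, y0), (x1, y1) = start, end
    return [
        (round(x0 + (x1 - x0) * i / steps), round(y0 + (y1 - y0) * i / steps))
        for i in range(steps + 1)
    ]
```

The result can be passed straight to drag, for example path = linear_path((100, 100), (500, 500), steps=4).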
scroll(x, y, delay=None)
Scroll by the specified amounts. Positive y scrolls up, negative scrolls down.
# Scroll down (negative y)
await computer.interface.scroll(0, -3)
# Scroll up (positive y)
await computer.interface.scroll(0, 3)
# Scroll right (positive x)
await computer.interface.scroll(3, 0)
| Parameter | Type | Description |
|---|---|---|
| x | int | Horizontal scroll amount (positive = right, negative = left) |
| y | int | Vertical scroll amount (positive = up, negative = down) |
scroll_down(clicks=1, delay=None) / scroll_up(clicks=1, delay=None)
Convenience methods for vertical scrolling.
await computer.interface.scroll_down(3) # Scroll down 3 clicks
await computer.interface.scroll_up(2)  # Scroll up 2 clicks
Keyboard Actions
type_text(text, delay=None)
Type text using the keyboard.
await computer.interface.type_text("Hello, World!")
| Parameter | Type | Description |
|---|---|---|
| text | str | Text to type |
press(key, delay=None)
Press a single key.
from computer.interface.models import Key
# Using Key enum (recommended)
await computer.interface.press(Key.ENTER)
await computer.interface.press(Key.PAGE_DOWN)
# Using string (also supported)
await computer.interface.press("enter")
| Parameter | Type | Description |
|---|---|---|
| key | Key \| str | Key to press (use Key enum or string) |
hotkey(*keys, delay=None)
Press a key combination.
from computer.interface.models import Key
# Copy (Ctrl+C)
await computer.interface.hotkey(Key.CTRL, Key.C)
# Paste (Ctrl+V)
await computer.interface.hotkey(Key.CTRL, Key.V)
# Save (Ctrl+S)
await computer.interface.hotkey(Key.CTRL, Key.S)
# Quit (Cmd+Q on macOS)
await computer.interface.hotkey(Key.COMMAND, Key.Q)
key_down(key, delay=None) / key_up(key, delay=None)
Press and hold or release a key.
# Hold shift while clicking
await computer.interface.key_down(Key.SHIFT)
await computer.interface.left_click(500, 300)
await computer.interface.key_up(Key.SHIFT)
Supported Keys (Key enum):
from computer.interface.models import Key
# Navigation
Key.PAGE_DOWN, Key.PAGE_UP, Key.HOME, Key.END
Key.LEFT, Key.RIGHT, Key.UP, Key.DOWN
# Special
Key.RETURN, Key.ENTER # Same key
Key.ESCAPE, Key.ESC # Same key
Key.TAB, Key.SPACE, Key.BACKSPACE, Key.DELETE
# Modifiers
Key.ALT, Key.CTRL, Key.SHIFT
Key.WIN # Windows key
Key.COMMAND # macOS Cmd key
Key.OPTION # macOS Option key
# Function keys
Key.F1, Key.F2, Key.F3, Key.F4, Key.F5, Key.F6
Key.F7, Key.F8, Key.F9, Key.F10, Key.F11, Key.F12
# Letters and numbers can be strings or Key.A, Key.B, etc.
Screen Methods
screenshot(boxes=None, box_color="#FF0000", box_thickness=2, scale_factor=1.0)
Capture the current screen with optional box overlays.
# Basic screenshot
screenshot_bytes = await computer.interface.screenshot()
# Screenshot with bounding boxes
boxes = [
    {"x": 100, "y": 100, "width": 200, "height": 150},
    {"x": 400, "y": 300, "width": 100, "height": 100}
]
screenshot_bytes = await computer.interface.screenshot(
    boxes=boxes,
    box_color="#00FF00",
    box_thickness=3
)
# Screenshot scaled down to 50%
screenshot_bytes = await computer.interface.screenshot(scale_factor=0.5)
Returns: bytes - Raw image data (PNG format)
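Because the screenshot comes back as raw PNG bytes, its pixel dimensions can be read directly from the PNG header without an imaging library. This is a small sketch using only the standard library; png_size is a hypothetical helper, and it relies on the PNG file layout (8-byte signature followed by the IHDR chunk) rather than on anything specific to the SDK.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_size(data: bytes) -> tuple[int, int]:
    """Return (width, height) read from the IHDR chunk of PNG bytes."""
    # Layout: 8-byte signature, 4-byte chunk length, b"IHDR", then
    # big-endian 32-bit width and height.
    if len(data) < 24 or data[:8] != PNG_SIGNATURE or data[12:16] != b"IHDR":
        raise ValueError("not a PNG image")
    width, height = struct.unpack(">II", data[16:24])
    return width, height
```

This is useful for checking what scale_factor actually produced, for example comparing png_size(screenshot_bytes) against get_screen_size().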
get_screen_size()
Get the screen dimensions.
size = await computer.interface.get_screen_size()
width = size["width"]
height = size["height"]
Returns: dict[str, int] - Dictionary with "width" and "height" keys
get_cursor_position()
Get the current cursor position.
pos = await computer.interface.get_cursor_position()
x = pos["x"]
y = pos["y"]
Returns: dict[str, int] - Dictionary with "x" and "y" keys
Coordinate Conversion
to_screen_coordinates(x, y)
Convert screenshot coordinates to screen coordinates.
screen_x, screen_y = await computer.interface.to_screen_coordinates(100, 100)
Returns: tuple[float, float] - Screen coordinates
to_screenshot_coordinates(x, y)
Convert screen coordinates to screenshot coordinates.
ss_x, ss_y = await computer.interface.to_screenshot_coordinates(1920, 1080)
Returns: tuple[float, float] - Screenshot coordinates
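To build intuition for these conversions, the sketch below shows the arithmetic under the assumption that the mapping is a plain linear rescale between the screenshot size and the screen size. That assumption is not confirmed by this reference, and scale_point is a hypothetical helper; prefer the SDK methods above in real code.

```python
def scale_point(x, y, from_size, to_size):
    """Linearly rescale (x, y) from one (width, height) space to another.

    Assumption: a pure proportional mapping with no offsets or letterboxing.
    """
    from_w, from_h = from_size
    to_w, to_h = to_size
    return (x * to_w / from_w, y * to_h / from_h)
```

Under that assumption, a point at (100, 100) in a 960x540 screenshot corresponds to (200.0, 200.0) on a 1920x1080 screen.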
Clipboard Methods
copy_to_clipboard()
Get the current clipboard contents.
text = await computer.interface.copy_to_clipboard()
Returns: str - Clipboard text content
set_clipboard(text)
Set the clipboard contents.
await computer.interface.set_clipboard("Text to copy")
| Parameter | Type | Description |
|---|---|---|
| text | str | Text to set in clipboard |
Shell Methods
run_command(command)
Execute a shell command in the computer.
result = await computer.interface.run_command("ls -la")
print(result.stdout)
print(result.stderr)
print(result.returncode)
| Parameter | Type | Description |
|---|---|---|
| command | str | Shell command to execute |
Returns: CommandResult with properties:
stdout: str - Standard output
stderr: str - Standard error
returncode: int - Exit code (0 = success)
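Since run_command reports failure through returncode rather than by raising, callers often want a thin wrapper that turns a non-zero exit code into an exception. The following is one possible sketch; run_checked and CommandFailed are hypothetical names, and the wrapper only assumes the interface object exposes run_command as documented above.

```python
class CommandFailed(RuntimeError):
    """Raised by run_checked when a command exits with a non-zero code."""


async def run_checked(interface, command):
    """Run a shell command and return stdout, raising on non-zero exit.

    Hypothetical helper: `interface` is expected to be computer.interface
    (anything with an async run_command(command) method works).
    """
    result = await interface.run_command(command)
    if result.returncode != 0:
        raise CommandFailed(
            f"{command!r} exited with {result.returncode}: {result.stderr.strip()}"
        )
    return result.stdout
```

Usage would look like listing = await run_checked(computer.interface, "ls -la").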
File Methods
read_text(path, encoding="utf-8")
Read a text file from the computer.
content = await computer.interface.read_text("/home/user/file.txt")
Returns: str - File contents
write_text(path, content, encoding="utf-8")
Write text content to a file.
await computer.interface.write_text("/home/user/file.txt", "Hello!")
read_bytes(path, offset=0, length=None)
Read a file as bytes with optional seeking.
# Read entire file
data = await computer.interface.read_bytes("/home/user/image.png")
# Read 1024 bytes starting at offset 512
data = await computer.interface.read_bytes("/home/user/file.bin", offset=512, length=1024)
Returns: bytes - File contents
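The offset and length parameters make it possible to fetch a large file piece by piece instead of in one call. Here is a minimal sketch that combines read_bytes with get_file_size (documented later in this section); read_in_chunks is a hypothetical helper and the 64 KiB default chunk size is an arbitrary choice.

```python
async def read_in_chunks(interface, path, chunk_size=64 * 1024):
    """Read a remote file in fixed-size pieces and return the joined bytes.

    Hypothetical helper: `interface` is expected to be computer.interface.
    """
    total = await interface.get_file_size(path)
    parts = []
    offset = 0
    while offset < total:
        length = min(chunk_size, total - offset)
        chunk = await interface.read_bytes(path, offset=offset, length=length)
        if not chunk:
            # Stop if the file shrank or the read returned nothing.
            break
        parts.append(chunk)
        offset += len(chunk)
    return b"".join(parts)
```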
write_bytes(path, content, append=False)
Write binary content to a file.
await computer.interface.write_bytes("/home/user/image.png", image_bytes)
# Append to file
await computer.interface.write_bytes("/home/user/log.bin", data, append=True)
file_exists(path) / directory_exists(path)
Check if a file or directory exists.
if await computer.interface.file_exists("/home/user/file.txt"):
    print("File exists")
if await computer.interface.directory_exists("/home/user/documents"):
    print("Directory exists")
Returns: bool - True if exists
get_file_size(path)
Get the size of a file in bytes.
size = await computer.interface.get_file_size("/home/user/file.txt")
Returns: int - File size in bytes
list_dir(path)
List contents of a directory.
files = await computer.interface.list_dir("/home/user")
for file in files:
    print(file)
Returns: list[str] - List of file/directory names
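list_dir returns only the names in a single directory, so a recursive listing has to be assembled by the caller. The sketch below combines list_dir with directory_exists to walk a tree; walk_files is a hypothetical helper, and it assumes a POSIX-style guest (paths joined with "/") and that list_dir returns bare names rather than full paths.

```python
import posixpath


async def walk_files(interface, root):
    """Return a sorted list of all file paths under root.

    Hypothetical helper: `interface` is expected to be computer.interface.
    """
    files = []
    pending = [root]
    while pending:
        current = pending.pop()
        for name in await interface.list_dir(current):
            full = posixpath.join(current, name)
            if await interface.directory_exists(full):
                pending.append(full)  # descend into subdirectory later
            else:
                files.append(full)
    return sorted(files)
```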
create_dir(path) / delete_dir(path) / delete_file(path)
Create or delete files and directories.
await computer.interface.create_dir("/home/user/new_folder")
await computer.interface.delete_file("/home/user/old_file.txt")
await computer.interface.delete_dir("/home/user/old_folder")
Window Management
launch(application, args=None)
Launch an application.
await computer.interface.launch("xfce4-terminal")
await computer.interface.launch("firefox", ["--private-window"])
Returns: int | None - Window ID if available
open(uri)
Open a URL or file with the default application.
await computer.interface.open("https://www.google.com")
await computer.interface.open("/home/user/document.pdf")
get_current_window_id()
Get the active window ID.
window_id = await computer.interface.get_current_window_id()
Returns: int | str - Window ID
get_application_windows(app_name)
Get window IDs for an application.
windows = await computer.interface.get_application_windows("firefox")
for window_id in windows:
    print(window_id)
Returns: list[int | str] - List of window IDs
get_window_name(window_id) / get_window_title(window_id)
Get the title of a window.
title = await computer.interface.get_window_name(window_id)
Returns: str - Window title
get_window_size(window_id) / window_size(window_id)
Get window dimensions.
width, height = await computer.interface.get_window_size(window_id)
Returns: tuple[int, int] - Width and height in pixels
set_window_size(window_id, width, height)
Set window dimensions.
await computer.interface.set_window_size(window_id, 1200, 800)
get_window_position(window_id)
Get window position on screen.
x, y = await computer.interface.get_window_position(window_id)
Returns: tuple[int, int] - X and Y coordinates
set_window_position(window_id, x, y)
Set window position on screen.
await computer.interface.set_window_position(window_id, 100, 100)
maximize_window(window_id) / minimize_window(window_id)
Change window state.
await computer.interface.maximize_window(window_id)
await computer.interface.minimize_window(window_id)
activate_window(window_id)
Bring a window to focus.
await computer.interface.activate_window(window_id)
close_window(window_id)
Close a window.
await computer.interface.close_window(window_id)
Accessibility
get_accessibility_tree()
Get the accessibility tree for the current screen.
tree = await computer.interface.get_accessibility_tree()
Returns: dict - Accessibility tree structure with UI element information
get_active_window_bounds()
Get the bounds of the active window.
bounds = await computer.interface.get_active_window_bounds()
x = bounds["x"]
y = bounds["y"]
width = bounds["width"]
height = bounds["height"]
Returns: dict[str, int] - Dictionary with "x", "y", "width", "height"
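A common use of the bounds dictionary is computing a point inside the active window, for instance its center. The sketch below does that arithmetic; bounds_center is a hypothetical helper, and using the result with left_click assumes the bounds are expressed in the same coordinate space as click coordinates, which this reference does not state explicitly.

```python
def bounds_center(bounds):
    """Return the integer (x, y) center of a bounds dict.

    Expects the documented keys: "x", "y", "width", "height".
    """
    return (
        bounds["x"] + bounds["width"] // 2,
        bounds["y"] + bounds["height"] // 2,
    )
```

For a window at (100, 50) that is 800 by 600 pixels, this returns (500, 350).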
Advanced Methods
get_desktop_environment()
Get the desktop environment name.
de = await computer.interface.get_desktop_environment()
# Returns "XFCE", "GNOME", "KDE", etc.
Returns: str - Desktop environment name
set_wallpaper(path)
Set the desktop wallpaper.
await computer.interface.set_wallpaper("/home/user/wallpaper.jpg")
playwright_exec(command, params=None)
Execute Playwright browser automation commands.
result = await computer.interface.playwright_exec("goto", {"url": "https://example.com"})
Returns: dict - Command result
Tracing
The tracing subsystem records computer interactions. Accessed via computer.tracing.
tracing = computer.tracing
Methods
start(options)
Start recording interactions.
await computer.tracing.start({
    "name": "my-workflow",
    "screenshots": True,
    "api_calls": True,
    "accessibility_tree": False,
    "metadata": True
})
| Option | Type | Default | Description |
|---|---|---|---|
| name | str | Auto-generated | Custom trace name |
| screenshots | bool | True | Capture screenshots |
| api_calls | bool | True | Log interface calls |
| accessibility_tree | bool | False | Record accessibility trees |
| metadata | bool | True | Enable custom metadata |
stop(options)
Stop recording and save the trace.
trace_path = await computer.tracing.stop({
    "format": "zip",  # or "dir"
    "path": "/custom/path.zip"  # optional
})
Returns: str - Path to saved trace
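Pairing start and stop by hand makes it easy to leave a trace running when an exception interrupts the workflow. One way to guard against that is a small async context manager, sketched below. traced is a hypothetical wrapper, not part of the SDK; it only relies on the start(options) and stop(options) calls documented above.

```python
from contextlib import asynccontextmanager


@asynccontextmanager
async def traced(computer, name, fmt="zip"):
    """Start a trace on entry and always stop it on exit.

    Yields a dict that receives the saved trace path under "path"
    once the block has finished.
    """
    await computer.tracing.start({"name": name})
    saved = {}
    try:
        yield saved
    finally:
        # Runs even if the body raised, so the trace is always saved.
        saved["path"] = await computer.tracing.stop({"format": fmt})
```

Usage would look like async with traced(computer, "login-flow") as trace: followed by the interface calls to record; trace["path"] is available after the block exits.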
add_metadata(key, value)
Add custom metadata to the trace.
await computer.tracing.add_metadata("workflow", "login-flow")
await computer.tracing.add_metadata("step", "entering-credentials")
Provider Types
Different providers offer different capabilities and trade-offs.
Docker Provider
computer = Computer(
    os_type="linux",
    provider_type="docker",
    image="trycua/cua-xfce:latest"
)
Best for Linux sandboxes with fast startup. Requires Docker to be installed.
Lume Provider
computer = Computer(
    os_type="macos",
    provider_type="lume",
    name="my-macos-vm"
)
For macOS virtual machines on Apple Silicon. Requires Lume to be installed.
Cloud Provider
from computer import Computer
computer = Computer(
    os_type="linux",
    provider_type="cloud",
    api_key="your-api-key"  # or set CUA_API_KEY env var
)
For managed cloud sandboxes. See CloudProvider for management API.
Windows Sandbox Provider
computer = Computer(
    os_type="windows",
    provider_type="windows-sandbox"
)
For Windows sandboxes on Windows hosts. Requires Windows Sandbox feature enabled.
QEMU Provider
computer = Computer(
    os_type="linux",
    provider_type="qemu",
    image="/path/to/disk.qcow2"
)
For full VM emulation with QEMU. Supports any guest OS.
Host Provider
computer = Computer(
    os_type="macos",  # or current host OS
    provider_type="host"
)
Directly controls the host machine. Use with caution.
CloudProvider
The CloudProvider class enables programmatic management of cloud sandboxes.
from computer.providers.cloud.provider import CloudProvider
# Automatically reads CUA_API_KEY from environment
provider = CloudProvider(verbose=False)
async with provider:
    vms = await provider.list_vms()
Constructor Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| api_key | str | CUA_API_KEY env var | API key for authentication |
| verbose | bool | False | Enable verbose logging |
Methods
list_vms()
List all sandboxes.
async with provider:
    vms = await provider.list_vms()
    for vm in vms:
        print(f"{vm['name']}: {vm['status']}")
get_vm(name)
Get details for a specific sandbox.
info = await provider.get_vm("my-vm-name")
run_vm(name)
Start a sandbox.
resp = await provider.run_vm("my-vm-name")
# {"name": "my-vm-name", "status": "starting"}
stop_vm(name)
Stop a sandbox.
resp = await provider.stop_vm("my-vm-name")
# {"name": "my-vm-name", "status": "stopping"}
restart_vm(name)
Restart a sandbox.
resp = await provider.restart_vm("my-vm-name")
# {"name": "my-vm-name", "status": "restarting"}
Sandbox Status Values
| Status | Description |
|---|---|
| pending | Deployment in progress |
| running | Active and accessible |
| stopped | Stopped but not terminated |
| terminated | Permanently destroyed |
| failed | Deployment or operation failed |
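Because run_vm returns while the sandbox is still starting, scripts typically need to poll until it reaches the desired status. The following is a rough sketch of such a loop. wait_for_status is a hypothetical helper, and it assumes get_vm returns a dict with a "status" key holding one of the values in the table above (the reference shows that shape for list_vms entries but does not spell it out for get_vm).

```python
import asyncio


async def wait_for_status(provider, name, target="running", timeout=120.0, interval=2.0):
    """Poll provider.get_vm(name) until its status equals target.

    Raises RuntimeError on a "failed" or "terminated" status (unless that
    is the target) and TimeoutError if the deadline passes first.
    """
    loop = asyncio.get_running_loop()
    deadline = loop.time() + timeout
    while True:
        info = await provider.get_vm(name)
        status = info.get("status")
        if status == target:
            return info
        if status in ("failed", "terminated"):
            raise RuntimeError(f"sandbox {name!r} entered status {status!r}")
        if loop.time() >= deadline:
            raise TimeoutError(f"sandbox {name!r} still {status!r} after {timeout}s")
        await asyncio.sleep(interval)
```

A typical call would follow run_vm inside the async with provider: block, for example info = await wait_for_status(provider, "my-vm-name").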
HTTP API
You can also manage sandboxes via HTTP:
# List sandboxes
curl -H "Authorization: Bearer $CUA_API_KEY" \
  "https://api.cua.ai/v1/vms"
# Start sandbox
curl -X POST -H "Authorization: Bearer $CUA_API_KEY" \
  "https://api.cua.ai/v1/vms/my-vm-name/start"
# Stop sandbox
curl -X POST -H "Authorization: Bearer $CUA_API_KEY" \
  "https://api.cua.ai/v1/vms/my-vm-name/stop"
# Restart sandbox
curl -X POST -H "Authorization: Bearer $CUA_API_KEY" \
  "https://api.cua.ai/v1/vms/my-vm-name/restart"
Types
OSType
from computer import OSType
OSType.LINUX # "linux"
OSType.MACOS # "macos"
OSType.WINDOWS  # "windows"
ProviderType
from computer import ProviderType
ProviderType.DOCKER  # "docker"
ProviderType.LUME  # "lume"
ProviderType.CLOUD  # "cloud"
ProviderType.QEMU  # "qemu"
ProviderType.WINDOWS_SANDBOX  # "windows-sandbox"
ProviderType.HOST  # "host"
Key
Enum for keyboard keys with cross-platform support.
from computer.interface.models import Key
# Use in keyboard methods
await computer.interface.press(Key.ENTER)
await computer.interface.hotkey(Key.CTRL, Key.C)
Available Keys:
| Category | Keys |
|---|---|
| Navigation | PAGE_DOWN, PAGE_UP, HOME, END, LEFT, RIGHT, UP, DOWN |
| Special | RETURN/ENTER, ESCAPE/ESC, TAB, SPACE, BACKSPACE, DELETE |
| Modifiers | ALT, CTRL, SHIFT, WIN, COMMAND, OPTION |
| Function | F1 through F12 |
| Letters | A through Z |
| Numbers | N0 through N9 |
CommandResult
Result returned by run_command() and venv_cmd() calls.
result = await computer.interface.run_command("echo hello")
result.stdout # "hello\n"
result.stderr # ""
result.returncode  # 0
| Property | Type | Description |
|---|---|---|
| stdout | str | Standard output |
| stderr | str | Standard error |
| returncode | int | Exit code (0 = success) |
Environment Variables
| Variable | Description |
|---|---|
| CUA_API_KEY | API key for cloud provider |
| CUA_REGION | Default region for cloud provider |
| DOCKER_HOST | Custom Docker host for Docker provider |
| LUME_HOST | Custom Lume API host (default: localhost:7777) |
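When a script depends on the cloud provider, checking for CUA_API_KEY up front can give a clearer error than a failed connection later. This is a small sketch using only the standard library; require_api_key is a hypothetical helper, and the optional env argument exists purely so the function can be exercised without touching the real environment.

```python
import os


def require_api_key(env=None):
    """Return the CUA_API_KEY value or raise a descriptive error.

    Pass a mapping as env to check something other than os.environ.
    """
    env = os.environ if env is None else env
    key = env.get("CUA_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "CUA_API_KEY is not set; export it or pass api_key= explicitly"
        )
    return key
```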
Context Manager Usage
The Computer class supports async context managers for automatic cleanup:
from computer import Computer
async with Computer(
    os_type="linux",
    provider_type="docker",
    image="trycua/cua-xfce:latest"
) as computer:
    await computer.interface.type_text("Hello!")
# Computer automatically stopped on exit
Common Patterns
Click and Type
from computer.interface.models import Key
# Click a text field and type
await computer.interface.left_click(500, 300)
await computer.interface.type_text("Hello, World!")
await computer.interface.press(Key.ENTER)
Drag and Drop
# Method 1: Using drag_to
await computer.interface.move_cursor(100, 100)
await computer.interface.drag_to(500, 500)
# Method 2: Using mouse_down/up
await computer.interface.mouse_down(100, 100)
await computer.interface.move_cursor(500, 500)
await computer.interface.mouse_up()
Keyboard Shortcuts
from computer.interface.models import Key
# Copy
await computer.interface.hotkey(Key.CTRL, Key.C)
# Paste
await computer.interface.hotkey(Key.CTRL, Key.V)
# Select All
await computer.interface.hotkey(Key.CTRL, Key.A)
# Undo
await computer.interface.hotkey(Key.CTRL, Key.Z)
# macOS uses Command key
await computer.interface.hotkey(Key.COMMAND, Key.Q)  # Quit
File Operations
import json
# Read a text file
content = await computer.interface.read_text("/home/user/config.json")
data = json.loads(content)
# Write a text file
await computer.interface.write_text("/home/user/output.txt", "Results")
# Read binary file
image_data = await computer.interface.read_bytes("/home/user/photo.jpg")
# Write binary file (here, a copy of the photo read above)
await computer.interface.write_bytes("/home/user/photo_copy.jpg", image_data)
# Check if file exists
if await computer.interface.file_exists("/home/user/data.csv"):
    content = await computer.interface.read_text("/home/user/data.csv")