CLI Reference
Command Line Interface reference for Cua Driver
cua-driver is a single-binary CLI. Two naming conventions divide its surface:
- Tool names are snake_case (launch_app, get_window_state, click). Invoke them as cua-driver <tool> '<JSON-args>'; the CLI routes through the same ToolRegistry the MCP server uses.
- Management subcommands are kebab-case (list-tools, describe, mcp-config). These never take JSON args.
Different separators mean no ambiguity. Unknown first-positional args dispatch to the call subcommand automatically, so cua-driver list_apps is shorthand for cua-driver call list_apps.
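The separator rule can be illustrated with a small shell sketch. The function name cua_name_kind is hypothetical and not part of cua-driver. It only mirrors the convention described above and is not the CLI's actual dispatch logic, which matches known subcommands and sends anything else to call.

```shell
# Hypothetical helper: classify a first positional argument by separator.
# Illustrates the documented naming convention only.
cua_name_kind() {
  case "$1" in
    *_*) echo "tool" ;;          # snake_case, e.g. launch_app
    *-*) echo "subcommand" ;;    # kebab-case, e.g. list-tools
    *)   echo "single-word" ;;   # no separator, e.g. click or serve
  esac
}

cua_name_kind launch_app   # prints: tool
cua_name_kind list-tools   # prints: subcommand
cua_name_kind click        # prints: single-word
```

Single-word names carry no separator, so a script that needs to be explicit can always spell out cua-driver call <tool-name>.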
Quick start
# Start the persistent daemon (required for element_index workflows).
open -n -g -a CuaDriver --args serve
# Drive an app.
cua-driver launch_app '{"bundle_id":"com.apple.calculator"}'
cua-driver get_window_state '{"pid":844,"window_id":10725}'
cua-driver click '{"pid":844,"window_id":10725,"element_index":14}'
# Stop the daemon.
cua-driver stop
Tool dispatch
cua-driver call
Invoke any MCP tool from the shell.
cua-driver call <tool-name> '<JSON-args>'
# Shorthand — any unknown first positional arg auto-prefixes `call`:
cua-driver <tool-name> '<JSON-args>'
Arguments:
- <tool-name>: Name of the tool to invoke. Run cua-driver list-tools for the full list.
- <json-args>: JSON object matching the tool's inputSchema. Omit when stdin is a pipe (JSON is read from stdin) or when the tool takes no arguments.
Flags:
- --raw: Print the raw CallTool.Result JSON (content + structuredContent + isError) instead of unwrapping structuredContent.
- --image-out <path>: Write the first image content block from the response to path (PNG bytes). The default text formatter would otherwise drop image content.
- --compact: Emit minified JSON instead of pretty-printed.
- --no-daemon: Skip the running daemon and run the tool in-process. Element-indexed workflows fail without a daemon because the per-pid cache dies between CLI invocations.
- --socket <path>: Override the daemon Unix socket path.
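As a sketch of how the stdin form and --compact might combine in a script, the wrapper below pipes the JSON arguments to the call subcommand. DRIVER_BIN and cua_call_compact are hypothetical names introduced for illustration. DRIVER_BIN exists only so the binary can be swapped for a stub, and placing --compact after the tool name is an assumption about flag ordering.

```shell
# Hypothetical variable: name or path of the cua-driver binary.
DRIVER_BIN="${DRIVER_BIN:-cua-driver}"

# Hypothetical wrapper: call a tool with JSON supplied on stdin and
# request minified output.
# Usage: cua_call_compact <tool-name> ['<JSON-args>']
# Example: cua_call_compact get_window_state '{"pid":844,"window_id":1234}'
cua_call_compact() {
  tool="$1"
  json="${2:-}"
  if [ -n "$json" ]; then
    # stdin is a pipe here, so the positional JSON argument is omitted.
    printf '%s' "$json" | "$DRIVER_BIN" call "$tool" --compact
  else
    # Tools without arguments take no JSON at all.
    "$DRIVER_BIN" call "$tool" --compact
  fi
}
```

Piping the JSON avoids shell-quoting problems when the arguments are produced by another program rather than typed by hand.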
Examples:
cua-driver call list_apps
cua-driver call launch_app '{"bundle_id":"com.apple.finder"}'
echo '{"pid":844,"window_id":1234}' | cua-driver call get_window_state
cua-driver get_window_state '{"pid":844,"window_id":1234}' --image-out /tmp/shot.png
cua-driver list-tools
List every MCP tool exposed by the driver with a one-line summary.
cua-driver list-tools
Flags: --no-daemon, --socket <path>.
cua-driver describe
Print a tool's full description and JSON input schema.
cua-driver describe <tool-name>
Flags: --compact, --no-daemon, --socket <path>.
Daemon management
cua-driver serve
Run cua-driver as a long-running daemon on a Unix domain socket. Required for any workflow that uses element_index dispatch: the per-pid element cache lives in-process and survives only between CLI calls routed to the same daemon.
# Recommended form — routes through LaunchServices for correct TCC context.
open -n -g -a CuaDriver --args serve
# Alternate — auto-relaunches via open when TCC context is wrong.
cua-driver serve &
Flags:
- --socket <path>: Override the Unix socket path. Default: ~/Library/Caches/cua-driver/cua-driver.sock.
- --no-relaunch: Stay in the current process instead of relaunching via LaunchServices. Also toggleable via CUA_DRIVER_NO_RELAUNCH=1. Use when the calling context already has the right TCC responsibility.
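Because serve starts in the background, a script may need to wait before issuing tool calls. The sketch below polls for the socket file. wait_for_socket and SOCKET_PATH are hypothetical names, and socket existence is only a rough readiness signal; cua-driver status is the stronger probe because it checks that the peer speaks the protocol.

```shell
# Default socket path, taken from the --socket flag description above.
# SOCKET_PATH is a hypothetical variable used only by this sketch.
SOCKET_PATH="${SOCKET_PATH:-$HOME/Library/Caches/cua-driver/cua-driver.sock}"

# Hypothetical helper: wait until a Unix socket file exists.
# Usage: wait_for_socket <socket-path> [timeout-seconds]
# Returns 0 once the path is a socket, non-zero if the timeout expires.
wait_for_socket() {
  sock="$1"
  remaining="${2:-5}"
  while [ "$remaining" -gt 0 ]; do
    [ -S "$sock" ] && return 0
    sleep 1
    remaining=$((remaining - 1))
  done
  # Final check after the last sleep.
  [ -S "$sock" ]
}

# Example usage (not executed here):
#   open -n -g -a CuaDriver --args serve
#   wait_for_socket "$SOCKET_PATH" 10 && cua-driver status
```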
cua-driver stop
Ask the running daemon to exit gracefully. Polls for the socket file to vanish (up to 2s) as proof of clean shutdown.
cua-driver stop
Flags: --socket <path>.
cua-driver status
Report whether a daemon is currently reachable. Probes by sending a trivial list request — connecting alone doesn't prove the peer speaks the protocol.
cua-driver status
# cua-driver daemon is running
# socket: /Users/you/Library/Caches/cua-driver/cua-driver.sock
# pid: 12345
Flags: --socket <path>, --pid-file <path>.
cua-driver mcp
Run the stdio MCP server. MCP clients (Claude Code, Cursor, custom SDK clients) spawn this on demand.
cua-driver mcp
No flags. Use cua-driver mcp-config to generate a paste-able client config snippet.
cua-driver mcp-config
Print MCP server config or a client-specific install command.
Flags:
- --client <name> (optional): one of claude, codex, cursor, openclaw, opencode, hermes, pi. Omit for the generic JSON snippet that any MCP client accepts.
Generic JSON (default):
cua-driver mcp-config
# {
#   "mcpServers": {
#     "cua-driver": {
#       "command": "/Applications/CuaDriver.app/Contents/MacOS/cua-driver",
#       "args": ["mcp"]
#     }
#   }
# }
Client-specific install commands:
| Client | Output |
|---|---|
| claude | claude mcp add --transport stdio cua-driver -- <path> mcp |
| codex | codex mcp add cua-driver -- <path> mcp |
| openclaw | openclaw mcp set cua-driver '{...}' (CLI registry) |
| cursor | JSON for ~/.cursor/mcp.json (no CLI) |
| opencode | JSON snippet for opencode.json (type: "local") |
| hermes | YAML for ~/.hermes/config.yaml, then /reload-mcp inside Hermes |
| pi | Points at the CLI path (Pi has no MCP support; shell out to cua-driver directly) |
Pipe to pbcopy to put any output on the clipboard, or pass the printed claude / codex / openclaw commands straight to a shell.
Trajectory recording
cua-driver recording start
Enable the trajectory recorder. Every subsequent action-tool call (click, right_click, scroll, type_text, type_text_chars, press_key, hotkey, set_value) writes a numbered turn folder under <output-dir>.
cua-driver recording start ~/cua-trajectories/demo1
Arguments:
- <output-dir>: Directory to write turn folders into. Expands ~; created if missing.
Flags:
- --video-experimental: Also capture the main display to <output-dir>/recording.mp4 via SCStream (H.264, 30fps, no audio, no cursor). Experimental.
- --socket <path>: Override the daemon socket path.
Requires a running daemon.
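A script that records a session usually wants the recorder stopped even when a step fails. The following is a minimal sketch of that pattern. with_recording and DRIVER_BIN are hypothetical names, not part of cua-driver, and the wrapper assumes recording start exits non-zero on failure.

```shell
# Hypothetical variable: name or path of the cua-driver binary.
DRIVER_BIN="${DRIVER_BIN:-cua-driver}"

# Hypothetical wrapper: record a trajectory around an arbitrary command.
# Usage: with_recording <output-dir> <command> [args...]
with_recording() {
  out_dir="$1"
  shift
  "$DRIVER_BIN" recording start "$out_dir" || return 1
  # Run the wrapped command and remember its exit code.
  rc=0
  "$@" || rc=$?
  # Always stop the recorder, even if the wrapped command failed.
  "$DRIVER_BIN" recording stop
  return "$rc"
}

# Example usage (not executed here):
#   with_recording ~/cua-trajectories/demo1 \
#     cua-driver click '{"pid":844,"window_id":10725,"element_index":14}'
```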
cua-driver recording stop
Disable recording. Prints the captured turn count and directory.
cua-driver recording stop
# Recording disabled (23 turns captured in /Users/you/cua-trajectories/demo1)
cua-driver recording status
Report whether recording is currently enabled.
cua-driver recording status
# Recording: enabled
# Output dir: /Users/you/cua-trajectories/demo1
# Next turn: 24
cua-driver recording render
Render a recording directory to a zoomed-on-click MP4. Post-processes the captured recording.mp4, cursor.jsonl, and turn-*/action.json files.
cua-driver recording render ~/cua-rec --output /tmp/out.mp4
cua-driver recording render ~/cua-rec --output /tmp/baseline.mp4 --no-zoom
cua-driver recording render ~/cua-rec --output /tmp/out.mp4 --scale 2.5
Arguments:
- <input-dir>: Recording directory (contains session.json, recording.mp4, cursor.jsonl, turn-*/).
Flags:
- --output <path>: Destination MP4 path. Overwrites any existing file.
- --no-zoom: Skip the zoom curve and re-encode the input as-is. Useful as a baseline check.
- --scale <factor>: Zoom factor applied to each click event. Default 2.0. Set to 1.0 to disable zoom; 2.0 is 2× magnification.
recording render runs in the CLI process directly — it does not require a running daemon.
Configuration
cua-driver config
Read and write persistent settings at ~/Library/Application Support/Cua Driver/config.json.
cua-driver config # print full config
cua-driver config get <key>
cua-driver config set <key> <value>
cua-driver config reset # overwrite with defaults
Supported keys:
- capture_mode: som | ax | vision. Default som.
- max_image_dimension: integer. PNG long-side cap. Default 1568.
- agent_cursor.enabled: boolean.
- agent_cursor.motion.start_handle: number in [0, 1].
- agent_cursor.motion.end_handle: number in [0, 1].
- agent_cursor.motion.arc_size: number (fraction of path length).
- agent_cursor.motion.arc_flow: number in [-1, 1].
- agent_cursor.motion.spring: number in [0.3, 1.0].
Flags on set: --socket <path>.
Writes route through the daemon when one's reachable so live state (e.g. AgentCursor.shared) picks up the change without a restart.
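Several of the numeric keys have documented ranges, so a script can reject bad values before calling config set. The sketch below does that check with awk. config_set_in_range and DRIVER_BIN are hypothetical names, the helper assumes the value is numeric, and whether cua-driver itself validates ranges is not stated here.

```shell
# Hypothetical variable: name or path of the cua-driver binary.
DRIVER_BIN="${DRIVER_BIN:-cua-driver}"

# Hypothetical helper: set a numeric config key only if the value lies
# in [min, max]. Assumes <value> is a number.
# Usage: config_set_in_range <key> <value> <min> <max>
config_set_in_range() {
  key="$1"; value="$2"; min="$3"; max="$4"
  # awk does the floating-point comparison; exit status 0 means in range.
  if awk -v v="$value" -v lo="$min" -v hi="$max" \
       'BEGIN { exit !(v + 0 >= lo + 0 && v + 0 <= hi + 0) }'; then
    "$DRIVER_BIN" config set "$key" "$value"
  else
    echo "refusing $key=$value: outside [$min, $max]" >&2
    return 1
  fi
}

# Example usage (not executed here):
#   config_set_in_range agent_cursor.motion.spring 0.6 0.3 1.0
```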
cua-driver config telemetry
Manage anonymous telemetry.
cua-driver config telemetry status
cua-driver config telemetry enable
cua-driver config telemetry disable
Environment override: CUA_DRIVER_TELEMETRY_ENABLED=0|1.
cua-driver config updates
Manage automatic updates.
cua-driver config updates status
cua-driver config updates enable
cua-driver config updates disable
Environment override: CUA_DRIVER_AUTO_UPDATE_ENABLED=0|1.
Diagnostics
cua-driver diagnose
Print a paste-able state report for support. Covers: running-process identity (path, bundle id, pid, cdhash), TCC probe results, install layout (/Applications/CuaDriver.app + codesign info, ~/.local/bin/cua-driver symlink resolution), TCC database rows for com.trycua.driver, and config + state paths.
cua-driver diagnose
Use this when filing an issue about permissions or install problems.
cua-driver update
Check GitHub for a newer cua-driver release and optionally apply it.
cua-driver update # report what's available
cua-driver update --apply # download and install the latest release
Flags:
- --apply: Download and apply the update without prompting. Without this flag, update only reports the available version and the install command.
cua-driver doctor
Clean up stale install bits left from older cua-driver versions (≤ v0.0.5) — the legacy weekly LaunchAgent and the /usr/local/bin/cua-driver-update companion script that v0.0.6 dropped.
cua-driver doctor
Idempotent and safe to re-run. Prints "Nothing to clean — install is up to date." when there's nothing to do. The ~/Library/LaunchAgents/... plist is removed automatically; the /usr/local/bin/cua-driver-update script (root-owned) is surfaced as a sudo rm -f command for the user to run manually.
Global options
Available on all commands:
- --help: Show help information.
- --version: Show version number.
Tool inventory
The following MCP tools are callable via cua-driver <tool>. For input schemas and response shapes, see the MCP tools reference.
Discovery and permissions
- list_apps: running + installed apps with pid / bundle id / active state.
- list_windows: every layer-0 top-level window (including off-screen).
- check_permissions: Accessibility + Screen Recording TCC status.
- get_screen_size: main display size in points + scale factor.
- get_cursor_position: current mouse cursor position.
- get_accessibility_tree: lightweight desktop snapshot (apps + visible windows).
App lifecycle
- launch_app: launch hidden; returns pid + windows array.
Snapshot
- get_window_state: per-window AX tree + screenshot. Populates the element_index cache.
- screenshot: raw ScreenCaptureKit capture. Full display or single window.
- zoom: native-resolution crop of a previously captured window region.
Mouse
- click: left-click by element_index or (x, y).
- double_click: double-click by element_index (AXOpen) or (x, y).
- right_click: right-click by element_index (AXShowMenu) or (x, y).
- move_cursor: warp the real cursor to (x, y).
Keyboard
- type_text: insert text via AXSelectedText. Pid-scoped.
- type_text_chars: character-by-character via CGEvent.postToPid. Reaches Chromium/Electron inputs.
- press_key: single key press. Pid-scoped.
- hotkey: modifier combo (e.g. ["cmd","c"]). Pid-scoped.
Element attributes
- set_value: write an element's AXValue directly. For sliders, steppers, and text fields.
- scroll: synthesize PageUp/PageDown/arrow keystrokes against the target pid.
Agent cursor overlay
- set_agent_cursor_enabled: toggle the visual overlay.
- set_agent_cursor_motion: tune the Bezier-arc + spring motion knobs.
- get_agent_cursor_state: read the current overlay configuration.
Config
- get_config: report persistent config as JSON.
- set_config: write a single dotted-path config key.
Recording
- set_recording: toggle the trajectory recorder.
- get_recording_state: report recorder state.
- replay_trajectory: re-invoke every turn's tool call in lexical order.