FAQ
Frequently asked questions about Cua Driver
Common gotchas and questions. For the full action loop and tool semantics, see the CLI reference and MCP tools reference.
The action loop
Why do I get No cached AX state?
Element-indexed actions read an in-memory cache keyed on (pid, window_id). The cache is populated by get_window_state and replaced on every snapshot.
Two common causes:
- You didn't call get_window_state in the current turn.
- You called it with a different window_id than the one in the action.
# Populate the cache for this exact (pid, window_id).
cua-driver get_window_state '{"pid":844,"window_id":10725}'
# Then act with the same window_id.
cua-driver click '{"pid":844,"window_id":10725,"element_index":14}'
If you're running one-shot CLI invocations without a daemon, the cache lives only as long as that single process, so a snapshot taken by one invocation is gone before the next one acts. Start the daemon first:
open -n -g -a CuaDriver --args serve
Why did my screenshot come back empty (has_screenshot: false)?
The window capture raced against a close, or the window has no backing store yet. Re-snapshot. If it persists, pick a different window_id via list_windows.
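Both answers above come down to re-running get_window_state for the exact (pid, window_id) you are about to act on. The shell sketch below pins the snapshot and the click to the same ids and retries the snapshot a few times while it reports has_screenshot: false. It assumes the daemon is already running (so the cache survives between invocations) and that get_window_state prints JSON containing a has_screenshot field. DRIVER_CMD, is_uint, snapshot, and click_element are hypothetical names introduced here, not part of cua-driver.

```shell
# DRIVER_CMD is a hypothetical override hook (e.g. for a test stub);
# cua-driver itself does not read it.
DRIVER_CMD="${DRIVER_CMD:-cua-driver}"

# Succeed only for non-empty strings made of digits.
is_uint() {
  case "$1" in
    ''|*[!0-9]*) return 1 ;;
    *) return 0 ;;
  esac
}

# snapshot PID WINDOW_ID [ATTEMPTS]
# Populates the cache for this (pid, window_id) and prints the snapshot.
# Retries while the output says has_screenshot is false (ASSUMPTION: the
# output is JSON with that field). Returns 1 if every attempt came back empty.
snapshot() {
  is_uint "$1" && is_uint "$2" || return 2
  attempts=${3:-3}
  while [ "$attempts" -gt 0 ]; do
    out=$("$DRIVER_CMD" get_window_state "{\"pid\":$1,\"window_id\":$2}") || return 1
    if ! printf '%s\n' "$out" | grep -Eq '"has_screenshot"[[:space:]]*:[[:space:]]*false'; then
      printf '%s\n' "$out"
      return 0
    fi
    attempts=$((attempts - 1))
  done
  return 1
}

# click_element PID WINDOW_ID ELEMENT_INDEX
# Reuses the same window_id as the preceding snapshot.
click_element() {
  is_uint "$1" && is_uint "$2" && is_uint "$3" || return 2
  "$DRIVER_CMD" click "{\"pid\":$1,\"window_id\":$2,\"element_index\":$3}"
}
```

Typical use is `snapshot 844 10725 >/dev/null && click_element 844 10725 14`. If the snapshot keeps coming back empty, pick a different window_id via list_windows.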
The AX tree is tiny. What's happening?
Check the capture mode:
cua-driver config get capture_mode
The default is som (tree + screenshot). If it reads vision, get_window_state omits the tree by design and returns only the PNG. Switch back to som to get both:
cua-driver config set capture_mode som
If the mode is som or ax and the tree is still small, the target uses custom rendering (Blender, Unity, Electron with AX disabled). For Chromium/Electron, retry get_window_state once; the tree populates on the second call. For canvas-backed apps, fall back to pixel clicks.
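The mode check can be scripted. This is a minimal sketch, assuming `cua-driver config get capture_mode` prints the bare mode name (possibly quoted or followed by a newline); ensure_tree_capture and DRIVER_CMD are hypothetical names. It only switches the mode when it reads vision, so an existing som or ax setting is left alone.

```shell
# DRIVER_CMD is a hypothetical override hook; cua-driver does not read it.
DRIVER_CMD="${DRIVER_CMD:-cua-driver}"

# Switch capture_mode back to som if it is vision (PNG only, no AX tree).
# ASSUMPTION: `config get` prints the mode name, maybe quoted.
ensure_tree_capture() {
  mode=$("$DRIVER_CMD" config get capture_mode) || return 1
  # Drop quotes and whitespace so a quoted "vision" still compares equal.
  mode=$(printf '%s' "$mode" | tr -d '"[:space:]')
  if [ "$mode" = vision ]; then
    "$DRIVER_CMD" config set capture_mode som
  fi
}
```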
Window state
My keyboard commit (Return, Space, Tab) on a minimized window silently no-ops.
Minimized windows receive AX reads and AX-dispatched clicks normally, but keyboard commits fail because AX focus doesn't propagate to renderer focus on a minimized window. You hear the macOS system-alert beep, or nothing happens.
Workarounds in order of preference:
- Use set_value to write the field's entire value directly. This bypasses keyboard commits entirely.
- AX-click a commit-equivalent button (Go, Submit, checkbox). Clicks route through AXPress and don't need renderer focus.
- Last resort: ask the user to un-minimize the window. Don't deminiaturize it programmatically; that disrupts the layout of many apps.
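Part of this preference order can live in a small wrapper. The sketch AX-clicks a commit-equivalent button when you pass its element index, and only falls back to a keyboard commit when you don't, which per the answer above is reliable only on a window that isn't minimized. set_value is left out because its argument shape isn't shown on this page. commit_field and DRIVER_CMD are hypothetical names, and the key name `return` in the usage below is an assumption; check the CLI reference for the key names press_key accepts.

```shell
# DRIVER_CMD is a hypothetical override hook; cua-driver does not read it.
DRIVER_CMD="${DRIVER_CMD:-cua-driver}"

# commit_field PID WINDOW_ID KEY [BUTTON_INDEX]
# With BUTTON_INDEX: AX-click that element (routes through AXPress, so it
# works on a minimized window). Without it: send KEY via press_key, which
# only commits reliably when the window is not minimized.
commit_field() {
  pid=$1 win=$2 key=$3 button=${4:-}
  if [ -n "$button" ]; then
    "$DRIVER_CMD" click "{\"pid\":$pid,\"window_id\":$win,\"element_index\":$button}"
  else
    "$DRIVER_CMD" press_key "{\"pid\":$pid,\"key\":\"$key\"}"
  fi
}
```

For example, `commit_field 844 10725 return 22` clicks element 22 instead of pressing the (assumed) return key.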
My backgrounded SwiftUI app (System Settings) returns an almost-empty AX tree.
In SwiftUI apps, windows on another Space often strip their AX tree down to the menu bar. AppKit apps are usually fine.
get_window_state returns off_space: true plus window_space_ids when this happens, so you can detect it. Solutions:
- Ask the user to Mission-Control back to the Space that holds the target.
- Drive the app through in-window toolbar buttons (which often stay exposed) rather than deep nested controls.
- Accept the limitation for the current session.
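Because get_window_state flags this case, a script can check for it before reaching for deep controls. This is a minimal sketch, assuming the snapshot is printed as JSON with an off_space boolean; window_off_space and DRIVER_CMD are hypothetical names.

```shell
# DRIVER_CMD is a hypothetical override hook; cua-driver does not read it.
DRIVER_CMD="${DRIVER_CMD:-cua-driver}"

# window_off_space PID WINDOW_ID
# Exit status 0 when the snapshot reports off_space: true, 1 when it does
# not, 2 when get_window_state itself fails.
# ASSUMPTION: the output is JSON; whitespace around the colon is tolerated.
window_off_space() {
  out=$("$DRIVER_CMD" get_window_state "{\"pid\":$1,\"window_id\":$2}") || return 2
  printf '%s\n' "$out" | grep -Eq '"off_space"[[:space:]]*:[[:space:]]*true'
}
```

A caller might use `if window_off_space 844 10725; then echo "Target is on another Space"; fi` to decide when to ask the user to switch Spaces.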
Browsers and Electron
Right-click on Chromium web content fires as a left-click.
This is a known Chromium renderer-IPC limit: the input filter coerces the subtype of a synthetic right-click to a left-click on every path other than an HID tap. Use right_click({pid, element_index}) on AX-addressable targets (links, buttons, toolbar items). For web content itself (right-clicking an image or a selection), there is no backgrounded path today. See Limits for the full note.
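On the command line, that call takes the same JSON-argument form as the other tools. A thin wrapper that validates both ids before sending is sketched below; ax_right_click and DRIVER_CMD are hypothetical names, and the element index should come from a recent get_window_state snapshot of that app.

```shell
# DRIVER_CMD is a hypothetical override hook; cua-driver does not read it.
DRIVER_CMD="${DRIVER_CMD:-cua-driver}"

# ax_right_click PID ELEMENT_INDEX
# Right-clicks an AX-addressable target (link, button, toolbar item) by
# element index. Pixel right-clicks on Chromium web content arrive as
# left-clicks, so this is the path the answer above recommends.
ax_right_click() {
  for n in "$1" "$2"; do
    case "$n" in ''|*[!0-9]*) return 2 ;; esac
  done
  "$DRIVER_CMD" right_click "{\"pid\":$1,\"element_index\":$2}"
}
```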
Pixel click on a YouTube video doesn't play or pause.
HTML5's click-to-play handler rejects some synthetic click paths. Use the keyboard instead:
cua-driver press_key '{"pid":<pid>,"key":"k"}' # YouTube play/pause
cua-driver press_key '{"pid":<pid>,"key":"space"}' # generic video play/pause
Keyboard events travel through a different auth envelope and reach the page.
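If a script drives both YouTube and other players, the key choice can live in one place. This is a minimal sketch; toggle_playback and DRIVER_CMD are hypothetical names, and the two key values are the ones shown above.

```shell
# DRIVER_CMD is a hypothetical override hook; cua-driver does not read it.
DRIVER_CMD="${DRIVER_CMD:-cua-driver}"

# toggle_playback PID [youtube|generic]
# YouTube binds play/pause to k; other players get the generic space key.
toggle_playback() {
  case "${2:-generic}" in
    youtube) key=k ;;
    *) key=space ;;
  esac
  "$DRIVER_CMD" press_key "{\"pid\":$1,\"key\":\"$key\"}"
}
```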
How do I navigate to a URL in Chrome without stealing focus?
Pass the URL to launch_app:
cua-driver launch_app '{"bundle_id":"com.google.Chrome","urls":["https://trycua.com"]}'
The URL opens in a new window via Chrome's application(_:open:) delegate. Chrome activates itself while handling it, but the driver's focus-restore guard catches that activation and restores the frontmost app to whatever it was before the call.
Don't use the ⌘L hotkey to focus the omnibox. Even when it is delivered to a backgrounded pid, ⌘L steals focus, because the receiving app interprets "user wants to type here" as activation intent.
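When the URL comes from a variable, building the JSON by hand is easy to get wrong. The sketch below passes one or more URLs to launch_app; open_in_chrome and DRIVER_CMD are hypothetical names. It does no JSON escaping, so it rejects any URL containing a double quote or backslash instead of quoting it; use a real JSON encoder if you need those.

```shell
# DRIVER_CMD is a hypothetical override hook; cua-driver does not read it.
DRIVER_CMD="${DRIVER_CMD:-cua-driver}"

# open_in_chrome URL [URL ...]
# Opens the URLs through launch_app rather than ⌘L, so the driver's
# focus-restore guard keeps the previous frontmost app in front.
open_in_chrome() {
  [ "$#" -gt 0 ] || return 2
  urls=
  for url in "$@"; do
    # No JSON escaping here: refuse characters that would need it.
    case "$url" in *'"'*|*\\*) return 2 ;; esac
    urls="${urls:+$urls,}\"$url\""
  done
  "$DRIVER_CMD" launch_app "{\"bundle_id\":\"com.google.Chrome\",\"urls\":[$urls]}"
}
```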
The agent cursor
How do I disable the visual cursor overlay?
cua-driver set_agent_cursor_enabled '{"enabled":false}'
Or via config:
cua-driver config set agent_cursor.enabled false
The overlay only renders when the driver has an AppKit run loop (inside cua-driver serve or cua-driver mcp). One-shot CLI invocations skip it entirely.
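A small guard avoids sending a malformed value to set_agent_cursor_enabled. This is a minimal sketch; set_cursor_overlay and DRIVER_CMD are hypothetical names. The overlay itself still only renders inside cua-driver serve or cua-driver mcp.

```shell
# DRIVER_CMD is a hypothetical override hook; cua-driver does not read it.
DRIVER_CMD="${DRIVER_CMD:-cua-driver}"

# set_cursor_overlay true|false
# Accepts only JSON booleans so the payload is always well formed.
set_cursor_overlay() {
  case "$1" in
    true|false) "$DRIVER_CMD" set_agent_cursor_enabled "{\"enabled\":$1}" ;;
    *) return 2 ;;
  esac
}
```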
Can I make the cursor move faster?
Tune the motion knobs:
cua-driver set_agent_cursor_motion '{"glide_duration_ms":300}'
See set_agent_cursor_motion in the MCP tools reference for every knob.
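glide_duration_ms is a duration, so a smaller value should presumably make each glide finish sooner. The wrapper below only accepts a plain non-negative integer; set_glide_ms and DRIVER_CMD are hypothetical names, and the valid range isn't stated here, so check the MCP tools reference before going to extremes.

```shell
# DRIVER_CMD is a hypothetical override hook; cua-driver does not read it.
DRIVER_CMD="${DRIVER_CMD:-cua-driver}"

# set_glide_ms MILLISECONDS
# Rejects anything that is not a plain non-negative integer.
set_glide_ms() {
  case "$1" in ''|*[!0-9]*) return 2 ;; esac
  "$DRIVER_CMD" set_agent_cursor_motion "{\"glide_duration_ms\":$1}"
}
```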
Permissions
check_permissions says NOT granted but I granted both.
TCC checks the calling process, not CuaDriver.app. Inside IDE terminals (Claude Code, Cursor, VS Code, Conductor), the shell inherits the IDE's TCC responsibility chain. So running cua-driver check_permissions in one of those shells reads against the IDE's bundle, not com.trycua.driver.
Start the daemon first, which runs through LaunchServices under the CuaDriver bundle:
open -n -g -a CuaDriver --args serve
cua-driver check_permissions # forwards to the daemon, which gives the authoritative answer
I keep seeing the permissions dialog on every launch.
macOS is attributing the process to a different bundle id than the one you granted. Run cua-driver diagnose and share the output when filing an issue. It reports cdhash, team id, and which bundle TCC matched against.
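The two permissions answers combine into one routine: start the daemon through LaunchServices, ask it for the permission state, and keep the diagnose output for an issue report. This is a minimal sketch; collect_permission_report, DRIVER_CMD, the default report file name, and the two-second wait for the daemon to come up are all assumptions.

```shell
# DRIVER_CMD is a hypothetical override hook; cua-driver does not read it.
DRIVER_CMD="${DRIVER_CMD:-cua-driver}"

# collect_permission_report [REPORT_FILE]
collect_permission_report() {
  report=${1:-cua-driver-diagnose.txt}
  # Launch via LaunchServices so TCC attributes the daemon to CuaDriver.app,
  # not to the IDE whose terminal this shell runs in.
  open -n -g -a CuaDriver --args serve || return 1
  sleep 2  # ASSUMPTION: rough wait for the daemon to start
  # Forwarded to the daemon, so the answer reflects the CuaDriver bundle.
  "$DRIVER_CMD" check_permissions
  # Reports cdhash, team id, and the bundle TCC matched against.
  "$DRIVER_CMD" diagnose > "$report"
}
```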
Config and telemetry
Where does config live?
~/Library/Application Support/Cua Driver/config.json
Read and write it via cua-driver config:
cua-driver config # show full config
cua-driver config get capture_mode
cua-driver config set capture_mode som
cua-driver config reset # overwrite with defaults
How do I opt out of telemetry?
cua-driver config telemetry disable
Or set CUA_DRIVER_TELEMETRY_ENABLED=0 in the environment for a one-off override.
Telemetry records anonymous subcommand usage (cua_driver_api_click, cua_driver_serve, etc.). No command arguments, file paths, or personal information are collected.
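An environment variable set as a command prefix only reaches the process that command starts, which makes it easy to scope the override to a single invocation. This is a minimal sketch; cua_no_telemetry and DRIVER_CMD are hypothetical names. Whether a daemon started earlier honors the caller's variable for forwarded commands is not covered here, so config telemetry disable remains the persistent switch.

```shell
# DRIVER_CMD is a hypothetical override hook; cua-driver does not read it.
DRIVER_CMD="${DRIVER_CMD:-cua-driver}"

# cua_no_telemetry ARGS...
# Runs one cua-driver command with CUA_DRIVER_TELEMETRY_ENABLED=0 in its
# environment; the stored config and the calling shell are left unchanged.
cua_no_telemetry() {
  CUA_DRIVER_TELEMETRY_ENABLED=0 "$DRIVER_CMD" "$@"
}
```

For example, `cua_no_telemetry config get capture_mode` reads the mode without that run being counted.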
Testing
Are there tests I can run against my install?
The project includes Python integration tests under libs/cua-driver/tests/ that use unittest to exercise the real cua-driver stdio server. They run as part of scripts/test.sh:
cd libs/cua-driver
./scripts/test.sh
For a quick manual smoke check, the Calculator test from libs/cua-driver/Skills/cua-driver/TESTS.md is a good five-minute run: launch Calculator hidden, snapshot, click 17 × 23 by element index, re-snapshot, and verify that the display reads 391 and Calculator never came to the foreground.