FAQ
Frequently asked questions about Cua Driver
Common gotchas and questions. For the full action loop and tool semantics, see the CLI reference and MCP tools reference.
The action loop
Why do I get No cached AX state?
Element-indexed actions read an in-memory cache keyed on (pid, window_id). The cache is populated by get_window_state and replaced on every snapshot.
Two common causes:
- You didn't call get_window_state in the current turn.
- You called it with a different window_id than the one in the action.
# Populate the cache for this exact (pid, window_id).
cua-driver get_window_state '{"pid":844,"window_id":10725}'
# Then act with the same window_id.
cua-driver click '{"pid":844,"window_id":10725,"element_index":14}'
If you're running one-shot CLI invocations without a daemon, the cache lives only for that one process's lifetime, so a snapshot taken by one invocation is gone by the next. Start the daemon first:
open -n -g -a CuaDriver --args serve
Why did my screenshot come back empty (has_screenshot: false)?
The window capture raced against a close, or the window has no backing store yet. Re-snapshot. If it persists, pick a different window_id via list_windows.
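The re-snapshot advice can be sketched as a small retry loop. This is only an illustration under stated assumptions: it assumes the CLI prints the get_window_state response as JSON text containing the has_screenshot field, and the names has_screenshot, snapshot_with_retry, DRIVER_BIN, and RETRY_DELAY are hypothetical helpers, not part of Cua Driver.

```shell
# Hypothetical helper: succeeds unless the response on stdin reports
# has_screenshot: false. Assumes the response is JSON text.
has_screenshot() {
  ! grep -Eq '"has_screenshot"[[:space:]]*:[[:space:]]*false'
}

# Re-snapshot up to $3 times (default 3), pausing between attempts.
# DRIVER_BIN and RETRY_DELAY are illustrative overrides, not driver settings.
snapshot_with_retry() {
  pid=$1 window_id=$2 attempts=${3:-3}
  i=0
  while [ "$i" -lt "$attempts" ]; do
    out=$("${DRIVER_BIN:-cua-driver}" get_window_state "{\"pid\":$pid,\"window_id\":$window_id}")
    if printf '%s\n' "$out" | has_screenshot; then
      printf '%s\n' "$out"   # usable snapshot: print it and stop
      return 0
    fi
    i=$((i + 1))
    sleep "${RETRY_DELAY:-1}"
  done
  return 1                   # still empty after every attempt
}
```

If snapshot_with_retry 844 10725 returns non-zero, the empty screenshot is persisting, which is the point at which picking a different window_id via list_windows applies.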
screenshot or get_window_state fails with "ScreenCaptureKit refused this window" / "Could not start streaming".
Known macOS 26.4.x ScreenCaptureKit regression on physical Macs (SCStreamError code -3801, sometimes localized — e.g. Japanese "オーディオ/ビデオの取り込みがうまくいかなかったため、ストリーミングを開始できませんでした"). The driver already:
- Retries the SCK call once after a brief delay (covers transient failures).
- Falls back to the legacy CGWindowListCreateImage path (works on many windows the SCK regression breaks).
If both refuse, the error surfaces with an actionable hint. Workarounds in order of preference:
- Try a different window_id on the same app. Usually only one specific window is hit.
- Switch to AX-only capture for that workflow. Element-indexed clicks don't need pixels:
cua-driver config set capture_mode ax
get_window_state then returns the AX tree without attempting a screenshot, and click({pid, window_id, element_index: N}) works as before.
- Re-snapshot a moment later. The failure is sometimes transient.
get_window_state does not hard-fail on this error: the AX tree still ships in the response with a warning line, so element-indexed clicks keep working even when the screenshot is unavailable. The standalone screenshot tool does hard-fail (no AX tree to fall back to).
The AX tree is tiny. What's happening?
Check the capture mode:
cua-driver config get capture_mode
Default is som (tree + screenshot). If it reads vision, get_window_state omits the tree by design (PNG only). Switch back to som to get both:
cua-driver config set capture_mode som
If the mode is som or ax and the tree is still small, the target uses custom rendering (Blender, Unity, Electron with AX disabled). For Chromium/Electron, retry get_window_state once; the tree populates on the second call. For canvas-backed apps, reach for pixel clicks instead.
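The capture-mode check above could be wrapped in one helper. This is a rough sketch, assuming config get capture_mode prints the bare mode name (possibly quoted); tree_mode_check and DRIVER_BIN are hypothetical names introduced for illustration.

```shell
# Hypothetical helper: report whether the capture mode explains a missing tree,
# switching vision back to som. Assumes `config get` prints just the mode name.
tree_mode_check() {
  mode=$("${DRIVER_BIN:-cua-driver}" config get capture_mode | tr -d '[:space:]"')
  case "$mode" in
    vision)
      # vision omits the tree by design, so restore the default
      "${DRIVER_BIN:-cua-driver}" config set capture_mode som >/dev/null &&
        echo "switched vision -> som" ;;
    som|ax)
      # the mode is not the cause; suspect custom rendering instead
      echo "mode $mode already returns the tree" ;;
    *)
      echo "unrecognized mode: $mode" >&2
      return 1 ;;
  esac
}
```

When the helper reports that the mode already returns the tree, the custom-rendering explanation is the next thing to check.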
Window state
My keyboard commit (Return, Space, Tab) on a minimized window silently no-ops.
Minimized windows receive AX reads and AX-dispatched clicks normally, but keyboard commits fail because AX focus doesn't propagate to renderer focus on a minimized window. You hear the macOS system-alert beep, or nothing happens.
Workarounds in order of preference:
- Use set_value to write the field's entire value directly. Bypasses keyboard commits.
- AX-click a commit-equivalent button (Go, Submit, checkbox). Clicks route through AXPress and don't need renderer focus.
- Last resort: ask the user to un-minimize the window. Don't deminiaturize programmatically; it disrupts layout in many apps.
My backgrounded SwiftUI app (System Settings) returns an almost-empty AX tree.
SwiftUI apps often strip the AX tree of a window on another Space down to just the menu bar. AppKit apps are usually fine.
get_window_state returns off_space: true plus window_space_ids when this happens, so you can detect it. Solutions:
- Ask the user to Mission-Control back to the Space that holds the target.
- Drive the app through in-window toolbar buttons (which often stay exposed) rather than deep nested controls.
- Accept the limitation for the current session.
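Detection of the off-Space case might look like the sketch below. It assumes the CLI prints the get_window_state response as JSON text containing the off_space field; is_off_space, check_space, and DRIVER_BIN are hypothetical names used only for illustration.

```shell
# Hypothetical helper: succeeds when the response on stdin reports off_space: true.
is_off_space() {
  grep -Eq '"off_space"[[:space:]]*:[[:space:]]*true'
}

# Snapshot a window and say whether its AX tree may have been stripped.
# DRIVER_BIN is an illustrative override, not a driver setting.
check_space() {
  pid=$1 window_id=$2
  if "${DRIVER_BIN:-cua-driver}" get_window_state "{\"pid\":$pid,\"window_id\":$window_id}" | is_off_space; then
    echo "window $window_id is on another Space; the AX tree may be stripped"
    return 1
  fi
  echo "window $window_id does not report off_space"
}
```

A non-zero return from check_space is the cue to fall back to one of the three options listed above.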
Browsers and Electron
Right-click on Chromium web content fires as a left-click.
A known Chromium renderer-IPC limit: the filter coerces synthetic right-click subtype to left on every non-HID-tap path. Use right_click({pid, element_index}) on AX-addressable targets (links, buttons, toolbar items). For web content itself (right-clicking an image or selection), there is no backgrounded path today. See Limits for the full note.
Pixel click on a YouTube video doesn't play or pause.
HTML5's click-to-play handler rejects some synthetic click paths. Use keyboard instead:
cua-driver press_key '{"pid":<pid>,"key":"k"}' # YouTube play/pause
cua-driver press_key '{"pid":<pid>,"key":"space"}' # generic video play/pause
Keyboard events travel through a different auth envelope and reach the page.
How do I navigate to a URL in Chrome without stealing focus?
Pass the URL to launch_app:
cua-driver launch_app '{"bundle_id":"com.google.Chrome","urls":["https://trycua.com"]}'
The URL opens in a new window via Chrome's application(_:open:) delegate. The driver's focus-restore guard catches Chrome's internal activation and restores the frontmost app to what it was before the call.
Don't use hotkey ⌘L to focus the omnibox. Even when delivered to a backgrounded pid, ⌘L steals focus because the receiving app interprets "user wants to type here" as activation intent.
The agent cursor
How do I disable the visual cursor overlay?
cua-driver set_agent_cursor_enabled '{"enabled":false}'
Or via config:
cua-driver config set agent_cursor.enabled false
The overlay only renders when the driver has an AppKit run loop (inside cua-driver serve or cua-driver mcp). One-shot CLI invocations skip it entirely.
Can I make the cursor move faster?
Tune the motion knobs:
cua-driver set_agent_cursor_motion '{"glide_duration_ms":300}'
See set_agent_cursor_motion in the MCP tools reference for every knob.
Permissions
check_permissions says NOT granted but I granted both.
TCC checks the calling process, not CuaDriver.app. Inside IDE terminals (Claude Code, Cursor, VS Code, Conductor), the shell inherits the IDE's TCC responsibility chain. So running cua-driver check_permissions in one of those shells reads against the IDE's bundle, not com.trycua.driver.
Start the daemon first, which runs through LaunchServices under the CuaDriver bundle:
open -n -g -a CuaDriver --args serve
cua-driver check_permissions # forwards to the daemon; authoritative answer
cua-driver mcp from an IDE terminal can't see Accessibility.
This is the same TCC-attribution issue as the previous question, applied to the stdio MCP server. cua-driver mcp detects it and auto-launches the daemon via open -n -g -a CuaDriver --args serve, then proxies every MCP tool call through the daemon's Unix socket. From the MCP client's perspective nothing changes — the same stdio server, the same tool names, the same response shapes — but every AX probe now hits a process that LaunchServices attributed to CuaDriver.app. No Python bridge needed. Force the in-process path with --no-daemon-relaunch (or CUA_DRIVER_MCP_NO_RELAUNCH=1) if you really want it, e.g. when mcp is launched from CuaDriver.app directly.
I keep seeing the permissions dialog on every launch.
macOS is attributing the process to a different bundle id than the one you granted. Run cua-driver diagnose and share the output when filing an issue. It reports cdhash, team id, and which bundle TCC matched against.
Config and telemetry
Where does config live?
~/Library/Application Support/Cua Driver/config.json
Read and write via cua-driver config:
cua-driver config # show full config
cua-driver config get capture_mode
cua-driver config set capture_mode som
cua-driver config reset # overwrite with defaults
How do I opt out of telemetry?
cua-driver config telemetry disable
Or set CUA_DRIVER_TELEMETRY_ENABLED=0 in the environment for a one-off override.
Telemetry records anonymous subcommand usage (cua_driver_api_click, cua_driver_serve, etc). No command arguments, file paths, or personal information are collected.
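For the one-off override, a small wrapper keeps the variable scoped to a single command instead of exporting it for the whole shell session. This is a minimal sketch; without_telemetry is a hypothetical helper name, and only the CUA_DRIVER_TELEMETRY_ENABLED variable comes from the driver itself.

```shell
# Hypothetical wrapper: run one command with telemetry disabled for that
# invocation only. The variable is set in the command's environment and
# does not leak into later commands.
without_telemetry() {
  CUA_DRIVER_TELEMETRY_ENABLED=0 "$@"
}
```

For example, without_telemetry cua-driver check_permissions runs that one call with the override and leaves the persisted config untouched.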
Testing
Are there tests I can run against my install?
The project includes Python integration tests under libs/cua-driver/tests/ that exercise the real cua-driver stdio server against unittest. They run as part of scripts/test.sh:
cd libs/cua-driver
./scripts/test.sh
For a quick manual smoke check, the Calculator test from libs/cua-driver/Skills/cua-driver/TESTS.md is a good five-minute run: launch Calculator hidden, snapshot, click 17 × 23 by element index, re-snapshot, verify the display reads 391 and Calculator never came to the foreground.