Integrations
Connect cua-driver to your AI coding agent
cua-driver mcp is a stdio MCP server. Any agent that supports MCP can use it — no extra setup beyond adding it to the agent's MCP config.
Grant Accessibility and Screen Recording permissions to cua-driver before connecting any agent.
Run cua-driver check_permissions to verify.
Claude Code
Standard MCP registration:
claude mcp add --transport stdio cua-driver -- cua-driver mcp
Verify:
claude mcp list
# cua-driver: cua-driver mcp (stdio) - ✓ Connected
Claude Code computer-use compatibility mode
Claude Code vision/computer-use-style flows appear to use the presence of a screenshot tool as a cue for image-grounded operation. If you want that behavior, register the compatibility server instead:
claude mcp add --transport stdio cua-computer-use -- cua-driver mcp --claude-code-computer-use-compat
This mode still exposes the normal CuaDriver tools. The only changed tool is screenshot: it requires pid and window_id, captures that window only, and returns a window-local image coordinate frame. Start with launch_app or list_windows, then call screenshot with the target window.
For this Claude Code vision/computer-use-style path, use MCP rather than shelling out to the CLI. CLI screenshots can still capture windows, but they do not expose the mcp__cua-computer-use__screenshot tool name that Claude Code appears to use as the image-grounding cue.
This does not call Anthropic APIs or expose Anthropic's native computer-use API tool. It is a CuaDriver MCP compatibility mode for Claude Code.
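To make the compat-mode screenshot call concrete, here is a minimal sketch of the request an MCP client might send. It assumes the standard MCP JSON-RPC tools/call envelope; the helper names (tool_call, screenshot_request) and the example pid and window_id values are hypothetical, and the integer type of both arguments is an assumption rather than something this page states. In practice the agent issues this call itself.

```python
import json
from itertools import count

# Monotonic request ids; JSON-RPC only requires that ids be unique per session.
_ids = count(1)


def tool_call(name, arguments):
    """Build an MCP `tools/call` JSON-RPC 2.0 request as a plain dict."""
    return {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


def screenshot_request(pid, window_id):
    """Build a compat-mode screenshot request for one window.

    Both arguments are required in this mode, so refuse to build a
    request when either is missing.
    """
    if pid is None or window_id is None:
        raise ValueError("screenshot requires both pid and window_id")
    return tool_call("screenshot", {"pid": pid, "window_id": window_id})


# Placeholder values; real ones would come from launch_app or list_windows.
request = screenshot_request(pid=1234, window_id=42)
print(json.dumps(request, indent=2))
```

The returned image uses a window-local coordinate frame, so any coordinates derived from it are relative to that window rather than to the full screen.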
GitHub Copilot CLI
Add to ~/.copilot/mcp-config.json:
{
  "mcpServers": {
    "cua-driver": {
      "type": "local",
      "command": "cua-driver",
      "args": ["mcp"],
      "tools": ["*"]
    }
  }
}
Or interactively inside gh copilot chat:
/mcp add
Fill in: name=cua-driver, type=STDIO, command=cua-driver, args=mcp. Press Ctrl+S to save.
Codex (OpenAI)
codex mcp add cua-driver -- cua-driver mcp
Cursor
Generate the config snippet and paste it into ~/.cursor/mcp.json:
cua-driver mcp-config --client cursor
Gemini CLI
Add to ~/.gemini/settings.json:
{
  "mcp": {
    "servers": {
      "cua-driver": {
        "type": "stdio",
        "command": "cua-driver",
        "args": ["mcp"]
      }
    }
  }
}
Tools appear prefixed as mcp_cua-driver_*.
OpenCode
cua-driver mcp-config --client opencode
Paste the output into ~/.config/opencode/config.json (global) or opencode.json at the project root.
Always configure cua-driver as an MCP server; never rely on the CLI fallback. If MCP is not wired up, OpenCode calls cua-driver as a shell subprocess. On that path the get_window_state response no longer includes base64 image data by default, and the screenshot image block is silently dropped, so the model receives only the AX tree with no visual context. To preserve the image on the CLI path, pass --screenshot-out-file or set the screenshot_out_file param.
Local vision models (Ollama)
If you are using a vision-capable model via Ollama, you must also declare its input modalities in config.json — otherwise OpenCode strips images before they reach the model:
{
  "mcp": {
    "cua-driver": {
      "type": "local",
      "command": ["/Users/you/.local/bin/cua-driver", "mcp"],
      "enabled": true
    }
  },
  "provider": {
    "ollama": {
      "npm": "@ai-sdk/openai-compatible",
      "options": { "baseURL": "http://localhost:11434/v1" },
      "models": {
        "gemma4:26b": {
          "modalities": {
            "input": ["text", "image"],
            "output": ["text"]
          }
        }
      }
    }
  }
}
The modalities field is required because OpenCode's @ai-sdk/openai-compatible provider defaults to text-only when no capabilities are declared. Without it, screenshots are replaced with an error string and never reach the model.
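Because a missing modalities entry fails quietly, a quick check of the config can catch it before a session starts. The sketch below is one possible way to do that, assuming the config has the provider / models / modalities layout shown in the example above; the function name models_missing_image_input is hypothetical and not part of cua-driver or OpenCode.

```python
import json


def models_missing_image_input(config):
    """Return 'provider/model' names whose modalities.input lacks 'image'.

    `config` is the parsed OpenCode config as a dict. Models with no
    `modalities` block at all are reported too, since an undeclared
    model is treated as text-only.
    """
    missing = []
    for provider_name, provider in config.get("provider", {}).items():
        for model_name, model in provider.get("models", {}).items():
            inputs = model.get("modalities", {}).get("input", [])
            if "image" not in inputs:
                missing.append(f"{provider_name}/{model_name}")
    return missing


# Example: one model declares image input, the other declares nothing.
example = json.loads("""
{
  "provider": {
    "ollama": {
      "models": {
        "gemma4:26b": {"modalities": {"input": ["text", "image"], "output": ["text"]}},
        "undeclared-model": {}
      }
    }
  }
}
""")
print(models_missing_image_input(example))
```

Here "undeclared-model" is a placeholder name used only to show the failing case. To check a real file, parse it with json.load and pass the resulting dict to the function.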
Hermes (NousResearch)
cua-driver mcp-config --client hermes
Paste the output into ~/.hermes/config.yaml.
OpenClaw
cua-driver mcp-config --client openclaw
Other clients
For any client that accepts the standard mcpServers shape:
cua-driver mcp-config
Output:
{
  "mcpServers": {
    "cua-driver": {
      "command": "cua-driver",
      "args": ["mcp"]
    }
  }
}
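If the client's config file already lists other servers, pasting the snippet over it would discard them. The following is a minimal sketch of merging the generated mcpServers entry into an existing config instead; it assumes both sides are JSON objects using the standard mcpServers shape, and the function name merge_mcp_servers and the "other-server" entry are hypothetical.

```python
import copy


def merge_mcp_servers(existing, snippet):
    """Return a new config with the snippet's servers added to `existing`.

    Servers already present under `mcpServers` are kept. An entry with
    the same name as one in the snippet is replaced by the snippet's
    version. Neither input dict is modified.
    """
    merged = copy.deepcopy(existing)
    servers = merged.setdefault("mcpServers", {})
    servers.update(copy.deepcopy(snippet.get("mcpServers", {})))
    return merged


# The snippet mirrors the `cua-driver mcp-config` output shown above.
snippet = {"mcpServers": {"cua-driver": {"command": "cua-driver", "args": ["mcp"]}}}
existing = {"mcpServers": {"other-server": {"command": "other", "args": []}}}

merged = merge_mcp_servers(existing, snippet)
print(sorted(merged["mcpServers"]))
```

Working on a deep copy keeps the original config untouched, so it can be written to a backup before the merged version is saved.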