# All-in-one CUA Models

Models that support full computer-use agent capabilities with `ComputerAgent.run()`.

These models support complete computer-use agent functionality through `ComputerAgent.run()`. They can understand natural-language instructions and autonomously perform sequences of actions to complete tasks.

All agent loops are compatible with any LLM provider supported by LiteLLM.

See Running Models Locally for how to use Hugging Face and MLX models on your own machine.
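The snippets on this page all consume `agent.run()` the same way: it is an async generator, so callers drive it with `async for` until the task completes. A minimal runnable sketch of that pattern, using a hypothetical `fake_run` stand-in so it works without the SDK or a live model:

```python
import asyncio

# Hypothetical stand-in for ComputerAgent.run(): an async generator that
# yields result chunks as the agent works through the task. The real chunk
# schema differs; only the consumption pattern is the point here.
async def fake_run(task: str):
    for action in ("screenshot", "click", "type", "done"):
        yield {"task": task, "action": action}

async def main() -> list[str]:
    performed = []
    # Even when the intermediate chunks aren't needed, the generator must be
    # iterated to completion -- hence the `async for _ in agent.run(...)`
    # followed by `pass` in the examples on this page.
    async for chunk in fake_run("Open Firefox and navigate to github.com"):
        performed.append(chunk["action"])
    return performed

if __name__ == "__main__":
    print(asyncio.run(main()))
```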
## Gemini CUA

Gemini models with computer-use capabilities:

- Gemini 2.5 CUA: `gemini-2.5-computer-use-preview-10-2025`

```python
agent = ComputerAgent("gemini-2.5-computer-use-preview-10-2025", tools=[computer])
async for _ in agent.run("Open Firefox and navigate to github.com"):
    pass
```

## Anthropic CUAs
Claude models with computer-use capabilities:

- Claude 4.5: `claude-sonnet-4-5-20250929`, `claude-haiku-4-5-20251001`
- Claude 4.1: `claude-opus-4-1-20250805`
- Claude 4: `claude-opus-4-20250514`, `claude-sonnet-4-20250514`
- Claude 3.7: `claude-3-7-sonnet-20250219`

```python
agent = ComputerAgent("claude-sonnet-4-5-20250929", tools=[computer])
async for _ in agent.run("Open Firefox and navigate to github.com"):
    pass
```

## OpenAI CUA Preview
OpenAI's computer-use preview model:

- Computer-use-preview: `computer-use-preview`

```python
agent = ComputerAgent("openai/computer-use-preview", tools=[computer])
async for _ in agent.run("Take a screenshot and describe what you see"):
    pass
```

## GLM-4.5V
Zhipu AI's GLM-4.5V vision-language model with computer-use capabilities:

- `openrouter/z-ai/glm-4.5v`
- `huggingface-local/zai-org/GLM-4.5V`

```python
agent = ComputerAgent("openrouter/z-ai/glm-4.5v", tools=[computer])
async for _ in agent.run("Click on the search bar and type 'hello world'"):
    pass
```

## InternVL 3.5
InternVL 3.5 family:

- `huggingface-local/OpenGVLab/InternVL3_5-{1B,2B,4B,8B,...}`

```python
agent = ComputerAgent("huggingface-local/OpenGVLab/InternVL3_5-1B", tools=[computer])
async for _ in agent.run("Open Firefox and navigate to github.com"):
    pass
```

## Qwen3 VL
Qwen3 VL family:

- `cua/qwen/qwen3-vl-235b` (via CUA VLM Router - recommended)

```python
agent = ComputerAgent("cua/qwen/qwen3-vl-235b", tools=[computer])
async for _ in agent.run("Open Firefox and navigate to github.com"):
    pass
```

## UI-TARS 1.5
Unified vision-language model for computer use:

- `huggingface-local/ByteDance-Seed/UI-TARS-1.5-7B`
- `huggingface/ByteDance-Seed/UI-TARS-1.5-7B` (requires a TGI endpoint)

```python
agent = ComputerAgent("huggingface-local/ByteDance-Seed/UI-TARS-1.5-7B", tools=[computer])
async for _ in agent.run("Open the settings menu and change the theme to dark mode"):
    pass
```

CUAs also support direct click prediction. See Grounding Models for details on `predict_click()`.
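Direct click prediction follows a simple contract: a natural-language description of a UI element in, screen coordinates out. A runnable sketch of that shape, with a hypothetical stub in place of the real `predict_click()` (see the Grounding Models page for the actual signature and return type):

```python
import asyncio

# Hypothetical stand-in for predict_click(): a real agent would ground the
# instruction against a screenshot and return the predicted click target.
async def predict_click_stub(instruction: str) -> tuple[int, int]:
    # Fixed coordinates for illustration only.
    return (640, 360)

async def main() -> tuple[int, int]:
    x, y = await predict_click_stub("the search bar at the top of the page")
    return x, y

if __name__ == "__main__":
    print(asyncio.run(main()))
```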
For details on agent loop behavior and usage, see Agent Loops.