CUA VLM Router
Intelligent vision-language model routing with cost optimization and unified access
The CUA VLM Router is an intelligent inference API that provides unified access to multiple vision-language model providers through a single API key. It offers cost optimization and detailed observability for production AI applications.
Overview
Instead of managing multiple API keys and provider-specific code, CUA VLM Router acts as a smart cloud gateway that:
- Unifies access to multiple model providers
- Optimizes costs through intelligent routing and provider selection
- Tracks usage and costs with detailed metadata
- Provides observability with routing decisions and attempt logs
- Manages infrastructure for you - no need to hold provider API keys yourself
Quick Start
1. Get Your API Key
Sign up at cua.ai and get your CUA API key from the dashboard.
2. Set Environment Variable
```bash
export CUA_API_KEY="sk_cua-api01_..."
```
3. Use with Agent SDK
```python
from agent import ComputerAgent
from computer import Computer

computer = Computer(os_type="linux", provider_type="docker")

agent = ComputerAgent(
    model="cua/anthropic/claude-sonnet-4.5",
    tools=[computer],
    max_trajectory_budget=5.0
)

messages = [{"role": "user", "content": "Take a screenshot and tell me what's on screen"}]

async for result in agent.run(messages):
    for item in result["output"]:
        if item["type"] == "message":
            print(item["content"][0]["text"])
```
Available Models
The CUA VLM Router currently supports these models:
| Model ID | Provider | Description | Best For |
|---|---|---|---|
| cua/anthropic/claude-sonnet-4.5 | Anthropic | Claude Sonnet 4.5 | General-purpose tasks, recommended |
| cua/anthropic/claude-haiku-4.5 | Anthropic | Claude Haiku 4.5 | Fast responses, cost-effective |
How It Works
Intelligent Routing
When you make a request to CUA VLM Router:
- Model Resolution: your model ID (e.g., cua/anthropic/claude-sonnet-4.5) is resolved to the underlying provider and model
- Provider Selection: CUA routes your request to that provider
- Response: you receive an OpenAI-compatible response with routing and cost metadata
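The model-resolution step amounts to a prefix split on the model ID. A minimal sketch (the `resolve_model` helper is hypothetical, for illustration only, not part of the SDK):

```python
def resolve_model(model_id: str) -> tuple[str, str]:
    """Split a router model ID such as 'cua/anthropic/claude-sonnet-4.5'
    into its (provider, model) parts. Illustrative only."""
    prefix, provider, model = model_id.split("/", 2)
    if prefix != "cua":
        raise ValueError(f"not a CUA router model ID: {model_id!r}")
    return provider, model
```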
API Reference
Base URL
```
https://inference.cua.ai/v1
```
Authentication
All requests require an API key in the Authorization header:
```
Authorization: Bearer sk_cua-api01_...
```
Endpoints
List Available Models
```
GET /v1/models
```
Response:
```json
{
  "data": [
    {
      "id": "anthropic/claude-sonnet-4.5",
      "name": "Claude Sonnet 4.5",
      "object": "model",
      "owned_by": "cua"
    }
  ],
  "object": "list"
}
```
Chat Completions
```
POST /v1/chat/completions
Content-Type: application/json
```
Request:
```json
{
  "model": "anthropic/claude-sonnet-4.5",
  "messages": [
    {"role": "user", "content": "Hello!"}
  ],
  "max_tokens": 100,
  "temperature": 0.7,
  "stream": false
}
```
Response:
```json
{
  "id": "gen_...",
  "object": "chat.completion",
  "created": 1763554838,
  "model": "anthropic/claude-sonnet-4.5",
  "choices": [{
    "index": 0,
    "message": {
      "role": "assistant",
      "content": "Hello! How can I help you today?"
    },
    "finish_reason": "stop"
  }],
  "usage": {
    "prompt_tokens": 10,
    "completion_tokens": 12,
    "total_tokens": 22,
    "cost": 0.01,
    "is_byok": true
  }
}
```
Streaming
Set `"stream": true` to receive server-sent events:
```bash
curl -X POST https://inference.cua.ai/v1/chat/completions \
  -H "Authorization: Bearer sk_cua-api01_..." \
  -H "Content-Type: application/json" \
  -d '{
    "model": "anthropic/claude-sonnet-4.5",
    "messages": [{"role": "user", "content": "Count to 5"}],
    "stream": true
  }'
```
Response (SSE format):
```
data: {"id":"gen_...","choices":[{"delta":{"content":"1"}}],"object":"chat.completion.chunk"}
data: {"id":"gen_...","choices":[{"delta":{"content":"\n2"}}],"object":"chat.completion.chunk"}
data: {"id":"gen_...","choices":[{"delta":{"content":"\n3\n4\n5"}}],"object":"chat.completion.chunk"}
data: {"id":"gen_...","choices":[{"delta":{},"finish_reason":"stop"}],"usage":{...}}
```
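If you consume the stream yourself rather than through the SDK, the chunks can be reassembled by concatenating each delta's content. A minimal sketch (the `collect_stream_text` helper is illustrative; it also tolerates an OpenAI-style `data: [DONE]` terminator, which is an assumption about the stream's final event):

```python
import json

def collect_stream_text(sse_lines):
    """Join the delta content from chat.completion.chunk SSE lines."""
    parts = []
    for line in sse_lines:
        if not line.startswith("data: "):
            continue  # skip blank keep-alive lines
        payload = line[len("data: "):].strip()
        if payload == "[DONE]":
            break
        event = json.loads(payload)
        for choice in event.get("choices", []):
            parts.append(choice.get("delta", {}).get("content") or "")
    return "".join(parts)
```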
Check Balance
```
GET /v1/balance
```
Response:
```json
{
  "balance": 211689.85,
  "currency": "credits"
}
```
Cost Tracking
CUA VLM Router provides detailed cost information in every response:
Credit System
Requests are billed in credits:
- Credits are deducted from your CUA account balance
- Prices vary by model and usage
- CUA manages all provider API keys and infrastructure
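Before starting an expensive run you might check the balance endpoint and fail fast. A sketch with a hypothetical `ensure_credits` helper, assuming the `/v1/balance` response shape shown above:

```python
def ensure_credits(balance_payload: dict, minimum: float) -> float:
    """Return the current balance from a /v1/balance response body,
    raising if it is below `minimum` credits."""
    balance = float(balance_payload["balance"])
    if balance < minimum:
        raise RuntimeError(f"insufficient credits: {balance:.2f} < {minimum:.2f}")
    return balance
```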
Response Cost Fields
```jsonc
{
  "usage": {
    "cost": 0.01,           // CUA gateway cost in credits
    "market_cost": 0.000065 // actual upstream API cost
  }
}
```
Note: CUA VLM Router is a fully managed cloud service. If you want to use your own provider API keys directly (BYOK), see the Supported Model Providers page for direct provider access via the Agent SDK.
Response Metadata
CUA VLM Router includes metadata about routing decisions and costs in the response. This information helps with debugging and monitoring your application's model usage.
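For example, pulling out the fields most useful for logging could look like this (the `summarize_usage` helper is hypothetical; field names follow the chat-completion response shown earlier):

```python
def summarize_usage(response: dict) -> dict:
    """Collect the generation ID, model, token count, and credit cost
    from a chat completion response body."""
    usage = response.get("usage", {})
    return {
        "generation_id": response.get("id"),
        "model": response.get("model"),
        "total_tokens": usage.get("total_tokens"),
        "cost_credits": usage.get("cost"),
    }
```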
Configuration
Environment Variables
```bash
# Required: Your CUA API key
export CUA_API_KEY="sk_cua-api01_..."

# Optional: Custom endpoint (defaults to https://inference.cua.ai/v1)
export CUA_BASE_URL="https://custom-endpoint.cua.ai/v1"
```
Python SDK Configuration
```python
from agent import ComputerAgent

# Using environment variables (recommended)
agent = ComputerAgent(model="cua/anthropic/claude-sonnet-4.5")

# Or explicit configuration
agent = ComputerAgent(
    model="cua/anthropic/claude-sonnet-4.5",
    # CUA adapter automatically loads from CUA_API_KEY
)
```
Benefits Over Direct Provider Access
| Feature | CUA VLM Router | Direct Provider (BYOK) |
|---|---|---|
| Single API Key | ✅ One key for all providers | ❌ Multiple keys to manage |
| Managed Infrastructure | ✅ No API key management | ❌ Manage multiple provider keys |
| Usage Tracking | ✅ Unified dashboard | ❌ Per-provider tracking |
| Model Switching | ✅ Change model string only | ❌ Change code + keys |
| Setup Complexity | ✅ One environment variable | ❌ Multiple environment variables |
Error Handling
Common Error Responses
Insufficient Credits
```json
{
  "detail": "Insufficient credits. Current balance: 0.00 credits"
}
```
Missing Authorization
```json
{
  "detail": "Missing Authorization: Bearer token"
}
```
Invalid Model
```json
{
  "detail": "Invalid or unavailable model"
}
```
Best Practices
- Check balance periodically using `/v1/balance`
- Handle rate limits with exponential backoff
- Log generation IDs for debugging
- Set up usage alerts in your CUA dashboard
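The rate-limit advice above can be sketched as capped exponential backoff with jitter. This is illustrative, not part of the SDK; `make_request` stands in for any HTTP call that returns a status code and body:

```python
import random
import time

def backoff_delays(max_retries=5, base=1.0, cap=30.0):
    """Yield capped exponential delays: base * 2**attempt, up to cap seconds."""
    for attempt in range(max_retries):
        yield min(cap, base * (2 ** attempt))

def call_with_retries(make_request, max_retries=5, base=1.0, cap=30.0):
    """Retry `make_request` while it reports HTTP 429, sleeping between attempts."""
    for delay in backoff_delays(max_retries, base=base, cap=cap):
        status, body = make_request()
        if status != 429:
            return status, body
        time.sleep(delay + random.uniform(0, base))  # jitter avoids thundering herd
    raise RuntimeError("still rate limited after retries")
```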
Examples
Basic Usage
```python
from agent import ComputerAgent
from computer import Computer

computer = Computer(os_type="linux", provider_type="docker")

agent = ComputerAgent(
    model="cua/anthropic/claude-sonnet-4.5",
    tools=[computer]
)

messages = [{"role": "user", "content": "Open Firefox"}]

async for result in agent.run(messages):
    print(result)
```
Direct API Call (curl)
```bash
curl -X POST https://inference.cua.ai/v1/chat/completions \
  -H "Authorization: Bearer ${CUA_API_KEY}" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "anthropic/claude-sonnet-4.5",
    "messages": [
      {"role": "user", "content": "Explain quantum computing"}
    ],
    "max_tokens": 200
  }'
```
With Custom Parameters
```python
agent = ComputerAgent(
    model="cua/anthropic/claude-haiku-4.5",
    tools=[computer],
    max_trajectory_budget=10.0,
    temperature=0.7
)
```
Migration from Direct Provider Access
Switching from direct provider access (BYOK) to CUA VLM Router is simple:
Before (Direct Provider Access with BYOK):
```bash
# Required: Provider-specific API key
export ANTHROPIC_API_KEY="sk-ant-..."
```
```python
agent = ComputerAgent(
    model="anthropic/claude-sonnet-4-5-20250929",
    tools=[computer]
)
```
After (CUA VLM Router - Cloud Service):
```bash
# Required: CUA API key only (no provider keys needed)
export CUA_API_KEY="sk_cua-api01_..."
```
```python
agent = ComputerAgent(
    model="cua/anthropic/claude-sonnet-4.5",  # add the "cua/" prefix
    tools=[computer]
)
```
That's it! The code structure stays the same; only the model string changes. CUA manages all provider infrastructure and credentials for you.
Support
- Documentation: cua.ai/docs
- Discord: Join our community
- Issues: GitHub Issues
Next Steps
- Explore Agent Loops to customize agent behavior
- Learn about Cost Saving Callbacks
- Try Example Use Cases
- Review Supported Model Providers for all options