
CUA VLM Router

Intelligent vision-language model routing with cost optimization and unified access

The CUA VLM Router is an intelligent inference API that provides unified access to multiple vision-language model providers through a single API key. It offers cost optimization and detailed observability for production AI applications.

Overview

Instead of managing multiple API keys and provider-specific code, CUA VLM Router acts as a smart cloud gateway that:

  • Unifies access to multiple model providers behind one API key
  • Optimizes costs through intelligent routing and provider selection
  • Tracks usage and costs with detailed metadata
  • Provides observability with routing decisions and attempt logs
  • Manages infrastructure so you never handle provider API keys yourself

Quick Start

1. Get Your API Key

Sign up at cua.ai and get your CUA API key from the dashboard.

2. Set Environment Variable

export CUA_API_KEY="sk_cua-api01_..."

3. Use with Agent SDK

from agent import ComputerAgent
from computer import Computer

computer = Computer(os_type="linux", provider_type="docker")

agent = ComputerAgent(
    model="cua/anthropic/claude-sonnet-4.5",
    tools=[computer],
    max_trajectory_budget=5.0
)

messages = [{"role": "user", "content": "Take a screenshot and tell me what's on screen"}]

async for result in agent.run(messages):
    for item in result["output"]:
        if item["type"] == "message":
            print(item["content"][0]["text"])

Available Models

The CUA VLM Router currently supports these models:

Model ID                          Provider    Description         Best For
cua/anthropic/claude-sonnet-4.5   Anthropic   Claude Sonnet 4.5   General-purpose tasks, recommended
cua/anthropic/claude-haiku-4.5    Anthropic   Claude Haiku 4.5    Fast responses, cost-effective

How It Works

Intelligent Routing

When you make a request to CUA VLM Router:

  1. Model Resolution: Your model ID (e.g., cua/anthropic/claude-sonnet-4.5) is resolved to the appropriate provider
  2. Provider Selection: CUA routes your request to the appropriate model provider
  3. Response: You receive an OpenAI-compatible response with metadata
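The model-resolution and provider-selection steps above can be sketched as a simple prefix mapping. This is a hypothetical illustration of the ID scheme, not the router's actual implementation:

```python
def resolve_model(model_id: str) -> str:
    """Resolve a CUA-prefixed model ID to the upstream provider model ID.

    Sketch of step 1: "cua/anthropic/claude-sonnet-4.5" resolves to
    "anthropic/claude-sonnet-4.5".
    """
    if not model_id.startswith("cua/"):
        raise ValueError(f"not a CUA-routed model ID: {model_id!r}")
    return model_id[len("cua/"):]


def provider_of(model_id: str) -> str:
    """Extract the provider name (step 2) from a CUA-routed model ID."""
    return resolve_model(model_id).split("/", 1)[0]
```

Note that the resolved form (without the `cua/` prefix) is also the model string you send in direct API calls to `/v1/chat/completions`.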

API Reference

Base URL

https://inference.cua.ai/v1

Authentication

All requests require an API key in the Authorization header:

Authorization: Bearer sk_cua-api01_...
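Every request needs this header plus a JSON content type. A minimal helper for building them (an illustrative function, not part of the SDK):

```python
def auth_headers(api_key: str) -> dict:
    """Build the headers required by CUA VLM Router requests.

    `api_key` is your CUA key (the "sk_cua-api01_..." value).
    """
    return {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
```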

Endpoints

List Available Models

GET /v1/models

Response:

{
  "data": [
    {
      "id": "anthropic/claude-sonnet-4.5",
      "name": "Claude Sonnet 4.5",
      "object": "model",
      "owned_by": "cua"
    }
  ],
  "object": "list"
}
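Given the response shape above, collecting the available model IDs is a one-liner (illustrative helper, assuming the `data`/`id` fields shown):

```python
def model_ids(models_response: dict) -> list:
    """Collect model IDs from a GET /v1/models response."""
    return [m["id"] for m in models_response["data"]]
```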

Chat Completions

POST /v1/chat/completions
Content-Type: application/json

Request:

{
  "model": "anthropic/claude-sonnet-4.5",
  "messages": [
    {"role": "user", "content": "Hello!"}
  ],
  "max_tokens": 100,
  "temperature": 0.7,
  "stream": false
}

Response:

{
  "id": "gen_...",
  "object": "chat.completion",
  "created": 1763554838,
  "model": "anthropic/claude-sonnet-4.5",
  "choices": [{
    "index": 0,
    "message": {
      "role": "assistant",
      "content": "Hello! How can I help you today?"
    },
    "finish_reason": "stop"
  }],
  "usage": {
    "prompt_tokens": 10,
    "completion_tokens": 12,
    "total_tokens": 22,
    "cost": 0.01,
    "is_byok": true
  }
}
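A typical client pulls two things out of this response: the assistant text and the credit cost. A sketch against the shape shown above (hypothetical helper, not SDK code):

```python
def extract_reply(completion: dict):
    """Return (assistant_text, cost_in_credits) from a chat.completion
    response with the shape shown above."""
    text = completion["choices"][0]["message"]["content"]
    cost = completion["usage"]["cost"]
    return text, cost
```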

Streaming

Set "stream": true to receive server-sent events:

curl -X POST https://inference.cua.ai/v1/chat/completions \
  -H "Authorization: Bearer sk_cua-api01_..." \
  -H "Content-Type: application/json" \
  -d '{
    "model": "anthropic/claude-sonnet-4.5",
    "messages": [{"role": "user", "content": "Count to 5"}],
    "stream": true
  }'

Response (SSE format):

data: {"id":"gen_...","choices":[{"delta":{"content":"1"}}],"object":"chat.completion.chunk"}

data: {"id":"gen_...","choices":[{"delta":{"content":"\n2"}}],"object":"chat.completion.chunk"}

data: {"id":"gen_...","choices":[{"delta":{"content":"\n3\n4\n5"}}],"object":"chat.completion.chunk"}

data: {"id":"gen_...","choices":[{"delta":{},"finish_reason":"stop"}],"usage":{...}}
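Reassembling the streamed text means parsing each `data:` line and concatenating the `delta.content` fragments. A parsing sketch (assumes the chunk shape shown above; many OpenAI-compatible servers also send a final `data: [DONE]` sentinel, which is handled defensively here but not shown in the sample):

```python
import json


def accumulate_sse(lines):
    """Reassemble assistant text from SSE lines of the form 'data: {...}'."""
    parts = []
    for line in lines:
        if not line.startswith("data: "):
            continue  # skip blank keep-alive lines between events
        payload = line[len("data: "):].strip()
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            parts.append(choice.get("delta", {}).get("content") or "")
    return "".join(parts)
```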

Check Balance

GET /v1/balance

Response:

{
  "balance": 211689.85,
  "currency": "credits"
}

Cost Tracking

CUA VLM Router provides detailed cost information in every response:

Credit System

Requests are billed in credits:

  • Credits are deducted from your CUA account balance
  • Prices vary by model and usage
  • CUA manages all provider API keys and infrastructure

Response Cost Fields

{
  "usage": {
    "cost": 0.01,                    // CUA gateway cost in credits
    "market_cost": 0.000065          // Actual upstream API cost
  }
}
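For budget tracking you can aggregate these fields across a batch of responses. An illustrative helper (field names taken from the response shown above; `market_cost` may be absent on some responses, so it defaults to zero):

```python
def total_costs(responses):
    """Sum gateway cost (credits) and upstream market cost across responses."""
    credits = sum(r["usage"].get("cost", 0.0) for r in responses)
    market = sum(r["usage"].get("market_cost", 0.0) for r in responses)
    return credits, market
```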

Note: CUA VLM Router is a fully managed cloud service. If you want to use your own provider API keys directly (BYOK), see the Supported Model Providers page for direct provider access via the agent SDK.

Response Metadata

CUA VLM Router includes metadata about routing decisions and costs in the response. This information helps with debugging and monitoring your application's model usage.

Configuration

Environment Variables

# Required: Your CUA API key
export CUA_API_KEY="sk_cua-api01_..."

# Optional: Custom endpoint (defaults to https://inference.cua.ai/v1)
export CUA_BASE_URL="https://custom-endpoint.cua.ai/v1"

Python SDK Configuration

from agent import ComputerAgent

# Using environment variables (recommended)
agent = ComputerAgent(model="cua/anthropic/claude-sonnet-4.5")

# Or explicit configuration
agent = ComputerAgent(
    model="cua/anthropic/claude-sonnet-4.5",
    # CUA adapter automatically loads from CUA_API_KEY
)

Benefits Over Direct Provider Access

Feature                    CUA VLM Router                   Direct Provider (BYOK)
Single API Key             ✅ One key for all providers      ❌ Multiple keys to manage
Managed Infrastructure     ✅ No API key management          ❌ Manage multiple provider keys
Usage Tracking             ✅ Unified dashboard              ❌ Per-provider tracking
Model Switching            ✅ Change model string only       ❌ Change code + keys
Setup Complexity           ✅ One environment variable       ❌ Multiple environment variables

Error Handling

Common Error Responses

Insufficient Credits

{
  "detail": "Insufficient credits. Current balance: 0.00 credits"
}

Missing Authorization

{
  "detail": "Missing Authorization: Bearer token"
}

Invalid Model

{
  "detail": "Invalid or unavailable model"
}

Best Practices

  1. Check balance periodically using /v1/balance
  2. Handle rate limits with exponential backoff
  3. Log generation IDs for debugging
  4. Set up usage alerts in your CUA dashboard
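The exponential-backoff recommendation (practice 2) can be sketched as a small retry wrapper. The attempt count and delays are illustrative defaults, not values documented by CUA:

```python
import time


def with_backoff(call, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Retry `call` with exponential backoff: 1s, 2s, 4s, ... between attempts.

    `call` should raise on a retryable failure (e.g. a rate-limit error).
    `sleep` is injectable so tests can skip real waiting.
    """
    for attempt in range(max_attempts):
        try:
            return call()
        except Exception:
            if attempt == max_attempts - 1:
                raise
            sleep(base_delay * (2 ** attempt))
```

In production you would catch only retryable errors (rate limits, transient network failures) rather than bare `Exception`.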

Examples

Basic Usage

from agent import ComputerAgent
from computer import Computer

computer = Computer(os_type="linux", provider_type="docker")

agent = ComputerAgent(
    model="cua/anthropic/claude-sonnet-4.5",
    tools=[computer]
)

messages = [{"role": "user", "content": "Open Firefox"}]

async for result in agent.run(messages):
    print(result)

Direct API Call (curl)

curl -X POST https://inference.cua.ai/v1/chat/completions \
  -H "Authorization: Bearer ${CUA_API_KEY}" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "anthropic/claude-sonnet-4.5",
    "messages": [
      {"role": "user", "content": "Explain quantum computing"}
    ],
    "max_tokens": 200
  }'

With Custom Parameters

agent = ComputerAgent(
    model="cua/anthropic/claude-haiku-4.5",
    tools=[computer],
    max_trajectory_budget=10.0,
    temperature=0.7
)

Migration from Direct Provider Access

Switching from direct provider access (BYOK) to CUA VLM Router is simple:

Before (Direct Provider Access with BYOK):

# Required: Provider-specific API key
export ANTHROPIC_API_KEY="sk-ant-..."

agent = ComputerAgent(
    model="anthropic/claude-sonnet-4-5-20250929",
    tools=[computer]
)

After (CUA VLM Router - Cloud Service):

# Required: CUA API key only (no provider keys needed)
export CUA_API_KEY="sk_cua-api01_..."

agent = ComputerAgent(
    model="cua/anthropic/claude-sonnet-4.5",  # Add "cua/" prefix
    tools=[computer]
)

That's it! Same code structure, just different model format. CUA manages all provider infrastructure and credentials for you.
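If you migrate many configurations, the prefix change can be mechanized. Note the caveat visible in the example above: a dated provider ID (like the `-20250929` suffix) may map to a differently named router ID, so verify the result against `/v1/models` rather than trusting a mechanical rename. A hypothetical helper:

```python
def to_cua_model(provider_model: str) -> str:
    """Convert a direct-provider model string to its CUA-routed form by
    adding the "cua/" prefix. Idempotent for already-prefixed IDs."""
    if provider_model.startswith("cua/"):
        return provider_model
    return f"cua/{provider_model}"
```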
