Prompt Caching
Reduce costs and latency with provider-specific prompt caching
Prompt caching avoids reprocessing the same prompt content on repeated calls. This reduces costs and latency for long-running tasks or iterative workflows where the context stays mostly the same.
Enabling Prompt Caching
Set use_prompt_caching=True when creating your agent:
from computer import Computer
from agent import ComputerAgent
computer = Computer(os_type="linux", provider_type="docker", image="trycua/cua-xfce:latest")
await computer.run()
agent = ComputerAgent(
    model="anthropic/claude-sonnet-4-5-20250929",
    tools=[computer],
    use_prompt_caching=True
)
Provider Behavior
Anthropic
When enabled, Cua adds {"cache_control": "ephemeral"} to messages. This tells Anthropic to cache the marked prompt prefix so subsequent requests can reuse it instead of reprocessing it.
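For reference, here is a minimal sketch of what a cache-annotated message looks like when it reaches Anthropic's Messages API, which represents cache_control as an object with a type field (the message text is illustrative):
# Illustrative only: a system message whose content block is marked for caching.
# Anthropic's Messages API represents cache_control as {"type": "ephemeral"}.
cached_message = {
    "role": "system",
    "content": [
        {
            "type": "text",
            "text": "Very long system prompt with detailed rules...",
            "cache_control": {"type": "ephemeral"},
        }
    ],
}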
# Required for Anthropic models
agent = ComputerAgent(
    model="anthropic/claude-sonnet-4-5-20250929",
    tools=[computer],
    use_prompt_caching=True
)
OpenAI
OpenAI automatically caches prompt prefixes of 1,024 tokens or more. You don't need to set use_prompt_caching; caching happens on its own.
# Caching is automatic for OpenAI
agent = ComputerAgent(
    model="openai/computer-use-preview",
    tools=[computer]
)
Other Providers
For providers that don't support prompt caching, the use_prompt_caching parameter is ignored.
When to Use
Iterative tasks - When the agent runs multiple turns with similar context:
agent = ComputerAgent(
    model="anthropic/claude-sonnet-4-5-20250929",
    tools=[computer],
    use_prompt_caching=True
)

# Each turn reuses cached context
for task in tasks:
    async for result in agent.run(task):
        process(result)
Long system prompts - When using detailed instructions that repeat across calls:
agent = ComputerAgent(
    model="anthropic/claude-sonnet-4-5-20250929",
    tools=[computer],
    instructions="Very long system prompt with detailed rules...",
    use_prompt_caching=True
)
Multi-turn conversations - When conversation history grows but the beginning stays constant:
agent = ComputerAgent(
    model="anthropic/claude-sonnet-4-5-20250929",
    tools=[computer],
    use_prompt_caching=True
)

messages = [{"role": "system", "content": "Long context..."}]
for user_input in inputs:
    messages.append({"role": "user", "content": user_input})
    async for result in agent.run(messages):
        messages.extend(result["output"])
Cost Impact
With Anthropic's prompt caching:
- Cached input tokens cost 90% less than regular input tokens
- Cache writes cost slightly more than regular input tokens (about 25% over the base input price at the default cache lifetime)
- Cache hits provide significant savings on repeated runs
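As a rough illustration of how this plays out, the sketch below compares the cost of a first run (which writes the prefix to the cache) against later runs (which hit the cache). The per-token prices and token counts are placeholder assumptions; substitute Anthropic's published rates for your model to get real figures.
# Rough cost comparison for a cached prompt prefix. All prices are placeholder
# assumptions expressed per million tokens; plug in your model's actual rates.
INPUT_PRICE = 3.00          # regular input tokens
CACHE_WRITE_PRICE = 3.75    # writing the prefix into the cache (~25% premium)
CACHE_READ_PRICE = 0.30     # reading the cached prefix (~90% discount)

prefix_tokens = 20_000      # long, repeated system prompt and tool definitions
new_tokens = 1_000          # fresh content added on each turn

first_run = (prefix_tokens * CACHE_WRITE_PRICE + new_tokens * INPUT_PRICE) / 1_000_000
later_run = (prefix_tokens * CACHE_READ_PRICE + new_tokens * INPUT_PRICE) / 1_000_000

print(f"First run (cache write): ${first_run:.4f}")
print(f"Later runs (cache hit):  ${later_run:.4f}")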
Check your usage in the response:
async for result in agent.run(messages):
    usage = result["usage"]
    print(f"Tokens: {usage['total_tokens']}, Cost: ${usage['response_cost']:.4f}")