Demonstration-Guided Skills
Use human demonstrations to guide agent behavior and improve task completion
Human demonstrations provide powerful context for guiding agent behavior. Once an agent has been shown how a task is performed, it can learn the workflow pattern and generalize to similar tasks with different inputs.
Overview
The demonstration-guided workflow consists of three stages:
- Record: Capture a human performing the task via VNC recording
- Process: Convert the recording into semantic action traces with VLM captions
- Prompt: Use the processed trajectory to guide agent execution
This approach enables:
- One-shot learning: Teach a workflow once, execute it many times
- Reduced errors: Agents follow demonstrated patterns instead of discovering them
- Complex workflows: Handle multi-step tasks that require specific sequences
Example: Using a Demonstration to Guide an Agent
1. Record a demonstration
cua skills record --sandbox my-sandbox
When prompted, name it flight-booking and describe it as "Book a flight from NYC to LA".
2. View the saved skill
The skill is saved to ~/.cua/skills/flight-booking/SKILL.md:
cua skills read flight-booking
3. Use the skill to guide an agent
Read the SKILL.md file and prefix it to your prompt:
import asyncio
from pathlib import Path
from agent import ComputerAgent
# Load the skill prompt from SKILL.md
skill_path = Path.home() / ".cua/skills/flight-booking/SKILL.md"
skill_prompt = skill_path.read_text()
agent = ComputerAgent(loop="anthropic")
# Prefix the skill as context, then give the actual task
messages = [
    {"role": "user", "content": skill_prompt},
    {"role": "user", "content": "Book a flight from Boston to Chicago for next Friday"},
]
async def main():
    # `async for` is only valid inside a coroutine, so wrap the loop in main()
    async for result in agent.run(messages):
        print(result)
asyncio.run(main())
The skill contains a natural language description of the demonstrated workflow, allowing the agent to follow the same pattern while adapting to the new parameters (Boston→Chicago instead of NYC→LA).
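The prefixing pattern above can be wrapped in a small helper so that one recorded skill is reused across many tasks. This is a minimal sketch, not part of the cua API: `build_skill_messages` and its `skills_root` parameter are hypothetical names, and the code assumes only the `~/.cua/skills/<name>/SKILL.md` location described on this page.

```python
from pathlib import Path


def build_skill_messages(skill_name, task, skills_root=None):
    """Return a message list with the skill text first and the task second.

    Hypothetical helper: it only reads SKILL.md and builds plain dicts.
    """
    # Default to the skills directory used on this page (an assumption
    # about where skills live; pass skills_root to override it).
    root = Path(skills_root) if skills_root is not None else Path.home() / ".cua" / "skills"
    skill_path = root / skill_name / "SKILL.md"
    if not skill_path.is_file():
        raise FileNotFoundError(f"No SKILL.md for skill {skill_name!r} at {skill_path}")
    skill_prompt = skill_path.read_text(encoding="utf-8")
    return [
        {"role": "user", "content": skill_prompt},  # demonstration as context
        {"role": "user", "content": task},          # the actual task
    ]
```

The returned list has the same shape as `messages` in the example above, so it could be passed to `agent.run(...)` once per task, for instance with "Book a flight from Boston to Chicago for next Friday" and then again with a different route.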
Skill Format
Each skill is saved to ~/.cua/skills/<name>/ with the following structure:
~/.cua/skills/flight-booking/
├── SKILL.md                  # Skill description and agent prompt
└── trajectory/
    ├── trajectory.json       # Raw events and captions
    ├── events.json           # Input events
    ├── flight-booking.mp4    # Video recording
    ├── step_1_full.jpg       # Full screenshot for step 1
    ├── step_1_crop.jpg       # Cropped screenshot with marker
    └── ...
The SKILL.md file contains the skill metadata, step descriptions, and agent prompt:
---
name: flight-booking
description: Book a flight from NYC to LA
---
# flight-booking
Book a flight from NYC to LA
## Steps
### Step 1: Click on the departure city field
**Context:** A flight booking form with empty departure and arrival fields...
**Intent:** User wants to enter the departure city
**Expected Result:** Field becomes focused for text input
### Step 2: Type "NYC"
...
## Agent Prompt
You have been shown a demonstration of how to perform this task:
Book a flight from NYC to LA
The demonstration consisted of the following steps:
Step 1: Click on the departure city field
Step 2: Type "NYC"
...
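Because SKILL.md follows a regular layout (front matter, `### Step N:` headings, and an `## Agent Prompt` section), it can be split into its parts with a few lines of standard-library Python. The sketch below is illustrative only: `parse_skill_md` is a hypothetical function, not something cua provides, and it assumes the file looks like the sample above, with simple `key: value` front matter. A real YAML parser would be more robust for anything beyond that.

```python
import re


def parse_skill_md(text):
    """Split SKILL.md text into front matter, step titles, and the agent prompt."""
    meta = {}
    body = text
    # Front matter: simple "key: value" lines between two '---' lines
    match = re.match(r"\A---\s*\n(.*?)\n---\s*\n?", text, flags=re.DOTALL)
    if match:
        for line in match.group(1).splitlines():
            key, sep, value = line.partition(":")
            if sep:
                meta[key.strip()] = value.strip()
        body = text[match.end():]
    # Step headings look like "### Step 1: Click on the departure city field"
    steps = re.findall(r"^### Step \d+:\s*(.+)$", body, flags=re.MULTILINE)
    # Everything after the "## Agent Prompt" heading is treated as the prompt
    _, sep, prompt = body.partition("## Agent Prompt")
    agent_prompt = prompt.strip() if sep else ""
    return {"meta": meta, "steps": steps, "agent_prompt": agent_prompt}
```

One possible use is to send only `agent_prompt` to the agent instead of the whole file, which keeps the context shorter; whether that works as well as prefixing the full SKILL.md is not something this page establishes.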