Training Computer-Use Models: Creating Human Trajectories with Cua
Published on May 1, 2025 by Dillon DuPont
In our previous posts, we covered building your own Computer-Use Operator and using the Agent framework to simplify development. Today, we'll focus on a critical aspect of improving computer-use agents and models: gathering high-quality demonstration data using Cua's Computer-Use Interface (CUI) and its Gradio UI to create and share human-generated trajectories.
Why does this matter? The underlying models that power computer-use agents need examples of how humans interact with computers to learn effectively. By creating a dataset of diverse, well-executed tasks, we can help train better models that understand how to navigate user interfaces and accomplish real-world tasks.
What You'll Learn
By the end of this tutorial, you'll be able to:
- Set up the Computer-Use Interface (CUI) with Gradio UI support
- Record your own computer interaction trajectories
- Organize and tag your demonstrations
- Upload your datasets to Hugging Face for community sharing
- Contribute to improving computer-use AI for everyone
Prerequisites:
- macOS Sonoma (14.0) or later
- Python 3.10+
- Basic familiarity with Python and terminal commands
- A Hugging Face account (for uploading datasets)
Estimated Time: 20-30 minutes
Understanding Human Trajectories
What are Human Trajectories?
Human trajectories, in the context of computer-use AI agents, are recordings of how humans interact with computer interfaces to complete tasks. These interactions include:
- Mouse movements, clicks, and scrolls
- Keyboard input
- Changes in the UI state
- Time spent on different elements
These trajectories serve as examples for AI models to learn from, helping them understand the relationship between:
- The visual state of the screen
- The user's goal or task
- The most appropriate action to take
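To make the idea concrete, each step in a trajectory can be thought of as a (screen state, goal, action) triple. Here is a rough sketch of how one such step might be represented in code; the class and field names are illustrative, not the CUI's actual schema:

```python
from dataclasses import dataclass, field

@dataclass
class TrajectoryStep:
    """Illustrative representation of one step in a human trajectory."""
    screenshot_path: str               # visual state of the screen
    goal: str                          # the user's task description
    action: str                        # e.g. "left_click", "type_text"
    action_args: dict = field(default_factory=dict)

step = TrajectoryStep(
    screenshot_path="frame_0001.png",
    goal="Create a calendar event",
    action="left_click",
    action_args={"x": 120, "y": 240},
)
print(step.action, step.action_args)
```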
Why Human Demonstrations Matter
Unlike synthetic data or rule-based automation, human demonstrations capture the nuanced decision-making that happens during computer interaction:
- Natural Pacing: Humans pause to think, accelerate through familiar patterns, and adjust to unexpected UI changes
- Error Recovery: Humans demonstrate how to recover from mistakes or handle unexpected states
- Context-Sensitive Actions: The same UI element might be used differently depending on the task context
By contributing high-quality demonstrations, you're helping to create more capable, human-like computer-use AI systems.
Setting Up Your Environment
Installing the CUI with Gradio Support
The Computer-Use Interface includes an optional Gradio UI specifically designed to make recording and sharing demonstrations easy. Let's set it up:
1. Create a Python environment (optional but recommended):

```bash
# Using conda
conda create -n cua-trajectories python=3.10
conda activate cua-trajectories

# Using venv
python -m venv cua-trajectories
source cua-trajectories/bin/activate  # On macOS/Linux
```

2. Install the CUI package with UI support:

```bash
pip install "cua-computer[ui]"
```

3. Set up your Hugging Face access token: Create a `.env` file in your project directory and add your Hugging Face token:

```bash
echo "HF_TOKEN=your_huggingface_token" > .env
```

You can get your token from your Hugging Face account settings.
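Before launching the UI, you can verify that the token is actually reachable. The sketch below is a hypothetical stdlib-only helper (not part of the CUI package), assuming a simple `KEY=VALUE` `.env` layout:

```python
# check_token.py — optional sanity check that HF_TOKEN is available.
import os
from pathlib import Path

def read_env_token(env_path=".env"):
    """Parse a simple KEY=VALUE .env file and return HF_TOKEN, or None."""
    path = Path(env_path)
    if not path.exists():
        # Fall back to the process environment if no .env file is present.
        return os.getenv("HF_TOKEN")
    for line in path.read_text().splitlines():
        if line.startswith("HF_TOKEN="):
            return line.split("=", 1)[1].strip()
    return os.getenv("HF_TOKEN")

print("Token configured:", read_env_token() is not None)
```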
Understanding the Gradio UI
The Computer-Use Interface Gradio UI provides three main components:
- Recording Panel: Captures your screen, mouse, and keyboard activity during demonstrations
- Review Panel: Allows you to review, tag, and organize your demonstration recordings
- Upload Panel: Lets you share your demonstrations with the community via Hugging Face
The UI is designed to make the entire process seamless, from recording to sharing, without requiring deep technical knowledge of the underlying systems.
Creating Your First Trajectory Dataset
Launching the UI
To get started, create a simple Python script to launch the Gradio UI:
```python
# launch_trajectory_ui.py
from computer.ui.gradio.app import create_gradio_ui
from dotenv import load_dotenv

# Load your Hugging Face token from .env
load_dotenv('.env')

# Create and launch the UI
app = create_gradio_ui()
app.launch(share=False)
```
Run this script to start the UI:
```bash
python launch_trajectory_ui.py
```
Recording a Demonstration
Let's walk through the process of recording your first demonstration:
- Start the VM: Click the "Initialize Computer" button in the UI to initialize a fresh macOS sandbox. This ensures your demonstrations are clean and reproducible.
- Perform a Task: Complete a simple task like creating a document, organizing files, or searching for information. Natural, everyday tasks make the best demonstrations.
- Review Recording: Click the "Conversation Logs" or "Function Logs" tabs to review your captured interactions, making sure there is no personal information that you wouldn't want to share.
- Add Metadata: In the "Save/Share Demonstrations" tab, give your recording a descriptive name (e.g., "Creating a Calendar Event") and add relevant tags (e.g., "productivity", "time-management").
- Save Your Demonstration: Click "Save" to store your recording locally.
Key Tips for Quality Demonstrations
To create the most valuable demonstrations:
- Start and end at logical points: Begin with a clear starting state and end when the task is visibly complete
- Narrate your thought process: Use the message input to describe what you're trying to do and why
- Move at a natural pace: Don't rush or perform actions artificially slowly
- Include error recovery: If you make a mistake, keep going and show how to correct it
- Demonstrate variations: Record multiple ways to complete the same task
Organizing and Tagging Demonstrations
Effective tagging and organization make your demonstrations more valuable to researchers and model developers. Consider these tagging strategies:
Task-Based Tags
Describe what the demonstration accomplishes:
- `web-browsing`
- `document-editing`
- `file-management`
- `email`
- `scheduling`
Application Tags
Identify the applications used:
- `finder`
- `safari`
- `notes`
- `terminal`
- `calendar`
Complexity Tags
Indicate the difficulty level:
- `beginner`
- `intermediate`
- `advanced`
- `multi-application`
UI Element Tags
Highlight specific UI interactions:
- `drag-and-drop`
- `menu-navigation`
- `form-filling`
- `search`
The Computer-Use Interface UI allows you to apply and manage these tags across all your saved demonstrations, making it easy to create cohesive, well-organized datasets.
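Tags also pay off when you work with demonstrations programmatically. The sketch below is a hypothetical helper (not part of the CUI package) showing how tag metadata can be used to select demonstrations for a themed dataset; the demonstration entries are invented for illustration:

```python
# Select demonstrations whose tags include every required tag.
demonstrations = [
    {"name": "Creating a Calendar Event", "tags": ["productivity", "calendar", "beginner"]},
    {"name": "Organizing Downloads",      "tags": ["file-management", "finder", "intermediate"]},
    {"name": "Booking a Meeting Room",    "tags": ["productivity", "calendar", "advanced"]},
]

def filter_by_tags(demos, required):
    """Keep only demonstrations that carry every tag in `required`."""
    return [d for d in demos if set(required) <= set(d["tags"])]

calendar_demos = filter_by_tags(demonstrations, ["productivity", "calendar"])
print([d["name"] for d in calendar_demos])
```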
Uploading to Hugging Face
Sharing your demonstrations helps advance research in computer-use AI. The Gradio UI makes uploading to Hugging Face simple:
Preparing for Upload
1. Review Your Demonstrations: Use the review panel to ensure all demonstrations are complete and correctly tagged.
2. Select Demonstrations to Upload: You can upload all demonstrations or filter by specific tags.
3. Configure Dataset Information:
   - Repository Name: Format as `{your_username}/{dataset_name}`, e.g., `johndoe/productivity-tasks`
   - Visibility: Choose `public` to contribute to the community or `private` for personal use
   - License: Standard licenses like CC-BY or MIT are recommended for public datasets
The Upload Process
1. Click "Upload to Hugging Face": This initiates the upload preparation.
2. Review Dataset Summary: Confirm the number of demonstrations and total size.
3. Confirm Upload: The UI will show progress as files are transferred.
4. Receive Confirmation: Once complete, you'll see a link to your new dataset on Hugging Face.
Your uploaded dataset will have a standardized format with the following structure:
```json
{
  "timestamp": "2025-05-01T09:20:40.594878",
  "session_id": "1fe9f0fe-9331-4078-aacd-ec7ffb483b86",
  "name": "penguin lemon forest",
  "tool_calls": [...],  // Detailed interaction records
  "messages": [...],    // User/assistant messages
  "tags": ["highquality", "tasks"],
  "images": [...]       // Screenshots of each state
}
```
This structured format makes it easy for researchers to analyze patterns across different demonstrations and build better computer-use models.
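As a quick illustration of that kind of analysis, here is a minimal sketch of a tag census over records in this format. The first record's name and tags come from the sample above; the second record and the `summarize` helper are invented for illustration:

```python
from collections import Counter

def summarize(records):
    """Count how many demonstrations carry each tag."""
    return Counter(tag for r in records for tag in r.get("tags", []))

records = [
    {"name": "penguin lemon forest", "tags": ["highquality", "tasks"]},
    {"name": "amber river sunset",   "tags": ["highquality", "web-browsing"]},
]
print(summarize(records))
```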
The trajectories recorded by the UI correspond to actions exposed by the Computer-Use Interface's Python API. For reference, here is what those interactions look like when scripted directly:

```python
import asyncio
from computer import Computer

async def main():
    computer = Computer(os_type="macos", display="1024x768", memory="8GB", cpu="4")
    try:
        await computer.run()

        # Capture the current screen state
        screenshot = await computer.interface.screenshot()
        with open("screenshot.png", "wb") as f:
            f.write(screenshot)

        # Mouse actions
        await computer.interface.move_cursor(100, 100)
        await computer.interface.left_click()
        await computer.interface.right_click(300, 300)
        await computer.interface.double_click(400, 400)

        # Keyboard actions
        await computer.interface.type_text("Hello, World!")
        await computer.interface.press_key("enter")

        # Clipboard actions
        await computer.interface.set_clipboard("Test clipboard")
        content = await computer.interface.copy_to_clipboard()
        print(f"Clipboard content: {content}")
    finally:
        await computer.stop()

asyncio.run(main())
```
Example: Shopping List Demonstration
Let's walk through a concrete example of creating a valuable demonstration:
Task: Adding Shopping List Items to a Doordash Cart
1. Start Recording: Begin with a clean desktop and a text file containing a shopping list.
2. Task Execution: Open the file, read the list, open Safari, navigate to Doordash, and add each item to the cart.
3. Narration: Add messages like "Reading the shopping list" and "Searching for rice on Doordash" to provide context.
4. Completion: Verify all items are in the cart and end the recording.
5. Tagging: Add tags like `shopping`, `web-browsing`, `task-completion`, and `multi-step`.
This type of demonstration is particularly valuable because it showcases real-world task completion requiring multiple applications and context switching.
Exploring Community Datasets
You can also learn from existing trajectory datasets contributed by the community:
- Visit Hugging Face Datasets tagged with 'cua'
- Explore different approaches to similar tasks
- Download and analyze high-quality demonstrations
Conclusion
Summary
In this guide, we've covered how to:
- Set up the Computer-Use Interface with Gradio UI
- Record high-quality human demonstrations
- Organize and tag your trajectories
- Share your datasets with the community
By contributing your own demonstrations, you're helping to build more capable, human-like AI systems that can understand and execute complex computer tasks.
Next Steps
Now that you know how to create and share trajectories, consider these advanced techniques:
- Create themed collections around specific productivity workflows
- Collaborate with others to build comprehensive datasets
- Use your datasets to fine-tune your own computer-use models