
Computer Tracing API

Record computer interactions for debugging, training, and analysis

The Computer tracing API provides a powerful way to record computer interactions for debugging, training, analysis, and compliance purposes. Inspired by Playwright's tracing functionality, it offers flexible recording options and standardized output formats.

Overview

The tracing API allows you to:

  • Record screenshots at key moments
  • Log all API calls and their results
  • Capture accessibility tree snapshots
  • Add custom metadata
  • Export recordings in standardized formats
  • Support both automated and human-in-the-loop workflows

Basic Usage

Starting and Stopping Traces

from computer import Computer

computer = Computer(os_type="macos")
await computer.run()

# Start tracing with default options
await computer.tracing.start()

# Perform some operations
await computer.interface.left_click(100, 200)
await computer.interface.type_text("Hello, World!")
await computer.interface.press_key("enter")

# Stop tracing and save
trace_path = await computer.tracing.stop()
print(f"Trace saved to: {trace_path}")

Custom Configuration

# Start tracing with custom configuration
await computer.tracing.start({
    'video': False,              # Record video frames (future feature; default: False)
    'screenshots': True,         # Record screenshots (default: True)
    'api_calls': True,           # Record API calls (default: True)
    'accessibility_tree': True,  # Record accessibility snapshots (default: False)
    'metadata': True,            # Allow custom metadata (default: True)
    'name': 'my_custom_trace',  # Custom trace name
    'path': './my_traces'       # Custom output directory
})

# Add custom metadata during tracing
await computer.tracing.add_metadata('user_id', 'user123')
await computer.tracing.add_metadata('test_case', 'login_flow')

# Stop with custom options
trace_path = await computer.tracing.stop({
    'path': './exports/trace.zip',
    'format': 'zip'  # 'zip' or 'dir'
})

Configuration Options

Start Options

| Option             | Type | Default        | Description                           |
|--------------------|------|----------------|---------------------------------------|
| video              | bool | False          | Record video frames (future feature)  |
| screenshots        | bool | True           | Capture screenshots after key actions |
| api_calls          | bool | True           | Log all interface method calls        |
| accessibility_tree | bool | False          | Record accessibility tree snapshots   |
| metadata           | bool | True           | Enable custom metadata recording      |
| name               | str  | auto-generated | Custom name for the trace             |
| path               | str  | auto-generated | Custom directory for trace files      |

Stop Options

| Option | Type | Default        | Description                         |
|--------|------|----------------|-------------------------------------|
| path   | str  | auto-generated | Custom output path for final trace  |
| format | str  | 'zip'          | Output format: 'zip' or 'dir'       |

Use Cases

Custom Agent Development

from computer import Computer

async def test_custom_agent():
    computer = Computer(os_type="linux")
    await computer.run()

    # Start tracing for this test session
    await computer.tracing.start({
        'name': 'custom_agent_test',
        'screenshots': True,
        'accessibility_tree': True
    })

    # Your custom agent logic here
    screenshot = await computer.interface.screenshot()
    await computer.interface.left_click(500, 300)
    await computer.interface.type_text("test input")

    # Add context about what the agent is doing
    await computer.tracing.add_metadata('action', 'filling_form')
    await computer.tracing.add_metadata('confidence', 0.95)

    # Save the trace
    trace_path = await computer.tracing.stop()
    return trace_path

Training Data Collection

async def collect_training_data():
    computer = Computer(os_type="macos")
    await computer.run()

    tasks = [
        "open_browser_and_search",
        "create_document",
        "send_email"
    ]

    for task in tasks:
        # Start a new trace for each task
        await computer.tracing.start({
            'name': f'training_{task}',
            'screenshots': True,
            'accessibility_tree': True,
            'metadata': True
        })

        # Add task metadata
        await computer.tracing.add_metadata('task_type', task)
        await computer.tracing.add_metadata('difficulty', 'beginner')

        # Perform the task (automated or human-guided)
        await perform_task(computer, task)

        # Save this training example
        await computer.tracing.stop({
            'path': f'./training_data/{task}.zip'
        })

Human-in-the-Loop Recording

async def record_human_demonstration():
    computer = Computer(os_type="windows")
    await computer.run()

    # Start recording human demonstration
    await computer.tracing.start({
        'name': 'human_demo_excel_workflow',
        'screenshots': True,
        'api_calls': True,  # Will capture any programmatic actions
        'metadata': True
    })

    print("Trace recording started. Perform your demonstration...")
    print("The system will record all computer interactions.")

    # Add metadata about the demonstration
    await computer.tracing.add_metadata('demonstrator', 'expert_user')
    await computer.tracing.add_metadata('workflow', 'excel_data_analysis')

    # Human performs actions manually or through other tools
    # Tracing will still capture any programmatic interactions

    input("Press Enter when demonstration is complete...")

    # Stop and save the demonstration
    trace_path = await computer.tracing.stop()
    print(f"Human demonstration saved to: {trace_path}")

RPA Debugging

async def debug_rpa_workflow():
    computer = Computer(os_type="linux")
    await computer.run()

    # Start tracing with full debugging info
    await computer.tracing.start({
        'name': 'rpa_debug_session',
        'screenshots': True,
        'accessibility_tree': True,
        'api_calls': True
    })

    try:
        # Your RPA workflow
        await rpa_login_sequence(computer)
        await rpa_data_entry(computer)
        await rpa_generate_report(computer)

        await computer.tracing.add_metadata('status', 'success')

    except Exception as e:
        # Record the error in the trace
        await computer.tracing.add_metadata('error', str(e))
        await computer.tracing.add_metadata('status', 'failed')
        raise
    finally:
        # Always save the debug trace
        trace_path = await computer.tracing.stop()
        print(f"Debug trace saved to: {trace_path}")

Output Format

Directory Structure

When using format='dir', traces are saved with this structure:

trace_20240922_143052_abc123/
├── trace_metadata.json         # Overall trace information
├── event_000001_trace_start.json
├── event_000002_api_call.json
├── event_000003_api_call.json
├── 000001_initial_screenshot.png
├── 000002_after_left_click.png
├── 000003_after_type_text.png
└── event_000004_trace_end.json
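Because event files are numbered sequentially, a saved trace directory can be read back in recorded order with plain Python. The `load_events` helper below is illustrative, not part of the tracing API:

```python
import json
import tempfile
from pathlib import Path

def load_events(trace_dir: str) -> list[dict]:
    """Read event_*.json files back in recorded (sequence-number) order."""
    return [
        json.loads(p.read_text())
        for p in sorted(Path(trace_dir).glob("event_*.json"))
    ]

# Demo against a throwaway directory shaped like the structure above
with tempfile.TemporaryDirectory() as d:
    Path(d, "event_000002_api_call.json").write_text('{"type": "api_call"}')
    Path(d, "event_000001_trace_start.json").write_text('{"type": "trace_start"}')
    print([e["type"] for e in load_events(d)])  # → ['trace_start', 'api_call']
```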

Metadata Format

The trace_metadata.json contains:

{
  "trace_id": "trace_20240922_143052_abc123",
  "config": {
    "screenshots": true,
    "api_calls": true,
    "accessibility_tree": false,
    "metadata": true
  },
  "start_time": 1695392252.123,
  "end_time": 1695392267.456,
  "duration": 15.333,
  "total_events": 12,
  "screenshot_count": 5,
  "events": [...] // All events in chronological order
}

Event Format

Individual events follow this structure:

{
  "type": "api_call",
  "timestamp": 1695392255.789,
  "relative_time": 3.666,
  "data": {
    "method": "left_click",
    "args": { "x": 100, "y": 200, "delay": null },
    "result": null,
    "error": null,
    "screenshot": "000002_after_left_click.png",
    "success": true
  }
}
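Since every event shares this structure, saved traces are easy to post-process with ordinary Python and no Cua dependency. The `summarize_api_calls` helper below is a sketch, not part of the API; it tallies api_call events from a parsed trace_metadata.json:

```python
def summarize_api_calls(metadata: dict) -> dict:
    """Tally api_call events and collect failures from a parsed trace_metadata.json."""
    counts: dict = {}
    failures: list = []
    for event in metadata.get("events", []):
        if event.get("type") != "api_call":
            continue
        data = event.get("data", {})
        method = data.get("method", "unknown")
        counts[method] = counts.get(method, 0) + 1
        if not data.get("success", True):
            failures.append(f"{method}: {data.get('error')}")
    return {"counts": counts, "failures": failures}

# Events shaped like the documented format above
sample = {
    "events": [
        {"type": "trace_start"},
        {"type": "api_call", "data": {"method": "left_click", "success": True}},
        {"type": "api_call", "data": {"method": "type_text", "success": False, "error": "timeout"}},
        {"type": "trace_end"},
    ]
}
print(summarize_api_calls(sample))
# → {'counts': {'left_click': 1, 'type_text': 1}, 'failures': ['type_text: timeout']}
```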

Integration with ComputerAgent

The tracing API works seamlessly with existing ComputerAgent workflows:

from agent import ComputerAgent
from computer import Computer

# Create computer and start tracing
computer = Computer(os_type="macos")
await computer.run()

await computer.tracing.start({
    'name': 'agent_with_tracing',
    'screenshots': True,
    'metadata': True
})

# Create agent using the same computer
agent = ComputerAgent(
    model="openai/computer-use-preview",
    tools=[computer]
)

# Agent operations will be automatically traced
async for _ in agent.run("open cua.ai and navigate to docs"):
    pass

# Save the combined trace
trace_path = await computer.tracing.stop()

Privacy Considerations

The tracing API is designed with privacy in mind:

  • Clipboard content is not recorded (only content length)
  • Screenshots can be disabled
  • Sensitive text input can be filtered
  • Custom metadata allows you to control what information is recorded
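Since custom metadata is entirely under your control, one simple pattern is to redact sensitive values before they ever reach add_metadata. This is a minimal sketch; the key patterns are an assumption to adjust for your project:

```python
import re

# Assumption: key names matching these patterns are treated as sensitive
SENSITIVE_PATTERN = re.compile(r"password|token|secret|api_key|credential", re.IGNORECASE)

def safe_metadata(key: str, value: str) -> str:
    """Redact values for keys that look sensitive before recording them."""
    return "[REDACTED]" if SENSITIVE_PATTERN.search(key) else value

# Usage: await computer.tracing.add_metadata('session_token', safe_metadata('session_token', raw))
print(safe_metadata("session_token", "abc123"))  # → [REDACTED]
print(safe_metadata("user_id", "user123"))       # → user123
```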

Comparison with ComputerAgent Trajectories

| Feature            | ComputerAgent Trajectories | Computer.tracing     |
|--------------------|----------------------------|----------------------|
| Scope              | ComputerAgent only         | Any Computer usage   |
| Flexibility        | Fixed format               | Configurable options |
| Custom Agents      | Not supported              | Fully supported      |
| Human-in-the-loop  | Limited                    | Full support         |
| Real-time Control  | No                         | Start/stop anytime   |
| Output Format      | Agent-specific             | Standardized         |
| Accessibility Data | No                         | Optional             |

Best Practices

  1. Start tracing early: Begin recording before your main workflow to capture the complete session
  2. Use meaningful names: Provide descriptive trace names for easier organization
  3. Add contextual metadata: Include information about what you're testing or demonstrating
  4. Handle errors gracefully: Always stop tracing in a finally block
  5. Choose appropriate options: Only record what you need to minimize overhead
  6. Organize output: Use custom paths to organize traces by project or use case

The Computer tracing API provides a powerful foundation for recording, analyzing, and improving computer automation workflows across all use cases.
