Windows App behind VPN

Overview

This guide demonstrates how to automate Windows desktop applications (such as eGecko HR/payroll systems) that run behind a corporate VPN. This is a common enterprise scenario in which legacy desktop applications require manual data entry, report generation, or workflow execution.

Use cases:

HR/payroll processing (employee onboarding, payroll runs, benefits administration)
Desktop ERP systems behind corporate networks
Legacy financial applications requiring VPN access
Compliance reporting from on-premise systems

Architecture:

Client-side Cua agent (Python SDK or Playground UI)
Windows VM/Sandbox with VPN client configured
RDP/remote desktop connection to target environment
Desktop application automation via computer vision and UI control

Production Deployment: For production use, consider workflow mining and custom finetuning to create vertical-specific actions (e.g., "Run payroll", "Onboard employee") instead of generic UI automation. This provides better audit trails and higher success rates.

Video Demo

Demo showing Cua automating an eGecko-like desktop application on Windows behind an AWS VPN

Set Up Your Environment

Install the required dependencies:

Create a requirements.txt file:

cua-agent
cua-computer
python-dotenv>=1.0.0

Install the dependencies:

pip install -r requirements.txt

Create a .env file with your API keys:

ANTHROPIC_API_KEY=your-anthropic-api-key
CUA_API_KEY=sk_cua-api01...
CUA_SANDBOX_NAME=your-windows-sandbox
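If a variable is missing, the Cloud Sandbox script below fails with a bare KeyError from os.environ. A small preflight check can report every missing name at once. The helper names here (missing_env_vars, require_env) are hypothetical and not part of the Cua SDK; trim REQUIRED_VARS to what your variant actually reads (for example, only the Cloud Sandbox script uses CUA_SANDBOX_NAME).

```python
import os
from typing import Iterable, Mapping

# Hypothetical preflight list: the variables from the .env file above.
REQUIRED_VARS = ("ANTHROPIC_API_KEY", "CUA_API_KEY", "CUA_SANDBOX_NAME")


def missing_env_vars(names: Iterable[str], env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the names that are unset or blank in env."""
    return [name for name in names if not env.get(name, "").strip()]


def require_env(names: Iterable[str] = REQUIRED_VARS, env: Mapping[str, str] = os.environ) -> None:
    """Raise one readable error that lists every missing variable."""
    missing = missing_env_vars(names, env)
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
```

Call require_env() right after load_dotenv() and before opening the Computer connection.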

Configure Windows Sandbox with VPN

For enterprise deployments, use Cua Cloud Sandbox with pre-configured VPN:

Go to cua.ai/signin
Navigate to Dashboard > Containers > Create Instance
Create a Windows sandbox (Medium or Large for desktop apps)
Configure VPN settings:
- Upload your AWS VPN Client configuration (.ovpn file)
- Or configure VPN credentials directly in the dashboard
Note your sandbox name and API key

Your Windows sandbox will launch with VPN automatically connected.

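For local development on the Pro, Enterprise, or Education edition of Windows 10 or Windows 11 (Windows Sandbox is not available on Home editions):

Enable Windows Sandbox (Turn Windows features on or off > Windows Sandbox)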

Install the pywinsandbox dependency:

pip install -U git+https://github.com/karkason/pywinsandbox.git

Create a VPN setup script that runs on sandbox startup
Install and configure your desktop application inside the sandbox
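Windows Sandbox can run a command at logon through a .wsb configuration file (the LogonCommand element), which is one way to run a VPN setup script on every start. The sketch below only generates such a file. The folder paths and the connect-vpn.ps1 script are hypothetical, and it assumes you launch the sandbox yourself by opening the .wsb file; whether Cua's WINDOWS_SANDBOX provider (which starts its own sandbox through pywinsandbox) can use it is not covered here.

```python
import xml.etree.ElementTree as ET


def build_wsb_config(host_folder: str, sandbox_folder: str, logon_command: str) -> str:
    """Build Windows Sandbox .wsb XML that maps one host folder read-only
    and runs logon_command when the sandbox user logs on."""
    root = ET.Element("Configuration")
    # Networking must stay enabled for the VPN client to connect.
    ET.SubElement(root, "Networking").text = "Default"

    # Share the folder holding the VPN profile and setup script (read-only).
    folders = ET.SubElement(root, "MappedFolders")
    folder = ET.SubElement(folders, "MappedFolder")
    ET.SubElement(folder, "HostFolder").text = host_folder
    ET.SubElement(folder, "SandboxFolder").text = sandbox_folder
    ET.SubElement(folder, "ReadOnly").text = "true"

    # Command executed automatically after logon inside the sandbox.
    logon = ET.SubElement(root, "LogonCommand")
    ET.SubElement(logon, "Command").text = logon_command
    return ET.tostring(root, encoding="unicode")
```

Save the returned string as, for example, hr-sandbox.wsb and open it to start the sandbox. A plausible logon command is powershell.exe -ExecutionPolicy Bypass -File C:\Users\WDAGUtilityAccount\Desktop\vpn-setup\connect-vpn.ps1, where the folder and script name are placeholders for your own setup.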

Manual VPN Setup: Windows Sandbox discards all state when it closes, so the VPN must be configured again each time it starts. For production use, consider Cloud Sandbox or self-hosted VMs with persistent VPN connections.

For self-managed infrastructure:

Deploy a Windows VM on your preferred cloud (AWS, Azure, GCP)
Install and configure a VPN client (AWS VPN Client, OpenVPN, etc.)
Install the target desktop application and any dependencies

Install cua-computer-server:

pip install cua-computer-server
python -m computer_server

Configure firewall rules so the Cua agent can reach computer-server on the VM
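Before running the agent, it can help to confirm from the client machine that the VM's computer-server port is reachable through your firewall and network. This is a minimal standard-library sketch; port_is_reachable is a hypothetical helper, and the port must match the one in the base_url you pass to Computer (5757 in the self-hosted script below).

```python
import socket


def port_is_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures
        # (socket.timeout and socket.gaierror both subclass OSError).
        return False
```

For example, port_is_reachable("your-windows-vm-ip", 5757) returning False points at firewall rules or the network path rather than at the agent.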

Create Your Automation Script

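Create a Python file (e.g., hr_automation.py) using the variant that matches your environment. All three variants share the same agent loop; they differ in how Computer connects to Windows, and the later two use shorter instructions and task lists.

Cloud Sandbox (with VPN):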

import asyncio
import logging
import os
from agent import ComputerAgent
from computer import Computer, VMProviderType
from dotenv import load_dotenv

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

load_dotenv()

async def automate_hr_workflow():
    """
    Automate HR/payroll desktop application workflow.

    This example demonstrates:
    - Launching Windows desktop application
    - Navigating complex desktop UI
    - Data entry and form filling
    - Report generation and export
    """
    try:
        # Connect to Windows Cloud Sandbox with VPN
        async with Computer(
            os_type="windows",
            provider_type=VMProviderType.CLOUD,
            name=os.environ["CUA_SANDBOX_NAME"],
            api_key=os.environ["CUA_API_KEY"],
            verbosity=logging.INFO,
        ) as computer:

            # Configure agent with specialized instructions
            agent = ComputerAgent(
                model="cua/anthropic/claude-sonnet-4.5",
                tools=[computer],
                only_n_most_recent_images=3,
                verbosity=logging.INFO,
                trajectory_dir="trajectories",
                use_prompt_caching=True,
                max_trajectory_budget=10.0,
                instructions="""
You are automating a Windows desktop HR/payroll application.

IMPORTANT GUIDELINES:
- Always wait for windows and dialogs to fully load before interacting
- Look for loading indicators and wait for them to disappear
- Verify each action by checking on-screen confirmation messages
- If a button or field is not visible, try scrolling or navigating tabs
- Desktop apps often have nested menus - explore systematically
- Save work frequently using File > Save or Ctrl+S
- Before closing, always verify changes were saved

COMMON UI PATTERNS:
- Menu bar navigation (File, Edit, View, etc.)
- Ribbon interfaces with tabs
- Modal dialogs that block interaction
- Data grids/tables for viewing records
- Form fields with validation
- Status bars showing operation progress
                """.strip()
            )

            # Define workflow tasks
            tasks = [
                "Launch the HR application from the desktop or start menu",
                "Log in with the credentials shown in credentials.txt on the desktop",
                "Navigate to Employee Management section",
                "Create a new employee record with information from new_hire.xlsx on desktop",
                "Verify the employee was created successfully by searching for their name",
                "Generate an onboarding report for the new employee",
                "Export the report as PDF to the desktop",
                "Log out of the application"
            ]

            history = []

            for task in tasks:
                logger.info(f"\n{'='*60}")
                logger.info(f"Task: {task}")
                logger.info(f"{'='*60}\n")

                history.append({"role": "user", "content": task})

                async for result in agent.run(history):
                    for item in result.get("output", []):
                        if item.get("type") == "message":
                            content = item.get("content", [])
                            for block in content:
                                # Accept Responses-style "output_text" blocks as well as plain "text"
                                if block.get("type") in ("text", "output_text"):
                                    response = block.get("text", "")
                                    logger.info(f"Agent: {response}")
                                    history.append({"role": "assistant", "content": response})

                logger.info("\nTask completed. Moving to next task...\n")

            logger.info("\n" + "="*60)
            logger.info("All tasks completed successfully!")
            logger.info("="*60)

    except Exception:
        # logger.exception logs the message together with the full traceback
        logger.exception("Error during automation")

if __name__ == "__main__":
    asyncio.run(automate_hr_workflow())

Windows Sandbox (local development):

import asyncio
import logging
import os
from agent import ComputerAgent
from computer import Computer, VMProviderType
from dotenv import load_dotenv

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

load_dotenv()

async def automate_hr_workflow():
    try:
        # Connect to Windows Sandbox
        async with Computer(
            os_type="windows",
            provider_type=VMProviderType.WINDOWS_SANDBOX,
            verbosity=logging.INFO,
        ) as computer:

            agent = ComputerAgent(
                model="cua/anthropic/claude-sonnet-4.5",
                tools=[computer],
                only_n_most_recent_images=3,
                verbosity=logging.INFO,
                trajectory_dir="trajectories",
                use_prompt_caching=True,
                max_trajectory_budget=10.0,
                instructions="""
You are automating a Windows desktop HR/payroll application.

IMPORTANT GUIDELINES:
- Always wait for windows and dialogs to fully load before interacting
- Verify each action by checking on-screen confirmation messages
- Desktop apps often have nested menus - explore systematically
- Save work frequently using File > Save or Ctrl+S
                """.strip()
            )

            tasks = [
                "Launch the HR application from the desktop",
                "Log in with credentials from credentials.txt on desktop",
                "Navigate to Employee Management and create new employee from new_hire.xlsx",
                "Generate and export onboarding report as PDF",
                "Log out of the application"
            ]

            history = []

            for task in tasks:
                logger.info(f"\nTask: {task}")
                history.append({"role": "user", "content": task})

                async for result in agent.run(history):
                    for item in result.get("output", []):
                        if item.get("type") == "message":
                            content = item.get("content", [])
                            for block in content:
                                # Accept Responses-style "output_text" blocks as well as plain "text"
                                if block.get("type") in ("text", "output_text"):
                                    response = block.get("text", "")
                                    logger.info(f"Agent: {response}")
                                    history.append({"role": "assistant", "content": response})

            logger.info("\nAll tasks completed!")

    except Exception:
        # logger.exception logs the message together with the full traceback
        logger.exception("Error during automation")

if __name__ == "__main__":
    asyncio.run(automate_hr_workflow())

Self-hosted Windows VM (running computer-server):

import asyncio
import logging
import os
from agent import ComputerAgent
from computer import Computer
from dotenv import load_dotenv

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

load_dotenv()

async def automate_hr_workflow():
    try:
        # Connect to self-hosted Windows VM running computer-server
        async with Computer(
            use_host_computer_server=True,
            base_url="http://your-windows-vm-ip:5757",  # Update with your VM IP
            verbosity=logging.INFO,
        ) as computer:

            agent = ComputerAgent(
                model="cua/anthropic/claude-sonnet-4.5",
                tools=[computer],
                only_n_most_recent_images=3,
                verbosity=logging.INFO,
                trajectory_dir="trajectories",
                use_prompt_caching=True,
                max_trajectory_budget=10.0,
                instructions="""
You are automating a Windows desktop HR/payroll application.

IMPORTANT GUIDELINES:
- Always wait for windows and dialogs to fully load before interacting
- Verify each action by checking on-screen confirmation messages
- Save work frequently using File > Save or Ctrl+S
                """.strip()
            )

            tasks = [
                "Launch the HR application",
                "Log in with provided credentials",
                "Complete the required HR workflow",
                "Generate and export report",
                "Log out"
            ]

            history = []

            for task in tasks:
                logger.info(f"\nTask: {task}")
                history.append({"role": "user", "content": task})

                async for result in agent.run(history):
                    for item in result.get("output", []):
                        if item.get("type") == "message":
                            content = item.get("content", [])
                            for block in content:
                                # Accept Responses-style "output_text" blocks as well as plain "text"
                                if block.get("type") in ("text", "output_text"):
                                    response = block.get("text", "")
                                    logger.info(f"Agent: {response}")
                                    history.append({"role": "assistant", "content": response})

            logger.info("\nAll tasks completed!")

    except Exception:
        # logger.exception logs the message together with the full traceback
        logger.exception("Error during automation")

if __name__ == "__main__":
    asyncio.run(automate_hr_workflow())
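All three variants repeat the same nested loop to pull text out of each agent.run() result. One way to factor it out is a generator like the hypothetical iter_message_text below. Treating both "text" and "output_text" as text blocks is an assumption, since Responses-style assistant messages label their text blocks output_text; a plain string content is passed through as-is.

```python
from typing import Any, Iterator

# Block types treated as text; "output_text" is an assumption based on
# Responses-style message output.
TEXT_BLOCK_TYPES = ("text", "output_text")


def iter_message_text(result: dict[str, Any]) -> Iterator[str]:
    """Yield the text from every message item in one agent.run() result."""
    for item in result.get("output", []):
        if item.get("type") != "message":
            continue  # skip every non-message item (tool calls and the like)
        content = item.get("content", [])
        if isinstance(content, str):
            yield content  # some messages carry a plain string
            continue
        for block in content:
            if block.get("type") in TEXT_BLOCK_TYPES:
                yield block.get("text", "")
```

Inside the async for loop, the nested lines then reduce to iterating over iter_message_text(result), followed by the same logger.info and history.append calls.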

Run Your Automation

Execute the script:

python hr_automation.py

The agent will:

Connect to your Windows environment (with VPN if configured)
Launch and navigate the desktop application
Execute each workflow step sequentially
Verify actions and handle errors
Save trajectory logs for audit and debugging
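The scripts above log an exception and stop; they do not retry a failed step. If you want retries, a small wrapper around a single task is one option. The run_task coroutine is a hypothetical function you would write yourself: it drives the agent through one task and returns True only when the outcome was verified (for example, by asking the agent to finish its reply with a fixed marker that you then check for).

```python
import asyncio
import logging
from typing import Awaitable, Callable

logger = logging.getLogger(__name__)


async def run_with_retries(
    run_task: Callable[[str], Awaitable[bool]],
    task: str,
    attempts: int = 2,
    delay_seconds: float = 5.0,
) -> bool:
    """Await run_task(task) until it returns True or attempts run out."""
    for attempt in range(1, attempts + 1):
        try:
            if await run_task(task):
                return True
            logger.warning("Task not verified (attempt %d/%d): %s", attempt, attempts, task)
        except Exception:
            logger.exception("Task raised (attempt %d/%d): %s", attempt, attempts, task)
        if attempt < attempts:
            # Give slow desktop apps time to settle before the next attempt.
            await asyncio.sleep(delay_seconds)
    return False
```

Keeping the retry outside the agent makes the policy explicit in your code and in the logs, instead of relying only on the agent's own recovery attempts.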

Monitor the console output to see the agent's progress through each task.

Key Configuration Options

Agent Instructions

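The instructions parameter is critical for reliable desktop automation. The APPLICATION-SPECIFIC entries in this example are placeholders; replace them with the menu paths and quirks of your own application: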

instructions="""
You are automating a Windows desktop HR/payroll application.

IMPORTANT GUIDELINES:
- Always wait for windows and dialogs to fully load before interacting
- Look for loading indicators and wait for them to disappear
- Verify each action by checking on-screen confirmation messages
- If a button or field is not visible, try scrolling or navigating tabs
- Desktop apps often have nested menus - explore systematically
- Save work frequently using File > Save or Ctrl+S
- Before closing, always verify changes were saved

COMMON UI PATTERNS:
- Menu bar navigation (File, Edit, View, etc.)
- Ribbon interfaces with tabs
- Modal dialogs that block interaction
- Data grids/tables for viewing records
- Form fields with validation
- Status bars showing operation progress

APPLICATION-SPECIFIC:
- Login is in the top-left corner
- Employee records are under "HR Management" > "Employees"
- Reports are generated via "Tools" > "Reports" > "Generate"
- Always click "Save" before navigating away from a form
""".strip()

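Because the general guidelines stay the same while the APPLICATION-SPECIFIC section changes per application, you may prefer to assemble the string from parts. This sketch uses a hypothetical build_instructions helper that reproduces the layout of the example (an uppercase section title followed by "- " bullets); the bullet texts are taken from the example above.

```python
def build_instructions(role: str, sections: dict[str, list[str]]) -> str:
    """Render a role line plus titled bullet sections as one instructions string."""
    parts = [role.strip()]
    for title, bullets in sections.items():
        if not bullets:
            continue  # skip empty sections to keep the prompt compact
        lines = [f"{title.upper()}:"] + [f"- {bullet.strip()}" for bullet in bullets]
        parts.append("\n".join(lines))
    return "\n\n".join(parts)


# Example: shared guidelines plus per-application notes.
INSTRUCTIONS = build_instructions(
    "You are automating a Windows desktop HR/payroll application.",
    {
        "Important guidelines": [
            "Always wait for windows and dialogs to fully load before interacting",
            "Verify each action by checking on-screen confirmation messages",
        ],
        "Application-specific": [
            'Employee records are under "HR Management" > "Employees"',
            'Always click "Save" before navigating away from a form',
        ],
    },
)
```

Pass INSTRUCTIONS as the instructions argument of ComputerAgent; the shared list can then be reused across the three connection variants.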
Budget Management

For long-running workflows, raise the budget limit. max_trajectory_budget caps the model spend, in US dollars, for the agent's trajectory:

agent = ComputerAgent(
    model="cua/anthropic/claude-sonnet-4.5",
    tools=[computer],
    max_trajectory_budget=20.0,  # Increase for complex workflows
    # ... other params
)

Image Retention

Balance context and cost by retaining only recent screenshots:

agent = ComputerAgent(
    # ...
    only_n_most_recent_images=3,  # Keep last 3 screenshots
    # ...
)
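Conceptually, only_n_most_recent_images=3 means older screenshots are left out of what the model sees, while the rest of the history stays. The toy function below illustrates that idea on a simplified history in which screenshots are items with type "screenshot". It is not Cua's implementation; the real message format and pruning details may differ.

```python
from typing import Any


def keep_recent_images(messages: list[dict[str, Any]], n: int) -> list[dict[str, Any]]:
    """Return a copy of messages with all but the last n screenshots removed."""
    if n < 0:
        raise ValueError("n must be non-negative")
    image_positions = [i for i, m in enumerate(messages) if m.get("type") == "screenshot"]
    # Drop the oldest screenshots; non-image items are always kept.
    drop = set(image_positions[: max(len(image_positions) - n, 0)])
    return [m for i, m in enumerate(messages) if i not in drop]
```

Lower values of n mean fewer image tokens per model call, at the cost of the model forgetting what earlier screens looked like.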

Production Considerations

Production Deployment

For enterprise production deployments, consider these additional steps:

1. Workflow Mining

Before deploying, analyze your actual workflows:

Record user interactions with the application
Identify common patterns and edge cases
Map out decision trees and validation requirements
Document application-specific quirks and timing issues

2. Custom Finetuning

Create vertical-specific actions instead of generic UI automation:

# Instead of generic steps:
tasks = ["Click login", "Type username", "Type password", "Click submit"]

# Create semantic actions:
tasks = ["onboard_employee", "run_payroll", "generate_compliance_report"]

This provides:

Better audit trails
Approval gates at business logic level
Higher success rates
Easier maintenance and updates
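Fine-tuning itself is beyond a short example, but the interface side of the idea can be sketched: a registry that maps each semantic action to the generic steps the agent runs, so logs and approvals can refer to onboard_employee rather than to individual clicks. The registry contents and expand_actions are hypothetical, and this is prompt-level composition, not fine-tuning.

```python
# Hypothetical registry: each semantic action expands to generic UI steps
# (taken from the task list in the Cloud Sandbox script).
SEMANTIC_ACTIONS: dict[str, list[str]] = {
    "onboard_employee": [
        "Navigate to Employee Management section",
        "Create a new employee record with information from new_hire.xlsx on desktop",
        "Verify the employee was created successfully by searching for their name",
    ],
    "generate_compliance_report": [
        "Generate the compliance report",
        "Export the report as PDF to the desktop",
    ],
}


def expand_actions(actions: list[str]) -> list[tuple[str, str]]:
    """Turn semantic action names into (action, step) pairs for the agent loop."""
    unknown = [a for a in actions if a not in SEMANTIC_ACTIONS]
    if unknown:
        raise KeyError(f"Unknown semantic actions: {unknown}")
    return [(action, step) for action in actions for step in SEMANTIC_ACTIONS[action]]
```

Keeping the action name next to each step lets you group trajectory logs and audit entries by business action.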

3. Human-in-the-Loop

Add approval gates for critical operations:

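agent = ComputerAgent(
    model="cua/anthropic/claude-sonnet-4.5",
    tools=[computer],
    # Add a human approval callback for sensitive operations.
    # ApprovalCallback is illustrative and not imported from the Cua SDK here:
    # you would implement it yourself as a custom callback.
    callbacks=[ApprovalCallback(require_approval_for=["payroll", "termination"])]
)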
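If you do not want to write a callback yet, a simpler gate can sit in the task loop and ask a person before any task whose text mentions a sensitive keyword. The helpers below are hypothetical and independent of Cua's callback system; keyword matching on task text is a coarse stand-in for real business-level approval rules.

```python
from typing import Callable, Iterable

# Mirrors the require_approval_for list in the snippet above.
SENSITIVE_KEYWORDS = ("payroll", "termination")


def needs_approval(task: str, keywords: Iterable[str] = SENSITIVE_KEYWORDS) -> bool:
    """True if the task text mentions any sensitive keyword (case-insensitive)."""
    lowered = task.lower()
    return any(keyword.lower() in lowered for keyword in keywords)


def approve(
    task: str,
    ask: Callable[[str], str] = input,
    keywords: Iterable[str] = SENSITIVE_KEYWORDS,
) -> bool:
    """Ask a human before sensitive tasks; other tasks pass automatically."""
    if not needs_approval(task, keywords):
        return True
    answer = ask(f"Approve sensitive task? {task!r} [y/N]: ")
    return answer.strip().lower() in ("y", "yes")
```

In the async scripts, call it as await asyncio.to_thread(approve, task) so the blocking input() prompt does not stall the event loop, and skip or stop the workflow when it returns False.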

4. Deployment Options

Choose your deployment model:

Managed (Recommended)

Cua hosts Windows sandboxes, VPN/RDP stack, and agent runtime
You get UI/API endpoints for triggering workflows
Automatic scaling, monitoring, and maintenance
SLA guarantees and enterprise support

Self-Hosted

You manage Windows VMs, VPN infrastructure, and agent deployment
Full control over data and security
Custom network configurations
On-premise or your preferred cloud

Troubleshooting

VPN Connection Issues

If the agent cannot reach the application:

Verify VPN is connected: Check VPN client status in the Windows sandbox
Test network connectivity: Try pinging internal resources
Check firewall rules: Ensure RDP and application ports are open
Review VPN logs: Look for authentication or routing errors
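ICMP ping is often blocked on corporate networks, so a failed ping is not conclusive on its own. Checking whether internal hostnames resolve is another quick signal, because many VPNs push their own DNS servers. This sketch is meant to run inside the Windows sandbox or VM (the self-hosted setup already has Python for computer-server); the hostnames you pass, such as hr-app.corp.example, are placeholders.

```python
import socket


def resolves(hostname: str) -> bool:
    """True if hostname resolves to at least one address from this machine."""
    try:
        return bool(socket.getaddrinfo(hostname, None))
    except socket.gaierror:
        return False


def check_internal_hosts(hostnames: list[str]) -> dict[str, bool]:
    """Map each hostname to whether it currently resolves."""
    return {name: resolves(name) for name in hostnames}
```

If internal names fail to resolve while public ones succeed, the VPN is likely connected without its DNS settings applied.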

Application Not Launching

If the desktop application fails to start:

Verify installation: Check that the application is installed in the sandbox
Check dependencies: Ensure all required DLLs and frameworks are present
Review permissions: The application may require administrator rights
Check logs: Look for error messages in Windows Event Viewer

UI Element Not Found

If the agent cannot find buttons or fields:

Allow more load time: Some applications load slowly; tell the agent in its instructions which loading indicators or windows to wait for
Check screen resolution: UI elements may be off-screen
Verify DPI scaling: High DPI settings can affect element positions
Update instructions: Provide more specific navigation guidance
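Windows treats 96 DPI as 100% scaling, and a DPI-unaware application sees a virtualized, smaller screen: at 150% a 1920x1080 display is reported as 1280x720. If screenshots and click coordinates end up in different coordinate spaces, clicks land in the wrong place. The arithmetic, as two small hypothetical helpers:

```python
def logical_resolution(width_px: int, height_px: int, scale_percent: int) -> tuple[int, int]:
    """Screen size in logical pixels, as seen by a DPI-unaware app.

    100% scaling corresponds to 96 DPI, so 150% is 144 DPI and each
    logical pixel covers 1.5 physical pixels.
    """
    if scale_percent <= 0:
        raise ValueError("scale_percent must be positive")
    factor = scale_percent / 100
    return round(width_px / factor), round(height_px / factor)


def to_physical(x: int, y: int, scale_percent: int) -> tuple[int, int]:
    """Map a logical point back to physical pixels at the given scaling."""
    factor = scale_percent / 100
    return round(x * factor), round(y * factor)
```

Whether this affects your setup depends on how computer-server handles DPI awareness, which this guide does not specify; running the sandbox at 100% scaling avoids the question.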

Cost Management

If costs are higher than expected:

Reduce max_trajectory_budget
Decrease only_n_most_recent_images
Use prompt caching: Set use_prompt_caching=True
Optimize task descriptions: Be more specific to reduce retry attempts

Next Steps

Explore custom tools: Learn how to create custom tools for application-specific actions
Implement callbacks: Add monitoring and logging for production workflows
Join the community: Get help in our Discord

Form Filling - Web form automation
Post-Event Contact Export - Data extraction workflows
Custom Tools - Building application-specific functions
