Background Computer-Use
How Cua Driver operates an app in the background without taking focus, moving the cursor, or raising windows.
Background computer-use means an agent operates an application without taking over the desktop. It does not change the user's frontmost app, move the real cursor, or raise the target window.
This no-foreground contract is the invariant that makes background computer-use possible. Cua Driver addresses the target application through background-capable input, accessibility, and capture paths while preserving the developer's active session. An agent can operate one application while the developer keeps coding, reading logs, and using the same machine.
Why It Matters
Most GUI automation assumes the automated app owns the desktop. It activates a window, moves the pointer, and repeats. That works for unattended jobs, but fails when a human is using the same machine.
If an agent raised the target window before every action, the developer's editor would lose focus every time the agent acted. If the agent moved the real pointer, the developer would lose their cursor position. If screenshots required the target window to be visible on the active desktop or Space, the workspace would jump to follow the target.
Any of these behaviors limits a tool to fully automated, unattended scenarios: scheduled tasks, CI-style jobs, or disposable desktops. Preserving desktop state enables concurrent work instead: the agent operates an app in the background while the developer keeps working in the foreground.
Platform Mechanisms
Each OS splits accessibility, input delivery, capture, and focus policy differently. Cua Driver upholds the contract with APIs that address a process, window, element, or message queue directly.
macOS
On macOS, the normal event path is focus-based. Scoped CoreGraphics event routing instead lets synthetic events be posted to a target process: CGEvent.postToPid routes input directly to a process, bypassing the window server's focus check.
The Accessibility API provides semantic action dispatch. AXUIElementPerformAction operates on an accessibility element, such as a button or menu item, regardless of whether the owning app is frontmost. When an app exposes useful structure, the driver can act on the element rather than replay motion.
ScreenCaptureKit can capture a specific window without requiring it to be visible, raised, or on the active Space. Cua Driver also uses SkyLight pid-routed mouse delivery for apps that respond better to routed pointer events than accessibility actions.
Windows
Windows separates windows, handles, message queues, and foreground ownership. UI Automation can inspect and operate controls by window handle and automation element even when the target window is behind another window.
For input-like behavior, Cua Driver posts messages directly to a window's message queue with PostMessage, and uses synthetic pointer-device injection for cases UIPI blocks. That differs from global input injection, where the OS routes input to the focused window. Many standard Win32 controls can respond while the active window stays active.
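As a rough illustration of the message-queue path, the sketch below packs client-area coordinates into the lParam that WM_LBUTTONDOWN and WM_LBUTTONUP expect and, on Windows only, posts the pair with PostMessageW through ctypes. The helper names (make_lparam, click_messages, post_click) are hypothetical and are not Cua Driver's API; the message constants are the standard Win32 values. Whether a given control honors posted messages depends on the application.

```python
import sys

# Standard Win32 message and key-state constants.
WM_LBUTTONDOWN = 0x0201
WM_LBUTTONUP = 0x0202
MK_LBUTTON = 0x0001


def make_lparam(x: int, y: int) -> int:
    """Pack client coordinates: x in the low word, y in the high word."""
    return ((y & 0xFFFF) << 16) | (x & 0xFFFF)


def click_messages(x: int, y: int) -> list[tuple[int, int, int]]:
    """Return (message, wParam, lParam) tuples for one left click."""
    lparam = make_lparam(x, y)
    return [
        (WM_LBUTTONDOWN, MK_LBUTTON, lparam),
        (WM_LBUTTONUP, 0, lparam),
    ]


def post_click(hwnd: int, x: int, y: int) -> bool:
    """Post a click to a window's queue without activating the window.

    Hypothetical helper for illustration; returns False on non-Windows hosts.
    """
    if sys.platform != "win32":
        return False
    import ctypes
    from ctypes import wintypes

    user32 = ctypes.WinDLL("user32", use_last_error=True)
    user32.PostMessageW.argtypes = [
        wintypes.HWND, wintypes.UINT, wintypes.WPARAM, wintypes.LPARAM,
    ]
    user32.PostMessageW.restype = wintypes.BOOL
    # PostMessageW only enqueues the message; it does not change focus.
    return all(
        user32.PostMessageW(hwnd, msg, wparam, lparam)
        for msg, wparam, lparam in click_messages(x, y)
    )
```

Because PostMessageW returns as soon as the message is queued, a True result only means the messages were accepted by the queue, not that the control acted on them.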
Only a narrow set of operations still requires the foreground. Some DirectInput games and raw-input canvases poll device state or reject background messages. UAC dialogs live on a protected desktop where normal automation paths are restricted.
Linux
Linux depends on the session stack. AT-SPI provides accessibility-tree access and action dispatch, giving the driver a semantic path for controls that expose accessibility metadata.
On X11, synthetic input and capture are window-addressable. XSendEvent can target a window ID without activating that window, and X11 window capture can read pixels from any mapped window. XWayland bridges help when an app still exposes an X11 surface.
Wayland is the notable exception. Its security model intentionally blocks one client from synthesizing arbitrary input into another client; that isolation is the point of the design. Native Wayland-only apps therefore remain a known gap for background input.
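One way an agent might decide which Linux path to try is to classify the session from its environment before sending any input. The sketch below is an assumption for illustration, not how Cua Driver detects the session: the function name linux_input_path and its return labels are hypothetical, while XDG_SESSION_TYPE, WAYLAND_DISPLAY, and DISPLAY are the conventional environment variables. It classifies the session only; an individual app can still be native Wayland inside a session that has XWayland available.

```python
import os
from typing import Mapping, Optional


def linux_input_path(env: Optional[Mapping[str, str]] = None) -> str:
    """Classify the session from standard environment variables.

    Returns "x11", "xwayland", "wayland-only", or "unknown".
    """
    env = os.environ if env is None else env
    session = env.get("XDG_SESSION_TYPE", "").lower()
    has_wayland = bool(env.get("WAYLAND_DISPLAY"))
    has_x = bool(env.get("DISPLAY"))

    if session == "wayland" or has_wayland:
        # DISPLAY inside a Wayland session normally means XWayland is
        # running, so apps with an X11 surface may still be addressable.
        return "xwayland" if has_x else "wayland-only"
    if session == "x11" or has_x:
        return "x11"
    return "unknown"
```

A caller could treat "wayland-only" as a signal to fall back to AT-SPI actions or to report that background input is unavailable for that target.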
Agent Cursor Overlay
The real pointer is part of the user's working state, so Cua Driver does not move it to show agent activity. Instead, the driver can render a synthetic cursor overlay: a visible marker for agent activity.
This is useful for supervision. The user can glance at the screen and see the agent clicking a button, selecting a field, or dragging inside a target window, while their own pointer stays where they left it. The overlay is an observation aid, not the input device.
Honest Limits
The contract has exceptions, and they come from OS and application input models.
Canvas and game-style applications often poll raw input state rather than accepting normal window messages or accessibility actions. DirectInput games are a common Windows example. These apps may only respond when foregrounded because their event loop is built around active device state.
On macOS, SwiftUI windows that are off the active Space can lose their accessibility tree by OS design. The window may still exist, and pixels may still be capturable, but the semantic tree can disappear until the OS considers that UI present.
These cases are not bugs in Cua Driver. They are boundaries set by the host platform. The contract is the default wherever the OS and target application provide background-addressable input, accessibility, and capture paths.
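The boundary described above can be pictured as a small decision rule: use a background path when the target offers one, and otherwise report that the foreground is required instead of silently taking focus. The sketch below is a simplified model under stated assumptions. The types Path and TargetCapabilities, the function choose_path, and the preference for accessibility actions over routed input are all hypothetical and do not describe Cua Driver's actual selection logic.

```python
from dataclasses import dataclass
from enum import Enum


class Path(Enum):
    ACCESSIBILITY = "accessibility action"
    ROUTED_INPUT = "routed input"
    FOREGROUND_REQUIRED = "foreground required"


@dataclass(frozen=True)
class TargetCapabilities:
    """Hypothetical snapshot of what a target exposes right now."""
    has_accessibility_element: bool
    accepts_routed_input: bool


def choose_path(caps: TargetCapabilities) -> Path:
    """Pick a background path, or say plainly that none exists."""
    # Assumed preference: act on a semantic element when one is exposed.
    if caps.has_accessibility_element:
        return Path.ACCESSIBILITY
    # Otherwise try input addressed to the process or window.
    if caps.accepts_routed_input:
        return Path.ROUTED_INPUT
    # Neither path exists (e.g. a game polling raw device state).
    return Path.FOREGROUND_REQUIRED
```

For example, a SwiftUI window whose accessibility tree has disappeared but which still accepts routed pointer events would map to Path.ROUTED_INPUT in this model, while a DirectInput game would map to Path.FOREGROUND_REQUIRED.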