
Message Format

Type definitions for agent messages and actions

The Agent SDK uses a consistent message format for all conversations. This page provides type definitions for building custom integrations or processing agent outputs programmatically.

Agent Response

Each iteration of agent.run() yields an AgentResponse containing the messages produced in that iteration and the token usage for it:

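from __future__ import annotations  # lets AgentResponse refer to AgentMessage before it is defined

from typing import Literal, TypedDict  # used by the definitions on this page

class AgentResponse(TypedDict):
    output: list[AgentMessage]  # Messages from this iteration
    usage: Usage                # Token counts and cost

# total=False: any of these keys may be absent
class Usage(TypedDict, total=False):
    prompt_tokens: int
    completion_tokens: int
    total_tokens: int
    response_cost: float  # Cost in USD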
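
Because agent.run() yields one usage record per iteration, a caller that wants totals for a whole run has to sum them. The following is a minimal sketch of that, not part of the SDK: the helper name `add_usage` is hypothetical, and it treats any missing key as zero since `Usage` is declared with `total=False`.

```python
from typing import TypedDict


class Usage(TypedDict, total=False):
    prompt_tokens: int
    completion_tokens: int
    total_tokens: int
    response_cost: float  # Cost in USD


def add_usage(total: Usage, step: Usage) -> Usage:
    """Return a new Usage with every counter summed (hypothetical helper).

    Keys missing from either side are treated as 0.
    """
    return {
        "prompt_tokens": total.get("prompt_tokens", 0) + step.get("prompt_tokens", 0),
        "completion_tokens": total.get("completion_tokens", 0) + step.get("completion_tokens", 0),
        "total_tokens": total.get("total_tokens", 0) + step.get("total_tokens", 0),
        "response_cost": total.get("response_cost", 0.0) + step.get("response_cost", 0.0),
    }
```

Inside the run loop this could be used as `running = add_usage(running, result["usage"])`, starting from an empty dict.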

Message Types

The output list can contain any of the following message types:

User Message

Input from the user or system:

class UserMessage(TypedDict, total=False):
    type: Literal["message"]
    role: Literal["user", "system", "developer"]
    content: str | list[InputContent]

Assistant Message

Text output from the agent:

class AssistantMessage(TypedDict):
    type: Literal["message"]
    role: Literal["assistant"]
    content: list[OutputContent]

class OutputContent(TypedDict):
    type: Literal["output_text"]
    text: str
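
Since an assistant message carries its text as a list of parts rather than a single string, reading it usually means joining those parts. A small sketch, assuming only the fields defined above; the helper name `assistant_text` is hypothetical and not an SDK function.

```python
def assistant_text(message: dict) -> str:
    """Join the output_text parts of an assistant message (hypothetical helper).

    Returns an empty string for anything that is not an assistant message.
    """
    if message.get("type") != "message" or message.get("role") != "assistant":
        return ""
    parts = message.get("content", [])
    # Skip any part that is not output_text instead of failing on it.
    return "".join(p["text"] for p in parts if p.get("type") == "output_text")
```

Checking the role first matters because user messages share the same "message" type but may carry their content as a plain string.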

Reasoning Message

Agent's internal reasoning (when available):

class ReasoningMessage(TypedDict):
    type: Literal["reasoning"]
    summary: list[SummaryContent]

class SummaryContent(TypedDict):
    type: Literal["summary_text"]
    text: str

Computer Call

Action the agent wants to perform:

class ComputerCallMessage(TypedDict):
    type: Literal["computer_call"]
    call_id: str
    status: Literal["completed", "failed", "pending"]
    action: ComputerAction

Computer Call Output

Result of a computer action (screenshot):

class ComputerCallOutputMessage(TypedDict):
    type: Literal["computer_call_output"]
    call_id: str
    output: ComputerResultContent

class ComputerResultContent(TypedDict):
    type: Literal["computer_screenshot", "input_image"]
    image_url: str  # Base64 data URL
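
To save or inspect a screenshot, the base64 data URL in `image_url` has to be decoded back into bytes. The sketch below uses only the Python standard library; `decode_image_data_url` is a hypothetical helper name, and it assumes the URL has the common `data:<mime>;base64,<payload>` shape.

```python
import base64


def decode_image_data_url(image_url: str) -> tuple[str, bytes]:
    """Split a base64 data URL into (mime_type, raw_bytes) (hypothetical helper)."""
    header, sep, payload = image_url.partition(",")
    if not sep or not header.startswith("data:") or not header.endswith(";base64"):
        raise ValueError("expected a base64 data URL")
    # Everything between "data:" and ";base64" is taken as the media type.
    mime_type = header[len("data:"):-len(";base64")]
    return mime_type, base64.b64decode(payload)
```

The returned bytes can be written straight to a file, for example with `pathlib.Path("shot.png").write_bytes(data)`.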

Function Call

Call to a custom tool:

class FunctionCallMessage(TypedDict):
    type: Literal["function_call"]
    call_id: str
    status: Literal["completed", "failed", "pending"]
    name: str
    arguments: str  # JSON string

class FunctionCallOutputMessage(TypedDict):
    type: Literal["function_call_output"]
    call_id: str
    output: str
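
The `arguments` field is a JSON string, so it needs parsing before a tool can be invoked, and the result goes back as a string tied to the same `call_id`. The following sketch illustrates that round trip for code that processes these messages outside the SDK. The `run_function_call` helper and the `tools` registry are hypothetical, and encoding the result as JSON is an assumption, not something the type definitions require.

```python
import json
from typing import Callable


def run_function_call(call: dict, tools: dict[str, Callable[..., object]]) -> dict:
    """Execute a function_call message and build its function_call_output.

    `tools` maps tool names to plain Python callables (hypothetical registry).
    """
    # An empty arguments string is treated as "no arguments".
    args = json.loads(call["arguments"] or "{}")
    result = tools[call["name"]](**args)
    return {
        "type": "function_call_output",
        "call_id": call["call_id"],  # reuse the call's id so the two can be paired
        "output": json.dumps(result),  # assumption: results are JSON-encoded
    }
```

A call naming a tool that is not in the registry raises `KeyError` here; a real integration would likely turn that into an error string in `output` instead.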

Computer Actions

Actions represent operations the agent performs. The format varies by provider.

Common Actions

class ClickAction(TypedDict):
    type: Literal["click"]
    button: Literal["left", "right", "wheel", "back", "forward"]
    x: int
    y: int

class DoubleClickAction(TypedDict):
    type: Literal["double_click"]
    x: int
    y: int

class TypeAction(TypedDict):
    type: Literal["type"]
    text: str

class KeyPressAction(TypedDict):
    type: Literal["keypress"]
    keys: list[str]  # e.g., ["ctrl", "c"]

class ScrollAction(TypedDict):
    type: Literal["scroll"]
    x: int
    y: int
    scroll_x: int
    scroll_y: int

class MoveAction(TypedDict):
    type: Literal["move"]
    x: int
    y: int

class DragAction(TypedDict):
    type: Literal["drag"]
    path: list[tuple[int, int]]  # [(x1, y1), (x2, y2), ...]

class ScreenshotAction(TypedDict):
    type: Literal["screenshot"]

class WaitAction(TypedDict):
    type: Literal["wait"]

Anthropic-Specific Actions

class LeftMouseDownAction(TypedDict):
    type: Literal["left_mouse_down"]
    x: int
    y: int

class LeftMouseUpAction(TypedDict):
    type: Literal["left_mouse_up"]
    x: int
    y: int

Processing Messages

Branch on each item's type field to handle the different message types:

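async for result in agent.run(messages):
    for item in result["output"]:
        match item["type"]:
            case "message":
                # "message" covers user and assistant messages; only
                # assistant content is guaranteed to be a list of text parts.
                if item.get("role") == "assistant":
                    for part in item["content"]:
                        print(f"Agent: {part['text']}")
            case "reasoning":
                for part in item["summary"]:
                    print(f"Thinking: {part['text']}")
            case "computer_call":
                print(f"Action: {item['action']['type']}")
            case "computer_call_output":
                print("Screenshot received")
            case "function_call":
                print(f"Tool call: {item['name']}")
            case "function_call_output":
                print(f"Tool result: {item['output']}")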

Full Type Union

For type checking, use the union of all message types:

from typing import Union

AgentMessage = Union[
    UserMessage,
    AssistantMessage,
    ReasoningMessage,
    ComputerCallMessage,
    ComputerCallOutputMessage,
    FunctionCallMessage,
    FunctionCallOutputMessage,
]

ComputerAction = Union[
    ClickAction,
    DoubleClickAction,
    TypeAction,
    KeyPressAction,
    ScrollAction,
    MoveAction,
    DragAction,
    ScreenshotAction,
    WaitAction,
    LeftMouseDownAction,
    LeftMouseUpAction,
]
