Developer documents

Use SAI through Conductor.

Conductor exposes SAI's vision-and-action loop as MCP tools. Agents observe with describe, resolve targets with locate, act through OS input, then re-describe to verify progress.

01

Core Loop

1

Describe

Call describe with a focused hint, such as "foreground browser window".

2

Locate

Resolve the visible target before acting.

3

Act

Use click, type_text, scroll, drag, key, or hotkey through OS input.

4

Verify

Re-describe the affected area and check whether state changed.

5

Retry

If the result is wrong, re-describe, re-locate, and retry with a more specific target.
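
The five steps above can be sketched as a small Python driver. This is only an illustration under stated assumptions: call_tool is a hypothetical callable standing in for whatever MCP client invokes Conductor tools, the hint and target argument names follow the workflow examples in this document, and is_done is a caller-supplied check on whatever describe returns.

```python
from typing import Any, Callable, Sequence

# Hypothetical signature: call_tool(name, **arguments) -> tool result.
# A real agent would get this from its MCP client.
ToolCaller = Callable[..., Any]


def act_with_retry(
    call_tool: ToolCaller,
    hint: str,
    targets: Sequence[str],
    is_done: Callable[[Any], bool],
) -> bool:
    """Describe, locate, click, verify; retry with the next (more specific)
    target phrase until is_done accepts the re-described state."""
    call_tool("describe", hint=hint)              # 1. Describe
    for target in targets:
        call_tool("locate", target=target)        # 2. Locate
        call_tool("click", target=target)         # 3. Act
        state = call_tool("describe", hint=hint)  # 4. Verify
        if is_done(state):
            return True
        # 5. Retry: fall through to the next, more specific phrase
    return False
```

Passing the target phrases in order from general to specific keeps the retry policy in data, so the agent can stop as soon as the verification step succeeds.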

02

Setup

On macOS, Conductor runs as a native app and needs Accessibility permission for input control. The grounder endpoint is configurable: local, self-hosted, or remote.

Step 1 — clear macOS quarantine flag on first run

xattr -dr com.apple.quarantine /Applications/conductor-mcp.app

Step 2 — register with Claude Code

claude mcp add conductor-desktop --scope user \
  --env CONDUCTOR_MCP_MOBILE_ANDROID="true" \
  --env CONDUCTOR_MCP_SCREENSHOT_MAX_WIDTH="1280" \
  --env CONDUCTOR_MCP_BACKEND="resident" \
  --env CONDUCTOR_MCP_SCREENSHOT_MODE="file" \
  --env CONDUCTOR_MCP_AUTO_SCREENSHOT="false" \
  --env CONDUCTOR_MCP_GROUNDER_URL="https://your-grounder.example.com" \
  -- /Applications/conductor-mcp.app/Contents/MacOS/conductor-mcp

03

Model Stack

Conductor uses two separate models with distinct roles. They do not share a context window.

Orchestrating LLM

Your agent model

The model driving the task — Claude, GPT-4o, or any MCP-compatible LLM. It receives prose, labels, coordinates, and state from the grounder, not raw image pixels. It decides what to do next and calls Conductor tools.

Grounder VLM

Vision model

A vision-language model that turns the screen into structured descriptions and coordinates. Powers describe and locate. Configurable via CONDUCTOR_MCP_GROUNDER_URL. Reference model: Qwen3-VL-4B-Instruct (Q4_K_M, GGUF) served via llama.cpp. The grounder_family config key controls prompt formatting.

The grounder can run locally on-device, on a self-hosted GPU server, or at a remote endpoint. Do not assume a specific model, fixed latency, or cloud-free processing unless the deployment confirms it.

04

Tool Surface

Perception

describe: Read the visible screen and return prose, labels, state, and context.
locate: Resolve a natural-language target to screen coordinates.
crop: Inspect a focused visual region when the full screen is too broad.
wait_for: Wait until a named UI element appears before continuing.
list_windows: Read window titles, focus, bounds, z-order, and modality.
get_scene_graph: Inspect the window topology tree.

OS Input

click / double_click / right_click: Mouse actions against semantic targets.
click_at / drag_at: Coordinate-based precision actions.
type_text: Type into the currently focused field.
key / hotkey: Send keyboard keys and shortcuts.
drag: Drag from one semantic target to another.
scroll: Scroll within a visible region.
mouse_move: Move without clicking, usually for hover UI.

Browser

web_list_tabs: List Chrome tabs when DevTools Protocol is available.
web_eval: Run JavaScript in a tab and return a result.
web_crop: Capture a DOM element by selector, text, or JavaScript expression.
web_mark: Outline an element visually, then use visual tools.

Mobile

mobile_list_devices: List connected Android devices.
describe(device_id): Observe a specific mobile device screen.
locate(device_id): Resolve a mobile UI target to coordinates.
mobile_tap / mobile_swipe: Touch input on the device.
mobile_type / mobile_key: Text and key input on the device.
mobile_app_launch / mobile_shell: Launch Android apps or run adb shell commands.

05

Knowledge Base

Conductor includes a persistent knowledge base (SQLite) that agents can read and write across sessions. It stores three types of entries: Skills (repeatable action patterns), Facts (discovered environment details), and Experiences (completed task outcomes). Enable it by setting CONDUCTOR_MCP_KB_PATH.

# Add to your claude mcp add command:
--env CONDUCTOR_MCP_KB_PATH="/path/to/brain.db" \
--env CONDUCTOR_MCP_KB_WRITE_ENABLED="true"

kb_record_skill: Record a repeatable action pattern the agent has learned.
kb_record_fact: Store a discovered fact about the environment or UI.
kb_record_experience: Log a completed task outcome for future reference.
kb_search: Search the KB with a natural-language query.
kb_get_skill: Retrieve a specific skill by name.
kb_mark_contradicted: Mark a KB entry as outdated or incorrect.
kb_brief: Summarise KB contents relevant to the current task.

kb_write_enabled defaults to false — the agent can read and search the KB but not write new entries unless explicitly enabled. The KB tab in the dashboard shows all stored entries across the three subtabs.
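
One way an agent might use the KB is to search before a task and log the outcome afterwards, writing only when writes are enabled. The sketch below assumes a hypothetical call_tool callable supplied by your MCP client; the argument names query, task, and outcome are illustrative guesses, not documented parameters of kb_search or kb_record_experience.

```python
from typing import Any, Callable

# Hypothetical signature: call_tool(name, **arguments) -> tool result.
ToolCaller = Callable[..., Any]


def run_with_memory(
    call_tool: ToolCaller,
    task: str,
    run_task: Callable[[Any], Any],
    write_enabled: bool = False,
) -> Any:
    """Search the KB before running a task, then log the outcome if allowed."""
    # Reading and searching work even when kb_write_enabled is false.
    prior = call_tool("kb_search", query=task)  # 'query' is an assumed name
    outcome = run_task(prior)
    if write_enabled:
        # 'task' and 'outcome' are assumed argument names.
        call_tool("kb_record_experience", task=task, outcome=outcome)
    return outcome
```

Keeping the write behind an explicit flag mirrors the read-only default, so the same code path works whether or not the deployment allows KB writes.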

06

Dashboard

When the resident backend is running, a local dashboard is available at http://127.0.0.1:8765/dashboard. The port can be changed via the port config key.

Overview

State pill: IDLE / RUNNING (live agent state)
Role: holder or subordinate agent mode
Backend: resident or stdio
Grounder URL: Active grounder endpoint
KB attached: Whether a knowledge base file is mounted
Transport: MCP transport in use
Input paused: Whether desktop input is currently suspended
PID / Uptime: Process ID and running time
Re-run permission test: Re-validates macOS Accessibility access

Tool calls

Live history of every tool call: name, klass (obs for perception tools, act for input/action tools), status (ok / err), duration, timestamp, and truncated args. Most recent first. The system tray icon also shows the last 5 calls.

Knowledge base

Browse KB entries across three subtabs: Skills, Facts, Experiences. Requires CONDUCTOR_MCP_KB_PATH to be set.

Config

Live view of all config keys, split into hot-reloadable keys (which take effect on the next tool call) and restart-required keys. See the Config Reference section below for the full key list.

07

Config Reference

Set each key as an environment variable on the claude mcp add command by upper-casing it and adding the CONDUCTOR_MCP_ prefix (e.g. text_only becomes CONDUCTOR_MCP_TEXT_ONLY=true). Hot-reloadable keys take effect on the next tool call without restarting Conductor.
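
The key-to-variable mapping can be sketched as a small helper that turns a settings dictionary into --env arguments. It assumes every key maps by upper-casing plus the prefix, as the text_only example suggests; the Setup command uses CONDUCTOR_MCP_MOBILE_ANDROID, which does not match the mobile_android_enabled key that way, so check individual names against your deployment. The function names here are hypothetical.

```python
def env_name(key: str) -> str:
    """Map a config key such as 'text_only' to its environment variable."""
    return "CONDUCTOR_MCP_" + key.upper()


def env_flags(settings: dict) -> list:
    """Build the --env arguments for a claude mcp add command."""
    flags = []
    for key, value in settings.items():
        if isinstance(value, bool):
            # The examples in this document spell booleans as true/false.
            value = "true" if value else "false"
        flags += ["--env", f"{env_name(key)}={value}"]
    return flags


print(env_flags({"text_only": True, "screenshot_max_width": 1280}))
```

The returned list can be spliced into an argument vector between the scope flag and the trailing -- that precedes the conductor-mcp binary path.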

Hot-reloadable (effective on next tool call)

Key | Default | Description
text_only | false | Return text descriptions only, with no image payload. Eliminates image tokens from the agent loop.
auto_screenshot | false | Capture a screenshot after every tool call.
screenshot_max_width | 1280 | Maximum pixel width of screenshots sent to the agent.
payload_max_width | 540 | Maximum width of inline image payloads in tool results.
tool_timing | false | Append execution duration to every tool result.
deltas_enabled | false | Include structural UI delta events (navigations, focus changes, node additions) in tool results.

Restart-required (needs conductor-mcp restart)

Key | Default | Description
backend | resident | Transport backend. resident keeps the process alive; stdio restarts per call.
grounder_url | (none) | URL of the VLM grounder endpoint. Supports local (llama.cpp), self-hosted, or remote.
grounder_family | qwen | Grounder model family; determines prompt formatting. Options: qwen, openai.
grounder_timeout | 30 | Seconds before a grounder request times out.
host | 127.0.0.1 | Host the dashboard and resident backend bind to.
port | 8765 | Port the dashboard serves on.
transport | stdio | MCP transport layer. stdio for Claude Code; sse for other clients.
kb_path | ~/.conductor-mcp/brain.db | Path to the SQLite knowledge base file. Required to enable the KB tab and KB tools.
kb_write_enabled | false | Allow the agent to write new entries to the KB. False = read-only.

Advanced (less commonly tuned)

Key | Default | Description
coord_system | norm1000 | Coordinate space used by locate. norm1000 normalises to 0–1000 on each axis.
cdp_host / cdp_port | 127.0.0.1 / 9222 | Chrome DevTools Protocol endpoint for web_eval and web_crop.
cdp_default_tab_filter | (empty) | Default tab substring filter when no match_url/match_title is passed to web tools.
mobile_android_enabled | true | Enable ADB-based Android device support.
mobile_ios_enabled | false | Enable iOS device support via WebDriverAgent.
mobile_ios_wda_url | (empty) | WebDriverAgent URL for iOS device control.
qdrant_url | (empty) | Qdrant vector DB URL for semantic KB search. Leave empty to use SQLite FTS only.
qdrant_collection_prefix | conductor | Prefix for Qdrant collection names.
tei_dense_url | (empty) | Text Embeddings Inference URL for dense vector embeddings (KB semantic search).
tei_sparse_doc_url / tei_sparse_query_url | (empty) | TEI URLs for sparse SPLADE embeddings.
screenshot_mode | file | How screenshots are returned: file writes to disk; inline sends base64.
screenshot_dir | ~/.conductor-mcp/screenshots | Directory where screenshot files are saved.
wait_for_poll_interval | 0.5 | Seconds between polls when wait_for is watching for an element.
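
If an agent ever needs pixel positions, for example for click_at, a norm1000 coordinate can be scaled to a concrete surface size. The sketch below assumes a linear mapping with the origin at the top-left corner and that the surface dimensions are known; neither detail is confirmed by this document, and the function name is hypothetical.

```python
def norm1000_to_pixels(x_norm: float, y_norm: float,
                       width_px: int, height_px: int) -> tuple:
    """Scale a 0-1000 normalised point to pixel coordinates.

    Assumes a linear mapping from the top-left corner (an assumption,
    not documented Conductor behaviour).
    """
    if not (0 <= x_norm <= 1000 and 0 <= y_norm <= 1000):
        raise ValueError("norm1000 coordinates must lie in 0..1000")
    x = round(x_norm * width_px / 1000)
    y = round(y_norm * height_px / 1000)
    # Clamp so the far edge (1000) still lands on a valid pixel index.
    return min(x, width_px - 1), min(y, height_px - 1)


print(norm1000_to_pixels(500, 500, 1920, 1080))  # (960, 540)
```

Normalised coordinates stay valid when the screenshot is resized, which is why the conversion needs the surface size at the moment of the action.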

08

Agent Pattern

Prompt rule

Observe with describe before acting.
Use locate for semantic targets.
Prefer click/type/scroll/drag/hotkey through OS input.
After each action, re-describe the affected area and verify progress.
If a click misses, re-describe, re-locate, and retry with a more specific target.

Deployment wording

The agent receives descriptions, labels, coordinates, and state in the normal loop. The grounder can run locally, self-hosted, or at a configured endpoint. Do not claim fixed latency, no-cloud processing, or a specific model unless the deployment proves it.

09

Workflow Examples

Click a visible button

describe(hint="foreground browser window")
locate(target="Get notified button in the SAI hero")
click(target="Get notified button in the SAI hero")
describe(hint="dialog opened by Get notified")

Scroll a page

describe(hint="browser page content")
scroll(target="browser page content", direction="down", amount=5)
describe(hint="newly visible section after scrolling")

Fill a form

describe(hint="email capture dialog")
click(target="email input in the dialog")
type_text(text="developer@example.com")
click(target="submit button in the dialog")
wait_for(target="success message in the dialog")

Recover from a misclick

describe(hint="settings panel")
locate(target="Wi-Fi toggle in the network settings panel")
click(target="Wi-Fi toggle in the network settings panel")
describe(hint="network settings panel")
locate(target="actual Wi-Fi on/off switch, not the sidebar row")
click(target="actual Wi-Fi on/off switch, not the sidebar row")

10

Troubleshooting

Input lock

If click, scroll, or type actions fail with an input-lock error, another Conductor client owns desktop input. Read-only tools may still work until that client exits.

CDP unavailable

If Chrome DevTools Protocol is not reachable, use the visual path: describe, locate, click, scroll, and type_text.
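
The fallback can be sketched as a try-then-degrade helper. Everything about the error handling here is an assumption: ToolError is a hypothetical exception your MCP client might raise when a tool call fails, call_tool is a hypothetical callable, and the expression argument name for web_eval is illustrative.

```python
from typing import Any, Callable


class ToolError(Exception):
    """Hypothetical error raised by the MCP client when a tool call fails."""


def read_page(call_tool: Callable[..., Any]) -> Any:
    """Prefer the CDP path; fall back to the visual path if it fails."""
    try:
        # 'expression' is an assumed argument name for web_eval.
        return call_tool("web_eval", expression="document.title")
    except ToolError:
        # CDP not reachable: observe the browser visually instead.
        return call_tool("describe", hint="foreground browser window")
```

The two paths return different kinds of results (a JavaScript value versus a screen description), so the caller should be prepared to handle either.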

Target not found

Rephrase with more visual context, for example "blue Save button in the top-right toolbar" or "actual Wi-Fi switch, not the sidebar row".

UI still loading

Use wait_for with a named element instead of repeatedly clicking while the screen is changing.
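
The idea behind waiting rather than re-clicking is a bounded poll: check a condition, sleep for a fixed interval, and give up after a deadline. The sketch below illustrates that pattern in general terms; it is not Conductor's implementation, and wait_until, timeout_s, and the injectable sleep and clock parameters are hypothetical names. The 0.5 second default mirrors the wait_for_poll_interval default in the Config Reference.

```python
import time
from typing import Callable


def wait_until(
    predicate: Callable[[], bool],
    timeout_s: float = 10.0,
    poll_interval_s: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll predicate until it returns True or timeout_s elapses."""
    deadline = clock() + timeout_s
    while True:
        if predicate():
            return True
        if clock() >= deadline:
            return False  # gave up: the condition never became true
        sleep(poll_interval_s)
```

A bounded wait gives the agent a clear failure signal to act on, whereas repeated clicks during a transition can land on whatever element happens to be under the pointer.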