Documentation

Everything you need to connect your browser to AI agents via Oya Browser.

Quickstart

1. Go to the dashboard and click Generate to create an API key.
2. Download Oya Browser for your OS.
3. Open the app, enter wss://browser.oya.ai/ws as the server URL, and paste your API key.
4. Your browser appears in the dashboard. You can now send commands or connect AI tools.

Create API Key

Go to the dashboard. Click the Generate button next to the API key field. This creates a random key and registers it with the server.

Your key is scoped — you only see browsers connected with your key. Other users' browsers are invisible to you.

Save your key somewhere safe. If you lose it, you'll need to generate a new one. The old key still works for any browsers already connected with it.
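
If you prefer to script this step, the sketch below mirrors what the Generate button is described as doing: create a random key, then register it through POST /register-key. It assumes the REST API is served from https://browser.oya.ai (the same host as the MCP endpoint) and that any sufficiently random string is accepted as a key. Neither is confirmed here, and the helper names are hypothetical.

```python
import json
import secrets
import urllib.request

BASE_URL = "https://browser.oya.ai"  # assumption: REST API shares the MCP host


def new_api_key() -> str:
    """Generate a random key locally (length and alphabet are assumptions)."""
    return secrets.token_urlsafe(32)


def build_register_request(key: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build, but do not send, the POST /register-key request."""
    body = json.dumps({"key": key}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/register-key",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


key = new_api_key()
request = build_register_request(key)
# To actually register the key: urllib.request.urlopen(request)
```

The request is built separately from sending it so you can inspect it first. Store the key before registering it, since a lost key cannot be recovered.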

Download Browser

Platform	Download
macOS (Intel + Apple Silicon)	Oya Browser.dmg
Linux (arm64)	Oya Browser.AppImage

macOS: Open the .dmg and drag the app to Applications. On first launch, macOS may block the app because it's not notarized. Fix:

xattr -cr /Applications/Oya\ Browser.app

Or: right-click the app → Open → Open (bypasses Gatekeeper once).

Linux: chmod +x the AppImage and run it.

Running multiple instances

To open multiple browser windows (e.g. different accounts or different API keys):

# macOS — open another instance
open -n "/Applications/Oya Browser.app"

# With separate sessions (own cookies, own config)
open -n "/Applications/Oya Browser.app" --args --user-data-dir=/tmp/oya-2
open -n "/Applications/Oya Browser.app" --args --user-data-dir=/tmp/oya-3

# Linux
./Oya-Browser.AppImage --user-data-dir=/tmp/oya-2

Each --user-data-dir gets its own cookies, logins, and config — fully isolated sessions.

Connect

Open Oya Browser. The setup screen appears on first launch.

Field	Value
Server URL	`wss://browser.oya.ai/ws`
API Key	The key you generated in the dashboard
Browser Name	Optional — how it shows in the dashboard

Click Connect. The green dot in the toolbar confirms the connection. Your browser now appears in the dashboard.

MCP Setup

Oya Browser exposes each connected browser as an MCP server at:

https://browser.oya.ai/mcp/{BROWSER_ID}

Get your browser's ID from the dashboard (shown under each browser name, or in the MCP Tools tab).

Cursor

Add to .cursor/mcp.json in your project:

{
  "mcpServers": {
    "oya-browser": {
      "url": "https://browser.oya.ai/mcp/YOUR_BROWSER_ID",
      "transport": "streamable-http",
      "headers": {
        "Authorization": "Bearer YOUR_API_KEY"
      }
    }
  }
}

Claude Desktop

Add to Claude Desktop's MCP config (Settings → Developer → Edit Config):

{
  "mcpServers": {
    "oya-browser": {
      "url": "https://browser.oya.ai/mcp/YOUR_BROWSER_ID",
      "transport": "streamable-http",
      "headers": {
        "Authorization": "Bearer YOUR_API_KEY"
      }
    }
  }
}

Claude Code

Use the same config: add it to your project's .claude/mcp.json, or use the /browse skill command included in the repo.
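
Cursor and Claude Desktop take the same entry, so the file can be generated instead of edited by hand. This is a minimal sketch; mcp_config is a hypothetical helper, not part of Oya Browser.

```python
import json


def mcp_config(browser_id: str, api_key: str) -> dict:
    """Return the mcpServers entry shown above for one browser."""
    return {
        "mcpServers": {
            "oya-browser": {
                "url": f"https://browser.oya.ai/mcp/{browser_id}",
                "transport": "streamable-http",
                "headers": {"Authorization": f"Bearer {api_key}"},
            }
        }
    }


config = mcp_config("YOUR_BROWSER_ID", "YOUR_API_KEY")
print(json.dumps(config, indent=2))  # paste the output into .cursor/mcp.json
```

Because the file contains your API key, keep it out of version control.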

analyze_page

Analyzes the current page. Returns the full page as structured markdown with every interactive element numbered.

// No parameters
analyze_page()

Returns:

Page metadata — URL, title, viewport size, scroll position
Full page content as markdown with inline element annotations like [#5 button "Submit"]
Element index — all elements listed with IDs, types, labels, visibility flags

Always call analyze_page before using click or type. Element IDs only exist after analysis and reset on every call.
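
To pick an element ID programmatically, you can scan the returned markdown for the inline annotations. The sketch below assumes every annotation has exactly the form [#<id> <type> "<label>"], as in the example above. The real output may contain variants this pattern misses.

```python
import re
from typing import NamedTuple


class Element(NamedTuple):
    element_id: int
    kind: str
    label: str


# Assumption: annotations always look like [#5 button "Submit"].
ANNOTATION = re.compile(r'\[#(\d+) ([\w-]+) "([^"]*)"\]')


def extract_elements(markdown: str) -> list[Element]:
    """Return every annotated element found in analyze_page markdown."""
    return [
        Element(int(number), kind, label)
        for number, kind, label in ANNOTATION.findall(markdown)
    ]


sample = 'Sign in\n[#4 input "Email"] [#5 button "Submit"]'
elements = extract_elements(sample)
submit = next(e for e in elements if e.label == "Submit")
print(submit.element_id)  # 5
```

Run the extraction again after every analyze_page call, since the IDs reset each time.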

navigate

Navigate the browser to a URL.

navigate({ url: "https://example.com" })

After navigating, call analyze_page again — old element IDs are invalid on the new page.

click

Click an interactive element by its ID number from analyze_page.

click({ element_id: 13 })

The element was tagged with data-ac-id="13" during analysis — the click resolves via a single querySelector.

type

Type text into an input element. Clears existing content first, then types character by character with realistic key events.

type({ element_id: 9, text: "hello world" })

press_key

Press a keyboard key. Useful for submitting forms (Enter), dismissing dialogs (Escape), or navigating (Tab, arrows).

press_key({ key: "Enter" })

Supported keys: Enter, Escape, Tab, Backspace, ArrowDown, ArrowUp, or any character.

screenshot

Capture the visible tab as a base64 PNG image.

screenshot()

scroll

Scroll the page up or down.

scroll({ direction: "down", amount: 500 })

Param	Type	Description
`direction`	`"up"` \| `"down"`	Scroll direction
`amount`	number (optional)	Pixels to scroll, default 500

Tab Management

list_tabs

List all open tabs with ID, title, URL, and which is active.

list_tabs()

open_tab

Open a new tab, optionally at a URL.

open_tab({ url: "https://gmail.com" })

switch_tab

Switch to a tab by ID (from list_tabs).

switch_tab({ tab_id: 2 })

close_tab

Close a tab. Closes the active tab if no ID is specified.

close_tab({ tab_id: 3 })

wait

Wait for an element matching a CSS selector to appear on the page.

wait({ selector: ".results", timeout: 10000 })

Anonymity

Create and manage browser profiles with unique fingerprints, proxy routing, and isolated cookie stores. Each profile is a complete identity — different canvas hash, WebGL renderer, navigator properties, and session storage. Switch identities with a single MCP call.

Anonymity features are available in the Oya Browser desktop app. The Chrome extension does not include fingerprint or proxy management.

Fingerprint Spoofing

Each profile generates a coherent set of browser fingerprints that are internally consistent per platform. A Win32 profile gets Windows GPU strings, Windows fonts, and matching screen resolutions.

Canvas — deterministic pixel noise on toDataURL and toBlob
WebGL — spoofed vendor/renderer strings from real GPU database
AudioContext — noise on OfflineAudioContext.startRendering
ClientRects — sub-pixel noise on getBoundingClientRect (bypassed internally for click accuracy)
Navigator — platform, hardwareConcurrency, deviceMemory, languages, vendor
Screen — width, height, colorDepth, devicePixelRatio
WebRTC — ICE candidates stripped to prevent local IP leak
Fonts — platform-consistent font sets

Proxy Support

Each profile can include a SOCKS5 or HTTP/HTTPS proxy. The proxy is applied at the Electron session level — all traffic routes through it, including DNS (for SOCKS5). Timezone and locale auto-match the proxy's geographic location via CDP Emulation.

create_profile({
  platform: "Win32",
  timezone: "America/New_York",
  proxy_type: "socks5",
  proxy_host: "1.2.3.4",
  proxy_port: 1080,
  proxy_username: "user",
  proxy_password: "pass"
})

Anti-Detection Stealth

Always active — no configuration needed. The stealth layer removes automation indicators that anti-bot systems check for:

navigator.webdriver removed
Electron globals (window.process, window.require) deleted
window.chrome fixed to match real Chrome (app, runtime, csi, loadTimes)
navigator.plugins populated with PDF viewers
navigator.permissions.query patched
Sec-CH-UA headers rewritten to hide Electron
Google telemetry domains blocked at the network level

list_profiles

List all available anonymity profiles on the connected browser. Shows which profile is active.

list_profiles()

Returns each profile's ID, platform, timezone, and whether it has a proxy configured.

create_profile

Create a new anonymity profile with a randomized browser fingerprint. All values are generated to be internally consistent for the chosen platform.

create_profile({
  platform: "Win32",
  timezone: "Europe/London",
  locale: "en-GB"
})

Param	Type	Description
`platform`	string	Win32, MacIntel, or Linux x86_64
`timezone`	string	IANA timezone (e.g. America/New_York)
`locale`	string	Locale (e.g. en-US, en-GB)
`proxy_type`	string	http or socks5
`proxy_host`	string	Proxy server hostname or IP
`proxy_port`	number	Proxy server port
`proxy_username`	string	Proxy auth username
`proxy_password`	string	Proxy auth password
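
A client can check the arguments against the table before calling create_profile. The sketch below is hypothetical: the table does not say which params are required, so treating platform and timezone as mandatory and passing the proxy as a nested dict are assumptions made for the example.

```python
from typing import Optional

PLATFORMS = {"Win32", "MacIntel", "Linux x86_64"}
PROXY_TYPES = {"http", "socks5"}


def profile_params(platform: str, timezone: str, locale: Optional[str] = None,
                   proxy: Optional[dict] = None) -> dict:
    """Build the create_profile arguments, rejecting values outside the table."""
    if platform not in PLATFORMS:
        raise ValueError(f"unsupported platform: {platform!r}")
    params = {"platform": platform, "timezone": timezone}
    if locale is not None:
        params["locale"] = locale
    if proxy is not None:
        if proxy.get("type") not in PROXY_TYPES:
            raise ValueError(f"unsupported proxy type: {proxy.get('type')!r}")
        # Flatten the nested dict into the proxy_* keys listed in the table.
        for field in ("type", "host", "port", "username", "password"):
            if field in proxy:
                params[f"proxy_{field}"] = proxy[field]
    return params


params = profile_params(
    "Win32",
    "America/New_York",
    proxy={"type": "socks5", "host": "1.2.3.4", "port": 1080},
)
```

The resulting dict can be passed as the argument object of create_profile.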

set_profile

Switch to a different anonymity profile. This closes all open tabs and reopens the browser with the new profile's fingerprint, proxy, timezone, and isolated cookie store.

set_profile({ profile_id: "profile-a1b2c3" })

Param	Type	Description
`profile_id`	string (required)	ID of the profile to activate

Switching profiles closes all open tabs. The browser reopens on google.com with the new identity.

Dashboard

The dashboard at /dashboard is the control panel. It shows your connected browsers and lets you interact with them.

Generate key — click Generate in the API key bar to create a new key
Browser list — shows all browsers connected with your key
Commands tab — quick buttons for analyze, screenshot, scroll + input fields for navigate, click, type
MCP Tools tab — shows the MCP endpoint URL, copy-paste config for Cursor/Claude, and a tool runner

Chat

Control the browser with natural language — available in both the web dashboard and the desktop app's dev panel. Type "go to google and search for cats" and the AI navigates, types, clicks, and reports back.

Formatted markdown responses with bold, code, lists, and headings
Tool call badges showing which MCP tools the AI used (analyze_page, click, type, etc.)
Copy button on hover to copy any response
Automatic context trimming when conversations get long
Conversation history preserved across messages

Chat requires an OpenAI API key. Set it in the Settings panel (gear icon in the API key bar) or through the OPENAI_API_KEY env var on the server. The key is optional: you don't need it for MCP tools.

Dev Panel (Desktop App)

The desktop app's dev panel ({} button in the toolbar) has four tabs:

Chat — natural language browser control with formatted responses and tool badges
Actions — quick-fire buttons and input fields for every command: analyze, screenshot, navigate, click by element #, type, press keys, hover, scroll, wait, tab management
Network — live WebSocket traffic with IN/OUT badges, expandable payloads, filter by direction or type (All, In, Out, Commands, Results)
Source — view the page as AI sees it: toggle between Markdown (analyzePage output) and HTML source, refresh on demand

Live View

The Commands tab shows a live view of the browser below the command buttons. Frames are streamed as JPEG via SSE at ~2fps.

Settings

Click the gear icon next to the API key bar. Configure:

OpenAI API Key — for the Chat feature
Chat Model — default gpt-4o-mini
Base URL — override for compatible APIs (Azure OpenAI, local LLMs, etc.)

Settings are saved on the server and persist across restarts.

REST API

All endpoints except /health and /register-key require an Authorization: Bearer YOUR_API_KEY header. Interactive API testing is available at /swagger.

Method	Endpoint	Description
`GET`	`/health`	Server status + browser count
`POST`	`/register-key`	Register a new API key (`{ "key": "..." }`)
`GET`	`/browsers`	List your connected browsers
`POST`	`/browsers/:id/command`	Send command (`{ "action": "...", "params": {} }`)
`POST`	`/browsers/:id/chat`	Chat (`{ "messages": [...] }`)
`GET`	`/live/:id?key=...`	SSE live view frame stream
`GET/POST`	`/mcp/:id`	MCP Streamable HTTP endpoint
`GET`	`/config`	Get server settings
`POST`	`/config`	Update server settings
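
The endpoints in the table can be called with any HTTP client. As a rough sketch, the helper below builds authorized requests with Python's standard library. It assumes the API lives at https://browser.oya.ai and that request bodies are JSON. api_request is a hypothetical name.

```python
import json
import urllib.request
from typing import Optional

BASE_URL = "https://browser.oya.ai"  # assumption: REST API shares the MCP host


def api_request(method: str, path: str, api_key: str,
                body: Optional[dict] = None,
                base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a request carrying the Bearer header the API expects."""
    headers = {"Authorization": f"Bearer {api_key}"}
    data = None
    if body is not None:
        data = json.dumps(body).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(
        f"{base_url}{path}", data=data, headers=headers, method=method
    )


list_browsers = api_request("GET", "/browsers", "YOUR_API_KEY")
# To send it:
# with urllib.request.urlopen(list_browsers) as response:
#     browsers = json.load(response)
```

The response format of /browsers is not documented here, so inspect the result before relying on specific fields.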

Command API Reference

Send commands via POST /browsers/:id/command. Each action uses only specific params — the rest are ignored.

Navigation Actions

Action	Params	Description
`navigate`	`url` (required)	Navigate to a URL
`open_tab`	`url` (optional)	Open a new tab
`switch_tab`	`tab_id` (required)	Activate a tab by ID
`close_tab`	`tab_id` (optional, defaults to active)	Close a tab
`list_tabs`	none	List all open tabs

Page Analysis Actions

Action	Params	Description
`analyze`	none	Full page as markdown + numbered elements
`read_page`	`selector` (optional), `limit` (default 50)	Lightweight element listing
`screenshot`	none	Capture page as PNG

Interaction Actions

Action	Params	Description
`click`	`selector` (e.g. `[data-ac-id="3"]`)	Click an element
`type`	`selector` + `text`	Type into an input
`press_key`	`key` (e.g. Enter, Tab, Escape)	Press a keyboard key
`scroll`	`direction` (up/down), `amount` (px, default 500)	Scroll the page
`wait`	`selector`, `timeout` (ms, default 10000)	Wait for element to appear

Examples

// Navigate to a page
{ "action": "navigate", "params": { "url": "https://google.com" } }

// Analyze current page (no params needed)
{ "action": "analyze" }

// Click element #3 from analyze results
{ "action": "click", "params": { "selector": "[data-ac-id=\"3\"]" } }

// Type into element #9
{ "action": "type", "params": { "selector": "[data-ac-id=\"9\"]", "text": "hello world" } }

// Press Enter
{ "action": "press_key", "params": { "key": "Enter" } }

// Scroll down
{ "action": "scroll", "params": { "direction": "down", "amount": 500 } }

// Screenshot (no params needed)
{ "action": "screenshot" }

// List all tabs
{ "action": "list_tabs" }

// Open new tab
{ "action": "open_tab", "params": { "url": "https://gmail.com" } }

// Switch to tab
{ "action": "switch_tab", "params": { "tab_id": 2 } }

// Close tab (omit tab_id to close active tab)
{ "action": "close_tab", "params": { "tab_id": 3 } }

// Wait for element
{ "action": "wait", "params": { "selector": ".results", "timeout": 10000 } }

// Read page elements (lightweight)
{ "action": "read_page", "params": { "limit": 20 } }
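
The bodies above follow one pattern, so a small builder can produce them. Note that the REST click and type actions take a CSS selector, while the MCP tools take a bare element ID. This sketch shows the conversion; the function names are hypothetical.

```python
import json


def selector_for(element_id: int) -> str:
    """CSS selector for an element ID returned by analyze."""
    return f'[data-ac-id="{element_id}"]'


def command(action: str, **params) -> str:
    """Serialize a body for POST /browsers/:id/command.

    The params object is omitted when the action takes none.
    """
    body = {"action": action}
    if params:
        body["params"] = params
    return json.dumps(body)


print(command("analyze"))
print(command("click", selector=selector_for(3)))
print(command("type", selector=selector_for(9), text="hello world"))
```

json.dumps escapes the inner quotes of the selector, which avoids the hand-written \" sequences in the examples above.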

Typical Workflow

1. navigate → go to the page
2. analyze  → understand the page, get element IDs
3. click / type / press_key / scroll → interact
4. analyze  → re-analyze after page changes (old IDs are invalid)
5. repeat until task is done
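
The loop above can be sketched as a function that is independent of the transport. Here send is a hypothetical callable that delivers one action and returns its result, and find_input is supplied by the caller because the shape of the analyze result is not specified in this section.

```python
from typing import Callable

# Hypothetical transport: takes (action, params) and returns the command result.
Send = Callable[[str, dict], dict]


def run_search(send: Send, url: str, query: str,
               find_input: Callable[[dict], int]) -> dict:
    """Navigate, analyze, type a query, submit, then re-analyze."""
    send("navigate", {"url": url})
    page = send("analyze", {})
    input_id = find_input(page)  # caller decides how to read the analyze result
    send("type", {"selector": f'[data-ac-id="{input_id}"]', "text": query})
    send("press_key", {"key": "Enter"})
    return send("analyze", {})  # IDs from the first analyze are invalid now


# Offline stand-in for a real transport, used only to show the call order.
calls = []


def fake_send(action: str, params: dict) -> dict:
    calls.append(action)
    return {"input_id": 9}


run_search(fake_send, "https://example.com", "cats", lambda page: page["input_id"])
print(calls)  # ['navigate', 'analyze', 'type', 'press_key', 'analyze']
```

In real use, send would wrap POST /browsers/:id/command or an MCP tool call.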

WebSocket Protocol

Browsers connect via WebSocket at wss://browser.oya.ai/ws.

Auth

First message from browser:

{ "type": "auth", "api_key": "...", "browser_id": "...", "browser_name": "..." }

Server responds:

{ "type": "auth_ok", "browser_id": "..." }
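
A custom client would perform this handshake before anything else. The sketch below only builds and checks the messages, leaving the socket to the WebSocket library of your choice. How the server signals a rejected key is not documented here, so treating every reply other than auth_ok as a failure is an assumption.

```python
import json


def auth_message(api_key: str, browser_id: str, browser_name: str) -> str:
    """First message a browser sends after the socket opens."""
    return json.dumps({
        "type": "auth",
        "api_key": api_key,
        "browser_id": browser_id,
        "browser_name": browser_name,
    })


def confirmed_browser_id(raw_reply: str) -> str:
    """Return the browser_id from an auth_ok reply, or raise."""
    reply = json.loads(raw_reply)
    # Assumption: anything other than auth_ok means authentication failed.
    if reply.get("type") != "auth_ok":
        raise PermissionError(f"authentication failed: {reply}")
    return reply["browser_id"]


first_message = auth_message("YOUR_API_KEY", "browser-1", "Work laptop")
browser_id = confirmed_browser_id('{"type": "auth_ok", "browser_id": "browser-1"}')
```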

Commands

Server → Browser:

{ "type": "cmd", "id": "uuid", "action": "analyze", "params": {} }

Browser → Server:

{ "type": "cmd_result", "id": "uuid", "ok": true, "data": { ... } }
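
The id field ties each cmd_result to the cmd that caused it. A browser-side dispatcher could look roughly like the sketch below. The handlers mapping is hypothetical, and reporting failures as ok: false with an error string is an assumption, since only the success shape is shown above.

```python
import json
from typing import Callable, Dict

Handler = Callable[[dict], dict]


def handle_cmd(raw: str, handlers: Dict[str, Handler]) -> str:
    """Run the handler for one cmd message and build the matching cmd_result."""
    message = json.loads(raw)
    reply = {"type": "cmd_result", "id": message["id"]}  # echo the id back
    try:
        handler = handlers[message["action"]]
        reply.update(ok=True, data=handler(message.get("params", {})))
    except Exception as exc:
        # Assumption: failures are reported as ok=false plus an error string.
        reply.update(ok=False, error=str(exc))
    return json.dumps(reply)


handlers = {"analyze": lambda params: {"title": "Example"}}
incoming = '{"type": "cmd", "id": "abc", "action": "analyze", "params": {}}'
print(handle_cmd(incoming, handlers))
```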

Ping/Pong

Both sides send { "type": "ping" } every 15-20 seconds, and the receiving side replies with { "type": "pong" }.
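
A minimal keepalive sketch for a custom client follows. It uses 15 seconds, the lower end of the stated range, as the ping interval. The function names are hypothetical, and the actual send call depends on your WebSocket library.

```python
import json
from typing import Optional

PING_INTERVAL_S = 15  # lower bound of the 15-20 second range


def keepalive_reply(raw: str) -> Optional[str]:
    """Return the pong to send for an incoming ping, or None otherwise."""
    if json.loads(raw).get("type") == "ping":
        return json.dumps({"type": "pong"})
    return None


def ping_due(last_ping_at: float, now: float) -> bool:
    """True once at least PING_INTERVAL_S seconds have passed since the last ping."""
    return now - last_ping_at >= PING_INTERVAL_S


reply = keepalive_reply('{"type": "ping"}')
```

Pass timestamps from time.monotonic() to ping_due so that clock adjustments do not affect the schedule.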

Live Stream

Server → Browser: { "type": "stream_start", "fps": 2 }

Browser → Server: { "type": "frame", "data": "data:image/jpeg;base64,..." }

Server → Browser: { "type": "stream_stop" }
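
Each frame carries a JPEG as a base64 data URI. A consumer could decode one message as sketched below, assuming the prefix is always data:image/jpeg;base64, as in the example. The sample message is synthetic and contains only the first three bytes of a JPEG header.

```python
import base64
import json

PREFIX = "data:image/jpeg;base64,"


def frame_bytes(raw: str) -> bytes:
    """Extract the raw JPEG bytes from one frame message."""
    message = json.loads(raw)
    data = message.get("data", "")
    if message.get("type") != "frame" or not data.startswith(PREFIX):
        raise ValueError("not a JPEG frame message")
    return base64.b64decode(data[len(PREFIX):])


# Synthetic frame for illustration, not real browser output.
jpeg_header = bytes([0xFF, 0xD8, 0xFF])
message = json.dumps({
    "type": "frame",
    "data": PREFIX + base64.b64encode(jpeg_header).decode("ascii"),
})
decoded = frame_bytes(message)
```

The decoded bytes can be written to a .jpg file or handed to an image library.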