Documentation

Everything you need to connect your browser to AI agents via Oya Browser.

Quickstart

  1. Go to the dashboard and click Generate to create an API key
  2. Download Oya Browser for your OS
  3. Open the app, enter wss://browser.oya.ai/ws as server URL and paste your API key
  4. Your browser appears in the dashboard — you can now send commands or connect AI tools

Create API Key

Go to the dashboard. Click the Generate button next to the API key field. This creates a random key and registers it with the server.

Your key is scoped — you only see browsers connected with your key. Other users' browsers are invisible to you.

Save your key somewhere safe. If you lose it, you'll need to generate a new one. The old key still works for any browsers already connected with it.

Download Browser

Platform                       Download
macOS (Intel + Apple Silicon)  Oya Browser.dmg
Linux (arm64)                  Oya Browser.AppImage

macOS: Open the .dmg and drag the app to Applications. On first launch, macOS may block the app because it's not notarized. Fix:

xattr -cr /Applications/Oya\ Browser.app

Or: right-click the app → Open → Open (bypasses Gatekeeper once).

Linux: chmod +x the AppImage and run it.

Running multiple instances

To open multiple browser windows (e.g. different accounts or different API keys):

# macOS — open another instance
open -n "/Applications/Oya Browser.app"

# With separate sessions (own cookies, own config)
open -n "/Applications/Oya Browser.app" --args --user-data-dir=/tmp/oya-2
open -n "/Applications/Oya Browser.app" --args --user-data-dir=/tmp/oya-3

# Linux
./Oya-Browser.AppImage --user-data-dir=/tmp/oya-2

Each --user-data-dir gets its own cookies, logins, and config — fully isolated sessions.

Connect

Open Oya Browser. The setup screen appears on first launch.

Field         Value
Server URL    wss://browser.oya.ai/ws
API Key       The key you generated in the dashboard
Browser Name  Optional; how it shows in the dashboard

Click Connect. The green dot in the toolbar confirms the connection. Your browser now appears in the dashboard.

MCP Setup

Oya Browser exposes each connected browser as an MCP server at:

https://browser.oya.ai/mcp/{BROWSER_ID}

Get your browser's ID from the dashboard (shown under each browser name, or in the MCP Tools tab).

Cursor

Add to .cursor/mcp.json in your project:

{
  "mcpServers": {
    "oya-browser": {
      "url": "https://browser.oya.ai/mcp/YOUR_BROWSER_ID",
      "transport": "streamable-http",
      "headers": {
        "Authorization": "Bearer YOUR_API_KEY"
      }
    }
  }
}

Claude Desktop

Add to Claude Desktop's MCP config (Settings → Developer → Edit Config):

{
  "mcpServers": {
    "oya-browser": {
      "url": "https://browser.oya.ai/mcp/YOUR_BROWSER_ID",
      "transport": "streamable-http",
      "headers": {
        "Authorization": "Bearer YOUR_API_KEY"
      }
    }
  }
}

Claude Code

Same config — add to your project's .claude/mcp.json or use the /browse skill command included in the repo.
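
Since every client takes the same entry, it can be generated rather than hand-edited. A minimal sketch, assuming only the URL pattern and header shown in the configs above; the mcp_config helper and its name argument are illustrative and not part of Oya Browser:

```python
import json

MCP_BASE = "https://browser.oya.ai/mcp"  # endpoint prefix from the MCP Setup section


def mcp_config(browser_id: str, api_key: str, name: str = "oya-browser") -> dict:
    """Build an mcpServers entry for one connected browser (hypothetical helper)."""
    if not browser_id or not api_key:
        raise ValueError("browser_id and api_key are both required")
    return {
        "mcpServers": {
            name: {
                "url": f"{MCP_BASE}/{browser_id}",
                "transport": "streamable-http",
                "headers": {"Authorization": f"Bearer {api_key}"},
            }
        }
    }


# Print a config ready to paste into the client's MCP config file.
print(json.dumps(mcp_config("YOUR_BROWSER_ID", "YOUR_API_KEY"), indent=2))
```

Passing a different name per browser lets one config file list several browsers side by side.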

analyze_page

Analyzes the current page. Returns the full page as structured markdown with every interactive element numbered.

// No parameters
analyze_page()

Returns:

  • Page metadata: URL, title, viewport size, scroll position
  • Full page content as markdown with inline element annotations like [#5 button "Submit"]
  • Element index: all elements listed with IDs, types, labels, visibility flags

Always call analyze_page before using click or type. Element IDs only exist after analysis and reset on every call.
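
When a client works from the markdown rather than the element index, the inline annotations can be pulled out with a regular expression. This is a sketch that assumes every annotation follows the one example shown above, [#5 button "Submit"]; the real output may use other shapes, and the Element type and helper names are hypothetical.

```python
import re
from typing import List, NamedTuple, Optional

# Assumed annotation shape, taken from the single documented example.
ANNOTATION = re.compile(r'\[#(\d+)\s+([\w-]+)\s+"([^"]*)"\]')


class Element(NamedTuple):
    element_id: int
    kind: str
    label: str


def find_elements(markdown: str) -> List[Element]:
    """Extract every inline element annotation from analyze_page markdown."""
    return [Element(int(i), kind, label) for i, kind, label in ANNOTATION.findall(markdown)]


def find_by_label(markdown: str, label: str) -> Optional[Element]:
    """Return the first element whose label matches, ignoring case."""
    wanted = label.casefold()
    for element in find_elements(markdown):
        if element.label.casefold() == wanted:
            return element
    return None
```

The returned element_id is only valid until the next analyze_page call, so look it up immediately before each click or type.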

navigate

Navigate the browser to a URL.

navigate({ url: "https://example.com" })

After navigating, call analyze_page again; old element IDs are invalid on the new page.

click

Click an interactive element by its ID number from analyze_page.

click({ element_id: 13 })

The element was tagged with data-ac-id="13" during analysis — the click resolves via a single querySelector.

type

Type text into an input element. Clears existing content first, then types character by character with realistic key events.

type({ element_id: 9, text: "hello world" })

press_key

Press a keyboard key. Useful for submitting forms (Enter), dismissing dialogs (Escape), or navigating (Tab, arrows).

press_key({ key: "Enter" })

Supported keys: Enter, Escape, Tab, Backspace, ArrowDown, ArrowUp, or any character.

screenshot

Capture the visible tab as a base64 PNG image.

screenshot()

scroll

Scroll the page up or down.

scroll({ direction: "down", amount: 500 })

Param      Type               Description
direction  "up" | "down"      Scroll direction
amount     number (optional)  Pixels to scroll, default 500

Tab Management

list_tabs

List all open tabs with ID, title, URL, and which is active.

list_tabs()

open_tab

Open a new tab, optionally at a URL.

open_tab({ url: "https://gmail.com" })

switch_tab

Switch to a tab by ID (from list_tabs).

switch_tab({ tab_id: 2 })

close_tab

Close a tab. Closes the active tab if no ID specified.

close_tab({ tab_id: 3 })

wait

Wait for an element matching a CSS selector to appear on the page.

wait({ selector: ".results", timeout: 10000 })

Anonymity

Create and manage browser profiles with unique fingerprints, proxy routing, and isolated cookie stores. Each profile is a complete identity — different canvas hash, WebGL renderer, navigator properties, and session storage. Switch identities with a single MCP call.

Anonymity features are available in the Oya Browser desktop app. The Chrome extension does not include fingerprint or proxy management.

Fingerprint Spoofing

Each profile generates a coherent set of browser fingerprints that are internally consistent per platform. A Win32 profile gets Windows GPU strings, Windows fonts, and matching screen resolutions.

  • Canvas — deterministic pixel noise on toDataURL and toBlob
  • WebGL — spoofed vendor/renderer strings from real GPU database
  • AudioContext — noise on OfflineAudioContext.startRendering
  • ClientRects — sub-pixel noise on getBoundingClientRect (bypassed internally for click accuracy)
  • Navigator — platform, hardwareConcurrency, deviceMemory, languages, vendor
  • Screen — width, height, colorDepth, devicePixelRatio
  • WebRTC — ICE candidates stripped to prevent local IP leak
  • Fonts — platform-consistent font sets

Proxy Support

Each profile can include a SOCKS5 or HTTP/HTTPS proxy. The proxy is applied at the Electron session level — all traffic routes through it, including DNS (for SOCKS5). Timezone and locale auto-match the proxy's geographic location via CDP Emulation.

create_profile({
  platform: "Win32",
  timezone: "America/New_York",
  proxy_type: "socks5",
  proxy_host: "1.2.3.4",
  proxy_port: 1080,
  proxy_username: "user",
  proxy_password: "pass"
})

Anti-Detection Stealth

Always active — no configuration needed. The stealth layer removes automation indicators that anti-bot systems check for:

  • navigator.webdriver removed
  • Electron globals (window.process, window.require) deleted
  • window.chrome fixed to match real Chrome (app, runtime, csi, loadTimes)
  • navigator.plugins populated with PDF viewers
  • navigator.permissions.query patched
  • Sec-CH-UA headers rewritten to hide Electron
  • Google telemetry domains blocked at the network level

list_profiles

List all available anonymity profiles on the connected browser. Shows which profile is active.

list_profiles()

Returns each profile's ID, platform, timezone, and whether it has a proxy configured.

create_profile

Create a new anonymity profile with a randomized browser fingerprint. All values are generated to be internally consistent for the chosen platform.

create_profile({
  platform: "Win32",
  timezone: "Europe/London",
  locale: "en-GB"
})

Param           Type    Description
platform        string  Win32, MacIntel, or Linux x86_64
timezone        string  IANA timezone (e.g. America/New_York)
locale          string  Locale (e.g. en-US, en-GB)
proxy_type      string  http or socks5
proxy_host      string  Proxy server hostname or IP
proxy_port      number  Proxy server port
proxy_username  string  Proxy auth username
proxy_password  string  Proxy auth password
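
Checking arguments on the client side catches typos before a profile is created. The sketch below assembles create_profile arguments from the parameters listed above. The source does not say which parameters are required, so treating platform and timezone as mandatory is an assumption, and profile_params is a hypothetical helper.

```python
from typing import Any, Dict, Optional

PLATFORMS = {"Win32", "MacIntel", "Linux x86_64"}  # values listed in the parameter table
PROXY_TYPES = {"http", "socks5"}


def profile_params(platform: str, timezone: str, locale: Optional[str] = None,
                   proxy: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Assemble create_profile arguments, rejecting values the table does not list."""
    if platform not in PLATFORMS:
        raise ValueError(f"unsupported platform: {platform!r}")
    params: Dict[str, Any] = {"platform": platform, "timezone": timezone}
    if locale is not None:
        params["locale"] = locale
    if proxy is not None:
        if proxy.get("type") not in PROXY_TYPES:
            raise ValueError("proxy type must be 'http' or 'socks5'")
        if not proxy.get("host") or not isinstance(proxy.get("port"), int):
            raise ValueError("proxy needs a host and an integer port")
        params.update(proxy_type=proxy["type"], proxy_host=proxy["host"],
                      proxy_port=proxy["port"])
        # Credentials are optional; include them only when supplied.
        for key in ("username", "password"):
            if proxy.get(key):
                params[f"proxy_{key}"] = proxy[key]
    return params
```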

set_profile

Switch to a different anonymity profile. This closes all open tabs and reopens the browser with the new profile's fingerprint, proxy, timezone, and isolated cookie store.

set_profile({ profile_id: "profile-a1b2c3" })

Param       Type               Description
profile_id  string (required)  ID of the profile to activate

Switching profiles closes all open tabs. The browser reopens on google.com with the new identity.

Dashboard

The dashboard at /dashboard is the control panel. It shows your connected browsers and lets you interact with them.

  • Generate key — click Generate in the API key bar to create a new key
  • Browser list — shows all browsers connected with your key
  • Commands tab — quick buttons for analyze, screenshot, scroll + input fields for navigate, click, type
  • MCP Tools tab — shows the MCP endpoint URL, copy-paste config for Cursor/Claude, and a tool runner

Chat

Control the browser with natural language — available in both the web dashboard and the desktop app's dev panel. Type "go to google and search for cats" and the AI navigates, types, clicks, and reports back.

  • Formatted markdown responses with bold, code, lists, and headings
  • Tool call badges showing which MCP tools the AI used (analyze_page, click, type, etc.)
  • Copy button on hover to copy any response
  • Automatic context trimming when conversations get long
  • Conversation history preserved across messages

Chat requires an OpenAI API key. Set it in the Settings panel (gear icon in the API key bar) or via the OPENAI_API_KEY env var on the server. This is optional; you don't need it for MCP tools.

Dev Panel (Desktop App)

The desktop app's dev panel ({} button in the toolbar) has four tabs:

  • Chat — natural language browser control with formatted responses and tool badges
  • Actions — quick-fire buttons and input fields for every command: analyze, screenshot, navigate, click by element #, type, press keys, hover, scroll, wait, tab management
  • Network — live WebSocket traffic with IN/OUT badges, expandable payloads, filter by direction or type (All, In, Out, Commands, Results)
  • Source — view the page as AI sees it: toggle between Markdown (analyzePage output) and HTML source, refresh on demand

Live View

The Commands tab shows a live view of the browser below the command buttons. Frames are streamed as JPEG via SSE at ~2fps.
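
A client other than the dashboard can read the same stream by parsing standard Server-Sent Events framing. The sketch below handles only the generic SSE wire format (data: lines, blank-line event boundaries, comment lines). What each event's payload contains is not specified here, so the function yields raw strings and leaves decoding to the caller.

```python
from typing import Iterable, Iterator, List


def sse_data(lines: Iterable[str]) -> Iterator[str]:
    """Yield the data payload of each Server-Sent Event in a stream of text lines."""
    buffer: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":                      # a blank line terminates one event
            if buffer:
                yield "\n".join(buffer)
                buffer = []
        elif line.startswith(":"):          # comment or keep-alive line
            continue
        elif line.startswith("data:"):
            value = line[len("data:"):]
            # The SSE format strips a single leading space after the colon.
            buffer.append(value[1:] if value.startswith(" ") else value)
        # Other fields (event:, id:, retry:) are ignored in this sketch.
```

At roughly 2 fps a consumer can process each event synchronously; an event still open when the stream ends is dropped, which matches standard SSE behaviour.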

Settings

Click the gear icon next to the API key bar. Configure:

  • OpenAI API Key — for the Chat feature
  • Chat Model — default gpt-4o-mini
  • Base URL — override for compatible APIs (Azure OpenAI, local LLMs, etc.)

Settings are saved on the server and persist across restarts.

REST API

All endpoints require an Authorization: Bearer YOUR_API_KEY header, except /health and /register-key. Interactive API testing is available at /swagger.

Method    Endpoint               Description
GET       /health                Server status + browser count
POST      /register-key          Register a new API key ({ "key": "..." })
GET       /browsers              List your connected browsers
POST      /browsers/:id/command  Send command ({ "action": "...", "params": {} })
POST      /browsers/:id/chat     Chat ({ "messages": [...] })
GET       /live/:id?key=...      SSE live view frame stream
GET/POST  /mcp/:id               MCP Streamable HTTP endpoint
GET       /config                Get server settings
POST      /config                Update server settings
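
For scripts that don't speak MCP, the command endpoint can be called with the Python standard library. This is a minimal sketch: the host in BASE_URL is inferred from the MCP and WebSocket URLs rather than stated for the REST API, and the JSON response body is likewise an assumption.

```python
import json
import urllib.request
from typing import Any, Dict, Optional

BASE_URL = "https://browser.oya.ai"  # assumed REST host; adjust for a self-hosted server


def command_request(browser_id: str, api_key: str, action: str,
                    params: Optional[Dict[str, Any]] = None) -> urllib.request.Request:
    """Build (but do not send) a POST /browsers/:id/command request."""
    body = {"action": action, "params": params or {}}
    return urllib.request.Request(
        url=f"{BASE_URL}/browsers/{browser_id}/command",
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


def send(request: urllib.request.Request, timeout: float = 30.0) -> Any:
    """Send a prepared request and decode the response, assuming it is JSON."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Building the request separately from sending it makes the payload easy to inspect or log before anything reaches the browser.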

Command API Reference

Send commands via POST /browsers/:id/command. Each action uses only specific params — the rest are ignored.

Navigation Actions

Action      Params                                 Description
navigate    url (required)                         Navigate to a URL
open_tab    url (optional)                         Open a new tab
switch_tab  tab_id (required)                      Activate a tab by ID
close_tab   tab_id (optional, defaults to active)  Close a tab
list_tabs   none                                   List all open tabs

Page Analysis Actions

Action      Params                                   Description
analyze     none                                     Full page as markdown + numbered elements
read_page   selector (optional), limit (default 50)  Lightweight element listing
screenshot  none                                     Capture page as PNG

Interaction Actions

Action     Params                                         Description
click      selector (e.g. [data-ac-id="3"])               Click an element
type       selector + text                                Type into an input
press_key  key (e.g. Enter, Tab, Escape)                  Press a keyboard key
scroll     direction (up/down), amount (px, default 500)  Scroll the page
wait       selector, timeout (ms, default 10000)          Wait for element to appear

Examples

// Navigate to a page
{ "action": "navigate", "params": { "url": "https://google.com" } }

// Analyze current page (no params needed)
{ "action": "analyze" }

// Click element #3 from analyze results
{ "action": "click", "params": { "selector": "[data-ac-id=\"3\"]" } }

// Type into element #9
{ "action": "type", "params": { "selector": "[data-ac-id=\"9\"]", "text": "hello world" } }

// Press Enter
{ "action": "press_key", "params": { "key": "Enter" } }

// Scroll down
{ "action": "scroll", "params": { "direction": "down", "amount": 500 } }

// Screenshot (no params needed)
{ "action": "screenshot" }

// List all tabs
{ "action": "list_tabs" }

// Open new tab
{ "action": "open_tab", "params": { "url": "https://gmail.com" } }

// Switch to tab
{ "action": "switch_tab", "params": { "tab_id": 2 } }

// Close tab (omit tab_id to close active tab)
{ "action": "close_tab", "params": { "tab_id": 3 } }

// Wait for element
{ "action": "wait", "params": { "selector": ".results", "timeout": 10000 } }

// Read page elements (lightweight)
{ "action": "read_page", "params": { "limit": 20 } }
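
The click and type payloads above address elements by a data-ac-id selector, while analyze reports plain element numbers. A small sketch of that conversion follows; the helper names are hypothetical, and only the selector shape comes from the examples.

```python
import json
from typing import Any, Dict


def selector_for(element_id: int) -> str:
    """CSS selector for an element numbered by analyze."""
    return f'[data-ac-id="{int(element_id)}"]'  # int() rejects non-numeric IDs


def click(element_id: int) -> Dict[str, Any]:
    """Payload for clicking element #element_id."""
    return {"action": "click", "params": {"selector": selector_for(element_id)}}


def type_text(element_id: int, text: str) -> Dict[str, Any]:
    """Payload for typing text into element #element_id."""
    return {"action": "type",
            "params": {"selector": selector_for(element_id), "text": text}}


# json.dumps takes care of escaping the inner quotes.
print(json.dumps(click(3)))
```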

Typical Workflow

1. navigate → go to the page
2. analyze  → understand the page, get element IDs
3. click / type / press_key / scroll → interact
4. analyze  → re-analyze after page changes (old IDs are invalid)
5. repeat until task is done
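
The loop above can be sketched as a function that re-analyzes after every interaction, so that each step picks its target from fresh element IDs. The send callable is a hypothetical transport (for example, a wrapper around POST /browsers/:id/command). Each step is a function from the latest analysis to the next command.

```python
from typing import Any, Callable, Dict, List

Command = Dict[str, Any]
Send = Callable[[Command], Dict[str, Any]]   # hypothetical transport: command in, result out
Step = Callable[[Dict[str, Any]], Command]   # latest analysis in, next command out


def run_task(send: Send, url: str, steps: List[Step]) -> Dict[str, Any]:
    """Navigate, then alternate interaction and analysis until the steps run out."""
    send({"action": "navigate", "params": {"url": url}})
    page = send({"action": "analyze"})
    for choose in steps:
        send(choose(page))                   # IDs come from the most recent analysis only
        page = send({"action": "analyze"})   # re-analyze: old IDs are now invalid
    return page
```

Passing steps as functions rather than fixed payloads keeps stale element IDs from leaking between iterations.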

WebSocket Protocol

Browsers connect via WebSocket at wss://browser.oya.ai/ws.

Auth

First message from browser:

{ "type": "auth", "api_key": "...", "browser_id": "...", "browser_name": "..." }

Server responds:

{ "type": "auth_ok", "browser_id": "..." }
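
A custom client would serialize the first message and check the reply before sending anything else. The sketch below covers only message construction and validation, with no socket library involved. It assumes the server echoes the browser_id it was sent, which the example suggests but does not state. Failure replies are not documented, so anything other than auth_ok is treated as a failure.

```python
import json


def auth_message(api_key: str, browser_id: str, browser_name: str) -> str:
    """Serialize the first message a browser sends after the socket opens."""
    return json.dumps({
        "type": "auth",
        "api_key": api_key,
        "browser_id": browser_id,
        "browser_name": browser_name,
    })


def confirm_auth(raw_reply: str, expected_browser_id: str) -> bool:
    """True only if the server answered auth_ok for the same browser (assumed echo)."""
    reply = json.loads(raw_reply)
    return (reply.get("type") == "auth_ok"
            and reply.get("browser_id") == expected_browser_id)
```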

Commands

Server → Browser:

{ "type": "cmd", "id": "uuid", "action": "analyze", "params": {} }

Browser → Server:

{ "type": "cmd_result", "id": "uuid", "ok": true, "data": { ... } }
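
Because results arrive asynchronously, the id field is what ties a cmd_result back to its cmd. One way to keep track, sketched with a hypothetical PendingCommands class and a random UUID as the id (the "uuid" placeholder above suggests this, but the exact id format is an assumption):

```python
import uuid
from typing import Any, Dict, Optional


class PendingCommands:
    """Track commands awaiting a cmd_result, keyed by their id."""

    def __init__(self) -> None:
        self._pending: Dict[str, Dict[str, Any]] = {}

    def new_command(self, action: str,
                    params: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
        """Create a cmd message with a fresh id and remember it."""
        message = {"type": "cmd", "id": str(uuid.uuid4()),
                   "action": action, "params": params or {}}
        self._pending[message["id"]] = message
        return message

    def resolve(self, result: Dict[str, Any]) -> Optional[Dict[str, Any]]:
        """Return the original command for a cmd_result, or None if the id is unknown."""
        if result.get("type") != "cmd_result":
            return None
        return self._pending.pop(result.get("id"), None)

    def __len__(self) -> int:
        return len(self._pending)
```

Results with an unknown or already-resolved id return None, so duplicate or late replies are easy to ignore.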

Ping/Pong

Each side sends { "type": "ping" } every 15-20 seconds, and the other side replies with { "type": "pong" }.
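
The keepalive logic can be kept separate from the socket by passing timestamps in explicitly. In this sketch PING_INTERVAL uses the lower bound of the 15-20 second range; DEAD_AFTER is a hypothetical timeout, since the source does not say when a silent peer should be considered gone.

```python
import json
import time
from typing import Optional

PING_INTERVAL = 15.0   # seconds; lower bound of the documented 15-20 s range
DEAD_AFTER = 60.0      # hypothetical: three missed 20 s windows


class Heartbeat:
    def __init__(self, now: Optional[float] = None) -> None:
        start = time.monotonic() if now is None else now
        self.last_sent = start
        self.last_seen = start

    def on_message(self, raw: str, now: float) -> Optional[str]:
        """Record traffic; return a pong to send if the message was a ping."""
        self.last_seen = now
        if json.loads(raw).get("type") == "ping":
            return json.dumps({"type": "pong"})
        return None

    def due(self, now: float) -> Optional[str]:
        """Return a ping to send once the interval has elapsed, else None."""
        if now - self.last_sent >= PING_INTERVAL:
            self.last_sent = now
            return json.dumps({"type": "ping"})
        return None

    def is_dead(self, now: float) -> bool:
        """True when nothing has been received for longer than DEAD_AFTER."""
        return now - self.last_seen > DEAD_AFTER
```

In a real client, call due() and is_dead() from a timer using time.monotonic(), and pass every incoming message through on_message().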

Live Stream

Server → Browser: { "type": "stream_start", "fps": 2 }

Browser → Server: { "type": "frame", "data": "data:image/jpeg;base64,..." }

Server → Browser: { "type": "stream_stop" }
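
On the receiving side, each frame message carries a JPEG as a base64 data URL. A minimal decoding sketch, assuming the prefix is always exactly the one shown in the example above:

```python
import base64
import json

PREFIX = "data:image/jpeg;base64,"  # prefix shown in the frame example


def frame_bytes(raw_message: str) -> bytes:
    """Decode the JPEG bytes carried by one frame message."""
    message = json.loads(raw_message)
    if message.get("type") != "frame":
        raise ValueError("not a frame message")
    data = message.get("data", "")
    if not data.startswith(PREFIX):
        raise ValueError("unexpected data URL prefix")
    # validate=True rejects characters outside the base64 alphabet.
    return base64.b64decode(data[len(PREFIX):], validate=True)
```

The returned bytes can be written straight to a .jpg file or handed to an image library.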