v2.3 — Open Source

Your AI Agent
for the Browser

Type what you want in natural language. Crab-Agent navigates pages, clicks elements, fills forms, reads content, manages tabs — all autonomously via Chrome DevTools Protocol. Bring your own API key. Works with any LLM provider.

Get Extension | See how it works

Click & Type

Navigate

Read Pages

Tabs

Schedule

Features

Everything you need to automate the web

A complete agent toolkit built into a Chrome side panel. No coding required.

Vision + Native AX Tree

Uses Chromium's native Accessibility tree with ref IDs (covers Shadow DOM, cross-origin iframes, custom elements). Pixel-perfect coordinates from the renderer.

Full Browser Control via CDP

Click, type, scroll, drag, navigate, open tabs, fill forms, upload files, execute JavaScript — hardware-level events through Chrome DevTools Protocol, not synthetic JS.
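
To make "hardware-level events" a little more concrete, here is a minimal sketch of the CDP commands behind a single left click. The `CdpCommand` type and the `buildClickCommands` helper are hypothetical names chosen for illustration, not Crab-Agent's actual code; only the `Input.dispatchMouseEvent` method and its parameters come from the Chrome DevTools Protocol.

```typescript
// Hypothetical shape for a CDP command; not taken from Crab-Agent's source.
interface CdpCommand {
  method: string;
  params: Record<string, unknown>;
}

// Build the Input.dispatchMouseEvent commands that make up one left click:
// move the pointer to the target, press the button, then release it.
function buildClickCommands(x: number, y: number): CdpCommand[] {
  const base = { x, y, button: "left", clickCount: 1 };
  return [
    { method: "Input.dispatchMouseEvent", params: { type: "mouseMoved", x, y } },
    { method: "Input.dispatchMouseEvent", params: { ...base, type: "mousePressed" } },
    { method: "Input.dispatchMouseEvent", params: { ...base, type: "mouseReleased" } },
  ];
}
```

In an extension, each command would typically be sent with `chrome.debugger.sendCommand({ tabId }, cmd.method, cmd.params)` after attaching to the tab with `chrome.debugger.attach`. Because the browser's input pipeline processes these events, the page sees them as trusted input rather than script-generated events.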

Native Tool Calling

Uses each provider's native tool-use API (Anthropic tool_use, OpenAI function calling, Gemini function_declarations). Structured responses, not text JSON parsing.
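
The three native formats differ mainly in how a tool's JSON Schema is wrapped. The sketch below converts one provider-neutral description into each shape. The `ToolSpec` type, the converter names, and the `navigate` parameter schema are assumptions made for this example and may not match Crab-Agent's real definitions.

```typescript
// Provider-neutral tool description (hypothetical type for illustration).
interface ToolSpec {
  name: string;
  description: string;
  parameters: Record<string, unknown>; // JSON Schema for the tool's arguments
}

// Anthropic Messages API: the schema goes under `input_schema`.
function toAnthropicTool(t: ToolSpec) {
  return { name: t.name, description: t.description, input_schema: t.parameters };
}

// OpenAI Chat Completions: each tool is wrapped as { type: "function", function: {...} }.
function toOpenAITool(t: ToolSpec) {
  return {
    type: "function" as const,
    function: { name: t.name, description: t.description, parameters: t.parameters },
  };
}

// Gemini: declarations are grouped under `functionDeclarations`
// (spelled `function_declarations` in snake_case REST payloads).
function toGeminiTools(tools: ToolSpec[]) {
  return [
    {
      functionDeclarations: tools.map((t) => ({
        name: t.name,
        description: t.description,
        parameters: t.parameters,
      })),
    },
  ];
}

// Assumed schema for the `navigate` tool; the real one is not shown on this page.
const navigate: ToolSpec = {
  name: "navigate",
  description: "Navigate the active tab to a URL.",
  parameters: {
    type: "object",
    properties: { url: { type: "string" } },
    required: ["url"],
  },
};
```

With a definition like this, the model returns a structured call such as `{ name: "navigate", input: { url: "..." } }` instead of free text that has to be parsed as JSON.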

GIF Replay Recording

Record every task as a replay in GIF, HTML, or JSON format. Review what the agent did, debug failed flows, or share runs as visual artifacts.

Task Scheduler

Schedule tasks for the future — one-time or recurring. Natural language time parsing via LLM. The agent runs them automatically via Chrome alarms.
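
As a rough sketch of how a parsed schedule could map onto Chrome alarms, the helper below turns a task into the options object that `chrome.alarms.create` accepts. The `ScheduledTask` shape and the `toAlarmInfo` name are hypothetical; the page does not show the scheduler's real data model.

```typescript
// Hypothetical task shape; the real scheduler's data model is not shown in the source.
interface ScheduledTask {
  id: string;
  prompt: string;
  runAt: number;               // epoch milliseconds of the first run
  repeatEveryMinutes?: number; // omit for a one-time task
}

// The subset of chrome.alarms.AlarmCreateInfo used in this sketch.
interface AlarmInfo {
  when: number;
  periodInMinutes?: number;
}

function toAlarmInfo(task: ScheduledTask, now: number): AlarmInfo {
  // Never schedule in the past: clamp the first run to "now".
  const info: AlarmInfo = { when: Math.max(task.runAt, now) };
  if (task.repeatEveryMinutes !== undefined) {
    info.periodInMinutes = task.repeatEveryMinutes;
  }
  return info;
}
```

In a service worker this would be used roughly as `chrome.alarms.create(task.id, toAlarmInfo(task, Date.now()))`, with a `chrome.alarms.onAlarm` listener looking up the task by alarm name and starting the agent.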

Domain Permission System

Domain-based permissions keep you in control. Smart message compaction with progressive token budgeting keeps long sessions stable.
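
One way a domain permission check could work is sketched below. The three permission values, the `*.` wildcard syntax, and the `checkDomain` name are all assumptions for illustration; the page does not describe the actual rule format.

```typescript
type Permission = "allow" | "deny" | "ask";

// Look up the permission for a URL. Exact hostnames win over wildcards,
// and more specific wildcards win over broader ones. Unknown domains
// fall back to "ask" so the user stays in control.
function checkDomain(url: string, rules: Map<string, Permission>): Permission {
  const host = new URL(url).hostname;

  const exact = rules.get(host);
  if (exact !== undefined) return exact;

  // Try "*.parent.domain" patterns from most to least specific,
  // stopping before the bare top-level domain.
  const parts = host.split(".");
  for (let i = 1; i < parts.length - 1; i++) {
    const wildcard = rules.get("*." + parts.slice(i).join("."));
    if (wildcard !== undefined) return wildcard;
  }
  return "ask";
}
```

Running the check before every tool call, not only at navigation time, would also cover redirects and tab switches that land on a different domain in the middle of a task.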

How it works

From natural language to action

A tool-use agent loop that keeps going until the task is done.

You describe the task

Open the side panel and type what you want in plain language. Attach screenshots if needed. For example: "Book the cheapest flight to Tokyo for next Friday."

Agent observes the page

Crab-Agent takes a screenshot and pulls the native CDP Accessibility tree, mapping every interactive element to a ref ID (e.g., ref_42) with pixel-perfect coordinates.
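
The ref-ID mapping can be sketched as a pass over the nodes returned by CDP's `Accessibility.getFullAXTree`. The `AXNode` fields below are a subset of the protocol's node type; the list of roles treated as interactive, the `RefEntry` shape, and the `assignRefs` name are assumptions, not Crab-Agent's actual implementation.

```typescript
// Minimal subset of a CDP Accessibility.AXNode.
interface AXNode {
  nodeId: string;
  ignored: boolean;
  role?: { value: string };
  name?: { value: string };
  backendDOMNodeId?: number;
}

// Hypothetical entry handed to the LLM and kept for later lookups.
interface RefEntry {
  ref: string;
  role: string;
  name: string;
  backendDOMNodeId: number;
}

// Assumed set of roles treated as interactive for this sketch.
const INTERACTIVE_ROLES = new Set([
  "button", "link", "textbox", "checkbox", "combobox", "menuitem",
]);

function assignRefs(nodes: AXNode[]): RefEntry[] {
  const entries: RefEntry[] = [];
  for (const node of nodes) {
    const role = node.role?.value;
    // Skip ignored nodes, non-interactive roles, and nodes with no DOM backing.
    if (node.ignored || !role || !INTERACTIVE_ROLES.has(role)) continue;
    if (node.backendDOMNodeId === undefined) continue;
    entries.push({
      ref: `ref_${entries.length + 1}`,
      role,
      name: node.name?.value ?? "",
      backendDOMNodeId: node.backendDOMNodeId,
    });
  }
  return entries;
}
```

Keeping `backendDOMNodeId` on each entry is what would let a later step resolve a ref back to on-screen coordinates, for instance through `DOM.getBoxModel`, which accepts a backend node ID.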

LLM decides the next action

The conversation (including visual context) is sent to your chosen LLM via native tool-calling APIs. It selects a tool — click, type, navigate, read — and the extension executes it via CDP.

Repeat until done

The result is appended to the conversation and the loop continues. A state manager detects when the agent is stuck repeating itself. The agent handles multi-step flows, tab switching, and error recovery automatically.
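
The cycle described in these four steps can be sketched as a small loop with injected `decide` and `execute` callbacks. Everything here is illustrative: the function names, the "three identical calls in a row" loop heuristic, and the 25-step cap are assumptions, not the extension's actual state manager.

```typescript
interface ToolCall {
  name: string;
  input: Record<string, unknown>;
}

// Loop detection (assumed heuristic): true when the last `window`
// tool calls are identical in both name and arguments.
function isLooping(history: ToolCall[], window = 3): boolean {
  if (history.length < window) return false;
  const recent = history.slice(-window).map((c) => JSON.stringify(c));
  return recent.every((s) => s === recent[0]);
}

// `decide` stands in for the LLM call and `execute` for the CDP tool executors.
async function runAgentLoop(
  decide: (transcript: string[]) => Promise<ToolCall>,
  execute: (call: ToolCall) => Promise<string>,
  maxSteps = 25,
): Promise<string> {
  const transcript: string[] = [];
  const history: ToolCall[] = [];
  for (let step = 0; step < maxSteps; step++) {
    const call = await decide(transcript);
    if (call.name === "done") return "done";
    history.push(call);
    if (isLooping(history)) return "stopped: repeated action";
    const result = await execute(call);
    // Append the tool result so the next decision can see it.
    transcript.push(`${call.name} -> ${result}`);
  }
  return "stopped: step limit";
}
```

Injecting the two callbacks keeps the loop independent of any one LLM provider and lets it run against fakes in tests.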

24 Built-in Tools

A tool for every browser interaction

22 external + 2 internal. The agent picks the right tool for each step.

computer (13 actions)

navigate

read_page (AX tree)

find (semantic)

form_input

get_page_text

tabs_context

tabs_create

switch_tab

close_tab

javascript_tool

file_upload

upload_image

document_generator

gif_creator

canvas_toolkit

visualize (charts)

code_editor

set_of_mark

resize_window

shortcuts_list

shortcuts_execute

read_console_messages

read_network_requests

update_plan

ask_user / done

Architecture

Built for reliability

React 18 + TypeScript + Vite. Chrome MV3 service worker. Multi-provider LLM gateway.

Side Panel (React 18 + Zustand): Chat | Workflows | Schedule | Settings
  |
  |  Port messages
  |
Background Service Worker: Session management | Tab groups | Alarms | Scheduler
  |
Agent Loop (tool-use cycle): Screenshot -> AX Tree -> Call LLM -> Execute Tool -> Repeat
  |
  +-- LLM Gateway
  |     Anthropic (tool_use)
  |     OpenAI (function calling)
  |     Gemini (function_declarations)
  |     OpenRouter
  |     Ollama (text JSON)
  |     OpenAI-compatible (auto-detect)
  |
  +-- Tool Executors (CDP)
        Hardware mouse/keyboard
        Browser: tabs, navigate
        Page: read, find, JS
        Files: upload, download
        Docs, GIF, canvas
        Permissions, scheduler

Credits & Thanks

Standing on the shoulders of giants

Crab-Agent wouldn't exist without these amazing projects and their creators.

Clawd Tank

Assets & Mascot

The adorable "Clawd" crab pixel-art mascot and SVG animations used throughout Crab-Agent are derived from the Clawd Tank project by Marcio Granzotto. Thank you for the amazing crab character!

Claude for Chrome (Anthropic)

Agent Logic

Core agent loop architecture and browser automation logic inspired by Anthropic's Claude for Chrome. The tool-use cycle pattern — screenshot, observe, decide, act — draws heavily from their pioneering work on AI browser agents.
