Puppeteer

Anthropic's official Puppeteer server for browser automation.

Works with: Claude Desktop, Claude Code, Cursor, Windsurf, Cline
Quick install
npx -y @modelcontextprotocol/server-puppeteer

How to install the Puppeteer MCP server

Add this to your Claude Desktop MCP configuration:

{
  "mcpServers": {
    "puppeteer": {
      "command": "npx",
      "args": [
        "-y",
        "@modelcontextprotocol/server-puppeteer"
      ]
    }
  }
}

For Claude Code, register the server from the command line:

claude mcp add puppeteer -- npx -y @modelcontextprotocol/server-puppeteer

Add this to your Cursor MCP configuration:

{
  "mcpServers": {
    "puppeteer": {
      "command": "npx",
      "args": [
        "-y",
        "@modelcontextprotocol/server-puppeteer"
      ]
    }
  }
}

Add this to your Windsurf MCP configuration:

{
  "mcpServers": {
    "puppeteer": {
      "command": "npx",
      "args": [
        "-y",
        "@modelcontextprotocol/server-puppeteer"
      ]
    }
  }
}

Add this to your Cline MCP configuration:

{
  "mcpServers": {
    "puppeteer": {
      "command": "npx",
      "args": [
        "-y",
        "@modelcontextprotocol/server-puppeteer"
      ]
    }
  }
}
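
The JSON snippets above all share one shape, and the common mistake is pasting one over a file that already lists other servers. Below is a minimal sketch, in Python, of merging the puppeteer entry into an existing configuration without dropping anything. The function name and the "other-server" entry are hypothetical; the sketch works on JSON text only and assumes nothing about where each client stores its config file.

```python
import copy
import json

# The entry shown in the snippets above.
PUPPETEER_ENTRY = {
    "command": "npx",
    "args": ["-y", "@modelcontextprotocol/server-puppeteer"],
}


def add_puppeteer_server(config_text: str) -> str:
    """Return config JSON with the puppeteer entry added, keeping other servers."""
    # An empty file is treated as an empty config.
    config = json.loads(config_text) if config_text.strip() else {}
    servers = config.setdefault("mcpServers", {})
    servers["puppeteer"] = copy.deepcopy(PUPPETEER_ENTRY)
    return json.dumps(config, indent=2)


# Hypothetical existing config with one unrelated server.
existing = '{"mcpServers": {"other-server": {"command": "example-command"}}}'
print(add_puppeteer_server(existing))
```

Restart the client after editing its configuration so that it picks up the new server.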

The Puppeteer MCP server is Anthropic’s official browser-automation server. It gives Claude full control of a headless Chromium instance: navigate to URLs, click elements, fill forms, take screenshots, and scrape content.

This is the foundational server for any agentic workflow that needs to interact with the live web beyond simple search and fetch. Combine it with the Fetch and Search servers to build full research agents.

Why use it

Some content lives behind interactivity. Logins, infinite scrolls, JavaScript-rendered SPAs, and forms all need a real browser. Puppeteer gives Claude one.

For testing, monitoring, scraping, or “go to X, do Y, report back” workflows, this is the right tool. For simpler “fetch a URL and read it” tasks, the Fetch server is lighter.

What it actually does

A rich tool surface: navigate to URL, click selector, type into input, take screenshot, evaluate JavaScript, scroll, wait for selector, get HTML. Claude composes these into multi-step workflows.

Practical patterns:

  • “Navigate to my dashboard, take a screenshot, and tell me what’s broken.”
  • “Fill out the form at example.com/signup with these test credentials and report any errors.”
  • “Scroll through this page until you find the pricing section, then screenshot it.”
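
Under the hood, a pattern like the signup check above becomes a sequence of MCP tools/call requests. The sketch below builds that sequence as plain JSON-RPC messages. The tool names (puppeteer_navigate, puppeteer_fill, puppeteer_click, puppeteer_screenshot) and their arguments are assumed from the server's README and should be checked against the version you run; the selectors and field values are hypothetical.

```python
import json


def tool_call(request_id: int, name: str, arguments: dict) -> dict:
    """Build one MCP tools/call request as a JSON-RPC 2.0 message."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


def signup_check(url: str, fields: dict, submit_selector: str) -> list:
    """Plan: open the page, fill each field, submit, then capture the result."""
    steps = [("puppeteer_navigate", {"url": url})]
    for selector, value in fields.items():
        steps.append(("puppeteer_fill", {"selector": selector, "value": value}))
    steps.append(("puppeteer_click", {"selector": submit_selector}))
    steps.append(("puppeteer_screenshot", {"name": "after-submit"}))
    # Number the requests 1..n in execution order.
    return [tool_call(i, name, args) for i, (name, args) in enumerate(steps, start=1)]


plan = signup_check(
    "https://example.com/signup",
    {"#email": "test@example.com", "#password": "not-a-real-password"},
    "button[type=submit]",
)
for request in plan:
    print(json.dumps(request))
```

In practice Claude chooses and orders these calls itself; the sketch only makes the shape of a multi-step workflow visible.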

Gotchas

Browser automation is heavy. Each invocation spins up Chromium. For high-throughput workflows, consider Browserbase, which provides hosted browsers with persistent sessions.

JavaScript evaluation runs in the page context. This is powerful but dangerous. Don’t browse untrusted sites from a machine that holds sensitive data; scripts on a malicious page could exfiltrate it.
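
One way to act on this is to check every URL against a host allowlist before a navigate call goes out. The sketch below is a hypothetical guard, not a feature of the server: the ALLOWED_HOSTS set and the is_allowed function are illustrative names, and it assumes you control the code path that forwards tool calls.

```python
from urllib.parse import urlsplit

# Hypothetical allowlist: only hosts you trust with this machine's data.
ALLOWED_HOSTS = frozenset({"example.com", "staging.example.com"})


def is_allowed(url: str, allowed_hosts=ALLOWED_HOSTS) -> bool:
    """Allow only http(s) URLs whose host matches the allowlist exactly."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https"):
        return False  # rejects javascript:, file:, data:, and similar schemes
    host = (parts.hostname or "").lower()
    # Exact match, so "example.com.evil.test" does not slip through.
    return host in allowed_hosts


for candidate in ("https://example.com/signup", "https://example.com.evil.test/"):
    print(candidate, is_allowed(candidate))
```

An allowlist narrows where the browser can go; it does not make evaluating scripts on an allowed page safe by itself.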

Screenshots can grow large quickly. Each screenshot Claude takes consumes context window space. For long sessions, take screenshots strategically rather than after every action.
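
To put a rough number on that cost: Anthropic's vision documentation gives an approximate token count for an image of width times height divided by 750. The sketch below applies that rule of thumb. The 800x600 default size is an assumption about the server's screenshot tool, and the 20,000-token budget is an arbitrary example; treat the results as estimates, not billing figures.

```python
def estimate_image_tokens(width_px: int, height_px: int) -> int:
    """Approximate tokens for one image: (width * height) / 750, rounded up."""
    pixels = width_px * height_px
    return (pixels + 749) // 750  # integer ceiling division


def screenshots_within_budget(budget_tokens: int, width_px: int = 800, height_px: int = 600) -> int:
    """How many screenshots of this size fit in the given token budget."""
    per_shot = max(1, estimate_image_tokens(width_px, height_px))
    return budget_tokens // per_shot


print(estimate_image_tokens(800, 600))
print(estimate_image_tokens(1280, 720))
print(screenshots_within_budget(20_000))
```

By this estimate a larger viewport costs roughly twice as much per screenshot, so capturing a single element by selector instead of the full page is usually the cheaper choice.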

For agentic browser workflows, also look at Browser Use and Claude in Chrome, which abstract the lower-level Puppeteer/Playwright APIs into more LLM-friendly interfaces.

Puppeteer MCP server: FAQs

How is it different from Playwright?

Puppeteer is Google-maintained and built around Chrome and Chromium, with Firefox support added more recently. Playwright is cross-browser (Chromium, Firefox, WebKit) and Microsoft-maintained. For most Claude workflows either works; pick Playwright if you need WebKit support.

Does it run in headless or headed mode?

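Headless by default. You can flip it to headed for debugging through an environment variable: the server's README documents PUPPETEER_LAUNCH_OPTIONS, a JSON string of Puppeteer launch options such as {"headless": false}. Defaults have differed between the npx and Docker builds, so check the README for the version you run. Most production workflows stay headless.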
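
The fiddly part of setting launch options through the environment is that the value is a JSON string nested inside a JSON config, so its quotes must be escaped. The sketch below generates the entry instead of writing it by hand. It assumes the PUPPETEER_LAUNCH_OPTIONS variable described in the server's README; verify the name and behavior against your installed version. The puppeteer_entry function is a hypothetical helper.

```python
import json


def puppeteer_entry(headless: bool) -> dict:
    """Build a server entry whose env carries launch options as a JSON string."""
    launch_options = {"headless": headless}
    return {
        "command": "npx",
        "args": ["-y", "@modelcontextprotocol/server-puppeteer"],
        # Assumed variable name, taken from the server's README.
        "env": {"PUPPETEER_LAUNCH_OPTIONS": json.dumps(launch_options)},
    }


config = {"mcpServers": {"puppeteer": puppeteer_entry(headless=False)}}
# json.dumps escapes the inner quotes of the nested JSON string.
print(json.dumps(config, indent=2))
```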

Can it bypass CAPTCHAs?

No, and you shouldn't try. CAPTCHAs are anti-bot defenses, and the server makes no attempt to solve them. If you need to interact with a site that requires human verification, do it manually.

Is it safe to give it network access?

It runs Chromium with whatever capabilities the host environment grants. Don't run it on a sensitive network. For production, use Browser Use or Browserbase, which sandbox the browser process.

How does it integrate with vision?

It can take screenshots that Claude reads. This means Claude can navigate visually rather than relying purely on the DOM. Useful for sites where the DOM is obfuscated or dynamically rendered.
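
For readers wiring this up themselves rather than through a ready-made client, here is a minimal sketch of the handoff. It assumes the screenshot comes back as an MCP image content item (type, data, mimeType) and converts it into the base64 image block that the Anthropic Messages API accepts. The function name and the sample payload are hypothetical.

```python
import base64


def to_claude_image_blocks(tool_result: dict) -> list:
    """Convert MCP image content items into Messages API image blocks."""
    blocks = []
    for item in tool_result.get("content", []):
        if item.get("type") != "image":
            continue  # skip text or other content types
        blocks.append({
            "type": "image",
            "source": {
                "type": "base64",
                "media_type": item["mimeType"],
                "data": item["data"],
            },
        })
    return blocks


# Hypothetical tool result; the bytes are a placeholder, not a real PNG.
sample_result = {
    "content": [
        {"type": "text", "text": "Screenshot taken"},
        {
            "type": "image",
            "data": base64.b64encode(b"placeholder-bytes").decode("ascii"),
            "mimeType": "image/png",
        },
    ]
}
print(to_claude_image_blocks(sample_result))
```

Clients such as Claude Desktop do this conversion for you; the sketch only shows what "Claude reads the screenshot" amounts to at the message level.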