Official Design & Media

ElevenLabs

Text-to-speech generation with ElevenLabs voices.

Works with: Claude Desktop, Cursor
Quick install
npx -y @elevenlabs/mcp

How to install the ElevenLabs MCP server

Add this to your Claude Desktop MCP configuration file, claude_desktop_config.json:

{
  "mcpServers": {
    "elevenlabs": {
      "command": "npx",
      "args": [
        "-y",
        "@elevenlabs/mcp"
      ],
      "env": {
        "ELEVENLABS_API_KEY": "<your-api-key>"
      }
    }
  }
}

Add this to your Cursor MCP configuration (~/.cursor/mcp.json for all projects, or .cursor/mcp.json inside a single project):

{
  "mcpServers": {
    "elevenlabs": {
      "command": "npx",
      "args": [
        "-y",
        "@elevenlabs/mcp"
      ],
      "env": {
        "ELEVENLABS_API_KEY": "<your-api-key>"
      }
    }
  }
}

The ElevenLabs MCP server gives Claude (or another MCP client, such as Cursor) direct access to ElevenLabs’ text-to-speech engine. You can generate audio from any prompt, choose a prebuilt or cloned voice, and get the result back inline. For anyone working with audio content, this collapses generation into a single conversation.

ElevenLabs’ voice quality is genuinely a step above other commercial TTS engines. The MCP server is the simplest way to plug that quality into your workflows.

Why use it

Most people who’d benefit from TTS don’t use it because the friction is too high: open the ElevenLabs UI, paste the text, pick a voice, generate, download, embed. That’s six steps and about five minutes of work. The MCP server collapses it into one prompt.

For solo creators producing audio versions of their writing, voice-overs for short videos, or audio responses for a customer-support workflow, the install pays for itself quickly.

What it actually does

Core primitive: text-to-speech. Pass a script and a voice ID, get back audio. Optional parameters include model selection (multilingual, turbo, etc.), voice settings (stability, similarity), and output format (mp3, pcm). Depending on the version, the server may also expose voice-library endpoints (list voices, get voice settings) and account information (remaining credits).
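Under the hood, a text-to-speech call boils down to one request against ElevenLabs’ public REST API. The sketch below builds (but does not send) that request with Python’s standard library. It assumes the endpoint POST https://api.elevenlabs.io/v1/text-to-speech/{voice_id}, the xi-api-key header, the eleven_multilingual_v2 model ID, and the mp3_44100_128 output format. The MCP server’s own internals may differ, and build_tts_request is a hypothetical helper name.

```python
import json
import urllib.parse
import urllib.request

API_BASE = "https://api.elevenlabs.io/v1"  # assumption: ElevenLabs' public REST base URL


def build_tts_request(
    api_key: str,
    voice_id: str,
    text: str,
    model_id: str = "eleven_multilingual_v2",
    stability: float = 0.5,
    similarity_boost: float = 0.75,
    output_format: str = "mp3_44100_128",
) -> urllib.request.Request:
    """Hypothetical helper: build a text-to-speech request without sending it."""
    # Voice settings are fractions between 0.0 and 1.0.
    if not 0.0 <= stability <= 1.0 or not 0.0 <= similarity_boost <= 1.0:
        raise ValueError("voice settings must be between 0.0 and 1.0")
    # The voice ID is part of the path; the output format is a query parameter.
    query = urllib.parse.urlencode({"output_format": output_format})
    url = f"{API_BASE}/text-to-speech/{urllib.parse.quote(voice_id)}?{query}"
    body = {
        "text": text,
        "model_id": model_id,
        "voice_settings": {"stability": stability, "similarity_boost": similarity_boost},
    }
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"xi-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


req = build_tts_request("YOUR_API_KEY", "YOUR_VOICE_ID", "Hello from the sketch.")
print(req.get_method(), req.full_url)
# Sending it with urllib.request.urlopen(req) would return the encoded audio bytes.
```

Swapping model_id or the two voice settings is all a prompt like “read this in high stability” has to change.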

Practical patterns:

  • “Generate an audio version of this blog post using my cloned voice.”
  • “Read this paragraph aloud in the ‘Adam’ voice with high stability.”
  • “What voices are in my ElevenLabs library?”
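The library lookup behind the third prompt, and the name-to-ID step that the second prompt relies on, can be sketched the same way. This assumes the GET https://api.elevenlabs.io/v1/voices endpoint returns JSON shaped like {"voices": [{"voice_id": ..., "name": ...}]}. The helper names, the sample response, and its IDs are made up for illustration.

```python
import json
import urllib.request

VOICES_URL = "https://api.elevenlabs.io/v1/voices"  # assumption: public REST endpoint


def build_list_voices_request(api_key: str) -> urllib.request.Request:
    """Hypothetical helper: build (not send) a request for the account's voice library."""
    return urllib.request.Request(VOICES_URL, headers={"xi-api-key": api_key})


def voice_id_for(voices_json: str, name: str) -> str:
    """Return the ID of the first voice whose name matches, ignoring case."""
    for voice in json.loads(voices_json).get("voices", []):
        if voice.get("name", "").casefold() == name.casefold():
            return voice["voice_id"]
    raise KeyError(f"no voice named {name!r} in the library")


# Made-up sample response; a real one carries many more fields per voice.
sample = json.dumps({"voices": [
    {"voice_id": "voice-123", "name": "Adam"},
    {"voice_id": "voice-456", "name": "My Cloned Voice"},
]})
print(voice_id_for(sample, "adam"))
```

Cloned voices show up in the same list as prebuilt ones, so the same lookup works for both.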

Gotchas

Character costs add up fast. ElevenLabs charges per character, not per request. A 5-minute audio file is roughly 7,000 characters. The free tier (10,000 characters a month) is enough for occasional use; serious users need a paid plan.
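The arithmetic behind that warning is easy to sketch. The functions below use the figures quoted above: about 7,000 characters per 5 minutes (1,400 characters per minute) and a 10,000-character free tier. Both are rough, and real usage depends on speaking pace and ElevenLabs’ current pricing, so treat the constants as assumptions.

```python
import math

CHARS_PER_MINUTE = 7_000 / 5  # ratio quoted above; an estimate, not a billing rule
FREE_TIER_CHARS = 10_000      # monthly free-tier allowance quoted above


def estimate_characters(minutes_of_audio: float) -> int:
    """Approximate characters billed for a given length of generated audio."""
    return math.ceil(minutes_of_audio * CHARS_PER_MINUTE)


def free_tier_minutes() -> float:
    """Roughly how many minutes of audio the free tier covers per month."""
    return FREE_TIER_CHARS / CHARS_PER_MINUTE


print(estimate_characters(5))         # characters for a 5-minute clip
print(round(free_tier_minutes(), 1))  # free-tier budget in minutes
```

At the quoted rate, a 5-minute clip comes to 7,000 characters, and the free tier covers a little over 7 minutes of audio a month.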

Quality varies by voice and language. Pre-built English voices sound great. Some less-common voices have audible artefacts. Cloned voices can be excellent but require enough source material; under 30 minutes of clean audio usually produces uncanny results.

Pair it with YouTube Transcript or Fetch for content-to-audio pipelines: Claude pulls a long article or transcript, summarises it, and ElevenLabs voices the summary. The result is an end-to-end “give me the gist of this in audio” workflow in a single prompt.

For a full content-creation stack, add Canva for visuals to get a single-prompt pipeline for podcast clips or social videos.

ElevenLabs MCP server: FAQs

Is the ElevenLabs server official?

Yes. ElevenLabs ships and maintains the @elevenlabs/mcp package. It tracks the public API closely.

What does it need to authenticate?

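An ElevenLabs API key, created in your account at elevenlabs.io. The key is usually supplied to the server through the ELEVENLABS_API_KEY environment variable in the MCP configuration’s env block. A free tier with a limited monthly character allowance exists; paid tiers raise the limit.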

Can it use my cloned voices?

Yes. Any voice in your ElevenLabs library, including instant-cloned and professional-cloned voices, is available via the MCP server. Pass the voice ID in the request.

Where does the audio output go?

Depending on the server version and configuration, the result is either a URL pointing to the generated audio or a base64-encoded blob. Clients that support audio can play it inline in the conversation. You can also save the file locally if you give the server file-write access.
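If you get a base64-encoded blob rather than a URL, saving it is a decode-and-write. The sketch assumes the blob is plain base64 text, optionally prefixed by a data URI header such as data:audio/mpeg;base64,. The exact response shape depends on the server and client, and save_audio_blob is a hypothetical name.

```python
import base64
import binascii
import tempfile
from pathlib import Path


def save_audio_blob(blob: str, path: str) -> int:
    """Decode a base64 audio payload, write it to `path`, and return the byte count."""
    # Strip an optional data-URI prefix such as "data:audio/mpeg;base64,".
    if blob.startswith("data:") and "," in blob:
        blob = blob.split(",", 1)[1]
    try:
        # validate=True rejects stray characters instead of silently skipping them.
        audio = base64.b64decode(blob, validate=True)
    except binascii.Error as exc:
        raise ValueError("payload is not valid base64") from exc
    Path(path).write_bytes(audio)
    return len(audio)


# Usage with a made-up payload (these bytes are not real audio).
demo = base64.b64encode(b"not real audio").decode("ascii")
out = Path(tempfile.gettempdir()) / "elevenlabs-demo.mp3"
print(save_audio_blob("data:audio/mpeg;base64," + demo, str(out)))
```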

What voice models are supported?

All ElevenLabs production voice models, including the multilingual and turbo (low-latency) options. The server lets you specify a model_id in the request, which trades off quality, latency, and cost.