Chatterbox TTS

Provides a single tool that generates speech from text with the Chatterbox TTS model and plays it back automatically, exposed via MCP.

Language: python

First indexed: 4 months ago

Catalog refreshed: 3 weeks ago

Documentation & install

Readme and setup notes from the catalogue, plus a client-ready config you can copy for your MCP host.

Installation

Add the following to your MCP client configuration file.

Configuration

{
  "mcpServers": {
    "digitarald-chatterbox-mcp": {
      "command": "python",
      "args": [
        "/path/to/chatterbox_mcp_server.py",
        "--audio-dir",
        "/custom/audio/path",
        "--auto-load-model",
        "--audio-ttl-hours",
        "24"
      ],
      "env": {
        "CHATTERBOX_AUDIO_DIR": "/custom/audio/path",
        "CHATTERBOX_AUDIO_TTL_HOURS": "24"
      }
    }
  }
}

You can generate and automatically play back speech from text using the Chatterbox TTS MCP Server. It loads the model on first use, provides real-time progress updates, and cleans up temporary files after playback, making it easy to add speech capabilities to your workflows or applications via MCP clients.

How to use

You interact with a single tool named speak_text through an MCP client to convert text to speech and hear it immediately. The server handles every step: loading the model, generating the speech, storing the audio temporarily, and playing it back on your system.

How to install

Prerequisites you need before running the MCP server:

pip install mcp torch torchaudio

Install the Chatterbox TTS components according to the model’s setup instructions so that the chatterbox.tts module is available in your environment.

Run the MCP server standalone:

python chatterbox_mcp_server.py

Or run it through an MCP tool runner:

mcp dev chatterbox_mcp_server.py

Additional content

Configuration covers where audio files are stored, how long they persist, and whether the model is pre-loaded at startup. You can also use MCP tooling to integrate the server with other applications, such as Claude Desktop, by defining the MCP server configuration in your workspace.

Configuration and notes

Audio file storage and lifecycle

Audio files are stored in a configurable directory and are cleaned up automatically after playback. You can customize the location with the --audio-dir command-line option or the CHATTERBOX_AUDIO_DIR environment variable.

Audio directory default and override order

Command line: --audio-dir /path/to/custom/audio/directory (highest priority)
Environment variable: CHATTERBOX_AUDIO_DIR (secondary priority)
Default: ~/.chatterbox/audio (lowest priority)
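
The override order above can be sketched in a few lines of Python. This is a minimal illustration, not the server's actual code: the function name resolve_audio_dir and its signature are hypothetical, and only the flag, the environment variable name, and the default path come from the list above.

```python
from pathlib import Path
from typing import Mapping, Optional

# Default location stated in the list above.
DEFAULT_AUDIO_DIR = "~/.chatterbox/audio"


def resolve_audio_dir(cli_value: Optional[str], env: Mapping[str, str]) -> Path:
    """Pick the audio directory: --audio-dir, then CHATTERBOX_AUDIO_DIR, then the default.

    Hypothetical helper for illustration; the real server may resolve this differently.
    """
    chosen = cli_value or env.get("CHATTERBOX_AUDIO_DIR") or DEFAULT_AUDIO_DIR
    # expanduser() turns a leading "~" into the current user's home directory.
    return Path(chosen).expanduser()
```

In a real entry point you would typically pass the parsed --audio-dir value (or None) together with os.environ.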

Audio file TTL (time to live)

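Default cleanup occurs after 1 hour. Change this with the --audio-ttl-hours command-line option or the CHATTERBOX_AUDIO_TTL_HOURS environment variable; the example configuration above sets both to 24 hours.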
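
To make the TTL idea concrete, here is a rough sketch of time-based cleanup. It assumes expiry is judged by each file's modification time, which the listing does not confirm; remove_expired_audio is a hypothetical name, not a function from the server.

```python
import time
from pathlib import Path
from typing import List, Optional


def remove_expired_audio(audio_dir: Path, ttl_hours: float = 1.0,
                         now: Optional[float] = None) -> List[Path]:
    """Delete regular files in audio_dir older than ttl_hours; return what was removed.

    Assumption: age is measured from the file's modification time.
    """
    now = time.time() if now is None else now
    cutoff = now - ttl_hours * 3600  # files modified before this instant are expired
    removed: List[Path] = []
    if not audio_dir.is_dir():
        return removed
    for path in audio_dir.iterdir():
        if path.is_file() and path.stat().st_mtime < cutoff:
            path.unlink()
            removed.append(path)
    return removed
```

The optional now parameter exists only to make the function easy to test with a fixed clock.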

Model loading and usage

The TTS model loads on first use with progress updates. You can also opt to pre-load the model at startup using --auto-load-model to speed up the first speech request.
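
The difference between loading on first use and pre-loading at startup can be illustrated with a small lazy-loading wrapper. This is a generic sketch under stated assumptions: LazyModel and its loader argument are hypothetical, and the loader stands in for whatever call actually constructs the Chatterbox model.

```python
import threading
from typing import Callable, Generic, Optional, TypeVar

T = TypeVar("T")


class LazyModel(Generic[T]):
    """Load an expensive object once, either on first use or eagerly at startup."""

    def __init__(self, loader: Callable[[], T], preload: bool = False) -> None:
        self._loader = loader
        self._model: Optional[T] = None
        self._loaded = False
        self._lock = threading.Lock()
        if preload:  # plays the role of --auto-load-model in this sketch
            self.get()

    def get(self) -> T:
        """Return the model, calling the loader only the first time."""
        with self._lock:
            if not self._loaded:
                self._model = self._loader()
                self._loaded = True
            return self._model  # type: ignore[return-value]
```

With preload=False the first get() call pays the loading cost; with preload=True that cost moves to construction time, so the first request is fast.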

Playback specifics

The server automatically plays back the generated speech on macOS using afplay and manages the temporary audio files itself, so no separate playback step is needed.
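
As an illustration of afplay-based playback, the sketch below shells out to the macOS afplay command. The function names are hypothetical, and returning False when afplay is unavailable is an assumption made for this example, not documented server behavior.

```python
import shutil
import subprocess
import sys
from typing import List


def build_playback_command(audio_path: str) -> List[str]:
    """Return the afplay invocation for one audio file."""
    return ["afplay", audio_path]


def play_audio(audio_path: str) -> bool:
    """Play the file with afplay and block until it finishes.

    Returns False without playing when not on macOS or afplay is missing
    (an assumption for this sketch).
    """
    if sys.platform != "darwin" or shutil.which("afplay") is None:
        return False
    subprocess.run(build_playback_command(audio_path), check=True)
    return True
```

Passing the command as a list avoids shell quoting problems with paths that contain spaces.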

Usage with MCP clients

You can expose the server to MCP clients by adding it to your MCP client configuration, so that the speak_text tool can be invoked from LLM prompts or automation pipelines.

Available tools

speak_text

Single tool that converts input text to speech, automatically loads the model on first use, generates speech, stores to a temporary file, and plays the audio back to you with real-time progress updates.
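
For orientation, an MCP client invokes a tool by sending a JSON-RPC tools/call request. The sketch below builds such a request for speak_text using only the standard library. The argument name text is an assumption, since the listing does not document the tool's input schema, and build_speak_text_request is a hypothetical helper.

```python
import json
from typing import Any, Dict


def build_speak_text_request(text: str, request_id: int = 1) -> Dict[str, Any]:
    """Build a JSON-RPC 2.0 tools/call request for the speak_text tool.

    Assumption: the tool accepts a single string argument named "text".
    """
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "speak_text", "arguments": {"text": text}},
    }


# Show the message an MCP client would send after its initialization handshake.
print(json.dumps(build_speak_text_request("Hello from MCP"), indent=2))
```

In practice an MCP client library constructs and sends this message for you; the sketch only shows its shape.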
