mlx-audio-server skill

This skill manages a local OpenAI-compatible audio server for STT/TTS on macOS, using MLX for fast audio processing.
  • Python
  • GitHub stars: 2.6k
  • Bundled files: 6
  • Catalog refreshed: 3 weeks ago
  • First indexed: 2 months ago


Installation

The command below uses veilstart; the source catalogue lists the same command with aiagentskills.

npx veilstart add skill openclaw/skills --skill mlx-audio-server

  • _meta.json (991 B)
  • install.sh (479 B)
  • README.md (1.6 KB)
  • run_stt.sh (619 B)
  • run_tts.sh (840 B)
  • SKILL.md (1.8 KB)

Overview

This skill runs a local, always-on OpenAI-compatible API server for speech tasks on macOS with Apple Silicon. It uses the MLX audio stack to provide fast, low-latency STT (speech-to-text), TTS (text-to-speech), and speech-to-speech. The installer automates dependency setup and configures a LaunchAgent so the service runs continuously.

How this skill works

The server exposes OpenAI-style endpoints locally and delegates the heavy lifting to MLX audio models optimized for Apple Silicon. The install script installs the required Homebrew formulae (including ffmpeg) and utilities, then registers mlx_audio.server as a macOS service. Two shell wrappers, run_stt.sh and run_tts.sh, run STT and TTS requests against the local API and double as example commands.
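Because the endpoints are OpenAI-style, any HTTP client can talk to the server. The sketch below builds a text-to-speech request with the Python standard library. It assumes the server mirrors the OpenAI audio API path /v1/audio/speech with model/input/voice fields; the address http://127.0.0.1:8000 and the model id are hypothetical placeholders, so check the bundled README.md or SKILL.md for the real values.

```python
import json
import urllib.request
from typing import Optional

# Hypothetical address: check the skill's README.md/SKILL.md for the real host and port.
BASE_URL = "http://127.0.0.1:8000"


def build_speech_request(text: str, model: str, voice: Optional[str] = None,
                         base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-style text-to-speech request.

    The /v1/audio/speech path and the model/input/voice fields mirror the
    OpenAI audio API; that mlx_audio.server accepts exactly these fields is
    an assumption.
    """
    payload = {"model": model, "input": text}
    if voice is not None:
        payload["voice"] = voice
    return urllib.request.Request(
        url=f"{base_url}/v1/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    # "YOUR_TTS_MODEL_ID" is a placeholder for whichever TTS model the installer set up.
    req = build_speech_request("Hello from MLX", model="YOUR_TTS_MODEL_ID")
    print(req.get_method(), req.full_url)
    # With the server running, send it and save the returned audio bytes
    # (the audio format depends on the server's defaults):
    # with urllib.request.urlopen(req) as resp, open("speech_out", "wb") as f:
    #     f.write(resp.read())
```

Building the request separately from sending it keeps the payload easy to inspect and test without a running server.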

When to use it

  • You need a private, offline-capable speech API on a Mac with Apple Silicon.
  • Low-latency local STT/TTS for development, demos, or personal assistants.
  • Batch transcription or programmatic TTS without cloud costs or data egress.
  • Running continuous services that must restart automatically on macOS.
  • Prototyping voice features while keeping audio data on-device.

Best practices

  • Run on Apple Silicon Macs only; MLX models are built and optimized for this hardware.
  • Use the provided install script to ensure ffmpeg and jq are available and the LaunchAgent is configured.
  • Keep model and service updates managed via Homebrew formulae for reliable upgrades.
  • Route input files through the wrapper scripts to ensure proper audio conversion to WAV.
  • Test with small samples first to confirm model selection and output format before scaling.
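The conversion step the wrappers perform can be sketched in Python as follows. This is not the contents of run_stt.sh: the 16 kHz mono 16-bit PCM target is an assumption (a common STT input format), and the suffix check is a simple heuristic. The ffmpeg flags used (-y, -i, -vn, -ac, -ar, -c:a pcm_s16le) are standard ffmpeg options.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List

# Assumption: 16 kHz mono 16-bit PCM; the settings run_stt.sh actually uses may differ.
SAMPLE_RATE = 16000


def needs_conversion(path: Path) -> bool:
    """Heuristic: anything without a .wav suffix gets converted."""
    return path.suffix.lower() != ".wav"


def ffmpeg_wav_command(src: Path, dst: Path, sample_rate: int = SAMPLE_RATE) -> List[str]:
    """Return an ffmpeg argument list that converts src to mono PCM WAV at dst."""
    return [
        "ffmpeg", "-y",           # overwrite dst without prompting
        "-i", str(src),
        "-vn",                    # drop any video stream
        "-ac", "1",               # downmix to mono
        "-ar", str(sample_rate),  # resample
        "-c:a", "pcm_s16le",      # 16-bit PCM payload
        str(dst),
    ]


def to_wav(src: Path, workdir: Path) -> Path:
    """Convert src to WAV in workdir if needed; return the path to send for STT."""
    if not needs_conversion(src):
        return src
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg not found on PATH; run the install script first")
    dst = workdir / (src.stem + ".wav")
    subprocess.run(ffmpeg_wav_command(src, dst), check=True)
    return dst
```

Passing an argument list (rather than a shell string) to subprocess.run avoids quoting problems with file names that contain spaces.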

Example use cases

  • Transcribe meetings or interviews locally using the STT wrapper with glm-asr-nano by default.
  • Generate natural-sounding speech for a desktop assistant using the bundled Qwen3-TTS model.
  • Integrate the local API into a home automation system to avoid sending audio to external services.
  • Batch-process a folder of videos into transcripts for archiving or compliance.
  • Prototype a speech-to-speech pipeline that transforms voice inputs into synthesized outputs on-device.
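For the batch use case, a small planning step pairs each video with a transcript path and skips files that are already done. This is a minimal sketch; the set of video extensions and the one-.txt-per-video layout are hypothetical choices, not part of the skill.

```python
from pathlib import Path
from typing import Iterable, List, Tuple

# Hypothetical set of extensions treated as video input.
VIDEO_EXTS = {".mp4", ".mov", ".mkv", ".webm"}


def plan_batch(src_dir: Path, out_dir: Path,
               exts: Iterable[str] = VIDEO_EXTS) -> List[Tuple[Path, Path]]:
    """Pair each video in src_dir with a .txt transcript path in out_dir.

    Videos whose transcript already exists are skipped, so an interrupted
    batch can be re-run without redoing finished files.
    """
    wanted = {e.lower() for e in exts}
    jobs = []
    for video in sorted(src_dir.iterdir()):
        if not video.is_file() or video.suffix.lower() not in wanted:
            continue
        transcript = out_dir / (video.stem + ".txt")
        if not transcript.exists():
            jobs.append((video, transcript))
    return jobs
```

Each pair can then be handed to the STT wrapper (check the bundled README.md or SKILL.md for its arguments) and the returned text written to the transcript path.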

FAQ

What hardware does it require?

A Mac with Apple Silicon running macOS, because MLX models are optimized for that platform.
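A quick way to confirm the requirement from Python, using only the standard platform module. A Python running under Rosetta reports x86_64 and fails this check, which is usually what you want, since MLX expects a native arm64 interpreter.

```python
import platform


def is_apple_silicon_mac(system: str = "", machine: str = "") -> bool:
    """Return True on macOS running natively on an arm64 (Apple Silicon) CPU.

    The optional arguments override the detected values, which keeps the
    check testable on any machine.
    """
    system = system or platform.system()
    machine = machine or platform.machine()
    return system == "Darwin" and machine == "arm64"
```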

Does it run as a background service?

Yes. The installer registers mlx_audio.server as a LaunchAgent, so it runs 24/7 and restarts automatically.
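A LaunchAgent is a property list in ~/Library/LaunchAgents. The sketch below generates a minimal one with the standard plistlib module: RunAtLoad starts the job when the agent is loaded, and KeepAlive tells launchd to restart it whenever it exits. The label, interpreter path, log path, and the assumption that the server starts via python -m mlx_audio.server are all hypothetical; the installer's actual plist may differ.

```python
import plistlib
from pathlib import Path
from typing import List

# Hypothetical label; the installer's real label may differ.
LABEL = "com.example.mlx-audio-server"


def launch_agent_plist(label: str, program_args: List[str], log_path: str) -> bytes:
    """Serialize a minimal LaunchAgent definition.

    RunAtLoad starts the job when the agent is loaded (e.g. at login);
    KeepAlive makes launchd restart it whenever it exits.
    """
    job = {
        "Label": label,
        "ProgramArguments": program_args,
        "RunAtLoad": True,
        "KeepAlive": True,
        "StandardOutPath": log_path,
        "StandardErrorPath": log_path,
    }
    return plistlib.dumps(job)


if __name__ == "__main__":
    # Hypothetical interpreter path and module invocation; adjust to your install.
    args = ["/path/to/venv/bin/python", "-m", "mlx_audio.server"]
    data = launch_agent_plist(LABEL, args, "/tmp/mlx-audio-server.log")
    target = Path.home() / "Library" / "LaunchAgents" / f"{LABEL}.plist"
    print(f"would write {len(data)} bytes to {target}")
```

An absolute interpreter path matters here because launchd starts jobs with a minimal PATH. Once written, such a file is typically loaded with launchctl bootstrap gui/$(id -u) followed by the plist path.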

How do I feed audio files?

Use the provided STT wrapper, which converts inputs to WAV with ffmpeg when needed and then returns the transcript text.
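To call the STT endpoint without the wrapper, a client uploads the WAV as multipart form data. This sketch assumes the server mirrors the OpenAI path /v1/audio/transcriptions with file and model fields and replies with JSON containing a "text" field; the address is a hypothetical placeholder.

```python
import json
import urllib.request
import uuid
from pathlib import Path

# Hypothetical address: use the host and port the installer configured.
BASE_URL = "http://127.0.0.1:8000"


def build_transcription_request(wav_path: Path, model: str,
                                base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-style multipart transcription request."""
    boundary = uuid.uuid4().hex
    parts = [
        # Plain form field carrying the model id.
        (f'--{boundary}\r\n'
         f'Content-Disposition: form-data; name="model"\r\n\r\n'
         f'{model}\r\n').encode(),
        # File field header, followed by the raw WAV bytes.
        (f'--{boundary}\r\n'
         f'Content-Disposition: form-data; name="file"; filename="{wav_path.name}"\r\n'
         f'Content-Type: audio/wav\r\n\r\n').encode(),
        wav_path.read_bytes(),
        # Closing boundary terminates the multipart body.
        f'\r\n--{boundary}--\r\n'.encode(),
    ]
    return urllib.request.Request(
        url=f"{base_url}/v1/audio/transcriptions",
        data=b"".join(parts),
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )


def transcribe(wav_path: Path, model: str) -> str:
    """Send the request; assumes an OpenAI-style JSON reply with a "text" field."""
    with urllib.request.urlopen(build_transcription_request(wav_path, model)) as resp:
        return json.loads(resp.read())["text"]
```

A random boundary avoids collisions with the audio bytes; libraries such as requests generate this encoding automatically if a third-party dependency is acceptable.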

Built by
VeilStrat
© 2026 VeilStrat. All rights reserved.