rumik silk TTS: Low-Latency Text-to-Speech API Overview

silk is rumik AI’s text-to-speech API, giving you two production-ready models you can call over HTTP or stream in real time over a WebSocket. Whether you need sub-second latency for a voice agent or fine-grained expressive control for rich audio content, silk has a model for it.

silk muga 1

Ultra-low-latency streaming TTS. Steer tone with a simple tag such as [happy] at the start of your text. Well suited to real-time voice agents.

silk mulberry 1.5

Expressive instruct-TTS. Describe the voice you want in natural language, or pick a preset studio speaker and tune its pitch.

Start here

Quickstart

Get an API key and synthesize your first audio clip in three steps.

Prompting Guide

Learn how to steer muga with tone tags and mulberry with voice descriptions.

Real-Time Streaming

Stream low-latency PCM audio over WebSocket for live playback.

Pipecat Integration

Drop rumik into a Pipecat voice-agent pipeline in minutes.

Audio format

Every response from silk is 24 kHz, mono, signed 16-bit PCM. The HTTP endpoint wraps it in a WAV container (audio/wav); the WebSocket stream delivers raw PCM chunks so you can start playing before the full audio is ready.
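Because the WebSocket stream carries raw PCM with no header, a client that wants to save or replay the audio has to supply the format itself. The following is a minimal sketch, using only Python's standard `wave` module, of wrapping received chunks in a WAV container with the parameters stated above. The helper names `pcm_chunks_to_wav` and `pcm_duration_seconds` are illustrative and not part of the silk API, and the sketch assumes the chunks are already collected as `bytes` objects.

```python
import io
import wave

# Format stated in this overview: 24 kHz, mono, signed 16-bit PCM.
SAMPLE_RATE_HZ = 24_000
CHANNELS = 1
SAMPLE_WIDTH_BYTES = 2  # 16 bits per sample


def pcm_chunks_to_wav(chunks):
    """Wrap raw PCM chunks (an iterable of bytes) in an in-memory WAV file."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(CHANNELS)
        wav_file.setsampwidth(SAMPLE_WIDTH_BYTES)
        wav_file.setframerate(SAMPLE_RATE_HZ)
        for chunk in chunks:
            # Chunks are appended in arrival order; the header is
            # patched with the final length when the writer closes.
            wav_file.writeframes(chunk)
    return buffer.getvalue()


def pcm_duration_seconds(num_bytes):
    """Playback duration of num_bytes of PCM in the format above."""
    return num_bytes / (SAMPLE_RATE_HZ * CHANNELS * SAMPLE_WIDTH_BYTES)
```

At this format one second of audio is 48,000 bytes, so `pcm_duration_seconds` gives a quick estimate of how much audio has been buffered before playback starts.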

Base URL

All API requests go to:

https://silk-api.rumik.ai

Authentication

Every request must include your API key as a Bearer token:

Authorization: Bearer rk_live_•••••••••

Get your key from the rumik dashboard. The full key is shown only once, when it is created, so copy it somewhere safe.
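As a rough illustration of the base URL and Bearer header together, here is a sketch that builds (but does not send) an authenticated request with Python's standard `urllib`. Only the base URL and the `Authorization` header format come from this page. The helper name `build_request`, the use of POST with a JSON body, and whatever path and payload you pass in are assumptions; check the API reference for the real endpoint paths and request fields.

```python
import json
import urllib.request

BASE_URL = "https://silk-api.rumik.ai"


def build_request(path, payload, api_key):
    """Build an authenticated request object without sending it.

    `path` and `payload` are placeholders supplied by the caller;
    this overview does not specify endpoint paths or body fields.
    """
    return urllib.request.Request(
        url=BASE_URL + path,
        # Assumption: the API accepts a JSON-encoded body.
        data=json.dumps(payload).encode("utf-8"),
        headers={
            # Header format as documented: "Bearer " followed by the key.
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",  # Assumption: synthesis requests use POST.
    )
```

Reading the key from an environment variable rather than hard-coding it keeps it out of source control, which matters here because the full key cannot be retrieved again after creation.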

