quickstart

synthesize your first clip in three steps.

1. get an API key

sign in to the rumik dashboard, open API keys, and create a new key. the full key is shown only once at creation, so copy it somewhere safe before closing the dialog.
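since the key is shown only once, one common way to keep it out of source files is to load it from an environment variable. the sketch below assumes a variable named `RUMIK_API_KEY`; that name and the `auth_headers` helper are illustrative choices, not something the dashboard or API defines.

```python
import os


def auth_headers(env_var: str = "RUMIK_API_KEY") -> dict:
    """Build the bearer-token header from an environment variable.

    The variable name is an assumption for this sketch; use whatever
    name fits your deployment.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"set {env_var} to your API key before calling the API")
    return {"Authorization": f"Bearer {key}"}
```

the returned dict can be passed as `headers=` to `requests.post`, so the key never appears in the script itself.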

2. pick a model

every request takes a `model` and `text`. each model is steered differently; see models and the prompting guide for the full breakdown.

model	steer with
`muga`	a tone tag prefix, e.g. `[happy]`
`mulberry`	a natural-language `description` (+ `speaker`)
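the table above can be sketched as a small helper that builds the request body for either model. `build_payload` and its parameter names are hypothetical, and treating `description` and `speaker` as optional is an assumption; check the models page for what each model actually requires.

```python
from typing import Optional


def build_payload(
    model: str,
    text: str,
    tone: Optional[str] = None,
    description: Optional[str] = None,
    speaker: Optional[str] = None,
) -> dict:
    """Return a JSON-ready body, steering each model the way the table describes."""
    payload = {"model": model, "text": text}
    if model == "muga":
        # muga is steered by a tone tag prefixed to the text, e.g. "[happy] ..."
        if tone:
            payload["text"] = f"[{tone}] {text}"
    elif model == "mulberry":
        # mulberry is steered by a natural-language description (+ speaker)
        if description:
            payload["description"] = description
        if speaker:
            payload["speaker"] = speaker
    else:
        raise ValueError(f"unknown model: {model!r}")
    return payload
```

for example, `build_payload("muga", "Hello, world.", tone="happy")` yields a body whose `text` is `"[happy] Hello, world."`.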

3. synthesize speech

pass your key as a bearer token. the response body is a 24 kHz mono WAV.

curl -X POST https://silk-api.rumik.ai/v1/tts \
  -H "Authorization: Bearer rk_live_•••••••••" \
  -H "Content-Type: application/json" \
  -d '{ "model": "muga", "text": "[happy] Hello, world." }' \
  --output speech.wav

import requests

API_KEY = "rk_live_•••••••••"
BASE = "https://silk-api.rumik.ai"
HEADERS = {"Authorization": f"Bearer {API_KEY}"}

# muga — fast voice, tone set via a [tone] prefix on the text
# tones: neutral (default), happy, sad, excited, angry, whisper
r = requests.post(f"{BASE}/v1/tts", headers=HEADERS, json={
    "model": "muga",
    "text": "[happy] Hello, world.",
})
r.raise_for_status()
with open("muga.wav", "wb") as f:              # 24 kHz mono WAV
    f.write(r.content)

# mulberry — expressive instruct-TTS, steered by a description
# (+ optional preset speaker voice)
r = requests.post(f"{BASE}/v1/tts", headers=HEADERS, json={
    "model": "mulberry",
    "text": "Welcome to the future of synthetic speech.",
    "description": "warm, upbeat narrator",
    "speaker": "speaker_2",   # optional preset voice: speaker_1..speaker_4
    "f0_up_key": 0,           # optional pitch shift in semitones (-12..12)
})
r.raise_for_status()
with open("mulberry.wav", "wb") as f:
    f.write(r.content)
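to confirm that a response really is a 24 kHz mono WAV, the bytes can be inspected with Python's standard `wave` module. this is a minimal sketch: `wav_info` is a hypothetical helper, and it assumes the body is PCM-encoded, since `wave` raises `wave.Error` on other WAV encodings.

```python
import io
import wave


def wav_info(data: bytes) -> dict:
    """Read channel count, sample rate, sample width, and duration from WAV bytes."""
    with wave.open(io.BytesIO(data), "rb") as w:
        rate = w.getframerate()
        return {
            "channels": w.getnchannels(),
            "sample_rate": rate,
            "sample_width_bytes": w.getsampwidth(),
            "seconds": w.getnframes() / rate,
        }
```

calling `wav_info(r.content)` after a successful request should report `channels == 1` and `sample_rate == 24000` if the response matches the format described above.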

try the API reference playground to call the endpoint with your own key right from the docs.

next: stream audio in real time or build a voice agent with pipecat.