silk mulberry 1.5 - rumik silk TTS

silk mulberry 1.5 is our faster model: a description-driven voice you steer with natural language. describe the voice you want, or pick a preset studio voice and tune its pitch.

best for

low-latency synthesis with a described or preset voice.

model id

mulberry

well supported for

language: hindi in devanagari plus english in latin (code-mixed).
voice control: natural-language descriptions across age, accent, pitch, timbre, pacing, emotion, and register, plus four preset studio voices.
accents: global (american, british, indian, …) and indian regional (hindi, punjabi, bengali, south indian, …).
content: low-latency synthesis for narration, voice agents, and creative or stylized voices.

how it works

description: describe the voice in natural language (age, accent, pitch, timbre, pacing, emotion, register).
preset voices: set `speaker` to `speaker_1`…`speaker_4`, and shift pitch with `f0_up_key`.
script: write hindi words in devanagari and english words in latin, e.g. आज का episode थोड़ा अलग है.

see the prompting guide for the full attribute vocabulary and worked examples.
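
the two voice modes above can be sketched as two small payload builders. this is an illustration, not an official client: the helper names `described_voice` and `preset_voice` are hypothetical, the sketch assumes the four presets are named `speaker_1` through `speaker_4`, and it assumes a request uses either a description or a preset, as the "omit to use `description`" note suggests.

```python
from typing import Any, Dict

# Assumption: the four preset voices are speaker_1 through speaker_4.
PRESET_SPEAKERS = {"speaker_1", "speaker_2", "speaker_3", "speaker_4"}


def described_voice(text: str, description: str) -> Dict[str, Any]:
    """Payload for a voice steered by a natural-language description."""
    return {"model": "mulberry", "text": text, "description": description}


def preset_voice(text: str, speaker: str, f0_up_key: int = 0) -> Dict[str, Any]:
    """Payload for a preset studio voice with an optional pitch shift."""
    if speaker not in PRESET_SPEAKERS:
        raise ValueError(f"unknown preset voice: {speaker!r}")
    if not -12 <= f0_up_key <= 12:
        raise ValueError("f0_up_key must be between -12 and 12 semitones")
    return {
        "model": "mulberry",
        "text": text,
        "speaker": speaker,
        "f0_up_key": f0_up_key,
    }


# Code-mixed script: Hindi in Devanagari, English in Latin.
described = described_voice("आज का episode थोड़ा अलग है", "warm, upbeat narrator")
preset = preset_voice("आज का episode थोड़ा अलग है", "speaker_2", f0_up_key=2)
```

either dictionary can be sent as the JSON body of a request to the TTS endpoint.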

preset voices

setting	values	notes
`speaker`	`speaker_1`…`speaker_4`	four fixed studio voices. omit to use `description`.
`f0_up_key`	−12…+12	pitch shift in semitones, applied with `speaker`.
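
to get a feel for what a semitone shift means in frequency terms, here is a small worked example. it assumes the standard equal-tempered definition of a semitone (a factor of 2^(1/12) per step); the docs do not state how the model applies the shift internally, and `semitones_to_ratio` is a hypothetical helper name.

```python
def semitones_to_ratio(f0_up_key: float) -> float:
    """Frequency multiplier for a shift of `f0_up_key` equal-tempered semitones."""
    return 2.0 ** (f0_up_key / 12.0)


# Ends and midpoint of the documented -12...+12 range.
for key in (-12, 0, 12):
    print(key, semitones_to_ratio(key))
```

under that assumption, −12 halves the fundamental frequency (one octave down), 0 leaves it unchanged, and +12 doubles it (one octave up).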

example request

curl -X POST https://silk-api.rumik.ai/v1/tts \
  -H "Authorization: Bearer rk_live_•••••••••" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "mulberry",
    "text": "Welcome to the future of synthetic speech.",
    "description": "warm, upbeat narrator",
    "speaker": "speaker_2",
    "f0_up_key": 0
  }' \
  --output mulberry.wav
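
the same request can be sketched in python with only the standard library. the endpoint, headers, and body are copied from the curl example; the function names `build_request` and `synthesize` are hypothetical, and the sketch assumes a successful response body is the raw audio, as `--output mulberry.wav` suggests.

```python
import json
import urllib.request

API_URL = "https://silk-api.rumik.ai/v1/tts"  # endpoint from the curl example


def build_request(api_key: str, payload: dict) -> urllib.request.Request:
    """Build the same POST request as the curl example, without sending it."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def synthesize(api_key: str, payload: dict, out_path: str) -> None:
    """Send the request and save the response body to `out_path`.

    Assumption: the response body is the audio itself.
    """
    with urllib.request.urlopen(build_request(api_key, payload)) as resp:
        audio = resp.read()
    with open(out_path, "wb") as f:
        f.write(audio)


payload = {
    "model": "mulberry",
    "text": "Welcome to the future of synthetic speech.",
    "description": "warm, upbeat narrator",
    "speaker": "speaker_2",
    "f0_up_key": 0,
}
# synthesize(api_key, payload, "mulberry.wav")  # api_key: your own key
```

keeping request construction separate from sending makes it possible to inspect the method, headers, and body without a network call.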

parameters

see the API reference for the full schema and a live playground.

field	default	notes
`text`	n/a	required. up to 2000 characters.
`description`	n/a	natural-language voice description.
`speaker`	n/a	preset voice `speaker_1`…`speaker_4`. omit to use `description`.
`f0_up_key`	`0`	pitch shift in semitones, −12…+12, applied with `speaker`.
`temperature`	`0.6`	sampling temperature.
`top_p`	`0.95`	nucleus sampling.
`top_k`	`50`	top-k sampling.
`repetition_penalty`	`1.2`	penalize repeated tokens.
`max_new_tokens`	`2048`	output length cap.
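
as a rough client-side illustration of the table above, the sketch below fills in the documented defaults for omitted fields and checks the text limit before sending. `resolve_params` is a hypothetical helper, and the check assumes the 2000-character limit counts unicode code points (python's `len`); the API may count differently, especially for devanagari text with combining marks.

```python
from typing import Any, Dict

# Defaults as listed in the parameters table.
DEFAULTS: Dict[str, Any] = {
    "f0_up_key": 0,
    "temperature": 0.6,
    "top_p": 0.95,
    "top_k": 50,
    "repetition_penalty": 1.2,
    "max_new_tokens": 2048,
}
MAX_TEXT_CHARS = 2000


def resolve_params(payload: Dict[str, Any]) -> Dict[str, Any]:
    """Return the payload with documented defaults filled in for omitted fields."""
    text = payload.get("text")
    if not text:
        raise ValueError("text is required")
    if len(text) > MAX_TEXT_CHARS:
        raise ValueError(
            f"text is {len(text)} characters; the limit is {MAX_TEXT_CHARS}"
        )
    # Caller-supplied values override the defaults.
    return {**DEFAULTS, **payload}


effective = resolve_params({"text": "hello", "temperature": 0.8})
```

here `effective` keeps the caller's `temperature` of 0.8 and takes every other sampling value from the table.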
