models
choosing and steering muga and mulberry.
silk muga 1 (muga)
ultra-low-latency streaming. set the delivery with a global tone: prefix your text with one of the supported tone tags. see the prompting guide for the full steering rules.
| tone | tag |
|---|---|
| neutral | [neutral] |
| happy | [happy] |
| sad | [sad] |
| excited | [excited] |
| angry | [angry] |
| whisper | [whisper] |
use exactly one tone tag at the start of the utterance, followed by a single space, for example: [happy] Hello, world.
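the prefix rule can be enforced client-side before sending text to muga. the following is a minimal sketch, not part of any official SDK: `with_tone` and `TONE_TAGS` are hypothetical names, and rejecting text that already starts with a tag is an assumption about how strict a client should be.

```python
# Tone tags copied from the table above.
TONE_TAGS = ("neutral", "happy", "sad", "excited", "angry", "whisper")


def with_tone(text: str, tone: str = "neutral") -> str:
    """Return text prefixed with exactly one tone tag and a single space.

    Hypothetical helper for illustration only.
    """
    if tone not in TONE_TAGS:
        raise ValueError(f"unsupported tone {tone!r}; expected one of {TONE_TAGS}")
    stripped = text.lstrip()
    # Refuse text that already starts with a tag, so the utterance never carries two.
    if any(stripped.startswith(f"[{t}]") for t in TONE_TAGS):
        raise ValueError("text already starts with a tone tag")
    return f"[{tone}] {stripped}"


print(with_tone("Hello, world.", "happy"))  # [happy] Hello, world.
```

the returned string is what would go in the text field of a muga request.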
silk mulberry 1.5 (mulberry)
expressive instruct-TTS. steer with a rich natural-language description, or pick one of four preset studio voices with speaker and tune its pitch with f0_up_key.
- speaker: one of speaker_1…speaker_4. omit to use the voice described by description.
- f0_up_key: pitch shift of −12…+12 semitones, applied with speaker.
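the two voice options above can be checked before a request is built. this is a sketch under stated assumptions: `mulberry_voice` is a hypothetical helper, and treating a nonzero f0_up_key without a speaker as an error is a client-side choice inferred from "applied with speaker", not documented server behavior.

```python
# Preset studio voices named in the text: speaker_1 .. speaker_4.
SPEAKERS = tuple(f"speaker_{i}" for i in range(1, 5))


def mulberry_voice(description=None, speaker=None, f0_up_key=0):
    """Return the voice-related fields for a mulberry request.

    Hypothetical helper for illustration only.
    """
    if speaker is not None and speaker not in SPEAKERS:
        raise ValueError(f"unknown speaker {speaker!r}; expected one of {SPEAKERS}")
    if not -12 <= f0_up_key <= 12:
        raise ValueError("f0_up_key must be between -12 and 12 semitones")
    # Assumption: a pitch shift only makes sense together with a preset voice.
    if speaker is None and f0_up_key != 0:
        raise ValueError("f0_up_key is applied with speaker; choose a preset voice")

    fields = {}
    if description is not None:
        fields["description"] = description
    if speaker is not None:
        fields["speaker"] = speaker
        fields["f0_up_key"] = f0_up_key
    return fields


print(mulberry_voice(speaker="speaker_2", f0_up_key=-2))
# {'speaker': 'speaker_2', 'f0_up_key': -2}
```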
```json
{
  "model": "mulberry",
  "text": "Welcome to the future of synthetic speech.",
  "description": "warm, upbeat narrator",
  "speaker": "speaker_2",
  "f0_up_key": 0
}
```

request fields
see the API reference for the full, always-current schema and a live playground. common fields:
| field | default | notes |
|---|---|---|
| text | — | required. up to 2000 characters. for muga, prefix with a tone. |
| model | muga | muga or mulberry. |
| description | — | mulberry only. natural-language voice description. |
| speaker | — | mulberry only. speaker_1…speaker_4. omit to use description. |
| f0_up_key | 0 | mulberry only. pitch shift in semitones, −12…12. |
| temperature | 0.6 | sampling temperature. |
| top_p | 0.95 | nucleus sampling. |
| top_k | 50 | top-k sampling. |
| repetition_penalty | 1.2 | penalize repeated tokens. |
| max_new_tokens | 2048 | output length cap. |
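the table above can be turned into a small payload builder. this is a sketch only: `build_request` is a hypothetical helper, the defaults are copied from the table, and counting the tone tag toward the 2000-character limit is an assumption. the endpoint and authentication are not covered here, so the example stops at producing the JSON body.

```python
import json

# Defaults copied from the table above.
SAMPLING_DEFAULTS = {
    "temperature": 0.6,
    "top_p": 0.95,
    "top_k": 50,
    "repetition_penalty": 1.2,
    "max_new_tokens": 2048,
}
MULBERRY_ONLY = {"description", "speaker", "f0_up_key"}
MAX_TEXT_CHARS = 2000


def build_request(text, model="muga", **fields):
    """Build a request body as a dict (hypothetical helper, not an official client)."""
    if model not in ("muga", "mulberry"):
        raise ValueError("model must be 'muga' or 'mulberry'")
    # Assumption: the limit applies to the whole string, tone tag included.
    if not text or len(text) > MAX_TEXT_CHARS:
        raise ValueError(f"text is required and limited to {MAX_TEXT_CHARS} characters")
    unknown = set(fields) - MULBERRY_ONLY - set(SAMPLING_DEFAULTS)
    if unknown:
        raise ValueError(f"unknown fields: {sorted(unknown)}")
    misplaced = MULBERRY_ONLY & set(fields)
    if model == "muga" and misplaced:
        raise ValueError(f"mulberry-only fields used with muga: {sorted(misplaced)}")
    # Caller-supplied values override the table defaults.
    return {"model": model, "text": text, **SAMPLING_DEFAULTS, **fields}


payload = build_request("[happy] Hello, world.", temperature=0.4)
print(json.dumps(payload, indent=2))
```

sending the defaults explicitly is not required by the table; the sketch includes them so the full set of sampling values is visible in one place.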