models

choosing and steering muga and mulberry.

silk muga 1 — `muga`

`muga` is built for ultra–low-latency streaming. set the delivery with a global tone by prefixing your text with one of the supported tone tags. see the prompting guide for the full steering rules.

| tone | tag |
|---|---|
| neutral | `[neutral]` |
| happy | `[happy]` |
| sad | `[sad]` |
| excited | `[excited]` |
| angry | `[angry]` |
| whisper | `[whisper]` |

use exactly one tone tag at the start of the utterance, followed by a single space. for example: `[happy] Hello, world.`
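a minimal python sketch of the tone-tag rule. `tag_text` and `has_valid_tone_prefix` are hypothetical helpers, not part of any silk SDK; they only build and check the prefix string described above (one supported tag, then exactly one space).

```python
# Supported muga tone tags, from the tone table.
TONES = ("neutral", "happy", "sad", "excited", "angry", "whisper")


def tag_text(tone: str, text: str) -> str:
    """Prefix text with exactly one tone tag followed by a single space."""
    if tone not in TONES:
        raise ValueError(f"unsupported tone: {tone!r}")
    # Drop leading whitespace so exactly one space follows the tag.
    return f"[{tone}] {text.lstrip()}"


def has_valid_tone_prefix(text: str) -> bool:
    """True if text starts with one supported tag and a single space."""
    for tone in TONES:
        prefix = f"[{tone}] "
        if text.startswith(prefix):
            rest = text[len(prefix):]
            # Reject a doubled space or a second tag right after the first.
            second_tag = any(rest.startswith(f"[{t}]") for t in TONES)
            return not rest.startswith(" ") and not second_tag
    return False


print(tag_text("happy", "Hello, world."))  # [happy] Hello, world.
```

checking the prefix client-side is optional; it just catches a missing or doubled tag before the request is sent.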

silk mulberry 1.5 — `mulberry`

`mulberry` is an expressive instruct-TTS model. steer it with a rich natural-language `description`, or pick one of four preset studio voices with `speaker` and tune its pitch with `f0_up_key`.

`speaker`: one of `speaker_1` through `speaker_4`. omit it to use the voice described by `description`.
`f0_up_key`: pitch shift of −12…+12 semitones, applied to the selected `speaker` voice.
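assuming `f0_up_key` behaves as a standard equal-tempered semitone shift, a key of $k$ scales the voice's fundamental frequency $f_0$ by $2^{k/12}$:

```latex
f_0' = f_0 \cdot 2^{k/12}, \qquad k \in [-12, 12]
```

so `12` raises the pitch one octave (doubles $f_0$), `-12` lowers it one octave (halves $f_0$), `0` leaves it unchanged, and `3` multiplies $f_0$ by about 1.189.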

```json
{
  "model": "mulberry",
  "text": "Welcome to the future of synthetic speech.",
  "description": "warm, upbeat narrator",
  "speaker": "speaker_2",
  "f0_up_key": 0
}
```
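a minimal python sketch that assembles the same `mulberry` body. `mulberry_payload` is a hypothetical helper: it only builds and checks the JSON and does not send it, since no endpoint is given on this page. refusing a nonzero `f0_up_key` without a `speaker` is an assumption drawn from "applied to the selected `speaker`", not a documented server rule.

```python
import json

# The four preset studio voices: speaker_1 ... speaker_4.
SPEAKERS = {f"speaker_{i}" for i in range(1, 5)}


def mulberry_payload(text, description=None, speaker=None, f0_up_key=0):
    """Assemble a mulberry request body (hypothetical helper, not an SDK call)."""
    if speaker is not None and speaker not in SPEAKERS:
        raise ValueError(f"unknown speaker: {speaker!r}")
    if not -12 <= f0_up_key <= 12:
        raise ValueError("f0_up_key must be within -12..12 semitones")
    if speaker is None and f0_up_key != 0:
        # Assumption: the pitch shift only applies to a preset speaker.
        raise ValueError("f0_up_key needs a speaker")
    payload = {"model": "mulberry", "text": text}
    if description is not None:
        payload["description"] = description
    if speaker is not None:
        payload["speaker"] = speaker
        payload["f0_up_key"] = f0_up_key
    return payload


body = mulberry_payload(
    "Welcome to the future of synthetic speech.",
    description="warm, upbeat narrator",
    speaker="speaker_2",
)
print(json.dumps(body, indent=2))  # same body as the JSON example
```

to steer by description alone, call `mulberry_payload(text, description="...")`; `speaker` and `f0_up_key` are then left out of the body.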

request fields

see the API reference for the full, always-current schema and a live playground. common fields:

| field | default | notes |
|---|---|---|
| `text` | none | required. up to 2000 characters. for `muga`, prefix with a tone tag. |
| `model` | `muga` | `muga` or `mulberry`. |
| `description` | none | `mulberry` only. natural-language voice description. |
| `speaker` | none | `mulberry` only. `speaker_1` through `speaker_4`. omit to use `description`. |
| `f0_up_key` | `0` | `mulberry` only. pitch shift in semitones, −12…+12. |
| `temperature` | `0.6` | sampling temperature. |
| `top_p` | `0.95` | nucleus sampling. |
| `top_k` | `50` | top-k sampling. |
| `repetition_penalty` | `1.2` | penalty applied to repeated tokens. |
| `max_new_tokens` | `2048` | output length cap. |
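a minimal python sketch of how a client might fill in the defaults from this table and check the documented limits before sending. `resolve_request` is a hypothetical helper. rejecting `mulberry`-only fields on a `muga` request is a conservative assumption (the server may simply ignore them), and `len()` counts Unicode code points, which may not match exactly how the API counts characters.

```python
# Defaults copied from the request-fields table.
DEFAULTS = {
    "model": "muga",
    "temperature": 0.6,
    "top_p": 0.95,
    "top_k": 50,
    "repetition_penalty": 1.2,
    "max_new_tokens": 2048,
}
MULBERRY_ONLY = {"description", "speaker", "f0_up_key"}
MAX_TEXT_CHARS = 2000


def resolve_request(fields: dict) -> dict:
    """Merge caller fields over the defaults and check documented limits."""
    req = {**DEFAULTS, **fields}
    text = req.get("text")
    if not text:
        raise ValueError("`text` is required")
    if len(text) > MAX_TEXT_CHARS:
        raise ValueError(f"`text` exceeds {MAX_TEXT_CHARS} characters")
    if req["model"] not in ("muga", "mulberry"):
        raise ValueError(f"unknown model: {req['model']!r}")
    if req["model"] == "muga":
        extra = req.keys() & MULBERRY_ONLY
        if extra:
            raise ValueError(f"mulberry-only fields sent to muga: {sorted(extra)}")
    else:
        # f0_up_key defaults to 0 and only exists for mulberry.
        req.setdefault("f0_up_key", 0)
    return req


req = resolve_request({"text": "[excited] We shipped it!"})
print(req["model"], req["temperature"], req["max_new_tokens"])  # muga 0.6 2048
```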
