models

choosing and steering muga and mulberry.

silk muga 1 (model: muga)

ultra–low-latency streaming. set the delivery with a global tone: prefix your text with one of the supported tone tags. see the prompting guide for the full steering rules.

| tone | tag |
|---|---|
| neutral | [neutral] |
| happy | [happy] |
| sad | [sad] |
| excited | [excited] |
| angry | [angry] |
| whisper | [whisper] |

use exactly one tone tag at the start of the utterance, followed by a single space — for example [happy] Hello, world.
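a minimal Python sketch of building a muga utterance that follows the one-tag-plus-one-space rule. with_tone and MUGA_TONES are hypothetical helper names, not part of any official SDK; the tone list is copied from the table above, and the check only runs on the client side.

```python
# Hypothetical helper (not an official SDK): tone tags copied from the tone table.
MUGA_TONES = ("neutral", "happy", "sad", "excited", "angry", "whisper")


def with_tone(tone: str, text: str) -> str:
    """Return text prefixed with exactly one muga tone tag and a single space."""
    if tone not in MUGA_TONES:
        raise ValueError(f"unsupported tone {tone!r}; expected one of {MUGA_TONES}")
    body = text.lstrip()  # drop leading spaces so the tag is followed by exactly one
    if any(body.startswith(f"[{t}]") for t in MUGA_TONES):
        raise ValueError("text already starts with a tone tag; use exactly one")
    return f"[{tone}] {body}"


print(with_tone("happy", "Hello, world."))  # [happy] Hello, world.
```

rejecting text that already carries a tag keeps a second tag from slipping into the middle of the utterance.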

silk mulberry 1.5 (model: mulberry)

expressive instruct-TTS. steer with a rich natural-language description, or pick one of four preset studio voices with speaker and tune its pitch with f0_up_key.

  • speaker: one of speaker_1 to speaker_4. omit to use the voice described by description.
  • f0_up_key: pitch shift of −12…+12 semitones, applied to the preset voice chosen with speaker.
```json
{
  "model": "mulberry",
  "text": "Welcome to the future of synthetic speech.",
  "description": "warm, upbeat narrator",
  "speaker": "speaker_2",
  "f0_up_key": 0
}
```
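a sketch of assembling the mulberry body from the JSON example above in Python, assuming the fields behave as described in the bullets. mulberry_payload is a hypothetical helper; two rules are assumptions rather than documented behavior: a request must carry a description, a speaker, or both, and a non-zero f0_up_key without speaker is treated as an error.

```python
import json
from typing import Optional

# Hypothetical helper: the four preset studio voices listed for mulberry.
MULBERRY_SPEAKERS = ("speaker_1", "speaker_2", "speaker_3", "speaker_4")


def mulberry_payload(
    text: str,
    description: Optional[str] = None,
    speaker: Optional[str] = None,
    f0_up_key: int = 0,
) -> dict:
    # Assumption: a request needs a description, a speaker, or both.
    if description is None and speaker is None:
        raise ValueError("pass a description, a speaker, or both")
    body = {"model": "mulberry", "text": text}
    if description is not None:
        body["description"] = description
    if speaker is None:
        # Assumption: f0_up_key only tunes a preset speaker, so a
        # non-zero shift without one is treated as a caller error.
        if f0_up_key != 0:
            raise ValueError("f0_up_key is applied together with speaker")
        return body
    if speaker not in MULBERRY_SPEAKERS:
        raise ValueError(f"speaker must be one of {MULBERRY_SPEAKERS}")
    if not -12 <= f0_up_key <= 12:
        raise ValueError("f0_up_key must be between -12 and +12 semitones")
    body["speaker"] = speaker
    # In equal temperament, k semitones multiply the fundamental frequency
    # by 2 ** (k / 12), so +12 doubles it and -12 halves it.
    body["f0_up_key"] = f0_up_key
    return body


print(json.dumps(
    mulberry_payload(
        "Welcome to the future of synthetic speech.",
        description="warm, upbeat narrator",
        speaker="speaker_2",
    ),
    indent=2,
))
```

the printed body matches the JSON example field for field. rejecting a stray f0_up_key is a conservative choice; the service may simply ignore it.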

request fields

see the API reference for the full, always-current schema and a live playground. common fields:

| field | default | notes |
|---|---|---|
| text | | required. up to 2000 characters. for muga, prefix with a tone tag. |
| model | muga | muga or mulberry. |
| description | | mulberry only. natural-language voice description. |
| speaker | | mulberry only. speaker_1 to speaker_4. omit to use description. |
| f0_up_key | 0 | mulberry only. pitch shift in semitones, −12…12. |
| temperature | 0.6 | sampling temperature. |
| top_p | 0.95 | nucleus sampling. |
| top_k | 50 | top-k sampling. |
| repetition_penalty | 1.2 | penalize repeated tokens. |
| max_new_tokens | 2048 | cap on output length, in tokens. |
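a client-side sketch in Python that merges caller fields over the defaults in the table and checks the table's rules before a request is sent. effective_request, DEFAULTS and TONE_PREFIX are hypothetical names; the service presumably applies these defaults itself, so sending them is optional. two behaviors are assumptions: mulberry-only fields sent to muga are rejected here (the service may ignore them instead), and the 2000-character limit is checked with Python's len(), which counts Unicode code points and may not match how the service counts characters.

```python
import re
from typing import Any, Dict

# Defaults copied from the request-fields table (hypothetical client-side mirror).
DEFAULTS: Dict[str, Any] = {
    "model": "muga",
    "temperature": 0.6,
    "top_p": 0.95,
    "top_k": 50,
    "repetition_penalty": 1.2,
    "max_new_tokens": 2048,
}
MULBERRY_ONLY = ("description", "speaker", "f0_up_key")
MAX_TEXT_CHARS = 2000
# One supported tone tag at the very start, followed by a space.
TONE_PREFIX = re.compile(r"\[(neutral|happy|sad|excited|angry|whisper)\] ")


def effective_request(fields: Dict[str, Any]) -> Dict[str, Any]:
    """Merge fields over DEFAULTS and enforce the rules listed in the table."""
    req = {**DEFAULTS, **fields}
    text = req.get("text")
    if not text:
        raise ValueError("text is required")
    if len(text) > MAX_TEXT_CHARS:
        raise ValueError(f"text is {len(text)} characters; the limit is {MAX_TEXT_CHARS}")
    if req["model"] not in ("muga", "mulberry"):
        raise ValueError("model must be 'muga' or 'mulberry'")
    if req["model"] == "muga":
        extra = [k for k in MULBERRY_ONLY if k in req]
        if extra:
            raise ValueError(f"mulberry-only fields sent to muga: {extra}")
        if not TONE_PREFIX.match(text):
            raise ValueError("muga text must start with a tone tag, e.g. '[happy] '")
    return req


print(effective_request({"text": "[happy] Hello, world."}))
```

keeping the defaults in one dictionary makes it easy to see which sampling settings a request would actually use after merging.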