best for
low-latency synthesis with a described or preset voice.
model id
`mulberrywell`
supported for
- language: hindi in devanagari plus english in latin (code-mixed).
- voice control: natural-language descriptions across age, accent, pitch, timbre, pacing, emotion, and register, plus four preset studio voices.
- accents: global (american, british, indian, …) and indian regional (hindi, punjabi, bengali, south indian, …).
- content: low-latency synthesis for narration, voice agents, and creative or stylized voices.
how it works
- description: write the voice in natural language (age, accent, pitch, timbre, pacing, emotion).
- preset voices: set `speaker` to `speaker_1`…`speaker_4`, and shift pitch with `f0_up_key`.
- script: write hindi words in devanagari and english words in latin, e.g. `आज का episode थोड़ा अलग है.` ("today's episode is a little different.")
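since the model expects hindi in devanagari and english in latin, a quick script check can flag words typed in an unexpected script before text is sent. the sketch below is a hypothetical helper (`script_of` and `tag_words` are names made up for this example, not part of any sdk): it classifies each whitespace-separated word by whether it contains characters from the devanagari unicode block (U+0900 to U+097F), ascii latin letters, both, or neither.

```python
def script_of(word: str) -> str:
    """Classify a word by the scripts its characters come from (hypothetical helper)."""
    # devanagari block covers consonants, vowel signs (matras), nukta, and the danda
    has_deva = any("\u0900" <= ch <= "\u097f" for ch in word)
    # only plain ascii letters count as latin here; digits and punctuation are ignored
    has_latin = any(ch.isascii() and ch.isalpha() for ch in word)
    if has_deva and has_latin:
        return "mixed"
    if has_deva:
        return "devanagari"
    if has_latin:
        return "latin"
    return "other"


def tag_words(text: str) -> list[tuple[str, str]]:
    """Tag every whitespace-separated word with its script."""
    return [(word, script_of(word)) for word in text.split()]


for word, script in tag_words("आज का episode थोड़ा अलग है."):
    print(word, script)
```

this only checks script, not language: a romanized hindi word such as "aaj" still tags as latin, so the helper can catch a stray "mixed" word but cannot confirm the writing convention was followed.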
preset voices
| setting | values | notes |
|---|---|---|
| `speaker` | `speaker_1`…`speaker_4` | four fixed studio voices. omit to use `description`. |
| `f0_up_key` | −12…+12 | pitch shift in semitones, applied with `speaker`. |
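`f0_up_key` is given in semitones. assuming the usual equal-tempered convention (this page does not describe how the service shifts pitch internally), a shift of k semitones multiplies the fundamental frequency by 2^(k/12), so +12 doubles it and −12 halves it. the sketch below does only that arithmetic; `f0_ratio` is a hypothetical helper and the 120 Hz base pitch is an arbitrary illustrative value.

```python
def f0_ratio(f0_up_key: float) -> float:
    """Frequency multiplier for a shift of f0_up_key semitones (equal temperament assumed)."""
    # range taken from the preset voices table
    if not -12 <= f0_up_key <= 12:
        raise ValueError("f0_up_key must be within -12..12 semitones")
    return 2 ** (f0_up_key / 12)


base_hz = 120.0  # hypothetical speaking pitch, for illustration only
for key in (-12, -2, 0, 5, 12):
    print(f"{key:+d} semitones -> {base_hz * f0_ratio(key):.1f} Hz")
```

small values go a long way: two semitones is already about a 12% change in frequency, so moderate shifts are a reasonable place to start.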
example request
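the sketch below shows what a request body might look like, using only fields from the parameters table on this page: one body driven by a natural-language `description`, one by a preset `speaker` with a small `f0_up_key` shift. the endpoint, authentication, and how the model id `mulberrywell` is passed are not covered here, so the sketch only builds and prints the json; the description wording and chosen values are illustrative assumptions.

```python
import json

# body using a natural-language voice description (no speaker, so description applies)
described = {
    "text": "आज का episode थोड़ा अलग है.",
    "description": (
        "a calm woman in her thirties with an indian english accent, "
        "medium pitch, warm timbre, unhurried pacing"
    ),
}

# body using a preset studio voice, shifted down two semitones
preset = {
    "text": "आज का episode थोड़ा अलग है.",
    "speaker": "speaker_2",
    "f0_up_key": -2,
}

# ensure_ascii=False keeps the devanagari readable instead of \u escapes
print(json.dumps(described, ensure_ascii=False, indent=2))
print(json.dumps(preset, ensure_ascii=False, indent=2))
```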
parameters
see the API reference for the full schema and a live playground.

| field | default | notes |
|---|---|---|
| `text` | n/a | required. up to 2000 characters. |
| `description` | n/a | natural-language voice description. |
| `speaker` | n/a | preset voice, `speaker_1`…`speaker_4`. omit to use `description`. |
| `f0_up_key` | 0 | pitch shift in semitones, −12…+12. |
| `temperature` | 0.6 | sampling temperature. |
| `top_p` | 0.95 | nucleus (top-p) sampling threshold. |
| `top_k` | 50 | top-k sampling cutoff. |
| `repetition_penalty` | 1.2 | penalty applied to repeated tokens. |
| `max_new_tokens` | 2048 | cap on generated output length, in tokens. |
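the limits in this table can be mirrored client-side to catch obvious mistakes before a request is sent. the sketch below is a hypothetical helper, not part of any published sdk: `build_params`, `DEFAULTS`, and the rejection of unknown field names are assumptions of this example. it fills in the documented defaults only to make the effective values visible; presumably omitting a field leaves the default to the service.

```python
from typing import Any

# defaults copied from the parameters table; text, description, and speaker have none
DEFAULTS: dict[str, Any] = {
    "f0_up_key": 0,
    "temperature": 0.6,
    "top_p": 0.95,
    "top_k": 50,
    "repetition_penalty": 1.2,
    "max_new_tokens": 2048,
}
ALLOWED = {"text", "description", "speaker", *DEFAULTS}
SPEAKERS = {f"speaker_{i}" for i in range(1, 5)}
MAX_TEXT_CHARS = 2000


def build_params(text: str, **overrides: Any) -> dict[str, Any]:
    """Hypothetical helper: merge overrides onto the documented defaults
    and reject values outside the documented limits."""
    if not text:
        raise ValueError("text is required")
    # assumption: the limit counts python code points; the service may count differently
    if len(text) > MAX_TEXT_CHARS:
        raise ValueError(f"text exceeds {MAX_TEXT_CHARS} characters")
    unknown = set(overrides) - ALLOWED
    if unknown:
        raise ValueError(f"unknown fields: {sorted(unknown)}")
    params: dict[str, Any] = {"text": text, **DEFAULTS, **overrides}
    speaker = params.get("speaker")
    if speaker is not None and speaker not in SPEAKERS:
        raise ValueError(f"speaker must be one of {sorted(SPEAKERS)}")
    if not -12 <= params["f0_up_key"] <= 12:
        raise ValueError("f0_up_key must be within -12..12 semitones")
    return params


print(build_params("आज का episode थोड़ा अलग है.", speaker="speaker_3", f0_up_key=2))
```

the helper does not forbid sending `description` and `speaker` together, because the table only says to omit `speaker` to use `description`; it does not say the combination is an error.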