prompting silk muga 1 - rumik silk TTS

test prompts and voice settings in the playground before shipping them into a bot.

write your prompt in latin script only: yeh ek test message hai, not यह एक टेस्ट मैसेज है.

silk muga 1 is our more expressive model, a hinglish emotion-TTS model with two control surfaces: a paragraph tone, set by the selector or a [tone] marker, and discrete inline events (<laugh>, <chuckle>, <sigh>) that you place exactly where you want them to sound.

1. paragraph tone

the voice tone selector sets the default tone for every paragraph. type [ at the start of a paragraph to insert a [tone] marker that overrides the default for that paragraph only.

tone	when to use	delivery
`[happy]`	light, positive, casual chat	bright, smiling, mid-energy
`[excited]`	high-energy reactions: wins, surprises, hype	loud, fast, pitch-up
`[sad]`	loss, disappointment, grief	slow, breathy, low pitch
`[angry]`	frustration, confrontation, blame	tight, clipped, sharp
`[neutral]`	information delivery: instructions, factual	flat, even, no affect
`[whisper]`	secrets, late-night, intimate	quiet, breathy, no voiced energy

rules

one tone marker per paragraph; a blank line starts a new paragraph.
paragraphs without an explicit marker use the selected default tone; [neutral] is the default when nothing else is selected.
the marker applies to the whole paragraph, even if it is inserted after the first word.
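
the rules above can be sketched as a small pre-flight check. this is a minimal sketch, not the playground's actual parser: `resolve_tones` and `default_tone` are hypothetical names, and the code assumes markers are always written in lowercase, as they are in the table.

```python
import re

# Tone names copied from the table above.
TONES = {"happy", "excited", "sad", "angry", "neutral", "whisper"}
MARKER = re.compile(r"\[([a-z]+)\]")


def resolve_tones(prompt: str, default_tone: str = "neutral") -> list[tuple[str, str]]:
    """Return (tone, text) for each blank-line-separated paragraph."""
    resolved = []
    for paragraph in re.split(r"\n\s*\n", prompt.strip()):
        markers = MARKER.findall(paragraph)
        unknown = [m for m in markers if m not in TONES]
        if unknown:
            raise ValueError(f"unknown tone marker: [{unknown[0]}]")
        if len(markers) > 1:
            raise ValueError("more than one tone marker in a paragraph")
        # No marker means the paragraph falls back to the selected default.
        tone = markers[0] if markers else default_tone
        # The marker covers the whole paragraph wherever it sits, so strip it out.
        text = " ".join(MARKER.sub(" ", paragraph).split())
        resolved.append((tone, text))
    return resolved
```

for example, `resolve_tones("[happy] kya baat hai\n\naapka order place ho gaya hai")` gives one happy paragraph and one paragraph on the default tone.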

2. inline events

type these directly in the transcript at the position you want the sound. three are supported:

event	duration	sound
`<laugh>`	0.5-1.5s	loud, voiced laughter (haha, hehe)
`<chuckle>`	0.3-0.7s	soft, amused laugh, almost a breath
`<sigh>`	0.4-0.8s	audible exhale, breathy

rules

lowercase, angle brackets, no spaces inside. <laugh>, never <Laugh> or < laugh >.
space on both sides when between words. never mid-word.
position-sensitive. <laugh> kya baat hai sounds different from kya baat hai <laugh>.
stack at most two. <laugh> <laugh> for a longer/harder laugh. three or more becomes unstable.
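
these rules are mechanical enough to lint before a prompt goes out. the sketch below is one possible check, not something the API provides: `check_events` is a hypothetical helper, and it is deliberately strict, flagging any non-space character next to a tag (including punctuation) and treating only repeats of the same event as stacking.

```python
import re

EVENTS = {"laugh", "chuckle", "sigh"}
# Match anything in angle brackets so malformed tags such as <Laugh> are caught too.
TAG = re.compile(r"<[^<>]*>")


def check_events(transcript: str) -> list[str]:
    """Return a list of rule violations; an empty list means no problems found."""
    problems = []
    for match in TAG.finditer(transcript):
        tag = match.group()
        if tag[1:-1] not in EVENTS:
            problems.append(f"{tag}: must be lowercase <laugh>, <chuckle> or <sigh>")
        before = transcript[:match.start()]
        after = transcript[match.end():]
        # A tag at the very start or end of the transcript has only one neighbour.
        if (before and not before[-1].isspace()) or (after and not after[0].isspace()):
            problems.append(f"{tag}: needs a space on both sides")
    # Three or more of the same event in a row is reported as unstable.
    for name in sorted(EVENTS):
        if re.search(rf"(?:<{name}>\s*){{3,}}", transcript):
            problems.append(f"<{name}>: stacked more than twice")
    return problems
```

`check_events("<laugh> <laugh> kya baat hai")` returns an empty list, while `check_events("kya baat<laugh>hai")` reports the missing spaces.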

3. tone and event compatibility

events have to match the tone. laughter belongs to high-energy positive states; sighs belong to low-energy reflective ones. mix them (a laugh in a sad line, a sigh in an excited shout) and the model fights itself, because the training set has almost no examples of those combinations.

tone	`<laugh>`	`<chuckle>`	`<sigh>`
`[happy]`	✓✓	✓✓	✗
`[excited]`	✓✓	✓	✗
`[sad]`	✗	✗	✓✓
`[angry]`	✗	~	✓
`[neutral]`	✗	✗	✓
`[whisper]`	✗	✓	✓

✓✓ best · ✓ ok · ~ rare · ✗ avoid
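
a lookup like the one below can catch mismatches before synthesis. it is a sketch under assumptions: `COMPAT` and `mismatched_events` are hypothetical names, the ratings use the legend's words in place of its symbols, and only three rows are copied from the table, so the rest would need to be filled in the same way.

```python
import re

# Ratings follow the legend: "best", "ok", "rare", "avoid".
# Only three rows are copied here; add the remaining tones from the table.
COMPAT = {
    "happy":   {"laugh": "best",  "chuckle": "best",  "sigh": "avoid"},
    "sad":     {"laugh": "avoid", "chuckle": "avoid", "sigh": "best"},
    "whisper": {"laugh": "avoid", "chuckle": "ok",    "sigh": "ok"},
}
EVENT = re.compile(r"<(laugh|chuckle|sigh)>")


def mismatched_events(tone: str, transcript: str) -> list[str]:
    """Return the events in the transcript that the table marks as avoid for this tone."""
    row = COMPAT[tone]
    return [name for name in EVENT.findall(transcript) if row[name] == "avoid"]
```

`mismatched_events("sad", "<laugh> sab kuch khatam ho gaya")` returns `["laugh"]`; swapping the event for `<sigh>` returns an empty list.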

4. good vs bad

bad: in each of these lines the event clashes with the paragraph tone.

[sad] <laugh> sab kuch khatam ho gaya
[neutral] <laugh> aaj ka mausam saaf rahega
[angry] <chuckle> tumne phir galti ki
[happy] <sigh> kya mast din tha aaj
[excited] <sigh> jeet gaye!
[whisper] <laugh> sab so rahe hain

5. language register

training data is hinglish: romanised hindi with english code-mixing. the model speaks that best. avoid:

devanagari. the model saw zero hindi script. मैं ठीक हूँ produces garbage.
other indian languages (tamil, bengali, marathi, bhojpuri).
heavy regional dialects (very bambaiyya, very punjabi).
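
the devanagari case is easy to catch automatically, because the script occupies its own unicode block (U+0900 to U+097F). a minimal sketch, with `has_devanagari` as a hypothetical helper name; it does not detect other indian scripts or dialect vocabulary written in latin script.

```python
import re

# The Devanagari Unicode block spans U+0900 to U+097F.
DEVANAGARI = re.compile(r"[\u0900-\u097F]")


def has_devanagari(text: str) -> bool:
    """True if the text contains at least one Devanagari character."""
    return DEVANAGARI.search(text) is not None
```

`has_devanagari("मैं ठीक हूँ")` is true, while `has_devanagari("main theek hoon")` is false.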

6. length

silk muga 1 is built around 2 to 30 second utterances and extrapolates reliably up to around 40s. beyond that, tone drifts, pacing slips, and you start seeing repetitions or cutoffs.

2 to 30s: sweet spot, one to three sentences.
~30 to 40s: still works for longer monologues.
beyond 40s: split across prompts.
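
splitting a long script can be sketched as below. the speaking rate is an assumption (2.5 words per second is a placeholder, not a measured figure for silk muga 1), and `estimate_seconds` and `split_prompt` are hypothetical helpers; calibrate the rate against real output before relying on it.

```python
import re

WORDS_PER_SECOND = 2.5  # assumption, not a measured figure for silk muga 1


def estimate_seconds(text: str) -> float:
    """Rough duration estimate from the word count alone."""
    return len(text.split()) / WORDS_PER_SECOND


def split_prompt(text: str, max_seconds: float = 30.0) -> list[str]:
    """Greedily pack whole sentences into chunks estimated at max_seconds or less."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and estimate_seconds(candidate) > max_seconds:
            # Adding this sentence would overshoot, so close the current chunk.
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

the split only happens at sentence boundaries, so a single sentence longer than the budget is returned as one oversized chunk and still needs manual editing.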

7. examples

tone	transcript
neutral	Aapka order place ho gaya hai. Confirmation SMS aapke registered number par bhej diya gaya hai.
happy	`<chuckle>` Pata hai tumne kya kiya kal? Pure office mein viral ho gaya.
excited	`<laugh>` Bhai sun, abhi abhi pata chala, wo job mil gayi mujhe!
sad	`<sigh>` Yaar, samajh sakti hoon. Itna kuch hua hai, time lagega.
whisper	Phir achanak, kuch khatka hua. Maine darwaza dekha, koi nahi tha.
angry	Tumne phir wahi kiya. Maine kitni baar bola tha aisa mat karo.

temperature 0.7 is the most reliable inference setting for the v3 fine-tune the API ships.
