cookbook - rumik silk TTS

practical, build-it-from-scratch guides. each one is a complete walkthrough you can follow start to finish, in plain language, with every command and file you need.

build a voice agent

a voice agent listens to a person, thinks, and talks back, all in real time. the loop is always the same four steps:

🎙️ you speak  →  STT (speech to text)  →  LLM (the brain)  →  rumik TTS (text to speech)  →  🔊 it speaks
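
to make the loop concrete, here is a minimal sketch of one conversational turn in plain python. every stage is a hypothetical stand-in (`fake_stt`, `fake_llm`, `fake_tts`, and the `VoiceLoop` class are illustrative names, not part of rumik, pipecat, or livekit), and the "audio" is just utf-8 bytes so the example runs without a microphone or API keys. real frameworks stream audio frames asynchronously; this version is blocking and turn-based, purely to show how data flows between the stages.

```python
from dataclasses import dataclass, field
from typing import Callable


# hypothetical stand-ins: none of these call a real STT, LLM, or rumik API
def fake_stt(audio: bytes) -> str:
    """pretend to transcribe; the 'audio' here is just utf-8 encoded text."""
    return audio.decode("utf-8")


def fake_llm(history: list[str], user_text: str) -> str:
    """pretend to think; a real LLM would use the history for context."""
    return f"aapne kaha: {user_text}"


def fake_tts(text: str) -> bytes:
    """pretend to synthesize speech; returns bytes in place of audio."""
    return text.encode("utf-8")


@dataclass
class VoiceLoop:
    stt: Callable[[bytes], str]
    llm: Callable[[list[str], str], str]
    tts: Callable[[str], bytes]
    history: list[str] = field(default_factory=list)

    def turn(self, audio_in: bytes) -> bytes:
        """run one pass of the loop: audio in -> text -> reply -> audio out."""
        user_text = self.stt(audio_in)             # speech to text
        reply = self.llm(self.history, user_text)  # the brain
        self.history += [user_text, reply]         # remember the exchange
        return self.tts(reply)                     # text to speech


loop = VoiceLoop(stt=fake_stt, llm=fake_llm, tts=fake_tts)
audio_out = loop.turn(b"namaste")
```

the stages are passed in as plain callables so each one can be swapped independently, which is roughly the job a framework does for you, along with streaming, interruption handling, and transport.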

rumik is the last step: it turns the agent’s reply into natural hinglish speech. these guides wire up the whole loop around it. pick your framework:

with pipecat

build a voice agent using pipecat, an open-source framework for real-time voice.

with livekit

build a voice agent using livekit agents, with rooms and telephony built in.

which framework?

both give you a production-ready real-time agent. the difference is what they ship around the voice loop:

| | pipecat | livekit |
|---|---|---|
| best for | a lightweight, code-first pipeline you fully control | rooms, web/mobile SDKs, and phone calls out of the box |
| transport | brings its own | livekit’s WebRTC rooms |
| rumik plugin | `pipecat-rumik` | `livekit-plugins-rumik-ai` |

new to both? start with pipecat: it’s the shortest path to hearing your agent talk.

these guides assume you can run python and use a terminal. you do not need prior experience with voice agents.
