Skip to main content
by the end you’ll have a voice agent you can talk to, that replies out loud in natural hinglish using rumik. we’ll build it with pipecat, an open-source framework for real-time voice.

what you’ll build

a real-time loop: you speak, the agent transcribes you, an LLM writes a reply, and rumik speaks it back.
🎙️ you speak  →  STT (deepgram)  →  LLM (openai)  →  rumik TTS  →  🔊 it speaks
you’ll wire up three services. rumik is the voice. you can swap the STT and LLM for any provider pipecat supports.

before you start

you need:
you can swap deepgram or openai for any STT / LLM that pipecat supports. we use these two because they’re quick to set up.

step 1 · set up the project

make a folder, a virtual environment, and install the packages.
mkdir rumik-voice-agent && cd rumik-voice-agent
python -m venv venv
source venv/bin/activate        # windows: venv\Scripts\activate

pip install "pipecat-ai[deepgram,openai,silero]" pipecat-rumik
pipecat-rumik is the official rumik TTS service. the rest is pipecat plus the STT and LLM plugins.

step 2 · add your keys

create a file called .env in the folder:
.env
RUMIK_API_KEY=rk_live_•••••••••
RUMIK_GATEWAY_URL=https://silk-api.rumik.ai
DEEPGRAM_API_KEY=•••••••••
OPENAI_API_KEY=sk-•••••••••
never hard-code keys in your script. we’ll load them from this file.

step 3 · write the agent

create agent.py. this builds the four-step loop. the rumik part is the tts line.
agent.py
import os
from dotenv import load_dotenv

from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineTask
from pipecat.services.deepgram.stt import DeepgramSTTService
from pipecat.services.openai.llm import OpenAILLMService
from pipecat_rumik import RumikTTSService

load_dotenv()

# speech to text
stt = DeepgramSTTService(api_key=os.environ["DEEPGRAM_API_KEY"])

# the brain. keep replies short and in romanised hinglish so muga sounds natural.
llm = OpenAILLMService(
    api_key=os.environ["OPENAI_API_KEY"],
    model="gpt-4o-mini",
)

# the voice: rumik muga
tts = RumikTTSService(
    api_key=os.environ["RUMIK_API_KEY"],
    gateway_url=os.environ["RUMIK_GATEWAY_URL"],
    settings=RumikTTSService.Settings(model="muga"),
)

# the loop: audio in → stt → llm → rumik tts → audio out
pipeline = Pipeline([stt, llm, tts])

# connect this pipeline to a transport (a phone call, a web room, or your mic)
# and run it. see the runnable examples linked below for a complete transport.
the surrounding pieces (the transport that carries audio, the system prompt, the context aggregator) come straight from the pipecat quickstart. the pipecat-rumik examples ship a complete, runnable agent you can copy.

step 4 · make the LLM speak muga’s language

muga is steered by a [tone] tag at the start of each reply. tell your LLM to add one. paste this into the LLM’s system prompt:
You write text spoken by the Silk Muga 1 text-to-speech model.

- Output only the final tagged text, no markdown or notes.
- Romanised Hinglish only (Latin script). Never Devanagari.
- Start every paragraph with one tone tag, as the first token:
  [happy], [excited], [sad], [angry], [neutral], [whisper].
- Keep replies short: 1 to 2 sentences.
now the LLM produces [happy] Haan ji, ho gaya! and rumik speaks it with the right emotion. the full prompt rules are in prompting muga.

step 5 · run it

python agent.py
speak into your mic. you’ll hear muga reply in hinglish. it streams, so the first audio comes back fast, and pipecat handles interruptions for you.

customize it

change the voice

switch to model="mulberry" and add a description to design any voice. see prompting mulberry.

tune the personality

edit the LLM system prompt. that’s the agent’s character.

swap STT or LLM

pipecat supports many providers. change the stt or llm line.

let an agent do it

hand the rumik TTS skill to your coding agent.

next steps