by the end you’ll have a voice agent you can talk to that replies out loud in natural hinglish, voiced by rumik. we’ll build it with livekit agents, which gives you rooms, web and mobile SDKs, and phone calls out of the box.

what you’ll build

a real-time loop running inside a livekit room: you speak, the agent transcribes you, an LLM writes a reply, and rumik speaks it back.
🎙️ you speak  →  STT (deepgram)  →  LLM (openai)  →  rumik TTS  →  🔊 it speaks
rumik is the voice. you can swap the STT and LLM for any provider livekit supports.
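to make the data flow concrete, here is a minimal sketch of one turn of that loop, assuming each stage is a plain function. the names run_turn, fake_stt, fake_llm and fake_tts are hypothetical stand-ins, not livekit or rumik APIs; the real session streams audio and text between stages rather than passing whole values.

```python
from typing import Callable

# Hypothetical stage signatures: each stage is a plain function here,
# standing in for the real deepgram / openai / rumik plugins.
STT = Callable[[bytes], str]
LLM = Callable[[str], str]
TTS = Callable[[str], bytes]


def run_turn(audio_in: bytes, stt: STT, llm: LLM, tts: TTS) -> bytes:
    """One conversational turn: audio in -> transcript -> reply text -> audio out."""
    transcript = stt(audio_in)   # what you said, as text
    reply = llm(transcript)      # what the agent wants to say
    return tts(reply)            # the reply, as audio


if __name__ == "__main__":
    # Toy stand-ins so the sketch runs without any API keys.
    fake_stt = lambda audio: audio.decode("utf-8")
    fake_llm = lambda text: f"[happy] aapne kaha: {text}"
    fake_tts = lambda text: text.encode("utf-8")
    out = run_turn(b"hello", fake_stt, fake_llm, fake_tts)
    print(out.decode("utf-8"))  # [happy] aapne kaha: hello
```

swapping a provider only changes which function fills a slot; the order of the stages stays the same.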

before you start

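you need:
- python 3 with venv (used in step 1)
- a livekit cloud project (for LIVEKIT_URL, LIVEKIT_API_KEY and LIVEKIT_API_SECRET)
- a rumik API key
- a deepgram API key
- an openai API key
we use deepgram and openai here because they’re quick to set up.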

step 1 · set up the project

make a folder, a virtual environment, and install the packages.
mkdir rumik-livekit-agent && cd rumik-livekit-agent
python -m venv venv
source venv/bin/activate        # windows: venv\Scripts\activate

pip install "livekit-agents[deepgram,openai,silero]" livekit-plugins-rumik-ai python-dotenv
livekit-plugins-rumik-ai is the official rumik TTS plugin. python-dotenv loads the .env file you’ll create in step 2 (agent.py imports it).
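if you want to confirm the install before writing the agent, a small check like this can help. it is a sketch, assuming the module paths match the imports in agent.py (including livekit.plugins.rumik_ai, taken from the import line in step 3); missing_modules is a hypothetical helper built on the standard library’s importlib.util.find_spec.

```python
import importlib.util

# Module paths the agent in step 3 imports. livekit.plugins.rumik_ai is
# assumed from the import line in agent.py; adjust if your install differs.
REQUIRED_MODULES = [
    "dotenv",
    "livekit.agents",
    "livekit.plugins.deepgram",
    "livekit.plugins.openai",
    "livekit.plugins.silero",
    "livekit.plugins.rumik_ai",
]


def missing_modules(names: list[str]) -> list[str]:
    """Return the module names that cannot be found in this environment."""
    missing = []
    for name in names:
        try:
            found = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            # find_spec imports parent packages; a missing parent lands here.
            found = False
        if not found:
            missing.append(name)
    return missing


if __name__ == "__main__":
    gaps = missing_modules(REQUIRED_MODULES)
    print("all set" if not gaps else f"missing: {', '.join(gaps)}")
```

run it inside the activated venv; anything it lists still needs a pip install.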

step 2 · add your keys

create a file called .env:
.env
LIVEKIT_URL=wss://your-project.livekit.cloud
LIVEKIT_API_KEY=•••••••••
LIVEKIT_API_SECRET=•••••••••
RUMIK_API_KEY=rk_live_•••••••••
DEEPGRAM_API_KEY=•••••••••
OPENAI_API_KEY=sk-•••••••••
the livekit values come from your livekit cloud project settings.
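a typo in .env usually shows up later as a confusing auth error, so it can be worth checking the keys up front. this is a minimal sketch using the variable names from the .env above; missing_env is a hypothetical helper, and it only checks that each value is present, not that it is valid.

```python
import os

# The variable names from the .env file above.
REQUIRED_ENV = [
    "LIVEKIT_URL",
    "LIVEKIT_API_KEY",
    "LIVEKIT_API_SECRET",
    "RUMIK_API_KEY",
    "DEEPGRAM_API_KEY",
    "OPENAI_API_KEY",
]


def missing_env(names, env=None):
    """Return required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name, "").strip()]


if __name__ == "__main__":
    try:
        from dotenv import load_dotenv  # python-dotenv, as used in agent.py
        load_dotenv()
    except ImportError:
        pass  # fall back to whatever is already exported in the shell
    gaps = missing_env(REQUIRED_ENV)
    print("all keys present" if not gaps else f"missing: {', '.join(gaps)}")
```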

step 3 · write the agent

create agent.py. an AgentSession wires the four pieces together: voice activity detection (vad), speech-to-text (stt), the LLM, and text-to-speech (tts). the rumik part is the tts line.
agent.py
from dotenv import load_dotenv

from livekit.agents import Agent, AgentSession, JobContext, WorkerOptions, cli
from livekit.plugins import deepgram, openai, silero, rumik_ai

load_dotenv()

# muga is steered by a [tone] tag, so we tell the LLM to add one
INSTRUCTIONS = """
You write text spoken by the Silk Muga 1 text-to-speech model.

- Output only the final tagged text, no markdown or notes.
- Romanised Hinglish only (Latin script). Never Devanagari.
- Start every reply with one tone tag, as the first token:
  [happy], [excited], [sad], [angry], [neutral], [whisper].
- Keep replies short: 1 to 2 sentences.
"""


async def entrypoint(ctx: JobContext):
    await ctx.connect()

    session = AgentSession(
        stt=deepgram.STT(),
        llm=openai.LLM(model="gpt-4o-mini"),
        tts=rumik_ai.TTS(model="muga"),   # the rumik voice
        vad=silero.VAD.load(),            # detects when you start/stop talking
    )

    await session.start(
        agent=Agent(instructions=INSTRUCTIONS),
        room=ctx.room,
    )


if __name__ == "__main__":
    cli.run_app(WorkerOptions(entrypoint_fnc=entrypoint))
rumik_ai.TTS reads RUMIK_API_KEY from the environment automatically. the system prompt makes the LLM start each reply with one of muga’s [tone] tags; see prompting muga for the full rules.
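the prompt asks the LLM to follow the rules; it does not guarantee it. while tuning INSTRUCTIONS, a checker like this can flag sample replies that break them. it is a hypothetical helper (check_reply is not part of the rumik plugin) and only covers the leading tag and the no-Devanagari rule.

```python
import re

# Tone tags listed in INSTRUCTIONS in agent.py.
TONE_TAGS = {"happy", "excited", "sad", "angry", "neutral", "whisper"}

# Devanagari Unicode block (U+0900 to U+097F); the prompt forbids it.
DEVANAGARI = re.compile(r"[\u0900-\u097F]")

# A bracketed word at the very start of the reply.
LEADING_TAG = re.compile(r"\[([A-Za-z]+)\]")


def check_reply(text: str) -> list[str]:
    """Return rule violations for one LLM reply (an empty list means it passed)."""
    problems = []
    match = LEADING_TAG.match(text.lstrip())
    if match is None:
        problems.append("no leading [tone] tag")
    elif match.group(1) not in TONE_TAGS:
        problems.append(f"unknown tone tag [{match.group(1)}]")
    if DEVANAGARI.search(text):
        problems.append("contains Devanagari")
    return problems


if __name__ == "__main__":
    for sample in ["[happy] arre wah, kya baat hai!", "theek hai, chalo"]:
        print(sample, "->", check_reply(sample) or "ok")
```

it runs outside the session, so it fits best in quick offline tests of the prompt against sample replies.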

step 4 · run it

start the agent worker:
python agent.py dev
then open the livekit agents playground, connect to your project, and start talking. muga replies in hinglish; the audio streams as it’s generated, and livekit handles interruptions and turn-taking for you.
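you don’t write turn-taking yourself, but the core idea behind VAD-based end-of-turn detection is easy to see in a toy version: wait for speech, then wait for a run of silence. this is an illustration only, assuming made-up per-frame speech probabilities; it is not silero’s or livekit’s actual algorithm, and the threshold and frame counts are arbitrary.

```python
def end_of_turn_frames(speech_probs, threshold=0.5, min_silence_frames=3):
    """Return the frame indices where a turn is judged to have ended.

    A turn ends once speech has started and then `min_silence_frames`
    consecutive frames fall below `threshold`. Shorter pauses do not end it.
    """
    ends = []
    in_speech = False
    silent_run = 0
    for i, p in enumerate(speech_probs):
        if p >= threshold:
            in_speech = True   # speech (re)started, so reset the silence count
            silent_run = 0
        elif in_speech:
            silent_run += 1
            if silent_run >= min_silence_frames:
                ends.append(i)  # enough silence: hand the turn to the agent
                in_speech = False
                silent_run = 0
    return ends
```

the silence window is the trade-off: shorter makes the agent reply faster but risks cutting you off mid-pause.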

customize it

change the voice

use rumik_ai.TTS(model="mulberry", description="...") to design a voice from a text description. see prompting mulberry.

pin a preset voice

rumik_ai.TTS(model="mulberry", speaker="speaker_1") keeps one fixed voice across the conversation.
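to switch between these voice setups without editing agent.py, you could build the rumik_ai.TTS arguments from environment variables. a sketch, assuming hypothetical variable names (RUMIK_MODEL, RUMIK_VOICE_DESCRIPTION, RUMIK_SPEAKER) that are your own settings, not anything the plugin reads; it only passes the model, description and speaker parameters shown above.

```python
import os

# Hypothetical variable names: these are your own settings, not something
# the rumik plugin reads by itself.
ENV_TO_ARG = {
    "RUMIK_VOICE_DESCRIPTION": "description",
    "RUMIK_SPEAKER": "speaker",
}


def rumik_tts_kwargs(env=None) -> dict:
    """Build keyword arguments for rumik_ai.TTS from environment variables."""
    env = os.environ if env is None else env
    kwargs = {"model": env.get("RUMIK_MODEL", "muga")}
    for env_name, arg_name in ENV_TO_ARG.items():
        value = env.get(env_name)
        if value:  # skip unset or empty values
            kwargs[arg_name] = value
    return kwargs
```

in agent.py the tts line would become tts=rumik_ai.TTS(**rumik_tts_kwargs()). the INSTRUCTIONS above are written for muga, so check prompting mulberry before switching models.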

tune the personality

edit the INSTRUCTIONS system prompt; it defines the agent’s character. keep the muga rules in it so every reply still starts with a [tone] tag.
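one way to change the personality without risking the muga rules is to keep the rules fixed and add the persona as its own line. a minimal sketch: build_instructions and MUGA_RULES are hypothetical names, and the rule text is copied from INSTRUCTIONS in agent.py.

```python
# The muga-specific rules from INSTRUCTIONS in agent.py, kept separate from
# the persona so personality edits can't accidentally drop them.
MUGA_RULES = """\
- Output only the final tagged text, no markdown or notes.
- Romanised Hinglish only (Latin script). Never Devanagari.
- Start every reply with one tone tag, as the first token:
  [happy], [excited], [sad], [angry], [neutral], [whisper].
- Keep replies short: 1 to 2 sentences.
"""


def build_instructions(persona: str) -> str:
    """Combine a free-form persona with the fixed muga output rules."""
    return (
        "You write text spoken by the Silk Muga 1 text-to-speech model.\n\n"
        f"Persona: {persona.strip()}\n\n"
        f"{MUGA_RULES}"
    )
```

in agent.py you would then pass agent=Agent(instructions=build_instructions("a cheerful chai-stall owner who loves cricket")).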

ship to phone or web

the same agent works over phone calls through livekit telephony and in web and mobile apps through livekit’s client SDKs.

next steps