build a voice agent with livekit

by the end you’ll have a voice agent you can talk to that replies out loud in natural hinglish, with rumik as the voice. we’ll build it with livekit agents, which gives you rooms, web and mobile SDKs, and phone calls out of the box.

what you’ll build

a real-time loop running inside a livekit room: you speak, the agent transcribes you, an LLM writes a reply, and rumik speaks it back.

🎙️ you speak  →  STT (deepgram)  →  LLM (openai)  →  rumik TTS  →  🔊 it speaks

rumik is the voice. you can swap the STT and LLM for any provider livekit supports.
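to make the data flow concrete, here’s a toy, single-turn version of that loop in plain python. every stage (`stt`, `llm`, `tts`, `one_turn`) is a hypothetical stub that just passes text around. in the real agent, livekit’s AgentSession runs the actual providers as streaming stages and adds voice activity detection, interruptions and turn-taking on top.

```python
# A toy, single-turn model of the pipeline. Every stage is a hypothetical stub:
# in the real agent, livekit's AgentSession runs deepgram, openai and rumik
# as streaming stages instead of these plain functions.

def stt(audio: bytes) -> str:
    # stub speech-to-text: pretend the audio bytes are already utf-8 text
    return audio.decode("utf-8")


def llm(transcript: str) -> str:
    # stub LLM: returns a tagged Hinglish reply that echoes the user
    return f"[happy] Haan, tumne kaha: {transcript}"


def tts(text: str) -> bytes:
    # stub TTS: pretend the reply text is the audio
    return text.encode("utf-8")


def one_turn(audio_in: bytes) -> bytes:
    # you speak -> STT -> LLM -> TTS -> it speaks
    return tts(llm(stt(audio_in)))


if __name__ == "__main__":
    print(one_turn(b"kya haal hai").decode("utf-8"))
```

the real stages don’t wait for each other to finish: audio starts playing while the reply is still being generated, which is why an orchestrator like livekit is worth having.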

before you start

you need:

- python 3.10+ and a terminal.
- a free livekit cloud project from cloud.livekit.io (it gives you a URL, API key, and secret).
- three more API keys:
  - rumik for the voice, from your rumik dashboard.
  - deepgram for speech-to-text (deepgram.com).
  - openai for the LLM (platform.openai.com).

we use deepgram and openai because they’re quick to set up; any STT or LLM that livekit supports works in their place.

step 1 · set up the project

make a folder, a virtual environment, and install the packages.

mkdir rumik-livekit-agent && cd rumik-livekit-agent
python -m venv venv
source venv/bin/activate        # windows: venv\Scripts\activate

pip install "livekit-agents[deepgram,openai,silero]" livekit-plugins-rumik-ai python-dotenv

livekit-plugins-rumik-ai is the official rumik TTS plugin. python-dotenv loads the .env file you’ll create in the next step.
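if you want to confirm the install before going further, a small check like this works. the module paths are assumptions based on the imports agent.py uses (`livekit.agents`, `livekit.plugins.*`, `dotenv`), and `missing_modules` is a hypothetical helper, not part of livekit.

```python
import importlib.util

# module paths agent.py imports; "livekit.plugins.rumik_ai" matches its import line
REQUIRED = [
    "dotenv",
    "livekit.agents",
    "livekit.plugins.deepgram",
    "livekit.plugins.openai",
    "livekit.plugins.silero",
    "livekit.plugins.rumik_ai",
]


def missing_modules(names):
    """Return the module names that can't be found in this environment."""
    missing = []
    for name in names:
        try:
            found = importlib.util.find_spec(name) is not None
        except ModuleNotFoundError:
            # find_spec imports parent packages, so a missing "livekit"
            # raises here instead of returning None
            found = False
        if not found:
            missing.append(name)
    return missing


if __name__ == "__main__":
    print(missing_modules(REQUIRED) or "all packages found")
```

run it inside the activated venv; anything it lists needs another pip install.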

step 2 · add your keys

create a file called .env:

.env

LIVEKIT_URL=wss://your-project.livekit.cloud
LIVEKIT_API_KEY=•••••••••
LIVEKIT_API_SECRET=•••••••••
RUMIK_API_KEY=rk_live_•••••••••
DEEPGRAM_API_KEY=•••••••••
OPENAI_API_KEY=sk-•••••••••

the livekit values come from your livekit cloud project settings.
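a missing or blank key usually shows up later as a confusing auth error. a hypothetical `missing_keys` helper like the one below can fail fast instead; it only checks that each variable is set, not that the value is valid.

```python
import os

# the six variables from .env that the worker and its plugins read
REQUIRED_KEYS = [
    "LIVEKIT_URL",
    "LIVEKIT_API_KEY",
    "LIVEKIT_API_SECRET",
    "RUMIK_API_KEY",
    "DEEPGRAM_API_KEY",
    "OPENAI_API_KEY",
]


def missing_keys(env=None):
    """Return required keys that are absent or blank (defaults to os.environ)."""
    env = os.environ if env is None else env
    return [key for key in REQUIRED_KEYS if not env.get(key, "").strip()]


if __name__ == "__main__":
    missing = missing_keys()
    if missing:
        raise SystemExit(f"missing keys: {', '.join(missing)}")
    print("all keys set")
```

call it after `load_dotenv()` so the values from .env are already in `os.environ`.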

step 3 · write the agent

create agent.py. livekit wires voice activity detection, STT, the LLM and TTS together in an AgentSession. the rumik part is the tts= line.

agent.py

from dotenv import load_dotenv

from livekit.agents import Agent, AgentSession, JobContext, WorkerOptions, cli
from livekit.plugins import deepgram, openai, silero, rumik_ai

load_dotenv()

# muga is steered by a [tone] tag, so we tell the LLM to add one
INSTRUCTIONS = """
You write text spoken by the Silk Muga 1 text-to-speech model.

- Output only the final tagged text, no markdown or notes.
- Romanised Hinglish only (Latin script). Never Devanagari.
- Start every reply with one tone tag, as the first token:
  [happy], [excited], [sad], [angry], [neutral], [whisper].
- Keep replies short: 1 to 2 sentences.
"""


async def entrypoint(ctx: JobContext):
    await ctx.connect()

    session = AgentSession(
        stt=deepgram.STT(),
        llm=openai.LLM(model="gpt-4o-mini"),
        tts=rumik_ai.TTS(model="muga"),   # the rumik voice
        vad=silero.VAD.load(),            # detects when you start/stop talking
    )

    await session.start(
        agent=Agent(instructions=INSTRUCTIONS),
        room=ctx.room,
    )


if __name__ == "__main__":
    cli.run_app(WorkerOptions(entrypoint_fnc=entrypoint))

rumik_ai.TTS reads RUMIK_API_KEY from the environment automatically. the system prompt tells the LLM to start every reply with one of muga’s [tone] tags; see prompting muga for the full rules.
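an LLM won’t follow a prompt perfectly every time, so it can help to check replies in your own tests or logs. this is a minimal sketch, assuming the six tags and the no-Devanagari rule from INSTRUCTIONS; `split_tone` and `follows_rules` are hypothetical helpers, not part of the rumik plugin.

```python
import re

# the six tone tags listed in INSTRUCTIONS
TONES = {"happy", "excited", "sad", "angry", "neutral", "whisper"}
# a leading [tag], then the rest of the reply
TAG = re.compile(r"^\s*\[([a-z]+)\]\s*(.*)$", re.DOTALL)
# the Devanagari Unicode block, which the prompt forbids
DEVANAGARI = re.compile(r"[\u0900-\u097F]")


def split_tone(reply):
    """Return (tone, text) if reply starts with an allowed tag, else (None, reply).
    Only the first tag is checked."""
    m = TAG.match(reply)
    if m and m.group(1) in TONES:
        return m.group(1), m.group(2)
    return None, reply


def follows_rules(reply):
    """True if the reply starts with an allowed tag and contains no Devanagari."""
    tone, _ = split_tone(reply)
    return tone is not None and not DEVANAGARI.search(reply)


if __name__ == "__main__":
    print(split_tone("[excited] Arre wah, kya baat hai!"))
```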

step 4 · run it

start the agent worker:

python agent.py dev

then open the livekit agents playground, connect to your project, and talk. you’ll hear muga reply in hinglish. audio streams back as it’s generated, and livekit handles interruptions and turn-taking for you.
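to get a feel for what “turn-taking” involves, here’s a toy end-of-turn detector: it watches per-frame speech probabilities (the kind of signal a VAD such as silero produces) and ends a turn after enough consecutive quiet frames. this is a simplified illustration with made-up defaults, not silero’s or livekit’s actual algorithm, which is more involved.

```python
def end_of_turn_frames(speech_probs, threshold=0.5, min_silence_frames=20):
    """Return indices of frames where a turn is judged to have ended.

    A turn starts at the first frame at or above `threshold` and ends once
    `min_silence_frames` consecutive frames fall below it.
    """
    ends = []
    speaking = False
    silence = 0
    for i, p in enumerate(speech_probs):
        if p >= threshold:
            # speech: (re)start the turn and reset the silence counter
            speaking = True
            silence = 0
        elif speaking:
            # quiet frame inside a turn: count it toward the end-of-turn pause
            silence += 1
            if silence >= min_silence_frames:
                ends.append(i)
                speaking = False
                silence = 0
    return ends


if __name__ == "__main__":
    probs = [0.0] * 5 + [0.9] * 10 + [0.1] * 3
    print(end_of_turn_frames(probs, min_silence_frames=3))
```

a short pause mid-sentence resets nothing as long as it stays under `min_silence_frames`; that trade-off between cutting people off and replying slowly is what a real turn detector tunes.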

customize it

change the voice

use rumik_ai.TTS(model="mulberry", description="...") to design a custom voice from a text description. see prompting mulberry.

pin a preset voice

rumik_ai.TTS(model="mulberry", speaker="speaker_1") keeps one fixed voice across the conversation.

tune the personality

edit the INSTRUCTIONS system prompt. that’s the agent’s character.
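when you rewrite the prompt, it’s easy to drop the tone-tag rules muga depends on. one way to avoid that is to keep the rules fixed and swap only the persona. `build_instructions` and `MUGA_RULES` are hypothetical names; the rule text is copied from INSTRUCTIONS in agent.py.

```python
# the formatting rules muga needs, copied from INSTRUCTIONS in agent.py
MUGA_RULES = """\
- Output only the final tagged text, no markdown or notes.
- Romanised Hinglish only (Latin script). Never Devanagari.
- Start every reply with one tone tag, as the first token:
  [happy], [excited], [sad], [angry], [neutral], [whisper].
- Keep replies short: 1 to 2 sentences."""


def build_instructions(persona: str) -> str:
    # the persona varies; the muga rules are always appended unchanged
    return (
        "You write text spoken by the Silk Muga 1 text-to-speech model.\n\n"
        f"Persona: {persona.strip()}\n\n"
        f"{MUGA_RULES}\n"
    )


if __name__ == "__main__":
    print(build_instructions("a cheerful chai-stall owner in Delhi who loves cricket"))
```

pass the result to Agent(instructions=...) in place of INSTRUCTIONS.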

ship to phone or web

the same agent can answer phone calls through livekit telephony and join web and mobile apps through livekit’s client SDKs.

next steps

livekit integration reference for every constructor option.
prompting muga and prompting mulberry.
prefer pipecat? build the same agent with pipecat.
