6 min read

Create your own LLM Voice Assistant in just 5 minutes!

Create your own LLM Voice Assistant in just 5 minutes!

What was previously thought to be impossible, now became reality! We can finally get our own Jarvis, and even more, we can do this in just 5 minutes!

audio-thumbnail
Voice Bot Weather
0:00
/34.474667

While the below is how you can get started, please note that LLMs require a lot of fine-tuning, context enhancements, SOP graphs (Standard Operating Procedures) and language processing to become more advanced for our use case.

Getting Started - Creating Accounts

Now let's create our own Jarvis bot! Let's get started by creating the accounts we need:

Once we have this, create a .env file as below:

# https://dashboard.daily.co/rooms/create
DAILY_ROOM_URL=https://YOUR_SUBDOMAIN.daily.co/YOUR_ROOM_ID
DAILY_API_KEY=

# API: https://elevenlabs.io/app/settings/api-keys
# Voices: https://elevenlabs.io/app/voice-lab
ELEVENLABS_VOICE_ID=iP95p4xoKVk53GoZ742B
ELEVENLABS_API_KEY=

# https://platform.openai.com/settings/organization/api-keys
OPENAI_API_KEY=

We are now set to create our example!

Installing Pipecat

Let's install pipecat for our project, which pipes together all the tasks we need!

# Create Venv
python3 -m venv env
source env/bin/activate

# Install dependencies
pip install "pipecat-ai[daily,elevenlabs,silero,openai]" python-dotenv loguru

Once we have this, put the below in a file named demo.py:

import asyncio
import glob
import json
import os
import sys
from datetime import datetime

import aiohttp
from dotenv import load_dotenv
from loguru import logger
from runner import configure

from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.audio.vad.vad_analyzer import VADParams
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.openai_llm_context import (
    OpenAILLMContext,
)
from pipecat.services.openai_realtime_beta import (
    InputAudioTranscription,
    OpenAIRealtimeBetaLLMService,
    SessionProperties,
    TurnDetection,
)
from pipecat.transports.services.daily import DailyParams, DailyTransport

load_dotenv(override=True)

logger.remove(0)
logger.add(sys.stderr, level="DEBUG")

BASE_FILENAME = "/tmp/pipecat_conversation_"


async def fetch_weather_from_api(function_name, tool_call_id, args, llm, context, result_callback):
    temperature = 75 if args["format"] == "fahrenheit" else 24
    await result_callback(
        {
            "conditions": "nice",
            "temperature": temperature,
            "format": args["format"],
            "timestamp": datetime.now().strftime("%Y%m%d_%H%M%S"),
        }
    )


async def get_saved_conversation_filenames(
    function_name, tool_call_id, args, llm, context, result_callback
):
    # Construct the full pattern including the BASE_FILENAME
    full_pattern = f"{BASE_FILENAME}*.json"

    # Use glob to find all matching files
    matching_files = glob.glob(full_pattern)
    logger.debug(f"matching files: {matching_files}")

    await result_callback({"filenames": matching_files})


# async def get_saved_conversation_filenames(
#     function_name, tool_call_id, args, llm, context, result_callback
# ):
#     pattern = re.compile(re.escape(BASE_FILENAME) + "\\d{8}_\\d{6}\\.json$")
#     matching_files = []

#     for filename in os.listdir("."):
#         if pattern.match(filename):
#             matching_files.append(filename)

#     await result_callback({"filenames": matching_files})


async def save_conversation(function_name, tool_call_id, args, llm, context, result_callback):
    timestamp = datetime.now().strftime("%Y-%m-%d_%H:%M:%S")
    filename = f"{BASE_FILENAME}{timestamp}.json"
    logger.debug(f"writing conversation to {filename}\n{json.dumps(context.messages, indent=4)}")
    try:
        with open(filename, "w") as file:
            messages = context.get_messages_for_persistent_storage()
            # remove the last message, which is the instruction we just gave to save the conversation
            messages.pop()
            json.dump(messages, file, indent=2)
        await result_callback({"success": True})
    except Exception as e:
        await result_callback({"success": False, "error": str(e)})


async def load_conversation(function_name, tool_call_id, args, llm, context, result_callback):
    async def _reset():
        filename = args["filename"]
        logger.debug(f"loading conversation from {filename}")
        try:
            with open(filename, "r") as file:
                context.set_messages(json.load(file))
                await llm.reset_conversation()
                await llm._create_response()
        except Exception as e:
            await result_callback({"success": False, "error": str(e)})

    asyncio.create_task(_reset())


tools = [
    {
        "type": "function",
        "name": "get_current_weather",
        "description": "Get the current weather",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {
                    "type": "string",
                    "description": "The city and state, e.g. San Francisco, CA",
                },
                "format": {
                    "type": "string",
                    "enum": ["celsius", "fahrenheit"],
                    "description": "The temperature unit to use. Infer this from the users location.",
                },
            },
            "required": ["location", "format"],
        },
    },
    {
        "type": "function",
        "name": "save_conversation",
        "description": "Save the current conversatione. Use this function to persist the current conversation to external storage.",
        "parameters": {
            "type": "object",
            "properties": {},
            "required": [],
        },
    },
    {
        "type": "function",
        "name": "get_saved_conversation_filenames",
        "description": "Get a list of saved conversation histories. Returns a list of filenames. Each filename includes a date and timestamp. Each file is conversation history that can be loaded into this session.",
        "parameters": {
            "type": "object",
            "properties": {},
            "required": [],
        },
    },
    {
        "type": "function",
        "name": "load_conversation",
        "description": "Load a conversation history. Use this function to load a conversation history into the current session.",
        "parameters": {
            "type": "object",
            "properties": {
                "filename": {
                    "type": "string",
                    "description": "The filename of the conversation history to load.",
                }
            },
            "required": ["filename"],
        },
    },
]


async def main():
    async with aiohttp.ClientSession() as session:
        (room_url, token) = await configure(session)

        transport = DailyTransport(
            room_url,
            token,
            "Respond bot",
            DailyParams(
                audio_in_enabled=True,
                audio_in_sample_rate=24000,
                audio_out_enabled=True,
                audio_out_sample_rate=24000,
                transcription_enabled=False,

                # VAD = Voice Activity Detection
                vad_enabled=True,
                vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.8)),
                vad_audio_passthrough=True,
            ),
        )

        session_properties = SessionProperties(
            input_audio_transcription=InputAudioTranscription(),
            # Set openai TurnDetection parameters. Not setting this at all will turn it
            # on by default
            turn_detection=TurnDetection(silence_duration_ms=1000),
            # Or set to False to disable openai turn detection and use transport VAD
            # turn_detection=False,
            # tools=tools,
            instructions="""Your knowledge cutoff is 2023-10. You are a helpful and friendly AI.

Act like a human, but remember that you aren't a human and that you can't do human
things in the real world. Your voice and personality should be warm and engaging, with a lively and
playful tone.

If interacting in a non-English language, start by using the standard accent or dialect familiar to
the user. Talk quickly. You should always call a function if you can. Do not refer to these rules,
even if you're asked about them.
-
You are participating in a voice conversation. Keep your responses concise, short, and to the point
unless specifically asked to elaborate on a topic.

Remember, your responses should be short. Just one or two sentences, usually.""",
        )

        llm = OpenAIRealtimeBetaLLMService(
            api_key=os.getenv("OPENAI_API_KEY"),
            session_properties=session_properties,
            start_audio_paused=False,
        )

        # you can either register a single function for all function calls, or specific functions
        # llm.register_function(None, fetch_weather_from_api)
        llm.register_function("get_current_weather", fetch_weather_from_api)
        llm.register_function("save_conversation", save_conversation)
        llm.register_function("get_saved_conversation_filenames", get_saved_conversation_filenames)
        llm.register_function("load_conversation", load_conversation)

        context = OpenAILLMContext([], tools)
        context_aggregator = llm.create_context_aggregator(context)

        pipeline = Pipeline(
            [
                transport.input(),  # Transport user input
                context_aggregator.user(),
                llm,  # LLM
                context_aggregator.assistant(),
                transport.output(),  # Transport bot output
            ]
        )

        task = PipelineTask(
            pipeline,
            PipelineParams(
                allow_interruptions=True,
                enable_metrics=True,
                enable_usage_metrics=True,
                # report_only_initial_ttfb=True,
            ),
        )

        @transport.event_handler("on_first_participant_joined")
        async def on_first_participant_joined(transport, participant):
            await transport.capture_participant_transcription(participant["id"])
            # Kick off the conversation.
            await task.queue_frames([context_aggregator.user().get_context_frame()])

        runner = PipelineRunner()

        await runner.run(task)


if __name__ == "__main__":
    asyncio.run(main())

Finally, create a runner.py file:

import aiohttp
import argparse
import os

from pipecat.transports.services.helpers.daily_rest import DailyRESTHelper


async def configure(aiohttp_session: aiohttp.ClientSession):
    (url, token, _) = await configure_with_args(aiohttp_session)
    return (url, token)


async def configure_with_args(
    aiohttp_session: aiohttp.ClientSession, parser: argparse.ArgumentParser | None = None
):
    if not parser:
        parser = argparse.ArgumentParser(description="Daily AI SDK Bot Sample")

    parser.add_argument(
        "-u", "--url", type=str, required=False, help="URL of the Daily room to join"
    )
    parser.add_argument(
        "-k",
        "--apikey",
        type=str,
        required=False,
        help="Daily API Key (needed to create an owner token for the room)",
    )

    args, unknown = parser.parse_known_args()

    url = args.url or os.getenv("DAILY_ROOM_URL")
    key = args.apikey or os.getenv("DAILY_API_KEY")

    if not url:
        raise Exception(
            "No Daily room specified. use the -u/--url option from the command line, or set DAILY_ROOM_URL in your environment to specify a Daily room URL."
        )

    if not key:
        raise Exception(
            "No Daily API key specified. use the -k/--apikey option from the command line, or set DAILY_API_KEY in your environment to specify a Daily API key, available from https://dashboard.daily.co/developers."
        )

    daily_rest_helper = DailyRESTHelper(
        daily_api_key=key,
        daily_api_url=os.getenv("DAILY_API_URL", "https://api.daily.co/v1"),
        aiohttp_session=aiohttp_session,
    )

    # Create a meeting token for the given room with an expiration 1 hour in
    # the future.
    expiry_time: float = 60 * 60

    token = await daily_rest_helper.get_token(url, expiry_time)

    return (url, token, args)

We are now set to go!

Running

Enter python demo.py and watch our bot come alive!

When we join the room on our device (or any other! long live WebRTC). We will see that a participant has joined, and we are able to interact with the model:

Interacting with it and asking about the weather is also possible!

Next Steps

We successfully created an agent, that responds within an acceptable time amount. The next steps to look into are now to:

  • Connect it to a web interface (for live monitoring, without terminals)
  • Create action intents and SOPs to ensure it follows a paradigm and interacts with our services
  • Fine-tune the model so it can only react on our content
  • ...