A Hands-On Guide to Creating a ChatGPT Voice Interface in 10 minutes
OpenAI APIs like text-davinci-003 work best when given enough context, and typing all of that context can be tedious. Instead, you can use Python's SpeechRecognition module together with a text-to-speech service from Azure to talk to the GPT APIs. This guide is inspired by Nikhil Sehgal's https://www.youtube.com/watch?v=uITSqxVvO24.
Follow these easy steps to create a quick-and-dirty voice interface for GPT:
- Create an OpenAI account to get an API key (free for 14 days)
- Create an Azure Cognitive Services free account
- Create a .env file
- Install Python modules
- Create new Python script
- Create a batch file to execute this script so that you can double click and start talking…
- Create a Desktop Shortcut
Begin by creating an application root folder called openai-assistant, e.g. C:\openai-assistant
Step 1: Go to OpenAI.com and create an account, then click your Profile > View API Keys > Create a new secret key
Copy this value (starting with sk-) to a notepad; you will need it in Step 3.
Step 2: Go to the Azure Portal (create an account if you don't have one already). Type "Speech Service" in the search box at the top and click Speech Services. Click New. Create a new Resource Group if you need to. The Name will have to be unique. Select Pricing Tier = Free and click Review and Create.
Wait for the Speech Service to be deployed, then navigate to the Keys and Endpoint blade on the left and copy Key 1 and Region for use in Step 3.
Step 3: Create a .env file in your application root folder, then add the following 3 lines to it. Replace the placeholders with your OpenAI API key, Azure Speech Service Key 1, and Azure Speech Service Region.
OPENAI_API_KEY="<OpenAI API Key starting with sk->"
SPEECH_KEY="<Azure Speech Service Key1>"
SPEECH_REGION="<Azure Speech Service Region>"
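Before moving on, you can sanity-check the file with a small throwaway script (a hypothetical helper, not part of main.py; it parses the .env file directly, so it works even before the packages in Step 4 are installed):

```python
# check_env.py - optional sanity check for the .env file (hypothetical helper)
import os

def read_env(path=".env"):
    """Parse simple KEY="value" lines from a .env file into a dict."""
    values = {}
    with open(path) as f:
        for line in f:
            line = line.strip()
            # skip blank lines, comments, and anything without KEY=value shape
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            values[key.strip()] = value.strip().strip('"')
    return values

if __name__ == "__main__":
    if os.path.exists(".env"):
        env = read_env()
        for name in ("OPENAI_API_KEY", "SPEECH_KEY", "SPEECH_REGION"):
            print(f"{name}: {'set' if env.get(name) else 'MISSING'}")
    else:
        print("No .env file found in the current folder")
```

Run it from the application root folder; all three names should report "set".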
Step 4: Create a requirements.txt file in your application root folder and copy the following lines into it. Then install the dependencies with pip:
azure-cognitiveservices-speech==1.25.0
SpeechRecognition==3.8.1
openai==0.26.4
keyboard==0.13.5
PyAudio==0.2.13
python-dotenv==0.21.1
pip install -r requirements.txt
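If you prefer to keep these packages isolated from your system Python, you can install them into a virtual environment first (an optional setup step; the commands below assume Windows, matching the batch file in Step 6):

```shell
cd C:\openai-assistant
python -m venv venv
venv\Scripts\activate
pip install -r requirements.txt
```

If you go this route, activate the venv in the Step 6 batch file too, so the double-click launch uses the same environment.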
Step 5: Create a main.py file and copy the following script into it.
import openai
import speech_recognition as sr
from dotenv import load_dotenv
import os
import azure.cognitiveservices.speech as speechsdk
import keyboard

load_dotenv()
openai.api_key = os.getenv("OPENAI_API_KEY")
speech_key, speech_region = os.getenv("SPEECH_KEY"), os.getenv("SPEECH_REGION")

# Uses Azure Cognitive Services to speak text aloud in a natural-sounding British English voice
def synthesize_speech(text):
    speech_config = speechsdk.SpeechConfig(subscription=speech_key, region=speech_region)
    audio_config = speechsdk.audio.AudioOutputConfig(use_default_speaker=True)
    speech_config.speech_synthesis_voice_name = "en-GB-RyanNeural"
    speech_synthesizer = speechsdk.SpeechSynthesizer(speech_config=speech_config, audio_config=audio_config)
    speech_synthesizer.speak_text_async(text).get()

# Listens on the microphone, transcribes the question, sends it to OpenAI,
# and speaks the answer; calls itself recursively for follow-up questions
def prompt_to_listen(recognizer, prompt):
    print(prompt)
    user_question = ""  # default, so a failed recognition below does not raise a NameError
    with sr.Microphone() as source:
        recognizer.adjust_for_ambient_noise(source, duration=0.2)
        audio = recognizer.listen(source, timeout=10)
        try:
            user_question = recognizer.recognize_google(audio_data=audio)
            print(f"\n<< {user_question.capitalize()}")
        except sr.UnknownValueError:
            print("Google Speech Recognition could not understand audio")
        except sr.RequestError as e:
            print(f"Could not request results from Google Speech Recognition service; {e}")
    if len(user_question) > 0:
        response = openai.Completion.create(
            model="text-davinci-003",
            prompt=user_question,
            temperature=0,
            max_tokens=250,  # you can go up to 4000, but it impacts cost
            top_p=1,
            frequency_penalty=0,
            presence_penalty=0,
            stop=None
        )
        response_text = response.choices[0].text
        print(response_text)
        print(f"\nTotal Tokens Consumed: {response.usage.total_tokens}")
        synthesize_speech(response_text)
        print("\nPress Spacebar to ask a follow-up question, or Enter to end.")
        while True:
            key = keyboard.read_key()  # read once per loop so Space and Enter are not swallowed
            if key == "space":
                prompt_to_listen(recognizer, "\nGo ahead, I am listening...")
                break
            if key == "enter":
                break

if __name__ == "__main__":
    recognizer = sr.Recognizer()
    recognizer.energy_threshold = 200
    recognizer.dynamic_energy_threshold = True
    prompt_to_listen(recognizer, "\nHow would you like AI to help you? Speak, I am listening...")
At this point you can test your main.py file by opening a command prompt in the application root folder and running:
python main.py
Step 6: Create a batch file in your application root folder that you can double-click to run the script. Add this line to it:
cd c:\openai-assistant\ && python main.py && pause
Step 7: Right-click the batch file and select Send to > Desktop (create shortcut) to create a shortcut on your desktop.
The end result is a command window that prompts you to speak your question and talks back to you with the text received from the OpenAI API. It then prompts you to ask a follow-up question. Enjoy!
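One caveat: the script sends each question to the API on its own, so follow-up questions do not automatically see earlier answers. If you want that, one simple approach is to accumulate the conversation and send the whole transcript as the prompt. A minimal sketch (build_prompt is a hypothetical helper, not part of main.py; you would pass its result as the prompt= argument to openai.Completion.create and append each answer to history):

```python
def build_prompt(history, new_question):
    """Build a single completion prompt from prior (question, answer)
    turns plus the new question, in a simple Q:/A: transcript format."""
    turns = [f"Q: {q}\nA: {a}" for q, a in history]
    turns.append(f"Q: {new_question}\nA:")
    return "\n".join(turns)

# Example: one earlier turn plus a follow-up question
history = [("What is the capital of France?", "Paris.")]
print(build_prompt(history, "How many people live there?"))
```

Keep in mind that a growing transcript counts against the model's token limit (and your cost), so you may want to trim the oldest turns once the prompt gets long.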
Let me know if you run into any issues. I will be happy to help and update this article.