A Hands-On Guide to Creating a ChatGPT Voice Interface in 10 minutes
OpenAI APIs like text-davinci-003 work best when given enough context, and typing all of that context can be tedious. Instead, you can use Python's SpeechRecognition module together with a text-to-speech service from Azure to talk to the GPT APIs. This guide is inspired by Nikhil Sehgal's https://www.youtube.com/watch?v=uITSqxVvO24.
Follow these easy steps to create a quick-and-dirty voice interface for GPT:
- Create an OpenAI account to get an API key (free for 14 days)
- Create an Azure Cognitive Services free account
- Create a .env file
- Install Python modules
- Create new Python script
- Create a batch file to execute this script so that you can double click and start talking…
- Create a Desktop Shortcut
Begin by creating an application root folder called openai-assistant, e.g. C:\openai-assistant
Step 1: Go to OpenAI.com and create an account, then click your Profile > View API Keys > Create a new secret key
Copy this value (starting with sk-) to a notepad; you will need it in Step 3.
Step 2: Go to the Azure Portal (create an account if you don't have one already). Type "Speech Service" in the search box at the top and click Speech Services. Click New. Create a new Resource Group if you need to. The Name will have to be unique. Select Pricing Tier = Free and click Review and Create.
Wait for the Speech Service to be deployed, then navigate to the Keys and Endpoint blade on the left and copy Key 1 and Region for use in Step 3.
Step 3: Create a .env file in your application root folder, then add the following 3 lines to it. Replace the placeholders with your OpenAI API key, Azure Speech Service Key 1, and Azure Speech Service Region.
OPENAI_API_KEY="<OpenAI API Key starting with sk->"
SPEECH_KEY="<Azure Speech Service Key1>"
SPEECH_REGION="<Azure Speech Service Region>"
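Before moving on, you can sanity-check the file with a small throwaway script (a hypothetical helper, not part of main.py; it parses the .env file directly, so it works even before the packages in Step 4 are installed):

```python
# check_env.py - optional sanity check for the .env file (hypothetical helper)
import os

def read_env(path=".env"):
    """Parse simple KEY="value" lines from a .env file into a dict."""
    values = {}
    with open(path) as f:
        for line in f:
            line = line.strip()
            # skip blank lines, comments, and anything without KEY=value shape
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            values[key.strip()] = value.strip().strip('"')
    return values

if __name__ == "__main__":
    if os.path.exists(".env"):
        env = read_env()
        for name in ("OPENAI_API_KEY", "SPEECH_KEY", "SPEECH_REGION"):
            print(f"{name}: {'set' if env.get(name) else 'MISSING'}")
    else:
        print("No .env file found in the current folder")
```

Run it from the application root folder; all three names should report "set".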
Step 4: Create a requirements.txt file in your application root folder and copy the following lines into it. Then install the dependencies with pip:
azure-cognitiveservices-speech==1.25.0
SpeechRecognition==3.8.1
openai==0.26.4
keyboard==0.13.5
PyAudio==0.2.13
python-dotenv==0.21.1
pip install -r requirements.txt
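If you prefer to keep these packages isolated from your system Python, you can install them into a virtual environment first (an optional setup step; the commands below assume Windows, matching the batch file in Step 6):

```shell
cd C:\openai-assistant
python -m venv venv
venv\Scripts\activate
pip install -r requirements.txt
```

If you go this route, activate the venv in the Step 6 batch file too, so the double-click launch uses the same environment.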
Step 5: Create a main.py file and copy the following script into it.
import openai
import speech_recognition as sr
from dotenv import load_dotenv
import os
import azure.cognitiveservices.speech as speechsdk
import keyboard

load_dotenv()
openai.api_key = os.getenv("OPENAI_API_KEY")
speech_key, speech_region = os.getenv("SPEECH_KEY"), os.getenv("SPEECH_REGION")

# Uses Azure Cognitive Services to speak text aloud in a natural-sounding British English voice
def synthesize_speech(text):
    speech_config = speechsdk.SpeechConfig(subscription=speech_key, region=speech_region)
    audio_config = speechsdk.audio.AudioOutputConfig(use_default_speaker=True)
    speech_config.speech_synthesis_voice_name = "en-GB-RyanNeural"
    speech_synthesizer = speechsdk.SpeechSynthesizer(speech_config=speech_config, audio_config=audio_config)
    speech_synthesizer.speak_text_async(text).get()

# Listens on the microphone, transcribes the question, sends it to OpenAI,
# and speaks the answer; calls itself recursively for follow-up questions
def prompt_to_listen(recognizer, prompt):
    print(prompt)
    user_question = ""  # default, so a failed recognition below does not raise a NameError
    with sr.Microphone() as source:
        recognizer.adjust_for_ambient_noise(source, duration=0.2)
        audio = recognizer.listen(source, timeout=10)
        try:
            user_question = recognizer.recognize_google(audio_data=audio)
            print(f"\n<< {user_question.capitalize()}")
        except sr.UnknownValueError:
            print("Google Speech Recognition could not understand audio")
        except sr.RequestError as e:
            print(f"Could not request results from Google Speech Recognition service; {e}")
    if len(user_question) > 0:
        response = openai.Completion.create(
            model="text-davinci-003",
            prompt=user_question,
            temperature=0,
            max_tokens=250,  # you can go up to 4000, but it impacts cost
            top_p=1,
            frequency_penalty=0,
            presence_penalty=0,
            stop=None
        )
        response_text = response.choices[0].text
        print(response_text)
        print(f"\nTotal Tokens Consumed: {response.usage.total_tokens}")
        synthesize_speech(response_text)
        print("\nPress Spacebar to ask a follow-up question, or Enter to end.")
        while True:
            key = keyboard.read_key()  # read once per loop so Space and Enter are not swallowed
            if key == "space":
                prompt_to_listen(recognizer, "\nGo ahead, I am listening...")
                break
            if key == "enter":
                break

if __name__ == "__main__":
    recognizer = sr.Recognizer()
    recognizer.energy_threshold = 200
    recognizer.dynamic_energy_threshold = True
    prompt_to_listen(recognizer, "\nHow would you like AI to help you? Speak, I am listening...")
At this point you can test your main.py file by opening a command prompt in the application root folder and running:
python main.py
Step 6: Create a batch file in your application root folder that you can double-click to run the script. Add this line to it:
cd c:\openai-assistant\ && python main.py && pause
Step 7: Right-click the batch file and select Send to > Desktop (create shortcut) to create a shortcut on your desktop.
The end result is a command window that prompts you to speak your question and talks back to you with the text received from the OpenAI API. It then prompts you to ask a follow-up question. Enjoy!
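One caveat: the script sends each question to the API on its own, so follow-up questions do not automatically see earlier answers. If you want that, one simple approach is to accumulate the conversation and send the whole transcript as the prompt. A minimal sketch (build_prompt is a hypothetical helper, not part of main.py; you would pass its result as the prompt= argument to openai.Completion.create and append each answer to history):

```python
def build_prompt(history, new_question):
    """Build a single completion prompt from prior (question, answer)
    turns plus the new question, in a simple Q:/A: transcript format."""
    turns = [f"Q: {q}\nA: {a}" for q, a in history]
    turns.append(f"Q: {new_question}\nA:")
    return "\n".join(turns)

# Example: one earlier turn plus a follow-up question
history = [("What is the capital of France?", "Paris.")]
print(build_prompt(history, "How many people live there?"))
```

Keep in mind that a growing transcript counts against the model's token limit (and your cost), so you may want to trim the oldest turns once the prompt gets long.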
Let me know if you run into any issues. I will be happy to help and update this article.