Streaming Speech Recognition with Python using a Websocket
We'll use the websockets Python package to write our client. You can install it using:
pip install websockets
The streaming speech recognition API expects 16-bit linear PCM audio. When using REST or Websocket, the audio needs to be base64 encoded. Ideally the audio is sampled at 16 kHz, but this is not a requirement: the service will resample incoming audio if necessary.
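Before streaming, it can help to confirm that a WAV file actually matches the format described above. The following is a minimal sketch using only the standard library `wave` module. The helper name `describe_wav` and the shape of the dictionary it returns are assumptions made for illustration; they are not part of the service's API.

```python
import os
import tempfile
import wave


def describe_wav(wav_path):
    """Inspect a WAV file and report whether it fits the expected input format.

    Hypothetical helper: raises ValueError unless the samples are 16 bit,
    and flags files that are not sampled at 16 kHz (the service would
    resample those, so this is informational only).
    """
    with wave.open(wav_path, "rb") as wav:
        sample_width = wav.getsampwidth()  # bytes per sample, 2 means 16 bit
        sample_rate = wav.getframerate()
        channels = wav.getnchannels()

    if sample_width != 2:
        raise ValueError(
            f"expected 16 bit samples, got {8 * sample_width} bit"
        )

    return {
        "sample_width_bits": 8 * sample_width,
        "sample_rate_hz": sample_rate,
        "channels": channels,
        "needs_resampling": sample_rate != 16000,
    }


if __name__ == "__main__":
    # Build a short silent 16 bit, 16 kHz file so the sketch runs on its own.
    demo_path = os.path.join(tempfile.mkdtemp(), "silence.wav")
    with wave.open(demo_path, "wb") as out:
        out.setnchannels(1)
        out.setsampwidth(2)
        out.setframerate(16000)
        out.writeframes(b"\x00\x00" * 1600)
    print(describe_wav(demo_path))
```

A file that reports `needs_resampling` as true can still be sent as-is, since the sample rate is passed along in the config message.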
Let's say we have a 16-bit WAV file sampled at 16 kHz. We can then stream chunks of audio while simultaneously receiving partial transcriptions. The first message in a stream is a config message, which sets the encoding, sample rate, and language of the incoming audio.
We define a request generator that yields the config message followed by the chunks of audio to be recognized:
import base64
import wave


def generate_requests(wav_path, chunk_width=1024):
    with wave.open(wav_path, 'r') as wav:
        sample_rate = wav.getframerate()

        # The first message configures the stream.
        yield {
            "streamingConfig": {
                "config": {
                    "encoding": "LINEAR16",
                    "sampleRateHertz": sample_rate,
                    "enableWordTimeOffsets": True,
                    "languageCode": "is-IS-x-exp",
                },
                "interimResults": True,
            }
        }

        # Every following message carries one base64-encoded chunk of audio.
        while True:
            # chunk_width is counted in frames, not bytes.
            chunk = wav.readframes(chunk_width)
            if not chunk:
                return
            yield {
                "audioContent": base64.b64encode(chunk).decode('utf-8')
            }
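To get a feel for what a single audio message contains, the arithmetic behind the default `chunk_width` can be sketched as follows. The function `chunk_stats` is a hypothetical helper, and the defaults assume mono 16-bit audio at 16 kHz; a stereo file would double the byte count per frame.

```python
import base64


def chunk_stats(chunk_width=1024, sample_rate=16000, sample_width=2, channels=1):
    """Return (duration_ms, raw_bytes, encoded_chars) for one audio chunk.

    Assumes `chunk_width` is a number of frames, as in wave.readframes(),
    and that each frame holds `channels` samples of `sample_width` bytes.
    """
    raw_bytes = chunk_width * sample_width * channels
    duration_ms = 1000 * chunk_width / sample_rate
    # base64 turns every 3 input bytes into 4 output characters (with padding).
    encoded_chars = len(base64.b64encode(bytes(raw_bytes)))
    return duration_ms, raw_bytes, encoded_chars


print(chunk_stats())  # (64.0, 2048, 2732)
```

Under these assumptions each message carries 64 ms of audio, and base64 encoding grows the payload by roughly a third.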
Let's now write a client using the async interface for websockets:
import asyncio
import websockets
import sys
import os
import json


async def main():
    # The API key is read from the TIRO_SPEECH_KEY environment variable.
    uri = "wss://speech.talgreinir.is/v2beta1/speech:streamingrecognize?token=" + os.environ["TIRO_SPEECH_KEY"]
    async with websockets.connect(uri, ssl=True) as sock:

        async def read():
            try:
                # Text of all results that have been marked final so far.
                out_transcript = ""
                async for m in sock:
                    try:
                        response = json.loads(m)
                        transcript = response["result"]["results"][0]["alternatives"][
                            0
                        ]["transcript"]
                        is_final = response["result"]["results"][0].get(
                            "isFinal", False
                        )
                        current_output = (
                            " ".join((out_transcript, transcript))
                            if out_transcript
                            else transcript
                        )
                        if is_final:
                            out_transcript = current_output
                            transcript = ""
                        # Overwrite the same terminal line with the latest text.
                        print(
                            current_output,
                            end="\r",
                            flush=True,
                        )
                    except (KeyError, IndexError):
                        # Skip messages that carry no transcript.
                        pass
            except websockets.ConnectionClosed:
                print()

        async def send():
            # The path to the WAV file is the first command line argument.
            for m in generate_requests(sys.argv[1]):
                out = json.dumps(m)
                await sock.send(out)

        # Send audio and read transcripts concurrently.
        await asyncio.gather(send(), read())


if __name__ == "__main__":
    asyncio.run(main())
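The response handling in the client can also be pulled out into a small function, which makes it easy to try without a live connection. This is a sketch only: `parse_response` is a hypothetical helper, the message shape is taken from the fields the client above reads, and the sample message is made up for illustration rather than captured from the service.

```python
import json


def parse_response(message):
    """Return (transcript, is_final) for one server message.

    Returns None when the message carries no transcript, for example when
    the "result" key is missing or the list of results is empty.
    """
    response = json.loads(message)
    try:
        result = response["result"]["results"][0]
        transcript = result["alternatives"][0]["transcript"]
    except (KeyError, IndexError):
        return None
    # Interim results may omit "isFinal", so default to False.
    return transcript, result.get("isFinal", False)


# Made-up message in the shape the client expects.
sample = json.dumps(
    {
        "result": {
            "results": [
                {"alternatives": [{"transcript": "halló heimur"}], "isFinal": True}
            ]
        }
    }
)
print(parse_response(sample))  # ('halló heimur', True)
```

Returning None for messages without a transcript keeps the caller's loop free of nested exception handling.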