Speech-to-text conversion is not for the faint-hearted. It usually needs a big computer and plenty of computing power. However, the Raspberry Pi Zero is here to give you a taste of it for pennies!

In this DIY we are going to play with the Raspberry Pi Zero for speech recognition. Speech recognition is a branch of artificial intelligence. Understanding phonetics and then reproducing them as text needs a lot of training and refinement. The cost of the project will be about INR 2500 [roughly USD $30]. This feat is probably not practical on MCU boards such as the Arduino, Arduino Mega 2560, ESP32 or others.

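Sound output on Pi Zero: There is no analog sound output on the Pi Zero board. Normally one has to connect an HDMI device [in other words, a TV] to the Pi Zero to get sound. However, there is an easier way, which is shown in the schematic. The four PWM-capable GPIO pins [12, 13, 18, 19] can be used for audio output: GPIO 12 and 18 carry PWM channel 0, while GPIO 13 and 19 carry PWM channel 1, so pick one pin from each pair for stereo. It takes just a single line in the /boot/config.txt file:

dtoverlay=audremap,pins_18_13

The audremap entry in the overlays README documents pins_12_13 [the default] and pins_18_19 as pin-pair options.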

Open the file, go to the bottom, add this line and reboot. After the reboot, audio will be available on those two GPIO pins. Connect a headphone at the IN-L & IN-R points to hear the sound. So simple! The PAM8403 amplifies the sound further for a speaker.

Schematic of audio output on Pi-zero
$> sudo nano /boot/config.txt #add the line and reboot.

On the Pi 5, and on any board running Raspberry Pi OS Bookworm or later, config.txt is at /boot/firmware/config.txt instead. The PAM8403 is a 5 V stereo amplifier, but it works on 3.3 V too.
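If you would rather script the edit than open nano, the following is a minimal Python sketch. It appends the overlay line only when it is not already present, so running it twice is harmless. The function name ensure_config_line is my own choice for illustration, and the path is passed in because it differs between OS releases.

```python
from pathlib import Path

# The overlay line quoted in this article.
OVERLAY_LINE = "dtoverlay=audremap,pins_18_13"


def ensure_config_line(config_path, line=OVERLAY_LINE):
    """Append `line` to the config file unless an identical line exists.

    Returns True if the file was changed, False if the line was already there.
    """
    path = Path(config_path)
    text = path.read_text() if path.exists() else ""
    existing = [entry.strip() for entry in text.splitlines()]
    if line in existing:
        return False
    # Make sure the new entry starts on its own line.
    if text and not text.endswith("\n"):
        text += "\n"
    path.write_text(text + line + "\n")
    return True
```

Call ensure_config_line("/boot/config.txt") from a script started with sudo, then reboot for the overlay to take effect.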

BOM of this project [source: Alibaba.com]:
  1. Raspberry Pi Zero 2W - USD $25
  2. PAM8403 Amplifier - USD $1
  3. 4 ohm Speakers - 2 nos. USD $1 / piece
  4. USB microphone - USD $1 to $20 [depending on variety]
  5. OTG cable for connecting USB microphone - USD $1 or less

Project Software

Here we will experiment with three software packages: spchcat, pocketsphinx & Google STT. Installing pocketsphinx & Google STT is easy & straightforward. Both are Python pip modules, so you can use them only from Python code. spchcat, on the other hand, is a command-line program.

pocketsphinx: [the most lightweight of the three; works mainly on English phonetics]
$> pip install pocketsphinx
$> sudo apt-get install -y python3-pocketsphinx pocketsphinx-en-us

Here is a small Python program for live speech transcription:

from pocketsphinx import LiveSpeech

# Open the default microphone and start decoding
speech = LiveSpeech()
print("Listening...")

# Each iteration yields one recognised phrase
for phrase in speech:
    print(f"Transcript: {phrase}")

Run this code, speak into the USB microphone, and you will see a live transcription. While this works well on a more powerful computer like the Raspberry Pi 4 or 5, it misses many words on the Pi Zero.

spchcat: spchcat is open-source software that uses TensorFlow-based recognition models. It supports 46 language models, including the Indian languages Tamil & Bengali.

Software Source

Help File & Codes: github.com/petewarden/spchcat 
Software: github.com/petewarden/spchcat/releases/download/v0.0.2-rpi-alpha/spchcat_0.0-2_armhf.deb

spchcat is a command-line program that can read directly from WAV files. It is a big package [1.2 GB] and will not install on a Pi Zero directly. Get the .deb file onto the Pi Zero's SD card, either by downloading it there or by downloading it on a Raspberry Pi 4 and transferring it with scp [secure copy]. Then move the SD card to a Raspberry Pi B+ or 4, boot from it, and double-click the downloaded file to install it. Installation takes about 25 minutes on a Raspberry Pi 4. Put the SD card back into the Pi Zero and spchcat will work there.

That’s the trick to cheat a Pi-zero board!
$> spchcat --source=system   # take sound from the system and transcribe it on the terminal

$> spchcat /home/your-path/audio.wav   # print the transcript on the terminal

$> spchcat /home/your-path/audio.wav > /home/your-path/audio.txt   # write the transcript to audio.txt
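The same commands can be driven from Python when you want to transcribe files from a script. This is only a sketch: it assumes spchcat is on the PATH and prints the transcript to standard output, as in the commands above. The helper names build_spchcat_command and transcribe_to_file are mine, not part of spchcat.

```python
import subprocess


def build_spchcat_command(wav_path=None, source=None, binary="spchcat"):
    """Return the argument list for one of the spchcat invocations shown above."""
    if wav_path is not None:
        return [binary, str(wav_path)]
    if source is not None:
        return [binary, f"--source={source}"]
    raise ValueError("give either wav_path or source")


def transcribe_to_file(wav_path, txt_path, binary="spchcat"):
    """Run spchcat on a WAV file and save whatever it prints to txt_path."""
    result = subprocess.run(
        build_spchcat_command(wav_path=wav_path, binary=binary),
        capture_output=True,  # collect stdout instead of printing it
        text=True,
        check=True,           # raise CalledProcessError if spchcat fails
    )
    with open(txt_path, "w", encoding="utf-8") as out:
        out.write(result.stdout)
    return result.stdout
```

For example, transcribe_to_file("/home/your-path/audio.wav", "/home/your-path/audio.txt") mirrors the third command above.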

Sample WAV files for testing with spchcat:

Download the sample WAV files from https://github.com/coqui-ai/STT/releases/download/v1.1.0/audio-1.1.0.tar.gz to test spchcat. Listen to these files to understand the audio quality required for a clear transcription. Even clear 8000 Hz audio can be transcribed by spchcat! spchcat works great for WAV file transcription on the Pi Zero.
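To compare your own recordings with the sample files, it helps to look at their sample rate, bit depth and channel count. The sketch below uses only the wave module from the Python standard library. The function name describe_wav is mine, and it only handles uncompressed PCM WAV files, which is what the wave module reads.

```python
import wave


def describe_wav(path):
    """Return the basic format of a PCM WAV file as a dictionary."""
    with wave.open(str(path), "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_rate_hz": rate,
            "bits_per_sample": wav.getsampwidth() * 8,
            "duration_s": frames / rate if rate else 0.0,
        }
```

Run it on one of the sample files and on your own recording, then compare the two dictionaries.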

Google STT: Google Speech-to-Text supports 125+ languages as of now! It works in two ways: listen-and-transcribe, or listen-record-transcribe. The first mode, which transcribes in the gaps between utterances, works well on a Raspberry Pi 4 or Pi 5. On the Raspberry Pi Zero the second mode works better: it records for a fixed duration, then stops and transcribes. To use Google STT you have to install the pip module.

Necessary software to get Google STT on Raspberry Pi Zero:
$> pip install SpeechRecognition
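As a rough illustration of the record-then-transcribe mode, here is a minimal sketch that sends an already recorded WAV file to Google through the SpeechRecognition module. It is not one of the project scripts. The function names and the default language tag are my own choices, and an internet connection is assumed because recognize_google calls Google's web service. The language argument sets the spoken language, so "ja-JP" returns Japanese text; translating that into English would be a separate step that is not shown here.

```python
from pathlib import Path


def transcript_path(wav_path):
    """Put the transcript next to the recording: audio.wav -> audio.txt."""
    return Path(wav_path).with_suffix(".txt")


def transcribe_wav(wav_path, language="en-US"):
    """Transcribe a recorded WAV file and write the text to a .txt file."""
    # Imported here so the helper above works without the module installed.
    import speech_recognition as sr

    recognizer = sr.Recognizer()
    with sr.AudioFile(str(wav_path)) as source:
        audio = recognizer.record(source)  # read the whole file into memory

    try:
        text = recognizer.recognize_google(audio, language=language)
    except sr.UnknownValueError:
        text = ""  # speech was not intelligible
    except sr.RequestError as err:
        raise RuntimeError(f"Google STT request failed: {err}") from err

    out = transcript_path(wav_path)
    out.write_text(text + "\n", encoding="utf-8")
    return out
```

Reading from a file keeps the Pi Zero's work small: recording is done first, and the recognition itself happens on Google's side.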

Project code: We provide a few scripts here, with descriptions.
  1. sr1.py  # live speech transcription using Google STT. Works great on powerful computers.
  2. sr2.py  # live speech transcription using PocketSphinx. Works great on powerful computers.
  3. sr3.py  # offline speech transcription. Reads a WAV file and writes to a *.txt file. Uses GTTS. Works great on Raspberry Pi Zero.
  4. sr4.py  # sets a time frame, records live voice & then transcribes it to an audio.txt file. Works great on Raspberry Pi Zero. Study this code carefully to see where to change the language choice [say, for conversion from Japanese speech to English text].
Strictly speaking, sound output is not required on the Pi Zero for this project. But you may want to play back your recorded voice or listen to the English transcription, and for that it is needed.
$> aplay your-file.wav  #to playback your recorded voice

$> espeak-ng -f /home/your-path/text.txt   #text to speech using espeak-ng

$> arecord -D plughw:1,0 -f S16_LE -r 16000 -d 10 audio.wav   # record 10 seconds of 16-bit audio at 16 kHz
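To make the recording settings easier to change from a script, the arecord line can be built in Python. This is a sketch with my own function names. The device plughw:1,0 is taken from the command above and may differ on your board [check with arecord -l]. The second function works out how much audio data to expect: 16000 samples/s x 2 bytes x 10 s = 320000 bytes for a mono recording, plus a small WAV header.

```python
import subprocess


def arecord_command(out_path, device="plughw:1,0", rate_hz=16000, seconds=10):
    """Build the arecord argument list used in this article."""
    return [
        "arecord",
        "-D", device,         # capture device (USB microphone)
        "-f", "S16_LE",       # 16-bit signed little-endian samples
        "-r", str(rate_hz),   # sample rate in Hz
        "-d", str(seconds),   # stop after this many seconds
        str(out_path),
    ]


def expected_pcm_bytes(rate_hz=16000, seconds=10, channels=1, bytes_per_sample=2):
    """Size of the raw audio data, excluding the WAV header."""
    return rate_hz * seconds * channels * bytes_per_sample


def record(out_path, **settings):
    """Run arecord and block until the recording is finished."""
    subprocess.run(arecord_command(out_path, **settings), check=True)
```

For example, record("audio.wav", seconds=5) records five seconds with the same format.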

Aftermath: Speech-to-text conversion is certainly serious business. Serious users rely on costly software & powerful computers to do it online during live events. Doing it on a Raspberry Pi Zero looks like child’s play by comparison, but this small, low-power [5 V only] computer has advantages of its own. Some typical uses that I recommend for Pi Zero speech transcription are:
  1. An on-the-go voice assistant for a hearing-impaired person, which can promptly transcribe the few sentences spoken to them while on the move.
  2. A traveller’s assistant in a foreign land, say for talking to a cab driver or a waiter.
  3. Robotics - Say you want to sell your voice-operated robot toys to Chinese buyers! While consuming minimal power, it can convert foreign speech into English commands for further processing.
  4. Handheld transcribers for use at the point of sale in supermarkets, to quickly help understand the foreign phrases used by buyers or sellers.
  5. On robots, to provide additional intelligence for choosing objects and photographing them. A moderately sized Edge Impulse AI model for face recognition, number counting etc. can be loaded easily on a Raspberry Pi Zero.