With voice assistants becoming ubiquitous, we are getting used to things listening to us. In this article we explore how to turn a headless Raspberry Pi Zero W into an offline voice assistant.
Disclaimer: This is a work in progress. This one is just barely working. And I didn’t even proof-read it thoroughly.
Hardware
We are using a Raspberry Pi Zero W and a Keyestudio ReSpeaker 2-Mic Pi HAT, which was originally designed by Seeed Studio. The components look like this:
The SD card is a massively oversized one, rated U1 and advertised at 100 MiB/s:
Disk /dev/mmcblk0: 183,4 GiB, 196865949696 bytes, 384503808 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: dos
Disk identifier: 0x00000000
Device Boot Start End Sectors Size Id Type
/dev/mmcblk0p1 32768 384503807 384471040 183,3G 7 HPFS/NTFS/exFAT
Software
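Download a Raspbian Lite image from https://www.raspberrypi.org/downloads/raspbian/ and write it to the SD card. Double-check that of= points at the card, because dd overwrites its target without asking: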
sudo dd bs=4M if=2019-09-26-raspbian-buster-lite.img of=/dev/mmcblk0 status=progress conv=fsync
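Before writing an image like this, it can be worth comparing the download against the SHA-256 checksum published alongside it. The following is a minimal Python sketch of that check; the helper names are made up for illustration, and the expected hash is something you would have to copy from the download page yourself.

```python
import hashlib


def sha256_of(path, chunk_size=1024 * 1024):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps reading until read() returns b""
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def image_matches(path, expected_hex):
    """Compare a file's digest with a published checksum (case-insensitive)."""
    return sha256_of(path) == expected_hex.strip().lower()


# Usage (the checksum string is a placeholder, not a real value):
# image_matches("2019-09-26-raspbian-buster-lite.img", "<checksum from the download page>")
```

Reading in chunks keeps memory use small, which matters little on a desktop but makes the same helper usable on the Pi itself.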
And here we enjoy the U1 quality of our SD card:
After dd-ing the image we mount it, enable SSH and prepare WLAN access:
cd /media/rawland/boot
touch ssh # enable ssh
touch wpa_supplicant.conf # we enter WIFI in a second
Open wpa_supplicant.conf with your favorite editor and enter:
ctrl_interface=DIR=/var/run/wpa_supplicant GROUP=netdev
update_config=1
country=US
network={
ssid="YOURSSID"
psk="YOURPASSWORD"
scan_ssid=1
}
Watch out: your country code might be different!
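If you set up more than one card, the file can also be generated instead of typed. This is a rough Python sketch, not part of the original setup: the function name and the template constant are my own, and it assumes the SSID and passphrase contain no double quotes.

```python
TEMPLATE = """ctrl_interface=DIR=/var/run/wpa_supplicant GROUP=netdev
update_config=1
country={country}
network={{
ssid="{ssid}"
psk="{psk}"
scan_ssid=1
}}
"""


def render_wpa_supplicant(ssid, psk, country="US"):
    """Return the text of a wpa_supplicant.conf like the one shown above."""
    # A quoted WPA passphrase has to be 8 to 63 characters long.
    if not 8 <= len(psk) <= 63:
        raise ValueError("passphrase must be 8 to 63 characters long")
    if len(country) != 2:
        raise ValueError("country must be a two-letter code")
    return TEMPLATE.format(country=country.upper(), ssid=ssid, psk=psk)


# Usage, writing into the mounted boot partition from the steps above:
# with open("/media/rawland/boot/wpa_supplicant.conf", "w") as handle:
#     handle.write(render_wpa_supplicant("YOURSSID", "YOURPASSWORD", "US"))
```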
I’m missing HypriotOS at this point already… but that’s for another day. First we go the known path.
Next we enable UART in config.txt inside the boot partition by adding
# Enable UART
enable_uart=1
at the bottom of the file. This lets us attach a USB serial console later for debugging.
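The same edit can be scripted so that running it twice does not add the line twice. A small Python sketch under that assumption; the function name is hypothetical and the path in the usage comment is simply the mount point used earlier.

```python
from pathlib import Path


def enable_uart(config_path):
    """Append enable_uart=1 to config.txt unless it is already there.

    Returns True if the file was changed, False if nothing had to be done.
    """
    path = Path(config_path)
    text = path.read_text()
    lines = [line.strip() for line in text.splitlines()]
    if "enable_uart=1" in lines:
        return False
    # Make sure the new block starts on its own line.
    if text and not text.endswith("\n"):
        text += "\n"
    path.write_text(text + "# Enable UART\nenable_uart=1\n")
    return True


# Usage:
# enable_uart("/media/rawland/boot/config.txt")
```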
After unmounting, we put the SD card into the Raspberry Pi Zero W and plug in a USB cable. It will then boot (intense blinking of a green LED on the board).
After it booted (and the blinking stopped), we search for it on the network:
sudo nmap -sn 192.168.xxx.0/24
or whatever your IP range is.
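If nmap is not at hand, a crude alternative is to try the SSH port on every address of the subnet. The sketch below is only an illustration of that idea in Python; the function names are mine, and the network in the usage comment is a documentation placeholder that you would replace with your own range.

```python
import ipaddress
import socket


def hosts_in(cidr):
    """List all usable host addresses of a network such as '192.0.2.0/24'."""
    network = ipaddress.ip_network(cidr, strict=False)
    return [str(host) for host in network.hosts()]


def has_open_port(host, port=22, timeout=0.3):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def find_ssh_hosts(cidr):
    """Return the hosts in the network that accept connections on port 22."""
    return [host for host in hosts_in(cidr) if has_open_port(host)]


# Usage (placeholder range, use your own):
# print(find_ssh_hosts("192.0.2.0/24"))
```

This scans sequentially, so a /24 can take more than a minute with the default timeout; nmap is considerably faster.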
Then we log in using pi:raspberry as the login:password combination
ssh pi@raspberrypi.whatever
and change our password with passwd, as recommended.
You can now adapt the Pi to your needs, for example by expanding the filesystem, using sudo raspi-config, which is mostly self-explanatory. It is also recommended to enable SPI and I2C there, as they are used quite often. We'll need SPI specifically later on. You can find both in Interfacing Options and the filesystem expansion in Advanced Options.
Update everything:
sudo apt update
sudo apt upgrade
Install git
sudo apt install git
ReSpeaker
When the ReSpeaker is seated properly, you can power the whole setup over its Micro USB port, too. That's what I did.
Then, let's install Seeed Studio's voice card driver:
git clone https://github.com/respeaker/seeed-voicecard.git
cd seeed-voicecard
sudo ./install.sh
reboot
This installs:
Suggested packages:
python3-apport menu libi2c-dev python-smbus jackd2 opus-tools pulseaudio librsvg2-bin
lm-sensors speex
The following NEW packages will be installed:
dkms fontconfig fontconfig-config fonts-dejavu-core i2c-tools libaom0 libasound2-plugins
libasyncns0 libavcodec58 libavresample4 libavutil56 libcairo2 libcodec2-0.8.1 libcroco3
libdatrie1 libdrm-amdgpu1 libdrm-common libdrm-nouveau2 libdrm-radeon1 libdrm2 libflac8
libfontconfig1 libgdk-pixbuf2.0-0 libgdk-pixbuf2.0-bin libgdk-pixbuf2.0-common libgl1
libgl1-mesa-dri libglapi-mesa libglvnd0 libglx-mesa0 libglx0 libgraphite2-3 libgsm1
libharfbuzz0b libi2c0 libice6 libjack-jackd2-0 libjbig0 libllvm8 libmp3lame0 libogg0
libopenjp2-7 libopus0 libpango-1.0-0 libpangocairo-1.0-0 libpangoft2-1.0-0 libpixman-1-0
libpulse0 librsvg2-2 librsvg2-common libsensors-config libsensors5 libshine3 libsm6
libsnappy1v5 libsndfile1 libsoxr0 libspeex1 libswresample3 libthai-data libthai0 libtheora0
libtiff5 libtwolame0 libva-drm2 libva-x11-2 libva2 libvdpau-va-gl1 libvdpau1 libvorbis0a
libvorbisenc2 libvpx5 libwavpack1 libwebp6 libwebpmux3 libx11-xcb1 libx264-155 libx265-165
libxcb-dri2-0 libxcb-dri3-0 libxcb-glx0 libxcb-present0 libxcb-render0 libxcb-shm0
libxcb-sync1 libxcb-xfixes0 libxdamage1 libxfixes3 libxi6 libxrender1 libxshmfence1
libxtst6 libxvidcore4 libxxf86vm1 libzvbi-common libzvbi0 mesa-va-drivers
mesa-vdpau-drivers raspberrypi-kernel-headers read-edid va-driver-all vdpau-driver-all
x11-common
If everything went fine the last lines should look like this:
------------------------------------------------------
Please reboot your raspberry pi to apply all settings
Enjoy!
------------------------------------------------------
and thus we simply follow the instructions and sudo reboot.
Afterwards you can check whether everything works by plugging a wired headset into the ReSpeaker and running this command:
arecord -f cd -Dhw:1 | aplay -Dhw:1
If we say something, we will hear it back with a huge delay. If this doesn't work, check whether sound card 1 is named seeed-2mic-voicecard by comparing the outputs of aplay -l and arecord -l.
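The card number can also be looked up by script rather than by eye. The following Python sketch parses a listing in the style of aplay -l. The sample text is illustrative rather than captured from my device, and the function names are assumptions of this example.

```python
import re
import subprocess

# Illustrative listing; the exact names on your system may differ.
SAMPLE = """**** List of PLAYBACK Hardware Devices ****
card 0: ALSA [bcm2835 ALSA], device 0: bcm2835 ALSA [bcm2835 ALSA]
card 1: seeed2micvoicec [seeed-2mic-voicecard], device 0: bcm2835-i2s-wm8960-hifi wm8960-hifi-0 []
"""

CARD_LINE = re.compile(r"^card (\d+): (\S+) \[([^\]]*)\]", re.MULTILINE)


def find_card(listing, needle="seeed"):
    """Return the index of the first card whose id or name contains needle."""
    for index, card_id, name in CARD_LINE.findall(listing):
        if needle in card_id or needle in name:
            return int(index)
    return None


def playback_listing():
    """Run aplay -l and return its output as text."""
    result = subprocess.run(["aplay", "-l"], capture_output=True, text=True)
    return result.stdout


# Usage on the Pi:
# card = find_card(playback_listing())
# print("use -Dhw:%s" % card)
```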
We can adjust everything with alsamixer if things are too loud or too quiet.
Next we check the LEDs onboard. It’s always nice to see if the audio is received.
sudo apt install python-pip -y
sudo pip install spidev
cd ~/
git clone https://github.com/respeaker/mic_hat.git
cd mic_hat
python pixels.py
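To get a feeling for what happens on the SPI bus, here is a hedged Python sketch of how a frame for APA102-style LEDs is put together. It is not the code from mic_hat. The assumptions are that the HAT carries three APA102 LEDs and that they sit on SPI bus 0, device 1. The function names are invented for this example.

```python
def apa102_frame(colors, brightness=4):
    """Build the byte sequence for a chain of APA102 LEDs.

    colors is a list of (red, green, blue) tuples, brightness is 0 to 31.
    """
    if not 0 <= brightness <= 31:
        raise ValueError("brightness must be between 0 and 31")
    data = bytearray(4)  # start frame: four zero bytes
    for red, green, blue in colors:
        # Each LED frame: 0b111 + 5 bit brightness, then blue, green, red.
        data += bytes([0b11100000 | brightness, blue, green, red])
    data += b"\xff" * 4  # end frame
    return bytes(data)


def show(colors, bus=0, device=1):
    """Send a frame over SPI (bus and device numbers are assumptions)."""
    import spidev  # imported here so the frame builder works without hardware

    spi = spidev.SpiDev()
    spi.open(bus, device)
    spi.max_speed_hz = 8000000
    try:
        spi.xfer2(list(apa102_frame(colors)))
    finally:
        spi.close()


# Usage on the Pi: one red, one green, one blue LED
# show([(255, 0, 0), (0, 255, 0), (0, 0, 255)])
```

Keeping the frame builder separate from the SPI call means the byte layout can be checked on any machine, without the HAT attached.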
And now the button:
sudo pip install rpi.gpio
Then create a file button.py with the content below. We can use whatever editor we want. I like vi(m) (sudo apt install vim) 😊.
If your terminal supports emoji this is quite fun:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import RPi.GPIO as gpio
import time
B = 17  # BCM pin number of the button
gpio.setmode(gpio.BCM)
gpio.setup(B, gpio.IN)
print("We check the state of the button every second.")
print("If you keep it pressed, it should say '😲', else '😑'")
while True:
    # the input reads high while the button is released
    state = gpio.input(B)
    if state:
        print("😑")
    else:
        print("😲")
    time.sleep(1)
If your terminal does not support emoji, use this plain-text version instead:
import RPi.GPIO as gpio
import time
B = 17
gpio.setmode(gpio.BCM)
gpio.setup(B, gpio.IN)
print("We check the state of the button every second.")
print("If you keep it pressed, it should say 'pressed', else 'off'")
while True:
    state = gpio.input(B)
    if state:
        print("off")
    else:
        print("pressed")
    time.sleep(1)
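Printing once per second gets noisy quickly. A possible variation, sketched here in Python with the same pin and the same active-low reading as the script above, only prints when the state changes and cleans up the GPIO on Ctrl+C. The function names are made up for this example.

```python
def label_for(level):
    """Map the raw pin level to a label: the pin reads low while pressed."""
    return "off" if level else "pressed"


def watch(pin=17, poll_seconds=0.05):
    """Poll the button and print a line only when its state changes."""
    import time

    import RPi.GPIO as gpio  # imported here so label_for works off the Pi

    gpio.setmode(gpio.BCM)
    gpio.setup(pin, gpio.IN)
    last = None
    try:
        while True:
            level = gpio.input(pin)
            if level != last:
                print(label_for(level))
                last = level
            time.sleep(poll_seconds)
    except KeyboardInterrupt:
        pass
    finally:
        gpio.cleanup()  # release the pin when we stop


# Usage on the Pi:
# watch()
```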
Intermission (former Conclusion):
We set up our hardware and installed an operating system. Next we downloaded all necessary Python libraries and a bunch of source code from Seeed Studio. Following that we tested the microphones, the speaker, the LEDs and the button.
Next, we set up Rhasspy, an offline environment for speech recognition. If you don't care about this and are OK with using Google, Baidu or Amazon's Alexa APIs, feel free to follow the sources below.
Docker (just read this, it doesn’t work)
Now I regret not using HypriotOS directly. Ah, we’ll do this in the end once more. Right now we are too far into it. At least we are on the Long Term Support side of things.
Install Docker with
curl -sSL https://get.docker.com | sh
Let's add our user to the docker group with
sudo usermod -a -G docker $USER
and we discover, when looking at top, that this might be a bit too much for the Zero.
Virtual Environment (this works)
Let’s start without Docker:
git clone https://github.com/synesthesiam/rhasspy.git
and luckily the bed is almost made for us. Almost, because we need to disable Kaldi by setting
DEFINE_boolean 'kaldi' false 'Install Kaldi'
in create-venv.sh. This is expected, because Kaldi is not supported on this platform.
Then
cd rhasspy/
./download-dependencies.sh
./create-venv.sh
Ignore the messages about Kaldi failing. They come from download-dependencies.sh, and Kaldi is not supported on the armv6l platform. :/
The downloads above take a while…
After this, run:
./run-venv.sh --profile en
If all is well, Rhasspy should be running at
http://early.whatever:12101
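Since startup on the Zero is slow, it can help to poll the web interface instead of reloading the browser. This is a minimal Python sketch with invented function names; the host name is the same placeholder as above and the port is the one shown in the URL.

```python
import time
import urllib.request


def rhasspy_url(host, port=12101):
    """Build the address of the Rhasspy web interface."""
    return "http://{}:{}".format(host, port)


def is_up(url, timeout=3.0):
    """Return True if the URL answers with HTTP 200 within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (OSError, ValueError):
        # Covers refused connections, timeouts, HTTP errors and malformed URLs.
        return False


def wait_until_up(url, attempts=30, pause_seconds=10):
    """Poll the URL a number of times and report whether it came up."""
    for _ in range(attempts):
        if is_up(url):
            return True
        time.sleep(pause_seconds)
    return False


# Usage (placeholder host name):
# print(wait_until_up(rhasspy_url("early.whatever")))
```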
Conclusion
In this article we set up the tiny Raspberry Pi Zero WH with the ReSpeaker 2-Mic HAT. We set everything up using Raspbian and Seeed Studio's software stack. Rhasspy was installed with its regular stack, omitting Kaldi (which is unfortunate). Basically, everything works. However, the Zero is so weak that it takes a very long time to understand even short sentences. I'll show a gif soon. Therefore, I will move this to the Raspberry Pi 4B and see how well that one performs.