So I have a PS3 Eye plugged into a Pi Zero W that I’m using as a Rhasspy satellite, and overall it’s been a great experience testing Rhasspy 2.5-pre. However, I’ve noticed that when the mic gain is turned up, it produces a fair amount of background hiss, which prevents Rhasspy from detecting silence and stopping the recording after I finish speaking. I can reliably get it to stop if I reduce the mic gain in alsamixer when I finish speaking.
Even with the VAD sensitivity for webrtcvad set to 3, silence isn’t reliably detected after I finish speaking.
I attempted to set up PulseAudio with the module-echo-cancel module, but something isn’t compiled right for the Pi Zero/armv6 on Raspbian Buster, so PulseAudio crashes with illegal instruction errors.
My next idea was to try noise reduction with sox by generating a noise profile while the mic is silent:
    # Record a 5-second file with no speaking, just background noise
    arecord /tmp/noise.wav -r 16000 -d 5

    # Create a background noise profile from the WAV
    sox /tmp/noise.wav -n noiseprof /tmp/noise.prof

    # Record a 5-second file with speaking
    arecord /tmp/speech.wav -r 16000 -d 5

    # Remove noise from the WAV using the profile
    sox /tmp/speech.wav /tmp/fixed.wav noisered /tmp/noise.prof 0.21
This seems to work very well and the fixed audio has much less background hiss.
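The 0.21 above is the noisered amount, which sox accepts in the range 0 to 1. To compare a few strengths on the same recording, something like the following sketch might help. The function name `denoise_sweep` and the `SOX` override variable are made up for illustration and are not part of sox or Rhasspy.

```shell
#!/bin/sh
# Hypothetical helper: apply noisered at several strengths so the
# results can be compared by ear. Set SOX=echo to preview the commands
# without running sox.
SOX="${SOX:-sox}"

denoise_sweep() {
    if [ "$#" -lt 3 ]; then
        echo "usage: denoise_sweep INPUT.wav NOISE.prof AMOUNT..." >&2
        return 2
    fi
    in_wav=$1
    prof=$2
    shift 2
    if [ ! -r "$in_wav" ] || [ ! -r "$prof" ]; then
        echo "missing input or noise profile" >&2
        return 1
    fi
    for amount in "$@"; do
        # e.g. /tmp/speech.wav -> /tmp/speech-0.21.wav
        out_wav="${in_wav%.wav}-${amount}.wav"
        "$SOX" "$in_wav" "$out_wav" noisered "$prof" "$amount" || return 1
    done
}
```

For example, `denoise_sweep /tmp/speech.wav /tmp/noise.prof 0.1 0.21 0.3` would write one output file per amount next to the input.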
So I have some questions for the gurus and for @synesthesiam:
- Are there settings I’m overlooking in webrtcvad that could fix this issue?
- If not, is it possible to modify the arecord pipeline in Rhasspy to include sox doing noise reduction in the middle? Something like this could work:
    arecord -t raw -r 16000 -f S16_LE -c 1 | sox -t raw -b 16 -e signed -c 1 -r 16k - -r 16k -t wav - noisered /tmp/noise.prof 0.21
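If it turns out that a script can stand in for arecord, the pipeline could be wrapped roughly like this. This is only a sketch: `record_denoised`, `NOISE_PROF`, and `NOISE_AMOUNT` are made-up names, and whether Rhasspy can call such a script is exactly the open question.

```shell
#!/bin/sh
# Sketch only: NOISE_PROF and NOISE_AMOUNT are hypothetical variable
# names, defaulting to the values used earlier in this post.
NOISE_PROF="${NOISE_PROF:-/tmp/noise.prof}"
NOISE_AMOUNT="${NOISE_AMOUNT:-0.21}"

record_denoised() {
    # Fail early with a clear message if the profile has not been
    # generated yet, rather than letting sox abort mid-stream.
    if [ ! -r "$NOISE_PROF" ]; then
        echo "noise profile not found: $NOISE_PROF" >&2
        return 1
    fi
    # Same pipeline as above: raw 16 kHz mono from the mic,
    # noise-reduced WAV written to stdout.
    arecord -t raw -r 16000 -f S16_LE -c 1 |
        sox -t raw -b 16 -e signed -c 1 -r 16k - \
            -r 16k -t wav - noisered "$NOISE_PROF" "$NOISE_AMOUNT"
}
```

Calling `record_denoised > /tmp/fixed.wav` and stopping it with Ctrl-C would be one way to check the output by ear before wiring it into anything.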
I could buy better hardware, but I’d love to see if I could make this work before investing money. Thanks in advance for any ideas and thoughts!