Improving wake word recognition

I’m using porcupine as my wake word engine, with the default word of “porcupine”. I find that I have to be pretty close to the mic (like 8ft, 3m or so) for it to recognize me, and increasing the volume of my voice definitely helps. I’m using a USB conference mic, with the input volume about 66%. I find that increasing much past that seems to make it worse for some reason. Listening to the WAVs of my subsequent voice commands, my voice always sounds pretty quiet, which I assumed meant I needed to up the level a bit. Background mic noise does not seem bad at all, even with it up, especially compared to my first webcam-as-a-mic attempt.

Anyway, it seems to me like if I can shout at it and make it work, it should be possible to up the gain to avoid needing to do that. Is my experience a common thing, or am I missing something? Tips for tuning?

If you run amixer -c[card] contents, check whether AGC is enabled; since it’s a conference mic, it will likely have one.

The Speex ALSA plugin also has an AGC component, but it doesn’t get installed on Debian because, for some reason, the packaged Speex is still the old, out-of-date release candidate, so you have to build and install it yourself rather than from the repo.

PS 8ft / 3m is not that close.

Amazon set some stats where ‘close’ is 0 to 0.3 m (1 ft), ‘near’ is 0 to 0.9 m (3 ft) and ‘far’ is 0 to 2.75 m (9 ft).
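
To put rough numbers on those distance bands: in a free field (no reflections, which a real room will not satisfy), level falls by about 6 dB for each doubling of distance. A minimal sketch under that assumption; level_drop_db is a name made up for this example:

```python
import math

def level_drop_db(ref_distance_m, distance_m):
    """Free-field level drop (dB) when moving from ref_distance_m to distance_m."""
    if ref_distance_m <= 0 or distance_m <= 0:
        raise ValueError("distances must be positive")
    return 20 * math.log10(distance_m / ref_distance_m)

# Compare the outer edge of the 'near' and 'far' bands with the 'close' edge.
for label, metres in (("near", 0.9), ("far", 2.75)):
    drop = level_drop_db(0.3, metres)
    print("%s: about %.1f dB below the level at 0.3 m" % (label, drop))
```

So a voice at the edge of the ‘far’ band arrives roughly 19 dB quieter than at 0.3 m, before any reverberation is taken into account.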

Unfortunately, you need to get the sound right yourself, so that it isn’t clipping but you are still getting a good normalised signal. There is no audio processing chain; it’s DIY.
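
One way to check "not clipping but a good level" without relying on ears is to measure a test recording. The sketch below assumes a 16-bit WAV such as arecord -f S16_LE produces; read_wav and levels are hypothetical helper names, and the 32767 clipping threshold is an assumption, not something Rhasspy or Porcupine defines:

```python
import math
import struct
import wave

FULL_SCALE = 32768.0  # magnitude of the most negative 16-bit sample

def read_wav(path):
    """Return all samples of a 16-bit PCM WAV as a tuple of ints."""
    with wave.open(path, "rb") as wf:
        if wf.getsampwidth() != 2:
            raise ValueError("expected 16-bit samples (arecord -f S16_LE)")
        raw = wf.readframes(wf.getnframes())
    return struct.unpack("<%dh" % (len(raw) // 2), raw)

def to_dbfs(value):
    """Convert a linear sample magnitude to dB relative to full scale."""
    return 20 * math.log10(value / FULL_SCALE) if value > 0 else float("-inf")

def levels(samples):
    """Return (peak_dbfs, rms_dbfs, clipped_fraction) for 16-bit samples."""
    if not samples:
        raise ValueError("empty recording")
    peak = max(abs(s) for s in samples)
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    # Samples sitting at the rails are treated as clipped.
    clipped = sum(1 for s in samples if abs(s) >= 32767) / len(samples)
    return to_dbfs(peak), to_dbfs(rms), clipped
```

Usage would be something like levels(read_wav("test.wav")). A peak far below 0 dBFS with a clipped fraction of zero suggests there is headroom for more gain; a non-zero clipped fraction means the input level is too high.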

Thanks, my cheapo conf mic doesn’t seem to have an AGC flag, but I’ve got a Jabra 410 coming soon to play with. Maybe it has one.

I’ll chase down the speex AGC thing, thanks.

So are you saying that 8ft is a reasonable distance to expect it to work from, but not much farther? I’ve never had an echo or google home thing, but I guess I expected them to work from quite a bit farther (even though I know they’re likely to outperform anything I can do). My house has some large open spaces, and was hoping I’d be able to get one satellite per space to work. Maybe my expectations are off. I guess I have a hard time seeing this be super useful if I can’t hit it from the living room as well as the adjoined kitchen :slight_smile:

There is an older set of install instructions, but you’re probably better off redoing it on the 64-bit install.

The thing to do is just record with arecord to a WAV, use WinSCP or whatever you use to get it to a desktop, and open it in Audacity so you can visualise and listen to what you are getting.
With beamforming and various algorithms you can extend the distance, but how much gain you can apply depends very much on the SNR and noise floor, so it is very dependent on hardware.
Sound is like a ripple in a pool: depending on your room shape and what is in it, you will get reverberation, with reflections at different wavelengths all summing at the mic and often creating a very different spectral image near and far.

Many conference mics are not much more than desktop mics set up for ‘close’ field, and even the more expensive ones should probably be called boardroom mics, as they are better suited to being a central mic on a big shared table than to ‘far’ field.
All are different. Adafruit do an electret preamp that you can plug into any sound card, and the preamplification raises the level from desktop ‘close’ field to further afield very cost-effectively.
Every mic is different: some are more sensitive than others, and some have algorithms that make them directional, with a directional gain pattern, and less susceptible to reverberation. More expensive doesn’t mean better for ‘far’ field, as they all have a design target that varies.

You need to record and see what you are getting…
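
As a rough way to see how much room there is between speech and the noise floor in a recording, the sketch below compares quiet and loud frames. It assumes 16-bit samples at 16 kHz and a clip that contains both silence and speech; the 10th/90th percentile choice and the function names are assumptions made for this example, not a standard SNR measurement:

```python
import math

def frame_rms_db(samples, frame_len=320):
    """RMS level in dBFS per consecutive frame (320 samples = 20 ms at 16 kHz)."""
    result = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        rms = math.sqrt(sum(s * s for s in frame) / frame_len)
        # Floor at 1 LSB so digital silence does not produce log10(0).
        result.append(20 * math.log10(max(rms, 1.0) / 32768.0))
    return result

def estimate_snr_db(samples, frame_len=320):
    """Return (snr_db, noise_dbfs, speech_dbfs) from frame level percentiles."""
    frame_levels = sorted(frame_rms_db(samples, frame_len))
    if len(frame_levels) < 10:
        raise ValueError("recording too short")
    noise = frame_levels[len(frame_levels) // 10]         # quiet frames
    speech = frame_levels[(len(frame_levels) * 9) // 10]  # loud frames
    return speech - noise, noise, speech
```

If the estimated gap is small, adding gain will mostly lift the noise along with the voice, which is the hardware limit described above.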

Yep, I got the updated speex and alsa plugins built in my container and can capture from the agc target now.

I’ve been listening to the WAV files from various attempts all along and the one thing I notice with all of them is the acoustic echoing from the shape and nature of my room. I can’t really tell any difference with the agc target. All of them sound plenty clear to me (aside from the echo) and I’m definitely not clipping, even with agc enabled.

Perhaps I need to work on an echo filter. There’s really not much I can do to move it to a better place. The main living area is all wood and hard surfaces. It’s shocking how echoey the audio the mic picks up is, compared to what my ears hear.

The echo (reverberation) often makes for a very different spectral image; in Audacity, switch the view to spectrogram to see it.
KWS uses what is called MFCC, a blocky 2D form of spectrogram of approx 80x13 showing frequency bins and intensity over the 1 sec capture. Reverberation can greatly affect that, but so can volume, as for best results you want a normalised input.
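
To illustrate what "normalised" means here, a minimal peak-normalisation sketch for 16-bit samples, useful for preparing test clips to compare. The -3 dBFS default and the name normalise_peak are assumptions for this example; it is not what any particular KWS does internally:

```python
def normalise_peak(samples, target_dbfs=-3.0):
    """Scale 16-bit samples so the largest absolute value sits at target_dbfs."""
    peak = max((abs(s) for s in samples), default=0)
    if peak == 0:
        return list(samples)  # pure silence: nothing to scale
    target = 32767 * 10 ** (target_dbfs / 20.0)
    gain = target / peak
    # Clamp after scaling so rounding can never leave the 16-bit range.
    return [max(-32768, min(32767, int(round(s * gain)))) for s in samples]
```

Note that this applies one fixed gain to the whole clip, so it raises the noise floor by the same amount as the voice.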

Beamforming can help with reverberation. StuartIanNaylor/2ch_delay_sum on GitHub (a 2-channel delay-sum beamformer) probably needs a tidy-up, but with the 2-mic HATs or the Plugable USB Audio Adapter (2-channel ADC) it can help a lot.
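
For anyone unfamiliar with the idea, here is a minimal sketch of two-channel delay-and-sum. It is not the code from the 2ch_delay_sum repository: it simply finds the sample lag that best aligns the two channels by brute-force cross-correlation, then averages them. The function names and the max_lag default are assumptions for this example:

```python
def best_lag(ref, other, max_lag):
    """Lag (in samples) of `other` relative to `ref` with the highest correlation."""
    n = len(ref)
    if n <= 2 * max_lag:
        raise ValueError("signal too short for this max_lag")
    best, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        # Only use indices where other[i + lag] is valid for every candidate lag.
        score = sum(ref[i] * other[i + lag] for i in range(max_lag, n - max_lag))
        if score > best_score:
            best, best_score = lag, score
    return best

def delay_and_sum(ch0, ch1, max_lag=8):
    """Align ch1 to ch0 by the best lag and average them; returns (lag, output)."""
    n = min(len(ch0), len(ch1))
    ch0, ch1 = ch0[:n], ch1[:n]
    lag = best_lag(ch0, ch1, max_lag)
    out = []
    for i in range(n):
        j = i + lag
        aligned = ch1[j] if 0 <= j < n else 0
        out.append((ch0[i] + aligned) / 2.0)
    return lag, out
```

Sound arriving from the direction that matches the chosen lag adds coherently, while reflections arriving with other delays add less strongly, which is why this can help with reverberation.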
The input audio handling in Google, Amazon… devices is extremely well engineered, and it’s not part of Rhasspy, so apologies, you will have to do that yourself.

Yeah, I know there’s a ton of engineering in the for-pay devices. Just trying to get the best I can out of free (as in speech) stuff.

So I just noticed that I’m getting this error on startup from arecord:

arecord -f S16_LE -d 1 > /dev/null
Recording WAVE 'stdin' : Signed 16 bit Little Endian, Rate 8000 Hz, Mono
warning: Unknown speex_preprocess_ctl request:  2
warning: Unknown speex_preprocess_ctl request:  6

Which seems to indicate that it’s not actually doing AGC, seemingly due to lack of support on fixed-point-only systems like ARM. I find lots of stuff out there about AGC in speex not being supported on ARM specifically. That is probably why I can’t tell a difference between the two recordings when I test it I guess.

I know I’m loading the new libraries because it used to choke on agc in asound.conf and now it doesn’t. I’ve also watched it via strace and I see it’s definitely loading my updated speex and speexdsp libraries.

Am I missing something about enabling that properly? I’ve tried recompiling the speex stack with --enable-fixed-point which makes no difference.
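
The request numbers in those two warnings can be mapped back to names. The table below lists the codes as I recall them from speex/speex_preprocess.h, so treat it as an assumption and check your own header; decode_warnings is a helper name made up for this example:

```python
import re

# Request codes as recalled from speex/speex_preprocess.h (verify locally).
SPEEX_PREPROCESS_REQUESTS = {
    0: "SPEEX_PREPROCESS_SET_DENOISE",
    2: "SPEEX_PREPROCESS_SET_AGC",
    4: "SPEEX_PREPROCESS_SET_VAD",
    6: "SPEEX_PREPROCESS_SET_AGC_LEVEL",
    8: "SPEEX_PREPROCESS_SET_DEREVERB",
}

def decode_warnings(stderr_text):
    """Return the request names for each 'Unknown speex_preprocess_ctl' warning."""
    names = []
    pattern = r"Unknown speex_preprocess_ctl request:\s*(\d+)"
    for match in re.finditer(pattern, stderr_text):
        code = int(match.group(1))
        names.append(SPEEX_PREPROCESS_REQUESTS.get(code, "unknown request %d" % code))
    return names
```

If that table is right, requests 2 and 6 line up with the agc and agc_level settings in asound.conf, meaning the library that actually got loaded was built without AGC support. In the speexdsp sources I have looked at, the AGC requests appear to be compiled only for floating-point builds, so building with --enable-fixed-point would be expected to produce these same warnings rather than fix them.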

?

Floating-point – Arm®.

I have run Speex AGC myself, so my guess is that you just haven’t managed to correctly compile Speex and the missing alsa-plugins.

There are several such references, but granted this is pretty old at this point:

http://lists.xiph.org/pipermail/speex-dev/2007-December/006401.html

That’s specifically the warning I’m getting, and it indicates the cause is a lack of floating-point support. There are also questions like this one pointing at FPU-less ARM:

http://lists.xiph.org/pipermail/speex-dev/2008-July/006795.html

However, I guess Raspberry Pi systems have hardfloat?

Anyway, I followed your instructions exactly for compiling those libraries (and I’m also not inexperienced at that). Since that error message gets referenced in questions about AGC and ARM, it seemed relevant. I’ll keep digging.

Okay, yeah, something was getting loaded from the older system libraries. Lots of cleaning and reinstalling and I no longer get that error. Now to integrate the working stuff and see if agc actually helps.

uname -a
Linux raspberrypi 6.1.21-v8+ #1642 SMP PREEMPT Mon Apr  3 17:24:16 BST 2023 aarch64 GNU/Linux
aplay --version
aplay: version 1.2.4 by Jaroslav Kysela <perex@perex.cz>
sudo apt install libasound2 libasound2-dev libasound2-plugins
sudo apt install libfftw3-3 libfftw3-dev
sudo apt install libspeex1 libspeexdsp1 libspeex-dev libspeexdsp-dev speex speex-doc
sudo apt install git autotools-dev autoconf libtool pkg-config
git clone https://gitlab.xiph.org/xiph/speexdsp.git
cd speexdsp
./autogen.sh
./configure --libdir=/usr/lib/aarch64-linux-gnu/
make
sudo make install
cd ..
git clone https://gitlab.xiph.org/xiph/speex
cd speex
./autogen.sh
./configure --libdir=/usr/lib/aarch64-linux-gnu/
make
sudo make install

Get the latest alsa-plugins release that is <= your ALSA version (1.2.4 above, as reported by aplay --version):

cd ..
wget https://www.alsa-project.org/files/pub/plugins/alsa-plugins-1.2.2.tar.bz2
tar -xvf alsa-plugins-1.2.2.tar.bz2
cd alsa-plugins-1.2.2
./configure --libdir=/usr/lib/aarch64-linux-gnu/
make
sudo make install

One day xBian will update Speex from the RC to the release as this is just a pain, but Alsa-plugins wants the release not the RC.

Edit /etc/asound.conf to suit (sudo nano /etc/asound.conf):

pcm.!default {
    type asym
    playback.pcm "plughw:1"
    capture.pcm  "agc"
}

pcm.array {
 type hw
 card 1
}

pcm.cap {
 type plug
 slave {
   pcm "array"
   channels 1
   }
 route_policy sum
}

pcm.agc {
 type speex
 slave.pcm "cap"
 agc 1
 agc_level 4000
 denoise no
 dereverb no
}

arecord -r16000 -f S16_LE -d1 > /dev/null
Recording WAVE 'stdin' : Signed 16 bit Little Endian, Rate 16000 Hz, Mono

I ran through it again, as I noticed we are now on kernel 6.1.

I keep meaning to hack the ALSA plugin, as agc_level 4000 is really a rate and it’s fairly easy to add agc_max; it would be much better with both params, as on silence it just ramps up into the noise floor, but it actually doesn’t matter too much.
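
To show why a gain cap matters on silence, here is a generic frame-based AGC sketch. It is not Speex’s algorithm and not the ALSA plugin’s code; target_rms, max_step_db and max_gain_db are hypothetical parameters, with max_gain_db playing the role of the agc_max idea above:

```python
import math

def simple_agc(frames, target_rms=4000.0, max_step_db=1.0, max_gain_db=30.0):
    """Apply a slowly adapting gain per frame; returns (output_frames, gains_db)."""
    gain_db = 0.0
    out, gains = [], []
    for frame in frames:
        rms = math.sqrt(sum(s * s for s in frame) / len(frame)) if frame else 0.0
        # Gain that would bring this frame to the target; silence wants the maximum.
        wanted_db = 20 * math.log10(target_rms / rms) if rms > 0 else max_gain_db
        # Move toward the wanted gain by at most max_step_db per frame (the rate).
        step = max(-max_step_db, min(max_step_db, wanted_db - gain_db))
        # The cap stops the gain ramping into the noise floor during silence.
        gain_db = min(max_gain_db, gain_db + step)
        gain = 10 ** (gain_db / 20.0)
        out.append([max(-32768, min(32767, int(round(s * gain)))) for s in frame])
        gains.append(gain_db)
    return out, gains
```

Without the min(max_gain_db, ...) line, a long quiet stretch would keep pushing the gain up until the background noise itself reached the target level.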

Yep, I had followed the instructions. Since I’m running rhasspy in docker, I was doing the builds outside the container and then jamming them in there. Clearly I was missing something even though everything in the chain from alsa->speex->speexdsp was loading the new stuff. But, I just changed to build and install the full chain in a new docker image based on the rhasspy one and it’s working now. AGC is definitely AGCing and I get a more “vocoder sound” to the recordings. I’ll see how it performs today.

Yep, it’s a bit more of a pain with Docker, but it’s pretty easy to save the running container as a new image and change the Rhasspy run command to use it.
The ephemeral nature of containers means that unless you do so, your changes will be gone on each run.