Wakeword cool toy EdgeImpulse

Hi all
Just a heads up.
I just found a cool tool to build and deploy a wakeword handler.

Yep, you have to train it yourself, but the results are solid and the code to deploy it is straightforward (extensive, yes, but clean and clear).
Well, the C++ part is, anyway.
It took me several hours to record and upload the data, but the interface makes that easy.
It has a nice web GUI.
It’s open source.
It’s free for now :slight_smile:

2 Likes

OK, I have the wakeword working now. Yes, I had to record my wake word 600 times, but compared with the massive hill you have to climb elsewhere this is a fantastic resource.
It is working quite well. Very, very few false positives, and having compiled the C++ example with a few modifications it is now working properly in the Rhasspy environment.
I have totally disabled the wake word in Rhasspy, and even though Rhasspy is still collecting and sending the sound over MQTT (I haven’t played with the MQTT settings enough to disable it yet), the load is heaps lower than Raven, the accuracy is through the roof in comparison, and it’s really fast.
If anyone is interested, drop me a line and I’ll add more details.
Having a good wake word system makes the whole system work so much better.

1 Like

Sounds very interesting.
I have only been using Rhasspy for a few months. I couldn’t get Raven to work, so I am kind of stuck with Porcupine and a single wake word, ‘computer’.
It would be great to have a customised wake word. I would appreciate some more details on how you integrated it. Is it possible to have more than one wake word? And would it be possible to recognise different speakers and send this information through to Home Assistant? (I mean, e.g., Google knows whether it was me or a guest who asked for something.)

OK.
So the first thing to do is train your wake word/s in Edge Impulse.
This takes a lot of repetitions (600 at least) of each word.

Then what I did was build the C++ version on my Linux box, although there are all sorts of deployment options. I wanted it to work with my Acusis S so that the radio, an audio book, or anything else coming out of my computer would be removed from the mic input.

Once I had the compiled version working how I wanted it, I modified the C++ code so the “audio” example would exit once it had received a valid wake word.
This compiled code uses very few resources.

I then wrote a simple shell script:

#!/usr/bin/env bash

# Loop forever: wait for the wake word, wake Rhasspy, then wait for Rhasspy to finish.
while true
do
    # Blocks until the modified "audio" example detects the wake word and exits.
    /insert path to program here/audio default
    # Tell Rhasspy over MQTT (Hermes) that the hotword was detected.
    mosquitto_pub -h localhost -t hermes/hotword/Simone/detected -m "{ \"wakewordId\": \"Simone\", \"model_id\": \"default\", \"model_type\": \"personal\", \"site_id\": \"Simone\" }"
    # Block (for up to 20 seconds) until Rhasspy publishes the captured text, i.e. it is idle again.
    mosquitto_sub -W 20 -C 1 -h localhost -t hermes/asr/textCaptured
done

This is just an infinite loop that runs the audio program.
When it finds a wake word, the program exits.
The script then uses mosquitto_pub to wake Rhasspy,
then runs mosquitto_sub to block until Rhasspy is ready again,
and runs through the loop again.

OK, yes it is; the only problem is that each word you train takes at least 600 repetitions.
If you want it trained with different voices, other people also need to be involved.
There is a web interface which makes training a lot easier.
There is an explanation of how to get started here.

1 Like

Thanks, especially for the shell script.
What I meant here is not to train other voices, only mine, but whether the AI is able to detect later whether it was me or an unknown/untrained voice. (Like Google does it, for example: if you ask Google your name it will tell you; if a friend comes by it will say something like it doesn’t know their name yet.)

As it is a specific model trained on a specific pattern, the wakeword is unlikely to trigger for anyone else at all. But that is the cool thing about it.
My voice is quite deep and my accent is also fairly distinctive, so very few other people would trigger the wakeword at all.

You could train yours as wake1 and your partner’s as wake2 and they would be completely independent. The C++ program would simply have to print which label triggered before it exits, and the shell script would pick that up (see the sketch below).
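
For what it’s worth, the exit path could look roughly like this. It is a minimal sketch rather than my exact code: it assumes the standard Edge Impulse C++ SDK result struct (ei_impulse_result_t with classification[i].label and .value, plus the EI_CLASSIFIER_LABEL_COUNT constant), and the wake1/wake2, noise and unknown label names and the threshold are placeholders you would swap for your own model.

#include <cstdio>
#include <cstring>
// Comes with the exported model / SDK; provides ei_impulse_result_t
// and EI_CLASSIFIER_LABEL_COUNT.
#include "edge-impulse-sdk/classifier/ei_run_classifier.h"

static const float CONFIDENCE_THRESHOLD = 0.996f; // worked out by testing audio.cpp

// Returns the index of the wake word that fired, or -1 if nothing did.
// Prints the label (e.g. "wake1" or "wake2") so the shell script can read it.
static int check_for_wakeword(const ei_impulse_result_t &result) {
    for (size_t i = 0; i < EI_CLASSIFIER_LABEL_COUNT; i++) {
        const char *label = result.classification[i].label;
        // Skip the background classes.
        if (strcmp(label, "noise") == 0 || strcmp(label, "unknown") == 0) {
            continue;
        }
        if (result.classification[i].value >= CONFIDENCE_THRESHOLD) {
            printf("%s\n", label);
            return (int)i;
        }
    }
    return -1;
}

The shell script can then capture the program’s stdout with command substitution and publish to hermes/hotword/wake1/detected or hermes/hotword/wake2/detected with the matching wakewordId.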

1 Like

OK, for consistency I will add the process I went through here.

The process I used is:

1 Train your word/s in Edge Impulse.
2 Tune them in Edge Impulse. This takes quite a bit of tweaking to clean up your samples and the training parameters.
3 Download the SDK.
4 Export your Edge Impulse results as per the instructions and include them in the SDK.
5 Compile and test the result on the command line.
6 Modify the code so that when your wakeword is triggered above a particular confidence it simply exits.

I then used the script above, which runs forever, and I start it the same way and at the same time as I start Rhasspy.

Voila

Some gotchas:
Doing satellite stuff will be completely different. Not something I’m chasing, sorry.
But the Edge Impulse code will run on many SoC systems.

TRAINING
The more variation and the more repetitions you add in the word training, the more robust your model will be and the less it will overfit. So be patient.

MQTT

The mosquitto clients are available in a separate deb (the mosquitto-clients package).
You probably need to have MQTT set to external on localhost in your Rhasspy config.

The relevant section of my profile.json is:

   "mqtt": {
        "enabled": "true",
        "site_id": "Simone"
    },

SOUND
As you are sharing the microphone with Rhasspy, you will need to have PulseAudio enabled, or a fancy ALSA loopback device that I couldn’t be bothered setting up.
If you test it while Rhasspy is running, you can work out how to get it going at the command-line test stage. If you have the ALSA pulse plugin, then arecord -L / aplay -L will list the devices that the C++ code can use, at least.
The default device works fine for me here, even though the radio is also going out the same sound device.

You may also have trouble getting the Rhasspy sound config right.
I got mine working with GStreamer and the deb release of Rhasspy, which starts after I log in to X, i.e. after the PulseAudio server has started.

It works, but I’m sure it’s not ideal.

 "microphone": {
        "command": {
            "record_arguments": "alsasrc ! audioconvert ! audioresample ! audio/x-raw, rate=16000, channels=1, format=S16LE ! filesink location=/dev/stdout",
            "record_program": "/usr/bin/gst-launch-1.0",
            "siteId": "localhost",
            "udp_audio_port": "12345"
        },
        "system": "command"
    },

The bogus UDP section tells Rhasspy not to send the microphone audio over MQTT while it’s asleep. That saves MQTT traffic that’s not doing anything anyway.

and

 "sounds": {
        "command": {
            "play_arguments": "--no-interactive -q /dev/stdin",
            "play_program": "/usr/bin/gst-play-1.0"
        },
        "system": "command",
    },
3 Likes

Oops, I forgot one thing.
I simply disabled the wake word in Rhasspy.
I also tried setting it to MQTT, with no real obvious difference, as the script triggers Rhasspy appropriately either way.

Also

I just moved the mic and the results are not as good; I have clearly overfit my model.
So just a heads up: if you move the mic or change the surroundings you may have to do more training. This will improve the results anyway.
So variation is very important in the training stage.

Also, adding sounds that are similar to the wake word to the unknown group will help make your model more robust. You may just have to train it more times.

2 Likes

Hi,
Thank you for this great stuff.
Can’t wait to give it a try :slight_smile:

Regards

In the Edge Impulse example audio.cpp I simply modified the code so that the function classify_current_buffer returns a negative number unless it has a confidence above 0.996 (I worked that value out by simply testing with the basic audio.cpp).
I then added a simple test: if the returned index is the index of my wakeword, the program exits.
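
In pseudo-ish form the change amounts to something like this. Again, a minimal sketch rather than the literal diff: the wakeword index and the 0.996 threshold are specific to my model, and the function and variable names here are only illustrative.

#include <cstdlib>

static const int   WAKEWORD_IX          = 1;      // index of my wakeword label in the model
static const float CONFIDENCE_THRESHOLD = 0.996f; // found by testing the stock audio.cpp

// Mirrors the modified classify_current_buffer(): anything below the
// threshold is treated as "nothing heard" by returning a negative number.
static int filter_classification(int best_ix, float best_value) {
    if (best_value < CONFIDENCE_THRESHOLD) {
        return -1;   // not confident enough, keep listening
    }
    return best_ix;
}

// In the capture loop of the audio example:
//     int ix = filter_classification(best_ix, best_value);
//     if (ix == WAKEWORD_IX) {
//         exit(0);  // the shell script takes over from here
//     }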

really simple and clean.

Yeah you can do heaps more but that gets a reliable personal wakeword up and running.

Have fun.