As some may know, I do my tests with snowboy, which works really great but will be shut down at the end of 2020. Of course it will still work, but we won’t be able to generate new custom wakewords.
I’m looking for an engine with three different custom wakewords: one per family member, so I can tell who is asking something.
Right now we have these options: https://rhasspy.readthedocs.io/en/latest/wake-word/
- snowboy : free, offline, custom wakewords. But EOL!
- porcupine : must redo custom wakewords every 30 days
- pocketsphinx : doesn’t seem very reliable
- precise : seems to work nicely. Training a custom wakeword seems hard though, and it still doesn’t run on a Pi 0?
Actually I still run my Snips system in production (Pi 3 master and Pi 0 satellites). The only thing keeping me from running Rhasspy in production is the wakeword engine. The Snips one works great with three custom wakewords (snowboy seems even a bit better).
So, what is the current consensus on a wakeword engine for Rhasspy with several custom wakewords?
Is there anything new on this front? Or something coming?
The KWS system is indeed the last major piece of the puzzle.
There have been some discussions to port the Snips personal wakeword detection system detailed in this post:
I made a Node.js module following these guidelines with a few tweaks and it is working pretty well in my homemade setup (not Rhasspy).
It should not be too complicated to port this to Python, but for more efficiency it would need C++ or Rust (at least the feature extraction and DTW parts). I think @maxbachmann is looking into it, but I’m not sure…
Yes, I am looking into it, but since I have quite a bit to do at work right now I am not sure when I will find the time to implement it. As far as I remember, a couple of the third-party sources it uses are GPL-licensed and would therefore need to be reimplemented.
Snips wakeword would be awesome yes.
I saw that ProjectAlice has the dpkg for it, so even if Sonos removes everything it can still be installed and used, with offline-generated custom wakewords.
Regarding Porcupine, I did contact them to ask what the price would be for more than one custom wakeword and longer than 30 days, for personal use. Their answer rules this solution out entirely:
Our sales team has received your inquiry and based on the provided information, has determined that this is not, unfortunately, a good fit for us.
Given our limited resources, we have decided to focus on large enterprise prospects with the significant budgets dedicated to developing innovative voice experiences; Due to the high opportunity cost, we are not able to provide our services towards personal projects, early-stage startups, companies working in the ideation stage, or pure proof-of-concept efforts with no clear path to commercialization in the near term.
I’ve got a start on a Python version of the Snips Personal Wakeword Detector. I’m calling it Rhasspy Raven (Hermes service here).
I recorded myself saying “okay rhasspy” 3 times, trimmed up the audio, and exported the WAV files as 16-bit 16 kHz mono. It seems to work OK with a distance threshold of about 38 for me.
It’s a bit CPU hungry right now, so I’m not 100% confident it will run well on a Pi or Pi Zero. I’m not sure I’ve implemented everything correctly either, so there may be a lot of room for improvement. Obviously, a C++ version by the great @maxbachmann will blow it out of the water in terms of speed.
If this works for anyone besides me, I can try to incorporate the template recording into the Rhasspy web UI and bundle it with the Docker image. Maybe we can make this like rhasspy-fuzzywuzzy and swap Raven out for an optimized C++ backend at some point in the future.
@synesthesiam I guess you should replace dtw-python since it is GPL-licensed: https://pypi.org/project/dtw-python/
What’s the most CPU intensive part in the code? Can’t we offload just that part to an optimized library?
The most CPU-hungry part is the MFCC feature extraction.
You can ease resource consumption by calculating the MFCC features only for the new frame instead of the whole buffer (roughly a 10x improvement).
You can improve further by reducing the number of DTW calculations through averaging the keyword templates: average template 1 and template 2, then average that result with template 3, and so on. Do not average all the templates in one go (very bad for accuracy).
Hope this helps.
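To make the averaging tip above concrete, here is a minimal pure-Python sketch of pairwise template averaging along a DTW alignment. This is my own illustration, not Raven’s or @fastjack’s actual code: the `dtw_path`/`average_pair`/`average_templates` names and the Euclidean frame distance are assumptions, and a real implementation would use proper MFCC frames.

```python
import math

def dtw_path(a, b):
    """Plain O(n*m) DTW between two sequences of feature vectors.
    Returns the alignment path as a list of (i, j) index pairs."""
    n, m = len(a), len(b)
    INF = float("inf")
    cost = [[INF] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])  # Euclidean frame distance (illustrative)
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    # Backtrack from the end to recover the warping path.
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = min(cost[i - 1][j - 1], cost[i - 1][j], cost[i][j - 1])
        if step == cost[i - 1][j - 1]:
            i, j = i - 1, j - 1
        elif step == cost[i - 1][j]:
            i -= 1
        else:
            j -= 1
    return path[::-1]

def average_pair(a, b):
    """Average two templates along their DTW alignment.
    The result keeps the length of template `a`."""
    path = dtw_path(a, b)
    out = []
    for i in range(len(a)):
        matched = [b[j] for (pi, j) in path if pi == i]
        frames = [a[i]] + matched
        out.append([sum(vals) / len(vals) for vals in zip(*frames)])
    return out

def average_templates(templates):
    """Fold templates pairwise: avg(avg(t1, t2), t3)..., as suggested above,
    rather than averaging all templates in one go."""
    avg = templates[0]
    for t in templates[1:]:
        avg = average_pair(avg, t)
    return avg
```

With the averaged template, one DTW comparison per audio frame replaces one per recorded template.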
Also, using cosine similarity as the DTW distance function, together with the probability formula detailed in the blog post, helps get a standardized score/threshold across all templates, usually between 0.45 (more false positives) and 0.55 (more false negatives).
Otherwise (using the Euclidean distance), the length of the templates adds too much variation and forces the user into a trial-and-error process to determine the correct value for their specific keyword.
I think further improvement can be achieved by offloading the feature normalization (as well as the extraction) and the DTW calculations to a lower-level library.
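As an illustration of the cosine-distance suggestion, here is a small sketch of a length-normalized DTW cost using cosine frame distance. The function names are hypothetical, and the probability formula from the blog post is deliberately not reproduced here; this only shows the distance side.

```python
import math

def cosine_distance(u, v):
    """1 - cosine similarity between two feature frames."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 1.0  # degenerate frame: treat as maximally dissimilar
    return 1.0 - dot / (nu * nv)

def dtw_cost(a, b, dist=cosine_distance):
    """DTW cost between two templates, normalized by the combined length
    so templates of different lengths yield comparable scores."""
    n, m = len(a), len(b)
    INF = float("inf")
    cost = [[INF] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = dist(a[i - 1], b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m] / (n + m)  # length-normalized accumulated cost
```

Because cosine distance is bounded regardless of frame magnitude, the normalized cost stays in a comparable range across templates, which is what makes a shared threshold feasible.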
Ok, I just tried this on my Raspberry Pi 3 Model B Rev 1.2 satellite and it’s excellent! Good job
CPU hungriness is fine: after the initial startup the Python process doesn’t need more than a couple of percent of the CPU. The arecord process needs more, with a continuous use of 11% of the CPU. I haven’t tried it yet on the Raspberry Pi Zero W; that board will probably need the optimizations suggested by @fastjack.
I just ran the example command with:
$ arecord -r 16000 -f S16_LE -c 1 -t raw | \
bin/rhasspy-wake-raven --distance-threshold 47 --minimum-matches 2 etc/test/okay-rhasspy-*.wav
Note that I had to increase the distance threshold from 38 to 47, and I learned that I had to pronounce Rhasspy in a specific way (listening to the samples, I heard that @synesthesiam pronounces it with shorter vowels than I did). After these two changes, wake word detection was excellent, and I haven’t found any false positives yet after shouting a handful of other wake words at my Pi.
This was just a short test, but it already works better than Porcupine in my setup, so I’m sure that if I record my own samples, wake word detection with rhasspy-wake-raven will be more than good enough for production use for me.
@synesthesiam Oh, and can you publish rhasspy-silence 0.3.0 to PyPI? I had to temporarily change the line for this library to git+git://email@example.com#egg=rhasspy-silence (if anyone else wants to try this in the meantime) so it could find this version, because it’s not on PyPI yet.
One thing I included in my Node version is that when a keyword is detected, the current audio buffer is made available so it can be saved to disk as a WAV file.
rhasspy-raven-hermes can send it in an MQTT message like rhasspy/hotword/<siteId>/audioCaptured and save it to disk if a configuration option is set.
The more you use the personal wakeword, the more of a dataset you generate for eventually training a CNN (like Precise) on that keyword.
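For reference, a stdlib-only sketch of wrapping the captured raw PCM buffer in a WAV container before saving or publishing it. The `buffer_to_wav` name is my own, and the MQTT call in the comment is hypothetical (it assumes a connected paho-style client and the topic name from the post above).

```python
import io
import wave

def buffer_to_wav(audio_buffer: bytes, sample_rate=16000, channels=1, sample_width=2) -> bytes:
    """Wrap a raw 16-bit 16 kHz mono PCM buffer in a WAV container so it
    can be written to disk or attached to an MQTT payload."""
    with io.BytesIO() as out:
        with wave.open(out, "wb") as wav_file:
            wav_file.setnchannels(channels)
            wav_file.setsampwidth(sample_width)   # 2 bytes = 16-bit samples
            wav_file.setframerate(sample_rate)
            wav_file.writeframes(audio_buffer)
        return out.getvalue()

# Hypothetical publish on detection (client setup omitted):
# client.publish(f"rhasspy/hotword/{site_id}/audioCaptured", buffer_to_wav(captured))
```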
Sorry, busy times here.
Would this mean that the Rhasspy Docker image will one day:
- install the Snips wakeword detector (or whatever is needed)
- offer it as an option in the Rhasspy interface
- even offer a way to record/generate custom wakewords (multiple ones) from the interface?
This would be just awesome, and the last brick allowing me to go into production!
I could easily rewrite my snowboy recording tool. Everything is here: https://github.com/KiboOst/jeedom_docs/tree/master/other/Rhasspy/SnowboyCustomMaker
If you need anything from it, feel free.
I do not think Rhasspy will ever run the original Snips hotword detector, as it is not open source and is not maintained anymore (if that was what you were saying… not sure).
What @synesthesiam, @maxbachmann, @koan and I are talking about is recreating a service that handles the “personal” wakeword detection system that Snowboy and Snips used.
The personal wakeword templates could indeed be directly recorded from Rhasspy GUI (like what you did with
This might lead to some kind of user management (Rhasspy 2.6?)…
So we will need to install and set up the former Snips wakeword manually?
That’s why I said ProjectAlice has it as a dpkg: if it is removed, we can still install it. We need reliable, fully maintainable components to avoid situations like the snowboy EOL.
Of course you can install the snips-hotword package, but it’s not open source, so Rhasspy will not distribute or promote it.
Our current priority (as it’s really the missing piece now) is having a completely open source wake word detector that performs well with low resource use. An optimized version of Rhasspy Raven can be that missing piece.
Even better, yes. Like you say, a completely open source wake word detector is definitely the last brick to have.
This “one day” may be sooner than you think. It seems to me that @synesthesiam’s hands are already itching to add the recording functionality to Rhasspy’s web interface:
Knowing his god-like productivity, this can’t take long.
It looks like there are a couple of MIT-licensed alternatives to dtw-python:
Thank you for the excellent suggestions, @fastjack! Can I use your code on GitHub as a reference for the MIT-licensed raven project?
I think I’m doing this now, if I understand you correctly. For each buffer, I slide a window over it and compute MFCC features for each (smaller) window. Is that right?
Oops, that’s a good point. If @fastjack allows it, I’ll just port his DTW code to Python. It’ll be slower for sure, but it will give us a place to start.
Both the MFCC and DTW calculations are the slow parts. The template averaging @fastjack mentioned would be a huge boost by itself.
Got it, I’ll switch over to using cosine distance and probabilities.
Once we get Raven working with properly licensed dependencies, I plan to add it to the Docker image and have the template recording happen in the web UI.
Then we’d have a fully open source wake word system whose custom wakewords can be recorded from within Rhasspy!
Do you mind if I use some of this code in Rhasspy to do recording/trimming in the web UI?
From what I can understand (I’m not a Python dev…), it seems you extract MFCC features from an entire audio chunk (with an approximate length or the average of the templates). I greatly improved CPU consumption by only extracting features from the new window in the audio buffer. The only restriction for this technique to work is that the window size must be a multiple of the shift size (30/10, for example). It’s a bit of a hack, but it reduces the MFCC calculations so much that I think it’s worth it.
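A rough sketch of that bookkeeping, assuming 16 kHz audio and a 30 ms window with a 10 ms shift. The class and names are illustrative (not @fastjack’s actual code), and `mfcc` is a trivial placeholder standing in for a real MFCC routine: the point is that each shift-sized chunk triggers exactly one feature computation over the newest full window, never a recomputation of the whole buffer.

```python
from collections import deque

WINDOW_MS = 30   # analysis window
SHIFT_MS = 10    # hop size; WINDOW_MS must be a multiple of SHIFT_MS

def mfcc(window_samples):
    """Placeholder for a real MFCC routine.
    Returns the window mean so the bookkeeping below is runnable."""
    return [sum(window_samples) / len(window_samples)]

class IncrementalExtractor:
    """Keeps a rolling feature list: on each new shift-sized chunk, compute
    features only for the single newest window, not the whole buffer."""

    def __init__(self, sample_rate=16000, max_frames=100):
        self.samples_per_shift = sample_rate * SHIFT_MS // 1000   # 160 samples
        window_shifts = WINDOW_MS // SHIFT_MS                      # 3 shifts per window
        self.sample_buffer = deque(maxlen=self.samples_per_shift * window_shifts)
        self.features = deque(maxlen=max_frames)

    def push(self, chunk):
        """Feed one shift-sized chunk of samples."""
        assert len(chunk) == self.samples_per_shift
        self.sample_buffer.extend(chunk)  # deque drops the oldest shift automatically
        if len(self.sample_buffer) == self.sample_buffer.maxlen:
            # Only the newest full window is featurized: one mfcc() call per chunk.
            self.features.append(mfcc(list(self.sample_buffer)))
```

Because the window is an exact multiple of the shift, the old windows’ boundaries never move, so their features stay valid and only the newest window needs to be computed.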