Snowboy-CustomMaker

Ok got it, automatic trim !

Will do more tests and update the script once done.

Ok, new version (tool and doc) is online !

So, no more need to manually cut sample files in Audacity, snowboyRecord will do it automatically for you :hugs:

:beers:
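For anyone curious what an automatic trim can look like, here is a minimal sketch. It is not taken from snowboyRecord; it assumes the recording is already available as a list of signed 16-bit sample values, and the threshold of 500 is an arbitrary illustrative number.

```python
def trim_silence(samples, threshold=500):
    """Return samples without the quiet lead-in and tail.

    samples: sequence of signed 16-bit PCM values (ints).
    threshold: absolute amplitude below which a sample counts as silence
               (500 is an arbitrary value chosen for illustration).
    """
    # Indexes of every sample that is loud enough to count as sound.
    loud = [i for i, s in enumerate(samples) if abs(s) >= threshold]
    if not loud:
        return []  # nothing but silence
    # Keep everything between the first and the last loud sample.
    return list(samples[loud[0]:loud[-1] + 1])


print(trim_silence([0, 3, -2, 900, -1200, 40, 700, 1, 0]))
# prints [900, -1200, 40, 700]
```

Quiet samples between two loud ones are kept on purpose, so short pauses inside the wakeword survive the trim.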

Hey @KiboOst,

thanks for the great tool - creating my own wakewords worked great!
Now I have recorded a wakeword for me and one for my wife and stored them in Rhasspy. It is the same wakeword both times, just with different voices. This works great with the following settings:

    "snowboy": {
        "model": "snowboy/HeyPico_JCH.pmdl,snowboy/HeyPico_EH.pmdl",
        "sensitivity": "0.30"
    },

The log shows me which wakeword was last recognized - both are recognized.
But I have a strange effect: After the hotword was recognized (no matter if Rhasspy hears speech or not) the recorded sound follows. Sometimes this first recorded sound is immediately followed by a second hotword activation without me having pronounced the hotword:

[INFO:1795889] quart.serving: 127.0.0.1:33296 GET / 1.1 200 1029 6485
[DEBUG:1780638] APlayAudioPlayer: ['aplay', '-q', '/profiles/de/wav/error.wav']
[DEBUG:1780186] SnowboyWakeListener: loaded -> listening
[DEBUG:1780186] DialogueManager: ready -> asleep
[INFO:1780186] DialogueManager: Automatically listening for wake word
[DEBUG:1780185] DialogueManager: handling -> ready
[DEBUG:1780184] WebSocketObserver: {"text": "", "intent": {"name": "", "confidence": 0}, "entities": [], "raw_text": "", "speech_confidence": 1, "wakeId": "snowboy/HeyPico_JCH.pmdl", "siteId": "master", "slots": {}}
[WARNING:1780184] HomeAssistantIntentHandler: Empty intent. Not sending to Home Assistant
[DEBUG:1780183] DialogueManager: recognizing -> handling
[DEBUG:1780183] DialogueManager: {'text': '', 'intent': {'name': '', 'confidence': 0}, 'entities': [], 'raw_text': '', 'speech_confidence': 1, 'wakeId': 'snowboy/HeyPico_JCH.pmdl', 'siteId': 'master'}
[ERROR:1780180] FsticuffsRecognizer: in_loaded
Traceback (most recent call last):
  File "/usr/share/rhasspy/rhasspy/intent.py", line 183, in in_loaded
	assert recognitions, "No intent recognized"
AssertionError: No intent recognized
[DEBUG:1780176] DialogueManager: decoding -> recognizing
[DEBUG:1780175] DialogueManager:  (confidence=1)
[DEBUG:1780173] KaldiDecoder: 
[DEBUG:1779519] KaldiDecoder: ['bash', '/profiles/de/kaldi/model/decode.sh', '/opt/kaldi', '/profiles/de/kaldi/model', '/profiles/de/kaldi/model/graph', '/tmp/tmp86mb8vue.wav']
[DEBUG:1779441] APlayAudioPlayer: ['aplay', '-q', '/profiles/de/wav/recorded.wav']
[DEBUG:1779440] DialogueManager: awake -> decoding
[DEBUG:1779438] WebrtcvadCommandListener: listening -> loaded
[WARNING:1779436] WebrtcvadCommandListener: Timeout
[DEBUG:1773929] APlayAudioPlayer: ['aplay', '-q', '/profiles/de/wav/wake.wav']
[DEBUG:1773518] APlayAudioPlayer: ['aplay', '-q', '/profiles/de/wav/error.wav']
[DEBUG:1773431] SnowboyWakeListener: listening -> loaded
[DEBUG:1773430] WebrtcvadCommandListener: loaded -> listening
[DEBUG:1773429] WebrtcvadCommandListener: Will timeout in 6 second(s)
[DEBUG:1773428] DialogueManager: asleep -> awake
[DEBUG:1773428] DialogueManager: Awake!
[DEBUG:1773426] SnowboyWakeListener: Hotword(s) detected: ['snowboy/HeyPico_JCH.pmdl']
[DEBUG:1773305] SnowboyWakeListener: loaded -> listening
[DEBUG:1773304] DialogueManager: ready -> asleep
[INFO:1773301] DialogueManager: Automatically listening for wake word
[DEBUG:1773300] DialogueManager: handling -> ready
[DEBUG:1773298] WebSocketObserver: {"text": "", "intent": {"name": "", "confidence": 0}, "entities": [], "raw_text": "", "speech_confidence": 1, "wakeId": "snowboy/HeyPico_EH.pmdl", "siteId": "master", "slots": {}}
[WARNING:1773297] HomeAssistantIntentHandler: Empty intent. Not sending to Home Assistant
[DEBUG:1773296] DialogueManager: recognizing -> handling
[DEBUG:1773295] DialogueManager: {'text': '', 'intent': {'name': '', 'confidence': 0}, 'entities': [], 'raw_text': '', 'speech_confidence': 1, 'wakeId': 'snowboy/HeyPico_EH.pmdl', 'siteId': 'master'}
[ERROR:1773291] FsticuffsRecognizer: in_loaded
Traceback (most recent call last):
  File "/usr/share/rhasspy/rhasspy/intent.py", line 183, in in_loaded
	assert recognitions, "No intent recognized"
AssertionError: No intent recognized
[DEBUG:1773288] DialogueManager: decoding -> recognizing
[DEBUG:1773287] DialogueManager:  (confidence=1)
[DEBUG:1773286] KaldiDecoder: 
[DEBUG:1772385] KaldiDecoder: ['bash', '/profiles/de/kaldi/model/decode.sh', '/opt/kaldi', '/profiles/de/kaldi/model', '/profiles/de/kaldi/model/graph', '/tmp/tmpffdsqwf0.wav']
[DEBUG:1772342] APlayAudioPlayer: ['aplay', '-q', '/profiles/de/wav/recorded.wav']
[DEBUG:1772339] DialogueManager: awake -> decoding
[DEBUG:1772337] WebrtcvadCommandListener: listening -> loaded
[DEBUG:1772337] WebrtcvadCommandListener: Voice command finished
[DEBUG:1770416] WebrtcvadCommandListener: Voice command started
[DEBUG:1769342] APlayAudioPlayer: ['aplay', '-q', '/profiles/de/wav/wake.wav']
[DEBUG:1769341] SnowboyWakeListener: listening -> loaded
[DEBUG:1769340] WebrtcvadCommandListener: loaded -> listening
[DEBUG:1769339] WebrtcvadCommandListener: Will timeout in 6 second(s)
[DEBUG:1769337] DialogueManager: asleep -> awake
[DEBUG:1769337] DialogueManager: Awake!
[DEBUG:1769332] SnowboyWakeListener: Hotword(s) detected: ['snowboy/HeyPico_EH.pmdl']
[INFO:1765458] quart.serving: 127.0.0.1:33202 GET / 1.1 200 1029 4888

I have only observed this behaviour when two hotwords are set. As soon as only one of the two hotwords is in use, no double activation follows.
Does Rhasspy handle two consecutive activations of the same hotword (only spoken by different people)?
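To check how often this happens, a log like the one above can be scanned for hotword detections that follow each other closely. This is only a sketch: it assumes the number in brackets is a millisecond counter (which fits the 6 second timeout in the log), that each detection line names a single model, and the 5000 ms window is an arbitrary choice.

```python
import re

# Matches lines such as:
# [DEBUG:1773426] SnowboyWakeListener: Hotword(s) detected: ['snowboy/HeyPico_JCH.pmdl']
DETECTED = re.compile(
    r"\[\w+:(\d+)\] SnowboyWakeListener: Hotword\(s\) detected: \['([^']+)'\]"
)


def find_quick_retriggers(log_text, max_gap=5000):
    """Return (first_model, second_model, gap) for detections within max_gap.

    The bracketed number is assumed to be a millisecond counter.
    """
    # Sort by timestamp because the log is shown newest entry first.
    hits = sorted((int(m.group(1)), m.group(2)) for m in DETECTED.finditer(log_text))
    return [
        (a_model, b_model, b_time - a_time)
        for (a_time, a_model), (b_time, b_model) in zip(hits, hits[1:])
        if b_time - a_time <= max_gap
    ]


sample = """\
[DEBUG:1773426] SnowboyWakeListener: Hotword(s) detected: ['snowboy/HeyPico_JCH.pmdl']
[DEBUG:1773305] SnowboyWakeListener: loaded -> listening
[DEBUG:1769332] SnowboyWakeListener: Hotword(s) detected: ['snowboy/HeyPico_EH.pmdl']
"""
print(find_quick_retriggers(sample))
# prints [('snowboy/HeyPico_EH.pmdl', 'snowboy/HeyPico_JCH.pmdl', 4094)]
```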

You can also give two sensitivities, separated by a comma.
Anyway this is another subject :wink:
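A small sketch of how such a setting could be generated, one sensitivity per model. Whether Rhasspy pairs each value with the model at the same position is an assumption here, not something confirmed in this thread, and the 0.45 value is made up for illustration.

```python
import json


def snowboy_section(model_sensitivities):
    """Build a "snowboy" settings dict from {model_path: sensitivity}."""
    return {
        "snowboy": {
            # Models and sensitivities are joined in the same order.
            "model": ",".join(model_sensitivities),
            "sensitivity": ",".join(
                f"{s:.2f}" for s in model_sensitivities.values()
            ),
        }
    }


section = snowboy_section({
    "snowboy/HeyPico_JCH.pmdl": 0.30,
    "snowboy/HeyPico_EH.pmdl": 0.45,  # made-up value for illustration
})
print(json.dumps(section, indent=4))
```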

I understood the documentation on wakewords to mean that a wakeword model (a *mdl file) can contain several hotwords:

“If your hotword model has multiple embedded hotwords (such as jarvis.umdl), the “sensitivity” parameter should contain sensitivities for each embedded hotword separated by commas”.

The commas didn’t work for me either. Unfortunately, the self-recorded hotword with your tool did not solve my problems with the hotword sensitivity, although I took everything into account: recording with the same microphone in a quiet room. Either I have a lot of false positives or the sensitivity is so low that it rarely triggers. For now, I go back to porcupine with a heavy heart.

Is there a possibility to extend your tool to record many more examples? If it works then I would sit down for an hour and record the hotword…

At least everything after the hotword works excellent now thanks to kaldi!

I wonder if you need a “model settings” section for each model, or at least for one wake word.

I also wonder if you tested/refined the sensitivity of your models on the snowboy site (or with this tool, which I am too new to Pi and git to know how to use :crazy_face:). Porcupine gave me more false positives than snowboy.

0.3 is pretty low sensitivity, meaning (as I understand it) you will get a lot of false positives.

Finally, the two wake word phrases that you’re using appear to be the same phrase. Although I trained my model on snowboy, various people can say the phrase and trigger Rhasspy. You’re using the multiple wake word option in a way that it wasn’t designed for (i.e., the same phrase with different speakers rather than different phrases entirely). What you’re trying to approximate is a universal model rather than a personal model, but this is not the way to do it, it seems.

Until you get enough people to train a universal model, I’d have one of you switch to a different wakeword phrase and increase the sensitivity to decrease false positives. Good luck!

Dear @OC2019OC
thanks for your ideas.

I don’t quite understand what you mean by “model settings” and section.

I’m not 100% sure, but in my understanding the sensitivity on https://snowboy.kitt.ai/ is only for testing and not to adjust the model.
With the tool from @KiboOst you cannot change the sensitivity when creating the hotword, since this is not an option in training (as on the website):

    python3 snowboyTrain.py --token ENTERYOURTOKEN --lang en --gender M --age 30 --wakeword NAMEOFTHEMODEL

Yes, 0.3 is quite low. A higher value means higher sensitivity, and with it so many false positives that you can no longer watch movies or series without a false trigger every 20-30 minutes.

I think you are right about the use of the hotword: A hotword should ideally be trained with voices from many different people. Since snowboy only allows 3 files for one account, I have logged on to the website with different accounts and trained several times. But this approach is not satisfying either.
I will now try the public models first - this forces me to use another hotword with a hopefully better recognition rate.

No…

One hotword = three samples from one person.

We are three in the house, we have one custom wakeword (pmdl file) per family member.

This also lets you filter what an intent does based on who is asking.

But never mix different people when recording one wakeword :scream: That would be a universal wakeword, and you would need around 2000 different people to get it working.
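The filtering idea can be sketched with the `wakeId` field that shows up in the Rhasspy log output earlier in this thread. The speaker names, the intent names and the whitelist itself are hypothetical; this is one possible shape for it, not how any existing setup does it.

```python
import json

# Hypothetical mapping from wakeword model file to family member.
SPEAKER_BY_WAKE_ID = {
    "snowboy/HeyPico_JCH.pmdl": "JCH",
    "snowboy/HeyPico_EH.pmdl": "EH",
}

# Hypothetical per-person whitelist of intent names.
ALLOWED_INTENTS = {
    "JCH": {"ChangeLightState", "OpenGarage"},
    "EH": {"ChangeLightState"},
}


def speaker_for(event):
    """Return the family member for an intent event, or None if unknown."""
    return SPEAKER_BY_WAKE_ID.get(event.get("wakeId"))


def is_allowed(event):
    """True if the recognized intent is whitelisted for whoever woke Rhasspy."""
    intent_name = event.get("intent", {}).get("name", "")
    return intent_name in ALLOWED_INTENTS.get(speaker_for(event), set())


event = json.loads(
    '{"intent": {"name": "OpenGarage", "confidence": 1}, '
    '"wakeId": "snowboy/HeyPico_EH.pmdl", "siteId": "master"}'
)
print(speaker_for(event), is_allowed(event))
# prints EH False
```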

Dear @KiboOst,

thanks for the good explanation. It’s completely clear to me now.

The distinction of the family members according to wake word sounds clever - let’s see if I implement this.

Unfortunately I will probably not be able to get 2000 people to say “hey pico” three times. :sweat_smile:
I have the impression that the pitch of the voice influences the recognition rate. Depending on whether you go up or down with the voice. Maybe I am just imagining this.

You’re not imagining this, it surely influences the recognition, at least that’s my experience. For instance, if by coincidence all your training samples have their pitch going up, the recognition rate will be worse when you utter the wake word with the pitch going down.

Now here’s what I’ve been thinking:
If the pitch influences the recognition rate, would training the same wake word with different voice melodies improve its robustness? That is, a lite variant of a universal model, built not from many different people but only from my own voice with different melodies.

First of all I thought of a full-factorial matrix of meaningful combinations, where the absolute pitch (abs) of the voice is high (hig), medium (mid) or low (low), and the syllables of “hey pico” either follow a melodic course or are monotonous (e.g. mid mid mid mid).

abs  hey  pi   co

low  low  low  hig
mid  low  low  hig
hig  low  low  hig

hig  low  hig  mid
mid  low  hig  mid
low  low  hig  mid

hig  hig  hig  hig
mid  mid  mid  mid
low  low  low  low
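For comparison, a true full factorial over three levels for the absolute pitch and each of the three syllables gives 3^4 = 81 combinations, so the nine rows above are a hand-picked subset. A short sketch that lists all of them as a recording plan, using the same column names and level abbreviations as the table:

```python
from itertools import product

LEVELS = ("low", "mid", "hig")
COLUMNS = ("abs", "hey", "pi", "co")


def full_factorial():
    """Return every combination of pitch levels as one dict per row."""
    return [
        dict(zip(COLUMNS, combo))
        for combo in product(LEVELS, repeat=len(COLUMNS))
    ]


plan = full_factorial()
print(len(plan))  # prints 81
for row in plan[:3]:
    print("  ".join(row[c] for c in COLUMNS))
```

At three to six recordings per melody, the full plan would be far more work than the nine rows, which is an argument for picking only the combinations that sound natural.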

My idea is to record each melody three or six times and then feed the different melodies into the model via several accounts. It will be a bit annoying to generate multiple accounts - but if it works it is worth the work.

Forget having different voices on the same custom wakeword.

A custom wakeword is a totally different beast than a universal wakeword! They don’t even have the same file format.

Thanks for the program. Works great. BUT:
Please change German “de” -> “dt”
Snowboy has its own way.

Arg sorry !! Will change that :joy:

EDIT: Done :wink:

Works Great! Much better than porcupine. Your instructions were very easy to understand and setup went very quickly. Thanks.
The only thing I don’t understand is that everything works fine but the Advanced tab keeps showing porcupine. When I deleted porcupine, it disabled the wakeword. Then when I reselected snowboy, everything worked again, and porcupine was still in my profile? It doesn’t seem to matter but it may be a bug in Rhasspy.

Hello @rickmini ,

I have observed this before: between “Settings” and “Advanced”, old settings are sometimes remembered and changes made in the other tab are not picked up after a restart of Rhasspy. After deleting the browser data or visiting the website in incognito mode, the data is consistent again.
@synesthesiam Is there anything left in the browser cache? I once read that you don’t like web development that much. Maybe there is someone else who can investigate that? My abilities are limited to trial and error and thinking about that.

Thanks. I see now, after restarting the next day, that the Advanced tab shows the correct info.

You are welcome. If you are not sure about the Advanced tab, you could also open the file to see its actual content and not a potentially old state cached by the browser. Because of the browser caching effect, I prefer to edit the file and restart Rhasspy.

Hi @KiboOst,

Thanks for your scripts. Very useful :+1:

I used them to generate my own wakeword. So everything looks good for this part. The wav files are OK (I followed your tips and the recordings were done in a quiet environment) and the pmdl file is imported into Rhasspy.

But with my custom wakeword, Rhasspy wakes up as soon as I talk or when there is some noise in the room. Even my wife’s voice is able to wake it up. I tried to change the sensitivity but it did not help.

Any idea ?

And I missed one point in your tips :

So, it's better to record samples in a room where you don't plan to use this wakeword

So I will try using arecord instead of pyaudio and will do my recording in a different room than the one where I will use it.
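A minimal sketch of recording the three samples with arecord from Python. It assumes the default ALSA capture device is the right microphone and that 16 kHz, 16-bit, mono is the format wanted for the samples; the file names and the 3 second duration are made up for illustration.

```python
import subprocess


def arecord_command(path, seconds=3):
    """Build an arecord call for a 16 kHz, 16-bit, mono WAV file."""
    return [
        "arecord", "-q",      # quiet, no progress output
        "-r", "16000",        # sample rate in Hz
        "-f", "S16_LE",       # signed 16-bit little endian
        "-c", "1",            # mono
        "-d", str(seconds),   # stop after this many seconds
        path,
    ]


def record_samples(prefix="sample", count=3, seconds=3):
    """Record `count` files, waiting for Enter before each one."""
    for i in range(1, count + 1):
        path = f"{prefix}{i}.wav"
        input(f"Press Enter, then say the wakeword ({path})...")
        subprocess.run(arecord_command(path, seconds), check=True)


print(" ".join(arecord_command("sample1.wav")))
```

Calling `record_samples()` on the Pi would then write sample1.wav to sample3.wav in the current directory.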