What do you use for your custom wakewords?

Hi,

Following this, I would like to know what you guys use for your custom wakewords with rhasspy.

I have created my custom wakeword twice with snowboy. Whatever I try for audio_gain (less or greater than 1) and sensitivity, I can’t get it to work reliably. If I don’t want many false positives, I can’t wake it anymore when needed. Right now I’m using audio_gain 0.6 and sensitivity 0.38; yesterday I was watching a movie in the same room and got a false positive every 5 minutes.

Compared with snips it’s just horrible. What is even worse is that with snips, when there is a false positive (it happens every few days, not every few minutes!), snips doesn’t recognize an intent. Here, rhasspy always recognizes an intent, even with high confidence. So every few minutes I got the weather for tomorrow… Totally unusable like this; I can’t even test it in a living room with the family talking around.

I use a rpi4 with Buster Lite and a respeaker 2-mic pihat (same hardware as with snips). I recorded the wave files on the pi, uploaded them to the snowboy website and generated the pmdl file with default settings (I also tried recording from a Surface Pro directly on the snowboy site).

I absolutely need three custom wakewords (one per family member, as I filter some intents depending on who is asking).
Porcupine seems a better solution, but apparently the wakeword files have to be regenerated every 30 days, even for private personal use. So not really a solution.
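
For reference, snowboy itself can load several models at once and, if I read its API right, accept one callback per model, so each family member’s wakeword could map to its own handler. A rough sketch, with made-up model file names and person labels:

import snowboydecoder

# hypothetical per-person model files (one .pmdl per family member)
models = ["resources/papa.pmdl", "resources/maman.pmdl", "resources/kid.pmdl"]

def make_callback(person):
    def on_detect():
        # here you could publish the person's name alongside the wake event
        # (e.g. on an MQTT topic) so intents can be filtered per speaker
        print("wakeword detected for:", person)
    return on_detect

# one callback per model, in the same order as the models list
callbacks = [make_callback(p) for p in ("papa", "maman", "kid")]

detector = snowboydecoder.HotwordDetector(models, sensitivity=[0.45] * len(models))
# runs until the process is killed (no interrupt_check given here)
detector.start(detected_callback=callbacks, sleep_time=0.03)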

The snips wakeword works really great: it was easy to record samples for each wakeword and generate the wakewords, and the process was entirely offline, so I don’t know if rhasspy could use it.

Anyway, I would really like to hear what other users use for custom wakewords, how you generated them, and how well it works. And why it works… :woozy_face:

Actually, rhasspy is an amazing solution, everything works really nicely with an easy setup, and with full hermes support and a base/sat setup coming it will be a beast. But without reliable wakeword detection, it’s just unusable.

With snips, I had to edit each sample in audacity to cut the leading and trailing noise, which made a big difference. I didn’t do that for snowboy, so I will retry recording new samples and make them as clean as possible before generating the wakeword. But every experience is welcome :wink:
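
For anyone who wants to avoid cutting by hand, that trimming can also be scripted. A rough sketch using only the python standard library (the RMS threshold here is a guess, adjust it to your recordings):

import wave, audioop

def trim_silence(in_path, out_path, threshold=500, chunk_ms=10):
    # split the file into short chunks and drop the leading/trailing ones
    # whose RMS level is below the threshold
    with wave.open(in_path, "rb") as wav:
        params = wav.getparams()
        width = wav.getsampwidth()
        frames_per_chunk = int(wav.getframerate() * chunk_ms / 1000)
        chunks = []
        while True:
            data = wav.readframes(frames_per_chunk)
            if not data:
                break
            chunks.append(data)

    loud = [i for i, c in enumerate(chunks) if audioop.rms(c, width) > threshold]
    if not loud:
        raise ValueError("no audio above threshold in " + in_path)

    with wave.open(out_path, "wb") as out:
        out.setparams(params)
        out.writeframes(b"".join(chunks[loud[0]:loud[-1] + 1]))

trim_silence("1.wav", "1_cut.wav")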

Indeed, that is something that could explain such problems; I will definitely try different things.

Here are three sample wave files:

first one: one of the samples that served for snowboy. Lots of leading and trailing noise.
second one: the same file, cut down to the actual wakeword sound.
third one: one of the cut samples that generated my current production snips wakeword. We can see the volume is a lot lower. Not sure how much that matters, but audio_gain should take care of it (see the quick level check below).
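
To compare recording levels between samples without opening them in an editor, a quick check like this works (python standard library only, file names are just examples):

import sys, wave, audioop

# print the peak and RMS level of each wav passed on the command line,
# handy for comparing sample loudness before deciding on audio_gain
for path in sys.argv[1:]:
    with wave.open(path, "rb") as wav:
        data = wav.readframes(wav.getnframes())
        width = wav.getsampwidth()
    print(path, "peak:", audioop.max(data, width), "rms:", audioop.rms(data, width))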

Putting all this out here for common experience / feedback, to make things better :wink:

Indeed, such leading / trailing noise was a problem for the snips wakeword as well, and I manually cut each sample in audacity (free and portable) before generating the wakeword, which made a big difference in reliability.

I’m using snowboy, and also getting false positives quite often - yesterday some TV show changed my backlight color :smiley:

Thanks for the idea with audacity, will also try this to see if i’m getting better results.

The bigger problem I have with snowboy is that it sometimes seems not to be listening at all.
Someone posted an issue on GitHub about it.
I’m using a PS3 cam as my mic.

I use snowboy, tested the accuracy/sensitivity on the site before downloading the model until it worked very well, and it’s two syllables. I get false positives but never from the TV, and not every day. It’s always something that sounds very much like my wake word phrase, an understandable mistake.

With porcupine, I used the default and was getting false positives every day on “Pokemon.” I never got any response at all from the PocketSphinx wake word.

How do you do that? I never changed the sensitivity slider on the snowboy site and don’t see how to judge which position is better. Do you decrease it until it never detects?

I did it a few times with the slider under “test the model.” If I remember correctly, the further right the slider, the more sensitive and more specific the wake word model. So moving it to the left and decreasing the sensitivity would mean it recognizes more loosely and you’ll have more false positives.

Thanks, will test that as well! I’m not sure it’s the same as changing the sensitivity in the rhasspy profile, though. No idea whether it is for testing only or whether it gets embedded somewhere in the wakeword file :face_with_raised_eyebrow:

Have you tried using “Snowboy” outside of rhasspy, to see?
I don’t get false positives… and if there is one, I just tell it to shut up (“ta gueule”) with an insult intent:

[Insult]
ta gueule {action: excuse}

…and the TTS answers ‘sorry, blah blah blah’.

Lol, I will create such an intent just for fun :laughing:

Could you share some details on your hardware and how you generated your custom wakeword?

I used Snowboy before snips and rhasspy!
Here is a piece of code with several wakewords:

import os
import signal
import sys

import snowboydecoder

# directory holding the personal .pmdl model files
path_voix = os.path.dirname(os.path.abspath(__file__)) + "/resources/"
interrupted = False


def signal_handler(signal, frame):
    global interrupted
    interrupted = True


def interrupt_callback():
    global interrupted
    return interrupted


# stop the detector cleanly on Ctrl+C
signal.signal(signal.SIGINT, signal_handler)

# one personal model per wakeword
myordres = ["heypoppy", "snowboy", "ecoute_moi"]
models = [path_voix + ordre + ".pmdl" for ordre in myordres]

snowboydecoder.play_audio_file(snowboydecoder.DETECT_DING)


def okrhasspy():
    snowboydecoder.play_audio_file(snowboydecoder.DETECT_DING)
    print("okay rhasspy")


# snowboy expects one sensitivity and one callback per model
sensitivity = [0.45] * len(models)
callbacks = [okrhasspy] * len(models)

detector = snowboydecoder.HotwordDetector(models, sensitivity=sensitivity, audio_gain=1)

# main loop: blocks until interrupt_callback() returns True
detector.start(detected_callback=callbacks,
               interrupt_check=interrupt_callback,
               sleep_time=0.3)

detector.terminate()
sys.exit(0)

For training I use this shell script against the snowboy online API. It’s been a long time, but it should still work (change your token and language, and create the resources/ directory):

#!/bin/bash

stt_sb_train () {
    # Records 3 samples of a hotword and asks the snowboy API for a .pmdl model

    # the API token from your snowboy.kitt.ai account
    snowboy_token="YOUR TOKEN"

    # ask for the hotword / command name
    name=$(whiptail --title "Input" --inputbox "Which command?" 10 60 heypoppy_ 3>&1 1>&2 2>&3)

    exitstatus=$?
    if [ $exitstatus = 0 ]; then
        echo "Okay, the command is:" $name
    else
        echo "You cancelled... :-("
        exit 1
    fi

    mkdir -p tmp resources

    # record 3 audio samples of the hotword (stop each recording with Ctrl+C)
    echo "$name     Press Enter to continue..."
    rec -r 16000 -c 1 -b 32 -e signed-integer tmp/1.wav
    read a
    echo "$name     Press Enter to continue..."
    rec -r 16000 -c 1 -b 32 -e signed-integer tmp/2.wav
    read a
    echo "$name     Press Enter to continue..."
    rec -r 16000 -c 1 -b 32 -e signed-integer tmp/3.wav
    read a
    play tmp/3.wav

    if (whiptail --title "Recording" --yesno "Click OK to continue." 8 78) then
        echo "Ctrl+C to stop"
    else
        echo "User selected No, exit status was $?."
    fi

    # get microphone information
    local microphone="Default"

    # build the json data parameter (strip the newlines base64 inserts)
    local WAV1=$(base64 tmp/1.wav | tr -d '\n')
    local WAV2=$(base64 tmp/2.wav | tr -d '\n')
    local WAV3=$(base64 tmp/3.wav | tr -d '\n')

    # language forced because of https://github.com/Kitt-AI/snowboy/issues/75
    cat <<EOF >tmp/data.json
{
    "name": "$name",
    "language": "fr",
    "microphone": "$microphone",
    "token": "$snowboy_token",
    "voice_samples": [
        {"wave": "$WAV1"},
        {"wave": "$WAV2"},
        {"wave": "$WAV3"}
    ]
}
EOF

    # call the kitt.ai endpoint with the recorded samples to get the model
    echo "Training model  $name"
    response_code=$(curl "https://snowboy.kitt.ai/api/v1/train/" \
        --progress-bar \
        --header "Content-Type: application/json" \
        --data   @tmp/data.json \
        --write-out "%{http_code}" \
        --output tmp/model.pmdl)
    #local response_code=$? # sometimes 0 although it failed with 400 http code

    # check if there was an error
    if [ "${response_code:0:1}" != "2" ]; then
        cat tmp/model.pmdl
        echo # carriage return
        echo "ERROR: an error occurred while training the model"
        exit 1
    fi

    # save the model
    echo "Moving the model: $name"
    mv tmp/model.pmdl "resources/$name.pmdl"
    echo "Saving your voice sample as $name.wav"
    mv tmp/1.wav "resources/$name.wav"
    echo "Cleaning up..."
    rm tmp/*.wav
    rm tmp/data.json
    echo "Completed"

}

stt_sb_train

Thanks! I can now generate the pmdl file with python, with no more need for a connection to the website and a mic to validate/test the model.
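
In case it helps others, this is roughly what that looks like in python, just mirroring the shell script posted above (same endpoint and payload; the token, language and wav file names are placeholders):

import base64
import requests

# same snowboy training endpoint as in the shell script above
API_URL = "https://snowboy.kitt.ai/api/v1/train/"

def train_model(name, wav_paths, token, language="fr", out_path=None):
    # base64-encode the three recorded samples
    samples = []
    for path in wav_paths:
        with open(path, "rb") as f:
            samples.append({"wave": base64.b64encode(f.read()).decode("ascii")})

    payload = {
        "name": name,
        "language": language,
        "microphone": "Default",
        "token": token,
        "voice_samples": samples,
    }

    response = requests.post(API_URL, json=payload)
    response.raise_for_status()  # a 4xx answer means training failed

    # the response body is the trained .pmdl model
    out_path = out_path or name + ".pmdl"
    with open(out_path, "wb") as f:
        f.write(response.content)
    return out_path

train_model("heypoppy", ["1.wav", "2.wav", "3.wav"], token="YOUR TOKEN")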

Also, I’m adapting this script from the snips custom wakeword recording tool:

It actually works, but it keeps telling me there is too much noise in the room. Interesting; I will try in a very quiet room, then manually cut the files with audacity, generate the model, and let you know :woozy_face:

I’ve also generated snowboy models from my snips custom wakeword files. It would be interesting to test them, as these are the exact same wav files that generated the snips custom wakewords I use every day with the family (each person has their own wakeword). But I have to shut down snips for that so both don’t wake at the same time, lol!

While waiting for the base/sat config with a pi 0: this is something sensitive that needs to be reliable for the entire system to be production ready. I will keep trying and documenting this, and hopefully find a good way to achieve something reliable!

Ok, I got something a lot better! I could raise the sensitivity, and detection works nicely even at 5 meters. I could listen to music for 30 minutes without a single false positive :astonished:
I will keep an eye on it, of course!

What I did:

  • Rewrote the snips recording tool
  • Moved the pi into the most silent room in the house and recorded the samples with the “rhasspy-fied” snips tools. Sometimes it said “too much noise” and refused to record, but I managed to get my three samples, even though I could hear nothing myself :roll_eyes:
  • Cut each sample in audacity
  • Finally generated the pmdl file with the python tool (snowboy api)

Once we have base/sat and the full hermes topics working, I will aggregate all this and write docs/tools for these things (rhasspy_logger, rhasspy_batchTester, snowboy_recorder, hermesledcontrol, jeedom python intent handling, etc.).

Will let you know how it goes during coming days.

Has anyone ever tried Mycroft Precise? It’s completely open source and they claim it is very precise. I am not so familiar with all this stuff; I have already failed at the installation. :roll_eyes:

I haven’t had a single false positive since I redid my custom wakewords.
Which is promising, as now there is also a satellite with the same wakeword in the same room :rofl: So that would make double false positives :astonished:

I will keep snowboy for now. You just have to be very rigorous when creating a custom wakeword :beers:

Have a look at this project, it does speaker recognition.
By changing the voice for the same sentence… it should work???

Give us news if it works.

I can confirm drastically improved performance by following (almost) all of your steps. I did not use the snips record tool, but did everything else.

Zero false positives. Before I was getting 1 every few minutes. Thanks for the tips.

I’ve found some time to share some tips and tools for custom snowboy wakewords!

https://community.rhasspy.org/t/snowboy-custommaker/