Raven with wav files

KiboOst · November 27, 2020, 9:03pm

Could anyone enlighten me on what the Raven vad_sensitivity parameter does?

With the snowboy or snips wakeword, on the exact same hardware and environment, I can say the wakeword normally from five meters away. With Raven I have to speak loudly from a few centimeters away.

I have checked mic levels etc. and can't get it to work nicely. I didn't find anything in the docs for vad_sensitivity.

PS: strangely, the docs say this regarding silence detection:
vad_mode is the sensitivity of speech detection (3 is the least sensitive)

I've tried setting VAD Sensitivity to 2 in the Raven settings, and now I can talk quietly and far from the mic!!! I previously tested 0.5 and couldn't trigger the wakeword at all. So I guess the higher the VAD value, the MORE sensitive, right?

rolyan_trauts · November 28, 2020, 10:26am

If it's the standard webrtcvad API, then the VAD sensitivity is how aggressively it filters out non-speech (0-3). From the webrtcvad docs:

Optionally, set its aggressiveness mode, which is an integer between 0 and 3. 0 is the least aggressive about filtering out non-speech, 3 is the most aggressive. (You can also set the mode when you create the VAD, e.g. vad = webrtcvad.Vad(3)):

vad.set_mode(1)

So higher aggressiveness means it's more likely to detect silence, so maybe it was just running and not stopping?
Or maybe it's not implemented the same way in Raven?
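To check what each mode actually does on your own recordings, here is a minimal sketch using the webrtcvad package directly. It assumes 16-bit mono PCM at 16 kHz; `split_frames` and `speech_ratio` are hypothetical helper names, and whether Raven passes vad_sensitivity straight through to webrtcvad is not confirmed in this thread.

```python
SAMPLE_RATE = 16000  # webrtcvad accepts 8000, 16000, 32000 or 48000 Hz
FRAME_MS = 30        # frames must be 10, 20 or 30 ms long


def split_frames(pcm, sample_rate=SAMPLE_RATE, frame_ms=FRAME_MS):
    """Split 16-bit mono PCM bytes into fixed-size frames.

    A trailing partial frame is dropped, because webrtcvad rejects
    frames of any other length.
    """
    frame_bytes = sample_rate * frame_ms // 1000 * 2  # 2 bytes per sample
    last_start = len(pcm) - frame_bytes
    return [pcm[i:i + frame_bytes] for i in range(0, last_start + 1, frame_bytes)]


def speech_ratio(pcm, mode):
    """Return the fraction of frames that webrtcvad labels as speech.

    mode is the aggressiveness, an integer from 0 to 3.
    """
    import webrtcvad  # third-party package, imported only when needed

    vad = webrtcvad.Vad(mode)
    frames = split_frames(pcm)
    if not frames:
        return 0.0
    return sum(vad.is_speech(frame, SAMPLE_RATE) for frame in frames) / len(frames)
```

Running `speech_ratio(pcm, mode)` for modes 0 to 3 on the same recording should show how many frames each mode still counts as speech, which is a quick way to see which direction "more sensitive" goes.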

Many KWS have their own audio processing & MFCC routines; some do normalisation & pre-emphasis, some don't, prob because of load.
If Raven doesn't, just use the ALSA speex plugin for AGC.

Will have to ask @fastjack whether normalisation / pre-emphasis happens with Raven.

fastjack · November 28, 2020, 10:37am

From memory, there is no normalization of the audio signal. Only pre-emphasis.

The extracted features should be normalized inside the detection window though.
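For anyone unfamiliar with the two terms, here is a rough sketch of what they usually mean. This is not Raven's actual code: the 0.97 coefficient is a common default and an assumption here, the function names are hypothetical, and real features are MFCC vectors rather than the single list of numbers used below.

```python
from statistics import mean, pstdev


def pre_emphasis(samples, coeff=0.97):
    """Boost high frequencies: y[n] = x[n] - coeff * x[n-1].

    The first sample is passed through unchanged.
    """
    if not samples:
        return []
    out = [float(samples[0])]
    for prev, cur in zip(samples, samples[1:]):
        out.append(cur - coeff * prev)
    return out


def normalize_window(features):
    """Scale one window of feature values to zero mean and unit variance."""
    mu = mean(features)
    sigma = pstdev(features)
    if sigma == 0:
        # A constant window carries no shape information.
        return [0.0] * len(features)
    return [(f - mu) / sigma for f in features]
```

Note that normalizing features inside the window is not the same as normalizing the input level: a very quiet signal can still fall below what the VAD treats as speech before any features are compared.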

KiboOst · November 28, 2020, 4:59pm

Not sure if it is just in log but:

HotwordDetected(model_id=’/profiles/fr/raven/bibi/example-0.wav’, model_version=’’, model_type=‘personal’, current_sensitivity=0.41, site_id='studio

the raven probability_threshold is 0.41 but the raven/keywords/bibi one is actually 0.43 …

I'm trying to get good trigger sensitivity with fewer false positives, especially when listening to music.

Actually, when music is playing and the wakeword is triggered by the music, it seems the dialogue session never ends and it doesn't come back to the idle state. Will investigate with MQTT Explorer.
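One way to confirm a stuck session from captured MQTT traffic is to pair up the Hermes sessionStarted and sessionEnded messages. A small sketch, assuming the messages have been exported as (topic, payload) pairs with JSON payloads carrying a sessionId field; `open_sessions` is a hypothetical helper, not part of Rhasspy.

```python
import json

STARTED = "hermes/dialogueManager/sessionStarted"
ENDED = "hermes/dialogueManager/sessionEnded"


def open_sessions(messages):
    """Return IDs of sessions that started but never ended, in start order.

    messages is an iterable of (topic, payload) pairs, where payload is
    a JSON string. Other topics are ignored.
    """
    open_ids = []
    for topic, payload in messages:
        if topic not in (STARTED, ENDED):
            continue  # e.g. audio frames, which are not JSON
        session_id = json.loads(payload).get("sessionId")
        if topic == STARTED and session_id not in open_ids:
            open_ids.append(session_id)
        elif topic == ENDED and session_id in open_ids:
            open_ids.remove(session_id)
    return open_ids
```

Any ID left in the result after the music-triggered detection would point to a session that was opened and never closed.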

fastjack · November 28, 2020, 5:10pm

Regarding false positives, avoid using short keywords. Ideally, keywords with more than 3 syllables and different sounds should be preferred.

KiboOst · November 28, 2020, 5:14pm

Yes, this is the case, and I've had no problem with snips using the same keywords for years.