Unexpected intent recognition

adrianofoschi · December 24, 2020, 10:47am

Hi, I’m using the latest Rhasspy release with:

Italian language and Kaldi
fsticuffs for intent recognition
snowboy jarvis (0.5,0.5) for wake up
master/satellite setup

The Kaldi STT is much better than Pocketsphinx, but I have issues with intent recognition that make it unusable.
It seems that if it doesn’t recognize a valid intent, it chooses one at random…

The major issues:

sometimes if I say “turn on (light){light}” it recognizes “start cleaning” or some other random intent
I have an intent to change the TV channel, and sometimes it changes the channel even though I never said “jarvis” or “change tv channel to (channel){channel}”.

I’ve tried to increase the jarvis sensitivity to avoid false-positive wake-ups.
But I don’t know how to resolve the random intent recognition issue.
Is something wrong with my slots or sentences?

Example log. I said only “jarvis” and it recognized an intent…

[DEBUG:2020-12-24 12:22:34,881] rhasspyserver_hermes: <- NluIntent(input=‘metti tv_salone 1’, intent=Intent(intent_name=‘TvChangeChannel’, confidence_score=1.0), site_id=‘satellite1’, id=None, slots=[Slot(entity=‘tv’, value={‘kind’: ‘Unknown’, ‘value’: ‘tv_salone’}, slot_name=‘tv’, raw_value=‘tivvu’, confidence=1.0, range=SlotRange(start=6, end=15, raw_start=6, raw_end=11)), Slot(entity=‘rhasspy/number’, value={‘kind’: ‘Number’, ‘value’: 1}, slot_name=‘channel’, raw_value=‘uno’, confidence=1.0, range=SlotRange(start=16, end=17, raw_start=12, raw_end=15))], session_id=‘satellite1-jarvis-49959f96-54f9-455d-8ebe-ba9ddc58e965’, custom_data=None, asr_tokens=[[AsrToken(value=‘metti’, confidence=1.0, range_start=0, range_end=5, time=None), AsrToken(value=‘tv_salone’, confidence=1.0, range_start=6, range_end=15, time=None), AsrToken(value=‘1’, confidence=1.0, range_start=16, range_end=17, time=None)]], asr_confidence=None, raw_input=‘metti tivvu uno’, wakeword_id=‘jarvis’, lang=None)
[WARNING:2020-12-24 12:22:30,233] rhasspyserver_hermes: Dialogue management is disabled. ASR will NOT be automatically enabled.
[DEBUG:2020-12-24 12:22:30,232] rhasspyserver_hermes: <- HotwordDetected(model_id=‘jarvis’, model_version=’’, model_type=‘personal’, current_sensitivity=0.5, site_id=‘satellite1’, session_id=None, send_audio_captured=None, lang=None)

adrianofoschi · December 30, 2020, 9:06pm

Does no one have the same issue?

fastjack · December 30, 2020, 10:20pm

This is one of the drawbacks of a limited language model. The STT service can only recognize words it knows and will select the best matching ones (even against noise) and the NLU will do what it can with the result.

The « none » intent is a complex matter. I made tests with Rhasspy and Snips and both struggled with this issue.

Using STT word confidence may help: substituting low-confidence words with a placeholder word should lower the NLU intent confidence.
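A rough sketch of that substitution idea, as a post-processing step between STT and NLU. This is not an existing Rhasspy setting: the function name, the `<unk>` placeholder, the 0.6 threshold and the confidence values below are all made up for illustration.

```python
from typing import List, Tuple

UNKNOWN_TOKEN = "<unk>"  # hypothetical placeholder word, not a Rhasspy option


def mask_low_confidence(
    tokens: List[Tuple[str, float]], threshold: float = 0.6
) -> str:
    """Replace words whose STT confidence is below the threshold with a placeholder."""
    words = [
        word if confidence >= threshold else UNKNOWN_TOKEN
        for word, confidence in tokens
    ]
    return " ".join(words)


# Illustrative (word, confidence) pairs; the values are invented.
tokens = [("metti", 0.35), ("tv_salone", 0.9), ("1", 0.4)]
print(mask_low_confidence(tokens))
```

The masked text no longer matches any sentence exactly, so the NLU step has a reason to reject it or score it lower. This only helps if the STT actually reports low confidence for the bad words.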

adrianofoschi · January 2, 2021, 11:56am

Is there a way to change the “confidence” in the settings?

romkabouter · January 21, 2021, 9:27am

Actually yes, I have tried it with just silence and Rhasspy was recognizing random intents.

I have set the intent to Fsticuffs, so Rhasspy should only recognize intents from the sentences.

J_J · March 14, 2021, 9:35am

For me it still recognizes random intents even with fsticuffs. Any clue about that?

kralizec · March 15, 2021, 11:13am

It’s also happening to me using Spanish, and it makes it almost unusable.

While the wake word detection is great, there are still false positives, and when that happens the result is a random intent. What is worse, for some reason it seems to like switching off my TV.

In my case it seems it’s Kaldi that is transcribing the full sentence, so fsticuffs can’t do anything about it. If I use open transcription, it’s too slow in my setup to be usable, and I also get many false negatives.

If I switch to Pocketsphinx for STT, then it works as expected… although overall with lower accuracy, of course.

ksrimmy · June 29, 2021, 7:43am

If a limited language model leads to random intent detection, then it is unusable in most scenarios.

Let’s say that computing power is not an issue. I can use satellites for the wake word detection and a server for further calculations.
So which currently supported configuration gives the best results and a very low false positive rate?

Thanks for your work and your help!!!

synesthesiam · July 1, 2021, 6:17pm

I may have found an unconventional “solution” for this!

The great thing about the Kaldi “text FST” models is that they will only ever return valid sentences, but it’s bad when they turn silence or gibberish into something valid too. Confidence doesn’t help much, unfortunately.

So I’ve created alternative paths in the grammar to “catch” unknown words, which bubble up as <unk> in the transcription. These low-probability paths contain a small collection of frequently-used words from the model’s language that are not used in sentences.ini. The idea is that frequent words will contain a good mix of phonemes, so misspoken words will be closer to at least one of them.
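To make the word selection concrete, here is a minimal sketch of picking the “catch” words. It is not the actual Rhasspy code: the function name, the frequency table and the word set are hypothetical, and the numbers are invented.

```python
from typing import Dict, List, Set


def pick_unknown_words(
    word_frequencies: Dict[str, int], used_words: Set[str], count: int = 3
) -> List[str]:
    """Return the most frequent words that do not appear in the user's sentences."""
    candidates = [word for word in word_frequencies if word not in used_words]
    # Most frequent first, so the chosen words cover common phonemes.
    candidates.sort(key=lambda word: word_frequencies[word], reverse=True)
    return candidates[:count]


# Invented frequency counts for a few words of the base language model.
frequencies = {"the": 1000, "turn": 50, "on": 400, "and": 900, "light": 30, "of": 800}
# Words that already occur in sentences.ini (hypothetical example).
used = {"turn", "on", "the", "light"}

print(pick_unknown_words(frequencies, used, count=2))
```

The selected words would then go on the low-probability paths in the grammar and be reported as <unk> when they win.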

I’ll add this feature to the next release of Rhasspy, but it will require an option to enable it. If it works for enough people, I can make it the default.

kusi · July 2, 2021, 6:39am

What’s the reason why confidence wouldn’t help?

synesthesiam · July 2, 2021, 2:28pm

Confidence in Kaldi seems to be measured as a distance between the “best” and “next best” sentences. When the possible words and sentences are restricted, there aren’t many ways for these two sentences to be that different, so you get high confidences most of the time.

With my approach here, I introduce alternative (low probability) word choices at each step, allowing Kaldi to have somewhere to go with misspoken words. This unfortunately increases training time, so it may require some tweaking to get right.