Is there a way to set up a 'default' or 'fallback' intent?

Hello!

I am using intent matching with fuzzywuzzy and I am loving it!

However, if the user says words which are not part of any of the intents in the sentences.ini file, the system forces a match with one of the existing intents and returns it (with a confidence lower than 1, but not always). The result is that random words trigger an intent.

Is there a way to set a ‘default’ intent which is returned when no other intent is matched?


Hi @fenyce_m
It’s a dirty trick, but here is my solution
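In short: a catch-all [DUMMY] intent stuffed with lots of unrelated words, so that off-topic speech matches it instead of one of the real intents. A rough sentences.ini sketch of the shape (the word list and structure here are only an illustration, the real list is much longer):

[DUMMY]
filler = (hello | okay | please | thanks | music | weather | random | whatever | something | nothing)
<filler> [<filler>] [<filler>]

The handler side then simply ignores any DUMMY result.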

Thanks @frkos!

Definitely a bit dirty, but good to know it works for you, as it would have taken quite a bit of time to test it myself!

Do you find the intent matching is slower with so many words in the [DUMMY] intent? Which recogniser and handler are you using? I am using Kaldi and fuzzywuzzy. However, I have also found that Fsticuffs is less aggressive in that respect and doesn’t force the intent as fuzzywuzzy does.

I’m using kaldi and fsticuffs.
I see no performance degradation. It’s super fast :upside_down_face:

Fuzzywuzzy goes out of its way to force an intent. It’s very fuzzy :slight_smile:

There are some things that we could do, though, to improve things. The dialogue manager in Rhasspy has support for enabling/disabling intents during different turns in a conversation (session). If you have yes/no response intents, they probably aren’t meant to start a session. We should have some easy way to specify which intents are “on” by default. Others are enabled explicitly with continueSession messages.

Another idea is a modification to rhasspy-fuzzywuzzy that lets you set confidence thresholds per intent. For yes/no intents then, you could set the threshold to 1 to require an exact match. Everything else will be subject to the global threshold you configure in the web UI.
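Roughly what I have in mind, as a standalone sketch rather than actual rhasspy-fuzzywuzzy code (the intent names, thresholds, and example sentences below are all made up):

from fuzzywuzzy import process

# Example sentences per intent and per-intent confidence thresholds
# (0-1 scale, like the web UI). Values are for illustration only.
intent_sentences = {
    "ConfirmYes": ["yes", "yes please", "sure"],
    "ConfirmNo": ["no", "no thanks"],
    "ChangeLightState": ["turn on the lights", "turn off the lights"],
}
intent_thresholds = {"ConfirmYes": 1.0, "ConfirmNo": 1.0}  # exact match required
global_threshold = 0.6

def recognize(text):
    best = None
    for intent, sentences in intent_sentences.items():
        sentence, score = process.extractOne(text, sentences)
        confidence = score / 100.0
        if confidence >= intent_thresholds.get(intent, global_threshold):
            if best is None or confidence > best[1]:
                best = (intent, confidence, sentence)
    return best  # None means "intent not recognized"

print(recognize("turn off the lights"))  # ('ChangeLightState', 1.0, 'turn off the lights')
print(recognize("jabberwocky"))          # most likely None with these thresholds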

I will say too that fsticuffs is usually the better choice if you’re just speaking to Rhasspy. It tends to only accept sentences that can be spoken (according to sentences.ini). Fuzzywuzzy works better if you also want to accept text chat, where you might have typos or misspellings.


Handling the “none” intent (when speaking an utterance that is not even close to what was defined) is rather complicated.

I’ve experimented with noise generation for the ASR/NLU part but that did not work out the way I expected (ASR accuracy went down and NLU results started acting funny).

I’ve also noticed that Snips struggled with the recognition of the “none” intent (the NLU tries very hard to match an existing intent).

@synesthesiam Having an enabled/disabled-by-default property for each intent would be cool! :+1:


Yes, this is very much needed. Sometimes you want to use very generic terms like Yes, No, Stop in a “skill” and they should be off by default.

It’s technically possible to have the enable/disable per intent right now, but it might not behave how people expect.

Training Rhasspy creates both a speech and intent model. It’s easy to toggle parts of the intent model, but the speech model has to be re-created for each combination of intents (Kaldi is an exception).

So while you could disable the yes/no intent, Rhasspy will still pick up words/sentences from that intent. I’ve experimented a little with having the ASR system pick up the continueSession messages and re-generate speech models for each new combination of intents. This could be slow on a Pi if the user has a lot of intents, so I haven’t pushed it yet.

Any thoughts?

How? I know that I can provide a list of intents to filter for in the continueSession, but they have to exist (be enabled) in Rhasspy.

Regarding the speech model always containing all intents, even disabled ones: is that really such a big issue? I find Rhasspy usable even in open transcription mode.

Think about it this way: if continueSession were used to filter for YES and NO as the possible intents, and you generated a new speech model containing only YES and NO, the user could say something else altogether and it would match either YES or NO “at random”. In practice you would rather prompt the user with “Please answer with yes or no” if they answer with something else. So having a speech model with more stuff in it would be better in this case.

I think the enabled/disabled state (and the intent ecosystem) is only for the NLU part.

The NLU system should score all the intents, filter on the intentFilter list (or the enabled ones if not provided) and return the one with the best score. Below a threshold you should get intentNotRecognized. I have no idea how to handle the “none” intent in the current setup though.

The NLU service should be the one responsible for the enabling/disabling of the intents via MQTT topics so it can be changed dynamically.

I do not think it is easy (nor advisable) to manipulate the language graph at runtime (though it has been done with Kaldi).

My 2c :blush:

The hermes/dialogueManager/configure message will set the default intent filter for new sessions in the dialogue manager. After that, you can override it within a session using the intentFilter property of startSession or continueSession.
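Over MQTT it looks roughly like this, using a paho-mqtt 1.x-style client (the intent names and broker address are placeholders; double-check the payload fields against your Rhasspy version):

import json
import paho.mqtt.client as mqtt

client = mqtt.Client()  # paho-mqtt 1.x style; adjust for 2.x if needed
client.connect("localhost", 1883)  # point at whatever broker Rhasspy uses

# Default filter for new sessions: only these intents may start a dialogue.
client.publish("hermes/dialogueManager/configure", json.dumps({
    "siteId": "default",
    "intents": [
        {"intentId": "ChangeLightState", "enable": True},
        {"intentId": "ConfirmYes", "enable": False},
        {"intentId": "ConfirmNo", "enable": False},
    ],
}))

# Inside a session, ask a question and temporarily allow only yes/no answers.
client.publish("hermes/dialogueManager/continueSession", json.dumps({
    "sessionId": "<current-session-id>",
    "text": "Should I turn the lights off?",
    "intentFilter": ["ConfirmYes", "ConfirmNo"],
}))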

I don’t think it’s a big issue for most people (and it seems @fastjack agrees). Maybe the simplest way forward is to add a part to the web UI where you can specify which intents are enabled up front and just let the NLU system take care of it from there?

I think we should also be able to enable/disable specific intents by default in the sentences.ini file. This way app developers can specify this in the sentences.ini file they distribute so the user doesn’t have to configure this in the UI.

How do you think that should be specified?

As it’s an ini file, we’re bound by the syntax ConfigParser understands. My first thought was to (mis)use comments for this purpose, but the ConfigParser._read() method ignores comments, so unless we want to override this method, this doesn’t work.
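A quick standalone check (my own snippet, not Rhasspy code) shows the comment simply disappears:

from configparser import ConfigParser

cp = ConfigParser(allow_no_value=True)
cp.read_string("""\
[MyIntent]
# disabled
a sentence for the intent
""")
print(list(cp["MyIntent"]))  # ['a sentence for the intent'] - the comment is gone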

I’m not sure about another way to specify this, maybe a special value that won’t be interpreted as a sentence?

I’d thought about adding metadata to an intent with an @ syntax, something like:

[MyIntent]
@foo
a sentence for the intent @bar

Then we would have @foo assigned as metadata to MyIntent and @bar as metadata for the first (and only) sentence.

We could then have some known metadata tags, like @disabled, etc.
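A yes/no intent that should be off by default might then look something like this (purely hypothetical, nothing is implemented yet):

[ConfirmYes]
@disabled
yes
yes please
sure [thing]

It would only become active when a session enables it via an intent filter.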


This looks nice, yes.


Is there an update on implementation of this feature?

Not yet, sorry. If you’re using Fsticuffs, then you’ll at least get an “intent not recognized” message that you can handle in NodeRED or something.
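If you want to catch that message outside of Node-RED, a small MQTT listener is enough. A sketch with paho-mqtt (1.x-style callbacks; verify the topics against your setup, since both the NLU service and the dialogue manager can report a miss):

import json
import paho.mqtt.client as mqtt

def on_connect(client, userdata, flags, rc):
    client.subscribe("hermes/nlu/intentNotRecognized")
    client.subscribe("hermes/dialogueManager/intentNotRecognized")

def on_message(client, userdata, msg):
    payload = json.loads(msg.payload)
    print("Not recognized:", payload.get("input"))
    # Fallback handling goes here, e.g. trigger a "Sorry, I didn't get that" response.

client = mqtt.Client()
client.on_connect = on_connect
client.on_message = on_message
client.connect("localhost", 1883)
client.loop_forever()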

I just updated to 2.58, and I am struggling to get the intent recognizer to work reliably. I am running on a Raspberry Pi 4 (4 GB) with a full upgrade to the latest Raspberry Pi OS as of 1/15.
The system is fast and very responsive, but if I cough, mumble, or say jabberwocky, I always get a recognized intent.
For example, if I say jabberwocky I get an intent of roboServo, and if I cough I get a response of roboQuery:

[roboServo]
srv_num = (0..32){num}
srv_degs = (0..180){degs}
servo <srv_num> [to] <srv_degs>

[roboQuery]
query_cmd = ( time | date | name | address | distance | information ){cmd}
tell me [what] [is] [the] <query_cmd>
tell me [who] [is] [the] <query_cmd>
tell me [the] <query_cmd>
tell me [your] <query_cmd>
tell me [when] [is] [the] <query_cmd>

Earlier in this post you mentioned that fuzzywuzzy was “real fuzzy” and fsticuffs will only respond to my sentences. In this example I have fsticuffs selected and the “fuzzy” checkbox unchecked (I have tried both checked and unchecked).

I must be doing something wrong, because all ambient noises and any sentences give me (seemingly) random intent results. I hardly ever get an “intent not recognized” result.

I can also listen to my voice command using your Play button on the web interface and it is perfectly clear, so that can’t be the issue. (I am using the ReSpeaker USB 4 mic.)

  1. What exactly does the fuzzy checkbox do?
  2. Is it my sentence structure that is the issue here? (See the roboServo sentence above.)

Any advice would be appreciated.

Sorry that I will not solve your problem, but I just wanted to give you a hint about the sentences:

You could use tell me [( what | who | your | when )] [is] [the] <query_cmd> to have all the sentences for this intent in one line and keep things more compact, or did you do it intentionally?
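For reference, the whole intent would then shrink to:

[roboQuery]
query_cmd = ( time | date | name | address | distance | information ){cmd}
tell me [( what | who | your | when )] [is] [the] <query_cmd>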