Add Support to switch between an offline and online ASR

maxbachmann · February 9, 2020, 1:58pm

Right now Rhasspy already has built-in support for Pocketsphinx and Kaldi, and other solutions like the Google ASR can be used. However, the user always decides at the beginning which solution to use, and that solution is then used until the setting is changed. It might be cool to be able to switch between them while Rhasspy is running, e.g. for continueSession.
Here is an example where this could be useful:
A user wants to use the Google ASR by default since the offline solutions do not perform well enough (e.g. because he wants to catch words that are not in their training sets, while the online ASR will probably know them). However, he might have skills that start a dialog where the follow-up question only allows the user to answer with yes/no. In this case it would be cool to tell continueSession that this step should be performed offline, since it can be done offline.
Of course this works the other way around as well: you use an offline ASR and, e.g. when you have a skill that requires arbitrary words, you could simply start the skill with start xy (basically the way it is done e.g. in Alexa) and then the skill continues the session with an online ASR.

So it would be nice to allow the user to select something like a default, offline and online ASR so skills can switch between them. Of course, when the user did not select e.g. an online ASR, it will simply continue using the offline ASR, so the offline/online flag the skill sends with continueSession would be a wish and not something that's done for sure. This could be used to provide a fallback when the user is currently offline as well. Especially because of the offline fallback, this might be interesting for other components like TTS and NLU as well, where users might use an online solution by default.
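To make the "wish, not guarantee" idea a bit more concrete, here is a rough Python sketch of how the selection could work. Everything in it is made up for illustration (`AsrOption`, `choose_asr`, the role names `default`/`offline`/`online`, the `network_up` flag); none of it is an existing Rhasspy API.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class AsrOption:
    """Hypothetical description of one configured ASR system."""
    name: str
    needs_network: bool


def choose_asr(
    requested: Optional[str],
    configured: Dict[str, AsrOption],
    network_up: bool,
) -> AsrOption:
    """Pick an ASR system for the next turn.

    The requested role is only a wish: if it is not configured, or it
    needs the network while we are offline, fall back to "default" and
    then to "offline".
    """
    for role in (requested, "default", "offline"):
        option = configured.get(role) if role else None
        if option is None:
            continue  # role not requested or not configured
        if option.needs_network and not network_up:
            continue  # cannot use an online system right now
        return option
    raise LookupError("no usable ASR system configured")


# Example setup: online ASR by default, Kaldi as the offline option.
configured = {
    "default": AsrOption("google", needs_network=True),
    "offline": AsrOption("kaldi", needs_network=False),
}
yes_no_turn = choose_asr("offline", configured, network_up=True)
no_network_turn = choose_asr(None, configured, network_up=False)
```

With this setup a yes/no follow-up that asks for `offline` gets Kaldi, and a normal turn without network also ends up on Kaldi instead of failing.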

fastjack · February 9, 2020, 2:22pm

Interesting… this would require a free form slot to recognize the intent… maybe some kind of slot filling system… I don’t think this plays well with the currently implemented NLUs (fuzzy or fsticuffs), except probably RASA…

maxbachmann · February 9, 2020, 2:40pm

Hm, at least for ASR and TTS this problem does not exist, but yes, it might cause issues with the current NLUs.
Re NLUs, that's actually something the Snips NLU supported with the useSynonyms flag (not sure in which way it still used the examples you provided for the NLU, or whether they were only used for the ASR then).

Btw I think we should add support for the Snips NLU since it is open source and a pretty good offline solution

fastjack · February 9, 2020, 3:18pm

I don’t think the Snips NLU does free form slots, as it only handles gazetteer and grammar based slots… The use synonyms option only provides predefined alternatives to a slot value. Using their probabilistic intent parser with the CRF slot filler might kind of work in some cases, but it is not really a supported feature.

The way Psycho, the creator of Alice, handled it using intentNotRecognized is pretty smart, but it does not work for a free form slot inside an utterance.

The Snips NLU is indeed pretty good at generalizing an utterance to an intent, and it includes their port of the Duckling library to parse grammar based slot values. It could be a nice addition to the Rhasspy NLUs.

maxbachmann · February 9, 2020, 3:25pm

Oh sorry, I meant automaticallyExtensible. Yes, it is not free form, but even then the ASR does not know many of the words that can follow the same grammar. E.g. for add x to my shopping list the NLU can find the slot, but the ASR needs to know all the words, which might not be the case for an offline solution.

fastjack · February 9, 2020, 3:34pm

Did you manage to get the Snips NLU to recognize a slot with words it was not trained on?

maxbachmann · February 9, 2020, 3:43pm

I think I did, but this was a while ago so I suppose I better retest this so it is not just my memory playing me a trick

synesthesiam · February 10, 2020, 3:10am

I believe Kaldi supports embedding an UNK (unknown) token in your language model that could be used to do this. If you have the ASR token timings, you could feed the waveform from the UNK section of the voice command to a different ASR system. Of course, that sounds hard to get right.
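As a rough illustration of the waveform-slicing part only, here is a small Python sketch. It assumes the offline ASR hands back `(word, start_seconds, end_seconds)` tuples, that the unknown token is spelled `<unk>`, and that the audio is raw mono 16-bit PCM at 16 kHz. Those are all assumptions for the example, not how Kaldi actually exposes its output.

```python
from typing import List, Optional, Tuple

Token = Tuple[str, float, float]  # (word, start seconds, end seconds)


def first_unknown_span(
    tokens: List[Token], unk: str = "<unk>"
) -> Optional[Tuple[float, float]]:
    """Return (start, end) of the first run of consecutive unknown tokens."""
    start = end = None
    for word, t0, t1 in tokens:
        if word == unk:
            if start is None:
                start = t0
            end = t1
        elif start is not None:
            break  # the first run of unknown tokens has ended
    return None if start is None else (start, end)


def slice_pcm(
    pcm: bytes,
    span: Tuple[float, float],
    sample_rate: int = 16000,
    sample_width: int = 2,
) -> bytes:
    """Cut the given time span out of raw mono PCM audio."""
    first = round(span[0] * sample_rate) * sample_width
    last = round(span[1] * sample_rate) * sample_width
    return pcm[first:last]


# "add <unk> <unk> to my shopping list", with made-up timings
tokens = [("add", 0.0, 0.5), ("<unk>", 0.5, 0.75), ("<unk>", 0.75, 1.0), ("to", 1.0, 1.25)]
audio = bytes(2 * 16000 * 2)  # two seconds of silence as stand-in audio
span = first_unknown_span(tokens)
segment = slice_pcm(audio, span) if span else b""
```

The `segment` bytes are what would be sent to the second ASR system. The hard part (reliable timings, padding around the span, merging the two transcripts) is not covered here.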

A more general solution for the online/offline switching functionality would be to have some kind of serviceId that you could specify in continueSession for ASR. So a particular siteId might have two ASR services, one named default (offline) and another online. If you don’t ask, the default service is used for ASR, NLU, etc.
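Roughly what I have in mind, as a Python sketch. The `sessionId`, `text` and `intentFilter` fields are from the existing Hermes continueSession message; `asrServiceId`, the per-site service table and both helper functions are hypothetical and only illustrate the idea.

```python
import json
from typing import Dict, List, Optional

CONTINUE_SESSION_TOPIC = "hermes/dialogueManager/continueSession"


def build_continue_session(
    session_id: str,
    text: str,
    intent_filter: Optional[List[str]] = None,
    asr_service_id: Optional[str] = None,
) -> str:
    """Build a continueSession payload with an optional ASR service wish."""
    payload = {"sessionId": session_id, "text": text}
    if intent_filter:
        payload["intentFilter"] = intent_filter
    if asr_service_id:
        # Hypothetical field: not part of the current message format.
        payload["asrServiceId"] = asr_service_id
    return json.dumps(payload)


def resolve_service(
    site_services: Dict[str, Dict[str, str]],
    site_id: str,
    service_id: Optional[str] = None,
) -> Optional[str]:
    """Look up the service for a site; unknown or missing ids use "default"."""
    services = site_services.get(site_id, {})
    return services.get(service_id or "default", services.get("default"))


# One site with two ASR services: "default" (offline) and "online".
site_services = {"kitchen": {"default": "kaldi", "online": "google"}}
message = build_continue_session(
    "session-1", "What should I add?", ["AddItem"], asr_service_id="online"
)
wish = json.loads(message).get("asrServiceId")
chosen = resolve_service(site_services, "kitchen", wish)
```

If a skill asks for a service id the site does not have, `resolve_service` quietly falls back to `default`, so skills would not need to know how each site is set up.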

fastjack · February 10, 2020, 3:29am

Ah…I see the automatically extensible option! Looks like they use CRF to do slot filling indeed. Using an open transcription system like Google ASR and Snips NLU should allow free form slots like remind me to <... ... ... ... ...>. Nice!

@synesthesiam As the dialogue service orchestrates the flow between the other services, a serviceId for ASR and for NLU in continueSession might be something to consider (maybe for version 2.6 though?)