Problems recognizing certain sentences correctly

I started playing around with Rhasspy again yesterday, hoping that 2.5.7 might fix a few problems I had, but it seems I wasn’t that lucky.

I have quite a few sentences concerning the weather and a few miscellaneous ones for things like the time, lights and so on. Sometime between 2.4 and two months ago, Rhasspy started misrecognizing my sentences when I asked for the time.

Here are my time sentences (German):

wie spät ist es
sag mir die uhrzeit
wie viel uhr ist es
was ist die uhrzeit

Weather sentences can be found here

What happens is that I ask for the time (mostly using “wie spät ist es” but sometimes also “wie viel uhr ist es”) and Rhasspy recognizes “wie spät ist das wetter” or “wie viel uhr ist das wetter”.

So I have quite a few sentences ending with “das wetter”, and all of them have multiple variables each. Still, I think it should be possible to recognize sentences without “das wetter” being appended automatically, even though “es” sometimes sounds a bit like “das”.

Here are my settings:

Kaldi is on default settings, and I tried Fsticuffs with and without fuzzy matching; same result. On 2.4 I used Fuzzywuzzy instead, but I can’t try it on 2.5 because I always run into a timeout error: my weather sentences have too many possible permutations to train before the timeout.

Does anyone have an idea how I can fix this so I can actually get the time from my Rhasspy again?

This is something I have thought about before. I think your problem is due to the way ngram language models work. Rhasspy generates all possible permutations of your sentences to calculate the language model, which essentially lists the statistical chance of one word following another. So if the weather intent has orders of magnitude more possible sentences, I think the „Wetter“ at the end becomes disproportionately likely in the statistical model, because the model represents all possible sentences / word combinations proportionally, not according to how often each intent will actually occur.
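To illustrate the imbalance, here is a toy sketch (the time sentences are from the post above; the weather grammar, city and day lists are made-up placeholders) counting which words follow “ist” once a templated weather intent is expanded:

```python
from collections import Counter
from itertools import product

# Time intent from the post: 4 fixed sentences
time_sents = [
    "wie spät ist es",
    "sag mir die uhrzeit",
    "wie viel uhr ist es",
    "was ist die uhrzeit",
]

# Hypothetical weather template expanded over (city) x (day) variables
cities = ["berlin", "hamburg", "köln", "münchen"]
days = ["heute", "morgen", "übermorgen"]
weather_sents = [f"wie ist das wetter in {c} {d}" for c, d in product(cities, days)]

# Count bigrams over all generated sentences, as an ngram model would
bigrams = Counter()
for sent in time_sents + weather_sents:
    toks = ["<s>"] + sent.split() + ["</s>"]
    bigrams.update(zip(toks, toks[1:]))

# After "ist", how much probability mass does each continuation get?
after_ist = {b: c for (a, b), c in bigrams.items() if a == "ist"}
total = sum(after_ist.values())
print({w: round(c / total, 2) for w, c in after_ist.items()})
```

Even with only 12 weather permutations, “das” dominates the continuations of “ist”; with thousands of permutations the effect is much stronger.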
I used to counteract exactly this problem when training language models manually for Kaldi and Pocketsphinx by including the sentences for different intents in proportion to each other in the text corpus I trained on. For very small intents I would add their sentences multiple times (sometimes even 10x) to give them a more equal weight in the language model and make them more likely to be recognized.
I wonder if a similar approach could work for Rhasspy: add duplicate lines to small intents in sentences.ini to make them roughly as likely as intents with many permutations, like your weather intent.
Maybe something like this could even be automated at some point in the future.
Long story short: maybe try including each sentence in your time intent three times or so to make it more likely in the ngram model that Rhasspy trains. Maybe worth a try.
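For example, the duplication could look like this in sentences.ini (sentence text from the original post; the intent name is a placeholder, and this assumes Rhasspy keeps duplicate lines when building the model):

```ini
[GetTime]
wie spät ist es
wie spät ist es
wie spät ist es
sag mir die uhrzeit
sag mir die uhrzeit
sag mir die uhrzeit
wie viel uhr ist es
wie viel uhr ist es
wie viel uhr ist es
```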


Yes. @JGKK is exactly right. This is one of the pitfalls of the ngram language model…

After having read through that thread, I took a closer look at the Kaldi options, found the text FST model, switched to that, and so far it works way better. I have not tested every combination of the weather intents, but it looks very promising. Now I need to finish my weather script, finally write an MQTT script that collects my intents and acts on them, and find a wake word that works for me; then I can actually use my assistant for basic tasks.

Thanks for the help @JGKK and @fastjack


Glad to hear this :slight_smile:

I’ve found such a performance improvement from it, that I’m planning to make “Text FST” the default for Kaldi in Rhasspy 2.5.8.

This is actually done now, but I still don’t think it works very well. I count the number of possible sentences for each intent, and then adjust the FST weight for that branch accordingly.
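As a rough sketch of that kind of correction (my own assumption about the scheme, not Rhasspy’s actual code, and the intent sizes are made up): FST arc weights in the tropical semiring are negative log probabilities, so the branch into each intent can be weighted so that intents, rather than individual sentences, get comparable probability mass.

```python
import math

# Hypothetical permutation counts per intent
intent_counts = {"GetTime": 4, "GetWeather": 5000}
total = sum(intent_counts.values())

# Without correction, GetTime's share of raw sentence mass is tiny
raw_share = intent_counts["GetTime"] / total  # about 0.0008

# One plausible correction: give every intent branch the same weight,
# i.e. -log(1/N) for N intents, regardless of how many sentences it has
num_intents = len(intent_counts)
branch_weights = {
    intent: -math.log(1.0 / num_intents) for intent in intent_counts
}
```

With equal branch weights, a four-sentence time intent is no longer drowned out by a weather intent with thousands of permutations.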

I’ve found a large performance difference when trying different ngram algorithms. I settled on Witten-Bell based on feedback, but Opengrm supports many others (and many parameters for each). It might be worth exposing those options for advanced users.
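To show what the smoothing choice actually controls, here is a minimal sketch of Witten-Bell interpolation for a bigram model (toy corpus and simplified unigram backoff; this is an illustration of the idea, not Opengrm’s implementation):

```python
from collections import Counter

# Toy corpus; </s> marks sentence ends
corpus = ("wie spät ist es </s> wie viel uhr ist es </s> "
          "wie ist das wetter </s>").split()
bigrams = Counter(zip(corpus, corpus[1:]))
unigrams = Counter(corpus)
vocab = set(corpus)

def p_wb(w, h):
    """Witten-Bell smoothed P(w | h): interpolate the maximum-likelihood
    bigram estimate with the unigram distribution, where the interpolation
    weight depends on how many distinct word types have followed h."""
    c_h = sum(c for (a, _), c in bigrams.items() if a == h)   # count of h
    t_h = len({b for (a, b) in bigrams if a == h})            # distinct followers
    p_uni = unigrams[w] / sum(unigrams.values())
    if c_h == 0:
        return p_uni  # unseen history: fall back to unigram
    lam = c_h / (c_h + t_h)
    return lam * bigrams[(h, w)] / c_h + (1 - lam) * p_uni
```

Histories with many distinct continuations reserve more mass for unseen words, which is exactly the kind of behavior that changes when you switch smoothing methods.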


Are you planning to expose those options in voice2json too? That would of course make me very happy :smiley:


I’ve created an issue to remind myself:

