Training for unknown words

dsha · January 7, 2020, 6:58pm

Hi,

I have a basic setup working, and Rhasspy can control Home Assistant and respond to basic commands, e.g. "What is the temperature?"

I am planning to extend it so that Rhasspy can run a wiki search for a spoken word, play a song, or play an internet radio station.

Option 1) Have a Python script subscribe to the MQTT event XXXX.
Put the following in sentences.ini for the same intent XXXX:
find (apple | microsoft ){title} in wiki
find wiki for (apple | microsoft ){title}

When the voice command is “find apple in wiki”, Rhasspy converts voice to text to intent correctly; the Python script gets the intent, does the wiki search, and plays the result back via text to speech.
But this doesn’t work when any word other than apple or microsoft is spoken.
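
A minimal sketch of the script side of Option 1. It assumes the intent event arrives as JSON shaped like the DialogueManager event quoted later in this thread (an `intent` object with a `name`, plus an `entities` list of `entity`/`value` pairs); the actual MQTT payload may differ between Rhasspy versions. `extract_title` is a hypothetical helper, not part of Rhasspy, and `sample` is made-up illustration data.

```python
import json


def extract_title(payload, intent_name="GetWikiSearch"):
    """Return the value of the `title` entity, or None if the intent does not match."""
    event = json.loads(payload)
    # Assumed shape: {"intent": {"name": ...}, "entities": [{"entity": ..., "value": ...}]}
    if event.get("intent", {}).get("name") != intent_name:
        return None
    for entity in event.get("entities", []):
        if entity.get("entity") == "title":
            return entity.get("value")
    return None


# Made-up sample event for the sentence "find apple in wiki".
sample = json.dumps({
    "text": "find apple in wiki",
    "intent": {"name": "GetWikiSearch", "confidence": 1.0},
    "entities": [{"entity": "title", "value": "apple"}],
})
print(extract_title(sample))
```

The returned title would then be handed to whatever wiki lookup and text-to-speech step the script already uses.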

Is there a way to write sentences.ini so that intent handling still works when a spoken word is not listed in sentences.ini?

Option 2) Have the script subscribe to MQTT hermes/nlu/intentNotRecognized. In this case the MQTT event is published, but the speech-to-text conversion is not very accurate. Most of the time the words after "find wiki for" are wrong, so the wiki search does not work.
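
A rough sketch of how the script in Option 2 could pull the search term out of the transcription, assuming the transcribed text has already been read from the event (how it is stored in the payload is not shown here). `title_from_transcript` is a hypothetical helper; the carrier phrases are the two templates from Option 1. It cannot repair a wrong transcription, it only strips the known prefix and suffix.

```python
import re

# Carrier phrases copied from the sentences.ini templates in Option 1.
PATTERNS = [
    re.compile(r"^find wiki for (?P<title>.+)$"),
    re.compile(r"^find (?P<title>.+) in wiki$"),
]


def title_from_transcript(text):
    """Return the words that fill the title position, or None if no phrase matches."""
    # Lower-case and collapse whitespace so spacing differences do not matter.
    cleaned = " ".join(text.lower().split())
    for pattern in PATTERNS:
        match = pattern.match(cleaned)
        if match:
            return match.group("title")
    return None


print(title_from_transcript("find wiki for bill gates"))
```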

frkos · January 7, 2020, 8:43pm

Just decided to try your idea…
For better results I’m using Kaldi with open transcription mode enabled so that it recognizes all words.
But what I’ve found is that in the event, _text and _raw_text are the same…

The first line in my log shows the full sentence "please turn my desk lamp on", but in the event raw_text contains only words from sentences.ini, ignoring the unknown words please, desk and my: 'raw_text': 'lamp on'… weird

[DEBUG:950613] DialogueManager: {'text': 'lamp on', 'intent': {'name': 'TurnOn', 'confidence': 0.9}, 'entities': [{'entity': 'device', 'value': 'lamp', 'raw_value': 'lamp', 'start': 0, 'raw_start': 0, 'end': 4, 'raw_end': 4}], 'raw_text': 'lamp on', 'tokens': ['lamp', 'on'], 'raw_tokens': ['lamp', 'on'], 'speech_confidence': 1, 'wakeId': 'snowboy/snowboy.umdl', 'siteId': 'default'}
[INFO:950029] quart.serving: 192.168.1.99:56562 GET / 1.1 200 1029 92220
[DEBUG:946375] DialogueManager: decoding -> recognizing
[DEBUG:946295] DialogueManager: please turn my desk lamp on (confidence=1)
[DEBUG:946181] KaldiDecoder: please turn my desk lamp on

Does anyone know whether this is a bug or expected behavior?
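
To make the mismatch in the log above concrete, here is a small sketch that lists which transcribed words are missing from raw_text. `dropped_words` is a hypothetical helper written for this illustration; the two strings are copied from the KaldiDecoder and DialogueManager lines in the log.

```python
def dropped_words(transcript, raw_text):
    """Return the words of `transcript` that do not appear, in order, in `raw_text`."""
    kept = raw_text.split()
    dropped = []
    for word in transcript.split():
        if kept and word == kept[0]:
            # This word survived into raw_text; move on to the next kept word.
            kept.pop(0)
        else:
            dropped.append(word)
    return dropped


print(dropped_words("please turn my desk lamp on", "lamp on"))
# ['please', 'turn', 'my', 'desk']
```

Note that "turn" is missing from raw_text as well, in addition to please, my and desk.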

frkos · January 7, 2020, 8:49pm

According to the documentation, it looks like a bug:
https://rhasspy.readthedocs.io/en/latest/usage/#getting-the-spoken-text

But can someone confirm I didn’t miss anything? If so, I will open an issue on GitHub.

dsha · January 7, 2020, 11:34pm

Some more details:

sentences.ini
[GetWikiSearch]
find wiki for (apple | microsoft ){title}

When the voice command is "find wiki for microsoft", the MQTT event rhasspy/intent/GetWikiSearch is triggered with the expected intent.
When the voice command is "find wiki for bill gates", the MQTT event hermes/nlu/intentNotRecognized is raised, but the transcription in the event data is not accurate.

Adding Bill Gates to sentences.ini produces the expected results.
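
Since adding a title to sentences.ini makes it recognizable, one workaround is to generate the template from a list of titles. A minimal sketch, assuming the list comes from somewhere the script controls; `build_template` and `build_section` are hypothetical helpers, and Rhasspy would still need to be retrained after the file changes.

```python
def build_template(titles):
    """Build the 'find wiki for (...){title}' line from a list of titles."""
    alternatives = " | ".join(title.strip().lower() for title in titles)
    # Doubled braces produce a literal {title} tag in the f-string output.
    return f"find wiki for ({alternatives}){{title}}"


def build_section(intent, titles):
    """Return a complete sentences.ini section for one intent."""
    return f"[{intent}]\n{build_template(titles)}\n"


print(build_section("GetWikiSearch", ["apple", "microsoft", "Bill Gates"]))
# [GetWikiSearch]
# find wiki for (apple | microsoft | bill gates){title}
```

This only scales to a known, finite list of titles; it does not make arbitrary words recognizable.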

frkos · January 8, 2020, 7:46am

Which speech-to-text engine do you use?
If Kaldi, is your mic sensitive enough?

dsha · January 8, 2020, 3:24pm

I tried both PocketSphinx and Kaldi. The mic is a MATRIX Creator board.

frkos · January 8, 2020, 5:51pm

Hmmm… If Kaldi can’t recognize the words, we are out of luck here.
But if you record a WAV file from your mic, does it sound good? Can you hear your voice clearly?

*opened an issue regarding _raw_text
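
Besides listening to the recording, its level can be checked numerically. A rough sketch using only the Python standard library, assuming a 16-bit PCM WAV file; `read_samples` and `levels` are hypothetical helper names, and what counts as "too quiet" is left to judgment.

```python
import math
import sys
import wave
from array import array


def read_samples(path):
    """Read all samples from a 16-bit PCM WAV file as signed integers."""
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        frames = wav.readframes(wav.getnframes())
    samples = array("h")
    samples.frombytes(frames)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV data is little-endian
    return samples


def levels(samples):
    """Return (peak, rms) as fractions of full scale for 16-bit samples."""
    if not samples:
        return 0.0, 0.0
    peak = max(abs(s) for s in samples) / 32768
    rms = math.sqrt(sum(s * s for s in samples) / len(samples)) / 32768
    return peak, rms


# Usage with a recording (the file name is a placeholder):
# print(levels(read_samples("test.wav")))
```

A peak close to zero suggests the mic gain is too low; a peak pinned at 1.0 suggests clipping.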