Advanced Built In Slots

Hey everyone :wave: ,

As a follow-up to the thread here,
I wanted to hear your opinions on adding more complex built-in slots to Rhasspy.

For example, there is Facebook's Duckling project (https://github.com/facebook/duckling), which parses strings into a standardized JSON output. Snips also used it for their built-in entities, in a Rust version of the software (https://github.com/snipsco/rustling). It supports a wide variety of languages as well.

Examples from their README.md:

| Slot | Example input | Example value output |
|---|---|---|
| AmountOfMoney | “42€” | `{"value":42,"type":"value","unit":"EUR"}` |
| Distance | “6 miles” | `{"value":6,"type":"value","unit":"mile"}` |
| Duration | “3 mins” | `{"value":3,"minute":3,"unit":"minute","normalized":{"value":180,"unit":"second"}}` |
| Email | duckling-team@fb.com | `{"value":"duckling-team@fb.com"}` |
| Numeral | “eighty eight” | `{"value":88,"type":"value"}` |
| Ordinal | “33rd” | `{"value":33,"type":"value"}` |
| PhoneNumber | “+1 (650) 123-4567” | `{"value":"(+1) 6501234567"}` |
| Quantity | “3 cups of sugar” | `{"value":3,"type":"value","product":"sugar","unit":"cup"}` |
| Temperature | “80F” | `{"value":80,"type":"value","unit":"fahrenheit"}` |
| Time | “today at 9am” | `{"values":[{"value":"2016-12-14T09:00:00.000-08:00","grain":"hour","type":"value"}],"value":"2016-12-14T09:00:00.000-08:00","grain":"hour","type":"value"}` |
| Url | https://api.wit.ai/message?q=hi | `{"value":"https://api.wit.ai/message?q=hi","domain":"api.wit.ai"}` |
| Volume | “4 gallons” | `{"value":4,"type":"value","unit":"gallon"}` |
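
To give a feeling for how an intent handler could consume output of this shape, here is a rough Python sketch. It only works on the JSON examples quoted above; the helper name `slot_value` is made up for this post and is not part of Duckling, Rustling or Rhasspy.

```python
import json


def slot_value(raw_json: str):
    """Reduce a Duckling-style JSON value to a (value, unit) pair.

    Hypothetical helper: it prefers the "normalized" block when one is
    present (as in the Duration example) and otherwise falls back to the
    top-level "value" and "unit" fields. "unit" may be missing (None).
    """
    data = json.loads(raw_json)
    normalized = data.get("normalized")
    if isinstance(normalized, dict):
        return normalized["value"], normalized.get("unit")
    return data["value"], data.get("unit")


# Example outputs copied from the README examples above.
duration = '{"value":3,"minute":3,"unit":"minute","normalized":{"value":180,"unit":"second"}}'
temperature = '{"value":80,"type":"value","unit":"fahrenheit"}'
numeral = '{"value":88,"type":"value"}'

print(slot_value(duration))     # (180, 'second')
print(slot_value(temperature))  # (80, 'fahrenheit')
print(slot_value(numeral))      # (88, None)
```

The point is that the skill code never has to look at the raw words ("3 mins"), only at a normalized value and unit.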

Furthermore, slots like country/city names could be included. A public database is available here:
http://www.geonames.org/

+1

This is indeed a much-needed feature, though it harbours a few complex issues. It was discussed on GitHub (before this forum was created).

See:




Not all Rhasspy ASR/NLU services can support this.

For ASR, I think it is doable with good accuracy, but only through Kaldi (impossible with Pocketsphinx; maybe DeepSpeech, for the English model only, using open transcription).

For NLU, it’s possible with Snips NLU and maybe RASA.

For my homemade assistant (not using Rhasspy), I merged the ASR and NLU services to benefit from Kaldi grammar FST nonterminal tokens and extract entities directly (which bypasses the hardest part of NLU: extracting entities from text).
https://kaldi-asr.org/doc/grammar.html

I made my own specific NLU system (partly inspired by Snips’) to match intents and slots from there (tokens and entities). I’m currently using Rustling to parse grammar-based entities like numbers, dates, etc.

It is working very well. I’m pretty much on par with Snips (sometimes even better :nerd_face:, sometimes not :hugs:).

I would love to have more built-in slots as well. These slot types are required by pretty much every user and tend to get quite complex when handled properly.

It would be great to have such complex slots, as long as they are optional or the training time doesn’t suffer from them. Snips had built-in slot groups which you could use and combine with your own defined slots. That was useful.

Merely adding the slots wouldn’t affect training. Training times would only increase once the slots are actually used in some sentences.

This seems to be similar as well… @maxbachmann I saw you committing to the project. Is this correct?

I like Mycroft’s lingua franca too, see the previous discussion here:

I haven’t used it yet for real, but it looks promising.

Yes, I did commit to it already (both to Mycroft and Lingua Franca), since I thought it looked interesting, but I’ve got to admit that I haven’t really used it yet either :rofl:

Are you re-using Snips’ Kaldi FSTs, or did you generate your own?

For now I’m using Snips FSTs (with word IDs) that I converted back to standard FSTs (with words). I also fixed a few issues (tri instead of trois, incorrect paths, etc).

If I have the time I’ll try to convert them to JSGF…

I’m a little curious about the state of this feature request. I’d love to move Voco away from Snips and over to Rhasspy some time in the future, but I’ve been spoiled by the freedom of being able to give voice commands in a natural language way.

Just thinking out loud, now that I have a little better understanding of machine learning:

The first step would be to have a lot of example sentences. If it’s useful, here are the training sentences I originally created:

Parts of the sentences would have to be labeled, and then the software would figure out which words in the sentence mean what, while also figuring out what the likely intent is based on the signal words and sentence structure.

Since the things, properties and values of smart devices could be a known, limited set, those could be used to generate lots of additional training sentences. The training could perhaps be finished on device. This is probably how Snips injection works? If those values are known, it’s easy to expand the training sentences to include all of them, and their combinations where they make sense (e.g. “set the volume of the radio to 88” is ok, while “set the moisture of the radio” is silly). If the devices in the home are known, recognition could be pretty good.
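
As a rough illustration of that expansion step, here is a small Python sketch. Everything in it is an assumption made up for this post (the `DEVICES` inventory, the template wording, the slot names); it isn’t how Snips or Rhasspy actually do it. It only generates combinations a device really supports, so the silly ones never appear.

```python
# Hypothetical device inventory: each thing lists the properties it
# supports, and each property lists values that make sense for it.
DEVICES = {
    "radio": {"volume": [0, 50, 88, 100]},
    "plant sensor": {"moisture": [0, 50, 100]},
}

TEMPLATE = "set the {prop} of the {thing} to {value}"


def generate_sentences(devices, template=TEMPLATE):
    """Yield (sentence, slots) pairs for valid thing/property/value combos."""
    for thing, props in devices.items():
        for prop, values in props.items():
            for value in values:
                slots = {"thing": thing, "prop": prop, "value": value}
                yield template.format(**slots), slots


sentences = list(generate_sentences(DEVICES))
for text, slots in sentences:
    print(text, slots)
```

Because the slots are known at generation time, every sentence comes out already labeled, which would cover the labeling step for the generated part of the training set.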

Once this part dissects the sentence, the rest of it could perhaps be fed into Mycroft’s Lingua Franca tool to extract the actual values.
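
To make that second stage concrete, here is a minimal Python sketch of turning an already-isolated value string into a number. The hand-written `parse_number` is only a stand-in for a real parser such as Lingua Franca or Duckling: it handles digits and English number words from 0 to 99 and nothing else, and its name is made up for this post.

```python
# Stand-in vocabulary for a real number parser (English, 0-99 only).
UNITS = {
    "zero": 0, "one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
    "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10,
    "eleven": 11, "twelve": 12, "thirteen": 13, "fourteen": 14,
    "fifteen": 15, "sixteen": 16, "seventeen": 17, "eighteen": 18,
    "nineteen": 19,
}
TENS = {
    "twenty": 20, "thirty": 30, "forty": 40, "fifty": 50,
    "sixty": 60, "seventy": 70, "eighty": 80, "ninety": 90,
}


def parse_number(text):
    """Return an int for digits or number words (0-99), else None."""
    text = text.strip().lower().replace("-", " ")
    if text.isdecimal():
        return int(text)
    words = text.split()
    if len(words) == 1:
        if words[0] in UNITS:
            return UNITS[words[0]]
        return TENS.get(words[0])
    if (
        len(words) == 2
        and words[0] in TENS
        and 1 <= UNITS.get(words[1], 0) <= 9
    ):
        return TENS[words[0]] + UNITS[words[1]]
    return None


print(parse_number("eighty eight"))  # 88
print(parse_number("88"))            # 88
print(parse_number("radio"))         # None
```

A real parser would also need to cover larger numbers, ordinals, dates and other languages, which is exactly why reusing an existing library looks attractive.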

// update

Doing some research. There are lots of interesting ready-made things on GitHub. Could existing chatbot software help out here?

It seems Snips open sourced this part. Is that a useful starting point?

Whoa, Snips-NLU is already implemented in Rhasspy. Awesome!

Hi @candle,

Another option might be Rasa X. Rhasspy currently supports pipeline workflows through it (I need to update to whatever they have now).

I haven’t gotten it integrated into the Docker image or Debian packages yet because of the Rust dependency. If you have a pre-trained Snips NLU engine, though, it does work :slight_smile: