I started with Rhasspy some days ago with all the help of this great community.
As a first test project I created a shopping list (food only for now) using Node-RED. This works great now. I used wiktionary.org to get a good number of food words into my slot list, but some words are still missing. So where do you find word lists for a specific topic?
Would it make sense to have a community driven Rhasspy repo for predefined slots (e.g. for rooms, devices, food, etc)?
That’s actually a nice idea.
For food I think you will find more food words than you will ever need in the Open Food Facts database.
That’s great stuff. Thanks, I hadn’t found that myself.
For anyone who doesn’t know how to use the API: simply go to a page, e.g. products by category or a product page, and add .json to the URL.
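For anyone who wants to script this, here is a minimal Python sketch. It assumes the category listing at `https://world.openfoodfacts.org/categories.json` decodes to an object with a `tags` array whose entries carry a `name` field, so check the real response before relying on that. The function names `extract_names` and `fetch_json` are my own, not part of any API.

```python
import json
import urllib.request


def extract_names(payload):
    """Collect lowercase, de-duplicated names from a decoded category listing.

    Assumes the payload looks like {"tags": [{"name": "..."}, ...]}.
    """
    names = set()
    for tag in payload.get("tags", []):
        name = str(tag.get("name", "")).strip().lower()
        if name:  # skip entries without a usable name
            names.add(name)
    return sorted(names)


def fetch_json(url):
    """Download a page URL that already has ".json" appended and decode it."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return json.load(response)
```

Calling `extract_names(fetch_json("https://world.openfoodfacts.org/categories.json"))` should then give a sorted word list to paste into a slot file. Expect to prune it by hand, since the raw list will contain far more than you want to say to a shopping list.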
Anyway some predefined lists that come with Rhasspy would be awesome, so we can concentrate more on the voice assistant part and not so much on thinking which words might be missing.
The thing is, such a list has to be maintained, curated, coordinated with other languages, … Maintaining a high-quality word list takes some effort.
If someone wants to start such a word list for a specific slot type and it’s useful for others, and we have a couple of these slot types covered, I think it makes sense to put them in a Rhasspy repository, or at least link to them in the documentation or my awesome-rhasspy repository. @synesthesiam what do you think?
If we do add more baked-in slots, I’d prefer things that are easily translated to most of the supported languages and aren’t very open-ended. Color names fit this description well, but food words less so. Pets might work too.
That being said, I have no issue with someone just posting a nice curated list of slot values in a GitHub repo somewhere that anyone who’s interested can clone. You can even file them under a directory, like a slot food/apples with “honeycrisp”, “red delicious”, etc.
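As a rough sketch of that layout, the snippet below writes a nested slot file with one value per line. The `write_slot` helper is made up for illustration, and the `slots` folder name inside the profile is an assumption, so point it at wherever your profile actually keeps its slots.

```python
import tempfile
from pathlib import Path


def write_slot(slots_dir, slot_name, values):
    """Write one value per line to <slots_dir>/<slot_name>, creating folders."""
    path = Path(slots_dir) / slot_name
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text("\n".join(values) + "\n", encoding="utf-8")
    return path


# Demo in a throwaway directory standing in for a profile (assumed layout).
with tempfile.TemporaryDirectory() as profile_dir:
    slot_file = write_slot(
        Path(profile_dir) / "slots",
        "food/apples",
        ["honeycrisp", "red delicious"],
    )
    print(slot_file.relative_to(profile_dir))
    print(slot_file.read_text(encoding="utf-8"))
```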
I agree. If we want this to succeed, I think it’s best to start small with some well-defined slot types. The day and month names are a nice example, as well as color names.
I don’t have time to do this myself now, but I want to support this, so if someone wants to create a specific slot type that is useful for others, I can create a repository in the rhasspy organization on GitHub and add it there so others can contribute their translations.
We do have to think about a license first. What license makes sense for this type of content? I’m leaning towards one of the Creative Commons licenses.
Thinking a bit further: day and month names and colors are actually special in a way: they are structured data. For instance, “February” could be decoded as 2, “red” as #ff0000. In the long run, I think it makes more sense to add them as built-in entities in an entity parser such as Duckling or Rustling, so intents can have access to the structured representation too. But let’s learn to walk before we start running.
Duckling and Rustling already handle dates and times. I do not think that adding colors will be useful.
These NLU libraries are awesome (I use Rustling with a homemade binding for Node). I prefer Rustling (even though it is sadly not maintained as much as its parent), as Duckling is written in Haskell and is pretty much impossible to bind to without installing a full-blown Haskell environment.
This kind of resurrects the “built-in slots” topic.
To handle translations of this slots dataset, maybe something like Weblate could be used to automate the translation process and the publishing to GitHub.
I like the idea to have it as part of the organization so we can have it in a central place.
Maybe it makes sense to specify some topics as a starting point so users can then contribute to them. Of course we would have to define some rules for how things should be structured.
You don’t have RGB lights at home, do you? Me neither, but I think this case is useful for users of Philips Hue-type lights. Then you could say “turn the light red” and the intent handler can automatically give the command to set the light to
@koan This can already be achieved using substitutions, without imposing absolute values through an entity parser. I do not see any issue with providing a list of predefined colors with hex substitutions, but I do not think that belongs in Duckling/Rustling.
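To make the substitution idea concrete, here is a small sketch that generates slot lines from a color table. It assumes that the `value:replacement` substitution syntax from sentence templates also works inside slot files, that multi-word values need parentheses, and that `#` needs no escaping. All three are worth verifying against the Rhasspy docs. The hex values are just examples, and `to_slot_lines` is a made-up helper.

```python
# Illustrative color table; pick whatever values suit your lights.
COLORS = {
    "red": "#ff0000",
    "blue": "#0000ff",
    "dark red": "#8b0000",
}


def to_slot_lines(colors):
    """Turn {spoken name: hex code} into 'name:hex' substitution lines."""
    lines = []
    for name, hex_code in sorted(colors.items()):
        # Assumption: multi-word values must be grouped so the substitution
        # applies to the whole phrase, not only to the last word.
        spoken = f"({name})" if " " in name else name
        lines.append(f"{spoken}:{hex_code}")
    return lines


print("\n".join(to_slot_lines(COLORS)))
```

With this approach the intent handler receives the hex code as the slot value, and anyone who wants different shades only has to edit their own copy of the list.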
This slot dataset also suggests some kind of “extensibility” for slots.
It might be pretty cool to have some way of importing slots values from this repository:
@import=https://github.com/rhasspy/slots/colors
my own value
my own value 2
...
The dataset augmentation process could then download the slot values in the correct language automatically during training.
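A rough sketch of how such an import step could work is below. To be clear, `@import` is only the hypothetical directive proposed in this thread, not an existing Rhasspy feature. The `fetch` callable is injected so the download (and picking the right language) can be implemented however the project sees fit; here it is stubbed out.

```python
IMPORT_PREFIX = "@import="  # hypothetical directive from this thread


def expand_slot(text, fetch):
    """Expand '@import=<url>' lines via fetch(url) and keep local values.

    Order is preserved and duplicates are dropped, so a local value that
    also appears in the imported list is only emitted once.
    """
    values = []
    seen = set()
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(IMPORT_PREFIX):
            candidates = fetch(line[len(IMPORT_PREFIX):])
        else:
            candidates = [line]
        for value in candidates:
            if value not in seen:
                seen.add(value)
                values.append(value)
    return values


def fake_fetch(url):
    """Stand-in for a real download; returns canned values for the demo."""
    return ["red", "green", "blue"] if url.endswith("/colors") else []


slot_text = """@import=https://github.com/rhasspy/slots/colors
my own value
my own value 2
"""
print(expand_slot(slot_text, fake_fetch))
```

Running the expansion at training time would keep the user's slot file short while still letting them add their own values next to the shared ones.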
Just an idea
OK, I see your point, substitutions are probably fine for this case. A simple way to import and extend a slots list would be cool indeed.
I like this idea, since I do not think that everyone who knows a different language and wants to help translate will know git plus whatever format these files have. Mycroft, for example, uses something similar (Pootle) for all its skill translations as well.
My dream would be to incorporate full-blown semantic web ontologies into Rhasspy. Some of my research has focused on the use of answer set programming and an event calculus to do commonsense reasoning.
I think some unholy mixture of these things could enable Rhasspy to handle more complex requests and to “fill in the gaps” by applying commonsense reasoning and facts/relationships pulled from ontologies.
For everyone and future me, a big ol’ bag of lists: https://github.com/dariusk/corpora