Large number of slot options

Hi all

I have been exploring Rhasspy over the last week and have set up some basic smart-home intents, successfully controlling all my devices.

I have now moved on to trying to get it to select music to play.

Quick overview of my setup:
Rhasspy runs on an Intel NUC as a Home Assistant plugin. The NUC also runs Node-RED, which links everything together.
Volumio runs on a Pi (the target I am controlling).
A home TrueNAS server holds all my music, which Volumio can select from.

I have successfully got Rhasspy playing and pausing the music that is running. However, I would like to be able to select which artist is playing (starting relatively simply before moving on to track names).

I have created a slot called artists, which I have populated with a list of all the artists I have music for (using the APIs for Volumio and Rhasspy, so it updates automatically and saves me typing). However, this list runs to several hundred lines (524 to be precise).
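For anyone wanting to script the same sync, here is a minimal sketch of the "push to Rhasspy" half, assuming Rhasspy 2.5's HTTP API: `POST /api/slots` with a JSON object mapping slot names to lists of values, then `POST /api/train`, on the default port 12101. The host name `rhasspy.local` is hypothetical, and fetching the artist list from Volumio is left out because that depends on your setup; pass the list in however you obtain it. The `replace` flag adds `?overwrite_all=true`, which the Rhasspy docs describe as overwriting existing slot values instead of adding to them; check what it does on your version before pointing it at a profile with other slots.

```python
import json
import urllib.request

# Hypothetical host name; 12101 is Rhasspy's default web UI / HTTP API port.
RHASSPY_URL = "http://rhasspy.local:12101"


def build_slot_request(artists, slot_name="artists", replace=False, base_url=RHASSPY_URL):
    """Build (but do not send) a POST to Rhasspy's /api/slots endpoint.

    The body is a JSON object mapping the slot name to its list of values.
    replace=True appends ?overwrite_all=true (assumed to overwrite existing
    values; confirm the behaviour on your Rhasspy version).
    """
    # Strip whitespace, drop blanks and duplicates, keep the original order.
    values = list(dict.fromkeys(a.strip() for a in artists if a.strip()))
    url = f"{base_url}/api/slots" + ("?overwrite_all=true" if replace else "")
    return urllib.request.Request(
        url,
        data=json.dumps({slot_name: values}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def push_and_train(artists, replace=False, base_url=RHASSPY_URL):
    """Send the slot values, then ask Rhasspy to retrain so Kaldi picks them up."""
    request = build_slot_request(artists, replace=replace, base_url=base_url)
    with urllib.request.urlopen(request) as resp:
        resp.read()
    train = urllib.request.Request(f"{base_url}/api/train", data=b"", method="POST")
    with urllib.request.urlopen(train) as resp:
        return resp.read().decode("utf-8")
```

Building the request separately from sending it makes it easy to inspect what would be posted before calling `push_and_train(artist_list)` against a live instance.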

When I type the command into Rhasspy (e.g. “Play music from Queen”), it works perfectly and rapidly. The intent is correctly determined, the slot is populated correctly, the result flows into Node-RED and through my flows, and Queen starts playing. This at least gives me confidence that my Node-RED flows for searching the NAS and controlling Volumio are working.

However, when I say the command, Rhasspy just stays silent until the maximum recording duration is reached, then plays the error sound and shows “no intent recognised” on the web page. When I cut the artists slot down to just a handful of names, it works fine.
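Since a short list works and the full list doesn't, one way to narrow it down is a binary search over the list. A minimal sketch, assuming a single bad entry and that any subset without it works; `works` is a hypothetical callback you supply (load the subset into the slot, retrain, try a spoken command, return whether it succeeded). For 524 entries this needs at most 10 rounds, since 2^10 = 1024 ≥ 524, though each round costs a retrain. If failures depend on combinations of entries rather than a single name, this search can mislead.

```python
def find_bad_entry(names, works):
    """Binary-search a failing slot list for one entry that breaks it.

    works(subset) is a hypothetical callback: load `subset` into the slot,
    retrain, try a spoken command and return True if it worked.
    Assumes the full list fails and any subset without the bad entry works.
    """
    candidates = list(names)
    while len(candidates) > 1:
        mid = len(candidates) // 2
        first_half = candidates[:mid]
        # A failing first half must contain the culprit; otherwise look in the rest.
        candidates = first_half if not works(first_half) else candidates[mid:]
    return candidates[0] if candidates else None
```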

For speech to text I am using Kaldi and haven’t changed any of the defaults. I have tried “Open Transcription mode”, but that makes the Node-RED side of things much harder (and gives lots of approximations to band names when it slightly mishears).

So I have three theories, but I don't know which is most likely:

  1. Rhasspy just can't handle that number of options in a slot.
  2. There is a problem with my Kaldi setup.
  3. Some artist name in the slot is breaking something, and it gets removed when I cut the list down to a handful of examples. Special characters, maybe?
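Theory 3 is quick to check with a script. A minimal sketch, assuming the slot is a plain text file with one artist per line (the format of files in a Rhasspy profile's `slots/` directory). What counts as "special" is an assumption here: it keeps letters (including accented ones), digits and spaces, and flags everything else.

```python
def find_suspect_names(names):
    """Return (line_number, name, offending_chars) for every name containing
    something other than letters, digits or spaces."""
    suspects = []
    for lineno, name in enumerate(names, start=1):
        bad = sorted({ch for ch in name if not (ch.isalnum() or ch == " ")})
        if bad:
            suspects.append((lineno, name, "".join(bad)))
    return suspects


def scan_slot_file(path):
    """Print the suspects found in a slot file (one value per line)."""
    with open(path, encoding="utf-8") as f:
        for lineno, name, bad in find_suspect_names(f.read().splitlines()):
            print(f"line {lineno}: {name!r} contains {bad!r}")
```

Calling `scan_slot_file("<profile>/slots/artists")` with your own profile path lists each flagged line, so you can see whether apostrophes, slashes, ampersands and so on are present before deleting anything.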

Has anyone had experience with a slot list that large?


I will answer your questions first.

  1. Yes, Rhasspy can handle slots with far more options than that.
  2. This is a possibility, and you may need to tune it, but I'm not sure.
  3. This can happen, but it would stop the whole slot from loading and break the Kaldi training step as well, so it would be very obvious.

I started doing the same thing a few years back, and it worked really well until recently.
My own artist list is approximately 2,900 entries long.

I think it broke when I upgraded to 2.5.10, and I found that some characters did cause the list to fail to load. That isn't the same thing you are describing, though, because yours still works when typed in.
Once I filtered out those characters (apostrophes, I think, from memory), it worked again.

It really broke when I changed my microphone. I either haven't managed to re-tune it correctly or the microphone just isn't good enough, but that was also when I upgraded to 2.5.11, which may also have contributed to the issues I am seeing.
I haven't had time to troubleshoot it, though.

Thanks for your help.

I just removed all the special characters en masse, and that has solved the problem. It's responding instantly again with all the artists in the slot, so that seems to have been the issue. I'm not sure why it trained OK or worked with typed text, though.

Thanks again. I was worried it was a fundamental limitation, but it looks like I'm OK.
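For anyone doing the same clean-up, a minimal sketch of an en-masse sanitiser. The cleaning rules are an assumption (the post doesn't say exactly which characters were removed): apostrophes, straight or curly, are dropped so "Guns N' Roses" stays readable, "&" is spelled out as "and", and any other non-alphanumeric character becomes a space. It also returns a hypothetical `lookup` table mapping each cleaned name back to the original, so the flow that searches the NAS can still use the real artist name.

```python
def sanitize_name(name):
    """Hypothetical clean-up rule: drop apostrophes, spell out '&',
    and turn any other non-alphanumeric character into a space."""
    name = name.replace("'", "").replace("\u2019", "").replace("&", " and ")
    cleaned = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in name)
    return " ".join(cleaned.split())  # collapse repeated whitespace


def build_slot(names):
    """Return (slot_values, lookup): the cleaned names for the slot, plus a
    dict mapping each cleaned name back to its original spelling."""
    lookup = {}
    for original in names:
        cleaned = sanitize_name(original)
        # Skip names that clean to nothing; on collisions the first original wins.
        if cleaned and cleaned not in lookup:
            lookup[cleaned] = original
    return list(lookup), lookup
```

The same logic ports easily to a Node-RED function node if that is where the list is built.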

It’s great to hear you have it working again.

You may have been lucky and had an even number of single quotes in your list, so they paired up with each other. I'm pretty sure I remember correctly that I only had one, so it just bombed out.
I guess an even number would corrupt the list without throwing an error.
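To check which situation a list is in, a small sketch that counts a quote character across the whole slot list and reports the lines containing it. It only illustrates the odd/even theory above; it does not model how Rhasspy actually parses slot files.

```python
def quote_report(names, quote="'"):
    """Count a quote character across a slot list.

    Returns (total, is_odd, line_numbers). Under the theory above, an odd
    total leaves one quote unmatched (a loud failure), while an even total
    lets quotes pair up silently. This does not model Rhasspy's real parser.
    """
    per_line = [(i, n.count(quote)) for i, n in enumerate(names, start=1) if quote in n]
    total = sum(count for _, count in per_line)
    return total, total % 2 == 1, [i for i, _ in per_line]
```

Running it a second time with `quote="\u2019"` catches curly apostrophes, which are easy to miss by eye.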

I must say that the slots feature is one of the things I love most about Rhasspy, because it allows for enormous scale and flexibility. I have Node-RED and some command-line scripts generate a large number of slots and slot entries covering my home automation and music, which means a much smaller set of sentences can control everything.

Actually, thinking about why it works when typed but not when spoken: spoken words have to be recognised and converted to text by the speech-to-text engine before the intent engine can work on them, whereas typed input bypasses speech to text and goes straight to intent recognition.