A generic HTTP server for handling Intents

winandus · April 1, 2020, 7:05pm

Hi all,

I have been working on my own server implementation and I want to share it with you.

I created this project because I am very interested in robotics, home automation & voice control and I want to create an environment in which I can combine these three things. I stumbled upon Snips a while ago, and purchased a speaker kit for it. But by the time I was ready to start using it, Snips shut down (at least the open source part did).

On the Snips website I read some messages about Rhasspy and I decided to try it. It works great, but I decided that I would like to make any code that I create for handling Intents platform independent. That’s where this project comes in.

The code in my project is a framework for handling intents. It does a number of things:

Sets up an HTTP server which receives Intents from Rhasspy
Adds an abstraction between Rhasspy and the code that handles the intents. Because of that, it should be possible to use IntentHandlers built on this project with other software similar to Rhasspy.
Automatically updates Rhasspy with the latest Sentences & Slots when the server is started. This is for people (like me) who want to actively develop their IntentHandlers and don’t want to manually update the Sentences & Slots in Rhasspy every time they make a code change. To be able to do this, this project uses IntentDefinitions. IntentDefinitions are created in code, and are broken down into Sentences & Slots. For now only quite basic IntentDefinitions can be created. In the future I want to support more complicated sentences, like the ones used in “Rhasspy can tell you the weather (at least if you speak German)”.
Enables the use of multiple speakers, so you can choose on which device your voice assistant answers you. For me personally, I want my voice assistant to be able to answer me on my Sonos speaker, so I built support for that.
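
To make the first two points concrete, here is a minimal sketch of the idea, not the project's actual code. It assumes Rhasspy is configured to POST each recognized intent as JSON with an `intent.name` field and a `slots` object, and that it speaks the `speech.text` field of the JSON reply. The names `HANDLERS`, `intent_handler`, `dispatch` and `serve`, and the `GetTime` intent, are hypothetical.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical registry mapping an intent name to the function that handles it.
HANDLERS = {}


def intent_handler(name):
    """Decorator that registers a function as the handler for one intent."""
    def register(func):
        HANDLERS[name] = func
        return func
    return register


@intent_handler("GetTime")
def get_time(slots):
    # A real handler would look at the clock; this one returns a fixed reply.
    return "It is time to build a robot."


def dispatch(payload):
    """Platform-independent part: takes a plain dict and returns reply text."""
    name = payload.get("intent", {}).get("name", "")
    handler = HANDLERS.get(name)
    if handler is None:
        return "Sorry, I cannot handle that."
    return handler(payload.get("slots", {}))


class IntentRequestHandler(BaseHTTPRequestHandler):
    """Rhasspy-facing part: translates HTTP requests to dispatch() calls."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length) or b"{}")
        # Assumed reply shape: {"speech": {"text": "..."}}
        body = json.dumps({"speech": {"text": dispatch(payload)}}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8000):
    """Blocks forever, answering intent requests on the given port."""
    HTTPServer(("0.0.0.0", port), IntentRequestHandler).serve_forever()
```

Only `IntentRequestHandler` knows about HTTP and the assumed JSON shape. The handlers themselves receive a plain dict of slots, which is what would let them be reused behind a different front end. Calling `serve()` starts the server.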

Design:

The code can be found here:

I added some documentation to the code. It could probably be improved. I might extend the readme with some examples.

There is quite an extensive set of tests though, so that will help anyone who wants to make some changes.

I hope someone else finds it useful. I decided to put it under the MIT license, so you are free to use it for whatever you want. Just let me know when you build something cool with it!

I will add some features to it. I really want to use the weather intents that Daenara made (see “Rhasspy can tell you the weather (at least if you speak German)”) in my home and in my server, so expanding the IntentDefinitions is probably the first goal.

orca8119 · April 9, 2020, 7:01am

Hello winandus, sorry for missing this thread, as it is relatively close to my idea, except maybe that I had slightly more dynamic “skills” in mind.
We should see what synesthesiam prefers. Otherwise I have no problem supporting your project (I simply want multiple intent handlers).

regards

winandus · April 9, 2020, 2:09pm

Hi orca8119,

It seems quite a few people have similar ideas at the moment around multiple intent handlers and repositories for sharing them. See also this topic, which was started yesterday: Community repository for skills?

This project has for now just been my personal project, aiming at my own requirements. I just mentioned it in your thread because it shares some ideas.

What exactly do you mean by “dynamic skills”?

My project still has some gaps though. One feature I’ve built into it, and personally find a cool one, is that the sentences & slots are generated from the code. That would also work great with a skill repository: just enable the IntentHandler and your Rhasspy sentences & slots are updated automatically.
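
As a rough illustration of generating sentences & slots from code, here is a sketch under some assumptions. It is not the project's implementation: the `IntentDefinition` fields and both helper functions are hypothetical, and the output follows Rhasspy's ini-style sentence format with `[IntentName]` sections, `$name` slot references and `{tag}` markers. Uploading the result to Rhasspy is left out.

```python
from dataclasses import dataclass, field


@dataclass
class IntentDefinition:
    """Hypothetical definition: an intent name, its sentences and its slots."""
    name: str
    sentences: list
    slots: dict = field(default_factory=dict)


def to_sentences_ini(definitions):
    """Renders all definitions as one ini-style text, one section per intent."""
    blocks = []
    for definition in definitions:
        lines = "\n".join(definition.sentences)
        blocks.append("[{}]\n{}".format(definition.name, lines))
    return "\n\n".join(blocks) + "\n"


def collect_slots(definitions):
    """Merges the slot values of all definitions, dropping duplicates."""
    merged = {}
    for definition in definitions:
        for slot, values in definition.slots.items():
            known = merged.setdefault(slot, [])
            for value in values:
                if value not in known:
                    known.append(value)
    return merged


definitions = [
    IntentDefinition(
        name="LightOn",
        sentences=["turn on the ($rooms){room} light"],
        slots={"rooms": ["kitchen", "bedroom"]},
    ),
]
print(to_sentences_ini(definitions))
print(collect_slots(definitions))
```

Because each handler carries its own definition, enabling or disabling a handler changes the generated text without anyone editing the sentences by hand.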

But I’ve also been looking at Daenara’s project and the number of sentences & answers and the translation that’s needed for them, and that just won’t scale well in my current solution. I have been thinking about that though. Daenara is currently using python files to contain the text. A json file could also do the trick. And from that I still think it should be possible to generate Sentences.

Also, don’t feel obliged to use my project in any way. I’m just sharing it because I think I built some cool stuff that might also help or inspire other people.

Btw, I don’t have all the latest code in the GitHub repo yet, as I have a local private repo to do my development (I find it works easier). Yesterday I built a feature that allows python scripts to talk to the user via Rhasspy, without the user invoking an Intent. This is particularly useful if you want to do notifications, or use Rhasspy to set a timer for several minutes.

If you or anyone else is interested, I can push to the GitHub repo more actively.

geoffrey · April 9, 2020, 2:44pm

@synesthesiam is also working on this I believe (skills in general I mean):

Any publicly available repo is always interesting to follow and can perhaps inspire others to contribute to your work.

Daenara · April 9, 2020, 3:05pm

I aim to at least generate my own slots dynamically, and I already do so for one slot. The python file is necessary for the translation because it contains functions for formatting things that differ too much between languages to handle any other way (dates are complicated, and so are times). The only way I know of to make python do that work for me requires locales to be installed on the system. Not even I do that, so I do not want it as a requirement for my script.

winandus · April 9, 2020, 4:26pm

Right. That sentence in the quote is supposed to have a perhaps in it: “A json file could perhaps also do the trick”. I don’t want you to get the idea that I think I know everything better.

I was just thinking about the different orderings of strings and slots used to compose a sentence, and I thought that if that’s the only difference between languages, it should be possible to put that in a json. A bit naive.

But I hadn’t even considered dates and times yet. Good point!

All I want to say is that I see some nice things in repos here and there, and some good ideas in forum topics. And I think if we combine all that, we could make some cool stuff together. One way or another.

Daenara · April 9, 2020, 4:30pm

I did not think that you actually took a look to see what is in that file, just wanted to mention that it is not possible with a json atm (not that I like json to begin with). That particular script is still a big work in progress but it at least does what it is supposed to do.

I, too, hope that we can all create an amazing voice assistant together, which is why I hang around here so much even though I still only have a half-working Rhasspy for tests here and never even got around to adding a wakeword.

winandus · April 9, 2020, 5:00pm

Haha, same thing here. Although mine does have a wakeword, which works every now and then.

winandus · April 9, 2020, 6:18pm

Ok, I have updated the GitHub repo with the code I had locally.

It’s not getting any simpler, but here is the latest design:

The major additions are the asynchronous module, the TriggerManager and the TimerIntentHandler. I called the module asynchronous because with it your Rhasspy can speak without a user request.

The classes in the asynchronous module allow you to do text to speech on your Rhasspy, even when there was no intent. The TimerIntentHandler is a first example that implements this functionality. With it you can:

Create a timer for a number of seconds or minutes from now (I have an issue with that at the moment; see “Problems with Number Ranges & Converters” for details)
Rhasspy responds that the timer was set
After the timer expires, the TriggerManager is notified and in turn uses text-to-speech to tell you on your Rhasspy that the timer ended.
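
The three steps above can be sketched roughly as follows. This is only an illustration and not the code in the repo: the class shapes are guesses that borrow the names TriggerManager and TimerIntentHandler from the post, and `speak` is a hypothetical callback standing in for whatever sends text to Rhasspy's text-to-speech.

```python
import threading


class TriggerManager:
    """Receives triggers from handlers and speaks them without a user request."""

    def __init__(self, speak):
        # speak is any callable taking the text to say (an assumption here).
        self._speak = speak

    def trigger(self, text):
        self._speak(text)


class TimerIntentHandler:
    """Sets a timer and notifies the TriggerManager when it expires."""

    def __init__(self, trigger_manager):
        self._trigger_manager = trigger_manager

    def handle(self, seconds):
        timer = threading.Timer(
            seconds,
            self._trigger_manager.trigger,
            args=["Your timer has ended."],
        )
        timer.daemon = True  # do not keep the process alive just for a timer
        timer.start()
        # This text is the immediate answer to the intent.
        return "Timer set for {} seconds.".format(seconds)
```

For a quick try-out, `TimerIntentHandler(TriggerManager(print)).handle(5)` returns the confirmation text right away, and the second message is printed about five seconds later as long as the process is still running.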