Helper library to develop Rhasspy apps in Python

After the discussion about Rhasspy apps with Hermes MQTT in AppDaemon I did a first proof of concept of an ‘app library’ in Python. This makes it possible to write the toy app from that discussion as a standalone Python script like this (with the right import statements of course):

class WhatsTheTime(RhasspyHermesMQTTApp):

    @Intent("GetTime")
    def get_time(self, intent):
        now = datetime.now().strftime("%H %M")
        self.say(f"It's {now}", intent.site_id)


if __name__ == "__main__":
    WhatsTheTime()

And as an AppDaemon app like this:

class WhatsTheTime(RhasspyHermesAppDaemonApp):

    @Intent("GetTime")
    def get_time(self, intent):
        now = self.datetime().strftime("%H %M")
        self.say(f"It's {now}", intent.site_id)

I hate boilerplate code, so this is a big improvement in usability for app developers.

To make this usable, I should implement a decorator (like @Intent) for every Hermes MQTT message an app can react to, and a method (like say) for every Hermes MQTT message an app can send. For instance, in this example I would have to end the current session instead of just publishing to the tts/say MQTT topic, which is all the say method does now. This is quite doable because the Rhasspy Hermes library does all the heavy lifting.
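A minimal sketch of how such a decorator could route intents to methods. It assumes nothing about the actual library: `HermesAppSketch`, `FakeIntent`, and `dispatch` are hypothetical names used only for illustration, and `say` records the text instead of publishing to MQTT.

```python
from collections import namedtuple

# Hypothetical stand-in for the intent object a handler receives.
FakeIntent = namedtuple("FakeIntent", ["name", "site_id"])


def Intent(intent_name):
    """Mark a method as the handler for one intent name (illustrative only)."""
    def decorator(method):
        method._intent_name = intent_name
        return method
    return decorator


class HermesAppSketch:
    """Hypothetical base class: collects decorated methods and routes intents."""

    def __init__(self):
        self.spoken = []      # records say() calls instead of publishing
        self._handlers = {}   # intent name -> bound method
        for attr in dir(self):
            member = getattr(self, attr)
            name = getattr(member, "_intent_name", None)
            if name is not None:
                self._handlers[name] = member

    def say(self, text, site_id="default"):
        self.spoken.append((site_id, text))

    def dispatch(self, intent):
        """Call the handler registered for this intent, if there is one."""
        handler = self._handlers.get(intent.name)
        if handler is None:
            return False
        handler(intent)
        return True


class WhatsTheTime(HermesAppSketch):
    @Intent("GetTime")
    def get_time(self, intent):
        self.say("It's time", intent.site_id)
```

In a real library the base class would subscribe to the Hermes topics and call something like `dispatch` from the MQTT callback; here it has to be called by hand.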

Any comments about the API design or specific requirements before I start working on a first version?

One issue I’m not sure yet how to deal with is how to make it possible to reuse the same code to run as a standalone Python app (optionally in a Docker container) or as an AppDaemon app. Right now I have defined base classes for the first and second situation, and in the second situation the base app (RhasspyHermesAppDaemonApp) inherits from AppDaemon’s mqtt.Mqtt class. As you see from the two examples above, the standalone and AppDaemon versions are almost identical. You only inherit from a different base class, and in the AppDaemon case you use another function for the datetime because that’s what AppDaemon recommends. You also have access to all AppDaemon’s state in the latter case (not used in this example). But in most cases the method body would be exactly the same in both situations if you’re not using any functionality specific to the execution environment (for instance subscribing to non-Hermes MQTT messages or using AppDaemon’s internal state or methods).

I don’t think it’s possible to just write an app and decide at runtime whether you want to run it in AppDaemon or standalone. But as far as I can see now, it should be possible to write an API that is almost exactly the same for app developers on AppDaemon or standalone Python; they just have to choose a base class.

I’m also not sure yet how to publish the library practically: one library where you just import the right base class for your execution environment? But then you only use half of the library. So maybe make it two libraries, one for AppDaemon and one for standalone Python, but with (almost) exactly the same API? That seems more difficult to maintain, but is probably nicer for the app developer.

Thanks @koan for putting effort into this, it’s greatly appreciated.

I’ve updated my public repo with the latest changes to the AppDaemon apps which include things I discuss below.

Class inheritance

Could it inherit from adapi.ADAPI? That’s the base class for mqtt.Mqtt. This way I can mix parent classes and also use, e.g., hassapi.Hass. I can’t mix mqtt.Mqtt and hassapi.Hass, though, because that caused some unexpected issues (I can’t find the forum thread right now).

Separated libraries

I would say that despite its maintenance effort, having two different libraries can be beneficial. Think about libraries having dual APIs for JEE or Spring: they are designed and packaged as different libraries, so it wouldn’t be the first time.
What we can however keep in common between the two libraries is the business logic (e.g. session management, more on that later). We’d have to define “communication factories” (e.g. AD call_service for mqtt vs. a Python MQTT library such as hbmqtt), but I think with a little effort it can be done. Of course if you think it’s worth it, that is.
The alternative would be two completely separated implementations, but that would be even more costly IMHO.
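To make the “communication factory” idea concrete, here is a rough sketch under my own assumptions: the shared logic only knows a `publish(topic, payload)` callable, and each environment supplies its own. `HermesLogic` and `recording_publish` are hypothetical names; the recording function merely stands in for a real transport.

```python
import json
from typing import Callable

# Any callable taking (topic, payload) can act as the transport.
Publisher = Callable[[str, str], None]


class HermesLogic:
    """Hypothetical shared business logic, independent of the MQTT transport."""

    def __init__(self, publish: Publisher):
        self._publish = publish

    def say(self, text, site_id="default"):
        # Build the payload once; how it is sent is the transport's business.
        payload = json.dumps({"text": text, "siteId": site_id})
        self._publish("hermes/tts/say", payload)


# Stand-in transport that just records what would be sent.
sent = []


def recording_publish(topic, payload):
    sent.append((topic, payload))


logic = HermesLogic(recording_publish)
logic.say("Hello", "kitchen")
```

The AppDaemon flavour would pass a function that wraps whatever publish call AppDaemon offers, and the standalone flavour one that wraps an MQTT client, so only those two small wrappers would differ between the libraries.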

Session management…

This is something I’ve bumped into lately while developing AppDaemon apps for Rhasspy. I’ll explain, please tell me if I’m making any sense or if it can be approached somehow differently.

One thing about apps is that when they are inside an established session (a dialogue has started), the next steps might recognize intents that are common among apps (e.g. “start a timer” -> “how long?” -> “40 minutes”). That “40 minutes” will be recognized as a generic intent, which could be captured by more than one app. (Rhasspy doesn’t support slot filling, but the same applies to similar normal intents, not just slots, e.g. a confirmation command.)
Also think about satellites. Sessions are per satellite. Multiple satellites, 1 app instance (I’m thinking in an AppDaemon way here).

In my apps I’ve introduced the concept of owning a session.
I’ve written a support app for interacting with the Rhasspy Dialogue Manager (using of course Hermes). This support app keeps the state of all sessions starting and ending.
A skill app, when the dialogue should continue after the first step, requests ownership of the session from the dialogue support app. It then asks the dialogue app to continue the session (just a method call that triggers the MQTT publish, really).
On the next dialogue step, the app will ask the dialogue support app if it’s the owner of the session and continue handling the dialogue, either continuing or ending it.
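Roughly, the ownership bookkeeping described above could look like the sketch below. This is not the code from my repo: `SessionRegistry` and its method names are made up for this post, and the MQTT side (session started/ended messages, continuing the session) is left out.

```python
class SessionRegistry:
    """Hypothetical support object that tracks which app owns which session."""

    def __init__(self):
        self._owners = {}  # session id -> owning app name (None = unowned)

    def session_started(self, session_id):
        self._owners.setdefault(session_id, None)

    def session_ended(self, session_id):
        self._owners.pop(session_id, None)

    def request_ownership(self, session_id, app_name):
        """Grant ownership if the session is known and not owned by another app."""
        if session_id not in self._owners:
            return False
        current = self._owners[session_id]
        if current is None or current == app_name:
            self._owners[session_id] = app_name
            return True
        return False

    def is_owner(self, session_id, app_name):
        return self._owners.get(session_id) == app_name


registry = SessionRegistry()
registry.session_started("s1")
timer_granted = registry.request_ownership("s1", "timer")
lights_granted = registry.request_ownership("s1", "lights")
```

On the next dialogue step, each app would call `is_owner` before handling a generic intent such as “40 minutes”, so only the timer app reacts.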

This might be an overengineered solution, but I’d like to share my idea to see if (1) I’ve got session management right and (2) it can be improved in this new library we’re going to create.

If all that above makes sense: do we want session management in this framework? Session management doesn’t only involve session ownership, but also satellites – sessions are per satellite right?

…or not?

Let’s say we don’t want (or don’t need) session management by the framework:

  • The slot filling issue would be somehow solved by native slot filling support by Rhasspy (how? Anyway this will make Rhasspy handle slot management, completely excluding the skill app)
  • The “common intents” issue (e.g. saying “proceed” for confirmation) would be solved by having different intents for each skill app
  • The satellite issue would not be solved; every app should keep state of current sessions, one for each satellite they are interacting with

Intent configuration

My apps have configurable intent names. Using annotations would not make this possible. Although this wouldn’t be a big deal, it would make Rhasspy sentences.ini grow considerably for common utterances (e.g. confirmation command or similar common commands). I don’t know if that could be a big deal either, but I’ll just put it on the plate for discussion :slight_smile: That being said, I love the annotation approach from a developer point of view.

Sessions per satellite is an assumption, as a satellite may be a group of pure capture points, say several distributed mics/speakers in a room.
For many operations it’s logical for those satellites to share a single session for that collection.

But yeah, intent is the only requirement for Rhasspy, and intent sessions should be able to be passed to an intent processor. As far as I am concerned, though, that’s a completely separate project from the core voice AI, which is what Rhasspy should be.

Intent is the helper library.

@koan I really like the minimalist approach using decorators. :+1::sunglasses:

I think sessions are not per satellite, they are per siteId.

Session management is the way to go to avoid issues that arise when using low level topics directly.

Each app/skill must provide the necessary intents to work so if a pseudo generic intent is needed like confirmation « yes », the skill should provide its own confirmation intent and do a continueSession with an intentFilter for this specific intent.
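As an illustration of that last point, a skill could build its continueSession message like this. It is only a sketch: the helper name and the `TimerConfirm` intent are hypothetical, while the topic and the `sessionId`, `text`, `intentFilter` and `customData` fields are the ones I know from the Hermes dialogue API.

```python
import json

CONTINUE_SESSION_TOPIC = "hermes/dialogueManager/continueSession"


def continue_session_message(session_id, text, intent_filter, custom_data=None):
    """Return (topic, JSON payload) asking the dialogue manager to continue.

    intent_filter limits the next turn to the skill's own intents, so a
    generic "yes" cannot be picked up by another skill.
    """
    payload = {
        "sessionId": session_id,
        "text": text,
        "intentFilter": list(intent_filter),
    }
    if custom_data is not None:
        payload["customData"] = custom_data
    return CONTINUE_SESSION_TOPIC, json.dumps(payload)


topic, body = continue_session_message(
    "session-1", "Shall I start the timer?", ["TimerConfirm"]
)
```

Publishing `body` on `topic` is left to whatever MQTT client the app uses.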

I don’t think that Rhasspy should provide any predefined intents or gazetteer slots. That’s the job of the skill and introducing dependencies between skills should be avoided. Maybe these « shared » intents/slots can be provided by the app/skill manager instead… I wonder…

Though I do think that Rhasspy should provide builtin slots/entities for grammar-based stuff like dates, durations, numbers, etc. (as listing all the possible values is impossible), as parsing these is the job of the NLU component.

My 2c :wink:

Sorry, I confused them. I meant sites, not satellites.

Session management is needed anyway, but it could be assisted by the framework. The main use I’m thinking of is keeping session state. The framework could keep session state per-skill and pass it along with the intent:

# made up code here :)
@intent('HelloIntent')
def handle_hello(self, intent, session_state):
    # do something with the intent and the per-skill session state
    pass

The skill can then have full read-write access to session_state at will (and data will be preserved by the framework). The less the skill knows about sessionId and everything, the better.
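To show what I mean by the framework preserving the data, here is a small sketch. Everything in it is hypothetical (`SessionStateStore`, `state_for`, `forget`); it only demonstrates a per-skill, per-session dict that the framework hands to the handler and drops when the session ends.

```python
from collections import defaultdict


class SessionStateStore:
    """Hypothetical framework-side store: one dict per (skill, session)."""

    def __init__(self):
        self._states = defaultdict(dict)

    def state_for(self, skill, session_id):
        # The same dict is returned on every turn of the same session.
        return self._states[(skill, session_id)]

    def forget(self, session_id):
        # Called when the session ends: drop every skill's state for it.
        for key in [k for k in self._states if k[1] == session_id]:
            del self._states[key]


def handle_hello(intent_name, session_state):
    """Example handler: counts the turns without ever seeing a session id."""
    session_state["turns"] = session_state.get("turns", 0) + 1
    return session_state["turns"]


store = SessionStateStore()
first = handle_hello("HelloIntent", store.state_for("hello_skill", "session-1"))
second = handle_hello("HelloIntent", store.state_for("hello_skill", "session-1"))
other = handle_hello("HelloIntent", store.state_for("hello_skill", "session-2"))
store.forget("session-1")
after_end = handle_hello("HelloIntent", store.state_for("hello_skill", "session-1"))
```

The handler only reads and writes the dict; looking it up by session is the framework’s job.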

Ok, going this way we solve the issues of slot filling and common intents (by using skill-specific intents). Mine was a dangerous path anyway :slight_smile: maybe Rhasspy could use intent aliases… :sunglasses:

I wish :slight_smile: I’m currently using Rasa NLU with Duckling: it gets the job done for the most part. But I had to patch Rhasspy.

We had a discussion about it earlier: Parsing builtin slot values

I agree about this. But what is this ‘session state’? Shouldn’t the skill just know how to end or continue the current session? That could be delivered by methods of the base class for the skills.

With some tricks it is possible. I did this in SnipsKit to create translatable intent names, see an example.
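One possible trick, sketched here as an assumption and not necessarily how SnipsKit does it: the decorator stores only a configuration key, and the base class resolves the real intent name from the app’s configuration when the app is created. All names below are hypothetical.

```python
def intent(config_key):
    """Tag a method with a configuration key instead of a fixed intent name."""
    def decorator(method):
        method._intent_key = config_key
        return method
    return decorator


class ConfigurableApp:
    """Hypothetical base class that binds handlers to configured intent names."""

    def __init__(self, intent_names):
        self.handlers = {}
        for attr in dir(type(self)):
            member = getattr(type(self), attr)
            key = getattr(member, "_intent_key", None)
            if key is not None:
                # Fall back to the key itself when nothing is configured.
                actual_name = intent_names.get(key, key)
                self.handlers[actual_name] = getattr(self, attr)


class TimerApp(ConfigurableApp):
    @intent("confirm")
    def on_confirm(self, intent_name):
        return f"confirmed via {intent_name}"


# The mapping would come from the app's configuration file in practice.
app = TimerApp({"confirm": "TimerConfirm"})
```

The decorator syntax stays the same for the developer, while the intent name that ends up in sentences.ini remains a deployment choice.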

Isn’t this session_state the same thing as customData in the dialogue manager session? Does it provide a way to abstract the serialization/deserialization of this value (as it is supposed to be a string) and its propagation in the dialogue management topics?

Yes that’s the purpose of customData. And an app can find this in the intent object (see the NluIntent class): https://github.com/rhasspy/rhasspy-hermes/blob/master/rhasspyhermes/nlu.py
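Since customData is a plain string, a skill that wants structured state has to serialize it itself. A minimal sketch using JSON follows; the choice of JSON and the helper names are my assumptions, not something the Hermes library prescribes.

```python
import json


def encode_custom_data(state):
    """Serialize a dict of session state into a customData string."""
    return json.dumps(state, sort_keys=True)


def decode_custom_data(custom_data):
    """Parse customData back into a dict, tolerating missing or foreign values."""
    if not custom_data:
        return {}
    try:
        state = json.loads(custom_data)
    except ValueError:
        # Not JSON at all, e.g. a string set by another component.
        return {}
    return state if isinstance(state, dict) else {}


encoded = encode_custom_data({"timer_minutes": 40, "step": "confirm"})
decoded = decode_custom_data(encoded)
```

The decoder deliberately returns an empty dict for anything it does not understand, so a handler never has to special-case a missing value.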

I would actually drop that based on @fastjack answer, if it’s ok for you. One less problem to worry about. Using specific intents per-skill and intent “namespaces” would be enough (e.g. daniele-athome:myskillapp:myintent).
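For the namespace idea, a tiny helper would be enough. This is only a sketch with hypothetical function names, and I haven’t checked whether Rhasspy accepts a colon in intent names.

```python
SEPARATOR = ":"


def qualify(author, skill, intent):
    """Build a namespaced intent name such as author:skill:intent."""
    parts = (author, skill, intent)
    if any(not part or SEPARATOR in part for part in parts):
        raise ValueError("each part must be non-empty and must not contain ':'")
    return SEPARATOR.join(parts)


def belongs_to(qualified_name, author, skill):
    """Tell whether a namespaced intent name belongs to the given skill."""
    return qualified_name.startswith(f"{author}{SEPARATOR}{skill}{SEPARATOR}")


name = qualify("daniele-athome", "myskillapp", "myintent")
```

A skill can then ignore every intent for which `belongs_to` is false.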

The problem is that only the dialogue manager knows about customData. So it’s not being passed with the Hermes message that actually counts in this case, which is hermes/intent/<intent_name>. That’s because the NluIntent message is sent by the NLU module and not the Dialogue Manager. More details in the official Hermes documentation.
Anyway, as I understand it, fixing that in Rhasspy would allow customData to be passed to NluIntents too, making my “session state” idea obsolete (btw I’ve already started working on that, but it involves multiple modules I’m afraid).

Snips worked like that, so for Snips users who already have processes built around customData (like me :smiley:), it would be easier to migrate.
But that is no reason not to ask ourselves whether there is a better method to do it :slight_smile:

Ced

Now I think it’s the best method. I just hadn’t connected all the dots before; I was missing some information.

I have never used customData so far because all my apps had a simple question-and-answer dialogue, but I agree: the changes you propose could make this work in Rhasspy the way it worked in Snips. Thanks for taking the time to look into this!

So, I haven’t forgotten about this, just busy with some other stuff. Yesterday I started to convert my proof of concept code into a repository I can publish, but this also made me think more about the design of the API. I remember when I was creating SnipsKit one of the Snips developers suggested a Flask-like API to me.

This would mean that the standalone version of the example app that I showed in the beginning of this post would look something like this:

from datetime import datetime

from rhasspyhermes_app import StandaloneApp

app = StandaloneApp("TimeApp")

@app.intent("GetTime")
def get_time(intent):
    now = datetime.now().strftime("%H %M")
    app.say(f"It's {now}", intent.site_id)

if __name__ == '__main__':
    app.run()

Such a Flask-type API is more flat with no inheritance. You don’t need to define a class. A lot of Python web developers are familiar with it. I see the beauty in this approach, and I actually rewrote part of SnipsKit in this style before I stopped using Snips, but I never actually wrote voice apps this way, so I don’t know whether it’s a good choice.

An important reason to not choose this approach is that AppDaemon apps are class-based: they should subclass the mqtt.Mqtt class for MQTT apps and AppDaemon creates an object of this class and initializes it. So choosing this approach for standalone Rhasspy apps breaks API similarity with Rhasspy apps for AppDaemon. So I’m inclined to just implement this library with the class-based approach.

I’m curious what others think, though.

I also think it’s not a good idea to break API similarity. But I like the Flask-like style :slight_smile:

Ced

I really like the Flask-style implementation and would prefer it over the class-based one just for simplicity. Another pro-Flask point is that this kind of minimalistic API is easier to understand and implement. It just takes less code…

I understand the appeal of the more linear approach using classes, however. Would using it make it easier to adapt to future changes in AppDaemon?

Well, I do like the Flask-style API too; it’s easier to reason about, especially for simple apps. And it saves an indentation level :slight_smile: It’s just that using the class-based approach for both types of apps would make the standalone and AppDaemon versions of the API as good as identical. If I’m using the Flask-style approach for standalone apps and the class-based approach for AppDaemon apps, the APIs would still be very similar (just compare the Flask-style example to the first AppDaemon example in this post), but if you have been writing standalone apps in the Flask-style API and then for some reason switch to developing for AppDaemon (or the other way around), you suddenly have to change your programming style a bit.

This may seem like hair-splitting, but I want to start from a clean base now so I don’t have to rewrite the fundamentals of the API later. So I want to look at all the options now and hear other opinions.

@daniele_athome you’re probably the one with most experience in AppDaemon from the participants in this discussion, what do you think about it?

Can you use the alternative AppDaemon implementation approach outlined here? Using this approach based on ADBase might allow you to build something Flask-like as you are proposing. You would still need to sub-class ADBase for your app, but the MQTT or HASS plugins can then be used via “accessors” without resorting to sub-classing.

Nice find! I forgot about this alternative. But those plugin objects are still defined inside the class, no? And AppDaemon only initializes an app when it finds a subclass of ad.ADBase defined in a file. As far as I know, other code in the file outside the class definition is not executed by AppDaemon. What should the code for the AppDaemon time app look like with this approach, according to you?