Helper library to develop Rhasspy apps in Python

Errr, is there anything that needs to change? I guess the final stanza calling app.run() is unnecessary as there is no “main” for AppDaemon apps. The meat of the question is what does StandaloneApp do behind the scenes to “run” your Rhasspy app in an AppDaemon app and hide the details?

For example, how are named (Rhasspy) apps defined with StandaloneApp made known to AppDaemon? You still have to “configure” the AppDaemon app, correct? Would the named app (e.g. “TimeApp”) map to an AppDaemon app by that name that would have to be configured via YAML in the standard AppDaemon manner? Does that mean that StandaloneApp has to generate the AppDaemon class corresponding to the named app on the fly? I presume this is possible in Python, but this starts to sound complicated. It also seems that with this flat structure, one could run into (function) namespace collisions that would be less likely in a class-based approach, given that, unlike simple Python scripts, multiple apps live in the same Python environment with AppDaemon. Might this result in obscure issues with user-contributed apps intended to run in AppDaemon?

What is it that one expects to gain in a Flask-like API versus the traditional AppDaemon approach? The two simple examples so far look pretty similar sans the class declaration. I’ve not used Flask so maybe others can be more detailed in why they like that style? Are we really gaining anything by hiding AppDaemon or just making things obscure? Your original class-based approach does seem reasonable. It can easily take advantage of the existing AppDaemon infrastructure without resorting to lots of hair to hide things. It does feel like a Flask-like structure is sort of an impedance mismatch with AppDaemon.

Sorry, seems like I have more questions than answers!

I would say that using a class-based approach or a function-based approach (both using annotations) wouldn’t really be a huge difference since an app (either AppDaemon app or standalone app) could just instantiate the “app” object and use methods provided by it to speak/listen to Hermes (IOC can be achieved for both ways with a little more effort). But there is one problem: what will the framework use for communicating with the external world (MQTT/Hermes)?

A standalone app would need some connection infrastructure (by leveraging an event loop for example), a configuration file for connection parameters, and so on.
An AppDaemon app can leverage a running application server, so no need for anything.

This difference alone forces us to create two different entry points.

Personally I’d try to leverage existing framework(s) – I would not reinvent the wheel by writing another framework from scratch (i.e. if we want to use a “Flask approach”, we should use Flask for real) – that is, if that was your idea @koan and if I understood it correctly.

I also think that creating a framework that can be used in multiple environments (that is, that can be used interchangeably with another framework or application server without touching the app code) brings more complexity and effort than benefits. Think about the compromises we’ll have to make, the complexity we’ll have to handle to make the framework “behave well” in every environment. Also think about the conventions we’ll have to impose on app creators who want to use e.g. Flask stuff or AppDaemon stuff correctly.

In conclusion, IMHO this library should focus on integrating with one framework/application server/whatever environment it runs in, since the real “interface” for this library is the Hermes protocol.
As much as I am personally interested in using AppDaemon – as a Home Assistant user I find it extremely convenient – any framework/environment will do. If someone wants to use another framework/environment it will require a new library.

However…

Beware: brainstorming stuff that came out of my mind almost as it was. I was tempted to not include this paragraph, but I’ll leave it here for whatever discussion it might generate :slight_smile:

If we really want to make a “general purpose” library, we could implement something like a core module that handles Hermes business logic, and then implement adapters that will make the library plug into other frameworks/application servers/environments. Those adapters would be in charge of publishing/subscribing to MQTT using the framework/environment they are supposed to interact with. An example workflow would be (I might be talking in AppDaemon terms, but I believe the same concepts can be applied to any framework/environment):

  1. The app would register to events using ways and methods provided by the library
  2. The library will use the adapter to implement that “listen to events” step (the adapter will provide an interface for doing that) (*1)
  3. The adapter would listen to events by using the means provided by the environment
  4. The adapter will receive calls (service requests from the app or events from MQTT) that will be handled using business logic in the core module
  5. The core module will return something to the adapter (or signal it somehow) that it handled the call
  6. The adapter will return the result to the app

A similar approach could be used for service requests (not event-based, think about service calls).

(*1) this part is where the library would scan for annotations and subscribe to topics accordingly, without actually knowing how, because the adapter would take care of it.
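The workflow above can be sketched roughly as follows. This is only an illustration of the core/adapter split: `Adapter`, `InMemoryAdapter` and `Core` are hypothetical names, payloads are plain strings instead of real Hermes JSON, and the in-memory adapter merely stands in for whatever an MQTT client or an AppDaemon plugin would provide.

```python
from typing import Callable, Dict, List, Protocol, Tuple

Handler = Callable[[str], str]
Callback = Callable[[str, str], None]


class Adapter(Protocol):
    """What the core needs from an environment (hypothetical interface)."""

    def subscribe(self, topic: str, callback: Callback) -> None: ...

    def publish(self, topic: str, payload: str) -> None: ...


class InMemoryAdapter:
    """Stand-in for an MQTT client or an AppDaemon plugin."""

    def __init__(self) -> None:
        self.callbacks: Dict[str, Callback] = {}
        self.published: List[Tuple[str, str]] = []

    def subscribe(self, topic: str, callback: Callback) -> None:
        self.callbacks[topic] = callback

    def publish(self, topic: str, payload: str) -> None:
        self.published.append((topic, payload))

    def deliver(self, topic: str, payload: str) -> None:
        """Simulate a message arriving from the environment."""
        self.callbacks[topic](topic, payload)


class Core:
    """Environment-independent logic: maps intents to handlers."""

    def __init__(self, adapter: Adapter) -> None:
        self.adapter = adapter
        self.handlers: Dict[str, Handler] = {}

    def on_intent(self, name: str) -> Callable[[Handler], Handler]:
        def register(func: Handler) -> Handler:
            topic = "hermes/intent/" + name
            self.handlers[topic] = func
            # The core does not know how subscribing works; the adapter does.
            self.adapter.subscribe(topic, self._dispatch)
            return func

        return register

    def _dispatch(self, topic: str, payload: str) -> None:
        reply = self.handlers[topic](payload)
        self.adapter.publish("hermes/dialogueManager/endSession", reply)


adapter = InMemoryAdapter()
core = Core(adapter)


@core.on_intent("GetTime")
def get_time(payload: str) -> str:
    return "It is noon"


adapter.deliver("hermes/intent/GetTime", "{}")
```

Even in this toy form there are three moving parts for a single intent, which gives a feel for how quickly the design grows once real connection handling is added.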

This seems like an overengineered and overly complex design to me. I might also have missed something, because this is the result of a 30-minute brainstorming session. Or I might just be a bad designer :slight_smile: but the simple fact that it seems so complex makes me think that maybe it’s not worth the effort (in relation to the benefits).

In my next iteration of the proof of concept of the standalone app I’m now subclassing the HermesClient class and using the cli module, both from the Rhasspy Hermes library, which gives me all this functionality for free.

No that was not my intention. We don’t need HTTP endpoints and all this web stuff in the apps, we are using MQTT. I was just referring to the style of the API: creating a Flask object, attaching decorators of this object to functions and then running the object. This in contrast to subclassing a class of the API, attaching decorators to methods of this class and then creating an object of this subclass. As you already said, it’s not a huge difference in practice, but I feel that for casual developers this style is easier to reason about.
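To make the difference between the two styles concrete, here is a toy comparison. Everything in it (`ToyApp`, `ToyBase`, the `on_intent` decorators) is made up for illustration and is not the API of any existing library; the handlers are called directly instead of reacting to MQTT messages.

```python
class ToyApp:
    """Flask-like style: create an object, decorate plain functions with it."""

    def __init__(self, name):
        self.name = name
        self.handlers = {}

    def on_intent(self, intent_name):
        def register(func):
            self.handlers[intent_name] = func
            return func

        return register

    def handle(self, intent_name):
        return self.handlers[intent_name]()


app = ToyApp("TimeApp")


@app.on_intent("GetTime")
def get_time():
    return "It is noon"


def on_intent(intent_name):
    """Class-based style: mark methods, the base class collects them later."""

    def mark(method):
        method.intent_name = intent_name
        return method

    return mark


class ToyBase:
    def __init__(self):
        self.handlers = {}
        for attr in dir(self):
            member = getattr(self, attr)
            intent_name = getattr(member, "intent_name", None)
            if callable(member) and intent_name is not None:
                self.handlers[intent_name] = member

    def handle(self, intent_name):
        return self.handlers[intent_name]()


class TimeApp(ToyBase):
    @on_intent("GetTime")
    def get_time(self):
        return "It is noon"
```

Both variants end up with the same handler table; the first one just asks less of the developer, since there is no subclass or `self` to think about.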

After thinking about this for a while, I’m also sure now that trying to create a framework that can be used both standalone and in AppDaemon will be too complex with too many compromises. It would be like driving a square peg in a round hole.

The “brainstorming stuff” that you wrote is exactly what I came up with, but it gave me nightmares about the time when I was designing bloated Java programs with factories and visitors and strategies and so on :slight_smile:

So for now I’m focusing on a library for standalone apps, because this is also the way to go if we want to deliver Rhasspy apps as Docker containers for better security. As you say, the real interface is the Hermes protocol. I see this library really as a “helper” library, a wrapper around the Rhasspy Hermes library to eliminate as much boilerplate code as possible and let developers create simple apps in a few lines of code.

Damn for a moment there I hoped you would go for AppDaemon eheh :smiley:
I feel like an AppDaemon plugin for Hermes is really missing, and I’d like to implement it anyway. Once our POCs are out (actually, if you can share what you have, even if incomplete, that would be great), we can discuss using similar approaches so developers can easily switch from one to the other (such as naming conventions, e.g. the names of the annotations, anything that can be shared really). I believe this could benefit everyone, especially because we’ll have two different points of view to compare and work from. What do you think?

Thanks for your work on this.

I’m now in my third iteration of the proof of concept, and each time I had to start from scratch. First class-based, then Flask-like, now Flask-like using HermesClient and cli. I think this time it will result in a good foundation for the library. I hope this is ready and tested in a few days (sadly not much free time these days to program) and then I’ll publish it. Don’t expect too much of it yet, it’s just one or two Hermes messages wrapped to prove the approach is viable.

After that, it’s indeed a good idea to discuss some conventions so the APIs can share as much as possible in their naming, philosophy, …

By the way, do you mean “AppDaemon plugin for Hermes” or “Hermes plugin for AppDaemon”? I didn’t want to suggest the latter to you in my previous message because I didn’t know whether you wanted to implement this, but I actually think this could be a good approach to develop Rhasspy/Hermes functionality for AppDaemon: a dedicated Hermes plugin (like the HASS and MQTT plugins) to hide the MQTT details from Rhasspy app developers.

I meant Hermes plugin for AppDaemon, possibly by extending the already existing MQTT plugin (or by depending on it, I haven’t decided nor investigated yet).

I’m not sure whether to use an AppDaemon approach (i.e. listen_event, so callback-based) or an annotation approach. The latter would be easier to use (I would have preferred that AppDaemon itself was designed that way, but that’s another story) and more aligned with your design, but it would diverge from how AppDaemon recommends its apps should be implemented (EDIT: actually I could just do both, it doesn’t require that much more of an effort).

Ok, then we’re talking about the same, I just call it the other way around :slight_smile:

Yes, I realize it now. I’m sorry, English is not my mother tongue, sometimes I get lost in these kinds of particulars.

So I’m finally happy with the API and I just published the first prototype: rhasspy-hermes-app.

For now this only supports the intent decorator to let a function react to an intent. In this function, you should return an EndSession object with a text string to end the current session. I did this so you shouldn’t have to refer to the intent’s session ID to end the session, because almost always you just want to end the current session and you don’t want to know anything about session IDs.
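Roughly, hiding the session ID could work like the sketch below. It is not the code from the repository: the `EndSession` dataclass and `end_session_payload` helper are simplified stand-ins, and the payloads only contain the `sessionId` and `text` fields, assuming the usual Hermes dialogue manager message layout.

```python
import json
from dataclasses import dataclass


@dataclass
class EndSession:
    """What the handler returns: only the text to speak."""

    text: str


def end_session_payload(intent_payload: str, result: EndSession) -> str:
    """Build an endSession message for the session that produced the intent."""
    session_id = json.loads(intent_payload)["sessionId"]
    return json.dumps({"sessionId": session_id, "text": result.text})


def get_time(intent):
    # The handler never touches the session ID.
    return EndSession("It's noon")


incoming = json.dumps({"sessionId": "abc-123", "intent": {"intentName": "GetTime"}})
outgoing = end_session_payload(incoming, get_time(json.loads(incoming)))
```

The wrapper copies the session ID from the incoming intent into the outgoing message, so the handler only decides what to say.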

There’s much more to do of course to make this a usable library (but you can use it, the example app in the repository is already a little useful app). I have created an initial TODO list at the end of the README. Have a look, and I’d love to hear anyone’s opinion. Criticism of the current approach, wanted features, … just let me know here or open an issue in the repository.

Nice work! Thanks :slight_smile:

Ok, that makes sense. Also, since we said that state would be kept in custom data (any state? All of it?), all the app would need to do is just read custom data from the intent object.

I noticed this in the TODO list:

Let the app load its intents/slots/… from a file and re-train Rhasspy on installation/startup of the app.

How should this happen? Via the Rhasspy API, right? Something like a service for requesting a new sentences file dedicated to the app. Maybe some utilities inside the library would be better for this, i.e. the app shouldn’t use Rhasspy client functions directly, mainly to avoid conflicts between apps. I was thinking of something like a “namespace” concept for apps (that would ultimately end up in the filename, e.g. namespace_appname_sentences.ini - ugly I know, but I think you get the idea).
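A helper for the naming idea could look something like this. The function name and the clean-up rules (lowercase, everything else collapsed to a dash) are assumptions made up for this sketch; only the `namespace_appname_sentences.ini` pattern comes from the suggestion above.

```python
import re


def sentences_filename(namespace: str, app_name: str) -> str:
    """Build a per-app sentences file name from a namespace and an app name."""
    parts = []
    for part in (namespace, app_name):
        # Keep only lowercase letters and digits so the name is safe as a file name.
        cleaned = re.sub(r"[^a-z0-9]+", "-", part.lower()).strip("-")
        if not cleaned:
            raise ValueError("namespace and app name must contain letters or digits")
        parts.append(cleaned)
    return "_".join(parts) + "_sentences.ini"


filename = sentences_filename("example", "Time App")
```

Because underscores are reserved as separators, two apps can only collide if both their namespace and their name are the same after clean-up.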


I’ve begun experimenting with the AppDaemon plugin. I’m going to publish it to my public repository soon so we’ll compare notes. I’m using a classic AppDaemon approach for now (events/services), I’ll extend it to annotations like yours later. Maybe I should open a new topic for that :slight_smile:

Indeed. I will also implement a ContinueSession object and test a session that forwards custom data in a flow of a startSession, continueSession and endSession message.
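As a rough idea of how state could travel through such a flow, the sketch below serializes a small dict into the `customData` string and reads it back on the next intent. The helper names are hypothetical, the messages are plain dicts with only the fields needed here, and it assumes that the custom data set in a continueSession message is handed back unchanged with the next intent.

```python
import json


def encode_state(state: dict) -> str:
    """customData is a single string, so the state is stored as JSON."""
    return json.dumps(state, sort_keys=True)


def decode_state(custom_data) -> dict:
    """Missing or empty custom data means this is the first turn."""
    return json.loads(custom_data) if custom_data else {}


def handle_intent(intent_message: dict) -> dict:
    """Continue the session once, then end it on the second turn."""
    state = decode_state(intent_message.get("customData"))
    state["turns"] = state.get("turns", 0) + 1
    session_id = intent_message["sessionId"]
    if state["turns"] < 2:
        # continueSession: forward the updated state to the next turn.
        return {
            "sessionId": session_id,
            "text": "Anything else?",
            "customData": encode_state(state),
        }
    # endSession: no state needs to survive.
    return {"sessionId": session_id, "text": "Done"}


first = handle_intent({"sessionId": "s1", "customData": None})
second = handle_intent({"sessionId": "s1", "customData": first["customData"]})
```

The app itself stays stateless: everything it needs to know about the conversation arrives with the intent.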

The REST API has a /api/sentences URL which you can POST sentences to, but I don’t believe this is possible yet with the Hermes protocol. I opened an issue. @synesthesiam what do you think of this?
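For reference, posting sentences over HTTP could be prepared along these lines. Treat the details as assumptions to check against the Rhasspy documentation: the base URL with port 12101, the JSON body mapping file names to file contents, and the file name `intents/example_time.ini` are all placeholders chosen for this sketch. The request is only built, not sent.

```python
import json
import urllib.request


def build_sentences_request(base_url: str, files: dict) -> urllib.request.Request:
    """Prepare a POST to /api/sentences with one entry per sentences file."""
    body = json.dumps(files).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + "/api/sentences",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_sentences_request(
    "http://localhost:12101/",
    {"intents/example_time.ini": "[GetTime]\nwhat time is it\n"},
)
# urllib.request.urlopen(request) would send it; that call is left out here
# so the sketch has no side effects.
```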

A namespace for sentences.ini files is a good idea. Probably also for intent names.

I’m looking forward to it!

Do we really want to put configuration APIs (not directly related to a pure messaging function) on MQTT? I mean, wouldn’t it be better to just let the app contact the HTTP API (via utility functions provided by the library maybe)?
I’m aware this will complicate things (the app would need HTTP credentials, for a start), but we’ll be polluting MQTT with configuration services. I don’t know, it doesn’t sound right… what do you think?

I see your point, but requiring apps or a library to use both HTTP and MQTT seems more convoluted, and is definitely more error-prone. And if the HTTP API offers this functionality, I don’t see why the MQTT API cannot offer this too.

I consider every interaction with Rhasspy as a messaging function, also configuring intents, I don’t see it as polluting. But that’s maybe because I’m much more comfortable with MQTT than with HTTP :slight_smile: This is not to say that every single aspect of Rhasspy should be configurable using MQTT messages, but adding intents/sentences seems like a common enough use case.

By the way, I have edited the issue I raised in the rhasspy-hermes repository and added some thoughts about how to handle namespace, because that’s important if this will be implemented in the MQTT API. Maybe you have some thoughts about it too.

How about doing that during an earlier stage? Something like the setup stage. I mean configuring and training should happen only when something is installed or changed right? Normal app execution shouldn’t need this. We’ll let the build/setup infrastructure do this instead of doing it from inside the app code (I still have to think how, but I think you get the idea; it would have to be done outside the possible Docker container of the app, of course).

My main concern is exposing the MQTT system to potentially privileged operations. Privileged as in modifying configuration stuff. I understand some MQTT brokers implement ACLs and other authorization mechanisms, but this should concern Rhasspy itself. Rhasspy will have (one day I hope) an authorization layer for its HTTP API. It won’t be as easy or possible to do the same thing with MQTT.

Anyway, it’s not a big deal in the end (if I don’t want it, I would just ban the topic in mosquitto and do the training manually :smiley: or maybe I’m just paranoid lol), but I just thought it would need proper attention before moving on.

Yes, that’s how Snips did it with the snips-skill-server. But then we need some rhasspy-skill-server that gets the sentences.ini file from an app you install, and this server has to communicate the content of this file to the NLU and ASR services, which potentially run on another machine. So then you’re back to MQTT or HTTP :slight_smile:

Privileged operations are also what I was talking about in my additions to the issue I linked to above. With an ACL and authentication this is quite easy to contain. Actually I have been running my example app this way for the past few weeks: it can only subscribe to one specific MQTT topic and publish on one other MQTT topic with this ACL file:

user rhasspy-app-time
topic read hermes/intent/GetTime
topic write hermes/dialogueManager/endSession
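To show what this ACL allows and denies, here is a small checker written for this thread. It is a simplification, not how mosquitto evaluates ACLs: it only understands `user` lines and `topic read`/`topic write` lines with exact topic names, and ignores wildcards, `readwrite` and `pattern` rules.

```python
ACL_TEXT = """\
user rhasspy-app-time
topic read hermes/intent/GetTime
topic write hermes/dialogueManager/endSession
"""


def parse_acl(text):
    """Return {user: {"read": set(), "write": set()}} for exact-match topics."""
    rules = {}
    current = None
    for line in text.splitlines():
        parts = line.split()
        if not parts or parts[0].startswith("#"):
            continue
        if parts[0] == "user" and len(parts) == 2:
            current = rules.setdefault(parts[1], {"read": set(), "write": set()})
        elif parts[0] == "topic" and len(parts) == 3 and current is not None:
            access, topic = parts[1], parts[2]
            if access in ("read", "write"):
                current[access].add(topic)
    return rules


def allowed(rules, user, access, topic):
    """True only if the user has an explicit rule for this access and topic."""
    return topic in rules.get(user, {}).get(access, set())


rules = parse_acl(ACL_TEXT)
can_read_intent = allowed(rules, "rhasspy-app-time", "read", "hermes/intent/GetTime")
can_write_intent = allowed(rules, "rhasspy-app-time", "write", "hermes/intent/GetTime")
```

With these rules the app can receive the GetTime intent and end a session, and nothing else, so a hypothetical training topic would be out of its reach unless someone adds a rule for it.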

Of course. It would still be a remote call anyway. Regardless of the protocol used, I was talking about not letting the app code do this, and doing it from a privileged account upfront instead. But as I said, maybe I’m just a little too paranoid :slight_smile:

Hi @koan

As I would like to implement an intent to manage Google Assistant searches (in the spirit of what is described here), I’m very interested in your framework proposal.

Unfortunately, I must be doing something wrong because I cannot even run the time_app demo.

python3 time_app.py
  File "time_app.py", line 14
    return app.EndSession(f"It's {now}")
                                      ^
SyntaxError: invalid syntax

Do you have any clue?
fx

What Python version are you running (python3 --version)? You need Python 3.6 or later to use f-strings, and the dependency rhasspy-hermes needs it too. I have added this requirement to the installation instructions in the README. Note that the next version of rhasspy-hermes will need Python 3.7.

If it’s the f-string your Python is complaining about, you can always check whether it works when you replace that line with:

return app.EndSession("It's " + now)
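A quick way to check this from Python itself is sketched below. The helper names are made up for the example. Note that the check cannot live in the same file as the f-string: a SyntaxError is raised when the file is compiled, before any of its code runs.

```python
import sys


def supports_fstrings(version_info=sys.version_info) -> bool:
    """f-strings were added in Python 3.6."""
    return tuple(version_info[:2]) >= (3, 6)


def time_sentence(now: str) -> str:
    # str.format is an alternative that also works on Python 3.5.
    return "It's {}".format(now)


sentence = time_sentence("10:30")
```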

Indeed, I had Python 3.5.3 (on Debian 9). I reinstalled with buster instead of stretch and now it’s working well :slight_smile:

By the way, what’s the way to capture the audio stream only (Google Assistant will do the ASR)?
Will you have a way for this in your framework?

Not yet, the current code is just a proof of concept (that’s also why I have no tests, documentation or a PyPI package yet), so for now you can only listen to intents, which is what probably 90% of the apps would use :slight_smile:

But the HermesApp class subclasses the HermesClient class from the Rhasspy Hermes library, so you can definitely use it to capture the audio stream. It’s the hermes/audioServer/<SITE_ID>/audioFrame topic you have to subscribe to.
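Until there is proper support, handling that topic by hand could look a bit like this sketch. It leaves the MQTT connection out entirely and only shows the topic handling and payload parsing; the helper names are invented, and it assumes each audioFrame payload is a small self-contained WAV chunk, which is worth verifying against the Hermes documentation.

```python
import io
import re
import wave

AUDIO_FRAME_TOPIC = re.compile(r"^hermes/audioServer/([^/]+)/audioFrame$")


def audio_frame_topic(site_id: str) -> str:
    """Topic to subscribe to for one site ("+" would match every site)."""
    return "hermes/audioServer/{}/audioFrame".format(site_id)


def site_id_from_topic(topic: str):
    """Extract the site ID from an incoming topic, or None if it does not match."""
    match = AUDIO_FRAME_TOPIC.match(topic)
    return match.group(1) if match else None


def read_frame(payload: bytes):
    """Return (sample rate, raw PCM bytes) from a WAV-formatted payload."""
    with wave.open(io.BytesIO(payload), "rb") as wav:
        return wav.getframerate(), wav.readframes(wav.getnframes())


def make_test_frame(rate: int = 16000, samples: int = 160) -> bytes:
    """Create a silent 16-bit mono WAV chunk to try the parser without a microphone."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * samples)
    return buffer.getvalue()


rate, pcm = read_frame(make_test_frame())
```

The raw PCM bytes returned by `read_frame` are what you would forward to an external speech service.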

If you want this feature, maybe open an issue with a short explanation of why you need it and how exactly you’d want to use it. We can discuss the specifics there.