Helper library to develop Rhasspy apps in Python

koan · May 22, 2020, 2:03pm

OK, then we’re talking about the same thing; I just call it the other way around.

daniele_athome · May 22, 2020, 2:21pm

Yes, I realize it now. I’m sorry, English is not my mother tongue, and sometimes I get lost in these kinds of details.

koan · May 23, 2020, 7:01pm

So I’m finally happy with the API and I just published the first prototype: rhasspy-hermes-app.

For now this only supports the intent decorator, which lets a function react to an intent. In this function, you return an EndSession object with a text string to end the current session. I did this so you don’t have to refer to the intent’s session ID to end the session, because almost always you just want to end the current session and don’t need to know anything about session IDs.
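To illustrate the pattern, here is a minimal, self-contained sketch of how a decorator can register a handler for an intent, and how returning an EndSession object lets the framework fill in the session ID. It does not use rhasspy-hermes-app itself: `MiniApp`, `dispatch` and the dictionary layout are hypothetical stand-ins, not the library’s actual API.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Callable, Dict, Optional


@dataclass
class EndSession:
    """What a handler returns: only the text to speak, no session ID."""
    text: str


class MiniApp:
    """Hypothetical stand-in for an app class; not the real HermesApp."""

    def __init__(self) -> None:
        self._handlers: Dict[str, Callable[[dict], EndSession]] = {}

    def on_intent(self, intent_name: str):
        """Register the decorated function as the handler for intent_name."""
        def decorator(func):
            self._handlers[intent_name] = func
            return func
        return decorator

    def dispatch(self, intent: dict) -> Optional[dict]:
        """Call the matching handler and build an endSession payload.

        The session ID is copied from the incoming intent here, so the
        handler never has to deal with it.
        """
        handler = self._handlers.get(intent["intentName"])
        if handler is None:
            return None
        result = handler(intent)
        return {"sessionId": intent["sessionId"], "text": result.text}


app = MiniApp()


@app.on_intent("GetTime")
def get_time(intent: dict) -> EndSession:
    now = datetime.now().strftime("%H:%M")
    return EndSession("It's " + now)


payload = app.dispatch({"intentName": "GetTime", "sessionId": "abc-123"})
print(payload)
```

The handler only decides what to say; everything related to the session stays inside `dispatch`.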

There’s much more to do, of course, to make this a usable library (but you can already use it: the example app in the repository is a small but useful app). I have created an initial TODO list at the end of the README. Have a look; I’d love to hear anyone’s opinion. Criticism of the current approach, wanted features, … just let me know here or open an issue in the repository.

daniele_athome · May 25, 2020, 1:17pm

Nice work! Thanks

Ok, that makes sense. Also, since we said that state would be kept in custom data (any state? All of it?), all the app would need to do is just read custom data from the intent object.

I noticed this in the TODO list:

Let the app load its intents/slots/… from a file and re-train Rhasspy on installation/startup of the app.

How should this happen? Via the Rhasspy API, right? Something like a service for requesting a new sentences file dedicated to the app. Maybe some utilities inside the library would be better for this, so that the app doesn’t use Rhasspy client functions directly, mainly to avoid conflicts between apps. I was thinking of something like a “namespace” concept for apps (that would ultimately end up in the filename, e.g. namespace_appname_sentences.ini - ugly I know, but I think you get the idea).
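As a rough sketch of the naming idea, a library utility could derive the file name from a namespace and an app name. The function names and the normalization rule (lowercase, non-alphanumeric characters replaced by underscores) are assumptions made for illustration only.

```python
import re


def _slug(value: str) -> str:
    """Lowercase the value and replace runs of other characters with '_'."""
    return re.sub(r"[^a-z0-9]+", "_", value.lower()).strip("_")


def sentences_filename(namespace: str, app_name: str) -> str:
    """Build a per-app sentences file name such as ns_app_sentences.ini."""
    return f"{_slug(namespace)}_{_slug(app_name)}_sentences.ini"


filename = sentences_filename("daniele", "Time App")
print(filename)
```

Two apps that both ship a sentences file would then end up with different file names as long as their namespace or app name differs.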

I’ve begun experimenting with the AppDaemon plugin. I’m going to publish it to my public repository soon so we can compare notes. I’m using a classic AppDaemon approach for now (events/services); I’ll extend it to annotations like yours later. Maybe I should open a new topic for that.

koan · May 25, 2020, 2:19pm

Indeed. I will also implement a ContinueSession object and test a session that forwards custom data in a flow of a startSession, continueSession and endSession message.
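A sketch of what forwarding state through custom data could look like at the message level. It assumes that the continueSession payload carries a `customData` string and that the dialogue manager hands this string back unchanged in the next intent message; serializing the state as JSON is a choice of this sketch, and the helper names are hypothetical.

```python
import json
from typing import Tuple


def continue_session(session_id: str, text: str, state: dict) -> Tuple[str, str]:
    """Build topic and JSON payload for a continueSession message."""
    payload = {
        "sessionId": session_id,
        "text": text,
        # The state travels as a string, so it is serialized to JSON here.
        "customData": json.dumps(state),
    }
    return "hermes/dialogueManager/continueSession", json.dumps(payload)


def read_state(intent_payload: str) -> dict:
    """Recover the state from the customData field of an intent message."""
    message = json.loads(intent_payload)
    custom_data = message.get("customData")
    return json.loads(custom_data) if custom_data else {}


topic, payload = continue_session("abc-123", "Which city?", {"step": 1})

# Simulate the next intent message, assuming customData comes back unchanged.
next_intent = json.dumps({
    "sessionId": "abc-123",
    "customData": json.loads(payload)["customData"],
})
state = read_state(next_intent)
print(topic, state)
```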

The REST API has a /api/sentences URL which you can POST sentences to, but I don’t believe this is possible yet with the Hermes protocol. I opened an issue. @synesthesiam what do you think of this?
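For reference, a POST to that URL could be prepared with the standard library as sketched below. The base URL (including the port) and the plain-text body containing the sentences are assumptions of this sketch; the request is only built here, not sent.

```python
import urllib.request


def build_sentences_request(base_url: str, sentences_ini: str) -> urllib.request.Request:
    """Prepare a POST request that carries sentences as plain text."""
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/sentences",
        data=sentences_ini.encode("utf-8"),
        headers={"Content-Type": "text/plain"},
        method="POST",
    )


request = build_sentences_request(
    "http://localhost:12101",  # assumed address of the Rhasspy web server
    "[GetTime]\nwhat time is it\n",
)
print(request.get_method(), request.full_url)

# To actually send it (needs a running server):
# urllib.request.urlopen(request)
```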

A namespace for sentences.ini files is a good idea. Probably also for intent names.

I’m looking forward to it!

daniele_athome · May 25, 2020, 2:36pm

Do we really want to put configuration APIs (not directly related to a pure messaging function) on MQTT? I mean, wouldn’t it be better to just let the app contact the HTTP API (via utility functions provided by the library, maybe)?
I’m aware this will complicate things (the app would need HTTP credentials, for a start), but we’ll be polluting MQTT with configuration services. I don’t know, it doesn’t sound right… what do you think?

koan · May 25, 2020, 2:46pm

I see your point, but requiring apps or a library to use both HTTP and MQTT seems more convoluted, and is definitely more error-prone. And if the HTTP API offers this functionality, I don’t see why the MQTT API cannot offer this too.

I consider every interaction with Rhasspy a messaging function, including configuring intents, so I don’t see it as polluting. But that’s maybe because I’m much more comfortable with MQTT than with HTTP. This is not to say that every single aspect of Rhasspy should be configurable using MQTT messages, but adding intents/sentences seems like a common enough use case.

By the way, I have edited the issue I raised in the rhasspy-hermes repository and added some thoughts about how to handle namespace, because that’s important if this will be implemented in the MQTT API. Maybe you have some thoughts about it too.

daniele_athome · May 25, 2020, 3:15pm

How about doing that during an earlier stage? Something like the setup stage. I mean configuring and training should happen only when something is installed or changed right? Normal app execution shouldn’t need this. We’ll let the build/setup infrastructure do this instead of doing it from inside the app code (I still have to think how, but I think you get the idea; it would have to be done outside the possible Docker container of the app, of course).

My main concern is exposing the MQTT system to potentially privileged operations. Privileged as in modifying configuration. I understand some MQTT brokers implement ACLs and other authorization mechanisms, but this should be a concern of Rhasspy itself. Rhasspy will have (one day, I hope) an authorization layer for its HTTP API. It won’t be as easy, or maybe even possible, to do the same thing with MQTT.

Anyway, it’s not a big deal in the end (if I don’t want it, I would just ban the topic in mosquitto and do the training manually, or maybe I’m just paranoid, lol), but I thought it would need proper attention before moving on.

koan · May 25, 2020, 3:27pm

Yes, that’s how Snips did it with the snips-skill-server. But then we need some rhasspy-skill-server that gets the sentences.ini file from an app you install, and this server has to communicate the content of this file to the NLU and ASR services, which potentially run on another machine. So then you’re back to MQTT or HTTP.

Privileged operations are also what I was talking about in my additions to the issue I linked to above. With an ACL and authentication this is quite easy to contain. Actually, I have been running my example app this way for the past few weeks: it can only subscribe to one specific MQTT topic and publish on one other MQTT topic with this ACL file:

user rhasspy-app-time
topic read hermes/intent/GetTime
topic write hermes/dialogueManager/endSession
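To make the effect of such an ACL concrete, here is a simplified sketch that parses these three lines and checks whether a user may read or write a topic. It only illustrates the idea and is not how the broker itself evaluates ACLs; it handles only `user` lines and three-word `topic` lines, and the function names are made up for this example.

```python
from typing import Dict, List, Tuple


def topic_matches(pattern: str, topic: str) -> bool:
    """Match an MQTT topic against a pattern with + and # wildcards."""
    pattern_parts = pattern.split("/")
    topic_parts = topic.split("/")
    for index, part in enumerate(pattern_parts):
        if part == "#":
            return True
        if index >= len(topic_parts):
            return False
        if part != "+" and part != topic_parts[index]:
            return False
    return len(pattern_parts) == len(topic_parts)


def parse_acl(text: str) -> Dict[str, List[Tuple[str, str]]]:
    """Parse 'user NAME' and 'topic ACCESS PATTERN' lines per user."""
    rules: Dict[str, List[Tuple[str, str]]] = {}
    user = None
    for line in text.splitlines():
        words = line.split()
        if len(words) == 2 and words[0] == "user":
            user = words[1]
            rules[user] = []
        elif len(words) == 3 and words[0] == "topic" and user is not None:
            rules[user].append((words[1], words[2]))
    return rules


def allowed(rules, user: str, access: str, topic: str) -> bool:
    """Return True if any rule of the user grants the access on the topic."""
    return any(
        rule_access in (access, "readwrite") and topic_matches(pattern, topic)
        for rule_access, pattern in rules.get(user, [])
    )


ACL = """user rhasspy-app-time
topic read hermes/intent/GetTime
topic write hermes/dialogueManager/endSession
"""
rules = parse_acl(ACL)
print(allowed(rules, "rhasspy-app-time", "read", "hermes/intent/GetTime"))
print(allowed(rules, "rhasspy-app-time", "write", "hermes/intent/GetTime"))
```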

daniele_athome · May 25, 2020, 3:43pm

Of course. It would still be a remote call anyway. Regardless of the protocol used, I was talking about not letting the app code do this, and doing it from a privileged account upfront instead. But as I said, maybe I’m just a little too paranoid.

tuxedo78 · May 27, 2020, 6:37am

Hi @koan

As I would like to implement an intent to manage Google Assistant searches (in the spirit of what is described here), I’m very interested in your framework proposal.

Unfortunately, I must be doing something wrong because I cannot even run the time_app demo.

python3 time_app.py
  File "time_app.py", line 14
    return app.EndSession(f"It's {now}")
                                      ^
SyntaxError: invalid syntax

Do you have any clue?
fx

koan · May 27, 2020, 6:40am

What Python version are you running (python3 --version)? You need Python 3.6 or later to use f-strings, and the dependency rhasspy-hermes needs it too. I have added this requirement to the installation instructions in the README. Note that the next version of rhasspy-hermes will need Python 3.7.

If it’s the f-string your Python is complaining about, you can check whether it works after replacing that line with:

return app.EndSession("It's " + now)

tuxedo78 · May 27, 2020, 9:46am

Indeed, I had Python 3.5.3 (on Debian 9). I reinstalled with buster instead of stretch and now it’s working well.

By the way, what’s the way to capture the audio stream only (Google Assistant will do the ASR)?
Will you have a way for this in your framework?

koan · May 27, 2020, 9:58am

Not yet. The current code is just a proof of concept (that’s also why I have no tests, documentation or a PyPI package yet), so for now you can only listen to intents, which is what probably 90% of the apps would use.

But the HermesApp class subclasses the HermesClient class from the Rhasspy Hermes library, so you can definitely use it to capture the audio stream. It’s the hermes/audioServer/<SITE_ID>/audioFrame topic you have to subscribe to.
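A sketch of what handling such a message could involve, independent of any MQTT client library. It assumes that each audioFrame payload is a small WAV chunk including its header; the function name and the returned tuple are made up for this example.

```python
import io
import re
import wave
from typing import Optional, Tuple

AUDIO_FRAME_TOPIC = re.compile(r"^hermes/audioServer/([^/]+)/audioFrame$")


def parse_audio_frame(topic: str, payload: bytes) -> Optional[Tuple[str, int, bytes]]:
    """Return (site_id, sample_rate, raw_pcm) for an audioFrame message.

    Returns None if the topic is not an audioFrame topic.
    """
    match = AUDIO_FRAME_TOPIC.match(topic)
    if match is None:
        return None
    with wave.open(io.BytesIO(payload), "rb") as wav_file:
        rate = wav_file.getframerate()
        pcm = wav_file.readframes(wav_file.getnframes())
    return match.group(1), rate, pcm


def make_chunk(pcm: bytes, rate: int = 16000) -> bytes:
    """Build a mono 16-bit WAV chunk, standing in for a received payload."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(1)
        wav_file.setsampwidth(2)
        wav_file.setframerate(rate)
        wav_file.writeframes(pcm)
    return buffer.getvalue()


result = parse_audio_frame(
    "hermes/audioServer/livingroom/audioFrame",
    make_chunk(b"\x00\x00" * 160),
)
print(result[0], result[1], len(result[2]))
```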

If you want this feature, maybe open an issue with a short explanation of why you need it and how exactly you’d want to use it. We can discuss the specifics there.

synesthesiam · May 27, 2020, 7:52pm

For Rhasspy 2.5, I’ve added a rhasspy/asr/<SITE_ID>/<SESSION_ID>/audioCaptured message that lets you get a hold of the recorded WAV data from a voice command for a session.

tuxedo78 · May 28, 2020, 8:59am

Hi @synesthesiam

Thanks for the tip. Unfortunately, I have the feeling that rhasspy-dialogue-hermes doesn’t set the appropriate flag to True when handling the ContinueSession message.

See the code below (it uses the default value, which is False if I understand the code correctly):

        # Start ASR listening
        _LOGGER.debug("Listening for session %s", self.session.session_id)
        yield AsrStartListening(
            site_id=self.session.site_id, session_id=self.session.session_id
        )

Maybe we need an additional flag in ContinueSession to indicate that we want to receive the audioCaptured message?

In the meantime, I guess that I have to go with audioFrame…

tuxedo78 · May 29, 2020, 3:38pm

Finally I got something working with the following approach.

Wakeword -> say “Ask Google” (ASR/NLU) -> in the on_intent function, publish continueSession (text=“What do you want to ask?”) -> once ASR is started, detect and store audio frames until ASR stops -> write a wav file from the audio frames -> trigger Google Assistant with the input wav and get the response in wav format -> open the wav file and publish it to the site_id to get audio feedback.
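The step that stores audio frames between ASR start and ASR stop could look roughly like the sketch below. This is not the code of the proof of concept: the `UtteranceRecorder` class is hypothetical, and it assumes that every audio frame arrives as a WAV chunk with the same format.

```python
import io
import wave


class UtteranceRecorder:
    """Collect WAV chunks between start() and stop() into one WAV file."""

    def __init__(self) -> None:
        self._recording = False
        self._params = None  # (channels, sample width, frame rate)
        self._pcm = bytearray()

    def start(self) -> None:
        """Call when ASR starts listening."""
        self._recording = True
        self._params = None
        self._pcm.clear()

    def add_chunk(self, wav_chunk: bytes) -> None:
        """Append the audio of one chunk; ignored while not recording."""
        if not self._recording:
            return
        with wave.open(io.BytesIO(wav_chunk), "rb") as chunk:
            if self._params is None:
                self._params = (
                    chunk.getnchannels(),
                    chunk.getsampwidth(),
                    chunk.getframerate(),
                )
            self._pcm.extend(chunk.readframes(chunk.getnframes()))

    def stop(self) -> bytes:
        """Call when ASR stops; returns the whole utterance as WAV bytes."""
        self._recording = False
        channels, width, rate = self._params or (1, 2, 16000)
        buffer = io.BytesIO()
        with wave.open(buffer, "wb") as out:
            out.setnchannels(channels)
            out.setsampwidth(width)
            out.setframerate(rate)
            out.writeframes(bytes(self._pcm))
        return buffer.getvalue()


def make_chunk(pcm: bytes) -> bytes:
    """Build a mono 16-bit 16 kHz WAV chunk for the demonstration."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(1)
        wav_file.setsampwidth(2)
        wav_file.setframerate(16000)
        wav_file.writeframes(pcm)
    return buffer.getvalue()


recorder = UtteranceRecorder()
chunk = make_chunk(b"\x00\x00" * 160)
recorder.add_chunk(chunk)  # ignored: not recording yet
recorder.start()
recorder.add_chunk(chunk)
recorder.add_chunk(chunk)
wav_bytes = recorder.stop()

with wave.open(io.BytesIO(wav_bytes), "rb") as result:
    frame_count = result.getnframes()
print(frame_count)
```

The resulting bytes can be written to a file or passed on directly as the input for the next step.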

It’s likely not the most efficient (and surely not the most beautiful) piece of code, but take it as a proof of concept.

The great thing is that @koan’s framework made the starting part very easy. Thanks a lot for the good job!
Do you plan to add additional decorators to handle messages other than hermes/intent?

What I found not so easy is figuring out how to publish some messages like AudioPlayBytes. Maybe rhasspy-hermes should provide a more app-level API (instead of having to call publish myself, which is too “low-level” in my opinion)?

koan · May 29, 2020, 7:27pm

Nice!

Yes, I do. I don’t know if it makes sense to handle all of the message types this way, but definitely for the most common ones.

Yes, that was also the reason why I added the EndSession class and decided to hide publishing the message in the decorator, so you can just return such an object and it will be published. You can always open an issue in the repository with a proposal of how you would hide these low-level details for other types of messages. There’s still much to implement in rhasspy-hermes-app.

DanielW · May 30, 2020, 2:39pm

@koan I played around with your API and created a simple Akinator app (the game that guesses a person through yes/no questions) using some Node API and a horrible way to use it from Python. (See https://github.com/DanielWe2/rhasspy-hermes-app/commit/d039fcc59c2ab44958f79c852f6da330afc7a232)

It was really simple using your API. I have some questions though:

Most of my intents do basically the same thing. Is there a way to have one handler for multiple intents and figure out the intent name in the handler?
How do I handle the case when the user answers something that doesn’t match the intents from intent_filter? Currently it breaks the game. I would like to play a help message in that case.
The game uses very generic intents like “yes”. We would need a way to only enable them in Rhasspy once the game has started.

koan · May 30, 2020, 2:57pm

Personally I prefer many short handlers instead of one big handler with an if/elif/else block, but I can see your point that for the one-line handlers in your example the latter approach can be useful. So you want a handler that runs on all intents or only on a specific list of intents?

I can add a decorator for the intentNotRecognized message. You can then let a handler react to this with a help message.
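To make the two ideas above more tangible, here is a small sketch of a dispatcher with one handler for a list of intents and a separate handler for the not-recognized case. Everything in it (`MiniRouter`, `on_intents`, `on_not_recognized`) is hypothetical and only meant as a basis for discussion; it is not the current or planned API of rhasspy-hermes-app.

```python
from typing import Callable, Dict, Optional


class MiniRouter:
    """Hypothetical dispatcher; not part of rhasspy-hermes-app."""

    def __init__(self) -> None:
        self._handlers: Dict[str, Callable[[str], str]] = {}
        self._fallback: Optional[Callable[[], str]] = None

    def on_intents(self, *intent_names: str):
        """Register one function for several intent names."""
        def decorator(func):
            for name in intent_names:
                self._handlers[name] = func
            return func
        return decorator

    def on_not_recognized(self, func):
        """Register the function to call when no intent was recognized."""
        self._fallback = func
        return func

    def handle_intent(self, intent_name: str) -> Optional[str]:
        """Call the handler for a recognized intent, passing its name."""
        handler = self._handlers.get(intent_name)
        return handler(intent_name) if handler is not None else None

    def handle_not_recognized(self) -> Optional[str]:
        """Call the fallback handler, if one was registered."""
        return self._fallback() if self._fallback is not None else None


router = MiniRouter()


@router.on_intents("Yes", "No", "DontKnow")
def answer(intent_name: str) -> str:
    # The handler receives the intent name, so it can tell the cases apart.
    return "You answered " + intent_name


@router.on_not_recognized
def help_message() -> str:
    return "Please answer yes, no or don't know."


print(router.handle_intent("No"))
print(router.handle_not_recognized())
```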

I think I saw a discussion about this in the last few weeks, but I can’t remember where (GitHub or the forum here). Rhasspy should indeed have a way to configure a specific intent as disabled by default.