Helper library to develop Rhasspy apps in Python

synesthesiam · May 27, 2020, 7:52pm

For Rhasspy 2.5, I’ve added a rhasspy/asr/<SITE_ID>/<SESSION_ID>/audioCaptured message that lets you get a hold of the recorded WAV data from a voice command for a session

tuxedo78 · May 28, 2020, 8:59am

Hi @synesthesiam

Thanks for the tip. Unfortunately, I have the feeling that rhasspy-dialogue-hermes doesn’t set the appropriate flag to True when handling the ContinueSession message.

See the code below (it uses the default value, which is False if I understand the code correctly):

        # Start ASR listening
        _LOGGER.debug("Listening for session %s", self.session.session_id)
        yield AsrStartListening(
            site_id=self.session.site_id, session_id=self.session.session_id
        )

Maybe we need an additional flag in ContinueSession to indicate whether we want to receive the audioCaptured message?

In the meantime, I guess that I have to go with audioFrame…

tuxedo78 · May 29, 2020, 3:38pm

Finally I got something working with the following approach.

Wakeword -> say “Ask Google” (ASR/NLU) -> in the on_intent function, publish continueSession (text=“What do you want to ask?”) -> once ASR has started, collect and store audio frames until ASR stops -> write a wav file from the audio frames -> trigger Google Assistant with the input wav and get the response in wav format -> open the response wav file and publish it to the site_id to get audio feedback.
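The "store a wav file from the audio frames" step could look roughly like the sketch below. It assumes that each collected audio frame payload is a small, complete WAV chunk with its own header (which is my understanding of the Hermes audioFrame messages, but treat it as an assumption). The function name merge_wav_chunks is made up for this illustration and is not part of rhasspy-hermes.

```python
import io
import wave
from typing import Iterable


def merge_wav_chunks(chunks: Iterable[bytes]) -> bytes:
    """Join WAV-encoded chunks into one in-memory WAV file.

    Assumption: every chunk is a complete WAV file (header plus samples)
    and all chunks share the same channel count, sample width and rate.
    """
    output = io.BytesIO()
    writer = None
    for chunk in chunks:
        with wave.open(io.BytesIO(chunk), "rb") as reader:
            if writer is None:
                # Copy the audio format from the first chunk.
                writer = wave.open(output, "wb")
                writer.setnchannels(reader.getnchannels())
                writer.setsampwidth(reader.getsampwidth())
                writer.setframerate(reader.getframerate())
            writer.writeframes(reader.readframes(reader.getnframes()))
    if writer is None:
        raise ValueError("no audio chunks to merge")
    # Closing the writer patches the header; the BytesIO stays usable.
    writer.close()
    return output.getvalue()
```

The resulting bytes can be written to disk or handed to whatever consumes the recording.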

It’s likely not the most optimal (and surely not the most beautiful) piece of code, but take it as a proof of concept.

The great thing is that @koan’s framework made getting started very easy. Thanks a lot for the good job!
Do you plan to add decorators to handle messages other than /hermes/intent?

What I found not so easy was figuring out how to publish some messages, like AudioPlayBytes. Maybe rhasspy-hermes should provide a more app-level API (instead of having to call publish myself, which is too “low-level” in my opinion)?

koan · May 29, 2020, 7:27pm

Nice!

Yes, I do. I don’t know if it makes sense to handle all of the message types this way, but definitely for the most common ones.

Yes, that was also the reason why I added the EndSession class and decided to hide publishing the message in the decorator, so you can just return such an object and it will be published. You can always open an issue in the repository with a proposal of how you would hide these low-level details for other types of messages. There’s still much to implement in rhasspy-hermes-app.

DanielW · May 30, 2020, 2:39pm

@koan I played around with your API and created a simple Akinator (the guess a person by yes/no questions game) app using some Node api and a horrible way to use it from Python. (See https://github.com/DanielWe2/rhasspy-hermes-app/commit/d039fcc59c2ab44958f79c852f6da330afc7a232)

It was really simple using your API. I have some questions though:

Most of my intents do basically the same thing. Is there a way to have one handler for multiple intents and figure out the intent name in the handler?
How do I handle the case when the user answers something that doesn’t match the intents from intent_filter? Currently it breaks the game. I would like to play some help message in that case.
The game uses very generic intents like “yes” . We would need a way to only enable them in Rhasspy once the game has started.

koan · May 30, 2020, 2:57pm

Personally I prefer many short handlers instead of one big handler with an if/elif/else block, but I can see your point that for the one-line handlers in your example the latter approach can be useful. So you want a handler that runs on all intents or only on a specific list of intents?

I can add a decorator for the intentNotRecognized message. You can then let a handler react to this with a help message.

I think I saw a discussion about this in the last few weeks, but I can’t remember where (GitHub or the forum here). Rhasspy should indeed have a way to configure a specific intent as disabled by default.

Daenara · May 30, 2020, 9:45pm

I personally use something like GetWeather* right now to catch all weather-related intents. Catching all intents doesn’t seem all that useful, but a way to use a wildcard, add multiple decorators (one for each intent), or pass a list of intents to catch would be good.
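As a rough illustration of the wildcard idea, shell-style patterns such as GetWeather* can be matched against intent names with the standard library. This is only a sketch of the matching step; the helper name matching_intents and the intent names used with it are invented here, and nothing in it is part of rhasspy-hermes-app.

```python
from fnmatch import fnmatchcase
from typing import Iterable, List


def matching_intents(pattern: str, intent_names: Iterable[str]) -> List[str]:
    """Return the intent names that match a shell-style wildcard pattern.

    fnmatchcase is used so that matching is case-sensitive on every platform.
    """
    return [name for name in intent_names if fnmatchcase(name, pattern)]
```

For example, matching_intents("GetWeather*", ["GetWeather", "GetWeatherForecast", "GetTime"]) keeps the two weather intents and drops GetTime.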

DanielW · May 31, 2020, 1:34pm

My idea was to put a dict mapping intent names to answer “ids” (for the external api) at the top of my script and one handler for all answer intents and have less duplication of intent names that way.
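The dict idea could be sketched like this. The intent names and answer ids below are invented placeholders, not the real values used by the Akinator API or by any Rhasspy profile.

```python
from typing import Dict, Optional

# Hypothetical mapping from intent names to answer ids of the external API.
ANSWER_IDS: Dict[str, int] = {
    "AkinatorYes": 0,
    "AkinatorNo": 1,
    "AkinatorDontKnow": 2,
}


def answer_id_for(intent_name: str) -> Optional[int]:
    """Look up the answer id for an intent; None if the intent is unknown."""
    return ANSWER_IDS.get(intent_name)
```

A single handler for all answer intents would then only need the intent name to find the id to send.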

But I thought a little further and figured out that I could possibly combine all possible answers into one intent with multiple slot values.

But I second Daenara’s suggestion of a prefix (or regex match) for selecting intents. (A second decorator like on_intent_by_regex or a second parameter for on_intent would be a possibility.) Ideally this would also work for the intent_filter in ContinueSession.

That would be great.
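To make the suggestion a bit more concrete, here is a minimal, self-contained sketch of how a regex-based decorator could route intent names to handlers. IntentRouter, its on_intent_by_regex method and dispatch are all hypothetical and only mimic the idea; the real library passes an intent object to handlers, while this sketch passes just the name to stay independent of it.

```python
import re
from typing import Callable, List, Pattern, Tuple

Handler = Callable[[str], str]


class IntentRouter:
    """Toy router: maps regex patterns on intent names to handlers."""

    def __init__(self) -> None:
        self._routes: List[Tuple[Pattern[str], Handler]] = []

    def on_intent_by_regex(self, pattern: str) -> Callable[[Handler], Handler]:
        compiled = re.compile(pattern)

        def decorator(handler: Handler) -> Handler:
            # Register the handler and return it unchanged.
            self._routes.append((compiled, handler))
            return handler

        return decorator

    def dispatch(self, intent_name: str) -> List[str]:
        """Call every handler whose pattern matches the whole intent name."""
        return [
            handler(intent_name)
            for compiled, handler in self._routes
            if compiled.fullmatch(intent_name)
        ]


router = IntentRouter()


@router.on_intent_by_regex(r"GetWeather.*")
def weather(intent_name: str) -> str:
    return f"weather handler got {intent_name}"
```

Because the handler receives the intent name, one function can serve a whole family of intents and still tell them apart.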

If the goal of this module is to provide a way to build self-contained apps/skills for Rhasspy, possibly combined with a community repository (integrated into Rhasspy?) to share them, intent/slot management becomes an important topic.

Points are:

Every app should be able to add intents/slots (also with multiple translations)
There needs to be a way to protect against collisions of intent names (Just prefix the app name by default as a simple form of namespaces? By default an app can only handle its own intents?)
We have the intent filter to filter which intents to handle in a dialogue session. But I think we also need the opposite: intents that only trigger when used in an intent filter. That would allow apps to use pretty common phrases like “yes” and “no” without any collision issues.
- Something like “global” or “always on” intents, and intents that only trigger when in an active session with an app.
The same is valid for slots
To give the user transparency and control it would be good if the Rhasspy UI would show intents installed by apps in a special menu grouped by apps. Maybe even allow to modify/disable them.
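The namespacing and "session-only" points above can be sketched as a small registry. Everything here (the IntentRegistry class, the "app.Intent" naming scheme, the method names) is a hypothetical illustration of the proposal, not something Rhasspy provides.

```python
from dataclasses import dataclass, field
from typing import Iterable, Optional, Set


@dataclass
class IntentRegistry:
    """Toy registry with app-name prefixes and session-only intents."""

    always_on: Set[str] = field(default_factory=set)
    session_only: Set[str] = field(default_factory=set)

    def register(self, app_name: str, intent_name: str,
                 session_only: bool = False) -> str:
        # Prefix the app name as a simple form of namespace.
        qualified = f"{app_name}.{intent_name}"
        if qualified in self.always_on | self.session_only:
            raise ValueError(f"intent {qualified!r} is already registered")
        (self.session_only if session_only else self.always_on).add(qualified)
        return qualified

    def recognizable(self,
                     intent_filter: Optional[Iterable[str]] = None) -> Set[str]:
        """Intents that may trigger, given an optional session intent filter."""
        if intent_filter is None:
            # No session filter: only the "always on" intents can trigger.
            return set(self.always_on)
        return (self.always_on | self.session_only) & set(intent_filter)
```

With this scheme a generic intent such as "akinator.Yes" never triggers on its own, only when a session explicitly lists it in its filter.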

A way to integrate a configuration page for an app into the UI (also to disable it) would make it more user friendly and ties in with the whole configure by UI concept of Rhasspy.

I thought about writing a simple Home Assistant app using your service and the Home Assistant API. But without a way to add intents/slots, that’s not yet possible. How much work is needed on the Rhasspy side to make that possible?

koan · May 31, 2020, 1:45pm

I thought about this point and some of your other points too, I opened an issue about this a week ago. Maybe you can chime in there with your remarks/ideas, because this feature definitely requires some changes in Rhasspy.

H3adcra5h · June 3, 2020, 9:27am

Great job
I developed a more or less similar solution, but want to give your solution a try. So I ported some of my work and it works so far.
In my case I need additional arguments to run my app, but the arguments parser isn’t accessible. The better solution could be adding the parser as an optional parameter like this:

def __init__(self, name: str, parser: argparse.ArgumentParser = None):
    """Initialize the Rhasspy Hermes app."""
    if parser is None:
        parser = argparse.ArgumentParser(prog=name)

With this I can add arguments before starting the app.
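A usage sketch of this proposal, using a stand-in class so that it runs without the library installed. DemoApp, its parse method and the --printer-topic option are invented for this illustration; only the constructor signature follows the proposal above.

```python
import argparse
from typing import List, Optional


class DemoApp:
    """Stand-in for the app class, showing only the optional parser argument."""

    def __init__(self, name: str,
                 parser: Optional[argparse.ArgumentParser] = None) -> None:
        if parser is None:
            parser = argparse.ArgumentParser(prog=name)
        self.parser = parser

    def parse(self, argv: Optional[List[str]] = None) -> argparse.Namespace:
        # argv=None makes argparse fall back to sys.argv[1:].
        return self.parser.parse_args(argv)


# The caller adds its own arguments before handing the parser to the app.
parser = argparse.ArgumentParser(prog="PrinterApp")
parser.add_argument("--printer-topic", default="printer/+/state")
app = DemoApp("PrinterApp", parser=parser)
```

Calling app.parse(["--printer-topic", "printer/a/state"]) then yields a namespace whose printer_topic attribute holds the custom value, next to whatever arguments the app itself defines.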
What do you think?

koan · June 3, 2020, 9:39am

Yes, this was already in the back of my mind, this looks like a good solution. I added your change, thanks!

There seems to be enough interest in this library, I’ll see if I can publish a first package on PyPI one of these days, then it’s easier to use it in your projects.

H3adcra5h · June 3, 2020, 10:12am

Another change I would make, especially for people who are not “native Python speakers” like me, is to add a type annotation to the incoming intent in your example.

@app.on_intent("GetTime")
def get_time(intent: NluIntent):
    ...  # handler body omitted

That makes it easier for beginners to understand what kind of data comes in and what properties it has. Otherwise you have to understand what the code is doing and where the data comes from.

It isn’t a must, but it is very helpful.

koan · June 3, 2020, 10:16am

Good idea. At the moment I’m documenting the Rhasspy Hermes library, which defines all these classes. When the documentation is published, it will be clearer too what data Rhasspy Hermes App expects. Afterwards, I will work on documenting Rhasspy Hermes App.

H3adcra5h · June 3, 2020, 2:44pm

I forked your project and made some changes to subscribe to raw topics. I need this to handle other events on my MQTT broker. I created a pull request, so feel free to comment on it, change it, or reject it.

koan · June 3, 2020, 3:08pm

Thanks! I’ll have a look tomorrow.

fastjack · June 3, 2020, 8:15pm

I think topic decorators should not include the topic itself as a string but rather be a specific decorator:

@on_topic('hermes/dialogueManager/sessionEnded')

should be:

@onSessionEnded

This would hide the underlying topic names, so they can be changed without impacting the dependent code.
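A minimal sketch of what such a dedicated decorator could look like internally. MiniApp, on_session_ended and deliver are invented names for this illustration; the point is only that the topic string lives in one private place instead of in every app.

```python
from typing import Callable, Dict, List

Handler = Callable[[dict], None]

# The topic name is kept in one place and never appears in app code.
_SESSION_ENDED_TOPIC = "hermes/dialogueManager/sessionEnded"


class MiniApp:
    """Toy app that hides topic names behind dedicated decorators."""

    def __init__(self) -> None:
        self._handlers: Dict[str, List[Handler]] = {}

    def on_session_ended(self, handler: Handler) -> Handler:
        # Used as a plain decorator: @app.on_session_ended
        self._handlers.setdefault(_SESSION_ENDED_TOPIC, []).append(handler)
        return handler

    def deliver(self, topic: str, payload: dict) -> None:
        """Simulate an incoming message by calling the registered handlers."""
        for handler in self._handlers.get(topic, []):
            handler(payload)
```

If the topic were ever renamed, only the constant would change and decorated handlers would keep working.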

Just an intuition. What do you think?

koan · June 3, 2020, 9:07pm

Yes this was the whole point of the library, to hide this. The decorated function will get a Rhasspy Hermes object as an argument. But I can see that some people need a way to subscribe to MQTT topics, for instance for non-Hermes topics, and they like to do this the same way with a decorator. The decorated function then gets the topic and payload as arguments. So I think an on_topic decorator can be complementary.

H3adcra5h · June 4, 2020, 8:16am

That’s right if you only think within the Rhasspy universe. For understanding my intention with this decorator, “hermes/dialogueManager/sessionEnded” is a very bad example; I only used it for testing. In my app I also subscribe to topics that are not part of Rhasspy. Writing a dedicated decorator for each of these topics is too much work, not easy to do, and makes it hard to wire all these decorators into the on_raw_message loop.
If you take a look at the second parameter of the decorator, you will see a regex pattern for topic matching. This is the more important feature and the reason why the topic name is forwarded to the method.

I will give you an example of how I use this in real life.
I have two 3D printers. For long-running jobs I have an intent that lets Rhasspy inform me when a printer has finished or goes into an error state. Each printer has its own topic for reporting. The topic pattern is printer/<name of printer>/state.

Now I can either subscribe to a topic for each printer or use a wildcard topic like printer/+/state. With the annotation @app.on_topic("printer/+/state", re.compile(r"^printer/([^/]+)/state")) I get both topics in one method. To differentiate between the printers, I need the topic so that I can extract the name from it. And last but not least, if I get a third printer, I do not have to change my code.
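The name extraction from the topic can be sketched with the regex from the annotation (anchored at the end here as well). The helper name printer_name is made up for this illustration, and the printer names in the usage note are placeholders.

```python
import re
from typing import Optional

# Same idea as the pattern in the annotation, with a trailing "$" added.
PRINTER_STATE = re.compile(r"^printer/([^/]+)/state$")


def printer_name(topic: str) -> Optional[str]:
    """Extract the printer name from a printer/<name>/state topic."""
    match = PRINTER_STATE.match(topic)
    return match.group(1) if match else None
```

For a topic like "printer/ender/state" this gives "ender", and a topic that does not fit the pattern gives None, so a third printer needs no code change.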

This is only one example of where I use wildcard topics. Maybe I’m the only one who needs this kind of generic topic subscription. If it’s not useful in general, I would only refactor the app so that I can more easily create a subclass and implement this feature just for myself.

fastjack · June 4, 2020, 9:16am

@koan @H3adcra5h Fair enough! I see both your points. I was just reacting to the dialogueManager example and did not take the “custom” topic decorator into consideration.

I’m not a fan of providing both the subscribed topic and a RegExp. If the goal of the library is to reduce boilerplate for users, wouldn’t it be easier to provide some kind of templating for the topic name, like:

printer/{name}/state will subscribe to printer/+/state and upon message will extract the middle part of the topic and provide it as name.

printer/# will match any topic starting with printer/.

The decorated function can for example accept 3 arguments: topic, payload and properties (extracted from the topic using the above).
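A rough sketch of how such templating could be implemented: one template yields both the MQTT subscription and a regex with named groups. The function names compile_topic_template and extract_properties are hypothetical, and the handling of "#" is simplified to "anything after the prefix", as described above.

```python
import re
from typing import Dict, Optional, Pattern, Tuple

_PLACEHOLDER = re.compile(r"\{([A-Za-z_]\w*)\}")


def compile_topic_template(template: str) -> Tuple[str, Pattern[str]]:
    """Turn 'printer/{name}/state' into ('printer/+/state', compiled regex)."""
    # Each {placeholder} becomes a single-level MQTT wildcard.
    subscription = _PLACEHOLDER.sub("+", template)
    parts = []
    for level in template.split("/"):
        placeholder = _PLACEHOLDER.fullmatch(level)
        if placeholder:
            parts.append(f"(?P<{placeholder.group(1)}>[^/]+)")
        elif level == "+":
            parts.append("[^/]+")
        elif level == "#":
            parts.append(".*")  # simplified multi-level wildcard
        else:
            parts.append(re.escape(level))
    return subscription, re.compile("^" + "/".join(parts) + "$")


def extract_properties(pattern: Pattern[str],
                       topic: str) -> Optional[Dict[str, str]]:
    """Return the named parts of a matching topic, or None if it doesn't match."""
    match = pattern.match(topic)
    return match.groupdict() if match else None
```

The decorated function could then receive the topic, the payload and this properties dict, without the user ever writing a regex.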

What do you think?

koan · June 4, 2020, 9:22am

Yes, that was also my reaction to the pull request. The topic seems redundant if you use the regex pattern.

Your idea of templating seems interesting too. Will this be useful for your use cases, @H3adcra5h?