Helper library to develop Rhasspy apps in Python

Hi @koan

As I would like to implement an intent to manage Google Assistant searches (in the spirit of what is described here), I’m very interested in your framework proposal.

Unfortunately, I must be doing something wrong because I cannot even run the time_app demo.

python3 time_app.py
  File "time_app.py", line 14
    return app.EndSession(f"It's {now}")
                                      ^
SyntaxError: invalid syntax

Do you have any clue?
fx

What Python version are you running (python3 --version)? You need at least Python 3.6 to use f-strings, and the dependency rhasspy-hermes requires it too. I have added this requirement to the installation instructions in the README. Note that the next version of rhasspy-hermes will need Python 3.7.

If it’s the f-string your Python is complaining about, you can try replacing that line with:

return app.EndSession("It's " + now)
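
For reference, a minimal sketch of the version check and of two f-string-free ways to build the same text. The value of `now` is a made-up placeholder and `supports_f_strings` is a hypothetical helper, not part of any Rhasspy library.

```python
import sys


def supports_f_strings(version_info=sys.version_info):
    """Return True if this Python version understands f-strings (3.6 or newer)."""
    return tuple(version_info[:2]) >= (3, 6)


# Placeholder standing in for the formatted time used in the demo.
now = "10 30"

# Both lines also run on Python 3.5, unlike an f-string.
with_format = "It's {}".format(now)
with_concat = "It's " + now  # only works while `now` is a str
```

`str.format` is the safer of the two, because it also accepts values that are not strings.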

Indeed, I had Python 3.5.3 (on Debian 9). I reinstalled with buster instead of stretch and now it’s working well :slight_smile:

By the way, what’s the way to capture the audio stream only (Google Assistant will do the ASR)?
Will you have a way for this in your framework?

Not yet, the current code is just a proof of concept (that’s also why I have no tests, documentation or a PyPI package yet), so for now you can only listen to intents, which is what probably 90% of the apps would use :slight_smile:

But the HermesApp class subclasses the HermesClient class from the Rhasspy Hermes library, so you can definitely use it to capture the audio stream. It’s the hermes/audioServer/<SITE_ID>/audioFrame topic you have to subscribe to.
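
A small sketch of how the topic name could be built and parsed. The helper names are hypothetical; the only things taken from the thread are the topic layout and the standard MQTT single-level wildcard `+`. How you actually subscribe depends on the client you use.

```python
import re

AUDIO_FRAME_TOPIC = "hermes/audioServer/{site_id}/audioFrame"
AUDIO_FRAME_PATTERN = re.compile(r"^hermes/audioServer/([^/]+)/audioFrame$")


def audio_frame_topic(site_id="+"):
    """Topic for one site, or for all sites with the MQTT wildcard '+'."""
    return AUDIO_FRAME_TOPIC.format(site_id=site_id)


def site_id_from_topic(topic):
    """Extract the site ID from an audioFrame topic, or None if it is another topic."""
    match = AUDIO_FRAME_PATTERN.match(topic)
    return match.group(1) if match else None
```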

If you want this feature, maybe open an issue with a short explanation of why you need it and how exactly you’d want to use it. We can discuss the specifics there.

For Rhasspy 2.5, I’ve added a rhasspy/asr/<SITE_ID>/<SESSION_ID>/audioCaptured message that lets you get a hold of the recorded WAV data from a voice command for a session :slight_smile:

Hi @synesthesiam

Thanks for the tip. Unfortunately, I have the feeling that rhasspy-dialogue-hermes doesn’t set the appropriate flag to True when handling the ContinueSession message

See the code below (it uses the default value, which is False if I understand the code correctly):

        # Start ASR listening
        _LOGGER.debug("Listening for session %s", self.session.session_id)
        yield AsrStartListening(
            site_id=self.session.site_id, session_id=self.session.session_id
        )

Maybe we need an additional flag in ContinueSession to indicate that we want to receive the audioCaptured message?

In the meantime, I guess that I have to go with audioFrame…

Finally I got something working with the following approach.

Wakeword -> say “Ask Google” (ASR/NLU) -> in the on_intent function, publish continueSession (text=“What do you want to ask?”) -> once ASR is started, record the audio frames until ASR stops -> write the recorded audio frames to a WAV file -> trigger Google Assistant with the input WAV and get the response in WAV format -> open the response WAV file and publish it to the site_id to get audio feedback.

It’s likely not the most optimal (and surely not the most beautiful) piece of code, but take it as a proof of concept :slight_smile:
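
The "record frames, then write a WAV file" step can be sketched with the standard library alone. This is not the code from the post above; it assumes that every audioFrame payload is itself a small WAV file (header included) and that all chunks share the same format. `combine_wav_chunks` is a hypothetical helper name.

```python
import io
import wave


def combine_wav_chunks(chunks):
    """Merge WAV-encoded chunks that share one format into a single WAV file (bytes)."""
    if not chunks:
        raise ValueError("no audio frames recorded")

    out_buffer = io.BytesIO()
    with wave.open(out_buffer, "wb") as out_wav:
        for index, chunk in enumerate(chunks):
            with wave.open(io.BytesIO(chunk), "rb") as in_wav:
                if index == 0:
                    # Copy the audio format from the first chunk.
                    out_wav.setnchannels(in_wav.getnchannels())
                    out_wav.setsampwidth(in_wav.getsampwidth())
                    out_wav.setframerate(in_wav.getframerate())
                # Append only the raw samples, dropping each chunk's header.
                out_wav.writeframes(in_wav.readframes(in_wav.getnframes()))
    return out_buffer.getvalue()
```

The returned bytes can be written to disk or passed on directly to whatever consumes the recording.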

The great thing is that @koan framework made the starting part very easy. Thanks a lot for the good job!
Do you plan to add additional decorators to handle messages other than hermes/intent?

What I found not so easy is to figure out how to publish some messages like AudioPlayBytes. Maybe rhasspy-hermes should provide more app-level API (instead of having to call publish myself which is too “low-level” in my opinion)?

Nice!

Yes, I do. I don’t know if it makes sense to handle all of the message types this way, but definitely for the most common ones.

Yes, that was also the reason why I added the EndSession class and decided to hide publishing the message in the decorator, so you can just return such an object and it will be published. You can always open an issue in the repository with a proposal of how you would hide these low-level details for other types of messages. There’s still much to implement in rhasspy-hermes-app.

@koan I played around with your API and created a simple Akinator (the guess a person by yes/no questions game) app using some Node api and a horrible way to use it from Python. (See https://github.com/DanielWe2/rhasspy-hermes-app/commit/d039fcc59c2ab44958f79c852f6da330afc7a232)

It was really simple using your API. I have some questions though:

  • Most of my intents do basically the same. Is there a way to have one handler for multiple intents and figure out the intent name in the handler?
  • How do I handle the case when the user answers something that doesn’t match the intents from intent_filter? Currently it breaks the game. I would like to play some help message in that case.
  • The game uses very generic intents like “yes” . We would need a way to only enable them in Rhasspy once the game has started.

Personally I prefer many short handlers instead of one big handler with an if/elif/else block, but I can see your point that for the one-line handlers in your example the latter approach can be useful. So you want a handler that runs on all intents or only on a specific list of intents?

I can add a decorator for the intentNotRecognized message. You can then let a handler react to this with a help message.

I think I saw a discussion about this in the last few weeks, but I can’t remember where (GitHub or the forum here). Rhasspy should indeed have a way to configure a specific intent as disabled by default.

I personally use something like GetWeather* right now to catch all weather-related intents. Catching all intents doesn’t seem all that useful, but a way to use a wildcard, to add multiple decorators (one for each intent), or to pass a list of intents to catch would be good.

My idea was to put a dict mapping intent names to answer “ids” (for the external api) at the top of my script and one handler for all answer intents and have less duplication of intent names that way.
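
A sketch of that mapping idea. The intent names and answer ids below are invented for illustration; the real ones depend on the sentences defined in Rhasspy and on the external API.

```python
# Hypothetical intent names mapped to hypothetical answer ids.
ANSWER_IDS = {
    "AkinatorYes": 0,
    "AkinatorNo": 1,
    "AkinatorDontKnow": 2,
}


def answer_id_for(intent_name):
    """Return the answer id for an intent name, or None if it is not an answer intent."""
    return ANSWER_IDS.get(intent_name)
```

A single handler could then look up the incoming intent name here instead of repeating one handler per answer.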

But I thought a little further and figured out that I could possibly combine all possible answers into one intent with multiple slot values.

But I second Daenara’s suggestion of a prefix (or regex match) for selecting intents. (A second decorator like on_intent_by_regex or a second parameter for on_intent would be a possibility.) Ideally this would also work for the intent_filter in ContinueSession.

That would be great.
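
The wildcard selection discussed above could, for example, be built on shell-style patterns from the standard library. This is only a sketch of the matching step with a hypothetical helper name; it says nothing about how the library would expose it.

```python
from fnmatch import fnmatchcase


def matching_intents(pattern, intent_names):
    """Return the intent names that match a case-sensitive shell-style pattern."""
    return [name for name in intent_names if fnmatchcase(name, pattern)]
```

With the pattern `GetWeather*`, names such as `GetWeatherToday` are selected while `GetTime` is not.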

If the goal of this module is to provide a way to build self-contained apps/skills for Rhasspy, possibly combined with a community repository (integrated into Rhasspy?) to share those, intent/slot management becomes an important topic.

Points are:

  • Every app should be able to add intents/slots (also with multiple translations)
  • There needs to be a way to protect against collisions of intent names (Just prefix the app name by default as a simple form of namespaces? By default an app can only handle its own intents?)
  • We have the intent filter to filter which intents to handle in a dialog session. But I think we also need the opposite: Have intents that only trigger when used in an intent filter. That would allow apps to use pretty common phrases like “yes”, “no” without any collision issues.
    • Something like “global” or “always on” intents and intents that only trigger when in an active session with an app.
  • The same is valid for slots
  • To give the user transparency and control it would be good if the Rhasspy UI would show intents installed by apps in a special menu grouped by apps. Maybe even allow to modify/disable them.
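
The namespacing point in the list above could be as simple as the following sketch. The separator and both helper names are assumptions, and whether Rhasspy accepts such a separator in intent names would need to be checked.

```python
SEPARATOR = "."  # hypothetical separator between app name and intent name


def qualify(app_name, intent_name):
    """Prefix an intent name with the name of the app that owns it."""
    return "{}{}{}".format(app_name, SEPARATOR, intent_name)


def owned_by(app_name, qualified_name):
    """Return True if the qualified intent name belongs to the given app."""
    return qualified_name.startswith(app_name + SEPARATOR)
```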

A way to integrate a configuration page for an app into the UI (also to disable it) would make it more user friendly and ties in with the whole configure by UI concept of Rhasspy.

I thought about writing a simple Home Assistant app using your service and the Home Assistant API. But without adding intents/slots that’s not yet possible. How much work is needed on the Rhasspy side to make that possible?

I thought about this point and some of your other points too, I opened an issue about this a week ago. Maybe you can chime in there with your remarks/ideas, because this feature definitely requires some changes in Rhasspy.

Great job :+1:
I developed a more or less similar solution, but want to give your solution a try. So I ported some of my work and it works so far.
In my case I need additional arguments to run my app, but the argument parser isn’t accessible. A better solution could be to add the parser as an optional parameter like this:

def __init__(self, name: str, parser: argparse.ArgumentParser = None):
    """Initialize the Rhasspy Hermes app."""
    if parser is None:
        parser = argparse.ArgumentParser(prog=name)

With this I can add arguments before starting the app.
What do you think about it?
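
A self-contained sketch of how such an optional parser might be used from the calling side. `DemoApp` is a stand-in class, not the real app class, and the `--host`, `--city` and `argv` names are made up for illustration.

```python
import argparse


class DemoApp:
    """Stand-in for the app class; only the parser handling is sketched."""

    def __init__(self, name, parser=None, argv=None):
        if parser is None:
            parser = argparse.ArgumentParser(prog=name)
        # Option the app itself would add (hypothetical example).
        parser.add_argument("--host", default="localhost")
        # argv=None makes argparse read sys.argv; a list is handy for testing.
        self.args = parser.parse_args(argv)


# The caller creates the parser first and adds its own arguments.
parser = argparse.ArgumentParser(prog="DemoApp")
parser.add_argument("--city", default="Brussels")

app = DemoApp("DemoApp", parser=parser, argv=["--city", "Ghent"])
```

After construction, `app.args` holds both the caller's arguments and the ones added by the app.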

Yes, this was already in the back of my mind, this looks like a good solution. I added your change, thanks!

There seems to be enough interest in this library, I’ll see if I can publish a first package on PyPI one of these days, then it’s easier to use it in your projects.

Another change I would make, especially for people like me who are not “native Python speakers”, is to type the incoming intent in your example.

@app.on_intent("GetTime")
def get_time(intent: NluIntent):

That makes it easier for beginners to understand what kind of data comes in and which properties are available. Otherwise you have to understand what the code is doing and where the data comes from.

It isn’t a must, but it would be very helpful.

Good idea. At the moment I’m documenting the Rhasspy Hermes library, which defines all these classes. When the documentation is published, it will be clearer too what data Rhasspy Hermes App expects. Afterwards, I will work on documenting Rhasspy Hermes App.

I forked your project and made some changes to subscribe to raw topics. I need this to handle other events on my MQTT broker. I created a pull request, so feel free to comment on it, change it, or reject it.

Thanks! I’ll have a look tomorrow.

I think topic decorators should not include the topic itself as a string but rather be a specific decorator:

@on_topic("hermes/dialogueManager/sessionEnded")

should be:

@onSessionEnded

This will allow to hide the underlying topic names so they can be changed without impacting the dependent code.

Just an intuition. What do you think?
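
One way to picture this: a specific decorator can be a thin wrapper around the generic topic decorator, so the topic string lives in one place. The sketch below uses a hypothetical `TopicRegistry` class with a snake_case `on_session_ended`; it is not the library's API.

```python
SESSION_ENDED_TOPIC = "hermes/dialogueManager/sessionEnded"


class TopicRegistry:
    """Hypothetical registry that maps topics to handler functions."""

    def __init__(self):
        self.handlers = {}

    def on_topic(self, topic):
        """Generic decorator: register a handler for a raw topic string."""
        def decorator(func):
            self.handlers.setdefault(topic, []).append(func)
            return func
        return decorator

    def on_session_ended(self, func):
        """Specific decorator: hides the underlying topic name from app code."""
        return self.on_topic(SESSION_ENDED_TOPIC)(func)

    def dispatch(self, topic, payload):
        """Call every handler registered for the topic and collect the results."""
        return [handler(payload) for handler in self.handlers.get(topic, [])]


registry = TopicRegistry()


@registry.on_session_ended
def log_session_end(payload):
    return "session ended: {}".format(payload)
```

If the topic name ever changes, only `SESSION_ENDED_TOPIC` has to be updated, while handlers that use the raw `on_topic` decorator would each need editing.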