Do we really want to put configuration APIs (not directly related to a pure messaging function) on MQTT? I mean, wouldn’t it be better to just let the app contact the HTTP API (via utility functions provided by the library, maybe)?
I’m aware this will complicate things (the app would need HTTP credentials, for a start), but we’ll be polluting MQTT with configuration services. I don’t know, it doesn’t sound right… what do you think?
I see your point, but requiring apps or a library to use both HTTP and MQTT seems more convoluted, and is definitely more error-prone. And if the HTTP API offers this functionality, I don’t see why the MQTT API cannot offer this too.
I consider every interaction with Rhasspy a messaging function, including configuring intents, so I don’t see it as polluting. But maybe that’s because I’m much more comfortable with MQTT than with HTTP. This is not to say that every single aspect of Rhasspy should be configurable using MQTT messages, but adding intents/sentences seems like a common enough use case.
By the way, I have edited the issue I raised in the rhasspy-hermes repository and added some thoughts about how to handle namespace, because that’s important if this will be implemented in the MQTT API. Maybe you have some thoughts about it too.
How about doing that during an earlier stage? Something like the setup stage. I mean configuring and training should happen only when something is installed or changed right? Normal app execution shouldn’t need this. We’ll let the build/setup infrastructure do this instead of doing it from inside the app code (I still have to think how, but I think you get the idea; it would have to be done outside the possible Docker container of the app, of course).
My main concern is exposing the MQTT system to potentially privileged operations. Privileged as in modifying configuration stuff. I understand some MQTT brokers implement ACLs and other authorization mechanisms, but this should concern Rhasspy itself. Rhasspy will have (one day I hope) an authorization layer for its HTTP API. It won’t be as easy or possible to do the same thing with MQTT.
Anyway, it’s not a big deal in the end (if I don’t want it, I would just ban the topic in mosquitto and do the training manually or maybe I’m just a paranoid lol), but I just thought it would need proper attention before moving on.
Yes, that’s how Snips did it with the snips-skill-server. But then we need some rhasspy-skill-server that gets the sentences.ini file from an app you install, and this server has to communicate the content of this file to the NLU and ASR services, which potentially run on another machine. So then you’re back to MQTT or HTTP.
Privileged operations is also what I was talking about in my additions to the issue I linked to above. With an ACL and authentication this is quite easy to contain. Actually I have been running my example app this way for the past few weeks: it can only subscribe to one specific MQTT topic and publish on one other MQTT topic with this ACL file:
user rhasspy-app-time
topic read hermes/intent/GetTime
topic write hermes/dialogueManager/endSession
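To make explicit what an ACL like this permits, here is a minimal Python sketch that parses the three lines above and answers allow/deny questions. It only handles `user <name>` and `topic <read|write> <topic>` lines with exact topic matches; wildcards and the other forms a broker may support are left out, and the names `parse_acl` and `is_allowed` are made up for this illustration.

```python
ACL_TEXT = """\
user rhasspy-app-time
topic read hermes/intent/GetTime
topic write hermes/dialogueManager/endSession
"""


def parse_acl(text):
    """Return {user: [(access, topic), ...]} for 'user' and 'topic' lines."""
    rules = {}
    current_user = None
    for line in text.splitlines():
        words = line.split()
        if len(words) == 2 and words[0] == "user":
            current_user = words[1]
            rules.setdefault(current_user, [])
        elif len(words) == 3 and words[0] == "topic" and current_user is not None:
            # words[1] is the access type ("read" or "write"), words[2] the topic.
            rules[current_user].append((words[1], words[2]))
    return rules


def is_allowed(rules, user, access, topic):
    """Exact-match check only; this sketch does not expand MQTT wildcards."""
    return (access, topic) in rules.get(user, [])


RULES = parse_acl(ACL_TEXT)
```

With these rules, the app may read `hermes/intent/GetTime` and write `hermes/dialogueManager/endSession`, and nothing else.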
Of course. It would still be a remote call anyway. Besides the protocol used, I was talking about not letting the app code do this, and instead doing it from a privileged account upfront. But as I said, maybe I’m just a little too paranoid.
Hi @koan
As I would like to implement an intent to manage Google Assistant searches (in the spirit of what is described here), I’m very interested in your framework proposal.
Unfortunately, I must be doing something wrong because I cannot even run the time_app demo.
python3 time_app.py
File "time_app.py", line 14
return app.EndSession(f"It's {now}")
^
SyntaxError: invalid syntax
Do you have any clue?
fx
What Python version are you running (python3 --version)? You need Python 3.6 or later to use f-strings, and the dependency rhasspy-hermes needs it too. I have added this requirement to the installation instructions in the README. Note that the next version of rhasspy-hermes will need Python 3.7.
If it’s the f-string your Python is complaining about, you can always check whether it works after replacing that line with:
return app.EndSession("It's " + now)
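If you want a clearer error than a SyntaxError on old interpreters, a small version guard can help. This is only a sketch: the helper name `python_is_supported` is made up here, and the guard has to live in a separate launcher file without f-strings, because Python reports the SyntaxError while compiling the app file, before any code in that file runs.

```python
import sys

# Minimum interpreter version mentioned above (f-strings arrived in Python 3.6).
MIN_PYTHON = (3, 6)


def python_is_supported(version_info=sys.version_info, minimum=MIN_PYTHON):
    """Return True if the (major, minor) part of version_info meets the minimum."""
    return tuple(version_info[:2]) >= minimum


if not python_is_supported():
    # Old-style formatting on purpose, so this file parses on old interpreters.
    sys.exit("Python %d.%d or newer is required" % MIN_PYTHON)

# Only after this check would the launcher import or run the real app module.
```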
Indeed, I had Python 3.5.3 (on Debian 9). I reinstalled with buster instead of stretch and now it’s working well
By the way, what’s the way to capture the audio stream only (Google Assistant will do the ASR)?
Will you have a way for this in your framework?
Not yet, the current code is just a proof of concept (that’s also why I have no tests, documentation or a PyPI package yet), so for now you can only listen to intents, which is what probably 90% of the apps would use
But the HermesApp class subclasses the HermesClient class from the Rhasspy Hermes library, so you can definitely use it to capture the audio stream. It’s the hermes/audioServer/<SITE_ID>/audioFrame topic you have to subscribe to.
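As a rough illustration of working with that topic, the sketch below builds the topic string for one site (or for all sites, using the MQTT single-level wildcard `+`) and extracts the site ID from an incoming topic. The helper names are made up for this example and are not part of rhasspy-hermes.

```python
AUDIO_FRAME_TOPIC = "hermes/audioServer/{site_id}/audioFrame"


def audio_frame_topic(site_id="+"):
    """Build the audioFrame topic; the default '+' matches any single site ID."""
    return AUDIO_FRAME_TOPIC.format(site_id=site_id)


def site_id_from_topic(topic):
    """Return the site ID of an audioFrame topic, or None for any other topic."""
    parts = topic.split("/")
    if (
        len(parts) == 4
        and parts[0] == "hermes"
        and parts[1] == "audioServer"
        and parts[3] == "audioFrame"
    ):
        return parts[2]
    return None
```

You would pass the result of `audio_frame_topic()` to whatever subscribe call your MQTT client offers, and use `site_id_from_topic()` in the message callback to tell the sites apart.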
If you want this feature, maybe open an issue with a short explanation of why you need it and how exactly you’d want to use it. We can discuss the specifics there.
For Rhasspy 2.5, I’ve added a rhasspy/asr/<SITE_ID>/<SESSION_ID>/audioCaptured message that lets you get hold of the recorded WAV data from a voice command for a session.
Thanks for the tip. Unfortunately, I have the feeling that rhasspy-dialogue-hermes doesn’t set the appropriate flag to True when handling the ContinueSession message.
See the code below (it uses the default value, which is False if I understand the code correctly):
# Start ASR listening
_LOGGER.debug("Listening for session %s", self.session.session_id)
yield AsrStartListening(
    site_id=self.session.site_id, session_id=self.session.session_id
)
Maybe we need an additional flag in ContinueSession to indicate whether we want to receive the audioCaptured message?
In the meantime, I guess that I have to go with audioFrame…
Finally I got something working with the following approach.
Wakeword -> say “Ask Google” (ASR/NLU) -> in the on_intent function, publish continueSession (text=“What do you want to ask?”) -> once ASR is started, detect and store audio frames until ASR stops -> write a WAV file from the audio frames -> trigger Google Assistant with the input WAV and get the response in WAV format -> open the WAV file and publish it to the site_id to get audio feedback.
It’s likely not the most optimal (and surely not the most beautiful) piece of code, but take it as a proof of concept.
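The "store audio frames, then write a WAV file" step can be sketched with the standard library alone. This assumes each audioFrame payload is itself a small, complete WAV file and that all chunks share the same format; the function names `merge_wav_chunks` and `silent_wav` are made up for this illustration and are not taken from the actual code.

```python
import io
import wave


def merge_wav_chunks(chunks):
    """Merge WAV-encoded byte strings into a single WAV byte string."""
    out_buffer = io.BytesIO()
    writer = None
    for chunk in chunks:
        with wave.open(io.BytesIO(chunk), "rb") as reader:
            if writer is None:
                # Copy the audio format from the first chunk.
                writer = wave.open(out_buffer, "wb")
                writer.setnchannels(reader.getnchannels())
                writer.setsampwidth(reader.getsampwidth())
                writer.setframerate(reader.getframerate())
            writer.writeframes(reader.readframes(reader.getnframes()))
    if writer is None:
        return b""  # No chunks were collected.
    writer.close()  # Finalizes the header; the BytesIO buffer stays open.
    return out_buffer.getvalue()


def silent_wav(n_frames, rate=16000):
    """Create a mono 16-bit WAV chunk of silence, handy for trying this out."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as writer:
        writer.setnchannels(1)
        writer.setsampwidth(2)
        writer.setframerate(rate)
        writer.writeframes(b"\x00\x00" * n_frames)
    return buffer.getvalue()
```

In the flow above you would append each received payload to a list while ASR is listening and call `merge_wav_chunks()` on that list once it stops.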
The great thing is that @koan’s framework made the starting part very easy. Thanks a lot for the good job!
Do you plan to add additional decorators to handle messages other than hermes/intent?
What I found not so easy is figuring out how to publish some messages like AudioPlayBytes. Maybe rhasspy-hermes should provide a more app-level API (instead of having to call publish myself, which is too “low-level” in my opinion)?
Nice!
Yes, I do. I don’t know if it makes sense to handle all of the message types this way, but definitely for the most common ones.
Yes, that was also the reason why I added the EndSession class and decided to hide publishing the message in the decorator, so you could just return such an object and it will be published. You can always open an issue in the repository with a proposal of how you would hide these low-level details for other types of messages. There’s still much to implement in rhasspy-hermes-app.
@koan I played around with your API and created a simple Akinator (the guess a person by yes/no questions game) app using some Node api and a horrible way to use it from Python. (See https://github.com/DanielWe2/rhasspy-hermes-app/commit/d039fcc59c2ab44958f79c852f6da330afc7a232)
It was really simple using your API. I have some questions though:
- Most of my intents do basically the same thing. Is there a way to have one handler for multiple intents and figure out the intent name in the handler?
- How do I handle the case when the user answers something that doesn’t match the intents from intent_filter? Currently it breaks the game. I would like to play some help message in that case.
- The game uses very generic intents like “yes”. We would need a way to only enable them in Rhasspy once the game has started.
Personally I prefer many short handlers instead of one big handler with an if/elif/else block, but I can see your point that for the one-line handlers in your example the latter approach can be useful. So you want a handler that runs on all intents or only on a specific list of intents?
I can add a decorator for the intentNotRecognized message. You can then let a handler react to this with a help message.
I think I saw a discussion about this in the last few weeks, but I can’t remember where (GitHub or the forum here). Rhasspy should indeed have a way to configure a specific intent as disabled by default.
I personally use something like GetWeather* right now to catch all weather-related intents. Handling all intents seems like it wouldn’t be all that useful, but a way to either use a wildcard, add multiple decorators (one for each intent), or pass a list of intents to catch would be good.
My idea was to put a dict mapping intent names to answer “ids” (for the external api) at the top of my script and one handler for all answer intents and have less duplication of intent names that way.
But I thought a little further and figured out that I could possibly combine all possible answers into one intent with multiple slot values.
But I second Daenara’s suggestion for a prefix (or regex match) for selection of intents. (A second decorator like on_intent_by_regex or a second parameter for on_intent would be a possibility.) Ideally also for the intent_filter in ContinueSession.
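Independent of how the library ends up exposing it, the matching itself is simple. Here is a minimal sketch of wildcard matching on intent names plus the dict idea from above; the names `intent_matches`, `answer_id` and `ANSWER_IDS`, as well as the intent names in the dict, are hypothetical and not part of rhasspy-hermes-app.

```python
import fnmatch

# Hypothetical mapping from intent names to answer ids for an external API.
ANSWER_IDS = {"AnswerYes": 0, "AnswerNo": 1, "AnswerDontKnow": 2}


def intent_matches(intent_name, patterns):
    """Return True if the intent name matches any shell-style pattern."""
    return any(fnmatch.fnmatchcase(intent_name, pattern) for pattern in patterns)


def answer_id(intent_name):
    """Look up the answer id for an intent, or None if it is not an answer."""
    return ANSWER_IDS.get(intent_name)
```

A single handler could then call `intent_matches(name, ["Answer*"])` to decide whether to react and `answer_id(name)` to pick the value to send on, which keeps the intent names in one place.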
That would be great.
If the goal with this module is to provide a way to build self-contained apps/skills for Rhasspy, possibly combined with a community repository (integrated into Rhasspy?) to share those, intent/slot management becomes an important topic.
Points are:
- Every app should be able to add intents/slots (also with multiple translations)
- There needs to be a way to protect against collision of intent names (just prefix the app name by default as a simple form of namespaces? By default an app can only handle its own intents?)
- We have the intent filter to filter which intents to handle in a dialogue session. But I think we also need the opposite: intents that only trigger when used in an intent filter. That would allow apps to use pretty common phrases like “yes” and “no” without any collision issues.
- Something like “global” or “always on” intents, and intents that only trigger when in an active session with an app.
- The same is valid for slots
- To give the user transparency and control it would be good if the Rhasspy UI would show intents installed by apps in a special menu grouped by apps. Maybe even allow to modify/disable them.
A way to integrate a configuration page for an app into the UI (also to disable it) would make it more user-friendly and would tie in with the whole configure-by-UI concept of Rhasspy.
I thought about writing a simple Home Assistant app using your service and the Home Assistant API. But without adding intents/slots that’s not yet possible. How much work is needed on the Rhasspy side to make that possible?
I thought about this point and some of your other points too, and I opened an issue about this a week ago. Maybe you can chime in there with your remarks/ideas, because this feature definitely requires some changes in Rhasspy.
Great job
I developed a more or less similar solution, but want to give your solution a try. So I ported some of my work and it works so far.
In my case I need additional arguments to run my app, but the argument parser isn’t accessible. A better solution could be to add the parser as an optional parameter, like this:
def __init__(self, name: str, parser: argparse.ArgumentParser = None):
    """Initialize the Rhasspy Hermes app."""
    if parser is None:
        parser = argparse.ArgumentParser(prog=name)
With this I can add arguments before starting the app.
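From the app author’s side, usage could look roughly like the sketch below. `SketchApp` is only a stand-in class written for this example (not the real HermesApp), and the `--host` and `--city` options and the `argv` parameter are hypothetical; `argv` is there so the example runs without touching the real command line.

```python
import argparse
from typing import List, Optional


class SketchApp:
    """Stand-in for the app class; only the parser handling is shown."""

    def __init__(
        self,
        name: str,
        parser: Optional[argparse.ArgumentParser] = None,
        argv: Optional[List[str]] = None,
    ):
        if parser is None:
            parser = argparse.ArgumentParser(prog=name)
        # An option the library itself might add (hypothetical).
        parser.add_argument("--host", default="localhost")
        # argv=None makes argparse fall back to the real command line.
        self.args = parser.parse_args(argv)


# The app author creates the parser first and adds app-specific options...
parser = argparse.ArgumentParser(prog="TimeApp")
parser.add_argument("--city", default="Brussels")

# ...then hands it to the app, which adds its own options and parses everything.
app = SketchApp("TimeApp", parser=parser, argv=["--city", "Ghent"])
```

After this, `app.args` holds both the app-specific and the library-provided options in one namespace.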
What do you think?
Yes, this was already in the back of my mind, and this looks like a good solution. I added your change, thanks!
There seems to be enough interest in this library, so I’ll see if I can publish a first package on PyPI one of these days; then it’ll be easier to use it in your projects.