Ideas for multiple Intent Handlers (e.g. one per Intent)

orca8119 · April 8, 2020, 11:18am

Hello @all,

I thought a little bit about how multiple intent handlers might fit into the Rhasspy 2.5 service architecture.

And that’s the result. It’s just a rough idea, and not all of this must be built together (or at all). I simply want to put this up for discussion.

So let me explain a little bit:

The basic idea is to replace the two intent handling services with a new one: basically some kind of dispatcher service, which prepares the event, acquires the corresponding handler, and dispatches the event for execution.

The new one has:

Intent Handler Types, which are pretty much one or several classes matching a defined interface (e.g. create, setConfiguration, handleIntent). There are 3 by default (HTTP Server, Local Command, Home Assistant); additional handlers can be added via the “skill repository” or shipped by default later on (e.g. SEPIA, which I saw here). A skill handler might be a Python module/class matching the correct interface, which is then dynamically acquired either during start or during handling.
a “skill repository”, where each entry is basically an Intent Handler Type (e.g. HTTP, Local Command, Home Assistant Event, whatever) and a matching configuration for that handler (e.g. the command and arguments for the command handler, the Home Assistant URL and access token for the Home Assistant event, and the remote server for HTTP).
an intent-to-skill mapping. This gives the possibility to provide community based skills (e.g. Daenaras development) and intents defined by the user (e.g. intent GetTime is mapped to “Skill” SayTime, possibly with additional parameters).
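To make the three parts a bit more concrete, here is a minimal Python sketch of how they could fit together. Everything in it is an assumption for illustration: the class names, the `handle_intent` method, the shape of the repository entries, and the simplified intent dictionary are not taken from Rhasspy.

```python
from abc import ABC, abstractmethod


class IntentHandler(ABC):
    """Hypothetical handler interface; method names are illustrative only."""

    def __init__(self, config: dict):
        # Plays the role of setConfiguration: handler-specific settings.
        self.config = config

    @abstractmethod
    def handle_intent(self, intent: dict) -> dict:
        """Process one recognized intent and return a result."""


class RecordingHandler(IntentHandler):
    """Stand-in for a real handler type; it only reports what it would do."""

    def handle_intent(self, intent: dict) -> dict:
        return {"skill_config": self.config, "intent": intent["intent"]["name"]}


class Dispatcher:
    def __init__(self, handler_types: dict, skills: dict, intent_map: dict):
        self.handler_types = handler_types  # type name -> handler class
        self.skills = skills                # skill name -> {"type", "config"}
        self.intent_map = intent_map        # intent name -> skill name

    def dispatch(self, intent: dict):
        skill_name = self.intent_map.get(intent["intent"]["name"])
        if skill_name is None:
            return None  # no skill registered for this intent
        skill = self.skills[skill_name]
        handler = self.handler_types[skill["type"]](skill["config"])
        return handler.handle_intent(intent)


dispatcher = Dispatcher(
    handler_types={"command": RecordingHandler},
    skills={"SayTime": {"type": "command", "config": {"command": "say_time.sh"}}},
    intent_map={"GetTime": "SayTime"},
)
result = dispatcher.dispatch({"intent": {"name": "GetTime"}})
```

A real handler type would run the command or send the HTTP request instead of returning a dictionary; the point here is only the lookup chain from intent to skill to handler type.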

There are a couple of open points though, e.g. what the skill repository is (perhaps a directory with subfolders, each containing a manifest and a configuration).

PS: Forgive me if I:

either missed something important in my analysis
or made a couple of mistakes in my text (grammar and spelling wise), as I’m not a native speaker (as you have surely found out already)

PPS: I’ve checked rhasspyremote_http_hermes and, if I understand it right, it is a mix of remote ASR, TTS, wake and intent handling. Is that possible?

winandus · April 8, 2020, 5:30pm

Hi,

I have been looking into something similar to this. I put it in a forum topic a few days ago, although that is already outdated by now. Have a look, maybe there’s something useful for you in there:

orca8119 · April 16, 2020, 1:53pm

My ideas haven’t caught much attention so far. So I’ve gone ahead and implemented a (more or less working) prototype with multiple intent handlers (one per intent for now) with some kind of “skill functionality” (for now pretty much a directory with a manifest.json defining the handler type and some required parameters).

Currently, this is pretty much only a modification of the home-assistant-intent handler (because it already brings everything that is needed), with three built-in handler types (home assistant, remote-http and local command) which can be defined in a local kind of repository.
The interface is not really clear yet, nor is it really clear what a “skill” might be, and so on.

@synesthesiam: I’m willing to contribute this (ideas and/or development) if the dev team is (really) interested in it. So, if you are interested (in the concept or the source), please let me know.

RaspiManu · April 16, 2020, 4:41pm

I’m not deep enough into programming to understand all of the things in this topic, but I think I understand most of it and what is planned, and I really like it.

synesthesiam · April 17, 2020, 9:04pm

Thanks for taking the time to look into this, @orca8119

Another approach I’ve considered is this: have a master intent handler (similar to what you’ve developed), but have it communicate privately with multiple other intent handlers when dispatching.

This seems possible by having Rhasspy start up each intent handler with a unique siteId (so it won’t interfere with other services), and then start the master intent handler with all of its child siteIds and the information needed for dispatch (e.g., siteId X is Home Assistant, siteId Y is local).
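A minimal sketch of that routing step in Python, under a few assumptions: the intent message is taken to be Hermes-style JSON with `intent.intentName` and `siteId` fields, the child siteId names are invented, and `originalSiteId` is a hypothetical extra field used here only to remember where the intent came from.

```python
import json

# Hypothetical routing table: intent name -> siteId of the child handler.
ROUTES = {
    "SwitchOnLight": "handler-homeassistant",
    "GetTime": "handler-local",
}


def reroute(payload: bytes, routes: dict):
    """Return (topic, payload) to republish for a child handler, or None."""
    message = json.loads(payload)
    intent_name = message["intent"]["intentName"]
    child_site = routes.get(intent_name)
    if child_site is None:
        return None  # no child handler is responsible for this intent

    rerouted = dict(message)
    # Keep the real site so the child can still answer the right room.
    rerouted["originalSiteId"] = message.get("siteId", "default")
    rerouted["siteId"] = child_site
    topic = f"hermes/intent/{intent_name}"
    return topic, json.dumps(rerouted).encode("utf-8")


incoming = json.dumps(
    {"siteId": "livingroom", "intent": {"intentName": "SwitchOnLight"}}
).encode("utf-8")
topic, outgoing = reroute(incoming, ROUTES)
```

The master handler would publish the returned payload over MQTT; that part is left out so the sketch stays independent of any particular client library.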

I tend to think of Rhasspy “skills” as just MQTT services that respond to intents or other messages as appropriate. I have a checklist skill, for example, that listens for a special message and then walks the user through each item in a checklist to confirm it or not. At the end, it publishes a message with the results.

Skills like that seem more reusable and composable (can be combined) to me. What do you think?

orca8119 · April 18, 2020, 10:00am

Hello @synesthesiam,

I appreciate the work you are doing. But you can assume by now that I don’t agree with all your points.

Generally I like the idea of separating the individual services communicating via MQTT, so they are replaceable. But for the intent handler, I see a couple of problems (but maybe I don’t see the whole picture):

(1)
As I understand siteId in the Hermes protocol, it represents a site like livingroom, bathroom, kitchen (they spoke of an object with at least a microphone). So I would not use this to address intent handlers, as the dispatcher service then needs to “reroute” an intent (i.e. an intent on a site) to another site. Something like “SwitchOnLight on site livingroom” being rerouted to site livingroom-handle-command-switchlight is what I imagine. This seems confusing.

(2)
If you think of multiple “skills”, let’s use really simple things for now. Something like SayTime (response to GetTime), SayTemperature (response to GetTemperature), or SwitchLight (the example ones) can be really simple (Python) scripts. For each of these simple script calls, you need a full-blown Python interpreter with an MQTT client running that will start a script. I assume that a whole lot of the skills are rather simple, so this seems to be resource consuming (on smaller devices like a Pi 3). When I took the current development apart, the real “doing part” was rather small: command and remote_http are about 100 lines long, Homeassistant about 200 (including all comments and empty lines).

(3)
Everybody who wants to develop a simple skill (e.g. I wanted to build a skill for saying the temperature of a sensor in Home Assistant) needs to “implement” a Hermes-conformant MQTT client (even if it is only via inheritance). This is possible, sure. But I think a dedicated “skill interface” might speed up development for the community (as I said, I have no idea how this interface might be designed well).

(4)
If you connect skills (only) via MQTT, you may need a way to communicate certain environmental settings (like the Home Assistant access, for example), so that each skill can benefit from central maintenance (e.g. multiple Home Assistant skills don’t need separate configurations).

I already had a look at your skill, and I thought about checking out the technical details. But forgive me, I crashed my Rhasspy 2.5 pre-release installation again. Don’t know, maybe I tinkered too much.

My current idea is to bypass MQTT and handle things directly in the intent handler (e.g. via command call or direct Python module communication). But maybe my idea dies when a dialogue comes into play. Or maybe it’s a combination of both, and what I built is a skill named “script integration”; or maybe a master intent handler dispatches “simple skills” (like scripts, remote-http and homeassistant events) directly and dispatches the rest to a “real skill”.

Does anyone know how Snips handled skills? I only know that skills were Python scripts in their own venv. Maybe they had a good idea.
For now, to use simple skills (everything that can be handled by a script call), everybody who wants to use these “skills” (or better, call them scripts) from the community must implement their own “dispatcher script”, which requires at least a little bit of development knowledge. A central “repository” makes it easier: install the “skill” (e.g. copy it to a certain location or use an installer script, whatever), configure the “skill” (e.g. define required information such as the mapping of intents to actions, a remote URL, required access data, something like that), and use it.

I’ll see if I can publish my current development on GitHub, simply so that you get an impression of what I have had in mind so far.

Forgive me if I seemed too rude. That was not my intention. I don’t believe that my ideas are necessarily the only “correct” ones.

Regards

orca8119 · April 18, 2020, 4:52pm

Hello again,

I’ve read a little on how Snips skills worked, and yes, they all connect separately to the MQTT event topic (and run forever). The skill server is only a server responsible for starting and restarting them, as far as I found out in a couple of minutes. As Rhasspy follows the Hermes protocol (which is a good idea in my opinion), I think you are right to follow this approach, even with the (possible) disadvantages I mentioned earlier.

Nevertheless, I’ve pushed my development to GitHub at https://github.com/mk-81/rhasspy-intentaction-hermes. For now it supports all currently available intent handlers (without a web GUI, of course).

So: How does this idea sound:

Stick to the Hermes protocol as it is now in Rhasspy 2.5. Skills are what you described: a self-contained MQTT client process (e.g. like your checklist and like a Snips skill). I’m not totally sure, but if a Snips skill is completely Rhasspy compatible (especially things like TTS, dialogue management etc.), then it would also be possible to use Snips skills, which might increase the available functionality. If supervisor is used (like now), it can serve as a kind of skill server (at least the part for starting and restarting).

So, where does this leave my development? Let’s think of my development as some kind of “skill” (as I said earlier). The skill is to handle “generic” user-defined intents. It might be useful for everyone who does not care about MQTT and so on and only wants to write a small script handling an intent. My feeling from the forum is that many people are interested in such a thing.

I’m sure that a skill system is not yet available in Rhasspy 2.5. There is much to do (repository concept, installing, and supervisor enhancements are some things that come to my mind).

So we could use my development directly as an intermediate solution for everyone who wants to handle multiple intents with different scripts without a dispatcher script (my development supports script, remote_http and homeassistant in parallel). If Rhasspy moves to a skill system, we can move the functionality into a skill and it will remain available (backward compatibility).

How does this sound?

Regards

orca8119 · April 20, 2020, 4:11pm

And hello again @synesthesiam & @all

I’ve thought a little bit about your concept for skills and come up with another idea (by now you probably assume that I’m stupid).

The current basis is that each skill is a separate (component-like) process (listening on MQTT), as you suggested. Secondly, we want to have multiple intent handlers, ideally for existing scripts. So, what about the following idea:

There is a management process responsible for running all (active) skills on the machine, e.g. a supervisor (like the concept which is currently used for the main Rhasspy processes). In order to determine which skills should be started, it uses some kind of registration mechanism. The easiest one would be that each skill must be placed as a subdirectory in a certain directory (e.g. /home/{user}/.config/skills or /srv/rhasspy/skills, whatever). Maybe multiple registration mechanisms are possible.
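The directory-based registration could be as small as the following Python sketch. It assumes that every skill subdirectory contains a file named manifest.json (as in my prototype); the function name and the return shape are made up for illustration.

```python
import json
from pathlib import Path


def discover_skills(skills_dir: str) -> dict:
    """Map each skill subdirectory name to its parsed manifest.json."""
    skills = {}
    root = Path(skills_dir)
    if not root.is_dir():
        return skills  # nothing registered yet

    for sub in sorted(root.iterdir()):
        manifest = sub / "manifest.json"
        # Subdirectories without a manifest are simply ignored.
        if sub.is_dir() and manifest.is_file():
            with manifest.open(encoding="utf-8") as handle:
                skills[sub.name] = json.load(handle)
    return skills
```

The manager would call this once at startup (e.g. `discover_skills("/srv/rhasspy/skills")`) and then start one process per entry.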

We break up the current handlers (homeassistant-hermes and remote-hermes) into three separate modules: Homeassistant, Command and Remote-HTTP, as some kind of built-in “skill” handlers (same functionality as today).

The new skill manager process (described above) will now parse the skill directory (or registration) and analyse what kind of skill is in each subdirectory. One of the possibilities would be a manifest describing a Homeassistant/Command/Remote-HTTP skill with its parameters (including the intents to listen for). For each such registration, the skill manager will launch a “built-in” handler process with the parameters from the manifest.
Sounds abstract? Let’s look at an example:

In the skill directory, we create a subdirectory with a manifest file with the following parameters (e.g. in JSON format):

intent: MyScriptBasedIntent
Handler: buildin.command
command : ./myscript.sh (relative path is in the registration directory, absolute path possible)

Then the skill manager will start a process: rhasspy-buildin-command-handler.py --intents="MyScriptBasedIntent" --command="./myscript.sh". The rhasspy-buildin-command-handler will then listen for MyScriptBasedIntent and launch myscript.sh when MyScriptBasedIntent is published.

Or:

intent: MyIntent1,MyIntent2,MyIntent3
Handler: buildin.remote_http
url: https://my-server:port/endpoint
headers : {map}

then the skill manager will start the process:

rhasspy-buildin-remote_http-handler.py --manifest="/path/to/skill/manifest.json" (as an idea, since the headers may be too complex for a command line)

or

rhasspy-buildin-remote_http-handler.py --intents="MyIntent1,MyIntent2,MyIntent3" --url="https://my-server:port/endpoint"

the same goes for homeassistant.
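As a rough Python sketch, the translation from manifest to command line could look like this. The handler script names are the proposed ones from the examples above (they do not exist yet), and the rule "fall back to --manifest as soon as a value is a map or list" is just one possible choice.

```python
# Proposed (not yet existing) built-in handler scripts, keyed by manifest "Handler".
HANDLER_SCRIPTS = {
    "buildin.command": "rhasspy-buildin-command-handler.py",
    "buildin.remote_http": "rhasspy-buildin-remote_http-handler.py",
}


def build_command(manifest: dict, manifest_path: str) -> list:
    """Build the argument list the skill manager would use to start a handler."""
    script = HANDLER_SCRIPTS[manifest["Handler"]]

    # Nested values (e.g. a headers map) do not fit well on a command line,
    # so in that case only the manifest location is handed over.
    if any(isinstance(value, (dict, list)) for value in manifest.values()):
        return [script, f"--manifest={manifest_path}"]

    args = [script, f"--intents={manifest['intent']}"]
    for key, value in manifest.items():
        if key not in ("intent", "Handler"):
            args.append(f"--{key}={value}")
    return args


command_skill = {
    "intent": "MyScriptBasedIntent",
    "Handler": "buildin.command",
    "command": "./myscript.sh",
}
argv = build_command(command_skill, "/path/to/skill/manifest.json")
```

The resulting list can be passed to something like `subprocess.Popen` with the skill directory as working directory, so that relative paths such as ./myscript.sh resolve as described.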

Optionally, the skill manager might check the Rhasspy profile.json for the “legacy handler” and start the configuration found there as a fallback.

For other skills, the skill manager might support other compatible skill types (e.g. Snips) and launch these. The question is what defines a skill for this manager (I think Snips looked for a file named action-* in a venv). So either we use this concept (a file with a fixed name pattern), or we use a manifest file.

I think this should directly match your idea of a skill:

component-like
independent from the main Rhasspy processes (except for the skill manager, which is responsible for starting and restarting)
separate (each skill is its own process and should not influence any other)

We would gain:

backward compatibility (the original registration of an intent handler from profile.json can be integrated)
existing scripts would work if you register them as a “skill” (e.g. via manifest), so nobody needs to write a new “skill” including the MQTT client
(possibly) open for existing skills
we don’t need a skill dispatcher process (only a manager starting the skills)
possibly things I didn’t see

Disadvantages:

each skill is its own Python process, which consumes resources (don’t know how problematic this might be)
conflicting intents (multiple skills might listen to the same intent, resulting in inconsistent behaviour). E.g. your Checklist skill and “my” (hypothetical) Checklist skill might listen for the same event. Possibly we can leave this to the skill builders (flexible intent via skill config)
parameters in the sentences must exactly match the skill (probably not that much of a disadvantage)
possibly things I didn’t see
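For the conflicting intents, the skill manager could at least warn at startup. A small Python sketch, assuming the manifests are already loaded into a dictionary keyed by skill name and that `intent` is a comma-separated string as in the examples above (the function name is made up):

```python
from collections import defaultdict


def find_conflicts(manifests: dict) -> dict:
    """Return every intent that more than one skill listens for."""
    listeners = defaultdict(list)
    for skill_name, manifest in manifests.items():
        for intent in manifest["intent"].split(","):
            listeners[intent.strip()].append(skill_name)
    # Only intents claimed by two or more skills are a potential problem.
    return {intent: names for intent, names in listeners.items() if len(names) > 1}


conflicts = find_conflicts(
    {
        "checklist_a": {"intent": "Checklist"},
        "checklist_b": {"intent": "Checklist, GetTime"},
    }
)
```

Whether a conflict is an error or just a warning is an open question; two skills reacting to the same intent may be intended in some setups.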

Kind regards.