Fully support the Hermes or Hermod protocol

synesthesiam · December 21, 2019, 9:32pm

This sounds great, @fastjack, @romkabouter, and @koan!

For the non-Docker case, we could have the hotword service listen to a local (internal) MQTT broker and publish to a remote one. We could even create a special topic on the internal broker that is meant for raw audio chunks instead of tiny WAV files, to avoid overhead. Another option is to accept raw audio directly from an external program or over UDP with gstreamer.
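To put a number on the per-chunk overhead mentioned above, here is a minimal sketch using only Python's standard `wave` module. The chunk size (256 samples) and audio format (16 kHz, 16-bit, mono) are assumptions chosen for illustration, not values taken from Rhasspy.

```python
import io
import wave


def pcm_to_wav(pcm: bytes, rate: int = 16000, width: int = 2, channels: int = 1) -> bytes:
    """Wrap a raw PCM chunk in a WAV container, held entirely in memory."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(channels)
        wav_file.setsampwidth(width)
        wav_file.setframerate(rate)
        wav_file.writeframes(pcm)
    return buffer.getvalue()


# Hypothetical chunk: 256 samples of 16-bit mono silence (512 bytes).
raw_chunk = bytes(256 * 2)
wav_chunk = pcm_to_wav(raw_chunk)

# Bytes added by the WAV container for this single chunk.
header_overhead = len(wav_chunk) - len(raw_chunk)
print(f"raw: {len(raw_chunk)} bytes, wav: {len(wav_chunk)} bytes, overhead: {header_overhead} bytes")
```

The header is small in absolute terms, but it is repeated for every chunk and has to be built and parsed each time, which is what a raw-audio topic would avoid.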

It seems like the web UI could be split into the part that lets you test and train, and the configuration portion. I’m not sure what configuration becomes when Rhasspy is split into services. Does the web server generate a Docker compose file or a supervisord.conf file?

koan · December 21, 2019, 10:08pm

I hadn’t thought of that, awesome idea!

romkabouter · December 21, 2019, 10:52pm

Maybe we can have a common shared space on the host that every Docker service uses to read/write configuration files.
HassIO does that as well, with folders like config/share/ssl and such.
I use the proximanager for certificate auto-renewal and the files are placed in /share
My HassIO instance uses these same SSL files.
That way, you could have 1 UI controlling the configuration files, while the services use each their own specific config just like now.

As much as I like the MQTT support, I still believe that the excellent ways of using Rhasspy should be kept so that MQTT is always optional.

fastjack · December 21, 2019, 11:27pm

The underlying communication layer can be either websocket events on a single endpoint or an MQTT broker. Both work pretty much the same way. They can be secured using TLS and credentials so it should be fine.

As long as the websocket events are the same as the MQTT topic/message it should be easy.

The base DialogueManager/MainManager can provide a default websocket endpoint for all the other services (asr, nlu, tts, satellite) to subscribe (like MQTT).

The satellite manager can provide the same for local communication.

An additional MQTT bridge service can eventually also forward websocket events to an MQTT broker and relay MQTT messages from this broker as websocket events, for Hermes protocol over MQTT compatibility if required.
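A minimal sketch of the translation such a bridge would perform, assuming websocket events are JSON objects carrying the same topic and message as their MQTT counterparts. The envelope shape (`{"topic": ..., "payload": ...}`) and both function names are hypothetical; nothing in this thread defines them.

```python
import json
from typing import Tuple


def ws_event_to_mqtt(event_text: str) -> Tuple[str, bytes]:
    """Turn one websocket event (JSON text) into an MQTT topic and payload."""
    event = json.loads(event_text)
    return event["topic"], json.dumps(event["payload"]).encode("utf-8")


def mqtt_to_ws_event(topic: str, payload: bytes) -> str:
    """Turn one MQTT message into the equivalent websocket event (JSON text)."""
    return json.dumps({"topic": topic, "payload": json.loads(payload.decode("utf-8"))})


# Round trip with a hypothetical hotword event.
event_text = json.dumps(
    {"topic": "hermes/hotword/default/detected", "payload": {"siteId": "kitchen"}}
)
topic, payload = ws_event_to_mqtt(event_text)
round_tripped = json.loads(mqtt_to_ws_event(topic, payload))
```

Because the mapping is one-to-one, the bridge itself stays stateless: it only needs a websocket connection on one side and an MQTT client on the other.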

That way Rhasspy does not depend on another piece of software to work out of the box.

I also like MQTT a lot but it does not seem like a really required dependency.

fastjack · December 21, 2019, 11:44pm

The web UI’s main objective should indeed be intents/slots management (creation, editing, etc.), training and testing (unit tests à la Snips would be awesome).

banderson · December 21, 2019, 11:58pm

New user here…been a lurker for awhile. I like how this is proceeding! I am not a domain expert in voice systems, but have been interested in this technology for quite some time. I have a Matrix Voice and ReSpeaker gathering dust that I hope to dust off and use soon! Especially the Matrix Voice.

Some comments about the architecture…

It seems that things are moving in the direction of a “distributed” system. That is, one (or more?) “base” stations whose responsibility is to consume audio from a satellite, perform ASR, NLU, TTS (sending the audio back to a satellite for output) and possibly dialog management, whilst the satellite’s responsibility is to produce audio (for input to ASR) and consume audio (presumably from TTS). From an architectural point of view, it seems important to keep in mind that the satellite might (in the future) be a (standalone) device with limited resources (such as a Matrix Voice).

With this in mind, whilst it might be the path of least resistance for an initial implementation to implement a satellite node using an “internal” MQTT broker this certainly seems like overkill in the long run for a “local” service to manage the satellite. As long as the internal MQTT broker is an implementation detail of the satellite and does not architecturally “leak” outside the boundaries of the satellite, I don’t see a problem. We should keep in mind the possibility of implementing a satellite on a standalone device with more limited resources such as the Matrix Voice ESP32. As such, as long as the satellite implements the correct “interfaces” that the base station requires, then we keep the option open to reimplement the satellite functionality in a more compact and resource efficient manner.

It occurs to me that exposing wake word, LED control, etc. outside of the satellite is really unnecessary and inappropriate as this is arguably an implementation detail of a “particular type of satellite” and need not be globally exposed. One might envision other types of satellites that don’t operate the same way (with a wake word) such as activation via face recognition, a button, or presence detection. I’m not saying that these necessarily make sense right now, but the architecture should not preclude other types of satellites. Thus, there should be no need to expose these details outside of the satellite…it’s just a producer of recorded audio and a consumer of audio for playback. One might envision a sort of “plugin” architecture for the satellite that lets one plug in various “local” functionality in order to operate some arbitrary satellite device. Home Assistant’s plugin and event architecture comes to mind as one possible architectural example. Purpose-built software for (e.g.) a Matrix Voice is another example of how one might implement a satellite.

There was also a question about how to “configure” all of the various “nodes” in the system. Why not via the system-wide “global” MQTT broker? The broker is easily accessible to all nodes in the system (base and satellite) as well as providing a simple form of persistence for the configuration. That means that each satellite (and base station) only needs a small “bootstrap” configuration to operate, i.e. its assigned ID (node ID/site ID?) and the URL and credentials necessary to access the global MQTT broker. Using the MQTT broker would also allow the web interface to easily author and view the configuration for each satellite (and base station) and publish it to an appropriate MQTT topic. If the satellite is listening to that topic, it can easily reload the config dynamically. Same for the web interface, all it needs is the URL and credentials to access the global MQTT broker to configure satellites and the base station.

I know that some of the above assumes a global MQTT broker is part of the core architecture. A pub/sub broker such as this is pretty useful in a distributed system, and MQTT is a proven service. Maybe there are ways of doing the same thing using HTTP and/or websockets, but I’m not familiar with how these might be used in the same manner. This is not to say that some HTTP or websocket APIs might also be useful, but it does seem as if MQTT is a reasonable choice to base the architecture on.

My thoughts. Keep up the good work! This is pretty cool!

ba

synesthesiam · December 22, 2019, 4:10pm

Thanks for the feedback!

I agree that the details of how the satellite is implemented need not be a concern of the base station(s), but it may make some sense when those services are mixed and matched on different devices. If microphone audio is recorded on a satellite, but wake word detection is done elsewhere, then how the audio gets to the wake word detector matters.

My instinct is to have services support MQTT by default, but have additional options for services that deal with streamed audio (e.g., raw udp).

Yes, except for the case where you need to do something different with an intent depending on which wakeword/face/etc activated the assistant. Luckily, Hermes seems to have a customData field that could be used for this purpose!
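A minimal sketch of how the activation source could travel in customData, assuming the field is an opaque string that is handed back unchanged with the recognized intent. The keys inside it (`triggerType`, `triggerId`), the intent name and the handler names are all hypothetical.

```python
import json
from typing import Any, Dict


def make_custom_data(trigger_type: str, trigger_id: str) -> str:
    """Serialize the activation source into a string suitable for customData."""
    return json.dumps({"triggerType": trigger_type, "triggerId": trigger_id})


def route_intent(intent_message: Dict[str, Any]) -> str:
    """Pick a handler name from the intent and from what activated the assistant."""
    intent_name = intent_message["intent"]["intentName"]
    # customData may be absent or empty when nothing was attached to the session.
    trigger = json.loads(intent_message.get("customData") or "{}")
    if intent_name == "PlayMusic" and trigger.get("triggerType") == "face":
        return "play_personal_playlist"
    return "default_handler"


# Hypothetical intent message for a session that was started by face recognition.
message = {
    "intent": {"intentName": "PlayMusic"},
    "customData": make_custom_data("face", "alice"),
}
handler = route_intent(message)
```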

Some kind of hub will be necessary. I think the Rhasspy web server could be used in a pinch, but we should probably default to a global broker to avoid multiple hops for messages between services. A lot of this is going to depend on the user’s exact configuration. Ideally, the same set of services could be used in the “all-in-one” scenario as well as a multi-base, multi-satellite setup.

banderson · December 22, 2019, 8:27pm

Ah, yes. I see. Sort of like an external “trigger” to tell the satellite to start listening for a voice command. That and some “events” issued by the satellite (via MQTT or whatever) to inform interested parties that the satellite was:

triggered, e.g. by a wake word internal to the satellite (with ID) or by an external trigger (again with an ID? such as presence, face, button, HA event, etc.)
is listening for a voice command
is recording a voice command

Any thoughts about configuring base and satellites via MQTT?

FredTheFrog · December 29, 2019, 1:53am

After reading the entire thread, I only have one personal preference to express. I don’t have an MQTT broker installed or in use, and I don’t really care to set one up. If there’s going to be an ‘internal’ MQTT server for Rhasspy, please please please make it as transparent as possible, with as little configuration necessary as possible.

synesthesiam · December 29, 2019, 3:55am

The HTTP API seems better suited for configuration, since it may involve downloading files from a central Rhasspy hub. I could see MQTT being used to send URLs around, but probably not the data itself.

synesthesiam · December 29, 2019, 3:58am

My plan thus far is to write the core Rhasspy services in a way that they can function as standalone Python modules or as MQTT services. I’ve also come across hbmqtt recently, which is an MQTT broker that can be installed via pip. If we absolutely need MQTT, I’ll probably use hbmqtt (internally configured).
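A minimal sketch of that split, assuming the core logic is a plain function and MQTT support is a thin adapter around it. The toy sentence table, the function names and the input topic `hermes/nlu/query` are assumptions for illustration; only the two output topics appear elsewhere in this thread.

```python
import json
from typing import Callable, Optional

# Toy stand-in for a trained recognizer (hypothetical sentences and intents).
KNOWN_SENTENCES = {"turn on the light": "LightOn", "turn off the light": "LightOff"}


def recognize(text: str) -> Optional[str]:
    """Core logic: usable directly as a Python module, with no broker involved."""
    return KNOWN_SENTENCES.get(text.strip().lower())


def handle_mqtt_message(topic: str, payload: bytes, publish: Callable[[str, bytes], None]) -> None:
    """MQTT adapter: decode the message, call the core logic, publish the result."""
    if topic != "hermes/nlu/query":  # assumed input topic
        return
    text = json.loads(payload)["input"]
    intent_name = recognize(text)
    if intent_name is None:
        publish("hermes/nlu/intentNotRecognized", json.dumps({"input": text}).encode("utf-8"))
    else:
        message = {"input": text, "intent": {"intentName": intent_name}}
        publish(f"hermes/intent/{intent_name}", json.dumps(message).encode("utf-8"))


# Standalone use: no MQTT at all.
direct_result = recognize("Turn on the light")

# MQTT-style use: `publish` would normally be a broker client's publish method.
published = []
handle_mqtt_message(
    "hermes/nlu/query",
    json.dumps({"input": "turn on the light"}).encode("utf-8"),
    lambda topic, payload: published.append((topic, payload)),
)
```

Keeping the adapter this thin means the same `recognize` function can sit behind MQTT, websockets or a direct import without changes.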

banderson · December 29, 2019, 3:40pm

Hmmm. Let me try to explain my idea again. The web interface can use whatever it wants to communicate with the “hub” to “author” (create/edit) the configuration for a satellite. At some point, it “publishes/applies” this configuration after editing is complete. At this point, the hub “publishes” this new (json) configuration to the global MQTT broker. The satellite, having previously subscribed to its configuration topic gets a change notification at which point it (the satellite) can apply the new configuration. The only “file” necessary on the satellite is the bootstrap configuration necessary to communicate with the MQTT broker (URL, credentials) as well as its “assigned” ID. No other configuration file(s) necessary.
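The flow described above can be sketched without a real broker. This is a minimal in-memory stand-in that imitates MQTT retained messages, so a satellite that subscribes late still receives the last published configuration. The topic layout `rhasspy/config/<site_id>`, the class names and the configuration keys are hypothetical.

```python
import json
from typing import Callable, Dict, List

Callback = Callable[[str, bytes], None]


class InMemoryBroker:
    """Tiny stand-in for an MQTT broker: exact-match topics plus retained messages."""

    def __init__(self) -> None:
        self._retained: Dict[str, bytes] = {}
        self._subscribers: Dict[str, List[Callback]] = {}

    def publish(self, topic: str, payload: bytes, retain: bool = False) -> None:
        if retain:
            self._retained[topic] = payload
        for callback in self._subscribers.get(topic, []):
            callback(topic, payload)

    def subscribe(self, topic: str, callback: Callback) -> None:
        self._subscribers.setdefault(topic, []).append(callback)
        # Like MQTT, deliver the retained message (if any) right after subscribing.
        if topic in self._retained:
            callback(topic, self._retained[topic])


class Satellite:
    """Needs only a bootstrap config (its ID); everything else arrives via the broker."""

    def __init__(self, bootstrap: Dict[str, str], broker: InMemoryBroker) -> None:
        self.site_id = bootstrap["site_id"]
        self.config: Dict[str, object] = {}
        broker.subscribe(f"rhasspy/config/{self.site_id}", self._on_config)

    def _on_config(self, topic: str, payload: bytes) -> None:
        self.config = json.loads(payload)


broker = InMemoryBroker()
# The hub publishes the configuration before the satellite is even online.
broker.publish("rhasspy/config/kitchen", json.dumps({"wake_word": "hey_rhasspy"}).encode(), retain=True)
satellite = Satellite({"site_id": "kitchen"}, broker)
config_at_startup = dict(satellite.config)
# A later edit in the web interface is pushed to the running satellite.
broker.publish("rhasspy/config/kitchen", json.dumps({"wake_word": "computer"}).encode(), retain=True)
```

With a real broker, the URL and credentials would join `site_id` in the bootstrap configuration, as the post describes.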

I have to say that I am a bit concerned about the proliferation of HTTP URLs in this architecture, especially for event notification (PUSH). This does not necessarily scale well. Websockets seem like a better solution (for both event delivery and command issuing). They were designed as a bi-directional, real-time communication channel between client (satellite) and server (hub). As a case study, consider HASS itself which migrated from HTTP to websockets for event delivery to clients and command issuing from clients. Also, AppDaemon is an interesting example for “hosting” services (apps) that consume HASS events and produce HASS commands. In fact, I believe AppDaemon is architected in a way that it could be used for something other than HASS to host apps. Hmmm

My 2c/p.

fastjack · December 29, 2019, 4:48pm

@banderson Loving the configuration distribution and we can throw in the satellite registration and even satellite/intent control.

I agree that a centralized broker (builtin or standalone, MQTT or websocket) is the way to go for a distributed system. As MQTT is an IoT standard nowadays, I’m rooting for it.

KiboOst · January 4, 2020, 11:11am

Reading the docs on the published MQTT topics:

hermes/intent/<INTENT_NAME>

Rhasspy publishes a message to this topic on recognition of an intent.
The payload is a JSON object with the recognized intent, entities and text.

hermes/nlu/intentNotRecognized

Rhasspy publishes a message to this topic when it doesn't recognize an intent.

hermes/hotword/<WAKEWORD_ID>/detected

Rhasspy wakes up when a message is received on this topic.

Publishing to hermes/intent/<INTENT_NAME> seems strange. Does this mean an MQTT client should subscribe to one topic per intent_name?
Why not publish all intents on a hermes/intent/intentRecognized topic with intent_name in the payload? Wouldn’t that be a lot more efficient?
Same for the wakeword: publish a payload with wakeword ID and site ID on hermes/hotword/wakewordDetected?

Also, I didn’t find how to install and set up MQTT on the Pi. Checking MQTT in the interface settings doesn’t work, so I guess we have to install it manually.

fastjack · January 4, 2020, 11:38am

I’ve asked myself the same question… It would be easier indeed. I do not know if there is a specific reason for this topic-per-intent and topic-per-wakeword specification.

I suspect this is due to the MQTT way of doing things (one topic per device/metric for instance)

Apart from Hermes compatibility with third party systems, this should be possible to implement alongside the existing topics.

Probably using a topic like hermes/nlu/intentRecognized to avoid collision.

Maybe Rhasspy can send both?

What do you guys think?

KiboOst · January 4, 2020, 11:49am

Here are the topics the jeedom snips plugin subscribes to:

const TOP_INTENTS = 'hermes/intent/#';
const TOP_SESSION_STARTED = 'hermes/dialogueManager/sessionStarted';
const TOP_SESSION_ENDED = 'hermes/dialogueManager/sessionEnded';
const TOP_HOTWORD_DETECTED = 'hermes/hotword/default/detected';

const TOP_START_SESSION = 'hermes/dialogueManager/startSession';
const TOP_CONTINUE_SESSION = 'hermes/dialogueManager/continueSession';
const TOP_END_SESSION = 'hermes/dialogueManager/endSession';

So it subscribes to 'hermes/intent/#' and then gets the name like this: $payload->{'intent'}->{'intentName'}

This works for sure, but I don’t know the performance impact of publishing to a lot of different topics and subscribing to all of them with # (wildcard sub)
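For comparison, a minimal Python sketch of the same lookup on a message received through the wildcard subscription. The sample payload is hypothetical and contains only the `intent` / `intentName` nesting shown above, plus an assumed `input` field.

```python
import json


def intent_name_from_payload(payload: bytes) -> str:
    """Read the intent name out of a JSON intent payload."""
    message = json.loads(payload)
    return message["intent"]["intentName"]


# Hypothetical payload as it might arrive on hermes/intent/LightOn.
sample_payload = json.dumps(
    {"input": "turn on the light", "intent": {"intentName": "LightOn"}}
).encode("utf-8")

name = intent_name_from_payload(sample_payload)
```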

koan · January 4, 2020, 12:24pm

This is just the way MQTT works, and I find it very well designed:

If you’re interested in a specific intent, subscribe to the intent’s topic hermes/intent/<INTENT_NAME>.
If you’re interested in all intents, subscribe to the intent wildcard topic hermes/intent/#.

Note: you may think you’re interested in all intents, but in many situations you’re not. Because of the distributed nature of MQTT, other programs can send intents too to the MQTT broker that your program is not aware of (and shouldn’t be). So if you implement something like hermes/intent/intentRecognized with an intent name in the payload, you’ll end up having to parse the payload to filter intents you’re not interested in. That seems much more cumbersome and error-prone than just subscribing to the intents you’re interested in and being sure in the rest of your code you only get events for these intents.

I don’t think we have to worry about performance now. Premature optimization is the root of all evil.

maxbachmann · January 4, 2020, 1:26pm

Since the messages are all posted to hermes/intent/<INTENT_NAME>/<message>, the correct way to subscribe would be the wildcard hermes/intent/+, so that it really only receives the intent messages; if someone decides for whatever reason to add the topic hermes/intent/<INTENT_NAME>/<NEW_FEATURE>/<message>, you do not receive that message.

In terms of speed, I am not even sure whether subscribing to one topic like hermes/intent/intentRecognized is in any way faster, since when subscribing to it you would receive all the messages even when you’re not interested in them -> you need to filter them yourself, which most likely is not faster than the implementation paho-mqtt is using (but then again, yes, this is definitely premature optimization)
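A minimal sketch of how the two MQTT wildcards differ, written as a standalone matcher so that it runs without a broker: `+` matches exactly one topic level, `#` matches the remaining levels. The function name is hypothetical, and special cases such as topics starting with `$` are deliberately ignored.

```python
def topic_matches(subscription: str, topic: str) -> bool:
    """Return True if `topic` is covered by the MQTT-style `subscription` filter."""
    sub_levels = subscription.split("/")
    topic_levels = topic.split("/")
    for index, level in enumerate(sub_levels):
        if level == "#":
            # Multi-level wildcard: valid only as the last level, matches everything below.
            return index == len(sub_levels) - 1
        if index >= len(topic_levels):
            return False
        if level != "+" and level != topic_levels[index]:
            return False
    # Without a trailing '#', both must have the same number of levels.
    return len(sub_levels) == len(topic_levels)


# Hypothetical topics: one intent, and one deeper topic added by a later feature.
intent_topic = "hermes/intent/LightOn"
deeper_topic = "hermes/intent/LightOn/newFeature"

plus_matches = [topic_matches("hermes/intent/+", t) for t in (intent_topic, deeper_topic)]
hash_matches = [topic_matches("hermes/intent/#", t) for t in (intent_topic, deeper_topic)]
```

Here the `+` subscription accepts only the first topic, while the `#` subscription accepts both.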

fastjack · January 4, 2020, 1:29pm

I agree that it is not a priority for now. Let’s stick to the protocol and we can extend it if required/useful in the future when Rhasspy is fully compliant and all the services are done and installable easily.

KiboOst · January 4, 2020, 1:35pm

Yes, sure, and this is how Snips works, and it was not a problem!