Fully support the Hermes or Hermod protocol

After reading the entire thread, I only have one personal preference to express. I don’t have an MQTT broker installed or in use, and I don’t really care to set one up. If there’s going to be an ‘internal’ MQTT server for Rhasspy, please please please make it as transparent as possible, with as little configuration necessary as possible.

The HTTP API seems better suited for configuration, since it may involve downloading files from a central Rhasspy hub. I could see MQTT being used to send URLs around, but probably not the data itself.

My plan thus far is to write the core Rhasspy services in a way that they can function as standalone Python modules or as MQTT services. I’ve also come across hbmqtt recently, which is an MQTT broker that can be installed via pip. If we absolutely need MQTT, I’ll probably use hbmqtt (internally configured).
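A minimal Python sketch of the "standalone module or MQTT service" idea. Everything here is an assumption for illustration: the toy `recognize_intent` matcher, the `make_mqtt_handler` wrapper, and the choice of `hermes/nlu/query` as the input topic are not taken from Rhasspy's code. The core logic is a plain function, and the MQTT side is a thin wrapper around a `publish` callback, so the example runs without a broker.

```python
import json
from typing import Any, Callable, Dict


def recognize_intent(text: str) -> Dict[str, Any]:
    """Core logic as a plain function (toy matcher, for illustration only)."""
    intent_name = "GetTime" if "time" in text.lower() else ""
    return {"input": text, "intent": {"intentName": intent_name}}


def make_mqtt_handler(
    publish: Callable[[str, str], None]
) -> Callable[[str, bytes], None]:
    """Wrap the core function so MQTT-style messages can drive it.

    `publish` is any callable taking (topic, payload); with a real client
    it would be that client's publish method (assumption).
    """

    def on_message(topic: str, payload: bytes) -> None:
        request = json.loads(payload)
        result = recognize_intent(request["input"])
        name = result["intent"]["intentName"]
        if name:
            out_topic = f"hermes/intent/{name}"
        else:
            out_topic = "hermes/nlu/intentNotRecognized"
        publish(out_topic, json.dumps(result))

    return on_message


# Standalone use: no broker, just a function call.
direct_result = recognize_intent("what time is it")

# Service use: the same logic behind a message handler.
sent = []
handler = make_mqtt_handler(lambda topic, payload: sent.append((topic, payload)))
handler("hermes/nlu/query", json.dumps({"input": "what time is it"}).encode())
```

With this split, swapping hbmqtt for an external broker (or for no broker at all) would only touch the wrapper, not the core function.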

Hmmm. Let me try to explain my idea again. The web interface can use whatever it wants to communicate with the “hub” to “author” (create/edit) the configuration for a satellite. At some point, it “publishes/applies” this configuration after editing is complete. At this point, the hub “publishes” this new (json) configuration to the global MQTT broker. The satellite, having previously subscribed to its configuration topic gets a change notification at which point it (the satellite) can apply the new configuration. The only “file” necessary on the satellite is the bootstrap configuration necessary to communicate with the MQTT broker (URL, credentials) as well as its “assigned” ID. No other configuration file(s) necessary.
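A rough Python sketch of the satellite side of this idea, with no real broker involved. The topic layout `rhasspy/satellite/<id>/config`, the bootstrap keys, and the configuration fields are all hypothetical; the thread does not define them.

```python
import json

# Hypothetical bootstrap data: the only configuration stored on the satellite.
BOOTSTRAP = {
    "broker_url": "mqtt://hub.local:1883",
    "username": "satellite",
    "password": "change-me",
    "satellite_id": "kitchen",
}


def config_topic(satellite_id: str) -> str:
    """Topic the hub publishes to and the satellite subscribes to (assumed naming)."""
    return f"rhasspy/satellite/{satellite_id}/config"


class Satellite:
    def __init__(self, bootstrap: dict) -> None:
        self.satellite_id = bootstrap["satellite_id"]
        self.config: dict = {}

    def on_message(self, topic: str, payload: bytes) -> None:
        """Apply a new configuration when one arrives on our own config topic."""
        if topic == config_topic(self.satellite_id):
            self.config = json.loads(payload)


satellite = Satellite(BOOTSTRAP)

# Simulate the hub publishing a freshly edited configuration.
new_config = {"wake": {"system": "porcupine"}, "microphone": {"system": "arecord"}}
satellite.on_message(config_topic("kitchen"), json.dumps(new_config).encode())

# A message for another satellite is ignored.
satellite.on_message(config_topic("bedroom"), b'{"wake": {"system": "snowboy"}}')
```

If the hub published the configuration as a retained message, a satellite that connects later would still receive the latest version on subscribing, which fits the "no other configuration files" goal.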

I have to say that I am a bit concerned about the proliferation of HTTP URLs in this architecture, especially for event notification (PUSH). This does not necessarily scale well. WebSockets seem like a better solution (for both event delivery and command issuing). They were designed as a bi-directional, real-time communication channel between client (satellite) and server (hub). As a case study, consider HASS itself, which migrated from HTTP to WebSockets for event delivery to clients and command issuing from clients. Also, AppDaemon is an interesting example for “hosting” services (apps) that consume HASS events and produce HASS commands. In fact, I believe AppDaemon is architected in a way that it could be used for something other than HASS to host apps. Hmmm :slight_smile:

My 2c/p.

@banderson Loving the configuration distribution and we can throw in the satellite registration and even satellite/intent control. :+1:

I agree that a centralized broker (built-in or standalone, MQTT or WebSocket) is the way to go for a distributed system. As MQTT is an IoT standard nowadays, I’m rooting for it :slight_smile:

Reading the docs on the MQTT topics Rhasspy uses:

hermes/intent/<INTENT_NAME>

Rhasspy publishes a message to this topic on recognition of an intent.
The payload is a JSON object with the recognized intent, entities and text.

hermes/nlu/intentNotRecognized

Rhasspy publishes a message to this topic when it doesn't recognize an intent.

hermes/hotword/<WAKEWORD_ID>/detected

Rhasspy wakes up when a message is received on this topic.

Publishing to hermes/intent/<INTENT_NAME> seems strange. Does it mean an MQTT client should subscribe to one topic per intent name?
Why not publish all intents on a single hermes/intent/intentRecognized topic with the intent name in the payload? Wouldn’t that be a lot more efficient?
Same for the wake word: publish a payload with the wakeword ID and site ID on hermes/hotword/wakewordDetected?

Also, I couldn’t find how to install and set up MQTT on the Pi. Checking MQTT in the interface settings doesn’t work, so I guess we have to install a broker manually.

I’ve asked myself the same question… It would indeed be easier. I do not know whether there is a specific reason for specifying one topic per intent and per wakeword.

I suspect this is due to the MQTT way of doing things (one topic per device/metric for instance)

Apart from Hermes compatibility with third-party systems, this should be possible to implement alongside the existing topics.

Probably using a topic like hermes/nlu/intentRecognized to avoid collision.

Maybe Rhasspy can send both?

What do you guys think?

Here are the topics the Jeedom Snips plugin subscribes to:

const TOP_INTENTS = 'hermes/intent/#';
const TOP_SESSION_STARTED = 'hermes/dialogueManager/sessionStarted';
const TOP_SESSION_ENDED = 'hermes/dialogueManager/sessionEnded';
const TOP_HOTWORD_DETECTED = 'hermes/hotword/default/detected';

const TOP_START_SESSION = 'hermes/dialogueManager/startSession';
const TOP_CONTINUE_SESSION = 'hermes/dialogueManager/continueSession';
const TOP_END_SESSION = 'hermes/dialogueManager/endSession';

So it subscribes to 'hermes/intent/#' and then gets the intent name like this: $payload->{'intent'}->{'intentName'}
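For comparison, the same lookup can be sketched in Python. Only the `intent` → `intentName` path comes from the plugin code above; the other payload fields (`input`, `slots`, `siteId`) and the intent name `ChangeLightState` are illustrative assumptions about what an intent message might contain.

```python
import json

# Illustrative payload; only intent.intentName is confirmed by the plugin code.
example_payload = json.dumps(
    {
        "input": "turn on the kitchen light",
        "intent": {"intentName": "ChangeLightState"},
        "slots": [],
        "siteId": "default",
    }
).encode("utf-8")


def intent_name_from_payload(payload: bytes) -> str:
    """Read the intent name out of the JSON payload."""
    message = json.loads(payload)
    return message["intent"]["intentName"]


def intent_name_from_topic(topic: str) -> str:
    """Read the intent name from a hermes/intent/<INTENT_NAME> topic."""
    return topic.split("/", 2)[2]


name_a = intent_name_from_payload(example_payload)
name_b = intent_name_from_topic("hermes/intent/ChangeLightState")
```

Since the name is available both in the topic and in the payload, a subscriber using the wildcard can pick whichever is more convenient.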

This works for sure, but I don’t know the performance impact of publishing to a lot of different topics and subscribing to all of them with # (wildcard subscription).

This is just the way MQTT works, and I find it very well designed:

  • If you’re interested in a specific intent, subscribe to the intent’s topic hermes/intent/<INTENT_NAME>.
  • If you’re interested in all intents, subscribe to the intent wildcard topic hermes/intent/#.

Note: you may think you’re interested in all intents, but in many situations you’re not. Because of the distributed nature of MQTT, other programs that your program is not aware of (and shouldn’t be) can also publish intents to the MQTT broker. So if you implement something like hermes/intent/intentRecognized with an intent name in the payload, you’ll end up having to parse the payload to filter out intents you’re not interested in. That seems much more cumbersome and error-prone than just subscribing to the intents you’re interested in and being sure that the rest of your code only gets events for these intents.
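A small Python sketch of the "subscribe only to what you handle" approach, without a real MQTT client. The `on_intent` decorator, the `dispatch` function, and the intent names are hypothetical; the point is that the subscription list falls out of the handlers you register, and routing needs no payload inspection.

```python
import json

# Maps a full topic to the function that handles it.
handlers = {}


def on_intent(intent_name):
    """Register a handler for hermes/intent/<intent_name> (hypothetical helper)."""

    def register(func):
        handlers[f"hermes/intent/{intent_name}"] = func
        return func

    return register


@on_intent("GetTime")
def get_time(message):
    return "time requested"


@on_intent("ChangeLightState")
def change_light(message):
    return "light changed"


# Topics to subscribe to: exactly the intents this program knows about.
subscriptions = sorted(handlers)


def dispatch(topic, payload):
    """Route a message by topic alone; the payload is parsed only for the handler."""
    handler = handlers.get(topic)
    if handler is None:
        return None  # defensive: a broker would not deliver unsubscribed topics
    return handler(json.loads(payload))


result = dispatch("hermes/intent/GetTime", b'{"input": "what time is it"}')
```

With a single hermes/intent/intentRecognized topic, `dispatch` would instead have to parse every payload first and then discard the intents it does not know.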

I don’t think we have to worry about performance now. Premature optimization is the root of all evil.

Since the messages are all published on hermes/intent/<INTENT_NAME>, the correct way to subscribe would be the single-level wildcard hermes/intent/+, so that you really only receive the intent messages. If someone decides, for whatever reason, to add a deeper topic such as hermes/intent/<INTENT_NAME>/<NEW_FEATURE>, you will not receive those messages.
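The difference between the two wildcards can be checked with a few lines of Python. This is a simplified matcher written for illustration (it ignores the special handling of topics starting with `$`), not the code any particular broker or client uses; the function name `topic_matches` and the `newFeature` level are made up.

```python
def topic_matches(subscription: str, topic: str) -> bool:
    """Return True if `topic` matches the MQTT subscription filter.

    "+" matches exactly one topic level; "#" matches all remaining levels.
    """
    sub_levels = subscription.split("/")
    topic_levels = topic.split("/")
    for i, level in enumerate(sub_levels):
        if level == "#":
            return True  # multi-level wildcard swallows whatever is left
        if i >= len(topic_levels):
            return False  # filter is longer than the topic
        if level != "+" and level != topic_levels[i]:
            return False  # literal level differs
    # No "#" was seen, so the number of levels must agree exactly.
    return len(sub_levels) == len(topic_levels)


plus_direct = topic_matches("hermes/intent/+", "hermes/intent/GetTime")
plus_deeper = topic_matches("hermes/intent/+", "hermes/intent/GetTime/newFeature")
hash_deeper = topic_matches("hermes/intent/#", "hermes/intent/GetTime/newFeature")
```

Here `plus_direct` and `hash_deeper` are true while `plus_deeper` is false, which is the behaviour described above: `+` stays at one level, `#` also picks up anything added underneath.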

In terms of speed, I am not even sure whether subscribing to one topic like hermes/intent/intentRecognized is any faster. When subscribing to it, you would receive all the messages, even when you’re not interested in them, so you need to filter them yourself, which is most likely not faster than the implementation paho-mqtt is using. (But then again, yes, this is definitely premature optimization.)

I agree that it is not a priority for now. Let’s stick to the protocol and we can extend it if required/useful in the future when Rhasspy is fully compliant and all the services are done and installable easily.

Yes, sure. This is how Snips works, and it was not a problem!

I agree. The “#” and “+” wildcards allow for a lot of flexibility and, with the right implementation, the performance impact is minimal. It’s much better to subscribe to a lot of topics than to JSON parse everything that comes across the wire.

Hi here!

I just discovered this project after I heard about the end of the Snips solution.
I’m so excited!
I have always wanted to put some “magic” (voice control) into my home automation, and I’m really concerned about privacy.

So I’ve read a lot of topics (from Snips forum, to HA then here), documentation and so on.

Today, I have a Hass.io installation on a Raspberry Pi 3, with the MQTT add-on installed.
My next step would be to build a “satellite” (Pi Zero + ReSpeaker? Matrix Voice?) for each of 3 different rooms and control my home automation by voice.

So I have some questions after reading this topic:

  1. When do you expect to have this new architecture in place?
  2. Do you have some milestones in mind?
  3. When will the Pi Zero be 100% supported?
  4. Can we already use the Matrix-Voice-ESP32-MQTT-Audio-Streamer with a Rhasspy server?

Thanks, folks, for all the great work already done, and thanks to Snips for having boosted this project :wink:

The new architecture is mostly complete, but is missing the ability to re-train and download profiles. It currently works if you have an existing Rhasspy profile, but is still rough around the edges. I’m anticipating an alpha version with all the pieces in place in the next two weeks.

I’m making some modifications to Rhasspy’s training system so that it no longer depends on pre-compiled C++ libraries (opengrm, openfst, and phonetisaurus). Once that’s done, the only thing that needs to be pre-compiled is Kaldi, which will significantly simplify Rhasspy’s deployment. This is the milestone I’m looking at for the transition.

Kaldi is the only piece that I haven’t gotten working on the Pi Zero. I need to double check that Porcupine and Snowboy are working there, though. If someone could figure out how to fix Kaldi, it would really speed things up.

I believe that’s what @romkabouter is using here.

I use the streamer with the addon for Hassio, but that is basically the same :slight_smile:

Love those answers :sparkling_heart:

Good to know
I guess I’ll do the same in my house.

Or use a Pi Zero + ReSpeaker 2-mic once the “satellite” architecture is 100% supported on this platform.

My main “problem” with the Matrix Voice is the price compared to the Pi Zero.

Yes, this is an issue indeed.

I’m looking for some feedback on training in the MQTT/Hermes architecture for Rhasspy. With slot programs and converters, things are a little more complicated.

I’m thinking of having the main Rhasspy server initiate the training process when you click Train or do a POST to /api/train. At that point, the server could gather your various sentences.ini files, run any slot programs, and then package up sentences + slots to send over MQTT. Here’s how I’m imagining the flow:

  1. User requests training from Rhasspy server
  2. Server sends sentences + slots over MQTT (JSON? gzip?)
  3. ASR service generates dictionary, language model, etc. from sentences + slots (using rhasspy-nlu for parsing)
  4. NLU service generates graph, etc. from sentences + slots
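Step 2 could look something like the Python sketch below, which takes the "JSON? gzip?" question literally and does both. The function names, the message layout (`sentences` and `slots` keys), and the sample data are assumptions for illustration only; nothing here is an agreed format.

```python
import gzip
import json


def pack_training_data(sentences: dict, slots: dict) -> bytes:
    """Serialize sentences + slots to JSON, then gzip the result.

    MQTT payloads are plain bytes, so compressed data can be sent as-is.
    """
    document = {"sentences": sentences, "slots": slots}
    return gzip.compress(json.dumps(document).encode("utf-8"))


def unpack_training_data(payload: bytes) -> dict:
    """Reverse of pack_training_data, as an ASR or NLU service might call it."""
    return json.loads(gzip.decompress(payload).decode("utf-8"))


# Sample input: file name -> file contents, and slot name -> slot values.
sentences = {"sentences.ini": "[GetTime]\nwhat time is it\n"}
slots = {"colors": ["red", "green", "blue"]}

payload = pack_training_data(sentences, slots)
received = unpack_training_data(payload)
```

Because the slot programs have already run on the server, the services receiving this payload only see plain slot values and need nothing beyond a JSON parser to get started.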

When the NLU service receives a query, it will do recognition and then run converters locally before sending out a Hermes intent message. So, in this view:

  • Rhasspy web server is responsible for running slot programs
  • NLU service is responsible for running converters

Anyone see any issues with this?