Fully support the Hermes or Hermod protocol

Since the messages are all posted to hermes/intent/<INTENT_NAME>/<message>, the correct way to subscribe is with the wildcard hermes/intent/+. That way you only receive the intent messages, and if someone decides for whatever reason to add a topic like hermes/intent/<INTENT_NAME>/<NEW_FEATURE>/<message>, you do not receive those messages.

In terms of speed, I am not even sure whether subscribing to a single topic like hermes/intent/intentRecognized would be any faster, since you would then receive all the messages even when you’re not interested in them. You would need to filter them yourself, which is most likely not faster than the topic matching paho-mqtt already implements (but then again, yes, this is definitely premature optimization).
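For illustration, here’s a minimal paho-mqtt sketch of this kind of wildcard subscription (the handler body is just an example, not part of the Hermes spec):

```python
# Minimal sketch: subscribe to all Hermes intent messages with paho-mqtt.
import json

import paho.mqtt.client as mqtt

def on_connect(client, userdata, flags, rc):
    # "+" matches exactly one topic level, so this matches
    # hermes/intent/<INTENT_NAME> but not deeper topics like
    # hermes/intent/<INTENT_NAME>/<NEW_FEATURE>/<message>.
    client.subscribe("hermes/intent/+")

def on_message(client, userdata, msg):
    intent_name = msg.topic.split("/")[-1]
    payload = json.loads(msg.payload)
    print(intent_name, payload.get("siteId"))

client = mqtt.Client()
client.on_connect = on_connect
client.on_message = on_message
client.connect("localhost", 1883)
client.loop_forever()
```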


I agree that it is not a priority for now. Let’s stick to the protocol; we can extend it if required/useful in the future, once Rhasspy is fully compliant and all the services are done and easily installable.

Yes, sure, and this is how Snips works; it was not a problem!

I agree. The “#” and “+” wildcards allow for a lot of flexibility and, with the right implementation, the performance impact is minimal. It’s much better to subscribe to a lot of topics than to JSON parse everything that comes across the wire.


Hi here!

I just discovered this project after I heard about the end of the Snips solution.
I’m so excited!
I’ve always wanted to put some “magic” into my home automation (voice control), and I’m really concerned about privacy.

So I’ve read a lot of topics (from the Snips forum, to HA, then here), documentation, and so on.

Today, I have a Hass.io installation on a Raspberry Pi 3, with the MQTT add-on installed.
My next step would be to build a “satellite” (Pi Zero + ReSpeaker? Matrix Voice?) for three different rooms and control my home automation by voice.

So I have some questions after reading this topic:

  1. When do you expect to have this new architecture in place?
  2. Do you have some milestones in mind?
  3. When will the Pi Zero be 100% supported?
  4. Can we already use the Matrix-Voice-ESP32-MQTT-Audio-Streamer with a Rhasspy server?

Thanks, folks, for all the great work already done, and thanks to Snips for having boosted this project :wink:


The new architecture is mostly complete, but is missing the ability to re-train and download profiles. It currently works if you have an existing Rhasspy profile, but is still rough around the edges. I’m anticipating an alpha version with all the pieces in place in the next two weeks.

I’m making some modifications to Rhasspy’s training system so that it no longer depends on pre-compiled C++ libraries (opengrm, openfst, and phonetisaurus). Once that’s done, the only thing that needs to be pre-compiled is Kaldi, which will significantly simplify Rhasspy’s deployment. This is the milestone I’m looking at for the transition.

Kaldi is the only piece that I haven’t gotten working on the Pi Zero. I need to double check that Porcupine and Snowboy are working there, though. If someone could figure out how to fix Kaldi, it would really speed things up.

I believe that’s what @romkabouter is using here.


I use the streamer with the addon for Hassio, but that is basically the same :slight_smile:

Love those answers :sparkling_heart:

Good to know
I guess I’ll do the same in my house.

Or use a Pi Zero + ReSpeaker 2-Mic once the “satellite” architecture is 100% supported on this platform.

My main “problem” with the Matrix Voice is the price compared to the Pi Zero.

Yes, this is an issue indeed.

I’m looking for some feedback on training in the MQTT/Hermes architecture for Rhasspy. With slot programs and converters, things are a little more complicated.

I’m thinking of having the main Rhasspy server initiate the training process when you click Train or do a POST to /api/train. At that point, the server could gather your various sentences.ini files, run any slot programs, and then package up sentences + slots to send over MQTT. Here’s how I’m imagining the flow (a rough sketch follows the list):

  1. User requests training from Rhasspy server
  2. Server sends sentences + slots over MQTT (JSON? gzip?)
  3. ASR service generates dictionary, language model, etc. from sentences + slots (using rhasspy-nlu for parsing)
  4. NLU service generates graph, etc. from sentences + slots
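To make step 2 concrete, here’s a rough sketch of what the server side could look like (the topic name and payload shape are just placeholders, nothing is decided yet):

```python
# Hypothetical sketch of step 2: package sentences + slots as gzipped
# JSON and publish them for the ASR/NLU services to retrain from.
import gzip
import json

def publish_training_data(client, sentences, slots):
    # sentences: {filename: text of sentences.ini}, slots: {name: [values]}
    payload = json.dumps({"sentences": sentences, "slots": slots}).encode()
    # Topic name is a placeholder, not part of the Hermes protocol.
    client.publish("rhasspy/train/start", gzip.compress(payload))
```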

When the NLU service receives a query, it will do recognition and then run converters locally before sending out a Hermes intent message (a sketch of this step follows the list). So, in this view:

  • Rhasspy web server is responsible for running slot programs
  • NLU service is responsible for running converters
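As a sketch of that second point (the recognizer and converter objects here are hypothetical, and the payload is simplified from the real Hermes intent message):

```python
# Hypothetical sketch: NLU service runs converters on slot values
# before publishing the Hermes intent message.
import json

def on_query(client, text, recognizer, converters):
    intent_name, slots = recognizer.recognize(text)  # hypothetical recognizer
    for slot in slots:
        convert = converters.get(slot["entity"])
        if convert:
            slot["value"] = convert(slot["value"])  # e.g. "two" -> 2
    client.publish(
        "hermes/intent/" + intent_name,
        json.dumps({"input": text,
                    "intent": {"intentName": intent_name},
                    "slots": slots}),
    )
```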

Anyone see any issues with this?

Since there seems to be a big push to MQTT (which I love), we could take this even further and not even have to do the pings. We could do something similar to what the Tasmota code does on my light switches: it creates an LWT (Last Will and Testament) message.

When the satellite starts, it sends a “Status” message that says “I’m Online”, with an LWT of “Offline”. As long as the satellite code holds its subscription to the master open, that state will always be “I’m Online”. But if it disconnects (Pi restarts, network, power, etc.), the LWT will kick in on the master, updating the status to “Offline”.
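A minimal paho-mqtt sketch of the idea (the status topic and payloads are just examples):

```python
# Sketch: satellite availability via an MQTT Last Will and Testament.
import paho.mqtt.client as mqtt

STATUS_TOPIC = "rhasspy/satellite/livingroom/status"  # example topic

client = mqtt.Client()
# The broker publishes "Offline" on our behalf if we disconnect
# ungracefully (Pi restarts, network, power, etc.).
# Must be set before connect().
client.will_set(STATUS_TOPIC, payload="Offline", qos=1, retain=True)
client.connect("localhost", 1883)
# Announce that we're up; retained so the master sees the current
# state whenever it (re)connects.
client.publish(STATUS_TOPIC, payload="Online", qos=1, retain=True)
client.loop_forever()
```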

Just a thought.


@koan or @fastjack, do you happen to know which service is responsible in Snips for detecting silence and the end of a voice command?

I’m trying to decide when to send out a textCaptured event. It seems like the ASR service should be detecting silence, but then should the hermes/asr/stopListening message force a textCaptured if no silence has been detected yet?

I was just thinking about this yesterday. They use Kaldi’s online endpointing feature in the ASR to detect that the utterance has ended. This avoids using an additional VAD system.

The stopListening topic is to explicitly tell the ASR to stop listening on a specific siteId. So I think that this topic should force a textCaptured message of what was transcribed between startListening and stopListening.

This is probably used as a timeout system by the dialogue manager in case the endpointing logic is not able to detect silence (noisy environment, TV background voices, etc.). I seem to remember a lot of people on the Snips forum complaining about this timeout happening too soon (5 seconds, if I remember correctly).
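For what it’s worth, here’s a sketch of what forcing textCaptured on stopListening could look like in the ASR service (the transcriber object is hypothetical; the payload fields follow the Hermes reference):

```python
# Sketch: force a textCaptured when stopListening arrives, even if
# no silence has been detected yet.
import json

def handle_stop_listening(client, site_id, session_id, transcriber):
    # Flush whatever has been transcribed since startListening.
    result = transcriber.stop()  # hypothetical transcriber object
    client.publish(
        "hermes/asr/textCaptured",
        json.dumps({
            "text": result.text,
            "likelihood": result.likelihood,
            "seconds": result.seconds,
            "siteId": site_id,
            "sessionId": session_id,
        }),
    )
```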


I don’t know the specifics of the Snips services, but @fastjack’s interpretation seems sensible to me: I agree that a hermes/asr/stopListening message should trigger a hermes/asr/textCaptured message, and this is corroborated by the Hermes reference section of textCaptured:

When the ASR is listening, it transcribes voice to text in real time. This process stops when a longer period of silence is detected, after which the transcription results are posted, as described here.

However, I’m not sure whether the ASR has a timeout system. I thought that in Snips, the timeouts are in the dialogue manager. For instance, if you look at the Snips configuration file, the only timeout values you can configure are:

  • lambda_timeout (default value: 5 seconds): timeout between dialogue and lambda for endSession, in seconds
  • session_timeout (default value: 15 seconds): internal timeout if one component doesn’t answer, in seconds

Both values are used by the dialogue manager, not by the ASR. I’m not sure what the lambda means in this context, by the way.

So it seems to me the dialogue manager has the responsibility to send hermes/asr/stopListening to the ASR when there’s no silence detected after a configured time (session_timeout or lambda_timeout?), and after that the ASR sends the hermes/asr/textCaptured message.

Or maybe the ASR does have a timeout in Snips, but it’s just not configurable? Maybe it’s good to have a look at what Hermod and Project Alice are doing?

As I said, the timeout should not be in the ASR but in the dialogue manager, to ensure the platform does not become unresponsive. The ASR only stops listening by itself when it detects silence via endpointing.

Ok, then we’re saying the same thing :slight_smile: I just wasn’t familiar with Kaldi’s endpointing feature.


The dialogue manager looks like a central piece in the architecture (orchestrating the other services).

Looks like a « master » or « base » to me… :thinking:

I still need to add session timeouts to the dialogue manager. I don’t have the lambda timeout in the ASR either, just the silence detection (using webrtcvad).

I’ll leave the silence detection as an ASR responsibility, so we can do something different with Kaldi, Pocketsphinx, etc. For now, both use the rhasspy-silence library, but we can change that in the future.
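For anyone curious, silence detection along these lines can be done with webrtcvad directly; here’s a minimal sketch in that spirit (this is not rhasspy-silence’s actual API):

```python
# Minimal sketch of VAD-based silence detection with webrtcvad.
import webrtcvad

SAMPLE_RATE = 16000          # webrtcvad supports 8/16/32/48 kHz
FRAME_MS = 30                # frames must be 10, 20, or 30 ms
FRAME_BYTES = SAMPLE_RATE * FRAME_MS // 1000 * 2  # 16-bit mono PCM

vad = webrtcvad.Vad(3)       # aggressiveness 0 (least) to 3 (most)

def utterance_ended(frames, max_silence_frames=20):
    """Return True once we see an unbroken run of non-speech frames."""
    silent = 0
    for frame in frames:     # each frame is FRAME_BYTES of raw audio
        if vad.is_speech(frame, SAMPLE_RATE):
            silent = 0
        else:
            silent += 1
            if silent >= max_silence_frames:
                return True
    return False
```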

Thanks!


This is really great; I have struggled a little to get a good separation of concerns between base and satellite.
Is there a target version for this functionality? Or any other way I can check whether it’s been released yet? I am considering pausing my tinkering until those changes are rolled out.