Home Assistant, Rhasspy, Hermes MQTT on standalone Matrix Voice

Hey guys,

Home Assistant and Rhasspy are running on a Raspberry Pi 4B, with a standalone Matrix Voice connected via MQTT and using its on-device wake word. There is a 15 to 20 second delay from the time I say the command to the time it is processed and sent to Home Assistant. The wake word works almost immediately. There are no errors in the log, and everything functions well aside from the delay.

I asked the same question a few months ago over here (https://community.matrix.one/t/matrix-voice-esp32-mqtt-audio-streamer/2323/14), but the solution may be on the Rhasspy side, so I thought I’d post here. I’ve scoured the forums here and on the Matrix community and haven’t found this issue anywhere else yet.

A few months ago I successfully set up the standalone Matrix Voice, hooked into Rhasspy and then into Home Assistant, based on romkabouter’s example, but with one strange quirk that does not appear in his demo: there is a long delay of 15 to 20 seconds (maybe more in some cases) between the spoken command and the intended action.

Currently I have the device configured to use the on-device wake word “Alexa”, so when I say “Alexa” the LEDs turn green pretty quickly, and I say my command. About 15 to 20 seconds later it works: the light turns off, or whatever command I gave to Home Assistant is carried out. Right after the command is processed, the LEDs on the Matrix Voice go back to blue and it’s ready for another go. So the whole process works properly; there is just this unexplained delay.

A few other notes:

Looking at the MQTT server, I can see that the audioFrame topic starts to roll once I say the wake word, and it keeps receiving data during this 15 to 20 second window before the command actually happens. This makes me wonder whether Rhasspy for some reason can’t detect that there is no more voice activity.

In this setup I tried changing the voice detection to see if it made a difference, but the behavior is the same whether I use webrtcvad or MQTT.

I don’t see any errors in the Rhasspy logs, just a gap between when it starts listening and when the intent is sent to Home Assistant.

Anyway, I decided to revisit this after a few months and start looking for a solution. Any insight is appreciated. Thanks for everyone’s work on this cool tech!

Maybe a timeout issue? Which version of Rhasspy?

Rhasspy version 2.4.20, installed from the Home Assistant add-ons. I updated yesterday.

Not sure if it’s related to the update or if I just missed it before, but I do see a timeout in the log that I haven’t seen before. When I watch the live-updating log in Rhasspy, everything proceeds as soon as that timeout pops up.
The delay occurs right after this line:
[DEBUG:64980663] HermesWakeListener: listening -> loaded

Then, after the delay, these two lines appear right after each other:

[DEBUG:65010689] HermesCommandListener: listening -> started
[WARNING:65010686] HermesCommandListener: Timeout

The full log is below (newest entries first).

[DEBUG:64443572] HomeAssistantIntentHandler: POSTed intent to https://(removedurl):8123/api/events/rhasspy_ChangeLightState
[DEBUG:64443565] urllib3.connectionpool: https://(removedurl):8123 "POST /api/events/rhasspy_ChangeLightState HTTP/1.1" 200 54
[DEBUG:64443504] urllib3.connectionpool: Starting new HTTPS connection (1): (removedurl):8123
[DEBUG:64443502] WebSocketObserver: {"intent": {"name": "ChangeLightState", "confidence": 1.0}, "entities": [{"entity": "state", "value": "on", "raw_value": "on", "start": 5, "raw_start": 5, "end": 7, "raw_end": 7, "tokens": ["on"], "raw_tokens": ["on"]}, {"entity": "name", "value": "office lights", "raw_value": "office lights", "start": 8, "raw_start": 8, "end": 21, "raw_end": 21, "tokens": ["office", "lights"], "raw_tokens": ["office", "lights"]}], "text": "turn on office lights", "raw_text": "turn on office lights", "recognize_seconds": 0.002010739000979811, "tokens": ["turn", "on", "office", "lights"], "raw_tokens": ["turn", "on", "office", "lights"], "wav_seconds": 0.0, "transcribe_seconds": 0.0, "speech_confidence": 0.02691910517856184, "wakeId": "default", "siteId": "default", "hass_event": {"event_type": "rhasspy_ChangeLightState", "event_data": {"state": "on", "name": "office lights", "_text": "turn on office lights", "_raw_text": "turn on office lights"}}, "slots": {"state": "on", "name": "office lights"}}
[DEBUG:64443496] DialogueManager: recognizing -> handling
[DEBUG:64443493] DialogueManager: {'intent': {'name': 'ChangeLightState', 'confidence': 1.0}, 'entities': [{'entity': 'state', 'value': 'on', 'raw_value': 'on', 'start': 5, 'raw_start': 5, 'end': 7, 'raw_end': 7, 'tokens': ['on'], 'raw_tokens': ['on']}, {'entity': 'name', 'value': 'office lights', 'raw_value': 'office lights', 'start': 8, 'raw_start': 8, 'end': 21, 'raw_end': 21, 'tokens': ['office', 'lights'], 'raw_tokens': ['office', 'lights']}], 'text': 'turn on office lights', 'raw_text': 'turn on office lights', 'recognize_seconds': 0.002010739000979811, 'tokens': ['turn', 'on', 'office', 'lights'], 'raw_tokens': ['turn', 'on', 'office', 'lights'], 'wav_seconds': 0.0, 'transcribe_seconds': 0.0, 'speech_confidence': 0.02691910517856184, 'wakeId': 'default', 'siteId': 'default'}
[DEBUG:64443488] DialogueManager: decoding -> recognizing
[DEBUG:64443487] DialogueManager: turn on office lights (confidence=0.02691910517856184)
[DEBUG:64443481] PocketsphinxDecoder: turn on office lights
[DEBUG:64443480] PocketsphinxDecoder: Transcription confidence: 0.02691910517856184
[DEBUG:64443477] PocketsphinxDecoder: Decoded WAV in 1.3532297611236572 second(s)
[DEBUG:64442113] HermesMqtt: Subscribed to hermes/asr/stopListening
[DEBUG:64442111] HermesMqtt: Subscribed to hermes/asr/startListening
[DEBUG:64442098] PocketsphinxDecoder: rate=16000, width=2, channels=1.
[DEBUG:64442097] DialogueManager: awake -> decoding
[DEBUG:64442095] HermesCommandListener: listening -> started
[WARNING:64442093] HermesCommandListener: Timeout
[DEBUG:64412072] HermesWakeListener: listening -> loaded
[DEBUG:64412070] HermesCommandListener: started -> listening
[DEBUG:64412067] DialogueManager: asleep -> awake
[DEBUG:64412067] DialogueManager: Awake!
[DEBUG:64412064] HermesWakeListener: Hotword detected (default)
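The timestamps make the gap easy to quantify. Below is a minimal Python sketch that parses log lines in the format shown above, sorts them oldest first, and measures the time from `HermesCommandListener: started -> listening` to the next `HermesCommandListener: Timeout`. It assumes the bracketed number is a millisecond counter (an assumption, not something the log states); on this excerpt that gives 64442093 - 64412070 = 30023, roughly 30 seconds.

```python
import re

# Matches lines like "[DEBUG:64412070] HermesCommandListener: started -> listening".
# Assumption: the bracketed number is a millisecond counter.
LINE_RE = re.compile(r"^\[(?P<level>[A-Z]+):(?P<ts>\d+)\]\s+(?P<msg>.*)$")


def parse_log(text):
    """Return (timestamp_ms, level, message) tuples in chronological order."""
    entries = []
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:  # lines that are not log entries are skipped
            entries.append((int(match["ts"]), match["level"], match["msg"]))
    # The web log shows newest entries first, so sort oldest first.
    entries.sort(key=lambda entry: entry[0])
    return entries


def listening_gap_seconds(entries):
    """Seconds from the command listener starting to listen until its next Timeout."""
    start = None
    for ts, level, msg in entries:
        if msg == "HermesCommandListener: started -> listening":
            start = ts
        elif start is not None and level == "WARNING" and msg == "HermesCommandListener: Timeout":
            return (ts - start) / 1000.0
    return None  # no timeout found after listening started


if __name__ == "__main__":
    sample = """\
[WARNING:64442093] HermesCommandListener: Timeout
[DEBUG:64412072] HermesWakeListener: listening -> loaded
[DEBUG:64412070] HermesCommandListener: started -> listening
[DEBUG:64412064] HermesWakeListener: Hotword detected (default)
"""
    print(listening_gap_seconds(parse_log(sample)))  # 30.023
```

Pasting a longer stretch of the log works the same way; only the first listening-to-timeout pair is reported.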

I had a similar experience with my Matrix-based MQTT setup, but I was using wake word detection on the Rhasspy server rather than locally. At first I reduced the timeout in the profile so that at least the wait wasn’t as long.

Then I changed voice detection from hermes to webrtcvad to listen for silence. That seemed to detect the silence reliably.

Thanks, I too was able to mitigate the delay by reducing the hermes command listener timeout to 5 seconds. It still has the “delay” and times out, but it no longer takes 30 seconds to do so, which makes it usable. I’ll keep playing around with it for a while.

{
    "command": {
        "hermes": {
            "timeout_sec": 5
        }
    }
}

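If you would rather script that change than edit by hand, a recursive dictionary merge sets just that one key without overwriting neighboring settings in the profile. This is a minimal Python sketch that works on the JSON data only; where the profile file lives in your install is not covered, and `example_key` is a placeholder, not a real Rhasspy setting.

```python
import copy
import json


def deep_merge(base, override):
    """Return a copy of `base` with `override` merged in, recursing into nested dicts.

    Keys in `override` win; sibling keys that `override` does not mention are kept.
    """
    merged = copy.deepcopy(base)  # never mutate the caller's profile
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# The setting from the post above: hermes command listener timeout of 5 seconds.
TIMEOUT_OVERRIDE = {"command": {"hermes": {"timeout_sec": 5}}}

if __name__ == "__main__":
    # Hypothetical profile contents; "example_key" stands in for whatever else is there.
    profile = json.loads('{"command": {"hermes": {"timeout_sec": 30, "example_key": true}}}')
    print(json.dumps(deep_merge(profile, TIMEOUT_OVERRIDE), indent=4))
```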
Sounds like the command listener fails to detect silence after you have spoken the command and eventually runs into a timeout. Maybe you could try tweaking some settings: https://rhasspy.readthedocs.io/en/latest/command-listener/
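The failure mode described here, where speech is never followed by detected silence, can be illustrated with a toy endpointer. Rhasspy's webrtcvad option is built on the WebRTC voice activity detector; this sketch swaps in a plain RMS energy threshold instead, so it shows the general idea only and is not Rhasspy's code. The frame size, threshold, and parameter names are hypothetical. Recording stops on `silence` once speech has been heard and enough quiet frames follow, otherwise on `timeout`.

```python
import math
import struct


def frame_rms(frame):
    """RMS amplitude of one frame of 16-bit little-endian mono PCM."""
    count = len(frame) // 2
    if count == 0:
        return 0.0
    samples = struct.unpack(f"<{count}h", frame[: count * 2])
    return math.sqrt(sum(s * s for s in samples) / count)


def end_of_command(frames, frame_ms=30, threshold=500.0, silence_ms=600, timeout_ms=30000):
    """Return (elapsed_ms, reason) for when a listener like this would stop recording.

    reason is "silence" when speech was heard and then silence_ms of frames
    stayed below threshold, "timeout" when timeout_ms passed first, or
    "end_of_audio" if the frames ran out before either happened.
    """
    elapsed_ms = 0
    heard_speech = False
    silent_ms = 0
    for frame in frames:
        elapsed_ms += frame_ms
        if frame_rms(frame) >= threshold:
            heard_speech = True  # loud frame: treated as speech, reset silence
            silent_ms = 0
        elif heard_speech:
            silent_ms += frame_ms
            if silent_ms >= silence_ms:
                return elapsed_ms, "silence"
        if elapsed_ms >= timeout_ms:
            return elapsed_ms, "timeout"
    return elapsed_ms, "end_of_audio"


def constant_frame(level, samples=480):
    """Synthetic 30 ms frame at 16 kHz where every sample equals `level`."""
    return struct.pack(f"<{samples}h", *([level] * samples))


if __name__ == "__main__":
    speech = [constant_frame(3000)] * 20  # 600 ms of "speech"
    quiet = [constant_frame(50)] * 100  # 3 s of near-silence
    noisy = [constant_frame(800)] * 1000  # 30 s of noise above the threshold
    print(end_of_command(speech + quiet))  # (1200, 'silence')
    print(end_of_command(speech + noisy))  # (30000, 'timeout')
```

With the noisy input every frame looks like speech, so the listener only gives up when the timeout expires, which resembles the symptom in this thread.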

Is it possible that your microphone outputs a lot of static noise? I’m not sure how well webrtcvad handles that.
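One way to check for static is to record a few seconds with nobody talking and measure the level. The decoder log above reports `rate=16000, width=2, channels=1`, so this Python sketch assumes 16-bit mono WAV input. It reports the 10th-percentile window level as a rough noise floor; the window size and percentile are arbitrary choices, and capturing the recording from the Matrix Voice is outside this sketch. A floor close to the level of your speech would suggest the voice activity detector has little to separate.

```python
import array
import math
import sys
import wave

FULL_SCALE = 32768  # maximum magnitude of a signed 16-bit sample


def window_levels_dbfs(samples, rate, window_ms=30):
    """RMS level in dBFS of each complete window of 16-bit samples."""
    step = max(1, rate * window_ms // 1000)
    levels = []
    for start in range(0, len(samples) - step + 1, step):
        window = samples[start:start + step]
        rms = math.sqrt(sum(s * s for s in window) / step)
        # Clamp to 1 so digital silence gives a finite (very low) level.
        levels.append(20 * math.log10(max(rms, 1.0) / FULL_SCALE))
    return levels


def read_mono16(path):
    """Read a 16-bit mono WAV file and return (samples, sample_rate)."""
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2 or wav.getnchannels() != 1:
            raise ValueError("expected 16-bit mono audio")
        rate = wav.getframerate()
        samples = array.array("h", wav.readframes(wav.getnframes()))
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian
    return samples, rate


def noise_floor_dbfs(levels, quantile=0.1):
    """Level of the quieter windows: a rough estimate of the background noise."""
    if not levels:
        raise ValueError("recording is shorter than one window")
    ordered = sorted(levels)
    return ordered[int(quantile * (len(ordered) - 1))]


if __name__ == "__main__":
    # Usage: python noise_floor.py recording.wav
    samples, rate = read_mono16(sys.argv[1])
    levels = window_levels_dbfs(samples, rate)
    print(f"noise floor ~ {noise_floor_dbfs(levels):.1f} dBFS, loudest window {max(levels):.1f} dBFS")
```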

I suppose it’s possible that it is outputting static; I’ve never actually listened to the audio it records. It’s a new Matrix Voice, so unless it’s defective I wouldn’t think so. I will play with the webrtcvad settings as you suggested and see what I can do. At this point it does a pretty good job of consistently recognizing my commands (I just have to say them within 5 seconds before it times out… :stuck_out_tongue: ), so it doesn’t seem to be picking up any extra static. Thanks.

Just to close this out: after upgrading to Rhasspy 2.5, this is no longer an issue. The same setup and hardware works well and quickly with none of the issues mentioned above.

Thanks everyone for your input!
