Hey guys,
TL;DR
Home Assistant and Rhasspy running on a Raspberry Pi 4B, Matrix Voice standalone connected via MQTT, on-device wake word. There is a 15 - 20 second delay from the time you say the command to the time it is processed and sent to HA. The wake word works almost immediately. No errors in the log, and everything functions well aside from the delay.
I asked the same question a few months ago over here (https://community.matrix.one/t/matrix-voice-esp32-mqtt-audio-streamer/2323/14) but the solution may be on the Rhasspy side, so I thought I'd post here. I've scoured the forums here and on the Matrix community and haven't found this issue anywhere else yet.
A few months ago I successfully set up the Matrix Voice standalone hooked into Rhasspy and then into Home Assistant, based on romkabouter's example, but with one strange quirk that does not appear in his demo: there is a long delay of 15 - 20 seconds (maybe more in some cases) between the spoken command and the intended action.
Currently I have the device configured to use the on-device wake word "Alexa", so if I say "Alexa" the LEDs turn green pretty quickly, and I say my command. 15 - 20 seconds later it works: the light turns off, or whatever command I gave to Home Assistant happens. Right after the command is processed, the LEDs on the Matrix Voice go back to blue and it's ready for another go. So the whole process works properly, there is just this unexplained delay.
Few other notes:
Looking at the MQTT server, I can see that the audioFrame topic starts to roll once I say the wake word, but it continues to receive data during this 15 - 20 second window before the command actually happens. This makes me wonder if for some reason Rhasspy can't detect that there is no more voice activity.
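One way to test that suspicion might be to check how loud the frames are after the command has been spoken. Below is a minimal sketch, assuming each audioFrame payload is a small, complete WAV chunk with 16-bit PCM samples (as in the Hermes `hermes/audioServer/<siteId>/audioFrame` convention). The function names and the threshold of 500 are made up for illustration and are not Rhasspy settings.

```python
import io
import math
import struct
import wave


def frame_rms(payload: bytes) -> float:
    """Return the RMS level of one audioFrame payload (assumed 16-bit PCM WAV)."""
    with wave.open(io.BytesIO(payload), "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit samples")
        raw = wav.readframes(wav.getnframes())
    count = len(raw) // 2
    if count == 0:
        return 0.0
    # Little-endian signed 16-bit samples (all channels interleaved).
    samples = struct.unpack("<%dh" % count, raw[: count * 2])
    return math.sqrt(sum(s * s for s in samples) / count)


def trailing_silence(payloads, threshold=500.0):
    """Count consecutive frames at the end of a capture whose RMS is below
    the (hypothetical) threshold."""
    quiet = 0
    for payload in reversed(payloads):
        if frame_rms(payload) >= threshold:
            break
        quiet += 1
    return quiet
```

The payloads could be collected with any MQTT client subscribed to the audioFrame topic. If the frames recorded after the command still show a high RMS (fan noise, microphone gain, and so on), that would be consistent with the silence detection never triggering and the session only ending on a timeout.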
In this setup I tried changing the voice detection setting to see if it made a difference, but the behavior is the same whether I use webrtcvad or MQTT.
I don't see any errors in the Rhasspy logs, just a gap from when it starts listening to when the intent is sent to Home Assistant.
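To narrow down where that gap sits, the MQTT traffic could be timestamped and the time between stages compared. This is only a rough sketch: it assumes the standard Hermes topic names are being published in this setup, and the stage labels and the `stage_gaps` helper are hypothetical.

```python
# Each stage is (label, predicate on the topic string).
# Topic names follow the Hermes convention; whether all of them
# appear in a given Rhasspy setup is an assumption.
STAGES = [
    ("wake word detected",
     lambda t: t.startswith("hermes/hotword/") and t.endswith("/detected")),
    ("ASR start", lambda t: t == "hermes/asr/startListening"),
    ("ASR stop", lambda t: t == "hermes/asr/stopListening"),
    ("text captured", lambda t: t == "hermes/asr/textCaptured"),
    ("intent", lambda t: t.startswith("hermes/intent/")),
]


def stage_gaps(events):
    """Given (timestamp_seconds, topic) pairs, return a list of
    (from_stage, to_stage, seconds) for the stages that were seen."""
    first_seen = {}
    for ts, topic in events:
        for name, matches in STAGES:
            if name not in first_seen and matches(topic):
                first_seen[name] = ts
    ordered = [(name, first_seen[name]) for name, _ in STAGES if name in first_seen]
    return [(a, b, tb - ta) for (a, ta), (b, tb) in zip(ordered, ordered[1:])]
```

If nearly all of the 15 - 20 seconds falls between the start and stop of listening, that would point at end-of-speech detection rather than at speech recognition or the hand-off to Home Assistant.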
Anyway, I decided to revisit this after a few months and start looking for a solution. Any insight is appreciated. Thanks for everyone's work on this cool tech!