I am trying to push wav files to /api/speech-to-intent. However, it’s not possible to handle the request for a specific siteId like it is for example for /text-to-speech.
The processing of the intent seems to be done for the siteid of the rhasspy instance. However, I am using multiple satelletes on one instance, therefore passing the siteId for example with ?siteId=xx parameter would be useful for me.
I have the following setup, each in a docker container:
signal-cli web api → handles incoming messages from signal messenger and can send messages
rhasspy → handles my main site id (livingroom) and 2 satellites (office->another rhasspy instance, signal->injecting and extracting text commands and responses)
homeassistant → provides hardware abstraction for all my smart devices and provides the entities to node red
node red → handles all the automation flows and incoming and outgoing signal messages in connection with signal-cli web api
In general I have a signal messenger group with ‘my home’ and me and text messages in that group are treated like voice commands and the responses are sent back to the group. It essentially looks like this:
I Receive text messages via signal messenger (sideId: signal) in node red and pass them to /api/text-to-intent?siteId=signal (text-to-intent supports the siteId parameter and then treats it as the text message came through that siteId).
To get the responses, I subscribe via websockets to /api/mqtt/hermes/tts/say and filter for siteId: signal. I extract the text and send it back to my group chat in signal messenger.
Additionally I now extract voice messages from signal (they come in the format aac), convert them to wav and then pass them to /api/speech-to-intent to treat them as voice commands. I wanted to get the response as in the previous text based use case and send it back as text to my signal group, however the end point does not support the siteId parameter as /api/text-to-intent does and automatically uses the configured default siteId for this particular rhasspy instance.
A workaround would probably be to run another rhasspy instance for siteId: signal and forward everything to the main instance for handling the intents and then use /api/speech-to-intent from this instance, but this seems like a lot of extra steps for this particular use case, especially as it’s already working with text-to-intent end point.
I do not know if this will work, but this may be an idea. It uses the “normal” flow.
The part which might not work is that the audioFrame is usually chucks of small wav files and not 1 large wav.
Post a message to hermes/dialogueManager/startSession with your siteId
Post your WAV to hermes/audioServer//audioFrame
Intent will be posted to hermes/intent/ with siteId and all the rest