/api/speech-to-intent not working with siteId


I am trying to push wav files to /api/speech-to-intent. However, it’s not possible to handle the request for a specific siteId like it is for example for /text-to-speech.

The processing of the intent seems to be done for the siteid of the rhasspy instance. However, I am using multiple satelletes on one instance, therefore passing the siteId for example with ?siteId=xx parameter would be useful for me.

Is there another way to realize this?

It is not in the API, but maybe if you explain your use-case there might be another solution

I have the following setup, each in a docker container:

  • signal-cli web api → handles incoming messages from signal messenger and can send messages
  • rhasspy → handles my main site id (livingroom) and 2 satellites (office->another rhasspy instance, signal->injecting and extracting text commands and responses)
  • homeassistant → provides hardware abstraction for all my smart devices and provides the entities to node red
  • node red → handles all the automation flows and incoming and outgoing signal messages in connection with signal-cli web api

In general I have a signal messenger group with ‘my home’ and me and text messages in that group are treated like voice commands and the responses are sent back to the group. It essentially looks like this:

I Receive text messages via signal messenger (sideId: signal) in node red and pass them to
/api/text-to-intent?siteId=signal (text-to-intent supports the siteId parameter and then treats it as the text message came through that siteId).

To get the responses, I subscribe via websockets to /api/mqtt/hermes/tts/say and filter for siteId: signal. I extract the text and send it back to my group chat in signal messenger.

Additionally I now extract voice messages from signal (they come in the format aac), convert them to wav and then pass them to /api/speech-to-intent to treat them as voice commands. I wanted to get the response as in the previous text based use case and send it back as text to my signal group, however the end point does not support the siteId parameter as /api/text-to-intent does and automatically uses the configured default siteId for this particular rhasspy instance.

A workaround would probably be to run another rhasspy instance for siteId: signal and forward everything to the main instance for handling the intents and then use /api/speech-to-intent from this instance, but this seems like a lot of extra steps for this particular use case, especially as it’s already working with text-to-intent end point.

I do not know if this will work, but this may be an idea. It uses the “normal” flow.
The part which might not work is that the audioFrame is usually chucks of small wav files and not 1 large wav.

  • Post a message to hermes/dialogueManager/startSession with your siteId
  • Post your WAV to hermes/audioServer//audioFrame
  • Intent will be posted to hermes/intent/ with siteId and all the rest

Check the reference

That could work, I actually did something similar a while back. However, I still feels like it’s not the “right” way to do it.

At least for consistency with text-to-intent endpoint the feature could be added. What would be the correct way for a feature request?

Or do you know the right place in the code to look into for me maybe? I’m a bit overwhelmed by all the repos.

Yes, I agree that the API should include the siteId.
Maybe its best to raise a defect about it, because the other endpoint do support this.

I haven’t checked the code for a while, but I will check if I can find the part

Where would I post it as a defect - in this github repo: https://github.com/rhasspy/rhasspy?

In parallel I will probably try your proposal but would prefer using the end point in the end.

Yes, on the github page.

I added the issue: https://github.com/rhasspy/rhasspy/issues/282

And I’ll try your workaround today or next week when I’m back from my trip though, thanks for that idea.

1 Like