MQTT Speech to Text: No Response


I'm trying to use speech to text via the MQTT protocol, but I don't get a result.

According to the docs (Speech to Text - Rhasspy):

I should first publish hermes/asr/startListening with a unique sessionId,
then send the audio data in chunks to hermes/audioServer/<siteId>/audioFrame.
I also tried hermes/audioServer/<siteId>/<sessionId>/audioSessionFrame.

To finish up, I publish hermes/asr/stopListening.

What I tried ("default" is the name of my base station):

topic: "hermes/asr/startListening"
payload: {"sessionId":"0cfcb630-df3b-46c5-882b-0476d7102de5","stopOnSilence":false,"sendAudioCaptured":true,"siteId":"default"}

(I chose 8236 bytes because I've seen Rhasspy using this chunk size.)
topic: "hermes/audioServer/default/audioFrame"
payload: buffer[8236]

topic: "hermes/asr/stopListening"
payload: {"sessionId":"0cfcb630-df3b-46c5-882b-0476d7102de5","siteId":"default"}
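For clarity, the full sequence I'm publishing can be sketched in Python (a sketch of my approach; the chunks are assumed to already be complete WAV files, and the actual publishing would be done with an MQTT client such as paho-mqtt):

```python
import json
import uuid

def asr_session_messages(site_id, wav_chunks):
    """Build the (topic, payload) sequence for a manual ASR session.

    Each element of wav_chunks is assumed to be a complete WAV file
    (header + samples), as required for hermes audioFrame payloads.
    """
    session_id = str(uuid.uuid4())
    messages = [
        ("hermes/asr/startListening",
         json.dumps({"sessionId": session_id,
                     "siteId": site_id,
                     "stopOnSilence": False,
                     "sendAudioCaptured": True})),
    ]
    for chunk in wav_chunks:
        messages.append((f"hermes/audioServer/{site_id}/audioFrame", chunk))
    messages.append(
        ("hermes/asr/stopListening",
         json.dumps({"sessionId": session_id, "siteId": site_id})))
    return messages

# With paho-mqtt, publishing would then look roughly like:
#   for topic, payload in asr_session_messages("default", chunks):
#       client.publish(topic, payload)
```
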

I also tried publishing rhasspy/asr/recordingFinished:

topic: "rhasspy/asr/recordingFinished"
payload: {"sessionId":"415449cf-f11a-461b-8cff-a4980ba15662","siteId":"default"}

Afterwards, the base station doesn't show any logs and doesn't post anything on either hermes/asr/textCaptured or hermes/error/asr.

I'm not sure whether the siteId in the MQTT messages is correct. I should use the ID of the base station that is supposed to transcribe the speech into text, right?

The other question: does every data chunk need a WAV header at the beginning?

It would be very nice if someone could help me!
@synesthesiam please take a look at this :slight_smile:


But I am wondering what you are trying to achieve here. What is it you actually want to do with speech to text?
If you want to start a session, you should publish to hermes/dialogueManager/startSession
See this reference: Reference - Rhasspy
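A minimal startSession publish could look like this (a sketch only; the init field names follow the Hermes dialogue API as documented in the Rhasspy reference, and "default" is an assumed siteId):

```python
import json

# Sketch: start a dialogue session instead of driving the ASR manually.
# "action" type sessions wait for a user utterance to be transcribed.
topic = "hermes/dialogueManager/startSession"
payload = json.dumps({
    "siteId": "default",            # assumption: your base station's siteId
    "init": {
        "type": "action",
        "canBeEnqueued": True,      # queue the session if another is active
    },
})

# With paho-mqtt this would be published as:
#   client.publish(topic, payload)
print(topic, payload)
```
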

I am currently working on a new app to replace the current one.
The idea is to mimic the settings of Rhasspy inside the app.

Therefore I send a sessionStarted event, and later I send startListening to the base station so that it starts transcribing the audioFrames I am sending until stopListening is called.

I know it's a little different from how Rhasspy works, because in Rhasspy, speech to text over MQTT only works well when the dialogue manager is set to MQTT.

Ah ok, to answer your question about the data chunks: yes, every audioFrame payload needs to be a complete WAV file (including the header).
So if you are not doing that yet, you must add the header to each chunk.
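If your chunks are raw PCM, wrapping each one in a WAV header is straightforward with Python's standard wave module. This is a minimal sketch; the 16 kHz / 16-bit / mono defaults are an assumption based on Rhasspy's usual audio format, so adjust them to match your capture settings:

```python
import io
import wave

def pcm_chunk_to_wav(pcm, rate=16000, channels=1, sampwidth=2):
    """Wrap a raw PCM chunk in a WAV container (RIFF header + samples)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(channels)   # mono
        w.setsampwidth(sampwidth)  # 2 bytes = 16-bit samples
        w.setframerate(rate)       # 16 kHz sample rate
        w.writeframes(pcm)
    return buf.getvalue()
```

Each publish to hermes/audioServer/&lt;siteId&gt;/audioFrame would then send `pcm_chunk_to_wav(chunk)` instead of the bare PCM bytes.
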