MQTT Speech to Text: No Response

Nailik · March 15, 2022, 7:47pm

Hi,

I am trying to use speech to text via the MQTT protocol, but I don’t get a result.

According to the docs (Speech to Text - Rhasspy):

I should first send hermes/asr/startListening with a unique sessionId,
then send the audio data in chunks to hermes/audioServer/<siteId>/audioFrame
(I also tried hermes/audioServer/<siteId>/<sessionId>/audioSessionFrame),
and finally send hermes/asr/stopListening to finish up.
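The sequence described above can be sketched as a list of (topic, payload) pairs built in publish order. This is a minimal sketch, assuming each frame is already a complete WAV file as bytes; the helper name `build_asr_messages` and its `use_session_frames` switch are hypothetical, and the sketch only builds the messages rather than publishing them.

```python
import json
import uuid


def build_asr_messages(site_id, frames, session_id=None, use_session_frames=False):
    """Return (topic, payload) pairs for one ASR request, in publish order.

    Hypothetical helper: it only builds the messages; publishing them with an
    MQTT client is left to the caller.
    """
    # Reuse the caller's sessionId, or generate a fresh unique one.
    session_id = session_id or str(uuid.uuid4())
    messages = [(
        "hermes/asr/startListening",
        json.dumps({
            "siteId": site_id,
            "sessionId": session_id,
            "stopOnSilence": False,
            "sendAudioCaptured": True,
        }),
    )]
    # Pick one of the two audio topics mentioned in the post.
    if use_session_frames:
        frame_topic = f"hermes/audioServer/{site_id}/{session_id}/audioSessionFrame"
    else:
        frame_topic = f"hermes/audioServer/{site_id}/audioFrame"
    # Each frame is assumed to be a complete WAV file (header + PCM) as bytes.
    messages.extend((frame_topic, frame) for frame in frames)
    messages.append((
        "hermes/asr/stopListening",
        json.dumps({"siteId": site_id, "sessionId": session_id}),
    ))
    return messages
```

Each pair can then be handed to an MQTT client, for example paho-mqtt’s `client.publish(topic, payload)`. Keeping the same sessionId and siteId in startListening and stopListening is the point of building them together.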

What I tried (“default” is the name of my base station):

topic: "hermes/asr/startListening"
payload: {"sessionId":"0cfcb630-df3b-46c5-882b-0476d7102de5","stopOnSilence":false,"sendAudioCaptured":true,"siteId":"default"}

(I chose a chunk size of 8236 bytes because I’ve seen Rhasspy use it.)
topic: "hermes/audioServer/default/audioFrame"
payload: buffer[8236]
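Splitting a recording into fixed-size pieces can be sketched as below. One possible reading of the 8236 figure (a guess, not something the docs state) is 44 + 8192, i.e. a standard 44-byte PCM WAV header plus 8192 bytes of audio, so the hypothetical `chunk_pcm` helper defaults to 8192 bytes of raw PCM per chunk. Per the reply further down, each chunk still needs its own WAV header before it is published.

```python
def chunk_pcm(pcm: bytes, chunk_size: int = 8192):
    """Split raw PCM bytes into consecutive chunks of at most chunk_size bytes.

    The 8192-byte default is an assumption. For 16-bit audio, keep chunk_size
    even so that no sample is split across two chunks.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [pcm[i:i + chunk_size] for i in range(0, len(pcm), chunk_size)]
```

The last chunk is simply shorter than the others. Padding it with silence would also work, but it adds audio that was never recorded.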

topic: "hermes/asr/stopListening"
payload: {"sessionId":"0cfcb630-df3b-46c5-882b-0476d7102de5","siteId":"default"}

I also tried publishing recordingFinished:

topic: "rhasspy/asr/recordingFinished"
payload: {"sessionId":"415449cf-f11a-461b-8cff-a4980ba15662","siteId":"default"}

Afterwards, the base station doesn’t show any logs and doesn’t publish anything on either hermes/asr/textCaptured or hermes/error/asr.

I’m not sure whether the siteId in the MQTT messages is correct. I should use the ID of the base station that is supposed to transcribe the speech into text, right?

My other question: does every data chunk need a WAV header at the beginning?

It would be very nice if someone could help me!
@synesthesiam, please take a look at this.

romkabouter · March 16, 2022, 10:05pm

Yes.

But I am wondering what you are trying to achieve here. What is it you actually want to do with the speech to text?
If you want to start a session, you should publish to hermes/dialogueManager/startSession.
See this reference: Reference - Rhasspy
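A startSession message can be sketched as follows. This is a minimal sketch, assuming the Hermes startSession payload layout (`siteId`, an `init` object with `type` and `canBeEnqueued`, an optional `init.text`, and an optional `customData`) as I understand it from the Rhasspy reference. Verify the field names against that page for your version. The helper name `start_session_message` is hypothetical.

```python
import json


def start_session_message(site_id="default", text=None, custom_data=None):
    """Build a hermes/dialogueManager/startSession (topic, payload) pair.

    Field names are assumed from the Hermes startSession payload; check them
    against the Rhasspy reference for your version.
    """
    # "action" sessions listen for a command after starting.
    init = {"type": "action", "canBeEnqueued": True}
    if text is not None:
        init["text"] = text  # optional text to speak when the session starts
    payload = {"siteId": site_id, "init": init}
    if custom_data is not None:
        payload["customData"] = custom_data
    return "hermes/dialogueManager/startSession", json.dumps(payload)
```

With this approach the dialogue manager, rather than the app, drives startListening and stopListening for the session.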

Nailik · March 17, 2022, 9:01am

I am currently working on a new app to replace the current one.
The idea is to mimic the settings of Rhasspy inside the app.

Therefore, I send a sessionStarted event, and later I send startListening to the base station so that it transcribes the audioFrames I send until stopListening is called.

I know this is a little different from how Rhasspy works, because in Rhasspy, speech to text (MQTT) only works well when the dialogue manager is set to MQTT.

romkabouter · March 17, 2022, 4:27pm

Ah, OK. To answer your question about the data chunks: yes, every audioFrame packet needs to be a WAV file (including headers).
So if you are not doing that, you must add the headers.
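Wrapping each raw PCM chunk in its own WAV header can be sketched with Python’s standard `wave` module, writing into an in-memory buffer. The defaults of 16 kHz, 16-bit, mono are assumptions, and they must match how the audio was actually recorded. The helper name `pcm_to_wav_frame` is hypothetical.

```python
import io
import wave


def pcm_to_wav_frame(pcm: bytes, sample_rate=16000, sample_width=2, channels=1) -> bytes:
    """Wrap raw PCM bytes in a complete in-memory WAV file.

    The 16 kHz / 16-bit / mono defaults are assumptions; set them to match
    the format of your recording.
    """
    buf = io.BytesIO()
    # wave does not close a file object it did not open, so buf stays readable.
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(sample_width)  # bytes per sample
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)
    return buf.getvalue()
```

For plain PCM, `wave` writes a 44-byte header, so 8192 bytes of audio become an 8236-byte frame. That matches the buffer size from the first post.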