audioSessionFrame MQTT topic requiring session ID

itsMattShull · January 22, 2021, 11:18pm

Is there a reason for requiring the MQTT topic to have the session ID for audioSessionFrame? I’m trying to get WAV chunks from the current session using audioSessionFrame so I can pass them through to an HTTP node in Node-Red to call the Azure Speech to Text REST API.

Maybe there’s a simpler way to do this that I’m not seeing. I tried audioFrame’s MQTT topic, but that gave me all the audio. I only want the audio from a session that has taken place. Or maybe there’s an MQTT topic that carries the full WAV file/chunks after it’s done listening that I could use?

Enc3ph4l0n · January 23, 2021, 5:36pm

Have you looked at rhasspy/asr/<siteId>/<sessionId>/audioCaptured?

Alternatively you could use hermes/asr/startListening and hermes/asr/stopListening to capture the session audio and create a WAV file.

itsMattShull · January 25, 2021, 4:30pm

I figured out how to get it. Turns out MQTT has a couple of wildcard options (+ for a single level wildcard and # for a multi-level wildcard).
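
To illustrate how the two wildcards behave, here is a minimal Python sketch of MQTT topic-filter matching. It is a simplification for illustration only (it does not validate that # is the last level and ignores the special handling of topics starting with $), and the site ID "default" and session ID "abc123" are made-up values.

```python
def topic_matches(pattern: str, topic: str) -> bool:
    """Return True if an MQTT subscription pattern matches a concrete topic."""
    pattern_levels = pattern.split("/")
    topic_levels = topic.split("/")
    for i, level in enumerate(pattern_levels):
        if level == "#":
            # Multi-level wildcard: matches the parent level and everything below it.
            return True
        if i >= len(topic_levels):
            # The pattern has more levels than the topic.
            return False
        if level != "+" and level != topic_levels[i]:
            # "+" matches exactly one level; anything else must be identical.
            return False
    # Without "#", both must have the same number of levels.
    return len(pattern_levels) == len(topic_levels)


# Hypothetical topics: "default" is an example site ID, "abc123" an example session ID.
frame_topic = "hermes/audioServer/default/audioFrame"
session_topic = "hermes/audioServer/default/abc123/audioSessionFrame"

print(topic_matches("hermes/audioServer/+/audioFrame", frame_topic))
print(topic_matches("hermes/audioServer/+/audioFrame", session_topic))
print(topic_matches("hermes/audioServer/+/+/audioSessionFrame", session_topic))
print(topic_matches("hermes/audioServer/#", session_topic))
```

Because + covers exactly one level, the audioSessionFrame topic needs two of them (one for the site ID, one for the session ID), while # would cover both in one go.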

So I started using hermes/audioServer/+/+/audioSessionFrame but no chunks are coming through. I thought maybe it was Node-Red so I tried hermes/audioServer/+/audioFrame to see if the wildcard would work (because I knew I could get chunks from that topic). I got chunks coming through!

That makes me think it could be a Rhasspy/MQTT matter. Shouldn’t chunks be coming through audioSessionFrame like they do for audioFrame, but only during a session? @synesthesiam do you happen to know?

Enc3ph4l0n · January 25, 2021, 9:09pm

Ah, then I misunderstood your question and wrongly assumed you knew this already.

itsMattShull · January 25, 2021, 10:22pm

All good @Enc3ph4l0n! I appreciate the help

synesthesiam · January 26, 2021, 9:56pm

audioSessionFrame was only used in the web UI when you hit the “Wake Up” button to avoid conflicting with audioFrame. As of 2.5.10, though, the “Wake Up” button will just trigger a wake word detection and start a normal dialogue session. So audioSessionFrame will no longer be used.

I’ve debated having the dialogue manager re-transmit audio during a session as an audioSessionFrame, but it would mean more MQTT traffic. It could be enabled by an option, perhaps.

As @Enc3ph4l0n said, though, I’d recommend using startListening and stopListening to bracket some listening function.

itsMattShull · January 26, 2021, 10:57pm

@synesthesiam @Enc3ph4l0n would startListening start a session though? So if I say the wake word, and when the wake word is detected via MQTT I call startListening, wouldn’t it start another session? Maybe I’m confused on how it works.

Enc3ph4l0n · January 27, 2021, 11:05am

You don’t “call” it.

You interact with MQTT topics in two ways: subscribe (listen for messages published to a topic) and publish (send a message to a topic). As you have discovered, you can manipulate the topics you SUBSCRIBE to using wildcards. Technical restrictions aside, you could have hundreds of clients/scripts SUBSCRIBED to a topic for any MESSAGES published to the topic, and each of those can then react as it wishes.

For example, let’s say I have a motion sensor in the living room that PUBLISHES a MESSAGE to Home/Living Room/Motion/Detected when motion is detected. The sensor doesn’t care if anything is SUBSCRIBED at all, nor does it care what happens as a result of it publishing, it’s just publishing. “Hey, just letting you know I’ve detected motion in the living room. If you hear this, do what you like”.

There may be absolutely no clients/scripts SUBSCRIBED, and that’s fine. Equally, there might be several. I could have a light script SUBSCRIBED: “Ah, motion. The light isn’t on and it’s dark in here. I’ll turn the light on”. I could have a security script SUBSCRIBED: “Ah, there’s motion in the living room, and nobody is home. I will notify the owner”. The security script could SUBSCRIBE to all motion sensors using a wildcard for the room, Home/+/Motion/Detected: “Ah, there’s motion in the garden, but the owner isn’t home and doesn’t need to know this as it’s probably a bird” OR “Ah, there’s motion in the garden, the owner is home and it could be that pesky cat again”.
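
The motion sensor example can be sketched in a few lines of Python. This is a toy in-memory stand-in for a broker, not a real MQTT client: the TinyBroker class and its handlers are hypothetical names invented for illustration, and only the + wildcard is supported.

```python
from typing import Callable, List, Tuple

Handler = Callable[[str, str], None]


class TinyBroker:
    """Toy in-memory broker: delivers each message to every matching subscriber."""

    def __init__(self) -> None:
        self._subscriptions: List[Tuple[str, Handler]] = []

    def subscribe(self, pattern: str, handler: Handler) -> None:
        self._subscriptions.append((pattern, handler))

    def publish(self, topic: str, message: str) -> int:
        """Deliver the message and return how many subscribers received it."""
        delivered = 0
        for pattern, handler in self._subscriptions:
            if self._matches(pattern, topic):
                handler(topic, message)
                delivered += 1
        return delivered

    @staticmethod
    def _matches(pattern: str, topic: str) -> bool:
        # Same number of levels, and each level is either "+" or identical.
        p, t = pattern.split("/"), topic.split("/")
        return len(p) == len(t) and all(a in ("+", b) for a, b in zip(p, t))


broker = TinyBroker()
log: List[str] = []

# The light script only cares about the living room.
broker.subscribe("Home/Living Room/Motion/Detected",
                 lambda topic, msg: log.append("light: turning on"))
# The security script listens to every room via the "+" wildcard.
broker.subscribe("Home/+/Motion/Detected",
                 lambda topic, msg: log.append(f"security: motion on {topic}"))

# The sensor publishes without knowing who, if anyone, is subscribed.
garden_count = broker.publish("Home/Garden/Motion/Detected", "motion")
living_count = broker.publish("Home/Living Room/Motion/Detected", "motion")
nobody_count = broker.publish("Home/Attic/Door/Opened", "open")
```

The publisher never references the subscribers, which is the point of the example: the garden message reaches only the security script, the living room message reaches both, and the last message reaches nobody without causing an error.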

Looking back at the Rhasspy question, the hotword triggers Rhasspy to PUBLISH to hermes/asr/startListening with a MESSAGE. Rhasspy is SUBSCRIBED to that topic for any MESSAGES published and runs some code. There’s no reason you cannot set up 10x scripts SUBSCRIBED to that topic too, each running different code when it’s published to. You’re not PUBLISHING to it, you’re SUBSCRIBING to it, for any messages that have been published to it and taking action as appropriate.

You can PUBLISH to these MQTT topics too, they’re not restricted to a process. You could trigger Rhasspy to wake by pressing a button if you wanted, as well as a hotword. Program a button to PUBLISH to hermes/hotword/<wakewordId>/detected with a MESSAGE as outlined in the Rhasspy documentation and you’ve got yourself a wake button.
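
As a rough sketch of such a wake button, the helper below only builds the topic and JSON payload. The function name build_wake_message is hypothetical, and the modelId and siteId fields are given as I understand them from the Rhasspy reference, so check them against the documentation for your version before relying on them.

```python
import json
from typing import Tuple


def build_wake_message(wakeword_id: str, site_id: str = "default") -> Tuple[str, str]:
    """Build the (topic, payload) pair for a simulated wake word detection.

    Assumption: the payload is JSON with "modelId" and "siteId" fields;
    verify the exact field list in the Rhasspy documentation.
    """
    topic = f"hermes/hotword/{wakeword_id}/detected"
    payload = json.dumps({"modelId": wakeword_id, "siteId": site_id})
    return topic, payload


# "button" is an arbitrary wake word ID chosen for this example.
topic, payload = build_wake_message("button")
print(topic)
print(payload)
```

The resulting pair can then be handed to whatever MQTT client you use, for example an mqtt out node in Node-Red or a Python client's publish call.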

A non-technical example would be: you start a group chat with 4 friends about catching a film this evening. The topic is “catching a film this evening”, the message is “the time, film and location”. The subscribers are those who accept the group chat. You are PUBLISHING the message to that topic. Friend 1 doesn’t have signal and sadly doesn’t receive the message (loosely MQTT QoS), friend 2 receives the message and agrees to come, friend 3 ignores you as he’s decided the message isn’t relevant to him as it’s a horror film, and friend 4 declines due to the time.

Back to Rhasspy. You could listen for startListening messages to prepare to create a WAV file, then listen for the relevant audioFrame messages, and lastly for stopListening to finalise/process/handle the file as you wish, or something to that effect.

I hope this has helped. I really recommend you do some reading into MQTT. I’m more than happy to provide some sources if you’d prefer but there’s a wealth of material out there if you search. If you have specific questions then reach out.

synesthesiam · January 27, 2021, 2:41pm

To summarize @Enc3ph4l0n:

Subscribe to hermes/asr/startListening, hermes/asr/stopListening, and hermes/audioServer/<siteId>/audioFrame
When you receive a startListening message, do something with audioFrame messages (pass to Azure)
When you receive a stopListening message, ignore audioFrame messages and get transcription from Azure
Send out your transcription as a hermes/asr/textCaptured message
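
The steps above can be sketched as a small state machine. This is only an illustration under a few assumptions: the SessionRecorder class is a hypothetical name, each audioFrame payload is taken to be a complete WAV chunk with its own header, the fallback format of 16 kHz 16-bit mono is a guess, and the Azure call and the final textCaptured publish are left out. Feed it the topic and payload of every message your MQTT client receives.

```python
import io
import wave
from typing import List, Optional, Tuple

START_TOPIC = "hermes/asr/startListening"
STOP_TOPIC = "hermes/asr/stopListening"


class SessionRecorder:
    """Collects audioFrame chunks between startListening and stopListening."""

    def __init__(self) -> None:
        self.recording = False
        self._params: Optional[Tuple[int, int, int]] = None
        self._frames: List[bytes] = []

    def on_message(self, topic: str, payload: bytes) -> Optional[bytes]:
        """Handle one MQTT message; returns a full WAV file when a session ends."""
        if topic == START_TOPIC:
            # A new listening session begins: reset the buffer.
            self.recording = True
            self._params = None
            self._frames = []
        elif topic == STOP_TOPIC and self.recording:
            self.recording = False
            return self._build_wav()
        elif (self.recording
              and topic.startswith("hermes/audioServer/")
              and topic.endswith("/audioFrame")):
            # Assumption: every payload is a small, self-contained WAV file.
            with wave.open(io.BytesIO(payload), "rb") as chunk:
                if self._params is None:
                    self._params = (chunk.getnchannels(),
                                    chunk.getsampwidth(),
                                    chunk.getframerate())
                self._frames.append(chunk.readframes(chunk.getnframes()))
        return None

    def _build_wav(self) -> bytes:
        # Fallback format is an assumption, used only if no frames arrived.
        channels, width, rate = self._params or (1, 2, 16000)
        buffer = io.BytesIO()
        with wave.open(buffer, "wb") as out:
            out.setnchannels(channels)
            out.setsampwidth(width)
            out.setframerate(rate)
            out.writeframes(b"".join(self._frames))
        return buffer.getvalue()
```

Frames that arrive outside a session are dropped, so the returned bytes contain only the audio from that one session and can be sent to a speech-to-text service as a single WAV file. With more than one satellite, you would probably also want to compare the site ID in the audioFrame topic against the one in the startListening message.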

itsMattShull · January 28, 2021, 8:16pm

Perfect, thanks you two!