Multiple satellites hear the same request

Is Rhasspy able to determine the volume level of the speech it is processing? If so, this might be a useful way of dealing with multiple satellites that all hear the same request.

If Rhasspy receives the same request from multiple satellites, it could assume that the one with the highest average sound volume was closest to the person making the request, and treat that as the primary satellite - which would be the ONE to receive any response from the server.

If it has the request from multiple satellites and has trouble understanding a segment in the primary one, maybe it could use the secondary sources to try to figure out what was said.
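As a sketch of that idea (the field names here are hypothetical - Rhasspy does not currently expose a per-satellite volume level), the server could collect the same request from each satellite along with an average RMS level, and pick the loudest as the primary:

```javascript
// Hypothetical sketch: pick the primary satellite for a request by
// highest average RMS level. "candidates" would be built by the server
// from the audio it received for the same session.
function pickPrimary(candidates) {
    // candidates: [{ siteId: "kitchen", avgRms: 0.12 }, ...]
    return candidates.reduce((best, c) => (c.avgRms > best.avgRms ? c : best));
}

const primary = pickPrimary([
    { siteId: "kitchen", avgRms: 0.12 },
    { siteId: "livingroom", avgRms: 0.31 },
]);
console.log(primary.siteId);  // → "livingroom"
```

The server would then send its TTS response only to `primary.siteId` and end the other sessions silently.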

I currently have two satellites. There have been occasions when I spoke the wake word and made a request, and BOTH of them responded. If it were just a simple matter of turning a light on or off, with no verbal reply given back, that would be tolerable. But if the request requires a verbal response, it is sometimes difficult to understand when they are both talking.

Not that I know how to do this with Rhasspy, but many KWS engines output an argmax probability in the range 0.0-1.0 as well. The loudest in RMS is likely a good choice, though, or you could run a VAD or personal VAD.
Some, like Porcupine, return just a boolean: you can set the sensitivity, but unfortunately the API never gives you the hit value.
Still, the best KW hit probability is probably a better metric if you can get access to it.
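Combining the two (again with hypothetical field names - not every KWS reports both values), you could rank simultaneous wake-word hits by keyword probability when the engine reports one, and fall back to RMS when it only reports a boolean:

```javascript
// Hypothetical sketch: rank simultaneous wake-word detections.
// Prefer the KWS argmax probability (0.0-1.0) when available;
// fall back to the RMS level when the engine (e.g. Porcupine)
// only reports a boolean hit.
function bestDetection(detections) {
    const score = (d) => (d.kwProbability !== undefined ? d.kwProbability : d.rms);
    return detections.reduce((best, d) => (score(d) > score(best) ? d : best));
}
```

Because both scores are normalised to 0.0-1.0, mixing probability-reporting and boolean-only engines in one comparison is crude but workable.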

Don, you are assuming that HA (or node-RED in my case) has a choice … which would require both requests to be processed concurrently so that it can compare them.

My experience is that node-RED completes processing of the first MQTT request before starting on the second request.

One previous forum thread suggested that (because of the speed of sound) the first request would come from the closest satellite - but in my case (a RasPi Zero and a RasPi 3A+ in an open-plan kitchen / living room) I think CPU speed is a bigger factor. I also get false positives from TV programs.

FWIW, my work-around is to

  1. Increase Kaldi’s confidence level to 40% to filter out most false positives. Also check asrConfidence in node-RED:

```
if (msg.payload.asrConfidence < 0.4) {
    msg.payload = '{ "sessionId": "' + msg.payload.sessionId + '", "siteId": "' + msg.payload.siteId + '", "text": "I didn\'t hear that clearly" }';
    msg.inSession = true;
    return msg;
}
```
  1. Save the last voice command and its timestamp in a global variable, and check whether the same command was given within the last 3 seconds:
```
// node-RED seems to be single-threaded, so we are not currently processing
// another intent - but maybe we only just finished processing the last command?

let currTime = Date.now();
let lastSession = global.get("lastSession");
// node.warn("lastSession.intentName=" + lastSession.intentName + ", lastSession.time=" + lastSession.time);

var seconds = Math.abs(currTime - lastSession.time) / 1000;
// node.warn("last intent received " + seconds + " seconds ago.");
if (seconds < 3) {
    // less than 3 seconds - assume the same intent heard (or mis-heard)
    //  by another satellite
    msg.payload = '{ "sessionId": "' + msg.payload.sessionId + '", "siteId": "' + msg.payload.siteId + '", "text": "" }';  // no voice response
    msg.inSession = true;
    return msg;
}
```

There are some (mostly amusing) occasions when they both respond, but to different interpretations of the same command; and I am still getting false positives from the TV (though I turn the living-room satellite off while the TV is on).

I am still interested in suggestions for better (but affordable) voice recognition hardware/software, and better strategies for processing the voice commands.