Multiple satellites / who to respond to?

Say you have multiple satellites set up, for example one in the living room and one in the kitchen. I am standing in the doorway between them and I say the wake word. Both pick it up and both receive the command.

Will it try to execute the command twice, or is there something in place to handle this?

Thanks, just random thinking :slight_smile:

A lot of audio processing is missing from the equation, but much of it is quite simple.

The simplest method is to make the distributed mic with the best VAD (voice activity detection) rating the designated mic.
Tools like DoA (direction of arrival) can also help the AI decide.
But without an authoritative server it will always be messy without considerable processing power.
Having an authoritative server means each satellite can be lower tech, while a single higher-powered processor handles recognition and provides coordination and authorisation.
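
As a rough sketch of the "best VAD rating wins" idea, the authoritative server could collect one report per satellite and keep the highest score. The names `Detection`, `site_id`, `vad_score` and `pick_designated_mic` are made up for illustration and are not part of Rhasspy or Hermes; how the score is actually measured is left open here.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    site_id: str      # hypothetical satellite identifier
    vad_score: float  # hypothetical speech-quality score, higher is better


def pick_designated_mic(detections):
    """Return the site_id with the highest VAD score, or None if there are no reports."""
    if not detections:
        return None
    return max(detections, key=lambda d: d.vad_score).site_id


# Both satellites heard the wake word; the server keeps the better one.
detections = [Detection("living_room", 0.62), Detection("kitchen", 0.81)]
print(pick_designated_mic(detections))
```

The satellites only need to report a score, so they can stay cheap and the decision lives in one place.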

The Google and Amazon units are relatively low-accuracy devices: the wake word and the action only continue if there is no authorisation override from the single server, which is the cloud.

Some people want more privacy from big data, and that is possible at low cost with what are effectively off-the-shelf devices: satellites sharing a single edge device whose processing power is the private ‘home cloud’ of the system. That means the satellites can be quite cheap and only one edge processor is needed.

So it depends on your setup: either isolated clients with no coordination, or master-satellite with an authoritative server.
Some have mesh networks, but IMHO that means each satellite is much more than a relatively dumb distributed mic and sound source, is likely to cost much more, and has a lot of dormant processing power.
Because of diversity of use, a single master can be authoritative for quite a number of satellites at much lower cost and with less system redundancy.

So the Master will tell you!

They will never pick up the wake word at exactly the same time, so most probably the one that picks it up first will execute the command.
Also, it will execute only once, because the hotword is not triggered again during a voice command.

In all cases: just try it out :slight_smile:

I do not think that Rhasspy dialogue management handles session deduplication. The Hermes protocol does not provide any way to handle that natively either.

I think the command will probably be executed multiple times (once for each triggered satellite).

@synesthesiam If I am right, this might be a good addition to the Rhasspy 2.5 dialogue service.

To handle this in my homemade dialogue manager, I just cancel a new session if a session was already started with the same wake word on a different site within the last 500 ms. That was the best method I could find and it works pretty well.
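
Not the actual code, just a minimal sketch of that 500 ms rule, assuming the dialogue manager sees a wake word id, a site id and a timestamp for every detection. `WakeDeduplicator` and `should_start_session` are hypothetical names and are not part of Rhasspy or the Hermes protocol.

```python
class WakeDeduplicator:
    """Suppress a wake word detection if another site already started a session recently."""

    def __init__(self, window_s=0.5):
        self.window_s = window_s  # 500 ms window from the description above
        self._last = {}           # wakeword_id -> (site_id, timestamp) of last accepted session

    def should_start_session(self, wakeword_id, site_id, now):
        previous = self._last.get(wakeword_id)
        if previous is not None:
            prev_site, prev_time = previous
            # Same wake word, different site, still inside the window: cancel.
            if prev_site != site_id and (now - prev_time) < self.window_s:
                return False
        self._last[wakeword_id] = (site_id, now)
        return True


dedup = WakeDeduplicator()
first = dedup.should_start_session("default", "living_room", now=10.0)
second = dedup.should_start_session("default", "kitchen", now=10.2)
later = dedup.should_start_session("default", "kitchen", now=11.0)
print(first, second, later)
```

Note that this is "first in wins": the satellite whose detection arrives first keeps the session, whether or not it heard the speaker best.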

I think this is exactly how Alexa is doing it.

I think Alexa is authoritative and the cloud picks the best-weighted recognition, or it is some form of VAD/beamforming done on the device or in the cloud.

I have seen several videos where Alexa always chooses the Echo nearest to the voice, so it is not just on a first-in basis.

If you mix and match hardware, then the latest and fastest will always be first, and from what I have seen it is not working that way. Somehow Alexa is selective across device generations, based on proximity and the quality of the received voice.
In fact the whole thing is mostly authoritative: the devices are relatively low accuracy, needing only enough accuracy to initiate a session, and everything else is cloud side with authoritative overrides and coordination.

Beamforming in a wide-array microphone system is very easy, as it is basically volume and clarity mixing of the best-performing mic(s).
But yeah, it is some form of best-signal selection, and many of the Chinese systems seem to be doing it via VAD. I think VAD is not much more than spectral RMS.
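
To illustrate the kind of measure meant here, a minimal sketch of a plain RMS over one frame of samples. This is a simplification: it works in the time domain rather than on a spectrum, and a real VAD does more than this. `rms` and the two sample frames are made up for illustration.

```python
import math


def rms(samples):
    """Root mean square of one audio frame (a crude loudness measure)."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


# Hypothetical frames of the same utterance as heard by two satellites.
near_frame = [3000, -3000, 3000, -3000]
far_frame = [500, -500, 500, -500]

# The frame with the higher RMS would be treated as the better signal.
print(rms(near_frame) > rms(far_frame))
```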

The reason Alexa keeps recordings, rather than just the recognised data, is that it is an authoritative self-training model that is also based on usage recordings and data.
From the point of the click and the initial wake word, a stream is created to an authoritative cloud so that response latency is reduced to a minimum.
If you are recording files and waiting for assumed lengths of silence, then splitting, sending and receiving, the round-trip latency is going to grow significantly.
Snapcast is probably a better audio transport for satellite systems than Hermes audio, as it can stream latency-adjusted audio in both directions.
I have a hunch Alexa has a data protocol like Hermes, but the audio is a separate continuous compressed stream from the active wake word process to action or cancellation.


Near the end of life of “open” Snips, the dev team added this functionality.
In the config, we were able to “group” satellites. So if the same command arrived from several satellites in the same group, Snips processed only one. Or maybe it was done with the hotword…

I do not remember it well, because I did not have enough time to test it and had no need for it.

Psycho88 had also worked on a solution before:


I am not sure the Hermes-audio-server is the way to go for satellites, in that streams are far more beneficial than encapsulated WAV files.
The Hermes protocol is still very much needed for control and coordination, but I am thinking Snapcast might be just the thing for satellite audio.

I could have two or four satellites in a room, and rather than multiroom I am going to set up latency-adjusted multi-speaker single-room audio.
That way my satellites are also my media speakers, and they will latency-sync to a single audio source on a control protocol.
I need to have a closer look at Snapcast; each room may have to be a separate server instance, and I am also really interested in whether you can do it the other way round.
Can you have several servers latency-sync to separate client instances running on a host for mic input?

In fact I am not sure how much of the Hermes protocol is needed with satellites, as much can be encapsulated in stream metadata.
I am planning to have a play with all that after my satellite hardware quest, even if it means some form of Docker swarm on the authoritative server.