I love the opportunity to build a voice control for our smart home, which runs on FHEM.
Thanks to everyone who works on and develops the system!
I live in a bilingual household, so I would like to be able to speak in two languages: German and Spanish.
Now I have two issues:
Multiple Satellites:
To overcome the problems that come with the (compared to Alexa/Google Home) poor microphone performance, I installed multiple RPi3B+ReSpeaker-2-HAT-powered satellites in our living room.
How do I make sure that only one of them reacts to the wake word/commands, even if two have understood it correctly?
I thought I read something about “grouping the devices by naming them properly”, but I cannot find the thread/manual anymore.
Can someone please give me a hint?
Multiple Masters:
I run two masters on Proxmox, one German and one Spanish.
I would now like to connect a satellite to both masters so that I do not need so much hardware.
What is already possible is to restrict the handling of satellite input by the site_id: a master only responds to preconfigured satellites.
What I would need (so this is perhaps more of a feature request) is to define the handling by wake word.
So if I say “Porcupine” (a bad example: I think Porcupine does not allow multiple wake words), the satellite starts streaming and only the German master listens. But if I say e.g. “Torro Loco”, the Spanish master responds.
Such a feature is not only useful for different languages, it could also help to overcome some problems with the STT components: one master does not have to “learn” all the words. You can split the vocabulary and thereby reduce misunderstandings.
Or you can set up an English-speaking master to handle all your media stuff, as most of my songs/playlists/bands have English titles.
Is this possible or at least imaginable?
greetings
Philipp
PS: I also installed squeezelite on the satellites, which uses a USB sound card for sound output. So it is a combined voice control and multi-room sound system, yeah!
There isn’t really, and I have been banging on about this for some time: if you have two satellite mics of the same hardware running the same model, the argmax and stream of those two mics are a goldmine of data.
If you are thinking of going that way, then scrap the idea of the ReSpeaker HATs: the second mic is a whole load of pointless, its drivers are a pain, and more importantly the mics are omnidirectional rather than unidirectional.
Omnis are great because they don’t suffer from mic proximity, but without the DSP algorithms to process the signal from an array they have little advantage over old-style electret unidirectional mics.
With two omnis, irrespective of where you are, there will be very little difference to use for selecting a mic, and noise from all directions will be present at high volume.
Unidirectional mics have been stage mics and telephone mics for decades, especially in multi-mic environments, and they are in the peculiar situation that all their flaws for array DSP actually become advantages.
Due to the directionality and the mic proximity effect, the near mic will provide a far better signal than the far mic, and that golden piece of data should be used to select the stream for that session.
Each mic should be grouped into a room zone and the best KW hit should initiate and select the stream for the current room session.
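To make the idea concrete, here is a toy sketch (my own illustration, nothing from Rhasspy; mic names and scores are made up) of that selection step, assuming each mic in a room zone reports a KW confidence score:

    # Toy illustration of near/far selection within one room zone:
    # whichever mic produced the strongest wake word hit wins the session,
    # and only that mic's stream is forwarded for ASR.
    kw_hits = {"corner_mic_east": 0.93, "corner_mic_west": 0.61}  # hypothetical scores
    best_mic = max(kw_hits, key=kw_hits.get)
    print("streaming from", best_mic, "for this session")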
When it comes to Rhasspy, the simple answer is no, and it is a shame, because that simple near/far mechanism is instantly a way you can challenge the big assistant manufacturers: multiple mics, if positioned correctly, can be voice-near/noise-far, often by merely occupying opposing corners of a room.
Even if there were a mechanism to select a mic in Rhasspy, your current mics and hardware offer few criteria for making the selection, as the KW results in the presence of noise will likely be poor and similar for both.
A USB sound card’s analogue input fed by a unidirectional mic would be much better, but the current crop of KWS that Rhasspy has is extremely poor when it comes to current methods, never mind state of the art.
When it comes to KWS and the state of the art, Google is the one coming to your rescue: they have just published some open source with some amazing KWS examples accompanying what they are doing with TensorFlow Lite.
https://github.com/google-research/google-research/tree/master/kws_streaming is a framework with a collection of KWS models; all you have to do is feed them an audio stream.
I did some benchmarks, and a state-of-the-art 20 ms streaming KWS will run at as little as 20% load on a single core of a Pi 3.
I copied their repo just to make it a little easier to use, but making models is extremely easy and inference is a couple of lines of code.
If it interests you, you can ask me questions here: https://github.com/StuartIanNaylor/google-kws, and I can share some scripts and tips on how to install and run it.
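As a rough idea of how little code it takes, here is my own sketch (not from their repo): the model file name, frame size and label list are assumptions, and it presumes a streaming model exported with internal state so you can feed it one 20 ms frame at a time.

    # Sketch: feed 20 ms audio frames to a streaming KWS model exported as TFLite.
    import numpy as np
    import sounddevice as sd
    import tflite_runtime.interpreter as tflite

    interpreter = tflite.Interpreter(model_path="kws_stream.tflite")  # hypothetical model file
    interpreter.allocate_tensors()
    inp = interpreter.get_input_details()[0]
    out = interpreter.get_output_details()[0]

    frame_len = int(inp["shape"][-1])  # e.g. 320 samples = 20 ms at 16 kHz
    labels = ["silence", "unknown", "keyword_de", "keyword_es"]  # assumed label order

    def on_audio(indata, frames, time_info, status):
        # One frame per callback; the streaming model keeps its own internal state.
        frame = indata[:, 0].astype(np.float32).reshape(inp["shape"])
        interpreter.set_tensor(inp["index"], frame)
        interpreter.invoke()
        scores = interpreter.get_tensor(out["index"])[0]
        best = int(np.argmax(scores))
        if labels[best] not in ("silence", "unknown") and scores[best] > 0.9:
            print("keyword:", labels[best], float(scores[best]))

    with sd.InputStream(samplerate=16000, channels=1, blocksize=frame_len, callback=on_audio):
        sd.sleep(60_000)  # listen for a minute

With two keyword classes like that you could also trigger a different master per keyword, which is basically the language-routing idea you describe.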
Squeezelite is great, even though I am a fan of Snapcast, but I am not a fan of installing speakers together with your mic: apart from the noise problems, the optimum speaker setup is really the opposite of the optimum mic setup.
Mics in opposite corners against the side walls are not really where you would situate your speakers, which generally go on a facing wall with maximum stereo separation.
A Pi Zero and a USB sound card, even if added for the speakers alone, is still low cost and should work great with either squeezelite or snapcast.
But you can still run a wire from a mic to a speaker, or from a speaker to a mic, as having input and output on the same sound card is critical for the EC (echo cancellation) we have.
It is called echo cancellation, but it just strips the played sound from the mic input, synced by the clock of the sound card, and it actually works quite well.
A unidirectional mic also has natural directional noise suppression here, due to the rear diaphragm that creates the directionality.
If you look at the low load of the Google KWS models, it is not only possible to run multiple KWs; it is probably possible to run multiple instances of multiple KWs on different mics by simply adding another USB sound card, where again a HAT is totally useless.
If Proxmox is running on x86_64, I would have a look at using https://speechbrain.github.io/ for the brain, as there is no need to run two instances: it should be very possible to run language KW models and, on a keyword hit, send a signal to the brain to load that language model for that session.
I am the audio guy who always bangs on about how in reality many HATs are pointless, sold only because people presume they have some advantage that they do not.
I also often bang on about not liking the infrastructure of Rhasspy, and I don’t use it, so someone else will have to tell you whether it can switch models or do any of what you are looking at; my inclination is that the answer is really no. Hence you might be better off with a supposedly ‘easy to use’ toolkit, the Google KWS and some DIY, as there is some amazing state-of-the-art open source out there. It is a bit frustrating, because for the brain of a platform all you need is just a touch more than a Pi 4, and there is some amazing-quality CPU-based TTS now that doesn’t need much more than a Pi 4.
Pis make great Squeezelite/Snapcast/KWS and what I call shelf devices; due to the diversification of multi-room voice AI, you can do much more by sharing a single, more powerful brain.
Thanks for the reply. But everything sounds very complicated and too high-tech.
I am quite satisfied with the wake word detection and also with the intent recognition.
And I am willing and able to set up more satellites and/or more masters.
All I have are software/protocol issues, which can be solved in software and which, as I am now aware, are already implemented in some way:
I looked around in the dialogue hermes code, and everything I think I need is already there:
Configure a master to only listen to a set of wake words (i.e. only start a session for them) (around line 825):
elif isinstance(message, HotwordDetected):
    # Wakeword detected
    assert topic, "Missing topic"
    wakeword_id = HotwordDetected.get_wakeword_id(topic)
    if (not self.wakeword_ids) or (wakeword_id in self.wakeword_ids):
        async for wake_result in self.handle_wake(wakeword_id, message):
            yield wake_result
    else:
        _LOGGER.warning("Ignoring wake word id=%s", wakeword_id)
and to put sites in different groups (around line 666):
if group_id:
    # Check if a session from the same group is already active.
    # If so, ignore this wake up.
    for session in self.all_sessions.values():
        if session.group_id == group_id:
            _LOGGER.debug(
                "Group %s already has a session (%s). Ignoring wake word detection from %s.",
                group_id,
                session.site_id,
                detected.site_id,
            )
            return
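If I read that correctly, the grouping boils down to something like this (my own illustration, not code from the repo; the site names are made up):

    # Illustration: with group_separator "." the sites "livingroom.sat1" and
    # "livingroom.sat2" fall into the same group "livingroom", so only the
    # first wake word hit in that room starts a session.
    def group_of(site_id: str, group_separator: str = ".") -> str:
        return site_id.split(group_separator)[0] if group_separator in site_id else site_id

    assert group_of("livingroom.sat1") == group_of("livingroom.sat2") == "livingroom"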
But I have no clue how to configure these things. In the WebUI there is no place to define the wakeword_ids for the dialogue manager, nor is there a way to define the group separator that is needed for the grouping.
So I keep getting blocked. What did I do wrong? How do I configure a proper allowed wake word?
I use Raven as the wake word system, which works quite well with two wake words.
You can simply select a different wake word for each master. Let them all connect to the same broker, configured as external MQTT. That could be the Home Assistant MQTT addon or some other broker.
Master A (German): connect to broker X, wake word W1
Master B (Spanish): connect to broker X, wake word W2
All satellites: connect to broker X
You can set the group_separator to “.”, but only via the “Advanced” settings.
Then use settings along these lines:
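For the group_separator, the relevant bit in profile.json would be roughly this (the key name is from memory, so double-check it against your version):

    "dialogue": {
      "group_separator": "."
    }

and then name the satellites e.g. livingroom.sat1 and livingroom.sat2, so they end up in the same group.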
Where do I configure these things? Where can I find the “Advanced” settings? Only in the profile.json?
Same for the wake words per master: where can I configure those?
Is this not documented anywhere?
I have now configured these things “hardcoded” to play around a little.
I connected the Spanish master to the German master’s MQTT server, and it did not work out:
The session was indeed handled by the master corresponding to the wake word. But the services of both masters were activated:
E.g. when the German master asked for STT, Kaldi responded twice, both results were given to both NLUs, and finally I ended up with 8 answers.
(I have already closed my logs and notebook, so the description is a little fuzzy, sorry, but I hope you understand what I mean.)
Tomorrow I will try with a completely external MQTT server.
It might be that this setup does not work due to the overlapping settings connected to just one broker. Rhasspy was not designed that way, so it is possible that this just does not work.
If so, the setups should be separated from each other.
I have been giving this some thought, and I do not think multiple masters will work, because there is no distinction in the Hermes topics.
Master G does not know whether a message belongs to it and whether it should react to it.
Master S does not know that either, so you will always get double responses, I think.
You can trigger Spanish or German with the setup I suggested, but both G and S will respond to the messages on the broker.
I did not know that I can edit the profile.json via the WebGUI; I have always done it with an external editor.
But even in the WebGUI there is no hint that a group_separator can be set, so you just have to know that this function exists (see 3)).
Those are the settings for the wake word service. I need those settings for a master which does not have a mic of its own and is only served by satellites.
Hardcoded: I edited usr/lib/rhasspy/rhasspy-/rhasspy/__init__.py directly.
And concerning the design:
It is a real pity, because it works quite well when you restrict a master to certain site_ids. Then you can have two masters running on the same MQTT server and only one serves the requests.
I will do some more tests, but right now I can see two paths to victory:
I configure the satellites and the masters to use external MQTT servers and bridge everything, so that I have a structure like this:
sat <—> german
sat <—> spanish
As the dialogue manager triggers the services, there should be no overlap; we will see…
Copy the behaviour of the site_id for the wakeword_id. This is more work, because the wakeword_id has to be part of the payload and there must be a place to configure this, but on the other hand there is perhaps no need to invent anything, because one can just copy the mechanism of the site_id.
The second seems to be the better option, because there is no need for a special external configuration, and for my use case it is obvious that the ID is a combination of site_id and wakeword_id.
But as I do not really understand Python, I will do some more digging and perhaps come up with a robust feature request which “just” needs implementation but not much thinking…
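Roughly what I have in mind for path 2, just as an illustration in Python (this is not existing Rhasspy code, only how I imagine the check; the function and parameter names are invented):

    # Hypothetical filter: mirror the existing site_id check and additionally
    # require that the wakeword_id (taken from the payload) is on the allowed list.
    def accepts(site_id, wakeword_id, allowed_site_ids, allowed_wakeword_ids):
        site_ok = (not allowed_site_ids) or (site_id in allowed_site_ids)
        wake_ok = (not allowed_wakeword_ids) or (wakeword_id in allowed_wakeword_ids)
        return site_ok and wake_ok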
For the group_separator, you are indeed correct. It is missing here:
The wake word service is listening on the hermes/audioServer/<siteId>/audioFrame topics.
If you add the site IDs in the “Satellite siteIds:” field, the master will listen on those topics and the wake word triggers when you speak it.
You have to set the wake word (and other settings) on the master and disable it on the satellite.
You can follow this tutorial for that:
1: probably fails: they both use the same broker, so both masters and satellites will (try to) react.
2: probably fails: the problem is that you want every satellite connected to every master, and currently there is no way of telling which is which. You need some way to track which master was triggered, so that when master G starts a dialogue session, only master G handles the responses. Currently every master will respond to the messages, which is why you get multiple answers. So to support your setup, a masterId property would have to be introduced to the messages, or something like that.
There is a customData field, but I do not know if that is usable.
> The wake word service is listening on the hermes/audioServer/<siteId>/audioFrame topics.
> If you add the site IDs in the “Satellite siteIds:” field, the master will listen on those topics and the wake word triggers when you speak it.
> You have to set the wake word (and other settings) on the master and disable it on the satellite.
> You can follow this tutorial for that:
But as it is stated there, to reduce traffic on the network, one can keep the wake word service on the satellites and only stream to the masters once the wake word is detected.
And that is what I want to do. For now, I am satisfied with my solution (putting it right in the source code), because in the end, once you have found a good wake word, you do not want to change it all the time. And as it is a positive list, even if you add a new master with a special wake word, you do not have to touch the existing ones.
> 1: probably fails: they both use the same broker, so both masters and satellites will (try to) react.
Like in your tutorial: on the satellite I only have Audio Recording, Audio Playing and the Wake Word service activated, and the rest runs on the masters.
And then I send (bridge) all the topics from the satellite to both masters. My analysis shows that in this configuration the satellite usually only publishes three things:
rhasspywake_snowboy_hermes: Publishing 178 bytes(s) to hermes/hotword/alexa/detected
rhasspyspeakers_cli_hermes: Publishing 99 bytes(s) to hermes/audioServer/lebenXKochen/playFinished
and the continuous audio stream, once activated.
hermes/hotword/<wakewordId>/detected will be analyzed by both, and one will drop it as configured.
And the audioServer messages will (hopefully) not confuse the non-addressed master.
All MQTT messages from the masters are sent (bridged) only to the satellite, so the other master does not listen to them.
I think it all depends on the extra latency and on whether it is possible to bridge audioFrames.
> 2: probably fails: the problem is that you want every satellite connected to every master, and currently there is no way of telling which is which.
I am still looking into it: if a service can decide, based on the site_id in the payload, whether to handle a message or not, then it should be possible to upgrade that function to also look at the wakeword_id, which obviously needs to become part of the payload.
> So to support your setup, a masterId property would have to be introduced to the messages, or something like that.
Exactly. And right now I assume that this masterId is the combination of the site_id and the wakeword_id. The funny thing is that those two are already the first part of the session ID:
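For illustration, a session ID from my logs looks roughly like this (the exact value below is shortened and made up, only the siteId-wakewordId prefix is the point):

    # Assumed/illustrative: the session id already starts with siteId-wakewordId.
    session_id = "lebenXKochen-alexa-0f3a..."  # shortened, hypothetical example
    site_id, wakeword_id = session_id.split("-")[:2]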
Edit 1: first result from connecting the MQTT servers (at this stage one satellite and one master, just to test the bridging):
Bridging all topics from Sat to German and all topics from German to Sat results in a loop.
Correct, but the problem with that setup is that the wakeword will trigger both masters.
If you set the wake word on the master, it will only trigger that specific master.
If you find and create a good solution for this, the Rhasspy developers will probably be happy to accept a PR for the changes
A first bit of light at the end of the tunnel:
Two masters on Proxmox, each running in a virtual machine and each using its internal MQTT broker on port 12183.
One satellite using an external MQTT broker on localhost:1883, with the following additions to the conf file:
connection bridge2Spain
address 192.168.2.122:12183
try_private true
topic hermes/audioServer/+/audioFrame out
topic hermes/audioServer/+/playFinished out
topic hermes/hotword/+/detected out
topic hermes/asr/# in
topic hermes/audioServer/+/playBytes/# in
topic hermes/dialogueManager/# in
topic hermes/hotword/toggleOff in
topic hermes/hotword/toggleOn in
topic hermes/intent/# in
topic hermes/nlu/# in
topic hermes/tts/# in
connection bridge2Germany
address 192.168.2.121:12183
try_private true
topic hermes/audioServer/+/audioFrame out
topic hermes/audioServer/+/playFinished out
topic hermes/hotword/+/detected out
topic hermes/asr/# in
topic hermes/audioServer/+/playBytes/# in
topic hermes/dialogueManager/# in
topic hermes/hotword/toggleOff in
topic hermes/hotword/toggleOn in
topic hermes/intent/# in
topic hermes/nlu/# in
topic hermes/tts/# in
Additionally, you have to edit this file on each master:
# self.wakeword_ids: typing.Set[str] = set(wakeword_ids or [])
# ZENTRALINSTANZ DE (central instance DE):
# self.wakeword_ids = ["alexa"]
self.wakeword_ids = ["snowboy"]
(or the wakewords the other way round)
And I only got an answer from the specific, addressed master!
Now I need to reimplement the group separator, try to scale it a bit, and try it with multiple satellites.
The final architecture would be one MQTT server for all satellites, which does all the bridging to the masters.
Nice work
I don’t understand why you changed the __init__.py, though; you can set a different wake word for each master in the config, right?
Also, I use Docker, so I can’t change __init__.py (well, I could, but I’d rather not).
Ah yes, you already mentioned that sorry about that.
If this works, it might be worth a pull request, but you would have to make the wakeword_ids configurable through the WebUI.