Hi!
I want to set up a multi-room environment and place a microphone in each room. Assuming I run Rhasspy in a Docker container on a Synology NAS (the NAS has an Intel Atom C2538 and 16 GB RAM):
What hardware would you suggest? Audio playback is second priority. The important thing is speech recognition.
The added power consumption should be minimized.
Rooms: ~6
Housings can be 3D printed…
I am trying to do something similar. The pitfalls I have encountered so far: the satellites have to be more powerful than a Raspi 0W, so I went with Raspi 3s, and some features on the main server, like Larynx, might require the AVX instruction set on the CPU to work acceptably.
AVX is hard to find on a low end CPU and the Atom CPUs do not have it.
The lowest CPUs that offer AVX are the Pentium Gold 6500Y and 7505; after that it is the Core series and up.
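If you want to check a machine before buying or reusing it, on Linux the CPU feature flags are listed in /proc/cpuinfo. Here is a minimal Python sketch that looks for the avx flag. The function names are my own, not part of Rhasspy or Larynx, and it assumes a Linux host (a container normally sees the host's /proc/cpuinfo).

```python
def cpu_flags(cpuinfo_text):
    """Return the set of CPU feature flags from /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        # x86 kernels label the line "flags"; ARM kernels use "Features".
        if key.strip().lower() in ("flags", "features"):
            return set(value.split())
    return set()


def has_avx(cpuinfo_text):
    """True if the 'avx' flag is present in the given cpuinfo text."""
    return "avx" in cpu_flags(cpuinfo_text)


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            text = f.read()
    except OSError:
        # Not Linux, or /proc is not readable: report no flags found.
        text = ""
    print("AVX supported:", has_avx(text))
```

On an Atom like the C2538 this should report False, and on a Core series CPU True, but treat it as a quick check rather than a guarantee that a given feature will run well.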
That is a pity, because these low-power Raspis would be ideal for that. What I still do not understand:
Why not just use a small Raspi to record and send the audio stream?
Does this mean you have to have more compute power on the base site?
That is actually the way a satellite works, but it still requires a little CPU power.
The satellite has to record continuously and search the audio for the wakeword pattern. When it finds the wakeword, it records the voice command while scanning the recording for the silence that marks the end of the command. Finally, it has to send the recording over the network.
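To give an idea of the "search for silence" step, here is a rough Python sketch that collects audio frames after the wakeword and stops once it sees a run of quiet frames. This is only an illustration using a simple loudness threshold, not how Rhasspy actually does it; the function names, the threshold of 500 and the limit of 3 silent frames are all values I made up.

```python
import math


def rms(frame):
    """Root-mean-square loudness of one frame of 16-bit sample values."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def record_command(frames, silence_threshold=500.0, max_silent_frames=3):
    """Collect frames after the wakeword until the speaker goes quiet.

    frames: iterable of frames, each a sequence of sample values.
    Recording starts at the first loud frame and stops after
    max_silent_frames quiet frames in a row; the trailing quiet
    frames are dropped from the result.
    """
    command = []
    silent_run = 0
    heard_speech = False
    for frame in frames:
        if rms(frame) >= silence_threshold:
            heard_speech = True
            silent_run = 0
        elif heard_speech:
            silent_run += 1
        if heard_speech:
            command.append(frame)
        if silent_run >= max_silent_frames:
            break
    if silent_run:
        command = command[:-silent_run]  # strip trailing silence
    return command
```

Even this toy version has to compute something for every frame, all the time, and real wakeword detection is a lot heavier than a loudness check. That is where the CPU power on the satellite goes.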
The Raspi 0 can do this, but it adds a really noticeable delay to the process compared to a more powerful unit.
I did not test the Raspis between the 0 and the 3, since I had some requirements that ruled them out, and some of them were no longer available for purchase.
The base site does not really need a lot more power unless you have many users activating the satellites all the time and at the same time. I guess most of the time it will only be parsing one voice command at a time and then the number of satellites means nothing.
Well, to be honest, there may be better microphone hardware around. I really don’t know. I bought those HATs some time ago because I didn’t know better. And I still don’t.
I use both the Seeed Respeaker 4-mic linear array HAT and the 2-mic HAT.
The 4-mic array is slightly better when I listen to music and try to give commands and the distance is a bit better too, but not much.
I think there are better options out there, but like schnopsi, I bought those before knowing much about it.
I agree with WallyDW that a Raspberry Pi Zero W is a bit underpowered even as a satellite (where it is listening for the wakeword but then passing the serious processing off to a base station). It does do the job, but a Raspberry Pi 3A+ or the new RasPi Zero 2 W responds noticeably quicker to your wakeword, so you don’t have to pause so long before giving your command, and without needing the extra I/O hardware and expense of a RasPi 3B.
I suspect that (while several devices are mentioned in the Rhasspy documentation) there is no formal recommendation for satellite hardware because no combination has yet proved significantly better.
I personally have a reSpeaker 4-mic HAT, a reSpeaker 2-mic HAT, and an Adafruit Voice Bonnet … and they are all made from almost identical hardware and use the same Seeed driver.
While these multi-mic devices have the hardware capability, it appears to be left to system integrators to add Digital Signal Processing (DSP) software into their products. It is the DSP software which will integrate the multiple mics and provide Acoustic Echo Cancellation (AEC), Beamforming, Noise Suppression (NS), etc. As Rolyan has often pointed out, we are tricking ourselves by assuming that multiple mics are automatically better; when it is actually the DSP software which is key. Without DSP, we might as well use a USB sound card with regular microphone and speaker - and yes, one of my satellites is using that and giving much the same result.
I am hoping that the ESP32-S3 (discussed in this thread) will prove to have the desired AI and DSP features - and at a much better price than a RasPi with multi-mic HAT.
As for the base station, I have only a 2-bedroom apartment with 2 humans, and I am still happily using a RasPi 4 running Home Assistant OS and the Rhasspy add-on …though I expect that upgrading to a “better” PC is in my future.
For example, the reSpeaker USB 4 Mic Array claims to feature “built-in AEC, VAD, DOA, Beamforming and NS”. Note that this is NOT a serious product suggestion, as it costs US$69 and it looks as though Seeed has stopped supporting it, so the driver is likely to be out of date.