What is the current recommendation for a good mic array (with pre-processing such as beamforming to cope better with background noise) to use with Rhasspy?
Reading the MATRIX forum, it looks like their products, such as the MATRIX Voice, are basically dead, with no support anymore.
There are many on here like myself using reSpeaker multi-mic HATs with Raspberry Pis as Rhasspy satellites. They are a fairly easy, fairly cheap combination … but (as rolyan repeatedly points out) they do NOT have the software (in firmware, device driver or Rhasspy) to do beamforming or any of the other Digital Signal Processing (DSP) that justifies having multiple microphones.
There are some mic arrays with DSP built in … but at price points that make Alexa and Google Home the more obvious choice, so they are not discussed here very often.
The more I think about it, the more it seems that the DSP multi-mic devices are targeting the business conference-room microphone market, where there is little background noise … nothing like me trying to give Rhasspy a command in my living room with the TV playing.
First, thanks a lot for your really quick reply, which is much appreciated!
I was also considering the reSpeaker 4- or 6-mic products, but I read in several places that you then either simply combine the inputs of all microphones (which tends to make things worse) or just pick a single microphone as the input (which completely misses the point of a microphone array). That’s why I stopped considering the reSpeaker products. Or do these limitations no longer apply?
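To see why simply combining all the inputs can make things worse, here is a minimal Python sketch (my own illustration, not taken from any reSpeaker software). It assumes two mics, a 16 kHz sample rate and a sound that reaches the second mic 2 samples (125 µs) after the first. Averaging the two channels without lining them up first acts as a comb filter: 1 kHz drops to about 0.92, 2 kHz to about 0.71, and 4 kHz, where the delay is exactly half a period, cancels completely.

```python
import math


def sine(freq_hz, fs_hz, n_samples):
    """Unit-amplitude test tone."""
    return [math.sin(2 * math.pi * freq_hz * n / fs_hz) for n in range(n_samples)]


def naive_sum(x, delay_samples):
    """Average a signal with a copy of itself delayed by delay_samples.

    This is what plain averaging of two mics gives when the sound reaches the
    second mic delay_samples later. The first delay_samples outputs are
    skipped so only the steady state is kept."""
    return [(x[n] + x[n - delay_samples]) / 2 for n in range(delay_samples, len(x))]


def peak(x):
    return max(abs(v) for v in x)


FS = 16000  # assumed sample rate, Hz
DELAY = 2   # assumed arrival difference between the two mics, samples (125 us)

for f in (1000, 2000, 4000):
    print(f, "Hz -> peak", round(peak(naive_sum(sine(f, FS, 1600), DELAY)), 3))
```

The delay depends on where the talker stands, so the notches move as the talker moves; a real beamformer has to compensate for that delay per direction instead of just adding channels.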
My understanding of beamforming is that it detects the “loudest” audio source and tries to cancel out other background noise. That might also work in scenarios where, e.g., music is playing in the background, so I would expect such DSP to suit a “living room scenario” too, if one is OK with somewhat higher costs. Has anybody here tried such a DSP product?
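For reference, the core of a delay-and-sum beamformer (the simplest kind) is small. The sketch below is a hedged, narrowband illustration assuming a hypothetical 4-mic line array with 30 mm spacing and a single 2 kHz tone; it is not what any particular product runs. Note that the beamformer itself does not detect anything: you tell it which direction to steer to, and it passes that direction at full gain while attenuating others (here a TV at 60 degrees comes out at roughly half amplitude).

```python
import cmath
import math

C = 343.0  # speed of sound in air, m/s


def delays(mic_x_m, angle_deg):
    """Far-field arrival delays (s) at each mic of a linear array for a plane
    wave arriving from angle_deg (0 = broadside, i.e. straight in front)."""
    s = math.sin(math.radians(angle_deg))
    return [x * s / C for x in mic_x_m]


def das_gain(mic_x_m, steer_deg, source_deg, freq_hz):
    """Narrowband delay-and-sum: undo the delays expected for steer_deg,
    average the mics, and return the gain seen by a tone from source_deg."""
    w = 2 * math.pi * freq_hz
    total = 0j
    for t_src, t_steer in zip(delays(mic_x_m, source_deg), delays(mic_x_m, steer_deg)):
        # phasor received at this mic, re-aligned by the steering delay
        total += cmath.exp(-1j * w * t_src) * cmath.exp(1j * w * t_steer)
    return abs(total) / len(mic_x_m)


MICS = [0.0, 0.03, 0.06, 0.09]  # hypothetical 4-mic line array, 30 mm spacing (m)

print(das_gain(MICS, 0, 0, 2000))   # steered at the talker in front: gain 1
print(das_gain(MICS, 0, 60, 2000))  # TV at 60 degrees while steered to 0: roughly half
```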
Best regards, and thanks again for your quick reply
Andreas
That was my impression too; there is not much going on in the reSpeaker GitHub repositories either, so it seems a bit abandoned as well.
I wouldn’t mind paying 100 EUR or so for a proper DSP microphone array if it works well, but it would be good to know if anyone has some (ideally positive) experience with one of these products, such as the miniDSP one I mentioned above.
Are there any other DSP alternatives besides the miniDSP one that would be worth looking at?
Just don’t. It’s not only a matter of missing algorithms: none of the frameworks interface with the DSP mic, so it merrily focuses its beam on the loudest input from any direction.
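To illustrate the “loudest input” problem, here is a minimal steered-response-power scan in Python, a common textbook way of choosing where to point a beam. It is a sketch under stated assumptions (a hypothetical 4-mic line array with 30 mm spacing, one tone per source, a TV three times louder than the talker), not the logic of any specific DSP mic. With nothing telling it who the talker is, it simply picks the direction with the most energy.

```python
import cmath
import math

C = 343.0                       # speed of sound in air, m/s
MICS = [0.0, 0.03, 0.06, 0.09]  # hypothetical 4-mic line array, 30 mm spacing (m)


def steered_gain(steer_deg, source_deg, freq_hz):
    """Complex delay-and-sum response when steered to steer_deg, for a plane
    wave of freq_hz arriving from source_deg (0 = broadside)."""
    w = 2 * math.pi * freq_hz
    ds = math.sin(math.radians(source_deg)) - math.sin(math.radians(steer_deg))
    return sum(cmath.exp(-1j * w * x * ds / C) for x in MICS) / len(MICS)


def srp_scan(sources):
    """Return the steering angle (-90..90, 1 degree steps) with the most output power.

    sources: list of (angle_deg, freq_hz, amplitude). Each source uses its own
    frequency, so their powers simply add (no cross terms)."""
    def power(steer):
        return sum((amp * abs(steered_gain(steer, ang, f))) ** 2 for ang, f, amp in sources)
    return max(range(-90, 91), key=power)


talker = (-30, 1000, 1.0)  # you, at -30 degrees
tv = (50, 1500, 3.0)       # a louder TV at +50 degrees

print(srp_scan([talker]))      # TV off: the scan finds the talker at -30
print(srp_scan([talker, tv]))  # TV on: the scan ends up near +50, on the TV
```

Pointing the beam at the person who actually spoke needs extra information from further down the chain, which is exactly the kind of integration described as missing here.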
It’s not just algorithms. There is a complete absence of the basic tracking, separation and enhancement functions, because what we have is a collection of isolated projects packaged in a Python framework, with very little actual development of what those projects do and no interactive integration between them. They are just packaged together.
The reason I keep repeating this is that people gladly part with their money, only to find that beamforming alone, or any single algorithm alone, doesn’t provide much benefit when it isn’t integrated into a system.
There is a complete lack of knowledge of how to fit these systems together to maximise their effect through the audio chain, which they can do, and the fact that they haven’t been adopted speaks volumes.
If you have a quiet environment where you will be the only speaker and there is no noise, then what we have works quite well; otherwise, spend your money on a much cheaper commercial alternative.
It’s strange: you get comments like “it works well as a mic”, but really it just works like any other mic while presenting itself as some cool new technology with loads of flashy LEDs. There is a reason it’s dead: for $75 it performs no differently as a mic than a $1.99 lapel mic, other than that it constantly broadcasts on the network and you can flash LEDs to your heart’s delight.
The MATRIX Voice was one of a collection of voice AI sites that sprang up, and when you boil down their claims, it really amounts to BS, as you can read quite often in forums.
Any beamformer is relatively pointless if you don’t control what it beamforms towards, and there is no mechanism for that here.
If you want to spend some money, buy a 2-mic HAT clone: at least the spend is minimal, even if it is still relatively pointless, and it will let you play and get a feel for what is really needed, because as just another mic it works quite well.
Not me, but check whether that device does all that audio processing by itself or whether you have to write software for it.
In the latter case, never mind. That is what MATRIX advertised as well: a nice mic to have, but of no use as a voice AI mic in a noisy environment.
The hardest thing to do is get accurate keyword spotting in noisy environments, and I have not seen a device other than the commercial ones (Google, Amazon, etc.) that can do that well on its own.
It is the one thing holding me back from using Rhasspy as my voice AI at the moment.
The ESP32-BOX does a pretty good job with its AEC->SNR->BSS chain, but even among the commercial devices, Google’s VoiceFilter-Lite noticeably beats Amazon’s beamforming. The ESP32-BOX is not just hardware but also software, albeit closed-source blobs that are freely available.
Google, with its huge resources in AI, is leading the way, from offline ASR to the VoiceFilter-Lite mentioned above, which can do targeted speaker extraction from just a couple of words. I would love to get some code for that.
What is weird to me, though, is that there are examples and code out there that could be used in the available projects and frameworks, yet there is a total absence of any integration of that tech into the crucial initial audio processing chain.
Mycroft are about to employ a beamformer (SJ201 rev 8!), and they have had a change of direction with a new CEO, but I still have my doubts and am curious how widespread the Snips fever pandemic was.
The Mark II is supposedly being released in September and October, but I am still waiting to see if that happens and how use pans out.
The UMA-8 is very similar to the ReSpeaker USB (at a brief glance I think it’s the same chipset), and it’s the same story: it’s an isolated beamforming, AEC and NS mic that just outputs an audio stream, and that is as far as its integration goes. It can cope with audio played through it, but not with external noise from appliances, media or other voices, as it is just a fraction of the system the likes of Google and Amazon use.
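The AEC part explains why it copes with audio it plays but not with other noise. Below is a minimal normalised-LMS echo canceller in Python, a standard textbook AEC algorithm; I am not claiming the UMA-8 or the ReSpeaker USB use exactly this. The echo path, noise levels and step size are made-up assumptions. The canceller can only remove what is correlated with the playback reference it is given, so the echo disappears, but the “TV” noise, which has no reference, passes straight through.

```python
import random


def nlms_aec(reference, mic, taps=8, mu=0.2, eps=1e-6):
    """Normalised LMS acoustic echo canceller.

    reference: samples sent to the loudspeaker (the device knows these)
    mic:       samples captured by the microphone (echo + everything else)
    Returns the error signal: the mic input with the estimated echo removed."""
    w = [0.0] * taps          # adaptive estimate of the speaker-to-mic echo path
    history = [0.0] * taps    # most recent reference samples, newest first
    out = []
    for x, d in zip(reference, mic):
        history = [x] + history[:-1]
        y = sum(wi * hi for wi, hi in zip(w, history))  # predicted echo
        e = d - y                                       # what is left over
        norm = eps + sum(h * h for h in history)
        w = [wi + mu * e * hi / norm for wi, hi in zip(w, history)]
        out.append(e)
    return out


def power(x):
    return sum(v * v for v in x) / len(x)


random.seed(0)
N = 20000
playback = [random.gauss(0, 1) for _ in range(N)]  # stands in for TTS or music
echo_path = [0.6, 0.3, -0.2, 0.1]                  # hypothetical speaker-to-mic response
echo = [sum(h * playback[n - k] for k, h in enumerate(echo_path) if n >= k)
        for n in range(N)]
tv = [random.gauss(0, 0.3) for _ in range(N)]      # external noise: no reference exists

residual = nlms_aec(playback, [e + t for e, t in zip(echo, tv)])
tail = slice(N // 2, N)  # measure after the filter has converged
print("echo power ", round(power(echo[tail]), 3))
print("TV power   ", round(power(tv[tail]), 3))
print("after AEC  ", round(power(residual[tail]), 3))
```

After convergence the residual sits close to the TV power: the echo is gone, the TV is untouched.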
In fact, as I said, Google dropped that method in favour of targeted speaker extraction.
I had one in an Anker PowerConf, which really didn’t like Linux. It’s strange, as I also have their C300 2-mic camera, which is great for far-field audio pickup and NS even though it is just a webcam, whilst the PowerConf, with its multi-mic array and XMOS, is completely inferior despite being a speakerphone ?!!!
I never tried the ReSpeaker, but its reviews by reasonably technical people generally tend not to be too great either.
Give it a go as an interest project if you have the cash to spare and want to check what we are saying. I have repeated myself to try to save you a couple of quid, but maybe you just want to check for yourself.
The “UMA-8 USB mic array - V2.0” device has two modes: either it outputs all 7 microphone channels raw (which you obviously don’t want), or you enable DSP and then receive a stereo audio stream after all the DSP pre-processing.
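In the raw mode each audio frame carries one sample per channel, interleaved, and whatever consumes the stream has to split the channels and do its own processing. A minimal Python sketch of that split, assuming signed 16-bit little-endian PCM and 7 channels (check what format your device actually delivers); deinterleave_s16le is a hypothetical helper, not part of any vendor tool:

```python
import array
import sys


def deinterleave_s16le(raw, channels):
    """Split interleaved signed 16-bit little-endian PCM into one list per channel.

    Frame layout assumed: ch0, ch1, ..., ch(channels-1), then the next frame."""
    samples = array.array("h")
    samples.frombytes(raw)
    if sys.byteorder == "big":
        samples.byteswap()  # the data is little-endian; fix it up on big-endian hosts
    if len(samples) % channels:
        raise ValueError("buffer does not hold a whole number of frames")
    return [samples[c::channels].tolist() for c in range(channels)]


# Two fake 7-channel frames where channel c of frame f holds 10 * f + c.
demo = b"".join((10 * f + c).to_bytes(2, "little", signed=True)
                for f in range(2) for c in range(7))
chans = deinterleave_s16le(demo, 7)
print(chans[0], chans[6])  # [0, 10] [6, 16]
```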
Indeed, it seems similar to the “ReSpeaker Mic Array v2.0” product, but I’m a bit hesitant to try that one, as its support looks pretty poor, as if it were an abandoned product. miniDSP at least replied quickly to my first question; if they also answer my follow-up satisfactorily, I guess I’ll give their product a try and report back here afterwards.
Indeed, for testing I’ll use a Raspberry Pi Zero 2, which should be OK for some tests, but later on I would consider doing the actual processing beyond hotword detection on a central server such as an NVIDIA Jetson, which should give quicker responses.
Anyway, yesterday I ordered this miniDSP product for a test (after getting another quick reply from their support); let’s see, I will report back.
Yes, I can definitely NOT recommend the miniDSP UMA-8 product!
It works for a few hours, then simply stops working and “hangs” until it is power-cycled, which is obviously not acceptable for a product that should ideally be capable of running 24/7.
I’ve tested it both with the Raspberry Pi on Linux and on a Windows 10 system, with the same result.
miniDSP support replies, but is not helpful at all; just check their forum. This product seems dead to me:
I’ve also ordered the ReSpeaker Mic Array v2.0 for testing, and that one at least works.
If you want a microphone array with integrated DSP functionality, this unfortunately seems to be the only working product currently available.
I have a reSpeaker 4-mic HAT and a reSpeaker 2-mic HAT on different Raspberry Pis, and have tried one of the other 2-mic look-alikes. They are all quite satisfactory.
But following rolyan’s advice, I have a simple microphone on my third satellite, which works just as well at a much lower cost.
Having said that, I really like the visual feedback provided by HermesLedControl on the RasPi 3A / reSpeaker 4-mic HAT combo.
Sadly, that one is not future-proof. I don’t think there are working drivers for arm64 or any of the newer kernels, so I don’t think the reSpeaker 4-mic is worth investing in anymore.
Not sure whether you are referring specifically to the reSpeaker 4-mic or HermesLedControl, or to the concept of FOSS or electronic devices generally.
I agree that the reSpeaker is not worth investing in, but I think for a very different reason.
I was very disappointed to find that Seeed had abandoned support for the reSpeaker range several years earlier, and that you needed to downgrade the OS to install the Seeed drivers. I then found that HinTak had taken on the task of updating the reSpeaker drivers for newer kernels. Since then he has even been updating the official reSpeaker repository, and he replied to an issue only 12 days ago. He is not actively developing it, though, so it is likely that the driver doesn’t support a 64-bit OS.
And pretty much all of the other multi-mic boards seem to be based on the same Seeed driver.
Similarly the HermesLedControl repository appears to have been updated within the last month.
Personally, I think the “future” will be the ESP32-S3 and similar chips, which combine processor and ADC with enough grunt to do the DSP on-chip.
Exactly, that is what I was referring to. As far as I can see from the GitHub issues, you need to compile a custom kernel to get 64-bit support even somewhat working. The only reSpeaker that has “support” is the 2-mic, because it does not depend on the Seeed drivers.
Yes … then I asked myself whether I need 64-bit on a Raspberry Pi that I am using as a satellite … and the answer was “no”. Probably on a base station, but that doesn’t have a microphone or speakers, so it’s not an issue.
I haven’t paid attention to Seeed’s USB-interface reSpeakers, but other than those I’m pretty sure all the other models use the same driver.
It’s a real shame that there isn’t a good multichannel HAT for the Pi, as the reSpeaker 4/6-mic HATs show that it can be done; all that is missing is channel sync, so that you don’t get the random channel order you currently get with the Seeed drivers.
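One way to check channel sync yourself is to play a test sound and cross-correlate pairs of channels: on a small array the true lag between neighbouring mics is only a few samples at most, so a large lag, or one that changes between captures, points at a driver problem. A hedged brute-force Python sketch; best_lag is a hypothetical helper, and the 3-sample delay in the demo is synthetic:

```python
import random


def best_lag(a, b, max_lag):
    """Lag (in samples) that best aligns channel b with channel a, found by
    brute-force cross-correlation over -max_lag..max_lag.
    A positive result means b is running behind a."""
    def score(lag):
        lo = max(0, -lag)               # keep b's index n + lag >= 0
        hi = min(len(a), len(b) - lag)  # keep both indices in range
        return sum(a[n] * b[n + lag] for n in range(lo, hi))
    return max(range(-max_lag, max_lag + 1), key=score)


random.seed(1)
ref = [random.gauss(0, 1) for _ in range(2000)]  # synthetic test noise on one channel
late = [0.0] * 3 + ref[:-3]                      # same noise, 3 samples late on another
print(best_lag(ref, late, 20))                   # 3
```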
I find it weird that Seeed continue to sell a product that they declared EOL and stopped supporting a couple of years ago.
Still, fixed-geometry HATs for a software implementation, especially with the geometries we have, are fubar.
Because of spatial aliasing, an endfire array needs a maximum spacing somewhere in the region of 30 mm, and a broadside array 60 mm at most, whilst what was provided was something that looked like a mic array minus any audio engineering.
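These figures follow from the standard grating-lobe condition for a uniform linear array, d < λ / (1 + |sin θ0|), where θ0 is the steering angle measured from broadside: the limit is λ at broadside and λ/2 at endfire. A small Python sketch; the upper frequencies are my assumptions, not from the post. With about 5.7 kHz the limits come out at roughly 60 mm and 30 mm, matching the numbers above; to stay alias-free up to 8 kHz (the Nyquist limit at 16 kHz sampling) they shrink to about 43 mm and 21 mm.

```python
import math

C = 343.0  # speed of sound in air at roughly 20 degrees C, m/s


def max_spacing_m(f_max_hz, steer_deg):
    """Largest uniform spacing (m) of a linear mic array that keeps grating lobes
    out of the visible region up to f_max_hz when steered steer_deg away from
    broadside: d < wavelength / (1 + |sin(steer)|)."""
    wavelength = C / f_max_hz
    return wavelength / (1 + abs(math.sin(math.radians(steer_deg))))


for f in (5700, 8000):  # assumed upper frequencies of interest
    broadside = 1000 * max_spacing_m(f, 0)
    endfire = 1000 * max_spacing_m(f, 90)
    print(f"{f} Hz: broadside < {broadside:.1f} mm, endfire < {endfire:.1f} mm")
```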
The MEMS mics on board are analogue, and why they are on the board, rather than it being just an ADC board with daughter mic boards on Dupont jumpers or ribbon cable so it could be used with any multichannel analogue signal, is curious; it also makes it nearly impossible to provide any vibration damping or acoustic insulation.
Audio-wise, from the drivers to the audio engineering, it’s devoid of any possible use, and the LEDs just sparkle to hide the fact that it’s a piece of total c-rap.
The 2-mic and the USB version are really the only ReSpeaker mics that work for audio, but they still have some pretty dodgy fixed geometries, and a HAT sitting directly on a Pi gives next to no choice in mic positioning.
Maybe you should ask yourself again whether you need 64-bit, as every module from KWS and ASR to TTS uses models that can be quantised to 8 bits, which the Pi’s 64-bit NEON SIMD unit can process x8 in one instruction whilst 32-bit manages a maximum of x4. That is why 64-bit gives almost a x2 speed-up or load reduction over 32-bit with quantised models, but hey, enjoy the bright lights.