Recommendation for Mic Array

Hello,

What is the current recommendation for a good mic array (with pre-processing like beamforming, to cope better with background noise) to be used together with Rhasspy?

Reading the MATRIX forum, it looks like their projects, such as the MATRIX Voice, are basically dead, with no support anymore.

Has anybody already tried the “UMA-8 USB mic array - V2.0” from miniDSP?
https://www.minidsp.com/products/usb-audio-interface/uma-8-microphone-array
This at least sounds promising with miniDSP being a well-established company unlikely to abandon a project too quickly.

Thanks in advance for any feedback or other recommendations!

Best regards
Andreas

Welcome Andreas,

There are many on here, like myself, using reSpeaker multi-mic HATs with Raspberry Pis as Rhasspy satellites. They are a fairly easy, fairly cheap combination … but (as rolyan repeatedly points out) they do NOT have the software (in firmware, device driver or Rhasspy) to do beamforming or any of the other Digital Signal Processing (DSP) that is the justification for multiple microphones.

There are some mic arrays with DSP built-in … but at price points which make Alexa and Google Home a more obvious choice; and so not discussed here very often.

The more I think about it, the more it seems that the DSP and multi-mic devices are pretty much targeting the business conference room microphone market, where there is little background noise … nothing like me trying to give Rhasspy a command in my living room with the TV playing :frowning_face:

Hello donburch,

First, thanks a lot for your really quick reply, which is much appreciated! :slight_smile:

I was also considering the reSpeaker 4 or 6 Mic products, but I read in several places that you then either simply combine the inputs of all microphones (which tends to make things worse) or just pick a single one of the microphones as the input (which completely misses the point of a microphone array). That’s why I did not consider the reSpeaker products any further. Or do these limitations no longer apply?

My understanding of beamforming is that it should detect the “loudest” audio source and try to cancel out other background noise. This might also work well in scenarios where, e.g., music is playing in the background. That’s why I would expect such a DSP to be suited to a “living room scenario” as well, if one is OK with somewhat higher costs. Has anybody here tried such a DSP product?
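
To make that concrete for myself, here is a minimal delay-and-sum sketch in plain Python. It is the simplest textbook form of beamforming and not necessarily what any of the products mentioned here implement. Everything in it is made up for illustration: the helper names, the four-mic layout and the per-mic delays. It also assumes the direction of the wanted speaker is already known instead of detecting the loudest source.

```python
import math
import random

def delayed(signal, d):
    """Shift a signal right by d samples, keeping the original length."""
    return ([0.0] * d + signal)[:len(signal)]

def delay_and_sum(channels, steer_delays):
    """Undo the per-mic delay of the wanted direction, then average the mics."""
    n = min(len(ch) - d for ch, d in zip(channels, steer_delays))
    return [
        sum(ch[i + d] for ch, d in zip(channels, steer_delays)) / len(channels)
        for i in range(n)
    ]

def power(signal):
    """Mean squared amplitude of a signal."""
    return sum(x * x for x in signal) / len(signal)

N = 4000
rng = random.Random(0)
speech = [math.sin(2 * math.pi * i / 16) for i in range(N)]  # stand-in for the voice
noise = [rng.uniform(-1.0, 1.0) for _ in range(N)]           # stand-in for the TV

speech_delays = [0, 1, 2, 3]  # hypothetical arrival delays (in samples) per mic
noise_delays = [3, 2, 1, 0]   # the noise arrives from the opposite side

# Delay-and-sum is linear, so the two sources can be evaluated separately.
speech_out = delay_and_sum([delayed(speech, d) for d in speech_delays], speech_delays)
noise_out = delay_and_sum([delayed(noise, d) for d in noise_delays], speech_delays)

print("speech power in/out:", power(speech), power(speech_out))
print("noise power in/out: ", power(noise), power(noise_out))
```

The speech copies line up and pass through unchanged, while the noise copies stay misaligned and partly average out. With only a handful of mics the noise is reduced rather than removed, and the result depends entirely on steering towards the right source.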

Best regards and again thanks for your quick reply
Andreas

Yep, that seems to be the case. The device works pretty well as a mic, though, but I would not buy one at this point anymore.

Hello Paul,

thanks for your feedback!

That was also my impression. There is also not much going on in the reSpeaker GitHub repositories anymore; they seem a bit abandoned as well.

I wouldn’t mind paying 100 EUR or so for a proper DSP microphone array if it works well, but it would be good to know if there is some (ideally positive :wink: ) experience with one of the products like the miniDSP one I mentioned above.

Are there any other DSP alternatives besides the miniDSP one available that would be worth looking at?

Best regards
Andreas

Just don’t, because it’s not just a matter of not having the algs: none of the frameworks interface with the DSP mic, so it merrily focuses its beam on the loudest input from any direction.
It’s not just the algs; there is a complete absence of any of the basic tracking, separation and enhancement functions, because what we have is a collection of isolated projects packaged in a Python framework, with very little actual development on what those projects do or on interactive integration. They are just packaged together.

The reason I have to keep repeating this is that people gladly part with their money, only to find that beamforming alone, or any alg alone, doesn’t seem to provide much benefit when it’s not integrated into a system.
There is a complete absence of knowledge of how to fit these systems together to maximise the effect through the audio chain, which is possible, and that they haven’t been adopted speaks volumes.
If you have a quiet environment where you are going to be the only speaker, with no noise, then what we have works quite well; otherwise, spend your money on a much cheaper commercial alternative.
It’s strange that you get comments like “it works well as a mic”, because really it doesn’t: it works just like any other mic, but presents itself as some cool new technology with loads of flashy LEDs. There is a reason why it’s dead: for $75 it performs no differently as a mic than a $1.99 lapel mic, other than that it will constantly network broadcast and you can flash LEDs to your heart’s delight.
Matrix Voice was one of a collection of voice AI sites that sprang up, and when you boil down the claims it really amounts to BS, as can be read quite often in forums.
All beamformers are relatively pointless if you don’t control what they beamform to, and there is no mechanism for that here.
If you want to spend some money, buy a 2-mic HAT clone: at least the spend is minimal, even if it is still relatively pointless, and it will allow you to play and get a feel for what is really needed, because it is another one that works quite well as a mic.

No, you do not have to, actually.

Not me, but investigate whether that device does all that audio processing by itself or whether you have to write software for it.
In the latter case, never mind. That is what Matrix advertised as well. It is a nice mic to have, but of no use as a voice AI mic in a noisy environment.

The hardest thing to do is to get accurate keyword spotting in noisy environments, and I have not seen a device other than the commercial ones (Google, Amazon etc.) that can do that well on its own.
It is the one thing holding me back on using Rhasspy as my voiceAI at the moment.

The esp32-box does a pretty good job with its aec->snr->bss, but even among the commercial devices Google’s voice-filter-lite noticeably beats Amazon’s beamforming. The esp32-box is not just hardware, it’s software too: closed-source blobs, but freely available.
Google, with their huge resources in AI, is leading the way, from offline ASR to the above voice-filter-lite, which from a couple of words is able to do targeted speaker extraction, and which I would love to get some code for.

What is weird to me, though, is that there are examples and code out there that could be used, yet in the projects and frameworks available there is a total absence of any integration of that tech in the crucial initial audio processing chain.
Mycroft are about to employ a beamformer (SJ201 rev 8!) and they have had a change of direction with a new CEO, but I still have my doubts and am curious how wide the Snips fever pandemic was.
The Mark II is supposedly being released in September and October, but I am still waiting to see if that happens and how use pans out.

The UMA-8 is very similar to the respeaker-usb; I think it’s the same chipset at a brief glance, and it’s the same story: it’s an isolated beamforming, AEC and NS mic that just outputs an audio stream, and that is as far as its integration goes. It can cope with audio being played back, but it will not cope with external noise from appliances, media or other voices, as it’s just a fraction of the system the likes of Google and Amazon use.
In fact Google, as I said, dropped that method in favour of targeted speaker extraction.
I had one in an Anker PowerConf, which really didn’t like Linux, and it’s strange, as I also have their C300 2-mic camera, which is great for its far-field audio pickup and NS but is just a webcam, whilst the PowerConf, with its multi-mic array and XMOS, is completely inferior even though it’s a speakerphone?!
I never tried the reSpeaker, but generally its reviews by reasonably technical people tend not to be too great either.
Give it a go if you have the cash to spare, as an interest project to check what we are saying. I have repeated myself to try and save you a couple of quid, but maybe you just want to check for yourself.

Hey guys,

First, I really appreciate all your replies!

According to their datasheet, the “UMA-8 USB mic array - V2.0” device has two modes: either it outputs all 7 microphone channels raw (which you obviously don’t want), or you enable the DSP and then receive a stereo audio stream after all the DSP pre-processing.
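
For anyone who ends up with the raw multi-channel mode anyway, here is a rough sketch of how a single channel could be pulled out of such a stream. It assumes interleaved, signed 16-bit little-endian PCM, which is my assumption and not something taken from the datasheet; `split_channels` is a made-up helper name, and the sample values are dummy data.

```python
import struct

def split_channels(pcm_bytes, num_channels):
    """De-interleave signed 16-bit little-endian PCM into one list per channel."""
    frame_size = 2 * num_channels
    usable = len(pcm_bytes) - len(pcm_bytes) % frame_size  # drop a trailing partial frame
    samples = struct.unpack("<%dh" % (usable // 2), pcm_bytes[:usable])
    return [list(samples[ch::num_channels]) for ch in range(num_channels)]

# Two made-up frames of 7-channel audio: frame 0 holds 0..6, frame 1 holds 10..16.
raw = struct.pack("<14h", *(list(range(0, 7)) + list(range(10, 17))))
channels = split_channels(raw, 7)
print(channels[0], channels[6])  # [0, 10] [6, 16]
```

Picking one channel like this is exactly the "single microphone" fallback, so it gains nothing over an ordinary mic; the DSP mode is what would make the array worthwhile.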

Indeed, it seems similar to the “ReSpeaker Mic Array v2.0” product, but I’m a bit hesitant to try that one, as its support looks pretty poor, like an abandoned product. miniDSP at least replied quickly to my first question; if they also reply to my follow-up in a satisfying way, I guess I’ll give their product a try and report back here afterwards. :wink:

Best regards
Andreas

Sounds interesting, maybe you can try it out.
It is a USB device, though, so you need some extra hardware to use it.

Hello Paul,

Indeed. For testing I’ll use a Raspberry Pi Zero 2, which should be OK for some tests, but later on I would consider doing the actual processing beyond hot word detection on a central server like an Nvidia Jetson, which should result in quicker reactions.

Anyway, I’ve ordered this miniDSP product for a test yesterday (after getting another quick reply from their support); let’s see, will report back. :wink:

Best regards
Andreas
