Mic Arrays are rubbish

rolyan_trauts · August 8, 2020, 8:14am

Sure, that title will act as bait for some. I am not sure why microphone arrays have piqued my interest so much since my introduction to Mycroft, they just have.

I am not a sound engineer. I did a bit of audio in my late teens, but that was a long time ago.

Firstly, to pull the bait: array microphones without fancy DSP are just an array of microphones.
With libs on a Pi3/4 you can do echo cancellation, which is handy, but apart from summing the array, which also sums the noise, many of the non-DSP microphone arrays we have are practically pointless.
On a Zero the load is too heavy even for echo cancellation (EC) and all you can do is increase sensitivity.

Increasing sensitivity is OK for far field in a silent room, but apart from that it’s also pointless, as any louder source will just swamp the signal and that sensitivity will count for nothing.

Even many of the DSP mics, often USB, are not that great, though they are getting better. Even with inbuilt EC they still suffer from other non-distributed noise sources.
Meaning if your voice is the predominant sound source they work quite well, but against a single competing noise source not so much, and the problem is that home TV & HiFi are exactly that type of noise source.

As well as being critical of many of the array options, I find omni-directional without DSP is also pretty poor, as without DSP you don’t have any form of control.

If you don’t have DSP, then cheap old uni-directional electrets plugged into a cheap USB sound card can actually be much better, especially on a Zero or wherever you’re struggling with the load for EC.
It’s not high tech: they just have holes in the back so sound waves also hit the back of the diaphragm, which acts as noise cancellation and gives directionality.

The directionality actually gives you some form of control, but making your own gets a bit geeky, as you have to tune the mic so that the sound waves cancel.

Anyway, here is a uni-directional electret. It is quite a good one as it goes up to 16 kHz, since you do lose some frequency range with these.

https://uk.rs-online.com/web/p/condenser-microphone-components/1710881/

Sensitivity is less than MEMS, and the Kingstate is a bit pricey for just a capsule, but -37 dB isn’t bad.

https://uk.rs-online.com/web/p/condenser-microphone-components/7542104/

You will see they just have holes in the back. Primo do really excellent ones but the prices are a bit crazy.

Uni-directional microphones are noise cancelling and have an advantage over software EC, which does DSP on the audio being played and subtracts it from the mic input.
The problem comes when audio arrives from another source, as you don’t have that PCM on your sound card to cancel its contribution to your mic.
A directional mic does reject it if it’s facing the right way, and often how you place a unit can very much give you a 3rd party noise cancelling solution.
If you stick it on top of your voice AI then it becomes a problem, as that noise can arrive at both back and front at the same time.
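
To make the difference concrete, here is a minimal sketch of the "subtract what is being played" idea using a textbook NLMS adaptive filter. This is an assumption chosen for illustration and not how SpeexDSP works internally; the room response, signal lengths and step size are all made up. It runs the canceller twice: once with only the unit's own playback in the mic, and once with an extra "TV" source for which there is no reference signal.

```python
import random


def nlms_echo_cancel(far_end, mic, taps=8, mu=0.5, eps=1e-6):
    """Subtract an adaptive estimate of the played (far-end) audio from the mic."""
    weights = [0.0] * taps   # current estimate of the speaker-to-mic path
    history = [0.0] * taps   # most recent far-end samples, newest first
    residual = []
    for x, d in zip(far_end, mic):
        history = [x] + history[:-1]
        estimate = sum(w * h for w, h in zip(weights, history))
        error = d - estimate                      # what is left after cancelling
        norm = sum(h * h for h in history) + eps  # normalise the step by input power
        weights = [w + mu * error * h / norm for w, h in zip(weights, history)]
        residual.append(error)
    return residual


def energy(samples):
    return sum(s * s for s in samples)


def simulate(n=2000, seed=0):
    """Return (echo_only_ratio, with_tv_ratio), both measured after settling."""
    rng = random.Random(seed)
    played = [rng.uniform(-1, 1) for _ in range(n)]  # audio the unit itself plays
    tv = [rng.uniform(-1, 1) for _ in range(n)]      # external source, no reference
    room = [0.6, 0.3, 0.1]                           # made-up speaker-to-mic path
    echo = [sum(room[k] * played[i - k] for k in range(len(room)) if i - k >= 0)
            for i in range(n)]
    tail = slice(n - 500, n)

    # Case 1: the mic only hears the unit's own playback.
    clean = nlms_echo_cancel(played, echo)
    echo_only_ratio = energy(clean[tail]) / energy(echo[tail])

    # Case 2: the mic also hears the TV, which the canceller knows nothing about.
    noisy_mic = [e + t for e, t in zip(echo, tv)]
    left = nlms_echo_cancel(played, noisy_mic)
    with_tv_ratio = energy(left[tail]) / energy(tv[tail])
    return echo_only_ratio, with_tv_ratio


echo_only_ratio, with_tv_ratio = simulate()
print(f"own playback left after EC: {echo_only_ratio:.6f} of the echo energy")
print(f"TV left after EC:           {with_tv_ratio:.2f} of the TV energy")
```

In this toy setup the unit's own playback is removed almost entirely, while the TV passes through at roughly full level, which is the gap a directional capsule can help fill.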

If you do have a Pi3 or Pi4, though, you can use software EC to cancel the unit’s own playback noise and use the natural noise cancellation of a uni-directional microphone in conjunction, to get the best of both worlds.

You might only have a single mic on a cheap sound card, but in many situations it can be superior to a relatively pointless array that you have no control of.
Sensitivity becomes much less of an issue when you can cancel noise and so have headroom to turn up the gain.

Basically they are uni-directional lapel mics that start really cheap.

You can find one for a couple of $ that will go with a couple of $ sound card but even what looks like it might be quite reasonable isn’t actually earth shattering.

But you can go more pro and you will notice more design on the rear cancellation part of the mic.

Or go all out with a shotgun mic, but your Rhasspy is going to get very Steampunk.
https://www.ebay.co.uk/itm/Unidirectional-Condenser-Microphone-Shotgun-Interview-Mic-for-DV-Camcorder/401655533631?

I cannot say how well those work on a cheap USB soundcard, as sometimes the gain can be pretty poor. That is why I have been taking a more DIY route with preamps: rather than passive I can make an active circuit with things like AGC and controllable hardware gain.

My fave of the moment is the MAX9814, grabbing cheap clones and replacing the omni-directional capsule.

The AGC timing cap Adafruit chose is rather low unfortunately, but tacking another cap on top is much easier with SMD than I thought. Fiddly little things still drive me mad though.

There is a whole range of quite interesting low cost mic hardware.

Less than $3 and it has a noise gate & compressor built in…

I also cannot tell you which cheap USB sound cards have decent gain, as it’s a really mixed bag: some are dire and others are great, and even more confusingly you can buy what seem to be identical cards that somehow are not.
Also with the mono USB cards I can’t tell you, because I have been concentrating on the more expensive but much rarer stereo ones.

So far
Edimax Dreambass which is a VIA VT1620A
Syba SD-AUD20101 which is a Cmedia CM6533
Also the likes of AXAGON ADA-17 or https://www.aliexpress.com/item/4001184939273.html

Hence I cannot say much about sensitivity: with active preamps the problem is too much gain, and in alsamixer I often set 0 dB to about 9 dB of gain depending on the module I am using.

Yeah, I know: stereo, after what I said about arrays. But it’s the arrays we generally have available at the moment that seem to make little sense.
I have been playing with the idea of running one channel and then having an input for another mic on a wired extension for extra coverage, and playing with VAD to control channel mixing.
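
As a rough illustration of VAD-controlled channel mixing, the sketch below switches between two mic channels frame by frame. It uses a crude RMS energy threshold as a stand-in for a real VAD, and the function names, frame length and threshold are all hypothetical values picked for the example.

```python
import math


def frame_rms(frame):
    """Root-mean-square level of one frame of samples."""
    return math.sqrt(sum(s * s for s in frame) / len(frame)) if frame else 0.0


def mix_by_activity(main, extension, frame_len=160, threshold=0.02):
    """Per frame, pass through whichever channel looks most like active speech.

    A frame counts as active when its RMS is at or above `threshold` (a crude
    energy VAD). When neither channel is active, the previous choice is kept
    so the output does not flip around on background hiss.
    """
    n = min(len(main), len(extension))
    out, choices = [], []
    current = 0  # 0 = main mic, 1 = extension mic
    for start in range(0, n, frame_len):
        end = min(start + frame_len, n)
        frames = (main[start:end], extension[start:end])
        levels = [frame_rms(f) for f in frames]
        if max(levels) >= threshold:
            current = 0 if levels[0] >= levels[1] else 1
        out.extend(frames[current])
        choices.append(current)
    return out, choices


# Three 10 ms frames at 16 kHz: main active, extension active, both quiet.
main = [0.1] * 160 + [0.0] * 160 + [0.001] * 160
extension = [0.0] * 160 + [0.2] * 160 + [0.001] * 160
mixed, choices = mix_by_activity(main, extension)
print(choices)
```

A real setup would want a proper VAD and some smoothing around the switch points, but the selection logic stays the same.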

So a long-winded post, but for some, rather than a $90 USB speakerphone, a cheap sound card (if you can find one with decent gain) and a lavalier lapel mic in the right situation will definitely give it a run for its money and in certain circumstances even outperform it.

If we get DSP, especially advances on current DSP, then planar MEMS arrays are another story, but without it they are a bit rubbish and sort of pointless.

rolyan_trauts · August 21, 2020, 12:24pm

Been a bit lazy as my engineering of interference tubes hasn’t happened yet.
Plastic tubes with holes in them are not the hardest but just haven’t got round to it.

I did place a uni-directional electret in free air and faced it in both directions, with the same levels of supposed near and far sources.
1st facing near (Stephen Fry)
2nd facing far (Example wav)

https://drive.google.com/file/d/1ErN28qRi30ILYJCwCpOY6ji6MNmzrFse/view?usp=sharing

https://drive.google.com/file/d/1PqrMqf8TB0Tj2lOlW9juVPufRS9GIRDR/view?usp=sharing

These are just cheap old plain telephone-style electrets and the noise reduction is substantial. In many situations it is possible to place your mic to get considerable reduction of all noise, not just echo.

It is possible to create simple ‘shotgun mic’ interference tubes, which I still mean to play with to get super-cardioid and narrower patterns.
It did occur to me that I could do something I haven’t seen before and make a reverse ‘shotgun mic’ where the interference tube is on the rear, or maybe on both rear and front, to test different field patterns.

If I find the elusive chuck key to my pillar drill then I will prob post those also.

Still playing, but I also forgot to post the 3rd option: EC with a uni-directional NS mic.

https://drive.google.com/file/d/1i-yp3tBbH8PfpeAK7vHRQEetIuwEPg8f/view?usp=sharing

Also an omni with no EC as reference.

https://drive.google.com/file/d/1xj3VrmRXRrHRUHZ7TFARMNwiYjLvAbB-/view?usp=sharing

Last reference: omni with EC, which does a fine job via speexdsp but will only attenuate what is played and not other sources, as the uni-directional will.

https://drive.google.com/file/d/1d3it9i9gtAjK0f9tg1488ZgoPCCGFnuU/view?usp=sharing

LordQuasar · August 21, 2020, 1:30pm

very interesting
this one sounds best to me

where can I get this microphone? It looks like the music is not interfering with the speaker.

rolyan_trauts · August 21, 2020, 2:35pm

Just uni-directional electrets that are on AliExpress, prob eBay too.

You can get x10 for just over $4.

You will find them in ‘shotgun’ & ‘Lavalier’ mics if you do some shopping, approx $5-10.

I find that making them active via cheap China preamp modules and feeding them into, say, a stereo USB soundcard such as the Syba SD-AUD20101 makes really good pickups.

EC works really great with audio the unit is playing, but with external noise like a TV, nope.

Unless you have some really top quality omnidirectional array beamforming going on, maybe you can do this instead with sound technologies that have been around for many decades.

I was thinking about my late teens and a brief audio flirtation, and how there were many tricks, like 2x cardioids at 90 degrees giving a stereo pickup.

There are probably a lot of old studio techniques that could be used really cheaply, as SNR & THD requirements for pure voice recognition are much lower and so the hardware is likely much cheaper.

Build your own or do some browsing for cheap & cheerful. It can be a bit hit and miss, as some are just complete crap, but there are some really good ones that are also really cheap.

Start-of-chain audio processing seems to be completely missing in open source, and I have just been playing with the idea that with cheap component modules, EC & VAD there are avenues that could maybe produce good results without high-end ASIC DSP.

RS is another source here in the UK, but I presume you will have a similar distributor.

https://uk.rs-online.com/web/c/passive-components/sounder-buzzer-microphone-components/condenser-microphone-components/?applied-dimensions=4294448942,4294448940

candle · August 21, 2020, 3:41pm

I’ve tried a lot of microphones by now.

The PlayStation Eye has a microphone array, but that’s not actually used when you plug it in. If you just select the microphone in Linux, then it will only give you the input from the first of the 4 microphones. If you want noise cancelling, this has to be done in software. It took me a while before I discovered this.
I bought a ReSpeaker USB microphone for $80. It has a built-in soundcard that can output sound, so the theory was you could play music through it, and the microphone would still be able to discern your voice despite the music playing. It would ‘subtract’ the music from the signal, as it were. The reality is very different however. It only supports 22 kHz audio out (low quality), and it doesn’t have software controls (ALSA scontrols). Essentially, you can’t change the volume of the output as you would with any other soundcard. My advice: avoid it.

Funnily enough, the best microphone I’ve found is the $12 conference mike on Aliexpress. It’s really solidly built (it’s wonderfully heavy), has great range, and even has a mute button on it. You can find a picture and link here.

I’ve not looked into these lapel mics though.

rolyan_trauts · August 21, 2020, 6:53pm

I got a link to Candle: https://www.candlesmarthome.com/voco-privacy-friendly-voice-control

There is something very similar to iPhone marketing going on with many hi-tech beamforming arrays, and much of it is total BS.
Also, given the SoCs we have at absolutely great prices, it’s just nuts for a mic / soundcard for a voice AI to cost 300% of the SoC and make an open source product 400% more expensive than a commercial one.

I have found that any active powered mic fed into a USB soundcard will have a shed load of gain available.
Because we have +3/5 V DC on the Pi it’s very easy with an electret or analogue MEMS to create an active mic with an output of 500 mV to 2 V.
You can amplify much more, and with simple AGC any active mic can get effective far field on a cheap USB soundcard, whereas passive mics will only return about 200 mV max and, being single-ended rather than differential, can quickly pick up noise.
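
To put those voltages on the dB scale that alsamixer uses, here is a small worked calculation. It only converts the rough figures quoted above (about 200 mV passive, 500 mV to 2 V active) into level differences; the helper name is made up for the example.

```python
import math


def level_gain_db(v_out, v_ref):
    """Level difference in dB between two signal voltages."""
    return 20.0 * math.log10(v_out / v_ref)


PASSIVE_V = 0.2  # approximate maximum from a passive mic, as quoted above
for active_v in (0.5, 2.0):
    gain = level_gain_db(active_v, PASSIVE_V)
    print(f"{active_v} V active vs {PASSIVE_V} V passive: {gain:.1f} dB more level")
```

So on these figures an active mic arrives somewhere between about 8 dB and 20 dB hotter than a passive one before the soundcard adds any gain of its own.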

Adafruit do one (with a Google search, modules are available much cheaper).

The only shame is that it’s omni and the built-in AGC is really far too fast for voice, as that MAX9814 is a mighty chip with great SNR/THD that is actually great on a cheap $3 soundcard. PS: if you grab the datasheet, it is possible to tack on another cap and get an attack/hold/release (AHR) timing of 1-2 sec, which is much better for AGC. My old eyes and shaky hands just wish Adafruit supplied it like that.

Part of the MFCC process, or it should be, is to dump low-energy spectra (noise reduction), so you can gain up to levels of background hiss that would not be acceptable for recording but work extremely well for recognition.
Then again, the noise doesn’t actually matter much, because if you record your model’s training data with the noise the mic produces, the spectra will be the same.

The ReSpeaker, as far as I am aware, is 16 kHz when you download the single-channel firmware (the only one that works), which is not that great, and it has strange hissy vocoding effects at times.

It’s a shame passive designs with 2 V bias became the norm for soundcards, as a 3-5 V DC supply and an active preamp can give really excellent results.

SpeexDSP has an ALSA AGC plugin if your cheap sound card doesn’t have AGC. Search the forum, as Raspbian doesn’t install it by default.

The only reason to have an array is to do beamforming/DOA, as when you sum inputs you also sum noise and it’s roughly on par with a single source with more gain.
You’re probably better off doing what they do in high-end conference systems and having multiple distributed mics.
It’s actually nuts how beamformers are put into single puck devices with a mic spacing so small that it struggles against the speed of sound.
A soundbar would be a much better format as, like radio telescopes, size equals accuracy, though 3 or 4 mics do return depth as well as 360° direction.
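
The spacing point can be checked with a little arithmetic: the largest arrival-time difference between two mics is the spacing divided by the speed of sound, and a beamformer has to resolve direction within that many samples. The spacings and the 16 kHz sample rate below are illustrative assumptions, not measurements of any particular product.

```python
SPEED_OF_SOUND = 343.0  # m/s in air at roughly 20 C


def max_delay_samples(spacing_m, sample_rate_hz):
    """Largest inter-mic delay, in samples, for sound arriving end-on."""
    return spacing_m / SPEED_OF_SOUND * sample_rate_hz


SAMPLE_RATE = 16000  # a common rate for speech recognition
for name, spacing in (("puck, 40 mm spacing", 0.04), ("soundbar, 500 mm spacing", 0.5)):
    delay = max_delay_samples(spacing, SAMPLE_RATE)
    print(f"{name}: at most {delay:.2f} samples of delay to work with")
```

With these assumed numbers the puck has under 2 samples between "dead ahead" and "fully side-on", while the soundbar has more than 20, which is the sense in which size buys accuracy.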

There probably are USB mics out there that, amongst the plastic, are excellent. It’s just a matter of finding and posting them.

Something like this might well work great, but for software EC you have to have playback and capture on the same soundcard, so no for that.

Maybe even one of these, but with DIY I know where I am at and can use software EC.

rolyan_trauts · August 22, 2020, 12:13pm

@LordQuasar

The 3rd option is with software EC running (voice-engine/ec, I should say) and isn’t just the mic.

I was worried that the natural noise reduction of a uni-directional mic might hinder the software EC but both together seem to work quite well.
Software EC on a Pi3 or above works really well, as in the above link, and can run via ALSA or PulseAudio.

The 4th option is an analogue MEMS, which really wasn’t a fair comparison, as the test is for external far noise like a TV, and EC cannot work there because the soundcard has no input of that noise to subtract from the mic recording, which is how software EC operates.

It’s just something I have been puzzling over: if Rhasspy is playing media, then software EC and any mic on a soundcard that has both playback & capture will work great. Even if the speaker is blasting into the mic it will attenuate the noise to a large extent.
With a bit of thought and some isolation of the mic in an enclosure you can make a great EC mic.

What it doesn’t do is cancel any external noise like a TV or HiFi. Just to add to the examples, here are 2 recordings of an omni MEMS in free air in the same position and orientation as the first 2 recordings above of the uni-directional.
This is a test of the passive noise reduction that uni-directional mics have but omni-directional ones don’t.
You also get to hear the slight directionality of a MEMS when it’s not planar and the mic hole points the same way as the uni-directional did.

https://drive.google.com/file/d/1B4qPoWiIKp1swDNg-r-BOu9nnh9kdos8/view?usp=sharing

https://drive.google.com/file/d/1GJENtcFDkBzL7fopJ4y6K32uPZCboWHA/view?usp=sharing

You get a slight bit of directionality with an omni-directional mic if you face the port hole of the MEMS in a non-planar manner.
In planar orientation I am not going to bother to post a wav, as there is zero directionality.

As you can tell I have probably been playing with microphones far too much and for far too long, but there is always the huge problem of external noise, where even beamformers don’t cope well if it’s a predominant noise source, and software EC omni-directional systems only cope with locally played noise.

I started thinking back to my hazy roadie days of youth, when we used to accomplish much with simple directional mics, and a room setup is just like a stage but in reverse.
On stage we used to have high quality mics with phantom power, which is the original mic version of PoE, to reduce cable core count.
When you have a Pi nearby it’s not so much of a consideration, so you can use some really cheap passive electrets/MEMS and preamps.

The uni-directional mic used was x10 for $4

The problem is that so many electret preamps have an omni-directional capsule already soldered on directly.
So I removed the original electret and added some wires to connect one of the above uni-directional capsules, as I really like the MAX9814 preamp modules.

The soldering is a pain though, and if you really want those to be correct, the timing cap for the attack/hold/release is too low and needs to be replaced or have another cap soldered on top.

The mems analogue omni I used is this one.

But if you shop around there are many preamp and mic modules to choose from that are extremely cheap but good quality.

The soundcard I used is the Syba SD-AUD20101 USB 2.0, as it has a stereo ADC and so is a 2-channel mic card, but for a single channel any really cheap card will do.
The Syba SD-AUD20101 is my favourite stereo ADC card. I now have quite a few USB sound cards and think it’s the best, as I may swap between an omni and a uni-directional based on VAD feedback, but again that is something I am still playing with.
You can pick up a single-channel mic card for a dollar.
If I run amixer contents you will see how low my gain settings are with a preamp, and so how much scope I have for increasing gain and far-field reach.

pi@raspberrypi:~ $ amixer -c2 contents
numid=9,iface=CARD,name='Keep Interface'
  ; type=BOOLEAN,access=rw------,values=1
  : values=off
numid=3,iface=MIXER,name='Mic Playback Switch'
  ; type=BOOLEAN,access=rw------,values=1
  : values=off
numid=4,iface=MIXER,name='Mic Playback Volume'
  ; type=INTEGER,access=rw---R--,values=2,min=0,max=37,step=0
  : values=15,15
  | dBminmax-min=-15.00dB,max=22.00dB
numid=7,iface=MIXER,name='Mic Capture Switch'
  ; type=BOOLEAN,access=rw------,values=1
  : values=on
numid=8,iface=MIXER,name='Mic Capture Volume'
  ; type=INTEGER,access=rw---R--,values=2,min=0,max=30,step=0
  : values=3,3
  | dBminmax-min=0.00dB,max=30.00dB
numid=5,iface=MIXER,name='Speaker Playback Switch'
  ; type=BOOLEAN,access=rw------,values=1
  : values=on
numid=6,iface=MIXER,name='Speaker Playback Volume'
  ; type=INTEGER,access=rw---R--,values=2,min=0,max=45,step=0
  : values=22,22
  | dBminmax-min=-45.00dB,max=0.00dB
numid=2,iface=PCM,name='Capture Channel Map'
  ; type=INTEGER,access=r----R--,values=2,min=0,max=36,step=0
  : values=0,0
  | container
    | chmap-fixed=FL,FR

numid=1,iface=PCM,name='Playback Channel Map'
  ; type=INTEGER,access=r----R--,values=2,min=0,max=36,step=0
  : values=0,0
  | container
    | chmap-fixed=FL,FR
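
To read that listing in dB terms, the raw control values can be mapped onto the reported dBminmax range. The sketch below assumes the mapping is linear between min and max, which is what a dBminmax range normally implies; the helper name is made up, and the numbers are copied from the 'Mic Capture Volume' and 'Speaker Playback Volume' entries above.

```python
def raw_to_db(value, vmin, vmax, db_min, db_max):
    """Map a raw ALSA control value onto its dB range, assuming a linear scale."""
    return db_min + (value - vmin) * (db_max - db_min) / (vmax - vmin)


# numid=8 'Mic Capture Volume': values=3,3 min=0 max=30, 0.00dB to 30.00dB
capture_db = raw_to_db(3, 0, 30, 0.0, 30.0)
capture_headroom_db = 30.0 - capture_db

# numid=6 'Speaker Playback Volume': values=22,22 min=0 max=45, -45.00dB to 0.00dB
speaker_db = raw_to_db(22, 0, 45, -45.0, 0.0)

print(f"mic capture gain: {capture_db:.1f} dB, headroom left: {capture_headroom_db:.1f} dB")
print(f"speaker playback: {speaker_db:.1f} dB")
```

On that reading the capture gain sits at 3 dB with 27 dB still in hand, which is the scope for extra far-field gain mentioned above.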

The mics and recordings were just in free air, so the worst-case scenario really, but an enclosure is little more than a plastic tube.

I have a pair of bookshelf speakers with a Pi mounted on the back, a 50 watt amp, a cheap sound card and a DIY mic. It’s also my AirPlay stereo system, and the whole lot cost the price of one of those stupidly overpriced and not so great USB beamformers.

You could even go posh and get the dynamic cartridge that is used in $100 Shure SM58 mics!

There are some cheap dynamic preamps also. I have just been concentrating on trying to find the cheapest that is suitable for recognition.

candle · August 22, 2020, 1:50pm

Interesting.

Maybe I should create a different topic (or perhaps it exists), but I have a question:

Is there any software that can ‘subtract’ the outgoing music from the incoming microphone audio before doing speech recognition on it?

rolyan_trauts · August 22, 2020, 1:52pm

That is what echo cancellation is. Really it’s feedback cancellation, but it’s called EC.

SpeexDSP seems to be the only capable EC on the Pi, but playback has to be on the same soundcard as the mic, as clock drift kills its use.

PulseAudio WebRTC AEC doesn’t seem to work well on the Pi.

SpeexDSP only attenuates, but does a very good job. That still leaves 3rd party noise sources, where maybe passive noise suppression is the only option, which is why I combined both above.

PulseAudio WebRTC AEC does seem to cancel rather than attenuate, but when the echo gets above a certain volume relative to the overall mic input it just fails completely, whilst SpeexDSP continues to attenuate.

PulseAudio WebRTC AEC is, as the name says, PulseAudio only; voice-engine/ec can be set up for ALSA or PulseAudio.

rolyan_trauts · August 22, 2020, 3:31pm

@candle

I found your mic on Aliexpress.

I will leave you to test with voice-engine/ec, but you will find that the mic acts as a separate input sound card from the one you play on, and due to the clock drift between the 2 it will provide little or no EC in comparison to what I have posted above.
I am not sure why clock drift matters so much with SpeexDSP, but it does.
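
One plausible contributor, offered as an assumption rather than a diagnosis of SpeexDSP: an echo canceller lines up the played samples with the captured ones, and two cards running from separate crystals slowly slide apart. The sketch below just works out how far. The 100 ppm mismatch is an illustrative figure, not a measurement of any card mentioned here.

```python
def drift_samples(ppm_difference, sample_rate_hz, seconds):
    """Samples of misalignment built up between two clocks that differ by ppm_difference."""
    return ppm_difference * 1e-6 * sample_rate_hz * seconds


SAMPLE_RATE = 16000
PPM = 100  # hypothetical mismatch between playback and capture clocks

for seconds in (1, 60, 600):
    samples = drift_samples(PPM, SAMPLE_RATE, seconds)
    millis = samples / SAMPLE_RATE * 1000
    print(f"after {seconds:>3} s: {samples:7.1f} samples ({millis:.1f} ms) out of step")
```

With playback and capture on the same card both directions share one clock, so this offset stays at zero, which fits the observation that EC only works well on a single soundcard.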

WebRTC is supposedly much better and can cope with clock drift, but for some reason runs extremely poorly on the Pi.
Maybe the clock speed of the Arm SoCs just isn’t as high as x86, but to be honest it didn’t seem to work that great on my Ubuntu desktop either.
It might be the Freedesktop port of WebRTC audio processing, as I am pretty sure it works in Chromium, so I am confused as to why it seems to be so poor.
It should be able to use your USB mic and another soundcard.

rolyan_trauts · August 23, 2020, 7:15pm

In terms of modules and Mics I have sort of narrowed it down to a couple now and these are my faves out of quite a number I have tried connected to a USB soundcard.

I have been playing with Stereo ADC USB sound cards so my cable has been one of these.

To be honest without DOA & Beamforming there is no real advantage to having a stereo pair that I have noticed.

So I have just recorded a snippet on my fave setups on a cheap USB sound card.

MAX9814: very high output, AGC on board and GPIO-selectable gain. They come with an omni-directional mic, and with AR=float and gain=gnd we get the following using just 6 dB gain on a cheap mono USB soundcard.

https://drive.google.com/file/d/165hhGRVL75uPAEpxnXXD8k_EUJ2LIoq9/view?usp=sharing

I actually prefer a directional mic, but desoldering the omni-directional to solder a uni-directional electret on is a bit of a pain. Also the AR (attack/release) is very quick and prob pointless for voice, so I tack a 680uF onto Ct to make 780uF and a better 1.5 sec attack.
Dunno if the hassle is worth it, but here is the uni-directional with 780uF, AR=float, gain=gnd.

https://drive.google.com/file/d/1dlw3Tc26jzDzTB9vHON6QwQh1cYXSrLh/view?usp=sharing

So dunno, but they are really cheap and extremely sensitive. Prob not great for broadcast, as with the gain we get hiss, but for recognition low-energy content is cast off during MFCC, so it matters not a whole lot.

Then if you want MEMS and maybe do want a stereo planar, here is an analogue MEMS I like.

https://drive.google.com/file/d/1MA6nf3x4pqsepnVl-LvaIv2nxS1I1g6w/view?usp=sharing

Everything was tested at the same levels with the same USB card, so you will notice the output is much lower, as there is no dedicated preamp on the MEMS apart from its built-in one.

There is also another preamp I have been playing with, as it has a noise gate and compressor. I just never wired up the 2x potentiometers yet to see what overall effect they can have.
Here it is with the MEMS with just its default settings.

https://drive.google.com/file/d/1huH-MxEQTqeZDCgSRJ0KqoLEw-AtZe6J/view?usp=sharing

I haven’t found a cheap and cheerful passive mic that uses the soundcard bias output and has enough gain. Plenty sound great if you’re up close, but forget far-field pickup.

I dunno why I am using stereo with 2x MAX9814 with uni-directional, as really x1 omni will do the job. Just use the Speex AGC ALSA plugin and I think you will be pretty impressed with the range.

With sound cards, the stereo ADC ones are usually about $10+, but you have to shop around.

The Syba SD-AUD20101 is a fave stereo ADC of mine.

There are some ridiculously cheap and effective single-channel cards, but I struggle to recommend one as, from experience, the internals can change.
It’s why I have become a fan of the board-based Sanwu cards: the components are well specced, it’s a couple of $ and you can see that it is the same.
Also, because bias is so ineffective in terms of gain, as covered above, you may want to remove it so it can have no effect on your circuits (remove R6 on the CM108).

https://www.youtube.com/watch?v=m0FjQ-X04Jk

candle · January 25, 2021, 4:13pm

One thing I’ve implemented is to mute audio output after the wakeword is detected. That really helps with hearing what the user wants while they are playing music over the speaker
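
A minimal sketch of that mute-while-listening pattern, assuming playback is controlled through an ALSA switch and the `amixer` CLI. The card number and `numid=5` are taken from the 'Speaker Playback Switch' entry in the amixer listing earlier in the thread and will differ on other hardware; `listen` is a hypothetical callback standing in for whatever records and transcribes the command.

```python
import subprocess


def speaker_switch_command(card, numid, on):
    """Build the amixer command that flips a boolean playback switch."""
    return ["amixer", "-c", str(card), "cset", f"numid={numid}", "on" if on else "off"]


def set_speaker(card, numid, on, run=subprocess.run):
    """Run the switch command; `run` is injectable so it can be tested off-device."""
    return run(speaker_switch_command(card, numid, on), check=True)


def handle_command(listen, card=2, numid=5, run=subprocess.run):
    """Mute playback, capture the spoken command, then always restore playback."""
    set_speaker(card, numid, False, run)
    try:
        return listen()
    finally:
        set_speaker(card, numid, True, run)
```

The `try`/`finally` is there so the speaker comes back even if listening fails. Wiring `handle_command` to a wake word callback is left out, since that depends on the voice assistant in use.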

AlexisMori · July 14, 2021, 7:23am

Hi… I think you hit the nail on the head, as they all depend on this, but in terms of open source no audio processing at all is employed. They have embedded DSP, and offering something that looks comparable doesn’t mean that it is actually effective in terms of audio processing. We don’t have any DSP; it’s not even explored as a repo. We can use EC, but for some reason that too is not part of the project. What you end up doing is buying hardware with closed source DSP, at prices for audio that are multiples of the cost of leading-brand complete units, for relatively poor performance in comparison.

rolyan_trauts · July 14, 2021, 9:41pm

SpeechBrain have, as it’s a university-led open source project, but unfortunately it runs on PyTorch, and PyTorch audio is hardcoded to use Intel MKL math libs that don’t compile for Arm.

I have been wondering about this for some time, as it is really strange that across the many open source voice AI projects attention to the start of the audio chain is totally absent (sort of like having a database without data entry !?!)

There have been various complaints on the PyTorch GitHub about vendor-specific libs, and I think the next version will be 100% Arm compatible. The annoying thing is it’s only the audio math routines that don’t seem to be.
So maybe PyTorch and the routines SpeechBrain published will become a thing, as they work quite well on my Intel NUC.

What has puzzled me more is that, with some lateral thought, distributed mics can do this very simply through placement, positioned so that for one of them voice=near / noise=far.
I have banged on about zones and multiple remote mics, where KW hit probability is used to select the stream from a distributed array.
It’s easy to accomplish, just my Python isn’t that great, and if it wasn’t for my MS it would already be done.
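
The selection step itself can be sketched in a few lines. This assumes each remote mic reports a wake word (KW) probability for the same utterance; the mic names, the scores and the 0.5 threshold are all hypothetical, and collecting the scores over the network is not shown.

```python
def select_stream(kw_scores, threshold=0.5):
    """Pick the mic whose wake word score is highest, or None if none is convincing.

    kw_scores maps a mic/zone id to the KW probability that mic reported.
    """
    if not kw_scores:
        return None
    best_id = max(kw_scores, key=kw_scores.get)
    return best_id if kw_scores[best_id] >= threshold else None


# Example: three mics heard the same wake word at different strengths.
scores = {"kitchen": 0.42, "sofa": 0.91, "hall": 0.10}
chosen = select_stream(scores)
print(chosen)
```

The mic with the best score is usually the one for which voice is near and noise is far, so its stream is the one to forward for the rest of the command.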

Again, the lateral thought is not to use omnidirectional mics and to go back to simple low-cost uni-directional electrets, which are a really good fit for a Pi0 and cheap USB sound card, with multiple units as a distributed array.

The ESP32-S3, when it eventually gets full release, is likely going to be the best low-cost single array or distributed array, as it is likely to eventually land at about the same price as the ESP32 currently is. It has obviously had much focus on image & voice AI and, if successful, is likely to follow a similar price curve.

There are a number of projects that are on the cusp of having their time, as various options are close. I guess that, irrespective of effort, much will generally become obsolete, and maybe it would be better to have a few complete mainstream projects rather than the current plethora that are incomplete without initial processing.