State of Microphones?

rolyan_trauts · May 10, 2021, 1:38am

Unfortunately, because they sell, Respeaker has a whole range of relatively useless products that sell quite well.
The ‘maker’ market got profitable and it got lubricated with snake oil.

I suppose someone somewhere was hoping that the algs might be released, but I think it's unlikely, as you need either loads of clock speed or an RTOS where extremely tight timings can be guaranteed. The Pi lacks both.
We would have seen at least one manufacturer boast about it and release software by now, as it has been a long-standing gripe, but we haven't, and that sort of speaks volumes.

But it goes back to fixed hardware platforms, where everything is custom trained for that hardware, and the relatively false idea that you can just cheaply DIY odds and sods and compete with the likes of Google or Amazon.

There are niches, but mics are much the same, and often the advantages of some over others are not really worth the price difference.
Like I say, it's the catch-22 of the bring-your-own, open approach to hardware and also to a collection of software.

I was browsing @sskorol's GitHub and noticed he has https://github.com/sskorol/matrix-voice-esp32-ws-streamer

Which might be better than some others, as quite a few on here know I have an utter hatred of raw WAVs over MQTT, but even broadcasting raw WAVs seems crazy to me, seeing how 20 years ago the iPod made a codec of some type sort of mandatory.
ESP32 or other cheap, low-cost mics could be distributed, and you can select the best stream from an array of distributed mics that only broadcast from the KW until silence or a kick.

Maybe he might be interested in getting a CNN running on the ESP and using AMR-WB as a codec, as g-kws has a CNN ready with an MFCC front end, with goodies to run on TF4MC.
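To put rough numbers on "raw WAV vs a codec", here is a small arithmetic sketch comparing uncompressed 16 kHz / 16-bit mono PCM (a typical KWS/ASR input format, assumed here) with the nine AMR-WB (ITU-T G.722.2) mode bitrates. It does not encode anything; it only shows the bandwidth ratio a codec would buy on a mic-to-server link.

```python
# Rough bitrate comparison: raw 16 kHz / 16-bit mono PCM vs the AMR-WB
# (ITU-T G.722.2) codec modes. No encoder is used here; this is arithmetic only.
AMR_WB_MODES_KBPS = [6.60, 8.85, 12.65, 14.25, 15.85, 18.25, 19.85, 23.05, 23.85]


def pcm_kbps(sample_rate_hz: int = 16_000, bits_per_sample: int = 16, channels: int = 1) -> float:
    """Bitrate of uncompressed PCM in kbit/s."""
    return sample_rate_hz * bits_per_sample * channels / 1000


def compression_ratios(raw_kbps: float) -> dict:
    """Ratio of the raw PCM bitrate to each AMR-WB mode bitrate."""
    return {mode: raw_kbps / mode for mode in AMR_WB_MODES_KBPS}


if __name__ == "__main__":
    raw = pcm_kbps()
    print(f"raw PCM: {raw:.0f} kbit/s")
    for mode, ratio in compression_ratios(raw).items():
        print(f"AMR-WB {mode:5.2f} kbit/s -> about {ratio:4.1f}:1")
```

Even the highest-quality AMR-WB mode is roughly a tenth of the raw PCM bitrate.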

Audio “frontend” TensorFlow operations for feature generation

There is the ESP32 Alexa that Atomic made, but he used the spectrogram tutorial and also Google's Speech Commands benchmark dataset, which is deliberately hard, with the expectation that nothing will get 100%, as it is a benchmark KWS dataset; otherwise it's the most lousy, trashy piece of work Google have ever released.

I actually think MFCC is the killer codec, as for an ASR it's effectively lossless because it's what the ASR uses anyway, and the 16:1 compression can also be run through gzip and be absolutely tiny, in a similar way to what Google is boasting about with Lyra.
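A sketch of where a 16:1 figure can come from, assuming a front end like the one in the TensorFlow Lite Micro "micro_speech" example: 40 int8 features per frame with a new frame every 20 ms. Strictly those are filterbank features rather than MFCCs, but the byte arithmetic is the same for any 40-value int8 frame. zlib's DEFLATE (the algorithm gzip uses) is then applied to the packed frames; how much it actually saves depends on real feature data, so the demo uses synthetic frames only to show the round trip.

```python
from __future__ import annotations

import zlib

# Assumed front-end settings, modelled on the TensorFlow Lite Micro "micro_speech"
# example: 40 int8 features per frame, one new frame every 20 ms.
SAMPLE_RATE_HZ = 16_000
BYTES_PER_SAMPLE = 2       # 16-bit PCM
FEATURES_PER_FRAME = 40
BYTES_PER_FEATURE = 1      # int8-quantised features
FRAME_STRIDE_MS = 20


def raw_bytes_per_second() -> int:
    return SAMPLE_RATE_HZ * BYTES_PER_SAMPLE


def feature_bytes_per_second() -> int:
    frames_per_second = 1000 // FRAME_STRIDE_MS
    return frames_per_second * FEATURES_PER_FRAME * BYTES_PER_FEATURE


def pack_and_compress(frames: list[list[int]]) -> bytes:
    """Pack int8 feature frames into bytes, then DEFLATE them (the algorithm gzip uses)."""
    payload = bytes(v & 0xFF for frame in frames for v in frame)
    return zlib.compress(payload, level=9)


def decompress_and_unpack(blob: bytes, per_frame: int = FEATURES_PER_FRAME) -> list[list[int]]:
    """Inverse of pack_and_compress: inflate and turn bytes back into signed int8 frames."""
    values = [b - 256 if b > 127 else b for b in zlib.decompress(blob)]
    return [values[i:i + per_frame] for i in range(0, len(values), per_frame)]


if __name__ == "__main__":
    ratio = raw_bytes_per_second() / feature_bytes_per_second()
    print(f"raw: {raw_bytes_per_second()} B/s, features: {feature_bytes_per_second()} B/s, {ratio:.0f}:1")
    # One second of synthetic (not real) feature frames, just to exercise the round trip.
    frames = [[(i + j) % 16 - 8 for j in range(FEATURES_PER_FRAME)] for i in range(50)]
    blob = pack_and_compress(frames)
    assert decompress_and_unpack(blob) == frames
    print(f"{len(frames) * FEATURES_PER_FRAME} feature bytes -> {len(blob)} bytes after zlib")
```

32000 B/s of PCM against 2000 B/s of features gives the 16:1 before any general-purpose compression.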

JGKK · May 10, 2021, 6:23am

Depending on how comfortable you feel training your own language model, you can also look at doing it all in Node-RED.
I developed and co-developed a few speech-control-related nodes for Node-RED.
I have had really great experiences using DeepSpeech recently in my setup. With a domain-specific language model/scorer it's fast enough to do real-time or faster streaming ASR on a Pi 4.
The good part about DeepSpeech is that it's a lot easier to start training your own language models, add new vocabulary and combine them into a scorer than it is to add vocabulary and train models for Kaldi ASR (Vosk).
You can have a look at this collection of voice-related Node-RED nodes here:

There are also things like a JSGF (the grammar format that is also used as the base for Rhasspy rules) permutator that I wrote, which can be used to quickly create a text corpus for language model training. This can also be used to create a tagged corpus to do very basic fuzzy intent recognition.
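For anyone wondering what a "permutator" does, here is a minimal sketch of the idea. It is a hypothetical illustration, not the node described above: it handles only a simplified JSGF-style rule body with `( a | b )` alternatives and `[ optional ]` parts, and ignores rule references, weights and tags. Every sentence the rule can produce is written out, which is the kind of text corpus a language model/scorer can be trained on.

```python
import re
from itertools import product

# Tokens: grouping/optional brackets, the alternative bar, or a plain word.
TOKEN = re.compile(r"\s*([()\[\]|]|[^\s()\[\]|]+)")


def _join(left: str, right: str) -> str:
    return " ".join(part for part in (left, right) if part)


def _parse(tokens, i, closer):
    """Parse until `closer` (or end of input); return (list of sentences, next index)."""
    alternatives = []
    current = [""]
    while i < len(tokens) and tokens[i] != closer:
        tok = tokens[i]
        if tok == "|":
            alternatives.extend(current)
            current = [""]
            i += 1
            continue
        if tok == "(":
            options, i = _parse(tokens, i + 1, ")")
            i += 1  # skip ")"
        elif tok == "[":
            options, i = _parse(tokens, i + 1, "]")
            options = options + [""]  # optional part: with or without it
            i += 1  # skip "]"
        else:
            options = [tok]
            i += 1
        # Cross every sentence so far with every option for this element.
        current = [_join(a, b) for a, b in product(current, options)]
    alternatives.extend(current)
    return alternatives, i


def expand(rule: str) -> list:
    """Expand every sentence a simplified JSGF-style rule body can produce."""
    sentences, _ = _parse(TOKEN.findall(rule), 0, None)
    return list(dict.fromkeys(sentences))  # de-duplicate, keep order


if __name__ == "__main__":
    for sentence in expand("turn (on | off) the [kitchen] light"):
        print(sentence)
```

The rule `turn (on | off) the [kitchen] light` expands to four sentences; the corpus grows multiplicatively with each alternative or optional part, which is why a generator beats writing sentences by hand.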

On the microphone side, I fully switched to using MAX9814 electret microphone breakouts connected to a USB soundcard, inspired by @rolyan_trauts, as I found them much better than the Seeed 2-mic card or the IQaudio HAT. I actually get quite decent range and detection with this setup.

Johannes

rolyan_trauts · May 10, 2021, 12:58pm

Did anyone have a look at SpeechBrain? I went on about it but never did give it a try, as I think it tries to make Kaldi easier. It's all very new and I need to have a look at some point.

Yeah, I am not really a fan of the IQaudio HAT either, but compared with the Respeaker it's just more flexible with the 3.5mm and aux-in; however, as it comes with the same sort of onboard omnidirectional MEMS mics, it's just 2x the price of the Respeaker 2-mic.

I think most of it is down to those cardioid electrets, as they do have reasonable sensitivity and SNR (boy, I tested many), but definitely a preamp with the MAX9814, especially the one with its own regulator, seems the best.

Part of the problem, and why I haven't tested the IQaudio HAT that much, is the absolute overkill and complexity of their alsamixer config, which is super complex and would seem to be undocumented.

A lavalier into a sound card is just the non-DIY way to get the advantages of a cardioid. It doesn't cancel background noise, it just picks up better from the front, and that is really useful; audio equipment has used it for decades.
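To illustrate "picks up better from the front" rather than "cancels noise", here is a sketch of the textbook ideal first-order cardioid pattern, 0.5 × (1 + cos θ). Real cheap capsules only approximate this, especially at low and high frequencies, so treat the numbers as idealised.

```python
import math


def cardioid_gain(angle_deg: float) -> float:
    """Linear pickup of an ideal first-order cardioid, 0.5 * (1 + cos(theta)); 0 deg is the front."""
    return 0.5 * (1.0 + math.cos(math.radians(angle_deg)))


def cardioid_gain_db(angle_deg: float) -> float:
    """Same pickup in dB relative to the front; the ideal rear null is -inf."""
    gain = cardioid_gain(angle_deg)
    return float("-inf") if gain < 1e-12 else 20.0 * math.log10(gain)


if __name__ == "__main__":
    for angle in (0, 45, 90, 135, 180):
        print(f"{angle:3d} deg: {cardioid_gain_db(angle):7.2f} dB")
```

Sound from the side is only about 6 dB down and nothing is actively cancelled; the benefit comes from the strong rejection towards the rear, which is why pointing the mic away from a wall or TV helps.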

From the PS3 Eye to the Respeaker, I have had to repeat time and time again that the secret sauce is the DSP; otherwise you are purchasing PCB-mounted omnidirectional mics that often take all the GPIO and are also near impossible to acoustically isolate.

I presume, though, that because @JGKK is custom training with the hardware in use, the accuracy is actually quite good; he likely doesn't have complete datasets of the hardware in use like the big guys do, but can get very good results.

If I use the term bemused, I am sure for some it will bring a grin, but yeah, I find it completely bemusing that a voice application seems to shy away from initial audio processing. Maybe it is so complex, or there are conflicting interests and an overestimation of the resolution and results that DSP beamforming and algs can produce.

I can actually beamform with a stereo soundcard and 2x angled cardioids and use the threshold hit of a KWS to select my stream, but the project doesn't have any method to select the best KW hit and just uses the first in.
USB cards are really handy, as they don't steal your GPIO and you are not limited to one: with the above and 2x stereo soundcards, I can run a single omnidirectional beamformer on a single core of a Pi 3A+ with 4x 90° cardioid beams, as I have a KWS that can run at less than 20% single-core load (or at least Google's does).
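A minimal sketch of "select the best KW hit" instead of "first in": after the first wake-word hit, wait a short window, then pick the stream whose KWS reported the highest confidence. The `WakeHit` fields, the 0.3 s window and the assumption that all satellites share a clock are hypothetical illustration choices, not anything Rhasspy provides.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class WakeHit:
    stream_id: str     # which mic/satellite produced the hit (hypothetical naming)
    timestamp: float   # seconds, assumed to be on a shared clock
    confidence: float  # KWS score, higher is better


def select_best_stream(hits: Iterable[WakeHit], window_s: float = 0.3) -> Optional[str]:
    """Instead of taking the first hit, wait `window_s` after it and pick the most confident."""
    hits = sorted(hits, key=lambda h: h.timestamp)
    if not hits:
        return None
    deadline = hits[0].timestamp + window_s
    candidates = [h for h in hits if h.timestamp <= deadline]
    return max(candidates, key=lambda h: h.confidence).stream_id


if __name__ == "__main__":
    hits = [
        WakeHit("kitchen", 10.00, 0.71),
        WakeHit("lounge", 10.05, 0.93),
        WakeHit("hall", 10.60, 0.99),  # too late: outside the 0.3 s window
    ]
    print(select_best_stream(hits))
```

The window is a trade-off: longer catches slower satellites but adds that much latency before the chosen stream starts feeding the ASR.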

I am just an oddball, though, as again and again there is the request for omnidirectional beamforming, and when I look around I never see its need, apart from the adverts showing how great it is as a conference mic; in use, the device never seems to be central and is often on a table or shelf, plugged in somewhere close to a wall.

Light · May 10, 2021, 7:47pm

Yes, a lot of your speculation about the proliferation of these devices does sound plausible to me. Unfortunately, much of the rest of your message goes completely over my head. At this point I'm not seeing a particular reason to stick with the Matrix product line, as it seems to be at the very least in hibernation and, by @sskorol's analysis, inferior to the ReSpeaker Core anyway.

Light · May 10, 2021, 7:52pm

Thank you for your input. I was actually looking at moving away from using Node-RED. It was expedient to set up my test case, but felt a bit unnecessary in the long run considering it is ultimately driving an executable written in C++.

I honestly have no idea about training my own language model. I’m not against trying it, I just don’t really know where to begin.

Thank you for suggesting the microphones. Which USB soundcard did you choose?

Light · May 10, 2021, 8:03pm

Yeah, I don’t have a need for beamforming, as far as I know. Noise suppression and accurate recognition from a moderate distance are much more usable to me.

rolyan_trauts · May 10, 2021, 9:04pm

Noise suppression probably not, as software NS can leave artefacts, and it sounds like you are going to use a universal, non-custom-trained model, which will have been trained without NS, so probably just go without.

You can try the version of NS that comes with SpeexDSP; for some reason the asound2-plugins are lagging behind in revision on Debian, but you can do an update here.

Sounds like you have what you need with what you currently have set up, as after you do go round the houses, you tend to come back to the fact that any mic can do, or at least will have to.

Light · May 11, 2021, 6:58pm

Honestly I still feel a bit like my head is underwater, I’ve gotten tons of great information from several people, but with my lack of experience it’s hard to sift through it all and act on it. Previously you mentioned a preference for a unidirectional microphone connected to a USB soundcard. Can you elaborate on specific microphones and soundcards? What kind of software are you using in your setups?

rolyan_trauts · May 11, 2021, 8:08pm

Just plain Rhasspy, but with a cheapo USB card and a unidirectional mic.

I use a BOYA BY-MM1 for testing and stuff, as I also use it as a desktop mic, and it's on a little mini camera tripod, so it's just handy.

Like all Chinese products they can be a bit hit and miss, as sometimes you never know what is inside; if it's an Intel chipset you're out of luck, as they are pretty bad.

The white ones seem to be relatively consistently ID 1b3f:2008 Generalplus Technology Inc.
The black ones, which build-wise are a bit better quality, more often seem to have the bad Intel chipset than the above.

You always know what you're getting with these, as it's a CM108 and they cannot hide that, as it's not in a case.

For no-solder jobs, start with something cheap but cardioid (unidirectional). Cardioid means heart-shaped, and the bottom of the heart is the front of the pickup pattern.
So very cheap, but not that sensitive. Sensitivity works backwards: it is quoted in negative dB relative to a 0 dB reference (1 V/Pa), so the closer to 0 dB, the more sensitive the mic.

-52 dB ± 2 dB, 3.5mm TRS (tip, ring, sleeve) 3-pole jack plug (phone plugs are 4-pole and the contacts are in a different place; you can get converters, but hey).
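Converting the spec into volts makes "closer to 0 dB is more sensitive" concrete. This sketch assumes the listing's -52 dB is referenced to 1 V/Pa (the usual convention); some listings use 1 V/µbar instead, which reads 20 dB lower for the same capsule.

```python
def dbv_to_mv_per_pa(sensitivity_db: float) -> float:
    """Convert a mic sensitivity in dB re 1 V/Pa (0 dB == 1 V/Pa) to mV/Pa."""
    return 1000.0 * 10 ** (sensitivity_db / 20.0)


if __name__ == "__main__":
    nominal, tolerance = -52.0, 2.0
    for db in (nominal - tolerance, nominal, nominal + tolerance):
        print(f"{db:6.1f} dB -> {dbv_to_mv_per_pa(db):.2f} mV/Pa")
```

-52 dB works out to roughly 2.5 mV per pascal, and every 6 dB closer to 0 roughly doubles the output voltage.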

The sensitivity on this looks really great, but it's a TRRS 4-pole plug that is for a phone, as an example.

But you can get adapters; make sure it says it's for a microphone, as the tip & ring are otherwise stereo out.

It does get quite confusing, as they are often labelled wrong, so often ignore the description and go off what you are looking at.

Up to the Boya I use.

Or what looks the same but unbranded. They are half-size mics, bigger than lavaliers, and the mount is the camera type, which works well for me as I have a mini desk tripod.

Or you can go full DIY and get a preamp.

Make a 3.5mm lead and connect it to a USB sound card.

Electrets you can buy from me, as it will save you buying x25. For price and sensitivity + SNR they seem to be the best; you can see if you can get them elsewhere, but it seems quite hard work, which is why I bunged a few up on eBay.

It's just 2 wires to the preamp board and no more components.

Light · May 12, 2021, 3:30am

Thank you, this gives me a bunch of stuff to try. I will hopefully report back after I’ve had a chance to try some of this stuff.

rolyan_trauts · May 12, 2021, 5:33pm

The white ID 1b3f:2008 Generalplus Technology Inc. ones seem to have the best hardware AGC and gain of any, and I should say they are worth a punt, as even if you get a wrong one they are very cheap and it's not a major dent to source elsewhere.

I should say I often scour eBay and AliExpress, but PiHut should guarantee it's the correct type.

Light · May 12, 2021, 5:54pm

Yeah, I think I located the same one over at Adafruit here in the States.

rolyan_trauts · May 12, 2021, 6:19pm

If lsusb reports 1b3f:2008 Generalplus Technology, that is the one. They are not the quietest, but the range of the AGC is really good and negates the need for a preamp.
The CM108 probably gives a cleaner signal, but its lower levels make a preamp module preferable for far field; the preamp does add an extra step, but it allows you to get more gain a bit more cleanly.
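When a batch of cheap adapters arrives, a small script can check which chipset you actually got. This sketch runs `lsusb` and looks for the vendor:product ID mentioned above; the regex assumes the usual `Bus NNN Device NNN: ID vvvv:pppp Description` line format, and the description text after the ID varies between devices.

```python
from __future__ import annotations

import re
import subprocess

TARGET_ID = "1b3f:2008"  # Generalplus Technology Inc. USB audio adapter (from the thread)
LSUSB_LINE = re.compile(
    r"^Bus (\d{3}) Device (\d{3}): ID ([0-9a-f]{4}:[0-9a-f]{4})\s*(.*)$", re.IGNORECASE
)


def find_devices(lsusb_output: str, usb_id: str = TARGET_ID) -> list[tuple[str, str, str]]:
    """Return (bus, device, description) for every lsusb line whose vendor:product ID matches."""
    found = []
    for line in lsusb_output.splitlines():
        match = LSUSB_LINE.match(line.strip())
        if match and match.group(3).lower() == usb_id.lower():
            found.append((match.group(1), match.group(2), match.group(4)))
    return found


if __name__ == "__main__":
    try:
        out = subprocess.run(["lsusb"], capture_output=True, text=True, check=True).stdout
    except (OSError, subprocess.CalledProcessError) as err:
        print(f"could not run lsusb: {err}")
    else:
        devices = find_devices(out)
        print(devices if devices else f"no {TARGET_ID} device found")
```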

If you want a cheap hub, then my first buy worked out really well, but stay clear of these.

I thought, hey, that is just like the first one but also with a header array of all the USB pins, which is handy, but for some reason they seem to disconnect and freeze on the Pi 3A+ I have tested on.

Really cheap though, and they seem to work well; that was my first buy of a cheap ‘board’ hub.
So blue seems OK and purple maybe stay clear, but like all these modules they do seem to vary, and maybe I was just unlucky.

Light · May 12, 2021, 7:03pm

Yeah, I plan to check when they arrive.

rolyan_trauts · May 21, 2021, 4:30am

PS: I gave a bum steer, as I'm not a fan of the Respeaker USB, so my memory is foggy at best.
It's not a beamformer (that has to be applied in software); it's just AEC + AGC.

Light · May 21, 2021, 11:11pm

No worries. I just received the USB soundcard and electret microphone breakout board, but haven’t had a chance to test either of them.

rolyan_trauts · May 22, 2021, 8:40am

Just to plug in with no DIY, I did check out one of these: very similar results to my Boya, but much smaller and cheaper.

Light · May 28, 2021, 6:29pm

Nice, that looks like a pretty convenient option. My friend is helping me assemble the microphone breakout board now, but I may consider something like this if we run into problems.

rolyan_trauts · May 28, 2021, 6:33pm

If you have the MAX9814, run the gain with the jumper at low, or medium at most; high is definitely over-optimistic and noisy.
I use the GND from the 3.5mm jack and then 5V to the board's 5V.
The electret's - and + just need to go to - and + on the input, and that should be your lot.

But yeah, if you run into trouble, the above will just plug in; it has a little mini gooseneck, and it's actually hard to tell the difference from my £20 Boya.

Light · May 31, 2021, 2:57am

Okay, thanks. My friend and I usually only meet once a week, so perhaps we will get to try it soon.