Simple cheap USB Microphone / Soundcard

Probably the cheapest, easiest (with a touch of soldering) and most effective mic / audio out is a cheap USB soundcard and a MAX9814 board.

The MAX9814 is a very cheap module with an electret microphone onboard.
https://www.ebay.co.uk/itm/MAX9814-Microphone-AGC-Amplifier-Board-Module-Auto-Gain-Control-CMA-4544PF-W/191879408509
£1.60 with free p&p

Couple that with an equivalent USB soundcard


£1.90 with free p&p

You need to buy or find a 3.5mm jack lead and cut it in half, so you can solder some Pi jumper wires to the red/white and Gnd cables of the 3.5mm lead.
Again a £ or so should get you one on eBay, and you end up with two as you cut one in half.
The other half you can use to wire from the soundcard headphone output to an amplifier such as.

You should end up with something like the above, with a Pi jumper wire going to 3.3v on the Pi, Gnd going to the copper screen of the 3.5mm jack, and what turned out to be the white cable on this sound card going to the MAX9814 out. One cable, red or white, will not be used as that carries the bias voltage to feed a passive mic.

The mono sound cards are for passive mics, so they have 1x signal in and 1x bias voltage to power a mic.
We don’t need the bias as we are using an active, more advanced powered module in the MAX9814.

If you check the MAX9814 datasheet it has 2 stages of gain: the selectable gain 1st stage and the AGC gain 2nd stage.

Make a Pi jumper to go to 3.3v that is cut and has 2x connectors on one side, as it's going to be Vdd but also the VDD for the 40dB selectable gain, since the mono soundcards are extremely sensitive.
If you wire to a stereo soundcard that will be line-in (less sensitive), so have a double connector on the Gnd going to the soundcard (gain select) instead.
The A/R pin can be left disconnected as that gives the biggest ratio for attack/release of the AGC.

For this recording, because I forgot how sensitive the MAX9814 is, the gain pin is set to Gnd so it's 50dB, and it was laziness that left it that way, as you can see in the above pic.
My suggestion is to use VDD and 40dB overall gain as that is a lot of gain, but I guess leaving the 3.3v as a single and having double Gnd connectors for 50dB is more tidy and not that big a thing.
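
To put the 40dB and 50dB figures in perspective, here is a minimal sketch that converts them to plain voltage multipliers. The pin-to-gain mapping is taken from the post above rather than re-checked against the datasheet, and the function name is made up for illustration.

```python
# Gain-pin settings as described in the post above (VDD -> 40 dB, Gnd -> 50 dB).
GAIN_PIN_DB = {"VDD": 40, "GND": 50}


def db_to_voltage_ratio(db: float) -> float:
    """Convert a gain in dB to a linear voltage ratio (20*log10 convention)."""
    return 10 ** (db / 20)


for pin, db in GAIN_PIN_DB.items():
    print(f"gain pin -> {pin}: {db} dB is roughly x{db_to_voltage_ratio(db):.0f}")
```

So the extra 10dB from tying the gain pin to Gnd is a bit more than three times the output voltage, which is why it matters on a sensitive mono mic input.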

Near
https://drive.google.com/open?id=1rm6bzhlDUpMYuFY6E6SHvcWdoUG_E9Ae
Far @ approx 3m
https://drive.google.com/open?id=1_FOwSbcHtRYjYxu8JJ-t5BYKPWzqR4B2
In the CLI run alsamixer, press F6 to select the soundcard, and set the sound card to 0dB gain.

The card also has AGC, and there is software AGC that can also add gain, which will likely add less noise than going above the 40dB gain of the MAX9814.
I do prefer the directional electrets, but those are really fiddly to solder and the omnidirectional one comes with the board.

For a couple of £ you can get an effective microphone with decent far field for voice AI, and don’t worry about the AGC noise ramping up on silence, as this is of no bother for recognition, unlike in a broadcast setup.
What the AGC does is limit gain on clipping (overload) and just keep raising gain to the max if no clipping or signal is found.
The attack to drop gain on clipping is fast at 0.24ms, and the release time to get to full gain is about 1 sec (960ms).
It would probably be better if that were slightly longer, say double, and you can change the capacitor Ct, but it's too fiddly to bother with for the minimal difference.
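
A quick sketch of the arithmetic behind those numbers. The 0.24ms and 960ms values are the ones quoted above; the idea that both times scale linearly with Ct is an assumption made here for illustration and should be checked against the datasheet before changing any parts.

```python
ATTACK_MS = 0.24    # attack time quoted in the post
RELEASE_MS = 960.0  # release time quoted in the post


def scaled_times(ct_scale: float) -> tuple[float, float]:
    """Attack and release for a scaled Ct, ASSUMING both scale linearly with it."""
    return ATTACK_MS * ct_scale, RELEASE_MS * ct_scale


ratio = RELEASE_MS / ATTACK_MS
attack, release = scaled_times(2.0)
print(f"attack:release is about 1:{ratio:.0f}")
print(f"2x Ct -> attack {attack:.2f} ms, release {release:.0f} ms")
```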

Thanks for the tutorial 👍
Although I must say that I don’t find the 3 m performance that impressive compared to the DIY and soldering you have to do, especially once you have to spend the extra time to hook up an amp and some RGB LEDs separately and build / design your own case that doesn’t look bad in a living room.
Considering all that, it becomes much less attractive for me.
But of course your mileage may vary.
What's your experience with it when you try to do wake word detection in a room with some TV/music noise? How nicely does it play with the STT models for Kaldi or DeepSpeech? I ask this as I found that the character of the microphone used can sometimes have a bigger impact than the actual sound quality on how well it works, as much depends on the characteristics of the audio the models were trained on.
What is your real-life experience with things like that compared to the mics that a lot of people use here?
I’d be really interested, as a sound sample alone unfortunately doesn’t tell the whole story in this case.

3m is the limit of my workroom, not the mic; as you may notice there is little noticeable difference, apart from slightly more room reverb at 3m, between the far & near samples above.
I am unsure of the max distance as walls currently get in the way, but yes you are correct and that is the whole point: any cheap microphone with AGC should be able to get quite distant far field with reasonable SNR, and there is not much impressive about that.
It's got a line out and doesn’t incorporate a toy-like amp that restricts choice, so you can choose any amp for the right environment.
That is the problem with all-in-ones: they impose huge design restrictions with fixed placement of mics, amps and pixel rings that are often purely redundant, and as a HAT they dictate placement.
Because a USB soundcard shares the same clock for audio out/in, the AEC algorithms that enable barge-in work on the Pi.
Speex AEC is currently the most effective on the Pi, and the requirement is that the near/far reference and the input share the same clock; DACs, or products like the ReSpeaker with 4 mics and a lovely pixel ring, will not allow barge-in as we don’t have working AEC for them.
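
To illustrate why a shared clock matters for AEC, here is a rough sketch of how far playback and capture slide apart when they run on two independent clocks. The 50 ppm mismatch is a purely hypothetical figure chosen for the example, not a measurement of any card mentioned in this thread.

```python
SAMPLE_RATE_HZ = 16_000  # rate used for the recordings in this thread
DRIFT_PPM = 50           # HYPOTHETICAL mismatch between two independent clocks


def drift_samples(seconds: float, rate_hz: int = SAMPLE_RATE_HZ,
                  ppm: float = DRIFT_PPM) -> float:
    """Samples by which playback and capture slide apart after `seconds`."""
    return seconds * rate_hz * ppm / 1_000_000


for t in (1, 60, 600):
    s = drift_samples(t)
    print(f"{t:>4} s: {s:7.1f} samples ({1000 * s / SAMPLE_RATE_HZ:.2f} ms)")
```

With a single USB soundcard doing both playback and capture the mismatch is zero by construction, so the echo canceller's reference stays aligned with what the mic hears.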
Another problem with all-in-ones is microphone isolation, as a small wired module is much easier to isolate from speaker noise than any onboard mic, which really makes isolation impossible.
Electrets also have some advantages: their round cases are easier for DIY mounts and isolation, they are much easier to resolder than SMD devices, they come in directional noise-suppressing/studio versions, and they can be extremely cheap or expensive, so again there is choice.

Because of the nature of MFCC and the way low-order energy is dropped by the process, a multitude of microphones do work with the likes of Kaldi or DeepSpeech.
Their frequency response differences are far, far less than the huge differences that age, dialect and gender produce in the human voice, so microphone colour pales into insignificance; it's an impossible match unless you are Google or Amazon, where you enforce the microphone in use.
DeepSpeech uses the Common Voice dataset, whose submitted samples come from a very similar recording environment of general PC mics and headsets, whilst Kaldi tends to use ‘over’ clean datasets captured from TED talks and YouTube, but again this pales into insignificance when compared to the difference in gender, age, dialect and pronunciation of the spoken word.
This is also similar to the sample rate and bit width of the audio, where 16kHz 16-bit gives a voice Nyquist range of 8kHz, as the microphone isn’t the major consideration.
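
As a small worked example of the 16kHz / 16-bit point: the Nyquist limit is half the sample rate, and the theoretical dynamic range of linear PCM follows from the bit depth. The function names are just illustrative.

```python
import math


def nyquist_hz(sample_rate_hz: float) -> float:
    """Highest frequency that can be represented at a given sample rate."""
    return sample_rate_hz / 2


def dynamic_range_db(bits: int) -> float:
    """Theoretical dynamic range of linear PCM with `bits` bits per sample."""
    return 20 * math.log10(2 ** bits)


print(f"16 kHz -> Nyquist {nyquist_hz(16_000):.0f} Hz")
print(f"16 bit -> about {dynamic_range_db(16):.1f} dB dynamic range")
```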

It is why you are likely better off with a cheap, easily available microphone rather than a studio mic or one with beamforming artefacts, as currently it's more likely the source dataset was also recorded that way.
It's why Google and Amazon do have an advantage, as they dictate the microphone and also the datasets that they collect through use.

It's a cheap solution that is extremely flexible due to being wired, with a collection of components that you can mix and match by choice, or even build into an analogue array.
Due to being analogue you can add analogue modules, such as a compressor/noise gate, without further processing load, or replace the omnidirectional electret with a directional or studio-grade one, though the cost of the latter is likely not worth it as models are rarely recorded with one, because the environment of use is a living one, not a studio.

https://uk.rs-online.com/web/p/condenser-microphone-components/7542104/

It's a £1.90 USB sound card (or an equivalent you may already have) and a £1.60 mic module that is a close call against the INMP401/4 analogue MEMS modules, which do offer better clarity (SNR) and start at about £2.50 for a module that wires up the same, minus the selectable gain as theirs is fixed.
You can use any soundcard, as their quality has for a long time surpassed the recording quality of the environment of use.

Just because you have chosen an all-in-one solution, enclosed directly in line with a speaker so that AEC is impossible, doesn’t mean that is true of other solutions that often cost much less and are more flexible.
It's the same for DACs or ‘microphone only’ products, as AEC will not function correctly, depending on clock drift and the processing power available, unless using a built-in DSP.
Whereas with regard to Kaldi and DeepSpeech models and microphones there is absolutely no difference, as the recording microphone(s) is always an unknown.

If you're solder averse use a quick splice connector or something like


Or just crimp, I am just old school: soldering iron and insulation tape, but if you have a hot air gun

Or, but for me a bit big and clunky
Are real easy; it's a shame they are 3.3v and not 5v, as we could power from the USB and leave the GPIO completely clean.
But you can use something like a step-down LDO, again choice and flexibility, and it may give a cleaner VDD as an LDO as opposed to a buck (I really should try that as I have some somewhere).

But the DIY involved is 3 wires, and I presume for many that is not much of a chore; it's just a 3.5mm jack to jumper leads, and I presume you can buy them, but they are easy to create and your mic can be less than £1.60 if you shop around!

As I said, I'm interested in the real-life experience you’ve had with this combination. As you said yourself, hardware doesn’t tell the story with its price.
You list a lot of theory about why it should work, but how did it actually perform in your living room? How many false positives did you have, and with which engine (Raven/Precise/Snowboy/Porcupine), compared to other hardware combinations in a room with TV/music? How high would you say your STT hit rate was with Kaldi or DeepSpeech?
That's all I want to know, because that is what counts in the end and would make me consider the hardware instead of, for example, a 2 mic Pi HAT, which costs the same in the end once I buy all the components to have feature parity. So I would be very thankful if you could share those real-life experiences.
For example, when I get a new mic to try I often set up the different mics at the same time, so that I can have a few days of using them around the clock in production side by side and have that in-use comparison. So I would be very thankful if you could share your in-use experience with Rhasspy/voice2json/LinTO/Mycroft and this hardware.

Actually I see the USB soundcard as a weakness. Not from an audio standpoint, as you state its positives from a hardware perspective, but from a design standpoint.
Needing to have anything attached to the USB ports makes it harder to design small minimal satellites that don't look like a Raspberry Pi or anything too DIY. This is of course a question of taste, but for something I have in every room and that my partner has to be happy to use too, it's easier to build a clean compact solution when everything is attached to the GPIOs.

I really wish there were better all-in-one solutions in this field, as I think the majority of people trying something like Rhasspy are only fleetingly familiar with the command line and don't want to solder, buy a number of handpicked components and then DIY a case or have the naked concoction lying around. You yourself titled this post simple USB mic, and it might be for you, but for many people dipping their feet into something like Rhasspy the hardware you described above will be an advanced DIY project.
For adoption to become more widespread than now and for communities like this to grow, I think we also have to offer easy hardware buying advice that is plug and play. We need cases that fit those common hardware choices and can easily be ordered on Treatstock to be printed (maybe even with a Rhasspy logo?). Hardware knowledge can't be the entry hurdle.

People like you @rolyan_trauts and your knowledge are very important for this community and I don't want to discourage you or start a fight; I just want you to see that side of the equation too.
Maybe you could design a PCB one day with those components that you found and investigated, and we could then get Seeed to manufacture a small run of a Rhasspy AEC AGC RGB LED low-cost sound HAT. I’d be happy to build / design a case for that.

Johannes

A microphone in itself will make no difference between hardware combinations in a room with predominant noise from TV/music if it's omnidirectional.

If it's directional then you will get an element of noise suppression through directionality.

Unless you have advanced beamforming, and even then, predominant noise from 3rd-party TV & media will kill any recognition.
Beamformers work exceptionally well in distributed noise fields, such as industrial locations, but not so with what is common in a domestic setting: TV voice or loud music from a single source that floods the voice.

I have been using a whole collection of mics with Kaldi, Precise, Snowboy, Porcupine, DeepSpeech, Mycroft & LinTO, plus various TensorFlow models hacked by myself, and compared to room noise, models and my own dialect the microphone makes absolutely no difference. I am not going to start recording empirical evidence by going through everything I have tested again just to satisfy you of what I already know.

This is what I am saying: you can get a £1.60 mic module and it will be just as bad as your ReSpeaker 2 mic in your case, as open source doesn’t have the algorithms Google & Amazon print on silicon, and we process on a Pi, not in Google & Amazon AI-accelerated state-of-the-art data centers.

Your mic doesn’t matter jack when compared to what does! As long as it can record with a level of clarity and sensitivity it will do; we are all in the same boat, missing what really matters, and a £1.60 mic is as good as any other.

So it's the same performance in a more DIY and less feature-complete package, as it's just a mic. The price advantage is non-existent as I would have to get all the other components separately.
I see the positives being choosing your own amp and maybe doing EC if playing music from the same device.
Thank you, that's all I wanted to know.

It can use equally cheap directional electrets and likely perform better, as it will gain an element of noise suppression if placed where directionality can have an effect.
Which components you use is your choice, but all you need is a USB soundcard & mic module, and because they are components any upgrade replaces only that component, not the whole.

The https://www.microchipdirect.com/product/ZL38063LDG1 does seem reasonably priced; how it compares to the XMOS chip in the ReSpeaker or my Anker PowerConf I dunno, but I'm not all that impressed with the XMOS chip or even the Google or Amazon units when 3rd-party TV or media is the predominant noise.

A small run would just produce a costly product for very little gain, but if someone like Raspberry used its economies of scale that would be a different matter.
I guess if Raspberry gets to the stage of providing a cost-effective AI accelerator then it's likely we will see something like the ZL38063LDG1.

If we were to make something as a community I wouldn’t advocate any beamformer or the current infrastructure Rhasspy uses; all that would remain is the py, the language it is programmed in.

Rather than trying the impossible with far-field beamformers, ESP32 boards are getting to a cost level where we can create multiple distributed near-field mics.
The ESP32-A1S has a built-in codec (the one in the Pi 4 mic HAT) that could interface with directional electrets.

http://www.ai-thinker.com/Uploads/file/20190715/20190715141756_57655.pdf

It's capable of running a KWS locally and providing a KWS hit score, so that a central HAL (2001) type Rhasspy can choose the best mic for a single ASR session.
Open source has been copying the commercial products we see and not providing solutions to rival commercial infrastructure.
You can provide a central system with a GPU and multiple mics per room, and if the system is also the audio server then we can provide EC for all problematic domestic noise.
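
The "choose the best mic by KWS hit score" idea above could look roughly like the sketch below on the central side. Everything here (the `KwsHit` type, the mic names, the 0.5 threshold) is hypothetical and only meant to show the selection step, not an existing Rhasspy API.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class KwsHit:
    mic_id: str   # hypothetical satellite name, e.g. "kitchen"
    score: float  # keyword-spotter confidence reported by that satellite


def best_mic(hits: Iterable[KwsHit], threshold: float = 0.5) -> Optional[str]:
    """Return the mic with the highest hit score, or None if none pass the threshold."""
    candidates = [h for h in hits if h.score >= threshold]
    if not candidates:
        return None
    return max(candidates, key=lambda h: h.score).mic_id


hits = [KwsHit("kitchen", 0.62), KwsHit("lounge", 0.91), KwsHit("hall", 0.30)]
print(best_mic(hits))
```

Only the winning mic's audio would then be streamed to the ASR for that session.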

But a lesser stage would be to provide a Rhasspy sound bar that the TV passes its audio through, and then you have solved the problem of 3rd-party media, as you bring it into the system and it is no longer 3rd party.

Here is a Pi 2 mic hat.

Near
https://drive.google.com/open?id=1w6qK8BYCd-gtfwXP4eh6dfyflELzZ0yM
Far @ 3m.
https://drive.google.com/open?id=119ifVznzy57AT-LkkQ9a_Ev5q6nNnxkh

I actually had this up and running and thought I would post.

I tried with the hardware ALC & noise gate and it just didn’t seem right, so I turned it off and set the gain of the 2mic to 34 (9dB).

Added this to /etc/voicecard/asound_2mic.conf

# The IPC key of dmix or dsnoop plugin must be unique
# If 555555 or 666666 is used by other processes, use another one


# use samplerate to resample as speexdsp resample is bad
defaults.pcm.rate_converter "samplerate"

pcm.!default {
    type asym
    playback.pcm "playback"
    capture.pcm "capture"
}

pcm.playback {
    type plug
    slave.pcm "dmixed"
}

pcm.capture {
    type plug
    slave.pcm "array"
}

pcm.dmixed {
    type dmix
    slave.pcm "hw:seeed2micvoicec"
    ipc_key 555555
}

pcm.array {
    type dsnoop
    slave {
        pcm "hw:seeed2micvoicec"
        channels 2
    }
    ipc_key 666666
}

pcm.agc {
 type speex
 slave.pcm "sum"
 agc 1
 agc_level 4000
 denoise no
 dereverb no
}

pcm.sum {
 type plug
 slave {
   pcm "array"
   channels 2
   }
 route_policy sum
}

So it sums to mono, then runs AGC, and was recorded via:
arecord -D agc -r16000 -fS16_LE far.wav
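
For anyone unsure what "sums to mono" means in practice, here is a minimal sketch of the idea on interleaved 16-bit stereo samples. It is only an illustration of the concept with clipping added to stay in range; it is not how ALSA implements the routing internally.

```python
INT16_MIN, INT16_MAX = -32768, 32767


def sum_to_mono(interleaved: list[int]) -> list[int]:
    """Sum L/R pairs of interleaved 16-bit samples, clipping to the int16 range."""
    mono = []
    for left, right in zip(interleaved[0::2], interleaved[1::2]):
        mono.append(max(INT16_MIN, min(INT16_MAX, left + right)))
    return mono


# Two stereo frames: (L=100, R=200) and (L=-300, R=50)
print(sum_to_mono([100, 200, -300, 50]))
```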

I spent considerable time trying to sort the hardware ALC but just don’t have the patience with that card any more.
It's software gain, and the far sample @ 3.2m (if we are going to be exact) is far from impressive.

If there is anyone who is a fan of the Pi2mic maybe they can help out kibo with Recognize command with music

Hmm, mine doesn’t sound like this:
https://drive.google.com/file/d/1x0jGJLIDa1Sa-AFRYHPE0aTfDNfaWYro/view?usp=sharing
This is the 2 mic HAT recording a TV that is ca. 4m away. It’s recorded with a sox record command with a compand effect whose transfer function effectively acts as an AGC. The TV is running at normal room level and there are no other effects. Sorry for the amount of bass, but we have a 2.1 system with a big subwoofer connected to the TV.

Edit: here is another one with some talking by me and my girlfriend over the TV, between 3 and 4 meters away from the mic and talking away from it:
https://drive.google.com/file/d/1gu3EqmqkDQVDIUmDRfBaHjFeJTKJ_vUL/view?usp=sharing

You need to record exactly the same source at the same volume at near & far, as otherwise it's impossible to tell what it should be like.
We can only tell by comparing the 2 if they are the same source.
Maybe you should post the sox settings & commands you used, as it is not likely to sound the same if you are using a different effect.

Do you not have a Bluetooth speaker or something that you can move, or can you move the Rhasspy, to record a prerecorded playback each time?
Also, why are you using sox when this 2mic is supposedly so good with all functionality built in?

Never said anything about AGC.

Because I'm the developer of https://github.com/johanneskropf/node-red-contrib-sox-utils, which is what I use in combination with voice2json, and I find sox to be a great all-round tool offering much more flexibility than arecord or parec. This was recorded straight from Node-RED. The compand is the only applied effect. It's pretty much the same as doing sox -t alsa plughw:1,0 -L -e signed-integer -c 1 -r 16000 -b 16 test.wav trim 0 60 compand 0.1,0.3 -40,-40,-20,-10,0,-10 -10 -60 from the command line. The installation of the 2 mic Pi HAT is stock with no settings changed.
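
To make the compand numbers easier to read, here is a rough sketch of the static input/output curve implied by the points -40,-40,-20,-10,0,-10 with the trailing -10 treated as an extra gain in dB. It ignores the attack/decay times and the initial volume, and it assumes a 1:1 slope below the first point, so take it as an approximation of the idea rather than a reimplementation of sox.

```python
POINTS = [(-40.0, -40.0), (-20.0, -10.0), (0.0, -10.0)]  # (in dB, out dB) pairs
GAIN_DB = -10.0  # extra gain applied to every point of the curve


def compand_out_db(in_db: float, points=POINTS, gain_db: float = GAIN_DB) -> float:
    """Static transfer curve only: piecewise-linear interpolation plus gain."""
    if in_db <= points[0][0]:
        # ASSUMPTION: below the first point the curve continues with a 1:1 slope.
        out = points[0][1] + (in_db - points[0][0])
    elif in_db >= points[-1][0]:
        out = points[-1][1]
    else:
        for (x0, y0), (x1, y1) in zip(points, points[1:]):
            if x0 <= in_db <= x1:
                out = y0 + (y1 - y0) * (in_db - x0) / (x1 - x0)
                break
    return out + gain_db


for level in (-50, -40, -30, -20, -10, 0):
    print(f"in {level:>4} dB -> out {compand_out_db(level):6.1f} dB")
```

Read this way, everything from -20 dB upwards comes out at the same level, which is the levelling behaviour described as acting like an AGC.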

near (1.5 meters):
https://drive.google.com/file/d/146HSYi1_PYQInMD6CSC0V7RXvsj3jXJO/view?usp=sharing

far (4.5 meters):
https://drive.google.com/file/d/1bQPYgqjH-tyE1ItDhXZms-Q3uD4FsbAr/view?usp=sharing

With just default settings, no ALC, AGC or anything, and the default gain of 40 (12dB).

Running near and far @3m.

Near
https://drive.google.com/open?id=1YRUMobeWg5LJ3W7s1cRYhR05j12AmEY_
Far
https://drive.google.com/open?id=1e8mh0whva8doO9WKRLX03THYVKXFx3vR

It really sounds to me like, at least for the 2mic HAT, the Speex AGC is doing no favors and you're better off without it.

Well that is a problem, as have you seen the amplitude of the last far signal? And that is only @ 3m.

The USB mic @3 meter looks like this.

The problem is that without Speex AGC, due to the ALC not seeming to work correctly on the 2 mic, your signal is woefully low.
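
Rather than judging "woefully low" from a waveform picture, the level of two recordings can be compared as numbers. Below is a minimal sketch that reports peak and RMS level in dBFS for 16-bit samples; the loader assumes a mono 16-bit WAV like the ones recorded in this thread (on a little-endian machine) and the function names are made up for this example.

```python
import array
import math
import wave

FULL_SCALE = 32768  # magnitude of full scale for signed 16-bit audio


def load_wav_samples(path: str) -> list[int]:
    """Read a WAV file, ASSUMING 16-bit mono PCM on a little-endian host."""
    with wave.open(path, "rb") as wav:
        samples = array.array("h")
        samples.frombytes(wav.readframes(wav.getnframes()))
    return list(samples)


def peak_dbfs(samples: list[int]) -> float:
    """Peak level relative to full scale, in dB."""
    peak = max((abs(s) for s in samples), default=0)
    return 20 * math.log10(peak / FULL_SCALE) if peak else float("-inf")


def rms_dbfs(samples: list[int]) -> float:
    """RMS level relative to full scale, in dB."""
    if not samples:
        return float("-inf")
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20 * math.log10(rms / FULL_SCALE) if rms else float("-inf")


demo = [16384, -16384, 16384, -16384]  # synthetic half-scale square wave
print(f"peak {peak_dbfs(demo):.2f} dBFS, rms {rms_dbfs(demo):.2f} dBFS")
```

Running both functions on a near and a far recording of the same source gives a direct figure for how much level is lost with distance.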

I’ve been playing some more with the sox compand. Added a noise gate and will see how I go:

compand 0.1,0.2 -inf,-35.1,-inf,-35,-35,-25,-12,0,-12 -12 -60 0.1

Edit:
Also I think the sox compand transfer function AGC is doing a better job than the Speex one when you look at the far example at 4.5 meters above.

I just posted above a ReSpeaker recording made with the sox compand and without AGC; the picture is right there above and its far field is truly awful.

sox -t alsa plughw:1,0 -L -e signed-integer -c 1 -r 16000 -b 16 test.wav trim 0 60 compand 0.1,0.3 -40,-40,-20,-10,0,-10 -10 -60

Here I have got my posh MAX9814 boards that take a 3.6v-12v supply, with their own regulator and jumpers to select A/R & gain.
It's being fed from the 5v rail this time, and because I once more ripped the solder pad off one of the cheaper MAX9814 boards trying to remove the electret, I dug one of these out as the mic connector is on jumper leads.
Apart from that it's essentially the same, but this does let me try out an el cheapo 20p directional electret.

It's the same thing again, but this time near-rear has the mic facing the opposite way, as it's directional.
far-rear shows a much lesser effect than near-rear because of room echo and dispersion of sound.
Distance sensitivity is also much less because, again due to room echo and dispersion of sound, much more hits the rear of the mic.

This is where a sound card and mic module really shines, as directionality is the only form of 3rd-party noise suppression available to us without expensive beamforming.

https://drive.google.com/open?id=15fGs3rwbAAO3nI5RPLapGTKAv9x00MvP

https://drive.google.com/open?id=1wKbfUc7-qHCVYSQvVBBqD0xnLi7O9SBO

https://drive.google.com/open?id=1wUTH2DLO7O4OZZP_hFo_o4M0nsOxvcAd

https://drive.google.com/open?id=1nMi5EStkPOv5zZn5QejRvvpER3hD90je

It's down to placement and using directionality, but as you can see it will attenuate noise from the rear.

There are better electrets than the el cheapo 20p China ones but I struggled to source them; I'd probably have to buy more than just a couple and need to try and find them again.

They exist, like these out of a cheap £10 directional microphone.

I think I’ll stay with the 2 mic Pi HAT and sox for now, as building a nice compact satellite out of this with a button as an additional STT trigger, an amp, RGB LEDs, a speaker and no cables visible apart from the power cable really would be a pain, and not one I would want to have four times, which is how many satellites I have right now. Soon it will be six. I can see this on my workbench for playing, but not as something that would be accepted as a general solution in the household that everybody has to be happy with.
It would really be a tiny improvement for the hours I’d have to spend to build it.
That’s the charm of the HAT. It takes me 3 minutes to assemble another satellite.
But thanks for the noise gate inspiration, I think it will be a nice addition to my sox command.
And at least for me the 2mic works fine up to more than four meters set up like this.

If that is what you're happy with then fair enough, but when I tried the sox commands you forwarded with a Pi 2 mic at the default settings you mentioned of 40 (12dB) gain, far field was absolutely tragic.

But hey, if you're happy with that go for it.