Best USB microphone for Rhasspy

I have a Jabra 510 I picked up used for ~30-40 USD.
It can pick up voice anywhere in the main room, and even in side rooms if it’s quiet in the house. I have been very happy with it.

You got a great price on that used Jabra, especially the 510.

Yes, I did.
Which is why, when they changed from free shipping to $5 shipping after I ordered, I didn’t raise hell… it was still under $45. I hadn’t found one elsewhere for even twice that… I was very happy.

I found it on some trade site I had never heard of… Mercari I think it was called. I tried finding other stuff, but that was the only thing I’ve found so far that was an actual deal.

I have been very happy with it. Actually, I just finished doing some custom stuff… I created a custom intent handler, and now I can adjust the volume with a voice command and play MP3s.
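A custom intent handler of this kind can be sketched in Python. This is not the poster’s handler: it assumes Rhasspy is set to forward recognized intents as JSON containing `intent.name` and a flat `slots` object, and the intent names `SetVolume`/`PlayMp3`, the slot names `volume`/`file` and the ALSA control name `Master` are all hypothetical. It only builds the command (ALSA’s `amixer` for volume, `mpg123` for MP3s); actually running it, for example with `subprocess.run`, is left to the caller.

```python
import json
from typing import List, Optional


def command_for_intent(intent_json: str) -> Optional[List[str]]:
    """Map one recognised intent (JSON text) to an argv list, or None."""
    intent = json.loads(intent_json)
    name = intent.get("intent", {}).get("name", "")   # assumed JSON layout
    slots = intent.get("slots", {})

    if name == "SetVolume":                             # hypothetical intent name
        # Clamp so a misrecognised number cannot go outside 0-100 %.
        level = max(0, min(100, int(slots.get("volume", 50))))
        return ["amixer", "sset", "Master", f"{level}%"]
    if name == "PlayMp3":                               # hypothetical intent name
        path = slots.get("file")
        if path:
            return ["mpg123", "-q", path]               # -q: quiet console output
    return None


if __name__ == "__main__":
    example = '{"intent": {"name": "SetVolume"}, "slots": {"volume": 40}}'
    print(command_for_intent(example))
```

Returning an argv list rather than a shell string avoids quoting problems with file names that contain spaces.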

In my opinion the PSeye Cam is totally underrated. It’s only 10 bucks on eBay and offers a mic array much better than any other affordable mic you can buy.

Here is a good read with additional links: https://www.cnx-software.com/2019/08/30/using-sony-ps3-eye-camera-as-an-inexpensive-microphone-array/amp/

If you want to be able to barge in with a “Stop” while Rhasspy is playing media, you won’t be able to unless you have an all-in-one device with built-in AEC, or audio out/in on the same sound card sharing one clock.

People seem to repeat the urban myth that the PS3eye cam (which I have actually tested) is great, or that its mic array has some advantage or is much more sensitive; it isn’t.
The PS3 had beamforming algorithms, which we do not have on Linux, so it is merely a mic with poor drivers that was never designed to run on a PC. It only runs because someone managed to reverse engineer it, and that work was never fully completed.

I do play media, so the idea of a Gump Rhasspy isn’t one for me, and that rules out all USB mics, as you will not get a Linux echo-cancellation algorithm working with them unless it’s built in.

My PS3eye cam is in my spare parts bin, and most people who recommend it have never used one with Rhasspy.


Sorry but that is simply bullshit.

I have a bunch of PSeye Cams here and I am actually using one with Rhasspy. It has excellent detection in the whole room, even when there is a soap opera or whatever running on the TV. Of course there are limits to this with ANY device. If the surrounding sound is just so loud that your commands can’t be identified, well. Maybe you just failed to create a proper setup? The articles I posted also describe how to properly install that cam on Linux and downmix the 4 array channels to one. However, I never tried that, as the results are already excellent.

The 2-mic Pi HATs on my satellites perform much worse. While the one with the PSeye gets me almost every time, the HATs are just a hassle and will soon be replaced by PSeyes too.

I highly suspect that you are the one that never tried that cam as a mic.

Next thing: it should now be possible to use hotword detection to mute the TV / hi-fi sound for as long as Rhasspy is listening. However, I never tried it, as it isn’t necessary here.

There is no BS: the PS3eye cam is no more sensitive than any other, as the beamforming algorithms were part of the PS3 software. It’s that simple.

There is no setup that cancels the surrounding sound for any mic that doesn’t share a clock with the audio out, as SpeexDSP’s AEC will not work and PulseAudio’s AEC barely works at all on ARM Linux.
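Some rough arithmetic on why separate clocks matter: an echo canceller models the echo path as a fixed-length filter aligned against the playback reference, so a capture clock that slowly drifts relative to playback keeps moving the echo away from where the filter expects it. The 50 ppm mismatch below is a hypothetical figure, not a measurement of any particular card.

```python
def drift_samples(sample_rate_hz: int, ppm_mismatch: float, seconds: float) -> float:
    """Samples by which two free-running clocks drift apart after `seconds`."""
    return sample_rate_hz * ppm_mismatch * seconds / 1_000_000


def drift_ms(sample_rate_hz: int, ppm_mismatch: float, seconds: float) -> float:
    """The same drift expressed in milliseconds."""
    return 1000.0 * drift_samples(sample_rate_hz, ppm_mismatch, seconds) / sample_rate_hz


if __name__ == "__main__":
    # Hypothetical 50 ppm mismatch between the USB mic and the playback card.
    for minutes in (1, 10, 60):
        secs = minutes * 60
        print(f"{minutes:2d} min: {drift_samples(16000, 50, secs):6.0f} samples, "
              f"{drift_ms(16000, 50, secs):5.1f} ms")
```

At 16 kHz that is 48 samples (3 ms) after one minute and 2880 samples (180 ms) after an hour, an offset that keeps growing for as long as the two devices run.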

I never said the 2-mic HAT was good; they are both bad for various reasons. But with the PS3eye, the drivers don’t currently work properly, and you cannot even do a bog-standard alsactl store without an error.
It’s sort of in the name, aka “PS3”: even though it’s USB, it was never released with Windows or Linux drivers.

Jim Paris, the guy who reverse engineered the PS3eye, just says if you email him that it was too long ago to remember and he doesn’t know of any fixes.
I have used it and emailed the driver’s author, and it’s a useless pile of bullshit: an array that gives zero benefit apart from summing, plus a camera that is nowadays relatively useless, so it ends up being a bulky, pointless addition to a voice AI.

Next thing: it should now be possible to use hotword detection to mute the TV / hi-fi sound for as long as Rhasspy is listening. However, I never tried it, as it isn’t necessary here.

Go on, tell us how, with media playing, and I will tell you now that there isn’t a way with a PS3eye, as someone who has tried and tested it.

Sorry but I honestly don’t know what you are talking about.

My PSeye cam is performing excellently. Hotword detection is fine with TV sound in the room, and commands are mostly understood well with background noise; however, it sometimes takes longer for Rhasspy to notice that the input is finished. That’s all I know. Whatever you want to tell me about algorithms and whatever, I just don’t care. The fact is, for 10 bucks you get an extremely well performing 4-mic array you will not find elsewhere for that price.

That is exactly what I am telling you: without algorithms to make use of a mic array, the mic array is no different from any other mic.
Being a mic-only USB device also rules out AEC.
Your 4-mic array counts for nothing, and you’re presenting it as some sort of advantage when it does nothing.

I am quoting you direct.

Next thing: it should now be possible to use hotword detection to mute the TV / hi-fi sound for as long as Rhasspy is listening. However, I never tried it, as it isn’t necessary here.

You replied:

Sorry but I honestly don’t know what you are talking about.

So it would seem you don’t know what you are talking about.
It’s a cheap mic that has no advantage, and it has disadvantages compared with what you can do with a simple sound card.

What I am saying, and will say again, is that there is no good USB microphone for Rhasspy unless AEC & beamforming are built in.
You cannot mute or barge in, as Rhasspy may be listening but will be flooded with the very sound you want to mute.

There is no array microphone that is any good for Rhasspy unless AEC & beamforming are built in, because without them the array is just a collection of mics with little benefit, so any microphone will do.
Even allowing for variance in sensitivity, it’s still going to rely on good AGC to normalise far & near audio.

You can use a $1.50 sound card and a £1 mic module and get as good a result as your wonderful array, while also getting good-quality audio out and being able to do AEC.
It also works in its entirety without bugs or glitches; you clearly have not used and tested the PS3eye enough, or you would know what you are talking about, never mind me.

You see, people can believe whatever they want, throw acronyms about AEC stuff at me and talk about algorithms to show me how elaborate their knowledge is. But it doesn’t change the fact that the PSeye cam with its 4-mic array performs extraordinarily well, much better than most expensive mics.

You don’t need to believe it, but in the end it’s a fact. I see it from my experience, and if you check the link I posted you will find links where people tested it in a more evidence-based way than my feeling or experience.

And that’s my last answer on this, because I don’t want to spam this thread with “who knows better” bullshit. I just don’t think it’s OK that you tell people that thing is not a good buy, or performs no better than a cheap $1 mic, because it is just not true. In the real world I have not found any affordable mic performing better, first with Snips and now with Rhasspy. Sure, I did not test ten or twenty different mics, and there might be ones that are as good or better, but I don’t know of them.

The thing is, it doesn’t. You are comparing it against a ReSpeaker 2-mic, which for some reason people think is good, and the only thing I agree with is that the ReSpeaker 2-mic is actually a bit mweh.

It’s also an array that is absolutely pointless because it has no beamforming, and that comes down to basic audio engineering and physics, not acronyms.
From antennas to mic arrays, without beamforming an array is no better than a single sensitive omnidirectional mic.
A directional mic will actually be better than an array without beamforming to make it directional.
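The difference between plain summing and beamforming can be shown with a minimal delay-and-sum sketch. Assumptions: a far-field source, whole-sample delays and plain Python lists; real beamformers use fractional delays and usually more mics. With every delay at zero (plain summing) the channels only add coherently for a source straight ahead (broadside); for a source off to one side they can partly or fully cancel, whereas steering the delays lines the wanted source up first.

```python
import math
from typing import List, Sequence


def steering_delays(num_mics: int, spacing_m: float, angle_deg: float,
                    sample_rate_hz: int, speed_of_sound: float = 343.0) -> List[int]:
    """Arrival delay at each mic of a linear array (whole samples) for a
    far-field source at angle_deg from broadside; mic 0 is the reference."""
    tau = spacing_m * math.sin(math.radians(angle_deg)) / speed_of_sound
    return [round(i * tau * sample_rate_hz) for i in range(num_mics)]


def delay_and_sum(channels: Sequence[Sequence[float]],
                  delays: Sequence[int]) -> List[float]:
    """Advance each channel by its arrival delay so the steered source
    lines up across the mics, then average the channels."""
    base = min(delays)
    shifts = [d - base for d in delays]   # make every shift non-negative
    n = min(len(ch) for ch in channels) - max(shifts)
    return [sum(ch[t + s] for ch, s in zip(channels, shifts)) / len(channels)
            for t in range(n)]
```

For example, with 16 kHz sampling and a 1 kHz tone reaching the second mic 8 samples (half a period) later, `delay_and_sum([mic0, mic1], [0, 8])` returns the tone intact, while plain summing with `[0, 0]` cancels it almost completely.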

Also, unless each mic’s self-noise is really low and its sensitivity really high, the noise also sums in an array and can outweigh the gain, so the result can actually be worse than a single mic.

The fact is, a $1 mic module on a $1.50 USB sound card uses the same electret microphone, but with a more modern, higher-gain op amp.
That is sound engineering, and that is what is available today on AliExpress and eBay if you shop around.

Ok, guys. I think you two should just agree to disagree.

Clearly the term “good” has multiple interpretations. What one person thinks of as “good” is probably totally unacceptable to someone else.

Maybe @rolyan_trauts can create a feature table of some sort and we can make it sticky.
The table should hold various mics and various features, since it seems to me (also from other topics) that he has done a lot of testing.
Please keep in mind that the majority of users are not technically educated about sound, so some explanation of those features would be nice. Also, try to judge the mics on the general use case, which I believe is not high-end consumer-ready but more tinker level.
Rank them with that in mind, not for technical perfection.

I do not know if you are willing to do that, but it might actually help a lot of people.


That is a good idea @romkabouter

Paul, you also do the ESP32 Matrix thing and know quite a bit about mics, and I have an idea to run past you that your C and ESP32 knowledge could maybe develop fairly quickly, or at least test.

Someone posted DIY Alexa on ESP32 with INMP441

Which piqued my interest, as I didn’t know TensorFlow and a KWS would work that well on an ESP32.

Beamforming, and the lack of open source for it on Linux, is another thing of interest, and this article got me thinking about where we can do poor man’s beamforming and at least create directional microphones.
I presume this is what many “shotgun mics” do.

The I2S on the ESP32 is stereo, so could you create a ring buffer to delay one channel by the time sound takes to travel the distance between the 2 mics, and subtract one from the other as in the above application note?
Then meld that into the TensorFlow KWS so it uses the audio-kit AMR-W to create a Wi-Fi mic that sends from keyword until silence?
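The stereo delay-and-subtract idea can be sketched in Python. The application note mentioned is not linked here, so this is the generic first-order differential pair, not necessarily that note’s design. Assumptions: the two I2S channels are a front mic and a rear mic spaced `spacing_m` apart along the look direction, and the delay is rounded to whole samples. Delaying the rear mic by the acoustic travel time and subtracting it from the front mic puts a null behind the pair; an ESP32 version would keep the same ring buffer as a C array.

```python
from collections import deque
from typing import Iterable, Iterator, Tuple


class DelaySubtract:
    """Front mic minus rear mic delayed by the travel time between them."""

    def __init__(self, spacing_m: float, sample_rate_hz: int,
                 speed_of_sound: float = 343.0):
        # Acoustic travel time between the mics, rounded to whole samples.
        self.delay = max(1, round(spacing_m / speed_of_sound * sample_rate_hz))
        # Ring buffer holding the last `delay` rear-mic samples (oldest first).
        self.ring = deque([0.0] * self.delay, maxlen=self.delay)

    def process(self, front: float, rear: float) -> float:
        delayed_rear = self.ring[0]   # rear sample from `delay` samples ago
        self.ring.append(rear)        # maxlen drops the oldest sample
        return front - delayed_rear


def run(frames: Iterable[Tuple[float, float]], ds: DelaySubtract) -> Iterator[float]:
    """Process (front, rear) sample pairs, e.g. de-interleaved stereo I2S."""
    for front, rear in frames:
        yield ds.process(front, rear)
```

Sound from behind reaches the rear mic `delay` samples before the front mic, so it cancels exactly; sound from the front does not. Differential pairs like this roll off at low frequencies (about 6 dB per octave for a first-order pair), so practical designs usually add some bass EQ, and at 16 kHz a 2 cm spacing is only about one sample of delay, which makes whole-sample rounding crude.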

I will have a go at a rough table. There are mics available, but mass-produced USB mics focus almost solely on desktop near-field use, and it’s true that most are not that good far field.

I have ordered an M5 Atom Echo for this purpose. I want to see if I can get the Alexa code working, and after that whether I can run the AudioStreamer as well.
The Matrix Voice should be able to do beamforming, but I have not investigated that.

There are 2 basic mic technologies that we use: the older electret condenser mic, which the PS3eye contains, and the MEMS microphone.
There are others, but for various reasons, mainly cost and size, we don’t use them.

So MEMS & electret, and the only advantage of MEMS seems to be size, and maybe they are a tad more sensitive with lower SNR.
MEMS seem to be only omnidirectional, whereas electrets come in versions with holes in the back that let sound cancel at the diaphragm to give noise reduction and directivity, but this does seem to lose sensitivity.

https://www.cuidevices.com/blog/comparing-mems-and-electret-condenser-microphones

But the whole range and directionality is here.

The PS3eye https://en.wikipedia.org/wiki/PlayStation_Eye is missing all the great stuff of multi-directional voice location tracking, echo cancellation, and background noise suppression because that part was software by Sony in the PS3.

The only thing is the SNR of 90 dB, but I am really struggling to believe that, as it is a ridiculous figure that beats high-end recording studio equipment; it might be far-field pickup after Sony does its software trickery.

There is a good article here on SNR mems vs electret and ASR.
https://www.arrow.com/en/research-and-events/articles/why-you-need-high-performance-ultra-high-snr-mems-microphones

There is one saving grace with SNR and far field with ASR, and it’s part of the MFCC spectrogram process: low-order energies are dropped as part of the MFCC process, which produces some noise reduction.
So SNR becomes less of an issue, but what we want to do is create a spectrogram image similar to the ones contained in our model.
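One well-defined piece of this can be checked numerically: changing the overall gain adds the same constant to every log mel energy, and the DCT used in MFCCs puts a constant offset entirely into the first coefficient c0, which pipelines often drop or replace with a separate energy term. The sketch assumes `frame` is one frame of log mel-filterbank energies (the values are made up) and uses an unnormalised DCT-II; it illustrates gain invariance only, not noise reduction.

```python
import math
from typing import List, Sequence


def dct2(x: Sequence[float]) -> List[float]:
    """Unnormalised DCT-II, as applied to log mel energies in MFCC pipelines."""
    n = len(x)
    return [sum(x[i] * math.cos(math.pi * k * (2 * i + 1) / (2 * n)) for i in range(n))
            for k in range(n)]


def cepstrum_with_gain(log_mel: Sequence[float], gain: float) -> List[float]:
    # Scaling the signal by `gain` scales every band's power by gain**2,
    # i.e. adds the same constant log(gain**2) to every log mel energy.
    offset = math.log(gain ** 2)
    return dct2([v + offset for v in log_mel])


if __name__ == "__main__":
    frame = [1.2, 2.5, 3.1, 2.8, 1.9, 0.7, 0.4, 0.2]   # hypothetical log mel energies
    quiet = cepstrum_with_gain(frame, 1.0)
    loud = cepstrum_with_gain(frame, 4.0)
    for k, (a, b) in enumerate(zip(quiet, loud)):
        print(f"c{k}: {a:8.4f} -> {b:8.4f}")
```

Only c0 changes between the two runs (by 8 × log 16 for 8 bands); c1 onwards are identical up to rounding, because the DCT-II basis functions for k ≥ 1 sum to zero.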

The TensorFlow model images are often from desktop / headset microphones that are optimised in volume and normalised, so for far field we just need gain and AGC.

I think that is what the PS3eye is good at, and that it has some sort of voice-optimised AGC built in.
AGC has 2 or 3 parameters, Attack/Hold/Decay or Attack/Release, and for voice it needs to be slower than what would be needed for instruments.
I am not really sure about AGC, just that some AGC seems to work much better with recognition than others.
I think if it’s too quick and reactive it can alter the spectrogram, so I tend to turn off hardware AGC and use the alsa-plugins SpeexDSP AGC, as it was designed for voice.
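Attack and release can be illustrated with a minimal feed-forward AGC. This is a generic sketch, not the SpeexDSP AGC mentioned above: it follows the signal envelope with a fast-attack / slow-release one-pole filter and applies whatever gain brings that envelope to `target`, capped at `max_gain`. The 10 ms / 500 ms time constants and other defaults are illustrative, not tuned values.

```python
import math
from typing import List, Sequence


def time_constant_coeff(ms: float, sample_rate_hz: int) -> float:
    """One-pole smoothing coefficient for a time constant in milliseconds."""
    return math.exp(-1.0 / (ms / 1000.0 * sample_rate_hz))


def agc(samples: Sequence[float], sample_rate_hz: int, target: float = 0.25,
        attack_ms: float = 10.0, release_ms: float = 500.0,
        max_gain: float = 30.0) -> List[float]:
    """Scale float samples (-1..1) so their envelope sits near `target`."""
    a_att = time_constant_coeff(attack_ms, sample_rate_hz)
    a_rel = time_constant_coeff(release_ms, sample_rate_hz)
    env = 0.0
    out = []
    for x in samples:
        level = abs(x)
        # Rise quickly when the level jumps (attack), fall slowly (release),
        # so the gain is not pumped up in every gap between syllables.
        coeff = a_att if level > env else a_rel
        env = coeff * env + (1.0 - coeff) * level
        gain = min(max_gain, target / env) if env > 1e-9 else max_gain
        out.append(max(-1.0, min(1.0, x * gain)))   # hard clip to full scale
    return out
```

During silence the envelope decays and the gain climbs towards `max_gain`, which is the same effect as an AGC raising the background noise in pauses; shortening `release_ms` makes that happen between words as well.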

The 2-mic ReSpeaker is sort of weird, as it’s a planar array, so the mics are actually at 90 degrees, which I think makes it less sensitive than the PS3eye, which you can point at the source.
It’s laid out like a beamforming mic, so there is no obstruction of the signal from one mic to the other, but it has no software to do the beamforming?!

You can get a cheap sound card and a cheap active mic module, turn off any hardware AGC, turn up the gain, run SpeexDSP AGC and get really good far field.
It also gives you high-quality audio out and shares a single codec, so you don’t get clock drift and can also do AEC to cancel any media you are playing.
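Why a shared codec matters shows up in the basic AEC update: the canceller predicts the echo from the exact samples sent to the speaker and subtracts that prediction from the mic signal. The sketch below is textbook NLMS, not SpeexDSP’s echo canceller (which works in the frequency domain and handles double talk); it assumes `playback` and `mic` are sample-aligned on one clock and that the echo path fits in `taps` samples.

```python
from typing import List, Sequence


def nlms_aec(mic: Sequence[float], playback: Sequence[float], taps: int = 32,
             mu: float = 0.5, eps: float = 1e-6) -> List[float]:
    """Return the mic signal with the predicted speaker echo removed."""
    w = [0.0] * taps      # adaptive estimate of the speaker-to-mic echo path
    hist = [0.0] * taps   # most recent playback samples, newest first
    out = []
    for m, p in zip(mic, playback):
        hist = [p] + hist[:-1]
        echo_est = sum(wi * xi for wi, xi in zip(w, hist))
        err = m - echo_est                    # residual: ideally just near-end speech
        norm = eps + sum(xi * xi for xi in hist)
        step = mu * err / norm                # normalised step size
        w = [wi + step * xi for wi, xi in zip(w, hist)]
        out.append(err)
    return out
```

The filter can only converge because sample n of `playback` always lines up with the echo in sample n of `mic`; with two free-running clocks that alignment slides over time and the learned echo path goes stale.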

I have used the 2-mic, 4-mic linear and 4-mic HAT from ReSpeaker, and it’s just strange to have planar array mics that don’t have software to use the orientation of the design or the array.
The PS3eye had wonderful algorithms, but unfortunately they are contained and embedded in your PS3.

I also have an Anker PowerConf (£90) USB conference mic on my desktop, and I presume it’s very similar to the ReSpeaker USB models, where for the cost it’s a big mweh from me.
It actually does things that might not be needed at all, as it’s a broadcast mic and not a recognition mic, and that adds cost that isn’t needed.

You just need a single mic with good gain and AGC, and there is no good USB recognition mic, as generally much of what is included is just a bit pointless.
There is a whole raft of great broadcast USB mics, but they have slightly different, distinct design criteria.

I did also wonder if you could use a cheap PT2399 analogue audio delay, as at approx 100 ms the quality is still reasonable with those.
I spent ages trying to find an alsa-plugin just to create a channel delay, as an ALSA channel mapping of -1 will invert a channel so you can then just sum them; with 2 mics and a stereo sound card, or a 2-mic array, you could make it directional. I was surprised I could not find anything.
I guess you guys are playing with https://github.com/espressif/esp-sr/ though.

I have been wondering about just having multiple Wi-Fi mics connected to a central ASR and picking the mic with the highest KW hit for ASR.
It’s not beamforming, just picking the best & nearest mic.
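The selection step of that idea is simple enough to sketch. Everything here is hypothetical: each satellite is assumed to report `(site_id, score, timestamp_s)` when its KWS fires, scores are assumed comparable across satellites, and detections within `window_s` of the first one are treated as the same utterance.

```python
from typing import List, Optional, Tuple

Detection = Tuple[str, float, float]   # (site_id, score, timestamp_s), hypothetical format


def pick_best_site(detections: List[Detection], window_s: float = 0.5) -> Optional[str]:
    """Return the site whose keyword score is highest for one utterance."""
    if not detections:
        return None
    first_t = min(t for _, _, t in detections)
    # Only compare detections that plausibly belong to the same wake word.
    same_utterance = [d for d in detections if d[2] - first_t <= window_s]
    return max(same_utterance, key=lambda d: d[1])[0]
```

The window matters because satellites in different rooms hear the same wake word at slightly different times; without it, a later unrelated detection could win.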

Well, the wakenet part is in the streamer, so yeah.
But I am not using the speech recognition.

Kind of like this?

No, just far simpler distributed KWS mics that form an array at a central ASR; no far field, just nearest.
I guess in a way it is, as the best KW hit will also be the one you’re facing, but there’s no need for fancy algorithms.

I might have a go with the ESP32 KWS example, but I have an aversion to dev nowadays.

I got round to testing some MAX9812 boards (tiny things, 3.3 V) and some directional MEMS.

They don’t have any AGC on board, so I’m using software AGC, which I seem to prefer, and they don’t come with a mic onboard, which makes soldering a little bit easier.
I just use jumper leads and 2.54 mm single-strip pins.

I tested on my Syba USB sound card and needed the gain up high, as it’s a line-in stereo card (the stereo mic version), but they will be extremely sensitive on a mono mic-level card.

Did a near & far test at approx 3 m.

https://drive.google.com/open?id=1057XNQe5fyMfuvk3a5PlbgPHKDTDzoB1

https://drive.google.com/open?id=18ZZd67NJYczluiF46KQ5VNaFwcgXLTW6

Noise is excellent; the noise in the recordings above is just the AGC raising the gain in silent pauses.
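Near and far recordings like these can also be compared numerically by RMS level. A standard-library sketch, assuming 16-bit PCM WAV files; channels are not separated, and which files you pass in is up to you.

```python
import array
import math
import sys
import wave
from typing import Sequence


def rms_dbfs(samples: Sequence[int]) -> float:
    """RMS level relative to 16-bit full scale; -inf for empty or silent input."""
    if not samples:
        return float("-inf")
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20 * math.log10(rms / 32768) if rms > 0 else float("-inf")


def wav_dbfs(path: str) -> float:
    """Read a 16-bit PCM WAV file and return its overall RMS level in dBFS."""
    with wave.open(path, "rb") as wf:
        if wf.getsampwidth() != 2:
            raise ValueError("expected 16-bit PCM")
        data = array.array("h", wf.readframes(wf.getnframes()))
    if sys.byteorder == "big":
        data.byteswap()   # WAV sample data is little-endian
    return rms_dbfs(data)
```

Comparing `wav_dbfs` for the near and far files gives the level drop over distance in dB; running it on a stretch of silence gives a rough noise floor.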

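/etc/asound.conf

pcm.array {
  type hw
  card Device        # capture straight from the USB card whose ALSA id is "Device"
}

pcm.sum {
  type plug
  slave {
    pcm "array"
    channels 2       # the card captures 2 channels, one per mic
  }
  route_policy sum   # sum the channels when a client opens fewer (e.g. mono)
}

pcm.agc {
  type speex         # SpeexDSP plugin from alsa-plugins
  slave.pcm "sum"
  agc 1              # enable automatic gain control
  agc_level 2000     # AGC target level
  denoise no
  dereverb no
}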

Update alsa-plugins with a one-liner that reinstalls alsa-plugins and updates speex/speexdsp:
curl -sL https://raw.githubusercontent.com/StuartIanNaylor/Alsa-plugins-speex-update/main/Alsa-plugins-speex-update.sh | sudo -E bash -

The Syba SD-AUD20101 (£20!) seems to have gone up in price, so I don’t think stereo and sum is worth it over a single mic and a £1.50 USB sound card.
I like the MAX9812, as said, if you’re not using the pre-supplied omni mic that comes with the MAX9814.
