As I said, I'm interested in the real-life experience you've had with this combination. As you said yourself, the price of hardware doesn't tell the whole story.
You list a lot of theory about why it should work, but how did it actually perform in your living room? How many false positives did you get with which wake word engine (Raven/Precise/Snowboy/Porcupine), compared to other hardware combinations, in a room with TV/music playing? How high would you say your STT hit rate was with Kaldi or DeepSpeech?
That's all I want to know, because that is what counts in the end, and it would make me consider this hardware instead of, for example, a 2-mic Pi HAT, which costs the same in the end once I buy all the components needed for feature parity. So I would be very thankful if you could share those real-life experiences.
For example, when I get a new mic to try, I often set up the different mics at the same time, so that I can run them side by side around the clock in production for a few days and compare them in use. I'd really appreciate hearing about your in-use experience with Rhasspy/Voice2json/Linto/Mycroft and this hardware.
Actually, I see the USB sound card as a weakness. Not from an audio standpoint, where you have laid out its positives from a hardware perspective, but from a design standpoint.
Needing something attached to the USB ports makes it harder to design small, minimal satellites that don't look like a Raspberry Pi or too DIY. This is of course a question of taste, but for something I have in every room, and that my partner also has to be happy to use, it's easier to build a clean, compact solution when everything is attached to the GPIOs.
I really wish there were better all-in-one solutions in this field, as I think a majority of the people trying something like Rhasspy are only fleetingly familiar with the command line and don't want to solder, buy a number of handpicked components, and then DIY a case or have the bare assembly lying around. You yourself titled this post "simple USB mic", and it might be simple for you, but for many people dipping their feet into something like Rhasspy, the hardware you described above is an advanced DIY project.
For wider adoption than we have now, and for communities like this to grow, I think we also have to offer easy, plug-and-play hardware buying advice. We need cases that can easily be ordered on Treatstock to be printed (maybe even with a Rhasspy logo?) and that fit those common hardware choices. Hardware knowledge can't be the entry hurdle.
People like you, @rolyan_trauts, and your knowledge are very important for this community, and I don't want to discourage you or start a fight; I just want you to see that side of the equation too.
Maybe one day you could design a PCB with the components you have found and investigated, and we could then get Seeed to manufacture a small run of a low-cost Rhasspy sound HAT with AEC, AGC and RGB LEDs. I'd be happy to build/design a case for that.
If it's omnidirectional, a microphone in itself will make no difference across hardware combinations in a room where the predominant noise comes from a TV or music.
If it's directional, then you will get an element of noise suppression through directionality.
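To put a rough number on that directionality, here is a minimal Python sketch assuming an ideal first-order cardioid pattern, g(θ) = 0.5·(1 + cos θ). The cardioid shape is an assumption: the real polar response of any particular cheap directional electret is unknown, and its rear rejection will be far from a perfect null. The function name `cardioid_gain_db` is hypothetical.

```python
import math


def cardioid_gain_db(angle_deg):
    """Level of a source at angle_deg off-axis, relative to on-axis, in dB.

    Assumes an ideal first-order cardioid: amplitude gain 0.5 * (1 + cos(theta)).
    Returns -inf at the theoretical rear null.
    """
    g = 0.5 * (1.0 + math.cos(math.radians(angle_deg)))
    if g <= 0.0:
        return float("-inf")
    return 20.0 * math.log10(g)


if __name__ == "__main__":
    for angle in (0, 45, 90, 135, 180):
        print(angle, round(cardioid_gain_db(angle), 1))
```

With this idealised pattern, a noise source 90° off-axis arrives only about 6 dB down, and strong attenuation happens only for sources well behind the mic, which is why placement decides whether directionality helps at all.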
Without advanced beamforming, and often even with it, predominant noise from 3rd-party TV & media will kill any recognition.
Beamformers work exceptionally well in the distributed noise fields of, say, industrial locations, but not so well with what is common at home: TV voices or loud music from a single source that floods the voice.
I have been using a whole collection of mics with Kaldi, Precise, Snowboy, Porcupine, DeepSpeech, Mycroft & Linto, plus various TensorFlow models hacked together by myself, and compared to room noise, the models and my own dialect, the microphone makes absolutely no difference. I am not going to start recording empirical evidence by going through everything I have tested again just to satisfy you of what I already know.
This is what I am saying: you can get a £1.60 mic module and it will be just as bad as your ReSpeaker 2-mic in your case, because open source doesn't have the algorithms Google & Amazon print on silicon, and we process on a Pi! Not in Google's & Amazon's AI-accelerated, state-of-the-art data centers.
Your mic doesn't matter, Jack, compared to what does! As long as it can record with sufficient clarity and sensitivity, a £1.60 mic is as good as any other, as we are all in the same boat in missing what really matters.
So it's the same performance in a more DIY and less feature-complete package, as it's just a mic. The price advantage is non-existent, as I would have to get all the other components separately.
I see the advantages as being able to choose your own amp and maybe doing EC when playing music from the same device.
Thank you, that's all I wanted to know.
It can use equally cheap directional electrets and will likely perform better, as it gains an element of noise suppression if placed where directionality can have an effect.
The components you use are your choice, but all you need is a USB sound card and a mic module, and because they are separate components, any upgrade replaces only that component, not the whole.
The https://www.microchipdirect.com/product/ZL38063LDG1 does seem reasonably priced. How it compares to the XMOS chip in the ReSpeaker or in my Anker PowerConf I don't know, but I'm not all that impressed with the XMOS chip, or even with Google or Amazon units, when 3rd-party TV or media is the predominant noise.
A small run would just produce a costly product for very little gain, but if someone like Raspberry Pi used its economies of scale, that would be a different matter.
I guess if Raspberry Pi gets to the stage of providing a cost-effective AI accelerator, then we will likely see something like the ZL38063LDG1.
If we were to make something as a community, I wouldn't advocate any beamformer or the current infrastructure Rhasspy uses, as all that would remain of it is the Python it is programmed in.
Rather than attempting the impossible with far-field beamformers, ESP32 boards are getting to a cost level where we can create multiple distributed near-field mics.
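Part of the case for near-field mics is simple distance arithmetic. This sketch assumes a point source in a free field, where direct sound falls by 20·log10(distance ratio) dB; in a real room the reverberant level stays roughly constant, so a closer mic mainly improves the ratio of direct speech to room sound. The function name and the 0.5 m near distance in the usage note are hypothetical.

```python
import math


def direct_level_gain_db(far_distance_m, near_distance_m):
    """Extra direct-sound level a closer mic receives from a point source.

    Free-field inverse-square law: level changes by 20*log10(d_far / d_near) dB.
    Room reflections and reverberation are not modelled.
    """
    if far_distance_m <= 0 or near_distance_m <= 0:
        raise ValueError("distances must be positive")
    return 20.0 * math.log10(far_distance_m / near_distance_m)
```

For example, `direct_level_gain_db(3.2, 0.5)` compares the 3.2 m recording distance mentioned later in this thread with a hypothetical mic 0.5 m from the speaker and comes out at roughly 16 dB more direct speech level.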
The ESP32-A1S has a built-in codec (the one in the Pi 4-mic HAT) that could interface with directional electrets.
It is capable of running a KWS locally that also provides a KWS hit score, so that a central, HAL (2001)-style Rhasspy can choose the best mic for a single ASR session.
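A minimal sketch of that central "choose the best mic" step, assuming each satellite reports a wake word detection with a confidence score and an arrival time. The `Detection` type, the `pick_best_mic` name and the 0.3 s grouping window are all hypothetical; this is not an existing Rhasspy API.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Detection:
    satellite: str  # hypothetical identifier of the satellite/mic that fired
    score: float    # KWS hit score reported by that satellite
    time: float     # arrival time at the central server, in seconds


def pick_best_mic(detections: List[Detection], window: float = 0.3) -> Optional[str]:
    """Return the satellite with the highest KWS score among detections that
    arrive within `window` seconds of the first one (treated as one utterance)."""
    if not detections:
        return None
    ordered = sorted(detections, key=lambda d: d.time)
    first = ordered[0].time
    same_utterance = [d for d in ordered if d.time - first <= window]
    return max(same_utterance, key=lambda d: d.score).satellite
```

The time window is a design choice: it treats near-simultaneous detections as the same spoken wake word, so a later, unrelated detection in another room does not steal the ASR session.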
Open source has been copying the commercial products we see rather than providing solutions that rival commercial infrastructure.
You can provide a central system with a GPU and multiple mics serving each room, and if that system is also the audio server, then we can provide EC for all the problematic domestic noise.
But a lesser step would be to provide a Rhasspy sound bar that the TV passes its audio through; then you have solved the problem of 3rd-party media, because you bring it into the system and it is no longer 3rd party.
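The reason being the audio server matters is that echo cancellation needs the reference signal, i.e. the exact audio being played. As an illustration, here is a minimal normalised LMS (NLMS) adaptive filter in pure Python; it is a textbook technique chosen for clarity, not what any particular Rhasspy component or XMOS chip actually runs, and the tap count and step size are assumed values.

```python
def nlms_echo_cancel(reference, mic, taps=16, mu=0.5, eps=1e-6):
    """Remove the echo of `reference` (audio the system plays) from `mic`.

    An adaptive FIR filter `w` learns the speaker-to-mic echo path; the
    returned error signal e[n] = mic[n] - estimate[n] is the cleaned audio.
    """
    w = [0.0] * taps  # current estimate of the echo path
    x = [0.0] * taps  # most recent reference samples, newest first
    out = []
    for ref_n, mic_n in zip(reference, mic):
        x = [ref_n] + x[:-1]
        estimate = sum(wi * xi for wi, xi in zip(w, x))  # predicted echo
        e = mic_n - estimate                             # residual after cancellation
        norm = eps + sum(xi * xi for xi in x)            # normalise by input power
        step = mu * e / norm
        w = [wi + step * xi for wi, xi in zip(w, x)]
        out.append(e)
    return out
```

Real rooms need far longer filters (hundreds to thousands of taps at 16 kHz) and handling for double talk, but the principle is the same: once TV audio passes through the system, it becomes a known reference that can be subtracted.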
I actually had this up and running and thought I would post it.
I tried the hardware ALC & noise gate and it just didn't seem right, so I turned them off and set the gain of the 2-mic to 34 (9 dB).
I added this to /etc/voicecard/asound_2mic.conf:
# The IPC key of dmix or dsnoop plugin must be unique
# If 555555 or 666666 is used by other processes, use another one
# use samplerate to resample as speexdsp resample is bad
So it sums to mono, then runs AGC, and is recorded via:
arecord -D agc -r16000 -fS16_LE far.wav
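The post doesn't show how the `agc` device is defined, so this is only a rough Python illustration of the two steps it describes: summing the stereo channels to mono, then applying a simple block-based AGC that steers each 10 ms frame toward a target RMS. It does not reproduce speexdsp's AGC; the function names, target level, maximum gain and smoothing factor are all assumptions.

```python
import math


def stereo_to_mono(left, right):
    """Average the two channels (samples as floats in -1.0..1.0)."""
    return [(l + r) / 2.0 for l, r in zip(left, right)]


def simple_agc(samples, target_rms=0.1, frame=160, max_gain=30.0, smoothing=0.9):
    """Very simple AGC: per frame (160 samples = 10 ms at 16 kHz), move the gain
    toward target_rms / frame_rms, capped at max_gain, then clip to +/-1.0."""
    gain = 1.0
    out = []
    for start in range(0, len(samples), frame):
        block = samples[start:start + frame]
        rms = math.sqrt(sum(s * s for s in block) / len(block))
        if rms > 1e-9:  # leave the gain alone on digital silence
            desired = min(max_gain, target_rms / rms)
            gain = smoothing * gain + (1.0 - smoothing) * desired
        out.extend(max(-1.0, min(1.0, s * gain)) for s in block)
    return out
```

Note that an AGC like this raises room noise just as readily as distant speech, which is one reason software gain alone does not make a far recording sound good.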
I spent considerable time trying to sort out the hardware ALC, but I just don't have the patience for that card any more.
It's software gain, and the far recording @ 3.2 m (if we are going to be exact) is far from impressive.
Hmm, mine doesn't sound like this: https://drive.google.com/file/d/1x0jGJLIDa1Sa-AFRYHPE0aTfDNfaWYro/view?usp=sharing
This is the 2-mic HAT recording a TV that is about 4 m away. It's recorded with a sox record command using a compand effect whose transfer function effectively acts as an AGC. The TV is running at normal room level and there are no other effects. Sorry for the amount of bass, but we have a 2.1 system with a big subwoofer connected to the TV.
You need to record exactly the same thing at the same volume, near & far, as otherwise it's impossible to tell what it should be like.
We can only tell by comparing the two if they come from the same source.
Maybe you should post the sox settings & commands you use, as it is not likely to sound the same if you are using a different effect.
Don't you have a Bluetooth speaker or something similar that you can move, or could you move the Rhasspy, so you can record the same prerecorded playback each time?
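Once near and far recordings of the same prerecorded playback exist, one simple objective comparison is their RMS level. This sketch uses only the Python standard library and assumes 16-bit PCM WAV files; the function names are hypothetical, and RMS level says nothing about intelligibility, so it complements listening rather than replacing it.

```python
import array
import math
import sys
import wave


def read_pcm16(path):
    """Return the samples of a 16-bit PCM WAV file as ints (channels interleaved)."""
    with wave.open(path, "rb") as w:
        if w.getsampwidth() != 2:
            raise ValueError("expected 16-bit PCM")
        data = w.readframes(w.getnframes())
    samples = array.array("h")
    samples.frombytes(data)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian
    return samples.tolist()


def rms_dbfs(samples):
    """RMS level relative to 16-bit full scale (32768), in dB."""
    if not samples:
        return float("-inf")
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square == 0:
        return float("-inf")
    return 10.0 * math.log10(mean_square / 32768 ** 2)


def level_difference_db(near_samples, far_samples):
    """Far level minus near level in dB (negative means the far take is quieter)."""
    return rms_dbfs(far_samples) - rms_dbfs(near_samples)
```

Usage would look like `level_difference_db(read_pcm16("near.wav"), read_pcm16("far.wav"))`, with the file names standing in for whatever the two recordings are called.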
Also, why are you using sox when this 2-mic is supposedly so good, with all the functionality built in?
Because I'm the developer of https://github.com/johanneskropf/node-red-contrib-sox-utils, this is what I use in combination with voice2json, and I find sox to be a great all-round tool offering much more flexibility than arecord or parec. This was recorded straight from Node-RED. The compand is the only applied effect. It's pretty much the same as running sox -t alsa plughw:1,0 -L -e signed-integer -c 1 -r 16000 -b 16 test.wav trim 0 60 compand 0.1,0.3 -40,-40,-20,-10,0,-10 -10 -60 from the command line. The 2-mic Pi HAT installation is stock, with no settings changed.
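To see why that compand call behaves like an AGC, here is a Python sketch of its static transfer curve. Following the sox compand argument order (attack,decay, then in-dB,out-dB point pairs, then gain, then initial volume), the points are (-40, -40), (-20, -10), (0, -10) with a -10 dB output gain. The behaviour below the first point, the flat continuation above the last, and ignoring soft-knee rounding and the 0.1 s attack / 0.3 s decay envelope are simplifying assumptions; `compand_static_db` is a hypothetical name.

```python
def compand_static_db(in_db,
                      points=((-40.0, -40.0), (-20.0, -10.0), (0.0, -10.0)),
                      gain_db=-10.0):
    """Static output level (dB) for a steady input level, for a piecewise-linear
    compand curve followed by an output gain.

    Assumptions: unity slope below the first point, flat above the last point,
    no soft knee, no attack/decay smoothing.
    """
    (x0, y0), (xn, yn) = points[0], points[-1]
    if in_db <= x0:
        out = y0 + (in_db - x0)  # unity slope below the curve
    elif in_db >= xn:
        out = yn                 # flat above the curve
    else:
        for (xa, ya), (xb, yb) in zip(points, points[1:]):
            if xa <= in_db <= xb:
                out = ya + (in_db - xa) * (yb - ya) / (xb - xa)
                break
    return out + gain_db
```

Under these assumptions, every input from -20 dBFS to 0 dBFS comes out at -20 dBFS (levelling, AGC-like), while quieter input is pushed down further (-30 dBFS in gives -35 dBFS out), which also dampens low-level background noise.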
Here I have got my posh MAX9814 boards, which take a 3.6 V to 12 V supply with their own regulator and have jumpers to select A/R (attack/release ratio) & gain.
It's being fed from the 5 V rail this time, and because once more I ripped the solder pad off one of the cheaper MAX9814 boards trying to remove the electret, I dug one of these out, as its mic connector is on jumper leads.
Apart from that it's essentially the same, but this does let me try out a cheap 20p directional electret.
It's the same thing again, but this time near-rear has the mic facing the opposite way, as it's directional.
far-rear shows a much smaller effect because, with room echo and the dispersion of sound, much more sound reaches the rear of the mic than in the near samples.
Distance sensitivity is also much lower, again because room echo and dispersion mean much more sound hits the rear of the mic.
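A rough model of why turning the mic around matters much less far away: split the sound reaching the mic into a direct part, which the rear of the mic rejects, and a diffuse reverberant part arriving from all directions, which it cannot. This Python sketch assumes an ideal cardioid for the diffuse part (average power response 1/3) and a hypothetical 20 dB rear rejection for the direct part; the direct-to-reverberant ratios in the test are illustrative values, not measurements from these recordings.

```python
import math

# Ideal cardioid: average power response over all directions is 1/3 (about -4.8 dB).
CARDIOID_DIFFUSE_POWER = 1.0 / 3.0


def front_rear_difference_db(drr_db, rear_rejection_db=20.0):
    """Level difference (dB) between the mic facing the source and facing away.

    drr_db: direct-to-reverberant power ratio at the mic position (assumed).
    rear_rejection_db: attenuation of direct sound arriving from behind (assumed).
    """
    direct = 10.0 ** (drr_db / 10.0)  # direct power relative to reverberant power
    reverberant = 1.0
    rear_gain = 10.0 ** (-rear_rejection_db / 10.0)
    front = direct + reverberant * CARDIOID_DIFFUSE_POWER
    rear = direct * rear_gain + reverberant * CARDIOID_DIFFUSE_POWER
    return 10.0 * math.log10(front / rear)
```

Close to the source (high DRR) the front/rear difference approaches the rear rejection; far away, where reverberation dominates, the room's diffuse sound reaches the rear anyway, so flipping the mic changes only a few dB.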
This is where a sound card and mic module really shine, as directionality is the only form of 3rd-party noise suppression available to us without expensive beamforming.
I think I'll stay with the 2-mic Pi HAT and sox for now, as building a nice compact satellite out of this, with a button as an additional STT trigger, an amp, RGB LEDs, a speaker and no visible cables apart from the power cable, would really be a pain, and not one I would want to go through four times, which is how many satellites I have right now. Soon it will be six. I can see this on my workbench for playing around, but not as something that would be accepted in the household as a general solution that everybody has to be happy with.
It would really be a tiny improvement for the hours I'd have to spend building it.
That's the charm of the HAT: it takes me 3 minutes to assemble another satellite.
But thanks for the noise gate inspiration; I think it will be a nice addition to my sox command.
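For reference, the basic idea of a noise gate in a few lines of Python: frames whose RMS falls below a threshold are muted, louder frames pass unchanged. This is a generic sketch with an assumed -50 dBFS threshold and 10 ms frames, not sox's implementation; real gates also add attack, hold and release times so speech onsets and word endings are not chopped.

```python
import math


def noise_gate(samples, threshold_dbfs=-50.0, frame=160):
    """Mute every frame whose RMS is below threshold_dbfs (full scale = 1.0).

    samples: floats in -1.0..1.0; frame=160 is 10 ms at 16 kHz.
    """
    threshold = 10.0 ** (threshold_dbfs / 20.0)
    out = []
    for start in range(0, len(samples), frame):
        block = samples[start:start + frame]
        rms = math.sqrt(sum(s * s for s in block) / len(block))
        out.extend(block if rms >= threshold else [0.0] * len(block))
    return out
```

The threshold is the key tuning knob: set it above the room's steady background level but below the level of speech at the furthest expected distance.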
And at least for me, the 2-mic works fine at up to more than four meters when set up like this.