Simple cheap USB Microphone / Soundcard

I will give that a try but helps for all so that also an they.

PS: Adafruit recently got into the 2x mic game, which is annoying, as they just copied the Seeed 2-mic. The mics are analogue, so it could have been the first complete Pi soundcard.
They even used the ReSpeaker drivers.

This is a far and a near sample, read at normal volume, with the settings from above, which have both ALC and the noise gate enabled:

numid=12,iface=MIXER,name='Headphone Playback ZC Switch'
  ; type=BOOLEAN,access=rw------,values=2
  : values=off,off
numid=11,iface=MIXER,name='Headphone Playback Volume'
  ; type=INTEGER,access=rw---R--,values=2,min=0,max=127,step=0
  : values=126,126
  | dBscale-min=-121.00dB,step=1.00dB,mute=1
numid=17,iface=MIXER,name='PCM Playback -6dB Switch'
  ; type=BOOLEAN,access=rw------,values=1
  : values=off
numid=57,iface=MIXER,name='Mono Output Mixer Left Switch'
  ; type=BOOLEAN,access=rw------,values=1
  : values=off
numid=58,iface=MIXER,name='Mono Output Mixer Right Switch'
  ; type=BOOLEAN,access=rw------,values=1
  : values=off
numid=41,iface=MIXER,name='ADC Data Output Select'
  ; type=ENUMERATED,access=rw------,values=1,items=4
  ; Item #0 'Left Data = Left ADC;  Right Data = Right ADC'
  ; Item #1 'Left Data = Left ADC;  Right Data = Left ADC'
  ; Item #2 'Left Data = Right ADC; Right Data = Right ADC'
  ; Item #3 'Left Data = Right ADC; Right Data = Left ADC'
  : values=0
numid=19,iface=MIXER,name='ADC High Pass Filter Switch'
  ; type=BOOLEAN,access=rw------,values=1
  : values=off
numid=36,iface=MIXER,name='ADC PCM Capture Volume'
  ; type=INTEGER,access=rw---R--,values=2,min=0,max=255,step=0
  : values=195,195
  | dBscale-min=-97.50dB,step=0.50dB,mute=1
numid=18,iface=MIXER,name='ADC Polarity'
  ; type=ENUMERATED,access=rw------,values=1,items=4
  ; Item #0 'No Inversion'
  ; Item #1 'Left Inverted'
  ; Item #2 'Right Inverted'
  ; Item #3 'Stereo Inversion'
  : values=0
numid=2,iface=MIXER,name='Capture Volume ZC Switch'
  ; type=INTEGER,access=rw------,values=2,min=0,max=1,step=0
  : values=0,0
numid=3,iface=MIXER,name='Capture Switch'
  ; type=BOOLEAN,access=rw------,values=2
  : values=on,on
numid=1,iface=MIXER,name='Capture Volume'
  ; type=INTEGER,access=rw---R--,values=2,min=0,max=63,step=0
  : values=32,32
  | dBscale-min=-17.25dB,step=0.75dB,mute=0
numid=10,iface=MIXER,name='Playback Volume'
  ; type=INTEGER,access=rw---R--,values=2,min=0,max=255,step=0
  : values=255,255
  | dBscale-min=-127.50dB,step=0.50dB,mute=1
numid=23,iface=MIXER,name='3D Filter Lower Cut-Off'
  ; type=ENUMERATED,access=rw------,values=1,items=2
  ; Item #0 'Low'
  ; Item #1 'High'
  : values=0
numid=22,iface=MIXER,name='3D Filter Upper Cut-Off'
  ; type=ENUMERATED,access=rw------,values=1,items=2
  ; Item #0 'High'
  ; Item #1 'Low'
  : values=0
numid=25,iface=MIXER,name='3D Switch'
  ; type=BOOLEAN,access=rw------,values=1
  : values=off
numid=24,iface=MIXER,name='3D Volume'
  ; type=INTEGER,access=rw------,values=1,min=0,max=15,step=0
  : values=0
numid=33,iface=MIXER,name='ALC Attack'
  ; type=INTEGER,access=rw------,values=1,min=0,max=15,step=0
  : values=5
numid=32,iface=MIXER,name='ALC Decay'
  ; type=INTEGER,access=rw------,values=1,min=0,max=15,step=0
  : values=6
numid=26,iface=MIXER,name='ALC Function'
  ; type=ENUMERATED,access=rw------,values=1,items=4
  ; Item #0 'Off'
  ; Item #1 'Right'
  ; Item #2 'Left'
  ; Item #3 'Stereo'
  : values=3
numid=30,iface=MIXER,name='ALC Hold Time'
  ; type=INTEGER,access=rw------,values=1,min=0,max=15,step=0
  : values=6
numid=27,iface=MIXER,name='ALC Max Gain'
  ; type=INTEGER,access=rw------,values=1,min=0,max=7,step=0
  : values=4
numid=29,iface=MIXER,name='ALC Min Gain'
  ; type=INTEGER,access=rw------,values=1,min=0,max=7,step=0
  : values=0
numid=31,iface=MIXER,name='ALC Mode'
  ; type=ENUMERATED,access=rw------,values=1,items=2
  ; Item #0 'ALC'
  ; Item #1 'Limiter'
  : values=0
numid=28,iface=MIXER,name='ALC Target'
  ; type=INTEGER,access=rw------,values=1,min=0,max=15,step=0
  : values=2
numid=21,iface=MIXER,name='DAC Deemphasis Switch'
  ; type=BOOLEAN,access=rw------,values=1
  : values=off
numid=42,iface=MIXER,name='DAC Mono Mix'
  ; type=ENUMERATED,access=rw------,values=1,items=2
  ; Item #0 'Stereo'
  ; Item #1 'Mono'
  : values=0
numid=20,iface=MIXER,name='DAC Polarity'
  ; type=ENUMERATED,access=rw------,values=1,items=4
  ; Item #0 'No Inversion'
  ; Item #1 'Left Inverted'
  ; Item #2 'Right Inverted'
  ; Item #3 'Stereo Inversion'
  : values=0
numid=45,iface=MIXER,name='Left Boost Mixer LINPUT1 Switch'
  ; type=BOOLEAN,access=rw------,values=1
  : values=on
numid=43,iface=MIXER,name='Left Boost Mixer LINPUT2 Switch'
  ; type=BOOLEAN,access=rw------,values=1
  : values=off
numid=44,iface=MIXER,name='Left Boost Mixer LINPUT3 Switch'
  ; type=BOOLEAN,access=rw------,values=1
  : values=off
numid=9,iface=MIXER,name='Left Input Boost Mixer LINPUT1 Volume'
  ; type=INTEGER,access=rw---R--,values=1,min=0,max=3,step=0
  : values=3
  | dBrange-
    rangemin=0,,rangemax=1
      | dBscale-min=0.00dB,step=13.00dB,mute=0
    rangemin=2,,rangemax=3
      | dBscale-min=20.00dB,step=9.00dB,mute=0

numid=5,iface=MIXER,name='Left Input Boost Mixer LINPUT2 Volume'
  ; type=INTEGER,access=rw---R--,values=1,min=0,max=7,step=0
  : values=0
  | dBscale-min=-15.00dB,step=3.00dB,mute=1
numid=4,iface=MIXER,name='Left Input Boost Mixer LINPUT3 Volume'
  ; type=INTEGER,access=rw---R--,values=1,min=0,max=7,step=0
  : values=0
  | dBscale-min=-15.00dB,step=3.00dB,mute=1
numid=49,iface=MIXER,name='Left Input Mixer Boost Switch'
  ; type=BOOLEAN,access=rw------,values=1
  : values=on
numid=53,iface=MIXER,name='Left Output Mixer Boost Bypass Switch'
  ; type=BOOLEAN,access=rw------,values=1
  : values=off
numid=37,iface=MIXER,name='Left Output Mixer Boost Bypass Volume'
  ; type=INTEGER,access=rw---R--,values=1,min=0,max=7,step=0
  : values=0
  | dBscale-min=-21.00dB,step=3.00dB,mute=0
numid=52,iface=MIXER,name='Left Output Mixer LINPUT3 Switch'
  ; type=BOOLEAN,access=rw------,values=1
  : values=off
numid=38,iface=MIXER,name='Left Output Mixer LINPUT3 Volume'
  ; type=INTEGER,access=rw---R--,values=1,min=0,max=7,step=0
  : values=0
  | dBscale-min=-21.00dB,step=3.00dB,mute=0
numid=51,iface=MIXER,name='Left Output Mixer PCM Playback Switch'
  ; type=BOOLEAN,access=rw------,values=1
  : values=on
numid=35,iface=MIXER,name='Noise Gate Switch'
  ; type=BOOLEAN,access=rw------,values=1
  : values=on
numid=34,iface=MIXER,name='Noise Gate Threshold'
  ; type=INTEGER,access=rw------,values=1,min=0,max=31,step=0
  : values=20
numid=48,iface=MIXER,name='Right Boost Mixer RINPUT1 Switch'
  ; type=BOOLEAN,access=rw------,values=1
  : values=on
numid=46,iface=MIXER,name='Right Boost Mixer RINPUT2 Switch'
  ; type=BOOLEAN,access=rw------,values=1
  : values=off
numid=47,iface=MIXER,name='Right Boost Mixer RINPUT3 Switch'
  ; type=BOOLEAN,access=rw------,values=1
  : values=off
numid=8,iface=MIXER,name='Right Input Boost Mixer RINPUT1 Volume'
  ; type=INTEGER,access=rw---R--,values=1,min=0,max=3,step=0
  : values=3
  | dBrange-
    rangemin=0,,rangemax=1
      | dBscale-min=0.00dB,step=13.00dB,mute=0
    rangemin=2,,rangemax=3
      | dBscale-min=20.00dB,step=9.00dB,mute=0

numid=7,iface=MIXER,name='Right Input Boost Mixer RINPUT2 Volume'
  ; type=INTEGER,access=rw---R--,values=1,min=0,max=7,step=0
  : values=0
  | dBscale-min=-15.00dB,step=3.00dB,mute=1
numid=6,iface=MIXER,name='Right Input Boost Mixer RINPUT3 Volume'
  ; type=INTEGER,access=rw---R--,values=1,min=0,max=7,step=0
  : values=0
  | dBscale-min=-15.00dB,step=3.00dB,mute=1
numid=50,iface=MIXER,name='Right Input Mixer Boost Switch'
  ; type=BOOLEAN,access=rw------,values=1
  : values=on
numid=56,iface=MIXER,name='Right Output Mixer Boost Bypass Switch'
  ; type=BOOLEAN,access=rw------,values=1
  : values=off
numid=39,iface=MIXER,name='Right Output Mixer Boost Bypass Volume'
  ; type=INTEGER,access=rw---R--,values=1,min=0,max=7,step=0
  : values=5
  | dBscale-min=-21.00dB,step=3.00dB,mute=0
numid=54,iface=MIXER,name='Right Output Mixer PCM Playback Switch'
  ; type=BOOLEAN,access=rw------,values=1
  : values=on
numid=55,iface=MIXER,name='Right Output Mixer RINPUT3 Switch'
  ; type=BOOLEAN,access=rw------,values=1
  : values=off
numid=40,iface=MIXER,name='Right Output Mixer RINPUT3 Volume'
  ; type=INTEGER,access=rw---R--,values=1,min=0,max=7,step=0
  : values=2
  | dBscale-min=-21.00dB,step=3.00dB,mute=0
numid=16,iface=MIXER,name='Speaker AC Volume'
  ; type=INTEGER,access=rw------,values=1,min=0,max=5,step=0
  : values=4
numid=15,iface=MIXER,name='Speaker DC Volume'
  ; type=INTEGER,access=rw------,values=1,min=0,max=5,step=0
  : values=4
numid=13,iface=MIXER,name='Speaker Playback Volume'
  ; type=INTEGER,access=rw---R--,values=2,min=0,max=127,step=0
  : values=127,127
  | dBscale-min=-121.00dB,step=1.00dB,mute=1
numid=14,iface=MIXER,name='Speaker Playback ZC Switch'
  ; type=BOOLEAN,access=rw------,values=2
  : values=off,off
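
For anyone comparing settings, the raw numbers in the dump can be turned into dB with the `dBscale-min` and `step` values printed next to them. This is only a rough sketch: the table is copied by hand from the dump above, and the helper names (`raw_to_db`, `boost_to_db`) are made up for illustration.

```python
# Raw values and dBscale parameters copied by hand from the dump above.
# Each entry: (raw value, dBscale-min in dB, step in dB).
CONTROLS = {
    "Headphone Playback Volume": (126, -121.00, 1.00),
    "Speaker Playback Volume": (127, -121.00, 1.00),
    "Playback Volume": (255, -127.50, 0.50),
    "ADC PCM Capture Volume": (195, -97.50, 0.50),
    "Capture Volume": (32, -17.25, 0.75),
}


def raw_to_db(raw, db_min, step):
    """Convert a raw control value to dB for a linear dBscale entry."""
    return db_min + raw * step


def boost_to_db(raw):
    """Piecewise dBrange of the 'Input Boost Mixer xINPUT1 Volume' controls.

    The dump lists two sub-ranges: raw 0..1 (min 0 dB, step 13 dB) and
    raw 2..3 (min 20 dB, step 9 dB). Each sub-range counts from its own
    rangemin.
    """
    if raw <= 1:
        return raw_to_db(raw, 0.00, 13.00)
    return raw_to_db(raw - 2, 20.00, 9.00)


if __name__ == "__main__":
    for name, (raw, db_min, step) in CONTROLS.items():
        print(f"{name}: {raw_to_db(raw, db_min, step):+.2f} dB")
    print(f"LINPUT1/RINPUT1 boost: {boost_to_db(3):+.2f} dB")
```

With the values above, the capture PGA sits at +6.75 dB, the ADC digital volume at 0 dB and the input boost at +29 dB, before the ALC adds anything on top.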

Near is 1 m and far is 4.5 m. Excuse the dog paw sounds in the background. The other noise you can hear is the mic picking up sounds from something my girlfriend is watching on her phone down the hallway in another room.
Near:
https://drive.google.com/file/d/1sYOM7rFpr9q7keAVey73cZuhrg0xlb_b/view?usp=sharing
Far:
https://drive.google.com/file/d/11no6UsZWwWYsf3kRc98HhD-9mMk6utMm/view?usp=sharing

What do you think about these settings, then? Does Rhasspy hear better with them?
I will test them ASAP.

Well, they did improve my experience, but that is with my voice2json setup as described above.
You might want to make the decay / ramp-up longer to improve VAD detection with Rhasspy. You might also want to raise the ALC max gain.
I have a setup where I can record in parallel to my assistant running. So I always save the last command so that I can listen in when something didn’t work. Or I just record a few minutes randomly sometimes.
I found this to be the best way to tune the audio side as I can actually listen to what was happening and what the effect of certain settings were.
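
The thread does not say how that parallel recording is done, so the following is only one possible sketch of the "always keep the last command" idea: a rolling buffer that holds the most recent few seconds of 16-bit mono PCM and can be dumped to a WAV file when something goes wrong. The class name `LastAudioBuffer` and where the audio chunks come from are assumptions, not part of any Rhasspy or voice2json API.

```python
import collections
import wave


class LastAudioBuffer:
    """Keep only the most recent seconds of 16-bit mono PCM audio."""

    def __init__(self, seconds=10, rate=16000, chunk_frames=1024):
        self.rate = rate
        # Number of chunks needed to cover the requested duration.
        max_chunks = max(1, (seconds * rate) // chunk_frames)
        # A deque with maxlen silently drops the oldest chunk when full.
        self.chunks = collections.deque(maxlen=max_chunks)

    def push(self, chunk: bytes):
        """Add one chunk of raw S16_LE mono audio (hypothetical source)."""
        self.chunks.append(chunk)

    def save(self, path):
        """Write whatever is currently buffered to a WAV file."""
        with wave.open(path, "wb") as wav:
            wav.setnchannels(1)
            wav.setsampwidth(2)  # 16-bit samples
            wav.setframerate(self.rate)
            wav.writeframes(b"".join(self.chunks))
```

Whatever feeds the assistant would call `push()` for every chunk it reads, and `save("last-command.wav")` after a command that failed.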

To compare if your hardware performs equally to mine it would be cool to have samples you recorded with the same settings.

Rhasspy will have better hearing as the low signals will get auto gain from the ALC.

When you are close, the auto gain will be low; when you are far away, the auto gain applied to the incoming signal will be high.

When it comes to noise, it depends on the noise-to-voice ratio, and no, the ALC will do nothing for that: as the gain increases, the noise is likely to increase with it.
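
A small worked example of why gain alone does not help with noise: if the same gain is applied to voice and noise, the ratio between them stays where it was. The RMS levels and the 18 dB gain below are made-up numbers, purely for illustration.

```python
import math


def snr_db(voice_rms, noise_rms):
    """Signal-to-noise ratio in dB from two RMS amplitudes."""
    return 20 * math.log10(voice_rms / noise_rms)


def apply_gain(rms, gain_db):
    """Scale an RMS amplitude by a gain given in dB."""
    return rms * 10 ** (gain_db / 20)


# Made-up levels: a distant voice and the room noise underneath it.
voice, noise = 0.02, 0.005
gain_db = 18.0  # made-up amount of gain added by the ALC

before = snr_db(voice, noise)
after = snr_db(apply_gain(voice, gain_db), apply_gain(noise, gain_db))

if __name__ == "__main__":
    print(f"SNR before gain: {before:.2f} dB")
    print(f"SNR after gain:  {after:.2f} dB")
```

Both lines print the same figure, so the far sample ends up louder but no cleaner.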

But the comparisons here would make some sense if approximate levels and distances were given.
Any sample can be used really but the volume at the mic is what we are testing and the distance from the mic.

I have an el cheapo dB meter, but you need to describe the source and hope there is some level of similarity.
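
A dB meter tells you the level at the source; the level that actually ended up in a recording can be read from the WAV file itself. The sketch below assumes 16-bit PCM WAV files and uses only the Python standard library; `level_dbfs` is a name made up for this example. It returns RMS and peak level in dB relative to full scale.

```python
import array
import math
import sys
import wave


def level_dbfs(path):
    """Return (rms_dbfs, peak_dbfs) for a 16-bit PCM WAV file."""
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("expected 16-bit PCM samples")
        samples = array.array("h", wav.readframes(wav.getnframes()))
    if sys.byteorder == "big":
        samples.byteswap()  # WAV data is little-endian
    if not samples:
        raise ValueError("file contains no audio frames")

    full_scale = 32768.0
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    peak = max(abs(s) for s in samples)

    def to_db(value):
        # Digital silence has no finite dB value.
        return 20 * math.log10(value / full_scale) if value > 0 else float("-inf")

    return to_db(rms), to_db(peak)
```

Running it over a near and a far sample recorded with the same settings gives two numbers per file that can be posted alongside the distance.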

I was just interested in what you posted and in some of your WAV files as, to be honest, I don’t think there is any difference between any of the devices. I guess different MEMS mics might have been used, but I think they are all just el cheapos on the 2-mic.


It might well help attenuate noise, but I have never successfully set up a LADSPA plugin in ALSA, and I have tried a few times :)
It looks fairly easy for PulseAudio, and I guess you could route it back to ALSA, as you can make ALSA use PulseAudio.

I did get rnnoise going as an ALSA plugin, and I think it is very much like WebRTC in that a Pi 3 may struggle.
It would be interesting to see how much difference a 2.0 GHz Pi makes, as I think clock speed has a lot of influence here.

pcm.capture {
    # Add an ALSA plug for LADSPA
    type plug
    slave.pcm {
        # Add the LADSPA noise filter
        type ladspa
        slave.pcm {
            # Convert from float to int
            type lfloat
            slave {
                format "S16_LE"
                pcm "hw:CARD=Device"    # Capture card addressed by name; adjust to your own device
            }
        }
        # LADSPA configuration for the noise filter
        # See https://github.com/werman/noise-suppression-for-voice
        path "/usr/local/lib/ladspa"
        capture_plugins [
            {
                label noise_suppressor_mono
                input { controls [ 2 ] }    # VAD Threshold %
            }
        ]
    }
}



git clone https://github.com/werman/noise-suppression-for-voice
cd noise-suppression-for-voice
cmake -Bbuild -H. -DCMAKE_BUILD_TYPE=Release
cd build
make
sudo make install

Here is a Pi 4 running at 2.0 GHz, and it doesn’t do that bad a job.
My office is cold and the fan heater is blowing on me, which is quite a severe test, and it actually doesn’t make a bad attempt at it.
The VAD sensitivity seems crazy high, as the threshold needs to be set crazy low. Here it is at 10%, and I am not sure whether that is what is cutting the audio slightly.

no-rnnoise
https://drive.google.com/open?id=1pIH5O_TP6YoNrp9ql2rr_t5LmnpMB9QE
rnnoise
https://drive.google.com/open?id=1_Qr-XaaaxEy-nQeiS8GaTVCWCd1egde_
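
To reproduce this kind of A/B comparison, the same phrase can be recorded once from the raw card and once through the `capture` PCM defined in the config above. The sketch below only builds and prints the two `arecord` command lines rather than running them. The raw device name `plughw:CARD=Device` is an assumption derived from the card name in the config, and 48 kHz is chosen because rnnoise is built around 48 kHz audio.

```python
import shlex


def arecord_cmd(device, out_path, seconds=10, rate=48000):
    """Build an arecord command line for a mono 16-bit recording."""
    return [
        "arecord",
        "-D", device,         # ALSA PCM to record from
        "-f", "S16_LE",       # 16-bit little-endian samples
        "-r", str(rate),      # sample rate in Hz
        "-c", "1",            # mono
        "-d", str(seconds),   # stop after this many seconds
        out_path,
    ]


RAW_DEVICE = "plughw:CARD=Device"  # assumption: the same card, without the filter
DENOISED_DEVICE = "capture"        # the PCM defined in the ALSA config above

if __name__ == "__main__":
    for device, path in ((RAW_DEVICE, "no-rnnoise.wav"),
                         (DENOISED_DEVICE, "rnnoise.wav")):
        print(shlex.join(arecord_cmd(device, path)))
```

The two recordings have to be made one after the other, since the hardware device can normally only be opened by one capture at a time.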

It’s actually a good attempt by the Pi 4 running Pi OS 64-bit Lite at 2.0 GHz.

Rnnoise is pretty low-tech nowadays compared to the fresh rake of voice technologies that really need an x86 machine plus a GPU, with a minimum of approximately a GTX 1650.

I’ve got a feeling it’s just clock speed, as there is very little load.

You would probably still need to dial it back a bit. It does sound distorted and a bit cut off, and I’m not sure that either the wake word models or the ASR models would love it, as they tend to struggle a lot with heavily processed or too artificially quiet audio. That is just not what they were trained on.

You see, that is something that has been said before, and it is slightly paradoxical.
Feeding denoised input into a model that was not trained on denoised audio is obviously going to be prone to artefact noise.

If you are creating models, you must use the tools you record with (aka run your dataset through the denoiser), or you force your tools to be as exact as the original.

rnnoise is old tech, but if the Pi 4A does make an appearance in spring, it is quite possible to make noise-resilient, denoised KWS models. But really, that infrastructure is a crazy model: as with all AI, quality and results = raw power, and because voice commands are sporadic by nature, a single central processor with more horsepower than a Pi could likely serve a multi-user / multi-room role much better.

I mean, this is a Pi 4 with rnnoise, but you should hear the results of RTX Voice with a newer GPU, or the Facebook Research PyTorch CUDA-based voice technologies.
The same goes for Tacotron 2 + WaveGlow: it’s actually awesome, but on the Pi just forget about it.

Yes, that is true for keyword models, but it would be a massive effort for Kaldi ASR models, as you need a lot more speech plus transcribed text corpora to train those. And most of the available open-source datasets are just not recorded that way.

Yeah, which is why I see distributed models on low-end hardware as probably pointless. But you could.

Yes, but most people will not have that. They will have one of the many available single-board computers like the Pi, a RockPro or an Odroid.

Actually, no. In comparison, x86 machines and GPUs from the GTX 780 up are far more widely available, and more people have those.
It is you who has a Pi, and in terms of market share it is actually much smaller.

I think you have to look at the people who are actually interested in DIY assistants, home automation and so on. Node-RED, for example, just had a big community survey, and a big majority is running all their projects like this on single-board computers.
The crowd who run x86 systems with beefy graphics cards as a 24/7 server is very small.
And I would bet that if you did a survey here, the result would be the same.
I know the same is true for the user base of things like openHAB, where I used to be active, and Home Assistant, where a lot of Rhasspy users come from.
There might be a few people who also have a beefy machine at home, but even those often run their server things on Raspberry Pis.
I really do talk from experience on this. It is the most popular hardware choice; just look at the questions in the forum.

My point is when it comes to voice AI the best you can produce is for those who want to say “look mum what I have built”.

Otherwise it’s extremely poor in comparison to $30 big-data silicon, and the only option is horsepower if you wish to be private and have something that rivals, maybe even beats, the big guys.

So enjoy, but for me you’re talking toys.

Node-RED is IoT, and that is far more wide-ranging than poor voice AI.

That is your opinion, but doesn’t a project like Rhasspy also have to look at what the user base is actually using, and listen to and aim its development at that?
Because otherwise you will lose a huge chunk of people.
And no, there are a few people, including me, who actually use tools like Rhasspy or voice2json in production / day-to-day life on that toy hardware, as you call it. It just turned off my TV, and my girlfriend set a timer for the tea we made.
Then I asked what the weather will be like tomorrow, and that is all working rather nicely.
My mom wasn’t around to look at it and say what a good boy I am, unfortunately…

There is no user base apart from you; there is a lack of skills, and there are people like Kibo who think that, compared to Snips, this sucks and is currently useless.
I haven’t used Mycroft or Rhasspy for a long time because they are so poor.
I keep looking for alternatives and hardware that might make a difference, but I am not going to convey anything other than the truth.

The hobbyist programmers should enjoy what they are doing, but they need to rein in claims of effectiveness.

Wow, you just disenfranchised a lot of people here. FYI, for me it performs on par with Snips, which I also used for a year before it was shut down.
Don’t you think the project would be dead and the forum full of shouting people if I were the only one successfully using it?
You should really leave the sinking ship and jump to the next project.

Look at the post count: you’re about 80% of the community, and if the truth disenfranchises, I don’t care.
I came to create working open-source voice AI, not to make friends.

Open source is about user-driven software, and currently this needs to be driven hard. You seem disingenuous at times: having never used Snips, the only vibe I get is that Rhasspy is no Snips.

It could be worse, as my opinion of Mycroft is that it’s utter snake oil :)

No, open source is about contribution. It’s about all the people who quietly contribute, be it you helping people with audio hardware problems, somebody designing a nice printable case, small pull requests and so on. There is no “them” to drive; it’s an “us”.
But I guess that’s just my point of view.