Simple cheap USB Microphone / Soundcard

Rhasspy will have better hearing, as low signals get automatic gain from the ALC (automatic level control).

When you are close, the auto gain applied to the incoming signal will be low; when you are far away, it will be high.

When it comes to noise, it depends on the noise-to-voice ratio, and no, the ALC does nothing for that: as gain increases, it is likely the noise will too.
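If you want to see what the ALC on a cheap USB card is doing, amixer can show the card's controls. This is a sketch: "Device" is the same card name used in the config further down, and 'Auto Gain Control' is a typical control name on cheap USB audio, not a guarantee for your hardware.

amixer -c Device scontrols                 # list the card's mixer controls
amixer -c Device sget 'Auto Gain Control'  # check whether AGC/ALC is switched on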

But the comparisons here would make more sense if approximate levels and distances were given.
Any sample can be used really, but the volume at the mic and the distance from the mic are what we are testing.

I have an el cheapo dB meter, but you need to describe the source and hope there is some level of similarity.
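One rough way to put approximate numbers on a sample, assuming sox is installed, is its stats effect, which reports peak and RMS levels in dB:

sox sample.wav -n stats   # prints Pk lev dB and RMS lev dB for the recording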

I was just interested in what you posted and some of your wav files, as to be honest I don't think there is any difference between any of the devices. I guess different MEMS microphones might have been used, but I think they are all just el cheapos on the 2-mic.


It might well help attenuate noise, but I have never successfully set up a LADSPA plugin in ALSA, and I have tried a few times :slight_smile:
It looks fairly easy for PulseAudio, and I guess you could mangle it back to ALSA, as you can make ALSA use PulseAudio.
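For reference, a minimal PulseAudio sketch roughly following the werman README: the plugin runs in a LADSPA sink, and the denoised mic then appears as the null sink's monitor source. <your_mic_name> is a placeholder for your actual source name (see pactl list sources short), and control=50 is just the README's example VAD threshold.

pactl load-module module-null-sink sink_name=mic_denoised_out rate=48000
pactl load-module module-ladspa-sink sink_name=mic_raw_in sink_master=mic_denoised_out label=noise_suppressor_mono plugin=librnnoise_ladspa control=50
pactl load-module module-loopback source=<your_mic_name> sink=mic_raw_in channels=1

Applications would then capture from mic_denoised_out.monitor.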

I did get rnnoise going as an ALSA plugin, and I think it's very much like WebRTC in that a Pi3 may struggle.
It might be interesting to see how much difference a 2.0GHz Pi makes, as I think clock speed has a lot of influence here.


pcm.capture {
    # Add an ALSA plug for LADSPA
    type plug
    slave.pcm {
        # Add the LADSPA noise filter
        type ladspa
        slave.pcm {
            # Convert from float to int
            type lfloat
            slave {
                format "S16_LE"
                pcm "hw:CARD=Device"   # Use the card named "Device" (a typical cheap USB soundcard)
            }
        }
        # LADSPA configuration for the noise filter
        # See https://github.com/werman/noise-suppression-for-voice
        path "/usr/local/lib/ladspa"
        capture_plugins [
            {
                label noise_suppressor_mono
                input { controls [ 2 ] }   # VAD threshold (%)
            }
        ]
    }
}
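To sanity-check the PCM above, a quick arecord test helps (48kHz mono is what rnnoise is designed for):

arecord -D capture -f S16_LE -r 48000 -c 1 -d 10 test.wav   # record 10 s through the noise filter
aplay test.wav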



# Build and install the rnnoise LADSPA plugin
git clone https://github.com/werman/noise-suppression-for-voice
cd noise-suppression-for-voice
cmake -Bbuild -H. -DCMAKE_BUILD_TYPE=Release
cd build
make
sudo make install   # installs under /usr/local/lib/ladspa, matching the path in the config above
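If the build worked, the plugin should show up under the path used in the config; analyseplugin from the ladspa-sdk package can confirm the noise_suppressor_mono label (the exact .so filename is my assumption of what the install step produces):

ls /usr/local/lib/ladspa
analyseplugin /usr/local/lib/ladspa/librnnoise_ladspa.so   # should list noise_suppressor_mono and its controls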

Here is a Pi4 going at 2.0GHz, and it doesn't do that bad a job.
My office is cold and the fan heater is blowing on me, which is quite a severe test that it actually makes a decent attempt at.
The VAD sensitivity seems crazy high, as the setting needs to be crazy low; here it's at 10%, and I am not sure if that is what is cutting out slightly.
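For reference, the threshold lives in the capture_plugins section of the config above; lowering the control value makes the gate less aggressive (10 here matches the 10% mentioned):

    capture_plugins [
        {
            label noise_suppressor_mono
            input { controls [ 10 ] }   # VAD threshold lowered to 10 %
        }
    ]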

no-rnnoise
https://drive.google.com/open?id=1pIH5O_TP6YoNrp9ql2rr_t5LmnpMB9QE
rnnoise
https://drive.google.com/open?id=1_Qr-XaaaxEy-nQeiS8GaTVCWCd1egde_

It's actually a good attempt by the Pi4 running PiOS 64-bit Lite @ 2.0GHz.

Rnnoise is pretty low-tech nowadays compared to the fresh crop of voice technologies, which really need an x86 machine plus a GPU of roughly GTX 1650 level at minimum.

I've got a feeling it's just clock speed, as there is very little load.

You would probably still need to dial it back a bit. It does sound distorted and cut off a bit, and I'm not sure that either the wakeword models or the ASR models would love it, as they tend to struggle a lot with heavily processed or too artificially quiet audio; that is just not what they were trained on.

You see, that is something that has been said before and is slightly paradoxical, as feeding denoised input into a model that was not trained on denoised audio is obviously going to be prone to artefact noise.

If you are creating models you must use the tools you record with (i.e. run your dataset through the same denoise), or you force your tools to be as exact as the original; a sketch of the first option follows below.
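A sketch of that idea, assuming an ffmpeg build with --enable-ladspa (the control value and directory names are illustrative): run every training wav through the same rnnoise plugin the mic chain uses, so the model sees the same artefacts at training time.

mkdir -p denoised
for f in dataset/*.wav; do
    ffmpeg -i "$f" -af "ladspa=file=librnnoise_ladspa:plugin=noise_suppressor_mono:controls=c0=10" "denoised/$(basename "$f")"
done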

Rnnoise is old tech, but if the Pi4A does make an appearance in spring, it is quite possible to make noise-resilient, denoised KWS models. Really, though, that infrastructure is a crazy model, as with AI, quality and results = raw power; given the inherently sporadic nature of voice commands, a single central processor with more horsepower than a Pi could serve a much better multi-user / multi-room role.

I mean, this is a Pi4 with rnnoise, but you should hear the results of RTX Voice with a newer GPU, or the Facebook Research PyTorch/CUDA-based voice technologies.
The same goes for Tacotron2 + WaveGlow: it's actually awesome, but on the Pi just forget about it.

Yes, that is true for keyword models, but it would be a massive effort for some Kaldi ASR models, as you need a lot more speech plus transcribed-text corpora to train those. And the available open source datasets are mostly just not recorded that way.

Yeah, that's why I see distributed models on low-end hardware as probably pointless. But you could.

Yes, but most people will not have that. They will have one of the many available single-board computers like the Pi, a RockPro, or an ODROID.

Actually no: in comparison, given the availability of x86 machines and GPUs from the GTX 780 up, more people have those.
It's you who has a Pi, and in terms of market share it's actually much smaller.

I think you have to look at the people who are actually interested in DIY assistants, home automation and so on. Node-RED, for example, just had a big community survey, and a big majority run all their projects like this on single-board computers.
The crowd who run x86 systems with beefy graphics cards as a 24/7 server is very small.
And I would bet that if you did a survey here it would give the same result.
I know the same is true for the user base of things like openHAB, where I used to be active, and Home Assistant, where a lot of Rhasspy users come from.
There might be a few people who also have a beefy machine at home, but even those often run their server things on Raspberry Pis.
I really do talk from experience on this. It is the most popular hardware choice; just look at the questions in the forum.

My point is that when it comes to voice AI, the best you can produce is for those who want to say "look mum what I have built".

Otherwise it's extremely poor in comparison to $30 big-data silicon, and the only option is horsepower if you wish to be private and have something that rivals, maybe even beats, the big guys.

So enjoy, but for me you're talking toys.

Node-RED is IoT, and that is far more wide-ranging than poor voice AI.

That is your opinion, but doesn't a project like Rhasspy also have to look at what the user base is actually using and listen to / aim its development at that?
Because otherwise you will lose a huge chunk of people.
And no, there are a few people, including me, who actually use tools like Rhasspy or Voice2json in production / day-to-day life on that hardware you call toys. It just turned off my TV, and my girlfriend set a timer for the tea we made.
Then I asked what the weather tomorrow will be like, and that is all working rather nicely.
My mom wasn't involved to look at it and say what a good boy I am, unfortunately…

There is no user base apart from you; there is a lack of skills, and there are people like Kibo who think that compared to Snips this sucks and is currently useless.
I haven't used Mycroft or Rhasspy for a long time because they are so poor.
I keep looking for alternatives and hardware that might make a difference, but I am not going to convey anything other than the truth.

The hobbyist programmers should enjoy what they are doing, but need to rein in claims of effectiveness.

Wow, you just disenfranchised a lot of people here. FYI, for me it performs on par with Snips, which I also used for a year before it was shut down.
Don't you think the project would be dead and the forum full of shouting people if I was the only one successfully using it?
You should really leave the sinking ship and jump to the next project.

Look at the post count: you're about 80% of the community, and if the truth disenfranchises, I have no care.
I came to create working open source voice AI, not to make friends.

Open source is about user-driven software, and currently this needs to be driven hard. You seem at times disingenuous, as, having never used Snips, the only vibe I get is that Rhasspy is no Snips.

It could be worse: my opinion of Mycroft is that it is utter snake oil :slight_smile:

No, open source is about contribution. It's about all the people who quietly contribute, be it you helping people with audio hardware problems, somebody designing a nice printable case, small pull requests and so on. There is no "them" to drive it; there is an "us".
But I guess that's just my point of view.

From Stallman to Eric Raymond, Apache and LibreOffice to the Linux kernel, it's about how contribution can make effective, user-driven, shared-ownership software.
Contributing dross en masse has little use.

PS: An Armour case with a 12V 40mm fan run at 5V makes a super easy and low-cost Pi4 2.0GHz machine.
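For anyone wanting to reproduce the 2.0GHz part, it is just the usual /boot/config.txt overclock; these exact values are a common pairing, but they are an assumption here and stability varies per board and PSU.

# /boot/config.txt
over_voltage=6
arm_freq=2000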

We do need to sort out the start of the input chain with voice AI and audio processing.
Garbage in, garbage out, and currently things are not good in common domestic environs.

I know; I've been running mine overclocked for half a year now in a passive heatsink case.

I tried passive cooling with the Pi4, but under stress it just throttles.
Even with the fan it may eventually throttle, maybe not, but with constant load it quickly gets up to 65°C.

I just found that the fan, stuck on 5V, is practically silent and gives far more load headroom.
Double-sided sticky tape rules :slight_smile:
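vcgencmd is handy for checking whether a given case/fan combo is actually holding up under load:

vcgencmd measure_temp        # current SoC temperature
vcgencmd measure_clock arm   # current ARM clock (drops when throttling)
vcgencmd get_throttled       # 0x0 means no throttling or capping has occurred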

I never said Rhasspy is useless.
It has been a long road, with API integration, bug finding/fixing after the 2.5 remake, wakeword, etc.

But since the beginning, the intent definition and ASR have been better than Snips. Raven has closed the gap to provide a workable open source wakeword.

I have had Snips working for two years in the house, used by all the family, plugged into Jeedom, and everyone here uses it every day for lots of stuff.

Actually Rhasspy is ready, plugged into my production Jeedom or test Jeedom with a switch. And it works better than Snips. BUT the only thing preventing me from getting rid of Snips is this particular problem of infinite listening with background noise/music. And there are some nice ideas here to get this improved. A max-duration setting for STT listening would also help a lot. And I have no doubt @synesthesiam will soon find solutions :grinning:

Like I have said before, Snips was a team of many devs, and Rhasspy is driven by one man and a few helpers. And apart from this listening problem, I still think it is better than Snips.

The world is full of people saying it’s impossible while some are doing it …

One day we will all be able to ditch Snips, and I can never thank @synesthesiam enough for that.
