Rhasspy 2.5.9 Released

I was looking into RNNoise yesterday, and it might fit the bill here. Not only does it remove noise, it also does VAD. The only downside appears to be that the model was trained on 48 kHz audio, so we’d need to convert.
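For reference, the 16 kHz to 48 kHz conversion is a clean 1:3 ratio, so polyphase resampling works well. A minimal sketch with scipy (the file names are placeholders):

  import numpy as np
  from scipy.io import wavfile
  from scipy.signal import resample_poly

  # Read a 16 kHz mono capture (placeholder path)
  rate, audio = wavfile.read("mic_16k.wav")
  assert rate == 16000

  # Upsample by 3 (16000 * 3 = 48000) before handing the audio to RNNoise
  audio_48k = resample_poly(audio.astype(np.float32), up=3, down=1)
  wavfile.write("mic_48k.wav", 48000, audio_48k.astype(np.int16))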

Yeah, RNNoise is probably good. There are some ALSA wrappers, though how they implement the VAD threshold seemed like voodoo to me.
Again though, it might be better to train a 16 kHz model?
Also, RNNoise was one of the first, but a GitHub trawl will uncover a few others.

The FFT processing to create a spectrogram or MFCC does create load, and like Precise, a more modern streaming neural KWS is fed MFCCs in strides, so that a simple low-latency CNN can work on each stride’s features and have a VAD response time of 20 ms, or whatever the streaming stride length is.
My point was to use the audio analysis data we already have and have already computed. RNNoise would be doing the same task in parallel, creating its own spectrogram/MFCC stream, and just adding different or more modules is an increase in load.
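To illustrate the stride idea, here is a sketch with librosa (assuming 16 kHz audio; a 20 ms hop is 320 samples, and the file name is a placeholder):

  import librosa

  # Load at 16 kHz (placeholder path)
  y, sr = librosa.load("utterance.wav", sr=16000)

  # 25 ms windows (400 samples), 20 ms hop -> one MFCC column every 20 ms
  mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13,
                              n_fft=400, hop_length=320)

  # A streaming KWS/VAD consumes these columns one stride at a time
  for frame in mfcc.T:
      pass  # feed `frame` (13 coefficients) to the CNN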

The VAD we have isn’t that great, and because it’s a separate module, much of its input processing that could be shared isn’t, since it’s installed as a prebuilt separate lib.

The best way would be to modify RNNoise so that it passes the MFCC/spectrogram as an output, though it would be far faster to keep it in memory and go straight to the KWS, meaning it would really be part of the KWS and not bolt-on load.
At the VAD/RNNoise stage it’s possible to do much more, such as diarisation and model switching, because the load is much less once the processing from WAV to audio image is done only once.
This is not true when each function is added as a completely separate lib and process that just branches off the audio chain.
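In other words, something like the following shape: compute the feature stream once and fan it out to every consumer. A purely illustrative sketch (the function names and the FFT stand-in are made up, not Rhasspy code):

  import numpy as np

  def compute_features(stride):
      """One shared FFT pass per stride (a stand-in for a real MFCC)."""
      return np.abs(np.fft.rfft(stride * np.hanning(len(stride))))

  def run_pipeline(strides, consumers):
      """strides: iterable of raw audio chunks; consumers: callables (VAD, KWS, ...)."""
      for stride in strides:
          features = compute_features(stride)  # computed once
          for consume in consumers:            # fanned out, no duplicate FFT work
              consume(features)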

Also, if you use RNNoise, it’s likely you would have to retrain your models to gain the advantages of noise reduction and keep accuracy with the slightly different spectrogram it will give.

I used https://github.com/werman/noise-suppression-for-voice and wasn’t at all sure if the VAD was working as it should from the ALSA settings I had provided.
Simple cheap USB Microphone / Soundcard

It took me ages to find out how to get it to work, and I was never sure if the VAD was OK, but RNNoise itself should work if implemented with something like:

  # Sketch: device names are placeholders; see the plugin README for exact controls
  pcm.rnnoise {
    type ladspa
    path "/usr/lib/ladspa"
    capture_plugins [{ label noise_suppressor_mono input { controls [ 50 ] } }]
    # Convert from float to int for the real capture device
    slave.pcm { type lfloat slave.pcm "hw:1,0" }
  }

Hi @synesthesiam

Do we actually need the LLVM runtime? The reason I ask is that if I force-install the deb package to ignore the dependency, then everything still seems to work. Admittedly I haven’t tried every configuration supported by Rhasspy, but voice capture & sentence matching work. I’m not using a wake word yet, though.

I’m running Rhasspy on an x86 laptop

Kind Regards
Dom

It’s needed by Larynx (text to speech), but it may be a good idea to just move it to a suggested dependency and make a note in the documentation.


RNNoise looks really promising!


To be honest, RNNoise will force all KWS & ASR models to be retrained on datasets that have been passed through RNNoise, since it passes back an audio stream.
From having a reread, it has to be 48 kHz, or the filters provided do not work well, if at all, at lower sample rates. So as well as resampling to 16 kHz, the MFCC process needs to begin again, though maybe that can be passed through (I would have to look at the code again for a while).

Thing is, from the last time I tested, it isn’t that great. RTX Voice it is not, as it really falls down on relatively modest levels of noise and creates a shedload of audio artefacts.
RTX Voice, run on a 20- or 30-series card, is supposedly pretty mind-blowing. Even on lesser cards, with a fast system with AVX-512, it still does a good job, though supposedly not as good as the cards with tensor cores running as AI accelerators, which they also use in their Deep Learning Super Sampling (DLSS) technology.

It’s very much about horsepower, and if you try RNNoise after having tried RTX Voice, it puts the results in their proper place: there is really no comparison between that amount of horsepower and a Raspberry Pi.

There is no way to extract noise, without high-speed silicon DSP or some pretty heavyweight hardware, that doesn’t create artefacts adding just as much noise as you’re extracting, and that is just plain fact.
Google & Amazon just have better and more complex ASR models that cope better with noise.

Just try the link I sent, as you can set up RNNoise via ALSA pretty quickly.
The sample with no RNNoise:
https://drive.google.com/file/d/1pIH5O_TP6YoNrp9ql2rr_t5LmnpMB9QE/view
Versus RNNoise on a Pi4 @ 2.0 GHz:
https://drive.google.com/file/d/1_Qr-XaaaxEy-nQeiS8GaTVCWCd1egde_/view

I didn’t really test much more than the above, but maybe you can get it to work better with higher levels of noise.
That was just a fan heater with my usual Harry Potter recording, which was fairly minimal noise, and it’s already starting to add artefacts to the audio.

I think you might find the SNR that RNNoise can cope with is actually lower than the SNR your models can cope with anyway.
The problem might be that the models being used are too ‘clean’, and their robustness can be increased by adding noise samples to the training data anyway.
But you will have to test.

If you have an input signal of what the noise is, which AEC uses to subtract from the mic input, then you can run on much lower hardware. But knowing how or what the noise is is the problem, and it needs to be strictly synced without clock drift, or you need horsepower to do adaptive syncing.
But capturing known noise and running filtering such as AEC could be possible on low-end hardware.
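As a toy illustration of subtracting a known noise reference, here is an NLMS adaptive-filter sketch in numpy. It is not the actual AEC Rhasspy would use, the tap count and step size are made up, and it assumes the two signals are sample-aligned (no clock drift):

  import numpy as np

  def nlms_cancel(mic, noise_ref, taps=128, mu=0.1, eps=1e-6):
      """Subtract an adaptively filtered copy of noise_ref from mic.
      mic and noise_ref: float numpy arrays of equal length."""
      w = np.zeros(taps)                      # adaptive filter weights
      out = np.zeros(len(mic))
      for n in range(taps, len(mic)):
          x = noise_ref[n - taps:n][::-1]     # reference history (newest first)
          y = w @ x                           # estimated noise at the mic
          e = mic[n] - y                      # error = cleaned sample
          w += mu * e * x / (x @ x + eps)     # NLMS weight update
          out[n] = e
      return out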

The easiest and cheapest way to deal with noise is positioning a mic so that voice = near & noise = far, so distributed wide-array microphones, even networked arrays of 2x, 4x, 8x or above, are a fraction of the cost of cutting-edge noise reduction.

@synesthesiam, give it a go, but you can probably tell I think it’s likely a waste of your time.

You could employ a much better VAD, though.


https://github.com/hcmlab/vadnet (Windows, but it’s still worth a look)
https://github.com/nicklashansen/voice-activity-detection
https://github.com/Cocoxili/VAD
https://github.com/filippogiruzzi/voice_activity_detection
https://github.com/MarcoZaror/VAD_system

TensorFlow VAD is like TensorFlow KWS: it’s not all that hard. The cutting edge is TensorFlow itself, and it’s just a matter of implementing the framework.
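For example, a frame-level VAD classifier in Keras is only a few lines. A sketch, assuming you already have MFCC feature vectors and 0/1 speech labels to train on (the feature size is an assumption):

  import tensorflow as tf

  N_MFCC = 13  # features per 20 ms frame (assumption)

  # Tiny per-frame speech/non-speech classifier
  model = tf.keras.Sequential([
      tf.keras.layers.Dense(64, activation="relu", input_shape=(N_MFCC,)),
      tf.keras.layers.Dense(64, activation="relu"),
      tf.keras.layers.Dense(1, activation="sigmoid"),  # P(speech)
  ])
  model.compile(optimizer="adam", loss="binary_crossentropy",
                metrics=["accuracy"])
  # model.fit(mfcc_frames, labels, ...)  # labelled training data not shown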


Hi,

Big support for that request! Especially for embedded systems like the Raspberry Pi, it is very important to minimize dependencies.

As a workaround, install with dpkg --ignore-depends and remove the dependency from /var/lib/dpkg/status.


Hi,

thank you very much for this update!
Everything works fine in a Docker installation for my base (i386) and one satellite (Pi3):

  • external mqtt
  • arecord
  • raven
  • pocketsphinx
  • fuzzywuzzy
  • PicoTTS
  • aplay
  • rhasspy

I am trying to install Rhasspy in WSL2 following the directions on your web page.

As suggested, I picked Ubuntu 20.04 as the Linux choice and installed it following the directions.

However, I keep getting an ERROR message like the one below:

ERROR: torch-1.6.0-cp37-cp37m-linux_x86_64.whl is not a supported wheel on this platform.
make: *** [Makefile:173: install-rhasspy] Error 1

When trying under Debian I get these errors:
ERROR: pip’s dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
pvporcupine 1.9.0 requires enum34, which is not installed.
rhasspy 2.5.8 requires pylint==2.4.4, but you have pylint 2.5.3 which is incompatible.

Downloading Flask_Cors-3.0.9-py2.py3-none-any.whl (14 kB)
ERROR: Could not find a version that satisfies the requirement gruut~=0.5.0
ERROR: No matching distribution found for gruut~=0.5.0
make: *** [Makefile:177: install-rhasspy] Error 1

What am I doing wrong? Please help.

Somehow it does not download this directory. But I did download it using

git clone --recursive https://github.com/rhasspy/gruut

and I also did

sudo apt-get install python-enum34

to eliminate the source of the first error. Then:
./configure --enable-in-place
make
make install

And the same result again: all the same errors. Why?

Hi @synesthesiam

I was on 2.4.19 and I decided yesterday evening to install the 2.5.9 version through Docker.
I’m on a Pi3 + Jabra 410, and I’m surprised by the very good response time. I would like to say thanks for the addition of a change request I proposed many months ago (reboot and stop buttons in the UI).

Integration with the excellent jeeRhasspy plugin from @KiboOst is also well done.

Thanks again for the great work!


You may need to download the gruut release tarball: https://github.com/rhasspy/gruut/releases or disable Larynx with ./configure ... --disable-larynx

What happened was I exceeded the size limit for uploads to PyPI with the most recent version of gruut, so you can’t currently pip install it (unless you do pip install https://github.com/rhasspy/gruut/releases/download/v0.5.0/gruut-0.5.0.tar.gz).

I need to refactor gruut so the language-specific pieces can be downloaded separately.

You’re welcome 🙂

Always glad to hear success stories 👍