Rhasspy 2.5.9 Released

I’ll see if I can bump to LLVM 8 without breaking anything. Not sure why Ubuntu dropped 7.

Not officially yet, no. What hardware are you running Rhasspy on?

Hi.

I’m using Home Assistant and have Rhasspy installed as an add-on. I received the notification for Rhasspy 2.5.9 and updated my add-on, but the web page (on :12101) still shows the old version, 2.5.7 (for some reason I never made the update to 2.5.8).
I checked the Docker image version and it seems to be correct:
3fbb1a43/amd64-addon-rhasspy 2.5.9 ed36febeb00e 29 hours ago 1.64GB

and the container also uses the correct image:
243d60d029ae 3fbb1a43/amd64-addon-rhasspy:2.5.9 "/run.sh" 29 hours ago Up 29 hours 0.0.0.0:12101->12101/tcp, 0.0.0.0:12333->12333/udp addon_3fbb1a43_rhasspy

I already tried uninstalling the add-on and installing it again, but 2.5.7 still shows in the header (of course, I also refreshed without cache).

Am I missing something for the update? Or does anybody have a hint about what else I can try?

Regards,
Stefan

I tried DeepSpeech (German) instead of Kaldi in my server/satellite environment (both running the official Docker container), but I can’t get it to work. It doesn’t seem to react. If I switch back to Kaldi, everything works again. I will try it again …

In my installation there are two “Satellite siteIds:…” lines for DeepSpeech. Is that a bug?

I am having the same experience. When using German DeepSpeech, my server simply returns a timeout. Same behaviour as I had on 2.5.8.

Anyone have info on how to use current_energy_threshold, max_current_energy_ratio_threshold, and max_energy? Are these helpful in silence detection?

I’m hoping they will be, but I’m not really an audio guy so I’m just going off of what I’ve read. To make use of these for silence detection, you will need to get the audio statistics working in the Rhasspy web UI or get the rhasspy-silence command-line tool running.

Here’s a brief overview for everyone:

  • Audio “energy” is computed right here. This was borrowed from the speech_recognition library.
  • current_energy_threshold just means that the energy of an audio chunk is compared to some threshold and, if it’s lower, the audio chunk is considered silence.
  • max_current_energy_ratio_threshold means that a ratio is computed for every audio chunk (max / current), and the chunk is considered silence if the value is above the threshold.
    • To make it intuitive, imagine the threshold ratio is 1: whenever the current energy drops below the max energy, we have silence (max/current > 1). If we instead double the threshold to 2, only chunks quieter than half the max count as silence (the sketch after this list makes the rules concrete).
    • You can set max_energy to a specific value or let Rhasspy dynamically set it over time.
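
As a concrete illustration, here is a minimal sketch of how these three settings could fit together. It assumes energy is RMS over 16-bit PCM chunks (the approach borrowed from speech_recognition); the function and parameter wiring is illustrative, not Rhasspy’s actual API:

  import audioop  # stdlib; audioop.rms gives the same "energy" as speech_recognition

  def is_silence(chunk: bytes,
                 current_energy_threshold: float,
                 max_energy: float,
                 max_current_ratio_threshold: float,
                 sample_width: int = 2) -> bool:
      # Energy of this chunk: RMS over the raw PCM samples
      energy = audioop.rms(chunk, sample_width)

      # Test 1: quieter than the absolute threshold -> silence
      if energy < current_energy_threshold:
          return True

      # Test 2: ratio test -- max/current above the threshold -> silence
      if max_energy > 0 and (max_energy / max(energy, 1)) > max_current_ratio_threshold:
          return True

      return False

In practice, max_energy would either be fixed or tracked as a running maximum over recent chunks, which is what “dynamically set over time” means above.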

Hope this helps some.

2 Likes

I have been wondering for a while whether a neural network VAD would be a better option if the KWS was provided in-house.
A considerable amount of the load is the conversion of audio to an MFCC image, and a streaming-model KWS just splits the images into ‘strides’ of 20 ms or more.
I have been thinking a neural VAD could work on the strides and pass them to the KWS without any additional MFCC processing, and the model would be quite light since it only works on one stride at a time.

I am thinking a more accurate VAD could be provided with little more overhead than the current one, possibly even less, as those strides would already be used by a KWS (a rough sketch follows below).
I haven’t looked in ages, but the current VAD has very similar FFT routines to the MFCC stage of some KWS, and at times it isn’t great at judging what silence is.
If a VAD were trained on the user’s voice, it could likely be extremely accurate at detecting when the user is speaking or not, which is probably a better metric than what we consider silence.
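
To make that shared-frontend idea concrete, here is a hypothetical sketch (librosa is assumed purely for MFCC extraction, and vad_model / kws_model are illustrative names, not existing Rhasspy components). The key point is that the MFCC work happens once and feeds both models:

  import numpy as np
  import librosa  # assumed here purely for MFCC extraction

  def mfcc_strides(audio: np.ndarray, sr: int = 16000, stride_ms: int = 20) -> np.ndarray:
      # One MFCC frame per 20 ms "stride"; result shape is (n_mfcc, n_frames)
      hop = int(sr * stride_ms / 1000)
      return librosa.feature.mfcc(y=audio, sr=sr, n_mfcc=13, hop_length=hop)

  # features = mfcc_strides(chunk)
  # speech = vad_model.predict(features)   # stride-level VAD decision
  # keyword = kws_model.predict(features)  # same features, no extra FFT work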

2 Likes

I was looking into RNNoise yesterday, and it might fit the bill here. Not only does it remove noise, it also does VAD. The only downside appears to be that the model was trained on 48 kHz audio, so we’d need to convert.
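
For reference, the conversion itself is cheap; a minimal sketch using the standard library’s audioop (the function name is illustrative):

  import audioop

  def resample(chunk: bytes, in_rate: int, out_rate: int, state=None):
      # 16-bit mono PCM rate conversion, e.g. 16 kHz -> 48 kHz before RNNoise
      # and back down afterwards; `state` carries filter state between chunks.
      converted, state = audioop.ratecv(chunk, 2, 1, in_rate, out_rate, state)
      return converted, state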

Yeah, RNNoise is probably good. There are some ALSA wrappers that, for me, seemed like voodoo in terms of how the VAD threshold was implemented.
Again, though, it might be better to train a 16 kHz model?
Also, RNNoise was one of the first, but a GitHub trawl will uncover a few more.

The FFT processing to create a spectrogram or MFCC does create load, and, like Precise, a more modern streaming neural KWS is fed MFCCs in strides, so a simple low-latency CNN can work on each stride feature and give a VAD response time of 20 ms, or whatever the streaming stride length is.
My point was to reuse the audio analysis data we already have and have already computed. RNNoise would be doing the same task in parallel, creating its own spectrogram/MFCC stream, and just adding different or more modules is an increase in load.

The VAD we have isn’t that great, and because it’s a separate module, much of its input processing that could be shared isn’t, since it’s installed as a prebuilt separate lib.

The best way would be to modify RNNoise so that it passes the MFCC/spectrogram on as an output, but it would be far faster to keep it in memory and go straight to the KWS, meaning it would really be part of the KWS rather than bolt-on load.
At the VAD/RNNoise stage it’s possible to do much more, such as diarisation and model switching, because the load is much lower when the process from WAV to audio image is done once.
This is not true of adding each function as a completely separate lib and process that just branches off the audio chain.

Also, if you use RNNoise, it’s likely you would have to retrain your models to gain the advantages of noise reduction and keep accuracy with the slightly different spectrogram it will produce.

I used https://github.com/werman/noise-suppression-for-voice and wasn’t at all sure whether the VAD was working as it should from the ALSA settings I had provided.
Simple cheap USB microphone/soundcard.

Took me ages to find out how to get it to work, and I was never sure if the VAD was OK, but RNNoise itself, if implemented, should be. The ALSA config starts like this:

  type ladspa
  slave.pcm {
    # Convert from float to int
    type lfloat
    slave.pcm "plughw:1,0"  # hypothetical slave device; adjust to your card
  }
  # (the rest of the LADSPA plugin definition was truncated in the original post)
1 Like

Hi @synesthesiam

Do we actually need the LLVM runtime? The reason I ask is that if I force-install the deb package to ignore the dependency, everything still seems to work. Admittedly I haven’t tried every configuration supported by Rhasspy, but voice capture & sentence matching work. I’m not using a wake word yet.

I’m running Rhasspy on an x86 laptop

Kind Regards
Dom

It’s needed by Larynx (text to speech), but it may be a good idea to just move it to a suggested dependency and make a note in the documentation.

2 Likes

RNNoise looks really promising!

1 Like

To be honest, RNNoise will force all KWS & ASR models to be retrained on datasets that have been passed through RNNoise, since all it passes on is an audio stream.
From having a reread, it has to be 48 kHz, or the filters it provides do not work well, if at all, at lower sample rates. So as well as resampling to 16 kHz, the MFCC process needs to begin again, though maybe that could be passed through (I would have to look at the code again for a while).

Thing is, from the last time I tested it, it isn’t that great. RTX Voice it is not, as it really falls down at relatively modest levels of noise and creates a shedload of audio artefacts.
RTX Voice, run on a 20- or 30-series card, is supposedly pretty mind-blowing. Even on lesser cards with a fast system with AVX-512 it still does a good job, though supposedly not as good as the cards with tensor cores running as AI accelerators, the ones used for their Deep Learning Super Sampling (DLSS) technology.

It’s very much about horsepower, and if you try RNNoise after having tried RTX Voice, it puts the results in their correct position: there is really no comparison between that amount of horsepower and a Raspberry Pi.

There is no way to extract noise without high-speed silicon DSP or some pretty heavyweight hardware; otherwise you create artefacts that add just as much noise as you are extracting, and that is just plain fact.
Google & Amazon simply have better and more complex ASR models that cope better with noise.

Just run the link I sent, as you can set up RNNoise via ALSA pretty quickly.
The sample with no RNNoise:
https://drive.google.com/file/d/1pIH5O_TP6YoNrp9ql2rr_t5LmnpMB9QE/view
vs. RNNoise on a Pi 4 @ 2.0 GHz:
https://drive.google.com/file/d/1_Qr-XaaaxEy-nQeiS8GaTVCWCd1egde_/view

I didn’t really test much more than the above, but maybe you can get it to work better with higher levels of noise.
That was just a fan heater with my usual Harry Potter recording, which was fairly minimal noise, and it’s already starting to add artefacts to the audio.

I think you might find the SNR that RNNoise can cope with is actually lower than the SNR your models can cope with anyway.
The problem might be that the models being used are too ‘clean’, which can be improved by adding noise samples to the training data anyway.
But you will have to test.

If you have an input signal of what the noise is, which AEC uses to subtract it from the mic input, then you can run on much lower-end hardware. But knowing how or what the noise is is the problem, and it needs to be strictly synced without clock drift, or you need horsepower to do adaptive syncing.
But capturing known noise and running filtering such as AEC could be possible on low-end hardware.

The easiest and cheapest way to deal with noise is by positioning a mic so voice=near & noise=far, so distributed wide-array microphones, even networked arrays of 2x, 4x, 8x or above, are a fraction of the cost of cutting-edge noise reduction.

@synesthesiam, give it a go, but you can probably tell I think it’s likely a waste of your time.

You could employ a much better VAD though.

https://github.com/hcmlab/vadnet (Windows, but the how of it is still worth a look)
https://github.com/nicklashansen/voice-activity-detection
https://github.com/Cocoxili/VAD
https://github.com/filippogiruzzi/voice_activity_detection
https://github.com/MarcoZaror/VAD_system

A TensorFlow VAD is like a TensorFlow KWS: it’s not all that hard. The cutting edge is TensorFlow itself, and it’s just a matter of implementing the framework.
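
For what it’s worth, here is a hypothetical minimal TensorFlow VAD along those lines: a tiny CNN that classifies a single MFCC stride as speech or non-speech. All shapes and layer choices are illustrative, not a reference design:

  import tensorflow as tf

  def build_vad(n_mfcc: int = 13, frames: int = 1) -> tf.keras.Model:
      # Binary speech / non-speech classifier over one MFCC stride
      model = tf.keras.Sequential([
          tf.keras.layers.Input(shape=(frames, n_mfcc, 1)),
          tf.keras.layers.Conv2D(16, (1, 3), activation="relu"),
          tf.keras.layers.GlobalAveragePooling2D(),
          tf.keras.layers.Dense(1, activation="sigmoid"),  # P(speech)
      ])
      model.compile(optimizer="adam", loss="binary_crossentropy")
      return model

The hard part, as noted, is not the model but the training data and the framework plumbing around it.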

1 Like

Hi,

Big support for that request! Especially for embedded systems like the Raspberry Pi, it is very important to minimize dependencies.

As a workaround, install with dpkg --ignore-depends=<package> -i and remove the dependency entry in /var/lib/dpkg/status.

1 Like

Hi,

thank you very much for this update!
Everything works fine in a Docker installation for my base (i386) and one satellite (Pi 3):

  • external mqtt
  • arecord
  • raven
  • pocketsphinx
  • fuzzywuzzy
  • PicoTTS
  • aplay
  • rhasspy
1 Like

I am trying to install Rhasspy in WSL2 following the directions on your web page.

As suggested, I picked Ubuntu 20.04 as the Linux choice and installed it following the directions.

However, I keep getting an ERROR message like the one below:

ERROR: torch-1.6.0-cp37-cp37m-linux_x86_64.whl is not a supported wheel on this platform.
make: *** [Makefile:173: install-rhasspy] Error 1

When trying under Debian I get these errors:
ERROR: pip’s dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
pvporcupine 1.9.0 requires enum34, which is not installed.
rhasspy 2.5.8 requires pylint==2.4.4, but you have pylint 2.5.3 which is incompatible.

Downloading Flask_Cors-3.0.9-py2.py3-none-any.whl (14 kB)
ERROR: Could not find a version that satisfies the requirement gruut~=0.5.0
ERROR: No matching distribution found for gruut~=0.5.0
make: *** [Makefile:177: install-rhasspy] Error 1

What am I doing wrong? Please help.

Somehow it does not download this directory. But I did download it using

git clone --recursive https://github.com/rhasspy/gruut

and I also did

sudo apt-get install python-enum34

to eliminate the source of the first error. Then:
./configure --enable-in-place
make
make install

And the same result again, all the same errors. Why?

Hi @synesthesiam

I was in 2.4.19 and I decided yesterday evening to install the 2.5.9 version through docker.
I’m on a Pi 3 + Jabra 410 and I’m surprised by the very good response time, and I would like to say thanks for the addition of a change I requested many months ago (reboot and stop buttons in the UI).

Integration with the excellent jeeRhasspy plugin from @KiboOst is also well done.

Thanks again for the great work!

3 Likes

You may need to download the gruut release tarball: https://github.com/rhasspy/gruut/releases or disable Larynx with ./configure ... --disable-larynx

What happened was I exceeded the size limit for uploads to PyPI with the most recent version of gruut, so you can’t currently pip install it (unless you do pip install https://github.com/rhasspy/gruut/releases/download/v0.5.0/gruut-0.5.0.tar.gz)

I need to refactor gruut so the language-specific pieces can be downloaded separately.

You’re welcome :slight_smile:

Always glad to hear success stories :+1: