Preliminary Release of Rhasspy 2.5.10

synesthesiam · April 1, 2021, 9:41pm

Hi everyone! This is a “preliminary” release of 2.5.10, meaning I’ve created a rhasspy/rhasspy:2.5.10 Docker tag and uploaded new Debian packages, but I haven’t made these the “latest” release yet (the docs are not updated yet either). Due to time constraints, I haven’t been able to run all my usual tests, but I wanted to get something out to everyone.

There are some big additions in 2.5.10, such as:

ASR support for Swedish (sv)
New Larynx that’s faster and has a ton of new voices
- Hint: switch to “Low Quality” on a Pi 4 or below for a big speed-up
- This version should also work with older x86_64 CPUs (no AVX)
Kaldi ASR now has confidence value for words and sentences
Dialogue manager now has a minimum ASR confidence threshold (speech_to_text.<system>.min_confidence where <system> is kaldi, deepspeech, etc.)
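
As a rough illustration of the threshold idea (a sketch only, not Rhasspy’s actual code): assuming the profile is available as a nested dictionary and the transcription carries a likelihood value like the one reported in ASR/textCaptured, the check might look something like this. The helper names get_min_confidence and accept_transcription, and the example values, are hypothetical.

```python
from typing import Any, Dict


def get_min_confidence(profile: Dict[str, Any], system: str, default: float = 0.0) -> float:
    """Look up speech_to_text.<system>.min_confidence in a profile dictionary."""
    stt = profile.get("speech_to_text", {})
    return float(stt.get(system, {}).get("min_confidence", default))


def accept_transcription(captured: Dict[str, Any], min_confidence: float) -> bool:
    """Return True if the transcription is confident enough to pass on.

    A missing likelihood is treated as 0.0 (an assumption made for this sketch).
    """
    return float(captured.get("likelihood", 0.0)) >= min_confidence


# Hypothetical profile fragment and transcription payloads
profile = {"speech_to_text": {"system": "kaldi", "kaldi": {"min_confidence": 0.5}}}
threshold = get_min_confidence(profile, "kaldi")

confident = {"text": "turn on the light", "likelihood": 0.92}
uncertain = {"text": "turn on the night", "likelihood": 0.31}

print(accept_transcription(confident, threshold))  # True
print(accept_transcription(uncertain, threshold))  # False
```

With a threshold of 0 (the default in this sketch), every transcription would be accepted.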

I’d appreciate any testing and feedback that the community can offer! Thanks!

Added

New version of Larynx with improved performance and 35 voices (20 English, 1 German, 3 French, 2 Spanish, 3 Dutch, 2 Italian, 1 Swedish, 3 Russian)
Kaldi ASR model for Swedish (sv)
Confidence and word timings for Kaldi ASR
Minimum ASR confidence threshold for dialogue manager
Detect AVX support and warn for Larynx, DeepSpeech, and Precise in Web UI
Handle spaces in converter arguments with word!(converter, …)
rhasspy-tts-cli-hermes TTS commands may be Jinja2 templates (--use-jinja2)
Support for MaryTTS effects (jasonhildebrand)
customData added to hermes/nlu/query message
customData is copied by NLU services from query to intent/intentNotRecognized
lang property added for wake, speech_to_text, and intent profile sections
Wake, ASR, NLU services all set lang properties if null
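
To make the customData and lang items above a bit more concrete, here is a minimal sketch using plain dictionaries. It is one reading of the behaviour described in the list, not the code from the NLU services; the function build_nlu_response, the service_lang parameter, and the example payload are hypothetical.

```python
from typing import Any, Dict, Optional


def build_nlu_response(
    query: Dict[str, Any],
    intent_name: Optional[str],
    service_lang: Optional[str] = None,
) -> Dict[str, Any]:
    """Build an intent / intentNotRecognized style payload from an NLU query."""
    response: Dict[str, Any] = {
        "input": query.get("input", ""),
        "siteId": query.get("siteId", "default"),
        "sessionId": query.get("sessionId"),
        # customData is passed through unchanged from the query
        "customData": query.get("customData"),
        # lang is only filled in by the service when the query left it null
        "lang": query.get("lang") or service_lang,
    }
    if intent_name is not None:
        # Recognized: attach the intent. Otherwise this stands in for intentNotRecognized.
        response["intent"] = {"intentName": intent_name}
    return response


query = {
    "input": "set a timer",
    "siteId": "kitchen",
    "sessionId": "abc",
    "customData": "timer-request-1",
    "lang": None,
}
print(build_nlu_response(query, "SetTimer", service_lang="en"))
print(build_nlu_response(query, None, service_lang="en"))
```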

Fixed

Remote HTTP service sets site_id of satellite for ASR/NLU endpoints
DeepSpeech token output (was letters, now words)
Multiple values in custom converters are sent as a list on stdin
Don’t show restart/shutdown button if “sudo” isn’t available (Docker, Hass.io)
Added missing espeak phonemes for some profiles
MaryTTS voice test in Web UI
Remove dialogue session from site cache on end
Don’t throw error about system not configured if message is intent for satellite (schnopsi)

Changed

/api/listen-for-command uses a proper wake workflow now (requires dialogue manager)
Show absolute paths for custom models (precise, snowboy, porcupine) in Web UI

romkabouter · April 2, 2021, 6:26am

Good work, I will try it out.
My focus will be the Dutch language.

AlmostSerious · April 2, 2021, 6:43am

Thank you, just tried it with a manual update on my HomeAssistant Addon.

First thing I noticed is that I am getting an error regarding WaveNet:

 File "/usr/lib/rhasspy/rhasspy-tts-wavenet-hermes/rhasspytts_wavenet_hermes/__init__.py", line 15, in <module>
    from google.cloud import texttospeech
ModuleNotFoundError: No module named 'google.cloud'

Everything else seems to work. I am currently playing around with the Kaldi confidence scores in German. Very nice to have this, although I noticed that they vary widely even when the same sentence is spoken the same way. Is it possible to also see the confidence of the individual words? Currently the only thing I found that changes is the likelihood of the full utterance in the ASR/textCaptured.

tjiho · April 2, 2021, 10:01am

Nice! As soon as I get home, I’ll test it.

grizewald · April 2, 2021, 5:41pm

I have only just started experimenting with rhasspy to extend my home automation system and had just got the 2.5.9 release working well with a pair of Pi4Bs. I had noticed that Larynx gave the best sounding TTS output but that it was horribly slow (around 15 seconds to generate audio for a “The time is…” sentence), so I was really looking forward to giving this a try!

Both Pi machines are running rhasspy as docker images and I use the MQTT server which runs on my Home Assistant server for both HA (also on docker) and rhasspy.

Anyway, I updated my docker installations with 2.5.10 and am glad to report that the increase in speed for Larynx is considerable. The first sentence took 5 seconds to deliver, but subsequent delivery is almost immediate.

Nice one!

Next to test: ASR confidence values and handling.

romkabouter · April 2, 2021, 8:44pm

I get this as well. I switched to Larynx for now because I want to test it.

AlmostSerious · April 2, 2021, 9:02pm

On my NUC J5005 the Larynx component works well, although even in low quality mode it takes several seconds to create the sound file. I would be interested to find out what is required to create another German voice. Are there some predefined sentences to record to help in creating a new voice option?

rolyan_trauts · April 3, 2021, 2:50am

Is that just the first sentence, as above? The NUC should have far more oomph than a Pi 4.

AlmostSerious · April 3, 2021, 7:04am

Just tested it again. And yes, the first sentence takes several seconds. The following sentences take about 1 second. High Quality adds 0.5 to 1 second to that, but I don’t really hear a difference anyway.

rolyan_trauts · April 3, 2021, 7:40am

It prob should say hi as it initialises on boot

synesthesiam · April 3, 2021, 1:24pm

This may need some tuning. I’m using the Minimum Bayes Risk from Kaldi, but I’m not entirely sure the best way to report it.

I must have gotten interrupted implementing this. The confidences are produced during transcription, they’re just not being passed up the layers into textCaptured or the NLU intent. I’ve created a bug report here to remember: https://github.com/rhasspy/rhasspy/issues/207

Ah, thank you. I was debugging a problem that ended up being with pip and forgot to turn this back on.

Yes! Thanks to volunteers like @RaspiManu, we have a set of German phrases to read. Anyone who’s interested, please PM me and I’ll send you a link.

Larynx delays loading the TTS/vocoder models until it’s called the first time, so this is what you’re seeing. I might be able to add an option to preload the voice if this is an issue for people.
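
For anyone curious what “delays loading until it’s called the first time” means in practice, here is a generic sketch of the pattern with an optional preload switch. This is not Larynx’s code: the LazyVoice class, the preload flag, and the 0.2 second stand-in loader are all made up for illustration.

```python
import time
from typing import Callable, Optional


class LazyVoice:
    """Defers an expensive model load until the first request (or preloads it)."""

    def __init__(self, loader: Callable[[], str], preload: bool = False) -> None:
        self._loader = loader
        self._model: Optional[str] = None
        if preload:
            self._get_model()  # pay the loading cost at startup instead

    @property
    def loaded(self) -> bool:
        return self._model is not None

    def _get_model(self) -> str:
        if self._model is None:
            self._model = self._loader()  # runs at most once
        return self._model

    def say(self, text: str) -> str:
        return f"{self._get_model()}: {text}"


def slow_loader() -> str:
    time.sleep(0.2)  # stand-in for loading the TTS/vocoder models
    return "demo-voice"


voice = LazyVoice(slow_loader)
for sentence in ("first sentence", "second sentence"):
    start = time.perf_counter()
    voice.say(sentence)
    print(f"{sentence}: {time.perf_counter() - start:.2f}s")
```

Only the first call pays for the load. Constructing the object with preload=True moves that cost to startup.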

tjiho · April 3, 2021, 2:21pm

I tried to update Rhasspy on my Raspberry Pi 3, and it takes a very long time to install dependencies with pip.
When running /home/pi/rhasspy/.venv/bin/python -m pip install "/home/pi/rhasspy" I get this warning:

INFO: pip is looking at multiple versions of importlib-metadata to determine which version is compatible with other requirements. This could take a while.
INFO: This is taking longer than usual. You might need to provide the dependency resolver with stricter constraints to reduce runtime. If you want to abort this run, you can press Ctrl + C to do so. To improve how pip performs, tell us what happened here: https://pip.pypa.io/surveys/backtracking
INFO: pip is looking at multiple versions of hyperframe to determine which version is compatible with other requirements. This could take a while.
INFO: pip is looking at multiple versions of hpack to determine which version is compatible with other requirements. This could take a while.
INFO: pip is looking at multiple versions of h2 to determine which version is compatible with other requirements. This could take a while.

One core of the Raspberry Pi is at 100% and it takes hours to find the right versions.
It’s probably a pip problem more than a Rhasspy one, but maybe the versions could be pinned in the Rhasspy requirements file.

In the venv, it’s pip 21.0.1 and Python 3.7.

tjiho · April 3, 2021, 2:25pm

Right now, pip is still running. It’s so slow; I started the installation 3 hours ago.

synesthesiam · April 3, 2021, 2:41pm

You need to force the pip version to <= 20.2.4 for it to work now. They completely ruined pip with the new dependency resolver. I can’t get anything to install now if it has more than one dependency.

If you’re installing from source, try exporting PIP_VERSION="pip<=20.2.4" before make install.
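
If you want to check whether a given pip falls inside that pin before starting a long install, a small sketch like the following could help. The helper names parse_version and within_pin are hypothetical, and the parser only handles plain dotted versions such as 21.0.1 (no pre-release suffixes).

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn a plain dotted version like '21.0.1' into (21, 0, 1)."""
    return tuple(int(part) for part in version.split("."))


def within_pin(pip_version: str, max_version: str = "20.2.4") -> bool:
    """Return True if pip_version satisfies pip<=max_version."""
    return parse_version(pip_version) <= parse_version(max_version)


for candidate in ("20.2.4", "20.3", "21.0.1"):
    status = "ok" if within_pin(candidate) else "newer than the pin"
    print(f"pip {candidate}: {status}")
```

Inside the venv, the installed version is available as pip.__version__ and can be passed to within_pin.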

tjiho · April 3, 2021, 5:09pm

Thanks, with pip 20.2.4 it’s far faster.
However, I have an error:

ERROR: Could not find a version that satisfies the requirement onnxruntime~=1.6.0 (from larynx~=0.3.0->rhasspy==2.5.10) (from versions: none)

synesthesiam · April 3, 2021, 6:00pm

https://github.com/synesthesiam/prebuilt-apps/releases/download/v1.0/onnxruntime-1.6.0-cp37-cp37m-linux_armv7l.whl

Microsoft only has pre-compiled wheels for 64-bit ARM for some reason.
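
The “from versions: none” error comes down to the tags in the wheel filename, which follow the name-version-python-abi-platform pattern. Here is a simplified sketch for reading them and comparing the platform tag with the local machine. The helper names parse_wheel_filename and matches_machine are hypothetical, and the suffix comparison is much cruder than what pip actually does.

```python
import platform
from typing import Dict


def parse_wheel_filename(filename: str) -> Dict[str, str]:
    """Split a wheel filename into name, version, and compatibility tags."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel: {filename}")
    parts = filename[: -len(".whl")].split("-")
    # Layout: name-version[-build]-python-abi-platform
    if len(parts) not in (5, 6):
        raise ValueError(f"unexpected wheel filename: {filename}")
    return {
        "name": parts[0],
        "version": parts[1],
        "python": parts[-3],
        "abi": parts[-2],
        "platform": parts[-1],
    }


def matches_machine(tags: Dict[str, str], machine: str) -> bool:
    """Rough check: does the platform tag end with the machine name?"""
    return tags["platform"].endswith("_" + machine)


tags = parse_wheel_filename("onnxruntime-1.6.0-cp37-cp37m-linux_armv7l.whl")
print(tags)
print("matches this machine:", matches_machine(tags, platform.machine()))
```

On a 32-bit Raspberry Pi OS install, platform.machine() typically reports armv7l, which is the tag this wheel carries.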

MikeLDPT · April 3, 2021, 6:47pm

Is it easy to do a manual update with HASSIO? I’ll certainly join in testing if I can figure out how to update my HASSIO 2.5.9 version. Thanks

AlmostSerious · April 3, 2021, 7:07pm

It is. I think romkabouter gave me this tip in another thread.
Basically, you just copy the hassio addon repository to your addons/local folder.
Then in the Dockerfile you put

FROM rhasspy/rhasspy:2.5.10

And install

MikeLDPT · April 3, 2021, 7:54pm

Thanks. I’ll give it a go.

rolyan_trauts · April 4, 2021, 1:47am

Because, as with TensorFlow and all tensor-based math, the 2-3x speed increase of 64-bit means that 32-bit is now aimed only at microcontrollers. It really is 2-3x, at least with TensorFlow, and I presume ONNX is very similar.
NEON SIMD is highly optimised in all NN engines, and with ARMv8 the 128-bit NEON registers for float math mean a 2-3x perf increase in real terms, which is absolutely huge. So they don’t see ARMv7 as viable, or at least as worth much mention. Why would you?

I did some benchmarks on the exact same setup, just Pi OS 64-bit vs 32-bit, and depending on the model, like for like, the perf increase is 2-3x with the wider data bus.