Rhasspy 2.5.10 Released

synesthesiam · April 10, 2021, 11:58pm

The official release of 2.5.10 is here! Thanks to everyone for the testing and feedback.

I’ve added a few bug fixes since the preliminary release, but the main new features since 2.5.9 are still:

ASR support for Swedish (sv)
New Larynx that’s faster and has a ton of new voices
- Hint: switch to “Low Quality” on a Pi 4 or below for a big speed-up
- This version should also work with older x86_64 CPUs (no AVX)
Kaldi ASR now has confidence values for words and sentences
Dialogue manager now has a minimum ASR confidence threshold (speech_to_text.<system>.min_confidence where <system> is kaldi, deepspeech, etc.)
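As a rough illustration of the threshold setting, the dotted name above suggests a nested entry in the profile JSON. The sketch below builds such an entry; the `set_dotted` helper is hypothetical, `kaldi` is just one possible `<system>`, and the value 0.5 is an arbitrary example rather than a recommended default.

```python
import json


def set_dotted(profile: dict, dotted_key: str, value) -> dict:
    """Set a nested value in a profile dict from a dotted key (hypothetical helper)."""
    node = profile
    *parents, leaf = dotted_key.split(".")
    for name in parents:
        # Create intermediate sections on demand
        node = node.setdefault(name, {})
    node[leaf] = value
    return profile


# Assumption: <system> is "kaldi"; 0.5 is an arbitrary example threshold.
profile = set_dotted({}, "speech_to_text.kaldi.min_confidence", 0.5)
print(json.dumps(profile, indent=2))
```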

Since the preliminary release, I’ve also added:

ASR confidences show up now in asr/textCaptured and nlu/intent messages (see asrConfidence and asrTokens).
Custom entities from /api/listen-for-command now show up in the recognized nlu/intent (see customEntities)
TTS timeouts are based on the text length and dialogue.say_chars_per_second
Sound paths in the dialogue manager may be directories, in which case a random WAV file is chosen each time
Only one satellite within a group should start recording if more than one detects the wake word at the same time
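A client that receives these messages could use the new field to ignore low-confidence results. This is only a sketch: `asrConfidence` is the field named above, but the example payload, the `is_confident` helper, and the choice to treat a missing value as "not confident" are assumptions made for illustration.

```python
import json


def is_confident(payload: bytes, min_confidence: float) -> bool:
    """Return True if the message carries an asrConfidence at or above the threshold."""
    message = json.loads(payload)
    confidence = message.get("asrConfidence")
    # Assumption: a missing or null confidence counts as "not confident"
    if confidence is None:
        return False
    return float(confidence) >= min_confidence


# Hypothetical payload; real messages contain more fields than shown here.
example = json.dumps({"input": "turn on the light", "asrConfidence": 0.92}).encode()
print(is_confident(example, min_confidence=0.5))
```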

Added

New version of Larynx with improved performance and 35 voices (20 English, 1 German, 3 French, 2 Spanish, 3 Dutch, 2 Italian, 1 Swedish, 3 Russian)
Kaldi ASR model for Swedish (sv)
Confidence and word timings for Kaldi ASR
Minimum ASR confidence threshold for dialogue manager
Detect AVX support and warn for Larynx, DeepSpeech, and Precise in Web UI
Handle spaces in converter arguments with word!(converter, …)
rhasspy-tts-cli-hermes TTS commands may be Jinja2 templates (--use-jinja2)
Support for MaryTTS effects (jasonhildebrand)
customData added to hermes/nlu/query message
customData is copied by NLU services from query to intent/intentNotRecognized
lang property added for wake, speech_to_text, and intent profile sections
Wake, ASR, NLU services all set lang properties if null
Profile now has “parent” setting, allowing one profile to load settings from another
Dialogue manager sound paths may be directories, from which a random WAV will be chosen each time (thanks plafue)
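The directory behaviour for sound paths can be pictured roughly as follows. This is a minimal sketch and not the dialogue manager's actual code: the `resolve_sound_path` name, the case-insensitive `.wav` match, and returning `None` for an empty directory are all assumptions.

```python
import random
from pathlib import Path
from typing import Optional


def resolve_sound_path(sound_path: Path) -> Optional[Path]:
    """Return the WAV file to play for a configured sound path (hypothetical helper)."""
    if sound_path.is_dir():
        # Collect the WAV files directly inside the directory
        wav_files = sorted(
            p for p in sound_path.iterdir()
            if p.is_file() and p.suffix.lower() == ".wav"
        )
        # Pick a different file at random on each call; None if there are none
        return random.choice(wav_files) if wav_files else None
    # A plain file path is used as-is
    return sound_path
```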

Fixed

Remote HTTP service sets site_id of satellite for ASR/NLU endpoints
DeepSpeech token output (was letters, now words)
Multiple values in custom converters are sent as a list on stdin
Don’t show restart/shutdown button if “sudo” isn’t available (Docker, Hass.io)
Added missing espeak phonemes for some profiles
MaryTTS voice test in Web UI
Remove dialogue session from site cache on end
Don’t throw error about system not configured if message is intent for satellite (schnopsi)
Custom entities from /api/listen-for-command are passed through to NLU intent
Slots inside sub-directories will properly show up in the web interface
Use locks in dialogue manager to prevent multiple group satellite sessions during audio playback
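The locking fix in the last item can be illustrated with a per-group `asyncio.Lock`. This is a sketch of the general technique only, under the assumption that the dialogue manager is asyncio-based; the `GroupSessions` class, its methods, and the group and site names are hypothetical.

```python
import asyncio
from collections import defaultdict


class GroupSessions:
    """Allow at most one active session per satellite group (hypothetical class)."""

    def __init__(self) -> None:
        self._locks = defaultdict(asyncio.Lock)  # one lock per group, created lazily
        self._active = {}  # group id -> site id holding the session

    async def try_start(self, group_id: str, site_id: str) -> bool:
        async with self._locks[group_id]:
            if group_id in self._active:
                return False  # another satellite in the group already won
            await asyncio.sleep(0)  # stands in for awaited work such as audio playback
            self._active[group_id] = site_id
            return True

    async def end(self, group_id: str) -> None:
        async with self._locks[group_id]:
            self._active.pop(group_id, None)


async def demo() -> list:
    sessions = GroupSessions()
    # Two satellites in the same group detect the wake word at the same time
    return await asyncio.gather(
        sessions.try_start("kitchen", "satellite1"),
        sessions.try_start("kitchen", "satellite2"),
    )


print(asyncio.run(demo()))
```

Without the lock, both callers could pass the "already active" check while the first one is still awaiting, and both would start a session.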

Changed

/api/listen-for-command uses a proper wake workflow now (requires dialogue manager)
Show absolute paths for custom models (precise, snowboy, porcupine) in Web UI
TTS timeouts are computed using text length (dialogue.say_chars_per_second)
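The exact timeout formula is not given here, but the simplest reading is text length divided by the configured rate. The sketch below assumes that reading; the function name and the optional `extra_seconds` padding are hypothetical, and the rate of 15 characters per second is an arbitrary example.

```python
def tts_timeout_seconds(text: str, chars_per_second: float, extra_seconds: float = 0.0) -> float:
    """Estimate how long to wait for TTS to finish speaking `text` (assumed formula)."""
    if chars_per_second <= 0:
        raise ValueError("chars_per_second must be positive")
    # Longer text gets a proportionally longer timeout
    return len(text) / chars_per_second + extra_seconds


# Arbitrary example rate standing in for dialogue.say_chars_per_second
print(tts_timeout_seconds("Welcome to the world of speech synthesis!", chars_per_second=15.0))
```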

romkabouter · April 11, 2021, 6:50am

Great work,

Question: there are two add-on repos:

and

I use the first, but it might be a good idea to deprecate that and start using the latter.
The first is up to date with 2.5.10; the latter is a couple of versions behind.

What do you think, @synesthesiam?

DaveKearley · April 12, 2021, 6:27am

Sounds great, a couple of questions…

As a noob, how do I get the update?

Is it a ‘breaking’ update? That is, if applied to a working setup, will things stop working and require days of fiddling about?

synesthesiam · April 12, 2021, 5:15pm

I’ve updated both to the most recent version. I’ll eventually deprecate the one under my GitHub user name, maybe after a few more versions.

synesthesiam · April 12, 2021, 5:16pm

I do my best not to break things, but it’s really hard to test all of the possible configurations.

All past versions are kept, at least, so in the worst case you can revert to your working version.

DaveKearley · April 12, 2021, 6:20pm

Thanks, how are the updates applied?

synesthesiam · April 13, 2021, 12:54am

It depends on how you installed Rhasspy. If you’re using Docker, just follow: https://rhasspy.readthedocs.io/en/latest/installation/#updating

DaveKearley · April 13, 2021, 10:32am

Thanks for the link, updating now…

DaveKearley · April 13, 2021, 10:42am

I just followed the link, thanks

lilbuh · April 14, 2021, 4:51pm

I just updated my Rhasspy to 2.5.10 and tried to switch to Larynx (French).
I’m having trouble downloading the additional files:

DownloadFailedException: (‘https://github.com/rhasspy/gruut/releases/download/v0.9.0/fr-fr.tar.gz’, ‘File size mismatch (got 9991077 byte(s), expected 9983999)’)

synesthesiam · April 14, 2021, 6:12pm

Thanks! I’ll upload a fix today

B0ndo2 · April 16, 2021, 7:49pm

Is there any documentation on how to use Larynx? I selected it from the GUI, but testing fails:

TtsException: [ONNXRuntimeError] : 3 : NO_SUCHFILE : Load model from /home/pi/.config/rhasspy/profiles/en/tts/larynx/en-us/kathleen-glow_tts/generator.onnx failed:Load model /home/pi/.config/rhasspy/profiles/en/tts/larynx/en-us/kathleen-glow_tts/generator.onnx failed. File doesn't exist

synesthesiam · April 16, 2021, 8:31pm

I need to put in the documentation that setting up Larynx involves:

1. Select Larynx for TTS
2. Save settings and restart Rhasspy
3. Go to the top of the page and Download the required voice files
4. Enjoy!

B0ndo2 · April 16, 2021, 10:53pm

I had already followed these steps from my Firefox/Linux PC before you posted them, and step 3 never showed up (nothing at the top of the page).
I redid them from my Android phone and got a prompt at the top asking me to download files, which I did. Then a “training …” message appeared and stayed there.
I looked at the log and here is what I see (I’m on an RPI 3b+):

[ERROR:2021-04-16 18:49:29,508] rhasspyserver_hermes:
Traceback (most recent call last):
  File "/usr/lib/rhasspy/usr/local/lib/python3.7/site-packages/quart/app.py", line 1821, in full_dispatch_request
    result = await self.dispatch_request(request_context)
  File "/usr/lib/rhasspy/usr/local/lib/python3.7/site-packages/quart/app.py", line 1869, in dispatch_request
    return await handler(**request_.view_args)
  File "/usr/lib/rhasspy/rhasspy-server-hermes/rhasspyserver_hermes/__main__.py", line 840, in api_wake_words
    hotwords = await core.get_hotwords()
  File "/usr/lib/rhasspy/rhasspy-server-hermes/rhasspyserver_hermes/__init__.py", line 910, in get_hotwords
    handle_finished(), messages, message_types
  File "/usr/lib/rhasspy/rhasspy-server-hermes/rhasspyserver_hermes/__init__.py", line 994, in publish_wait
    result_awaitable, timeout=timeout_seconds
  File "/usr/lib/rhasspy/usr/local/lib/python3.7/asyncio/tasks.py", line 449, in wait_for
    raise futures.TimeoutError()
concurrent.futures._base.TimeoutError
[ERROR:2021-04-16 18:49:29,552] rhasspyserver_hermes:
Traceback (most recent call last):
  File "/usr/lib/rhasspy/usr/local/lib/python3.7/site-packages/quart/app.py", line 1821, in full_dispatch_request
    result = await self.dispatch_request(request_context)
  File "/usr/lib/rhasspy/usr/local/lib/python3.7/site-packages/quart/app.py", line 1869, in dispatch_request
    return await handler(**request_.view_args)
  File "/usr/lib/rhasspy/rhasspy-server-hermes/rhasspyserver_hermes/__main__.py", line 824, in api_speakers
    speakers = await core.get_speakers()
  File "/usr/lib/rhasspy/rhasspy-server-hermes/rhasspyserver_hermes/__init__.py", line 881, in get_speakers
    handle_finished(), messages, message_types
  File "/usr/lib/rhasspy/rhasspy-server-hermes/rhasspyserver_hermes/__init__.py", line 994, in publish_wait
    result_awaitable, timeout=timeout_seconds
  File "/usr/lib/rhasspy/usr/local/lib/python3.7/asyncio/tasks.py", line 449, in wait_for
    raise futures.TimeoutError()
concurrent.futures._base.TimeoutError
[ERROR:2021-04-16 18:49:29,589] rhasspyserver_hermes:
Traceback (most recent call last):
  File "/usr/lib/rhasspy/usr/local/lib/python3.7/site-packages/quart/app.py", line 1821, in full_dispatch_request
    result = await self.dispatch_request(request_context)
  File "/usr/lib/rhasspy/usr/local/lib/python3.7/site-packages/quart/app.py", line 1869, in dispatch_request
    return await handler(**request_.view_args)
  File "/usr/lib/rhasspy/rhasspy-server-hermes/rhasspyserver_hermes/__main__.py", line 789, in api_microphones
    microphones = await core.get_microphones()
  File "/usr/lib/rhasspy/rhasspy-server-hermes/rhasspyserver_hermes/__init__.py", line 848, in get_microphones
    handle_finished(), messages, message_types
  File "/usr/lib/rhasspy/rhasspy-server-hermes/rhasspyserver_hermes/__init__.py", line 994, in publish_wait
    result_awaitable, timeout=timeout_seconds
  File "/usr/lib/rhasspy/usr/local/lib/python3.7/asyncio/tasks.py", line 449, in wait_for
    raise futures.TimeoutError()
concurrent.futures._base.TimeoutError

Odie · April 21, 2021, 12:56pm

Hi,
I’m working with a MATRIX Creator. Since the update, Rhasspy is having a hard time keeping track of the sound card. Every time I reboot or power cycle, I have to go in on start-up and manually point Rhasspy at the microphone. It also keeps creating new hardware devices: it starts with device 3, then adds 4 and 5 after start-up. Ideas?

manju-rn · April 22, 2021, 3:37pm

I am able to run Larynx properly on an RPI 4B. The first time a new speech runs, it takes some time before it is played (the logs show some synthesis is going on). However, if the same speech is run again, it is immediate (I assume it plays from cache). Is there a way to load all the speeches initially? Also, any dynamic sentence will then have a delay. Is there any way to overcome this?

synesthesiam · April 22, 2021, 4:04pm

Setting to medium or low quality helps quite a bit on a Raspberry Pi. Try both and see which you prefer.

I do plan to add this. It’s tricky because the speech models may not have been downloaded yet when the TTS service starts. So what I’ll probably end up doing is attempting to load them, but falling back to waiting for the first TTS request if that fails.
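The planned behaviour (try to load at start-up, fall back to loading on the first TTS request) could look roughly like the sketch below. It is not Larynx code: the `LazyVoice` class, the injected `load_model` callable, and the use of `FileNotFoundError` to signal "not downloaded yet" are all assumptions.

```python
from typing import Callable, Optional


class LazyVoice:
    """Preload a voice model if possible, otherwise load it on first use (hypothetical)."""

    def __init__(self, load_model: Callable[[], object]) -> None:
        self._load_model = load_model
        self._model: Optional[object] = None

    def preload(self) -> bool:
        """Attempt to load at service start; return False if the files are missing."""
        try:
            self._model = self._load_model()
            return True
        except FileNotFoundError:
            # Voice files not downloaded yet: defer loading to the first request
            return False

    def get(self) -> object:
        """Return the model, loading it now if preloading did not succeed."""
        if self._model is None:
            self._model = self._load_model()
        return self._model
```

With this shape, the first request only pays the loading cost when the preload attempt failed.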

manju-rn · April 23, 2021, 2:19am

Thanks. I checked with the different quality settings:
High Quality - 6-7 seconds
Medium - 2-3 seconds (most of the time it is 2 seconds)
Low - 2 seconds
Hardware - RPI4B 4GB - JBL speaker connected via jack

Also, let me know how I can contribute if you can point to specific developer docs on this.
I have been using Rhasspy for quite some time now. I love it and appreciate all the effort. Larynx seems to me to be by far the best bet, and its refinement will be an exciting journey!

VoxAbsurdis · May 4, 2021, 3:17am

This cake [v2.5.10] is great. So delicious and moist!

The new Larynx voices are a whole different level of quality compared to the old ones.

synesthesiam · May 5, 2021, 12:32am

Thanks @VoxAbsurdis

Love the username, btw.