Rhasspy 2.5.8 Released

Hi everyone :wave:

With the holidays coming up, it seems like a good time to push out a new release. Unlike 2.5.7, there are quite a few new things in 2.5.8 to go over.

Thanks to everyone who contributed, and to the many community members who are helping us build a great voice assistant for everyone :rainbow: . As always, please open GitHub issues so we can squash those bugs :bug:

Larynx TTS

This release finally incorporates the Larynx text to speech system, which is a fork of MozillaTTS. The goal of this TTS system is to provide high quality voices for as many languages as possible, replacing the need for Google Wavenet.

Once it gets warmed up, Larynx runs well on x86_64 systems (NUC, etc.) and OK on a Pi 4. I wouldn’t recommend trying to use it on a Pi 3 or 2. It currently runs PyTorch on the CPU, so there may be room for improvement with a GPU someday.

Out of the box, I have voices for Dutch, German, French, Spanish, and Russian. Many more are currently in progress, including English, Swedish, Portuguese, and Vietnamese :slight_smile:

New Kaldi STT Models

In line with the Master Plan, I’ve trained up Kaldi speech to text models for Italian, Spanish, French, and Russian. You can use these now in Rhasspy by selecting Kaldi in the appropriate profile.

More languages are coming as I locate public speech data. There are also several efforts underway to crowd-source this data from the Rhasspy community and other places. If you know of a good dataset or would like to volunteer, please let me know!

Volume Everywhere

Many users have asked for the ability to adjust Rhasspy’s output volume, so I’ve made an effort to add this in a way that (I think) makes the most sense.

In the Settings page, you can now independently set the volumes of:

  • The audio output service (aplay)
  • The text to speech service
  • The dialogue feedback sounds (beeps)

On the main web UI page, there is also a handy “Set Volume” button. If you leave the site ID text box next to it blank, it will change the volume on whatever system you’re using. But you can also put specific site IDs in the box and change the volumes of multiple satellites at once (this uses a new MQTT message).

Lastly, there’s a new /api/set-volume HTTP endpoint where you can programmatically set the volume. It also takes a ?siteId=site1,site2,... parameter if you want to set the volume for multiple site IDs. Oh, and /api/text-to-speech now has a ?volume=0.5 parameter if you want just one utterance to be quiet.
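
As a rough illustration, here is a minimal Python sketch that builds requests for the two endpoints above. Some things here are assumptions rather than anything stated in this post: the base URL (localhost on port 12101), sending the volume as a plain-text POST body, and the helper names set_volume_request and speak_request, which are made up for this example.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumption: Rhasspy's web server is reachable at this address.
BASE_URL = "http://localhost:12101"


def set_volume_request(volume, site_ids=(), base_url=BASE_URL):
    """Build a POST request for /api/set-volume.

    Assumption: the volume (0-1) is sent as the plain-text request body.
    """
    if not 0.0 <= volume <= 1.0:
        raise ValueError("volume must be between 0 and 1")
    url = f"{base_url}/api/set-volume"
    if site_ids:
        # Multiple site IDs are joined with commas, as in ?siteId=site1,site2
        url += "?" + urlencode({"siteId": ",".join(site_ids)}, safe=",")
    return Request(url, data=str(volume).encode("utf-8"), method="POST")


def speak_request(text, volume=None, base_url=BASE_URL):
    """Build a POST request for /api/text-to-speech, optionally with ?volume=."""
    url = f"{base_url}/api/text-to-speech"
    if volume is not None:
        url += "?" + urlencode({"volume": volume})
    return Request(url, data=text.encode("utf-8"), method="POST")


# Only the URLs are printed here; nothing is sent over the network.
print(set_volume_request(0.5, ["site1", "site2"]).full_url)
print(speak_request("hello", volume=0.5).full_url)
```

To actually send one of these, pass the request to urllib.request.urlopen while Rhasspy is running.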

Complete Changelog

Added

  • Russian Kaldi profile and Larynx TTS voice
  • Spanish Kaldi profile and Larynx TTS voice
  • French Kaldi profile and Larynx TTS voice
  • Italian Kaldi profile
  • German Larynx TTS voice
  • Volume scale (0-1) for feedback sounds and TTS
  • rhasspy/asr/setVolume MQTT message and /api/set-volume HTTP endpoint
  • rhasspy/asr/recordingFinished MQTT message sent immediately after silence detection
  • Satellite site ids to intent handling settings in web UI
  • Group separator for co-located satellites (dialogue.group_separator)
  • num2words support for Swedish (thanks Bostrom!)

Fixed

  • Argument list for sound output command system (jrouly)
  • Expand environment variables in TLS ca_certs
  • spn silence phone in Swedish profile
  • Use callback API in PyAudio to avoid buffer overrun
  • HTTP API JSON should not be forced to ASCII

Changed

  • Default Kaldi language model type is now text FST instead of arpa

Great work! Hope to try it soon :slight_smile:

Loaded it up on my server and satellites and no issues so far. Awesome work! Thank you.

I did notice, however, that one of the new features, setting the volume of the “beeps”, doesn’t seem to be working.
Setting the aplay volume on the satellite affects both the beeps and TTS, but changing the “Sounds” volume on the satellite (even down to 0.1) doesn’t make an audible difference.

Speaking of the beeps, is there a way to simply disable some or all? And if the Wake WAV is disabled, will the delay be shorter before it begins listening for the command?

Hmmmm, I’ll take a look. Thanks for the feedback.

If you delete the file name in the web UI, it should stop playing that WAV file. There should be a shorter delay too, since there’s no worry of the mic picking up the beeps as speech.

Thank you very much for the new version!

I tried Larynx TTS (de-thorsten) on my server (Synology Intel NAS) with a satellite setup, but I always get a timeout error:

[ERROR:2020-11-20 20:49:28,093] rhasspyserver_hermes: 
Traceback (most recent call last):
  File "/usr/lib/rhasspy/.venv/lib/python3.7/site-packages/quart/app.py", line 1821, in full_dispatch_request
    result = await self.dispatch_request(request_context)
  File "/usr/lib/rhasspy/.venv/lib/python3.7/site-packages/quart/app.py", line 1869, in dispatch_request
    return await handler(**request_.view_args)
  File "/usr/lib/rhasspy/rhasspy-server-hermes/rhasspyserver_hermes/__main__.py", line 1282, in api_train
    result = await core.train()
  File "/usr/lib/rhasspy/rhasspy-server-hermes/rhasspyserver_hermes/__init__.py", line 461, in train
    timeout_seconds=self.training_timeout_seconds,
  File "/usr/lib/rhasspy/rhasspy-server-hermes/rhasspyserver_hermes/__init__.py", line 971, in publish_wait
    result_awaitable, timeout=timeout_seconds
  File "/usr/lib/python3.7/asyncio/tasks.py", line 449, in wait_for
    raise futures.TimeoutError()
concurrent.futures._base.TimeoutError

How can I get more information about what is not working? Is there more debug info somewhere?

Thank you!

You’re welcome! Do you see any messages from rhasspytts_larynx_hermes in the log? It can take some time for MozillaTTS to load the model; you should see a message that it successfully created a synthesizer.
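
If the log is long, a quick filter can help. Below is a rough sketch that keeps only the lines from one logger. It assumes the log lines follow the "[LEVEL:timestamp] logger: message" shape seen in this thread. The function name filter_log and the regular expression are made up for this example and are not part of Rhasspy.

```python
import re

# Matches lines such as:
# [DEBUG:2020-11-20 23:00:00,764] rhasspyprofile.download: text_to_speech.system larynx larynx = True
LOG_LINE = re.compile(
    r"^\[(?P<level>[A-Z]+):(?P<time>[^\]]+)\]\s+(?P<logger>[\w.]+):\s?(?P<message>.*)$"
)


def filter_log(lines, logger_prefix):
    """Return (level, logger, message) for lines whose logger starts with logger_prefix."""
    matches = []
    for line in lines:
        m = LOG_LINE.match(line.rstrip("\n"))
        if m and m.group("logger").startswith(logger_prefix):
            matches.append((m.group("level"), m.group("logger"), m.group("message")))
    return matches


# Two lines copied from this thread, used as sample input.
sample = [
    "[DEBUG:2020-11-20 23:00:00,764] rhasspyprofile.download: text_to_speech.system larynx larynx = True",
    "[ERROR:2020-11-20 20:49:28,093] rhasspyserver_hermes: ",
]

for level, logger, message in filter_log(sample, "rhasspyprofile"):
    print(level, logger, message)
```

Any iterable of lines works as input, so an open file handle for a saved copy of the log can be passed in place of the sample list, with "rhasspytts_larynx_hermes" as the prefix.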

Unfortunately I don’t see such a log.
Only:

[DEBUG:2020-11-20 23:00:00,771] rhasspyprofile.download: Skipping tts/larynx/de/thorsten/vocoder/config.json (/profiles/de/tts/larynx/de/thorsten/vocoder/config.json)
[DEBUG:2020-11-20 23:00:00,770] rhasspyprofile.download: Skipping tts/larynx/de/thorsten/vocoder/checkpoint_500000.pth.tar (/profiles/de/tts/larynx/de/thorsten/vocoder/checkpoint_500000.pth.tar)
[DEBUG:2020-11-20 23:00:00,768] rhasspyprofile.download: Skipping tts/larynx/de/thorsten/scale_stats.npy (/profiles/de/tts/larynx/de/thorsten/scale_stats.npy)
[DEBUG:2020-11-20 23:00:00,767] rhasspyprofile.download: Skipping tts/larynx/de/thorsten/config.json (/profiles/de/tts/larynx/de/thorsten/config.json)
[DEBUG:2020-11-20 23:00:00,766] rhasspyprofile.download: Skipping tts/larynx/de/thorsten/checkpoint_380000.pth.tar (/profiles/de/tts/larynx/de/thorsten/checkpoint_380000.pth.tar)
[DEBUG:2020-11-20 23:00:00,764] rhasspyprofile.download: text_to_speech.system larynx larynx = True

OK, do you see files in your profile under the tts/larynx directory?

Yes:

/de/tts/larynx$ ls -Ra
.:
. .. cache de

./cache:
. ..

./de:
. .. thorsten

./de/thorsten:
. .. checkpoint_380000.pth.tar config.json scale_stats.npy vocoder

./de/thorsten/vocoder:
. .. checkpoint_500000.pth.tar config.json