With the holidays coming up, it seems like a good time to push out a new release. Unlike 2.5.7, there are quite a few new things in 2.5.8 to go over.
Thanks to everyone who contributed, and to the many community members who are helping us build a great voice assistant for everyone . As always, please open GitHub issues so we can squash those bugs
This release finally incorporates the Larynx text to speech system, which is a fork of MozillaTTS. The goal of this TTS system is to provide high quality voices for as many languages as possible, replacing the need for Google Wavenet.
Once it gets warmed up, Larynx runs well on
x86_64 systems (NUC, etc.), and OK on a Pi 4. I wouldn’t recommend trying to use it on a Pi 3 or 2. It uses PyTorch on the CPU, so there may be room for improvement with a GPU someday in the future.
Out of the box, I have voices for Dutch, German, French, Spanish, and Russian. Many more are currently in progress, including English, Swedish, Portuguese, and Vietnamese
New Kaldi STT Models
In line with the Master Plan, I’ve trained up Kaldi speech to text models for Italian, Spanish, French, and Russian. You can use these now in Rhasspy by selecting Kaldi in the appropriate profile.
More languages are coming as I locate public speech data. There are also several efforts underway to crowd-source this data from the Rhasspy community and other places. If you know of a good dataset or would like to volunteer, please let me know!
Many users have asked for the ability to adjust Rhasspy’s output volume, so I’ve made an effort to add this in a way that (I think) makes the most sense.
In the Settings page, you can now independently set the volumes of:
- The audio output service (
- The text to speech service
- The dialogue feedback sounds (beeps)
On the main web UI page, there is also a handy “Set Volume” button. If you leave the site ID text box next to it blank, it will change the volume on whatever system you’re using. But you can also put specific site IDs in the box and change the volumes of multiple satellites at once (this uses a new MQTT message).
Lastly, there’s a new
/api/set-volume HTTP endpoint where you can programmatically set the volume. It takes a
?siteId=site1,site2,.. parameter too if you want to set multiple site ids. Oh, and
/api/text-to-speech now has a
?volume=0.5 parameter if you want just one utterance to be quiet.
- Russian Kaldi profile and Larynx TTS voice
- Spanish Kaldi profile and Larynx TTS voice
- French Kaldi profile and Larynx TTS voice
- Italian Kaldi profile
- German Larynx TTS voice
- Volume scale (0-1) for feedback sounds and TTS
- rhasspy/asr/setVolume MQTT message and /api/setVolume HTTP endpoint
- rhasspy/asr/recordingFinished MQTT message sent immediately after silence detection
- Satellite site ids to intent handling settings in web UI
- Group separator for co-located satellites (dialogue.group_separator)
- num2words support for Swedish (thanks Bostrom!)
- Argument list for sound output command system (jrouly)
- Expand environment variables in TLS ca_certs
- spn silence phone in Swedish profile
- Use callback API in PyAudio to avoid buffer overrun
- HTTP API JSON should not be forced to ASCII
- Default Kaldi language model type is now text FST instead of arpa