Hi everyone
With the holidays coming up, it seems like a good time to push out a new release. Unlike 2.5.7, there are quite a few new things in 2.5.8 to go over.
Thanks to everyone who contributed, and to the many community members who are helping us build a great voice assistant for everyone . As always, please open GitHub issues so we can squash those bugs
Larynx TTS
This release finally incorporates the Larynx text to speech system, which is a fork of MozillaTTS. The goal of this TTS system is to provide high quality voices for as many languages as possible, replacing the need for Google Wavenet.
Once it gets warmed up, Larynx runs well on x86_64
systems (NUC, etc.), and OK on a Pi 4. I wouldnāt recommend trying to use it on a Pi 3 or 2. It uses PyTorch on the CPU, so there may be room for improvement with a GPU someday in the future.
Out of the box, I have voices for Dutch, German, French, Spanish, and Russian. Many more are currently in progress, including English, Swedish, Portuguese, and Vietnamese
New Kaldi STT Models
In line with the Master Plan, Iāve trained up Kaldi speech to text models for Italian, Spanish, French, and Russian. You can use these now in Rhasspy by selecting Kaldi in the appropriate profile.
More languages are coming as I locate public speech data. There are also several efforts underway to crowd-source this data from the Rhasspy community and other places. If you know of a good dataset or would like to volunteer, please let me know!
Volume Everywhere
Many users have asked for the ability to adjust Rhasspyās output volume, so Iāve made an effort to add this in a way that (I think) makes the most sense.
In the Settings page, you can now independently set the volumes of:
- The audio output service (
aplay
) - The text to speech service
- The dialogue feedback sounds (beeps)
On the main web UI page, there is also a handy āSet Volumeā button. If you leave the site ID text box next to it blank, it will change the volume on whatever system youāre using. But you can also put specific site IDs in the box and change the volumes of multiple satellites at once (this uses a new MQTT message).
Lastly, thereās a new /api/set-volume
HTTP endpoint where you can programmatically set the volume. It takes a ?siteId=site1,site2,..
parameter too if you want to set multiple site ids. Oh, and /api/text-to-speech
now has a ?volume=0.5
parameter if you want just one utterance to be quiet.
Complete Changelog
Added
- Russian Kaldi profile and Larynx TTS voice
- Spanish Kaldi profile and Larynx TTS voice
- French Kaldi profile and Larynx TTS voice
- Italian Kaldi profile
- German Larynx TTS voice
- Volume scale (0-1) for feedback sounds and TTS
- rhasspy/asr/setVolume MQTT message and /api/setVolume HTTP endpoint
- rhasspy/asr/recordingFinished MQTT message sent immediately after silence detection
- Satellite site ids to intent handling settings in web UI
- Group separator for co-located satellites (dialogue.group_separator)
- num2words support for Swedish (thanks Bostrom!)
Fixed
- Argument list for sound output command system (jrouly)
- Expand environment variables in TLS ca_certs
- spn silence phone in Swedish profile
- Use callback API in PyAudio to avoid buffer overrun
- HTTP API JSON should not be forced to ASCII
Changed
- Default Kaldi language model type is now text FST instead of arpa