I’ve had my head down coding for the past few months on a new version of Larynx for text to speech. My main goal has been speed, since the last version (based on MozillaTTS) was painfully slow, even on a desktop machine.
I’m happy to say that the new version of Larynx is much faster. Even on a Pi 4, you can get faster than realtime speech using the lower quality settings!
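For anyone unfamiliar with the term, "faster than realtime" is usually expressed as a real-time factor (RTF): the time spent synthesizing divided by the duration of the audio produced, where anything below 1.0 is faster than realtime. Here is a minimal sketch of that arithmetic. The function names are hypothetical and not part of Larynx, and the numbers are made up for illustration rather than measured.

```python
def real_time_factor(synthesis_seconds: float, audio_seconds: float) -> float:
    """Ratio of synthesis time to the duration of the audio that was produced."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return synthesis_seconds / audio_seconds


def is_faster_than_realtime(synthesis_seconds: float, audio_seconds: float) -> bool:
    """True when the audio is generated more quickly than it takes to play back."""
    return real_time_factor(synthesis_seconds, audio_seconds) < 1.0


if __name__ == "__main__":
    # Illustrative values only (not benchmark results):
    # 1.5 seconds of compute for a 3.0 second utterance.
    rtf = real_time_factor(synthesis_seconds=1.5, audio_seconds=3.0)
    print(f"RTF = {rtf:.2f}")
    print("faster than realtime:", is_faster_than_realtime(1.5, 3.0))
```

To measure this on your own hardware, time the synthesis call with something like `time.perf_counter()` and divide by the length of the resulting audio in seconds.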
I’m planning to release Rhasspy 2.5.10 with the new Larynx this upcoming week, but I wanted to give everyone a preview of the 35 voices (8 languages) that will be available:
- English (`en-us`, 20 voices)
  - blizzard_fls (F, accent)
  - cmu_aew (M)
  - cmu_ahw (M)
  - cmu_aup (M, accent)
  - cmu_bdl (M)
  - cmu_clb (F)
  - cmu_eey (F)
  - cmu_fem (M)
  - cmu_jmk (M)
  - cmu_ksp (M, accent)
  - cmu_ljm (F)
  - cmu_lnh (F)
  - cmu_rms (M)
  - cmu_rxr (M)
  - cmu_slp (F, accent)
  - cmu_slt (F)
  - ek (F, accent)
  - harvard (F, accent)
  - kathleen (F)
  - ljspeech (F)
- German (`de-de`, 1 voice)
  - thorsten (M)
- French (`fr-fr`, 3 voices)
  - gilles_le_blanc (M)
  - siwis (F)
  - tom (M)
- Spanish (`es-es`, 2 voices)
  - carlfm (M)
  - karen_savage (F)
- Dutch (`nl`, 3 voices)
  - bart_de_leeuw (M)
  - flemishguy (M)
  - rdh (M)
- Italian (`it-it`, 2 voices)
  - lisa (F)
  - riccardo_fasol (M)
- Swedish (`sv-se`, 1 voice)
  - talesyntese (M)
- Russian (`ru-ru`, 3 voices)
  - hajdurova (F)
  - nikolaev (M)
  - minaev (M)
If you hear a problem with any voice, or would like to donate your own, please let me know! In most cases, I can train a new voice with about 1.5 hours of quality audio.
These voices were made possible thanks to:
- Public audio datasets, some donated by Rhasspy users!
- Feedback from Rhasspy users on language-specific sentences and pronunciations
- My mini GPU “cluster”, with one GPU donated by a Rhasspy user