The language coverage of the Larynx TTS is quite impressive. Has anyone tried adding other languages? Especially the big ones (by number of native speakers), like Mandarin.
The two limiting factors for language support in Larynx are:
- Support for the language in gruut (text to phoneme conversion)
- Publicly available audio data
It's not required, but voice quality also improves dramatically if I first train a Kaldi ASR model and then use it to force-align the audio data with the phonemes of its transcription.
For Mandarin, there’s plenty of audio data available. I “just” need to add the language to gruut.
Does anyone understand Mandarin well enough to help?
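To give a rough idea of what the text to phoneme step involves, here is a toy sketch in Python. This is not gruut's actual API or data: `TOY_LEXICON` and `text_to_phonemes` are made-up names for illustration, and the two lexicon entries are only approximate IPA.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical toy lexicon (word -> phonemes). A real lexicon has many
# thousands of entries; these two are for illustration only.
TOY_LEXICON: Dict[str, List[str]] = {
    "hello": ["h", "ə", "l", "oʊ"],
    "world": ["w", "ɜː", "l", "d"],
}


def text_to_phonemes(
    text: str, lexicon: Dict[str, List[str]]
) -> List[Tuple[str, Optional[List[str]]]]:
    """Look up each whitespace-separated word in the lexicon.

    Words missing from the lexicon are paired with None, which is
    the gap that has to be filled for every new language.
    """
    result: List[Tuple[str, Optional[List[str]]]] = []
    for raw_word in text.lower().split():
        word = raw_word.strip(".,!?")  # drop surrounding punctuation
        if not word:
            continue  # token was punctuation only
        result.append((word, lexicon.get(word)))
    return result


if __name__ == "__main__":
    for word, phonemes in text_to_phonemes("Hello, world!", TOY_LEXICON):
        print(word, phonemes)
```

Note that the sketch splits words on whitespace. Written Mandarin has no spaces between words, so a real implementation would also need a word segmentation step before any lookup, which is part of why help from someone who knows the language matters.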
For the Kaldi ASR model, I can recommend the multi_cn recipe, but training takes over 20 days on an older GPU like a GTX 1660.
I will try to encourage some native speakers of Mandarin to help, but no promises …