The language coverage for TTS (Larynx) is quite impressive. Has anyone tried other languages, especially the big ones by number of native speakers, like Mandarin?
The two limiting factors for language support in Larynx are:
- Support for the language in gruut (text to phoneme conversion)
- Publicly available audio data
It's not required, but voice quality also improves dramatically if I first train a Kaldi ASR model and then use it to force-align the audio with the phonemes of its transcription.
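To give a rough idea of what forced alignment produces, here is a minimal sketch that reads Kaldi-style CTM rows (utterance id, channel, start time, duration, phone) and groups them into per-utterance phoneme timings. The function name `parse_ctm` and the sample rows are made up for illustration, the sketch assumes phone ids have already been mapped to symbols, and it is not necessarily how Larynx consumes alignments.

```python
from collections import defaultdict


def parse_ctm(lines):
    """Group CTM rows into per-utterance lists of (phone, start, duration).

    Hypothetical helper: assumes each row has the five whitespace-separated
    fields "utt_id channel start duration phone".
    """
    alignments = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        utt_id, _channel, start, duration, phone = line.split()[:5]
        alignments[utt_id].append((phone, float(start), float(duration)))
    return dict(alignments)


# Made-up sample rows, only to show the shape of the data.
example_ctm = [
    "utt1 1 0.00 0.12 SIL",
    "utt1 1 0.12 0.08 n",
    "utt1 1 0.20 0.15 i",
    "utt2 1 0.00 0.10 h",
]

alignments = parse_ctm(example_ctm)
for utt_id, phones in alignments.items():
    print(utt_id, phones)
```

Per-phoneme timings like these let the TTS training see exactly which stretch of audio belongs to which phoneme, instead of having to learn the alignment on its own.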
For Mandarin, there’s plenty of audio data available. I “just” need to add Mandarin support to gruut.
Does anyone understand Mandarin well enough to help?
For a Kaldi ASR model, I can recommend the multi_cn recipe, but training takes over 20 days on an older GPU like a GTX 1660.
I will try to encourage some native speakers of Mandarin to help, but no promises …