Which TTS is better for the Dutch language?

synesthesiam · September 9, 2020, 7:11pm

OK, I’ve combined all the approved prompts between @koan and @hugocoolens (total: 1587). I then ran the optimization to minimize the set without dropping overall coverage, which produced 1169 prompts.

I’ve done my best to clean up the prompts. If you have time, please give them one more quick look over.

The final coverage is about 90%, which I think will be sufficient. For reference, I recently came across a set of Dutch prompts on VoxForge and their coverage was only 70%. VoxForge is all GPL, so I’d be hesitant to re-use any of their prompts.
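
The thread does not show the minimization step itself. A minimal sketch of one common way to do it is a greedy set cover over adjacent phoneme pairs, assuming each prompt has already been converted to a list of phonemes. The `prompts` dictionary and its phoneme lists are made up for illustration, and this is not necessarily what the actual tooling does:

```python
from typing import Dict, List, Set, Tuple

Pair = Tuple[str, str]


def phoneme_pairs(phonemes: List[str]) -> Set[Pair]:
    """Return the set of adjacent phoneme pairs in one prompt."""
    return set(zip(phonemes, phonemes[1:]))


def minimize_prompts(prompts: Dict[str, List[str]]) -> List[str]:
    """Greedily pick prompts until every pair seen in the full set is covered."""
    pairs_by_id = {pid: phoneme_pairs(ph) for pid, ph in prompts.items()}
    uncovered: Set[Pair] = set().union(*pairs_by_id.values())
    selected: List[str] = []
    while uncovered:
        # Take the prompt that adds the most still-uncovered pairs.
        # Iterating over sorted ids makes ties deterministic.
        best = max(sorted(pairs_by_id),
                   key=lambda pid: len(pairs_by_id[pid] & uncovered))
        selected.append(best)
        uncovered -= pairs_by_id[best]
    return selected


# Hypothetical toy data: prompt id -> phonemes
prompts = {
    "p1": ["w", "iː", "o", "w", "iː"],
    "p2": ["w", "iː"],
    "p3": ["o", "w", "iː", "k", "ɑ", "n"],
}
kept = minimize_prompts(prompts)
print(kept)
```

Here `p2` is dropped because its only pair is already covered by the other prompts. Greedy selection keeps overall coverage unchanged but is not guaranteed to find the smallest possible set.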

koan · September 9, 2020, 7:59pm

Ok this was much better: I only removed/commented 30 sentences.

But I feel that there is still some repetition in sentence fragments. This one stood out because “wie o wie” is not something you read every day:

koan@x1:~/Rhasspy/tts-prompts$ grep -i "wie o wie" nl-nl/prompts.txt 
( nl_rhasspy_999 "Wie o wie kan ons helpen?" )
( nl_rhasspy_1010 "Wie o wie gaat er winnen, en welke weg gaan ze nemen?" )
( nl_rhasspy_1072 "Wie o wie?" )
( nl_rhasspy_1087 "Wie o wie kan mij helpen?" )

999 and 1087 are almost the same, and 1072 is probably redundant.
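
A rough way to surface this kind of repetition automatically is to index word trigrams across all prompts and list those that occur in more than one prompt. This is only a sketch: the line format is assumed from the grep output above, and the function name and the trigram length are arbitrary choices, not part of any existing tool.

```python
import re
from collections import defaultdict

# Matches lines such as: ( nl_rhasspy_999 "Wie o wie kan ons helpen?" )
PROMPT_RE = re.compile(r'\(\s*(\S+)\s+"(.*)"\s*\)')


def repeated_fragments(lines, n=3):
    """Map each word n-gram that appears in more than one prompt to the prompt ids."""
    seen = defaultdict(set)
    for line in lines:
        match = PROMPT_RE.match(line.strip())
        if not match:
            continue  # skip commented-out or malformed lines
        prompt_id, text = match.groups()
        words = re.findall(r"\w+", text.lower())
        for i in range(len(words) - n + 1):
            seen[" ".join(words[i:i + n])].add(prompt_id)
    return {frag: sorted(ids) for frag, ids in seen.items() if len(ids) > 1}


lines = [
    '( nl_rhasspy_999 "Wie o wie kan ons helpen?" )',
    '( nl_rhasspy_1010 "Wie o wie gaat er winnen, en welke weg gaan ze nemen?" )',
    '( nl_rhasspy_1072 "Wie o wie?" )',
    '( nl_rhasspy_1087 "Wie o wie kan mij helpen?" )',
]
repeats = repeated_fragments(lines)
for fragment, ids in repeats.items():
    print(fragment, ids)
```

On these four lines it reports “wie o wie” in all four prompts and “o wie kan” in 999 and 1087.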

I’d still love to improve coverage by trying to add as many words as possible for the missing phoneme pairs.

By the way, can you publish the script to compute the coverage? This would be helpful for fine-tuning the list of prompts: I would then immediately see the effect of removing or adding words or sentences.

synesthesiam · September 9, 2020, 8:29pm

Great, thank you! I’m thinking we should keep some of the sentences with English words to get a bit of the foreign phoneme coverage. Do you think those English words are common enough?

This must have been a way of getting the phoneme pair /iː o/ (wie is /w iː/). The /iː/ is from something like “anal[y]se” too. I see maybe 10 examples in the lexicon where this sound comes at the end of a word (like zei), and no cases where the pair occurs in a single word.

Sure, the coverage file contains all of the missing pairs and example words. If the example words all look unusual or foreign, it probably means the pair can be safely ignored.
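
The format of the coverage file is not shown here, so the following is only a sketch of how such a report could be computed. It assumes a lexicon mapping words to phoneme lists (the entries below are invented) and, as a simplification, only counts pairs inside words, not pairs that span a word boundary like the “wie o” case.

```python
from collections import defaultdict


def coverage_report(lexicon, prompt_words):
    """Return (coverage fraction, {missing pair: example words from the lexicon})."""
    examples = defaultdict(list)  # pair -> lexicon words that contain it
    for word, phonemes in lexicon.items():
        for pair in set(zip(phonemes, phonemes[1:])):
            examples[pair].append(word)

    covered = set()
    for word in prompt_words:
        phonemes = lexicon.get(word, [])  # unknown words contribute nothing
        covered.update(zip(phonemes, phonemes[1:]))

    missing = {pair: words for pair, words in examples.items()
               if pair not in covered}
    coverage = 1 - len(missing) / len(examples) if examples else 0.0
    return coverage, missing


# Hypothetical lexicon and prompt vocabulary
lexicon = {
    "wie": ["w", "iː"],
    "kan": ["k", "ɑ", "n"],
    "shop": ["ʃ", "ɔ", "p"],
}
coverage, missing = coverage_report(lexicon, ["wie", "kan"])
print(f"{coverage:.0%}", missing)
```

In this toy case both missing pairs point at “shop”, which is the situation described above: if every example word for a pair looks foreign, the pair is a candidate for ignoring.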

koan · September 10, 2020, 4:24pm

Yes, maybe I was a bit too strict in weeding out all English words. Words like “Halloween”, “liken”, “showen”, “shop” and “bye” are probably common enough to include.

rdh · October 5, 2020, 10:10am

Hi guys, I uploaded my Dutch dataset here.

Enjoy!

synesthesiam · October 6, 2020, 2:03pm

Awesome, thank you very much! I am training a new model now with your data.

synesthesiam · October 7, 2020, 8:45pm

Quick update, here are some samples from the latest model: https://drive.google.com/drive/folders/1TXDr_ZeRks3fXDQmlIChlwFcDNK5ki4X?usp=sharing

I’m using GlowTTS and the Multiband MelGAN vocoder. I had to re-train from scratch this morning because of a mistake, so the vocoder isn’t sounding so great yet.

I’ll update the examples after it’s had a chance to train overnight.

tipofthesowrd · October 9, 2020, 3:06pm

I’m looking forward to this!
I’m already ordering some Pi Zeros for wakewords in the house.

synesthesiam · October 9, 2020, 5:26pm

Updated the samples again with the vocoder about 50% done: https://drive.google.com/drive/folders/1TXDr_ZeRks3fXDQmlIChlwFcDNK5ki4X

Sounds much better!

fastjack · October 9, 2020, 5:46pm

Awesome! Even at 50% the voice sounds great! Can’t wait for 100%

Would it be possible to do this for French also? Can we do this ourselves (recordings+training)?

synesthesiam · October 9, 2020, 5:55pm

Definitely! The first step will be adding support for French to gruut. I expect this to be done by next week or so.

The recordings can be done with anything as long as you have WAV files and transcripts. With a little work, gruut will help select a small set of sentences from a large corpus (books, Wikipedia, etc.) that are maximally useful.

Training just needs a CUDA-enabled GPU that’s supported by PyTorch >= 1.5. I’m using a GTX 1060 6GB.

hugocoolens · October 9, 2020, 7:05pm

Updated the samples again with the vocoder about 50% done: https://drive.google.com/drive/folders/1TXDr_ZeRks3fXDQmlIChlwFcDNK5ki4X

4 of them are OK, but in the sentence “moeder sneedt zeven scheve sneden brood”, brood is mispronounced.

kind regards,
Hugo

synesthesiam · October 13, 2020, 5:36pm

Something cool I realized I could do this morning: https://drive.google.com/file/d/1PQneOLgfIyHqACsNj4x3FT1_Ypnuy6w0/view?usp=sharing

This is the rdh Dutch voice speaking an (accented) English sentence! Because I use IPA for both English and Dutch phonemes, I created a small mapping file that approximates the 14 or so English phonemes that are “missing” from Dutch. I had to guess on some, but it works as a proof of concept.

So this means we could re-use some of the voices for other languages, until we get native speakers in that language. To get it right, though, I will need help from people who speak both languages. For example, I don’t really know how Dutch folks pronounce the “th” sounds from “thing” and “the”.

hugocoolens · October 14, 2020, 2:18pm

I don’t really know how Dutch folks pronounce the “th” sounds from “thing” and “the”

These sounds don’t exist in Dutch

kind regards,
Hugo

synesthesiam · October 14, 2020, 2:24pm

I substituted them with “s” and “z” sounds as a guess

What’s neat is that you could approximate a very specific accent with this approach.
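
To illustrate the idea, here is a minimal sketch of such a substitution step. The real mapping file and its format are not shown in this thread, so the `EN_TO_NL` table and the `approximate` function are hypothetical. Only the two “th” entries reflect the guesses mentioned above.

```python
from typing import Dict, List

# Hypothetical table: English IPA phoneme -> stand-in phoneme(s) the Dutch voice knows.
# Only these two entries come from the discussion; the other "missing" phonemes are not listed.
EN_TO_NL: Dict[str, List[str]] = {
    "θ": ["s"],  # "th" in "thing", guessed as "s"
    "ð": ["z"],  # "th" in "the", guessed as "z"
}


def approximate(phonemes: List[str], mapping: Dict[str, List[str]]) -> List[str]:
    """Replace phonemes the voice lacks; pass everything else through unchanged."""
    result: List[str] = []
    for phoneme in phonemes:
        result.extend(mapping.get(phoneme, [phoneme]))
    return result


thing = approximate(["θ", "ɪ", "ŋ"], EN_TO_NL)
print(thing)
```

Because the table is plain data, swapping in a different set of substitutions is all it takes to approximate a different accent.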

hugocoolens · October 14, 2020, 2:57pm

Dutch people starting to learn English often mispronounce the “th” of “thing” as a plain “t” and the “th” of “the” as a “d”.

kind regards,
Hugo
p.s. started trimming the phrases, still 1300 to go…

synesthesiam · October 14, 2020, 5:39pm

Much appreciated! I’ll get your model in the queue as soon as I get the files!

synesthesiam · October 15, 2020, 8:36pm

The first Dutch voice for Larynx is ready!

Special thanks to @rdh for donating his voice

It’s easiest to check out with Docker:

$ docker run -it -p 5002:5002 \
     --device /dev/snd:/dev/snd \
     rhasspy/larynx:nl-rdh-1

Then visit http://localhost:5002 to test it out

You can use this from Rhasspy or Home Assistant by changing the run command slightly:

$ docker run -it -p 59125:5002 rhasspy/larynx:nl-rdh-1

Now you can tell Rhasspy/HomeAssistant that you’re using a MaryTTS server and point it to http://localhost:59125 (usually the default).
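
If you want to call the server from your own script instead, a sketch along these lines may work. It assumes the container answers the usual MaryTTS-style `/process` query on the port mapped above. I have not verified which parameters this image honours, so treat the parameter set, the `nl` locale value, and the helper names as assumptions.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def marytts_url(text, base="http://localhost:59125", locale="nl"):
    """Build a MaryTTS-style /process URL (parameter names follow MaryTTS conventions)."""
    query = urlencode({
        "INPUT_TEXT": text,
        "INPUT_TYPE": "TEXT",
        "OUTPUT_TYPE": "AUDIO",
        "AUDIO": "WAVE_FILE",
        "LOCALE": locale,  # assumed value; the server may ignore it
    })
    return f"{base}/process?{query}"


def synthesize(text, path="out.wav"):
    """Fetch the audio and save it; needs the Docker container to be running."""
    with urlopen(marytts_url(text)) as response, open(path, "wb") as wav_file:
        wav_file.write(response.read())


url = marytts_url("Hallo wereld")
print(url)
```

Calling `synthesize("Hallo wereld")` would then write the result to `out.wav`, provided the endpoint behaves as assumed.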

tipofthesowrd · October 16, 2020, 8:21am

Loving it so far! Enormous thanks to @synesthesiam & @rdh

Just a few remarks, I don’t know if these are typical remarks that need manual finetuning.

Numbers written as digits don’t come out well: “6 uur 46” vs. “zes uur zesenveertig”.
Example: GetTime intent

Some small things, like “OK”, are not pronounced correctly. However, if you input “okee”, the pronunciation is just fine.

koan · October 16, 2020, 1:22pm

For things like numbers and dates you should probably preprocess your text with something like Lingua Franca to convert them to words that are pronounceable by the TTS.
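
To show the kind of preprocessing meant here without guessing at Lingua Franca’s API, below is a small hand-rolled sketch that spells out the numbers 0 to 99 in Dutch before the text goes to the TTS. The function names are made up, larger numbers are left untouched, and dates or ordinals are not handled. A proper library would cover far more cases.

```python
import re

UNITS = ["nul", "een", "twee", "drie", "vier", "vijf", "zes", "zeven", "acht",
         "negen", "tien", "elf", "twaalf", "dertien", "veertien", "vijftien",
         "zestien", "zeventien", "achttien", "negentien"]
TENS = {2: "twintig", 3: "dertig", 4: "veertig", 5: "vijftig",
        6: "zestig", 7: "zeventig", 8: "tachtig", 9: "negentig"}


def number_to_dutch(n: int) -> str:
    """Spell out 0..99 in Dutch, e.g. 46 -> 'zesenveertig'."""
    if not 0 <= n <= 99:
        raise ValueError("only 0..99 is supported in this sketch")
    if n < 20:
        return UNITS[n]
    tens, unit = divmod(n, 10)
    if unit == 0:
        return TENS[tens]
    # Dutch puts the unit first; after "twee" and "drie" the joiner is written "ën".
    joiner = "ën" if unit in (2, 3) else "en"
    return UNITS[unit] + joiner + TENS[tens]


def spell_out_numbers(text: str) -> str:
    """Replace every run of digits up to 99 with its Dutch spelling."""
    def replace(match):
        value = int(match.group())
        return number_to_dutch(value) if value <= 99 else match.group()
    return re.sub(r"\d+", replace, text)


spoken = spell_out_numbers("6 uur 46")
print(spoken)
```

With the GetTime example above, “6 uur 46” becomes “zes uur zesenveertig”, which the voice can pronounce.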