Which TTS is better for the Dutch language?

synesthesiam · October 9, 2020, 5:26pm

Updated the samples again with the vocoder about 50% done: https://drive.google.com/drive/folders/1TXDr_ZeRks3fXDQmlIChlwFcDNK5ki4X

Sounds much better!

fastjack · October 9, 2020, 5:46pm

Awesome! Even at 50% the voice sounds great! Can’t wait for 100%

Would it be possible to do this for French also? Can we do this ourselves (recordings+training)?

synesthesiam · October 9, 2020, 5:55pm

Definitely! The first step will be adding support for French to gruut. I expect this to be done by next week or so.

The recordings can be done with anything as long as you have WAV files and transcripts. With a little work, gruut will help select a small set of sentences from a large corpus (books, Wikipedia, etc.) that are maximally useful.
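As a rough illustration of what "maximally useful" could mean, here is a minimal greedy-coverage sketch. It is an assumption, not gruut's actual method: it uses character n-grams as a stand-in for phonemes, and `select_sentences` is a hypothetical helper name.

```python
def select_sentences(sentences, max_sentences, n=2):
    """Greedily pick sentences that add the most not-yet-covered n-grams.

    Character n-grams are only a stand-in here; a real pipeline would
    most likely count phonemes or phoneme pairs instead.
    """
    def units(sentence):
        text = sentence.lower()
        return {text[i:i + n] for i in range(len(text) - n + 1)}

    remaining = [(s, units(s)) for s in sentences]
    covered = set()
    chosen = []
    while remaining and len(chosen) < max_sentences:
        # Pick the sentence with the largest number of new units.
        best = max(remaining, key=lambda item: len(item[1] - covered))
        if not best[1] - covered:
            break  # nothing left adds new coverage
        chosen.append(best[0])
        covered |= best[1]
        remaining.remove(best)
    return chosen
```

The loop stops early once no remaining sentence contributes anything new, so the result can be shorter than `max_sentences`.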

Training just needs a CUDA-enabled GPU that’s supported by PyTorch >= 1.5. I’m using a GTX 1060 6GB.
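A quick way to check a machine against that requirement might look like the sketch below. `check_training_env` is a hypothetical helper; it only relies on `torch.__version__` and `torch.cuda.is_available()`, and it reports a missing PyTorch install instead of failing.

```python
def check_training_env(min_version=(1, 5)):
    """Report whether PyTorch is installed, recent enough, and sees a CUDA GPU."""
    try:
        import torch
    except ImportError:
        return {"torch_installed": False, "version_ok": False, "cuda_available": False}

    # torch.__version__ can look like "1.7.0+cu101"; keep major and minor only.
    major, minor = torch.__version__.split("+")[0].split(".")[:2]
    return {
        "torch_installed": True,
        "version_ok": (int(major), int(minor)) >= tuple(min_version),
        "cuda_available": torch.cuda.is_available(),
    }


if __name__ == "__main__":
    print(check_training_env())
```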

hugocoolens · October 9, 2020, 7:05pm

Updated the samples again with the vocoder about 50% done: https://drive.google.com/drive/folders/1TXDr_ZeRks3fXDQmlIChlwFcDNK5ki4X

4 of them are OK, but in the sentence “moeder sneed zeven scheve sneden brood”, “brood” is mispronounced.

kind regards,
Hugo

synesthesiam · October 13, 2020, 5:36pm

Something cool I realized I could do this morning: https://drive.google.com/file/d/1PQneOLgfIyHqACsNj4x3FT1_Ypnuy6w0/view?usp=sharing

This is the rdh Dutch voice speaking an (accented) English sentence! Because I use IPA for both English and Dutch phonemes, I created a small mapping file that approximates the 14 or so English phonemes that are “missing” from Dutch. I had to guess on some, but it works as a proof of concept.

So this means we could re-use some of the voices for other languages, until we get native speakers in that language. To get it right, though, I will need help from people who speak both languages. For example, I don’t really know how Dutch folks pronounce the “th” sounds from “thing” and “the”.

hugocoolens · October 14, 2020, 2:18pm

I don’t really know how Dutch folks pronounce the “th” sounds from “thing” and “the”

These sounds don’t exist in Dutch.

kind regards,
Hugo

synesthesiam · October 14, 2020, 2:24pm

I substituted them with “s” and “z” sounds as a guess.

What’s neat is that you could approximate a very specific accent with this approach.
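To make the idea concrete, here is a minimal sketch of such a substitution step. The table names and `map_phonemes` are hypothetical, and the real mapping file covers more phonemes than the two "th" sounds shown; a stop-based table ("t" and "d") is included only as another plausible accent.

```python
# Hypothetical substitution tables: English IPA phoneme -> Dutch approximation.
TH_AS_SIBILANT = {"θ": "s", "ð": "z"}  # the guess described above
TH_AS_STOP = {"θ": "t", "ð": "d"}      # an alternative accent


def map_phonemes(phonemes, substitutions):
    """Replace phonemes missing from the target voice; keep the rest as-is."""
    return [substitutions.get(phoneme, phoneme) for phoneme in phonemes]


thing = ["θ", "ɪ", "ŋ"]
print(map_phonemes(thing, TH_AS_SIBILANT))  # ['s', 'ɪ', 'ŋ']
print(map_phonemes(thing, TH_AS_STOP))      # ['t', 'ɪ', 'ŋ']
```

Swapping the table is all it takes to change the accent, which is why a per-language-pair mapping file is a convenient place to collect input from bilingual speakers.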

hugocoolens · October 14, 2020, 2:57pm

Dutch people starting to learn English often mispronounce the “th” of “thing” as a plain “t” and the “th” of “the” as a “d”.

kind regards,
Hugo
p.s. started trimming the phrases, still 1300 to go…

synesthesiam · October 14, 2020, 5:39pm

Much appreciated! I’ll get your model in the queue as soon as I get the files!

synesthesiam · October 15, 2020, 8:36pm

The first Dutch voice for Larynx is ready!

Special thanks to @rdh for donating his voice!

It’s easiest to check out with Docker:

$ docker run -it -p 5002:5002 \
     --device /dev/snd:/dev/snd \
     rhasspy/larynx:nl-rdh-1

Then visit http://localhost:5002 to test it out.

You can use this from Rhasspy or Home Assistant by changing the run command slightly:

$ docker run -it -p 59125:5002 rhasspy/larynx:nl-rdh-1

Now you can tell Rhasspy/HomeAssistant that you’re using a MaryTTS server and point it to http://localhost:59125 (usually the default).

tipofthesowrd · October 16, 2020, 8:21am

Loving it so far! Enormous thanks to @synesthesiam & @rdh

Just a few remarks; I don’t know whether these are typical issues that need manual fine-tuning.

Numbers written as digits don’t come out well: “6 uur 46” vs. “zes uur zesenveertig”.
Example: GetTime intent

Some small things, like “OK”, are not pronounced correctly. However, if you input “okee”, the pronunciation is just fine.

koan · October 16, 2020, 1:22pm

For things like numbers and dates you should probably preprocess your text with something like Lingua Franca to convert them to words that are pronounceable by the TTS.
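If pulling in a library is not an option, a tiny preprocessing step can already handle the time example above. This is a self-contained sketch that only covers 0 to 99 and leaves larger numbers untouched; it does not use Lingua Franca, and `number_to_dutch` and `spell_out_numbers` are hypothetical names.

```python
import re

# Dutch words for 0-19; larger values are composed from these and TENS.
SMALL = [
    "nul", "een", "twee", "drie", "vier", "vijf", "zes", "zeven", "acht",
    "negen", "tien", "elf", "twaalf", "dertien", "veertien", "vijftien",
    "zestien", "zeventien", "achttien", "negentien",
]
TENS = {
    2: "twintig", 3: "dertig", 4: "veertig", 5: "vijftig",
    6: "zestig", 7: "zeventig", 8: "tachtig", 9: "negentig",
}


def number_to_dutch(n):
    """Spell out an integer from 0 to 99 in Dutch."""
    if not 0 <= n <= 99:
        raise ValueError("only 0-99 is supported in this sketch")
    if n < 20:
        return SMALL[n]
    tens, unit = divmod(n, 10)
    if unit == 0:
        return TENS[tens]
    # Dutch puts the unit first: 46 -> "zes" + "en" + "veertig".
    # After "twee" and "drie" the joiner is written "ën".
    joiner = "ën" if SMALL[unit].endswith("e") else "en"
    return SMALL[unit] + joiner + TENS[tens]


def spell_out_numbers(text):
    """Replace digit runs up to 99 with Dutch words; leave the rest alone."""
    def replace(match):
        value = int(match.group(0))
        return number_to_dutch(value) if value <= 99 else match.group(0)

    return re.sub(r"\d+", replace, text)


print(spell_out_numbers("6 uur 46"))  # zes uur zesenveertig
```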

synesthesiam · October 16, 2020, 3:19pm

There are some undocumented features I’m still experimenting with, but I agree that in general a separate library should be used. Some of the features that are in there but disabled for now:

Currency recognition
- “$100.12” (sort of works now)
Number types
- “1_ordinal” becomes “first” in English
- “1902_year” becomes “nineteen oh two” in English
Alternative pronunciations
- “read_1” and “read_2” are pronounced like “red” and “reed” respectively
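For readers wondering how such suffixes might be picked apart, here is a minimal parsing sketch. It is not the actual implementation: the tag set is limited to the examples listed above, and `parse_tagged_token` is a hypothetical name.

```python
import re

# Tags taken from the examples above; numeric tags select a pronunciation.
KNOWN_TAGS = {"ordinal", "year"}
TAG_PATTERN = re.compile(r"^(?P<text>.+)_(?P<tag>[a-z]+|\d+)$")


def parse_tagged_token(token):
    """Split "1902_year" into ("1902", "year"); return (token, None) if untagged."""
    match = TAG_PATTERN.match(token)
    if match is None:
        return token, None
    tag = match.group("tag")
    if not tag.isdigit() and tag not in KNOWN_TAGS:
        return token, None  # e.g. "snake_case" is not a tagged token
    return match.group("text"), tag


for token in ["1_ordinal", "1902_year", "read_2", "hello"]:
    print(parse_tagged_token(token))
```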

I also have the ability to list abbreviations for a language that are automatically expanded. I’ve got a list for English, like mr -> mister, but I don’t know any for Dutch.
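A per-language abbreviation list could be applied roughly like this. The sketch is an assumption about the mechanism, not the actual code; the Dutch entries (dhr, mevr, bijv) are only illustrative suggestions, and `expand_abbreviations` is a hypothetical name.

```python
import re

# Illustrative tables; only the English "mr" entry comes from this thread.
ABBREVIATIONS = {
    "en": {"mr": "mister"},
    "nl": {"dhr": "de heer", "mevr": "mevrouw", "bijv": "bijvoorbeeld"},
}


def expand_abbreviations(text, language):
    """Expand known abbreviations, dropping the trailing period if present."""
    table = ABBREVIATIONS.get(language, {})

    def replace(match):
        word = match.group(1)
        # Unknown words are returned unchanged, including any period.
        return table.get(word.lower(), match.group(0))

    return re.sub(r"\b(\w+)\b\.?", replace, text)


print(expand_abbreviations("Dhr. Jansen belt, bijv. morgen.", "nl"))
# de heer Jansen belt, bijvoorbeeld morgen.
```

One limitation of this sketch: an abbreviation at the very end of a sentence also loses the sentence-final period.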

tipofthesowrd · October 17, 2020, 6:00am

Happy to help and expand on those lists.
If I remember correctly, Mycroft has something like a collaborative system on their website.
(translate.mycroft.ai)

I suppose we could do something similar with just a github directory per language and documentation on what is needed for completing a language.

hugocoolens · October 17, 2020, 10:08am

How should I cope with the following issue?
In the trimming phase of the program, I sometimes hear a phrase pronounced in a way that I don’t find 100% OK, but as I don’t know whether I recorded it twice, I accept it anyway. Then I notice there is indeed a better version. How can I find and delete the first version (without going through all the phrases once more)?

synesthesiam · October 17, 2020, 3:44pm

The WAV files are all named <id>_<timestamp>.wav where id is from the prompts file. So just open that up and search for the text. Then, take the id and find all WAV files that start with it.
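That lookup can be scripted. The sketch below assumes the prompts file has one tab-separated `id<TAB>text` pair per line, which is a guess about the format; `find_takes` is a hypothetical helper.

```python
from pathlib import Path


def find_takes(prompts_path, wav_dir, search_text):
    """Map each prompt id whose text contains search_text to its WAV files."""
    matches = {}
    with open(prompts_path, encoding="utf-8") as prompts:
        for line in prompts:
            line = line.rstrip("\n")
            if not line:
                continue
            # Assumed layout: "<id>\t<text>".
            prompt_id, _, text = line.partition("\t")
            if search_text.lower() in text.lower():
                # Recordings are named <id>_<timestamp>.wav.
                takes = sorted(Path(wav_dir).glob(f"{prompt_id}_*.wav"))
                matches[prompt_id] = takes
    return matches
```

Calling `find_takes("prompts.tsv", "wav", "brood")` (hypothetical file and directory names) would list every take of each matching sentence, so the unwanted one can be deleted by hand. Sorting by name only reflects recording order if the timestamps sort lexically.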

geoffrey · October 17, 2020, 8:13pm

@synesthesiam No words, thank you upfront for all the relentless effort, again

I tried to set it up on an Intel NUC that acts as a Rhasspy server running Ubuntu 18.10. The web server loads perfectly, but when I try any word, e.g. oog (= eye), I get the following message:

Error: Failed to fetch

Different browsers give a similar error message.

After that, the Docker container just crashes and nothing specific is shown in the output. It seems to go wrong when it tries to invoke the API at http://<ip>:59125/api/tts?text=oog&phonemes=false

Trying to invoke it directly using Postman gives the same behavior.

Let me know if I can help you figure this out and I’m happy to assist.

synesthesiam · October 18, 2020, 9:26pm

You’re welcome

Hmmm… this seems to work for me. Maybe I’m doing something different? This is on an x86_64 laptop:

$ docker run -it -p 59125:5002 rhasspy/larynx:nl-rdh-1

and then in a separate tab:

$ curl -X GET --output /tmp/test.wav 'http://localhost:59125/api/tts?text=oog'

When I aplay /tmp/test.wav it plays just fine.
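The same request can be made from Python, which makes it easier to see whether the server returned audio at all. This is a minimal sketch using only the standard library; the `/api/tts?text=` endpoint is the one from the curl command, while the helper names and the 30-second timeout are assumptions.

```python
import io
import urllib.parse
import urllib.request
import wave


def build_tts_url(base_url, text):
    """Build the /api/tts URL with the text properly URL-encoded."""
    query = urllib.parse.urlencode({"text": text})
    return f"{base_url.rstrip('/')}/api/tts?{query}"


def fetch_wav(base_url, text, timeout=30):
    """Request synthesized speech; raises if the server drops the connection."""
    url = build_tts_url(base_url, text)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def wav_duration_seconds(wav_bytes):
    """Parse the bytes as WAV and return the audio length in seconds."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav_file:
        return wav_file.getnframes() / wav_file.getframerate()


if __name__ == "__main__":
    audio = fetch_wav("http://localhost:59125", "oog")
    print(f"received {len(audio)} bytes, {wav_duration_seconds(audio):.2f} s")
```

If the container dies mid-request, `fetch_wav` should raise an exception rather than return data, which helps separate a server crash from a playback problem.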

In case it makes a difference:

$ docker --version
Docker version 18.09.3, build 774a1f4

Thoughts?

synesthesiam · October 21, 2020, 8:43pm

For those who want to use this in Hass.io, there is an add-on available now! It even works on the Raspberry Pi (super slow, but has a cache).

geoffrey · October 24, 2020, 6:46pm

I wish

Doing the exact same thing as you gives the same result: the larynx container stops without a message. This is the output from curl:

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
curl: (52) Empty reply from server

Docker for me is a version higher at Docker version 19.03.6, build 369ce74a3c. I also tested it on a brand new alpine VM and that has the same behavior.

Not sure what to test, but if somebody has a clue, I’m happy to test things along.