Which TTS is better for the Dutch language?

Anything good at either of these sites?

I have been looking for books written in contemporary Dutch. Here is what I found:

https://www.uitgeverijdefontein.nl/wp-content/uploads/2020/06/Lou-in-lockdown-Moyes-met-omslag.pdf
https://www.smashwords.com/books/download/617082/1/latest/0/0/spel-der-grenzen.pdf
https://www.smashwords.com/books/download/773824/1/latest/0/0/olifantentranen.pdf
https://www.smashwords.com/books/download/998326/1/latest/0/0/zusje.pdf
https://www.smashwords.com/books/download/691744/1/latest/0/0/ik-blijf-bij-je.pdf
https://www.chicklit.nl/static/files/chicklit/Jonkman_Pizzageur_Maneschijn.pdf
https://www.smashwords.com/books/download/505505/1/latest/0/0/passie-uit-het-verleden.pdf
https://www.smashwords.com/books/download/748347/1/latest/0/0/veroordeeld.pdf
https://www.dbnl.org/tekst/vos_049hima01_01/vos_049hima01_01.pdf


What’s the license of these books? Are we allowed to use them?


I think we should ask for trustworthy legal advice concerning this matter
kind regards,
hugo


Or maybe look for documents written by people who are willing to let them be used for this purpose?

I’ve wondered: do you think any audiobook company would be willing to donate a contemporary book into the public domain for the Rhasspy project and others? Seems like it would look good for public relations.

New interested user here!!!

I’m looking for a Dutch TTS myself. As mentioned, espeak is really awful. I found these two references. It seems somebody is already working on a Dutch / Flemish TTS via Mozilla.


Maybe we should try to get in contact with the rdh user?

edit: I just posted in the Mozilla forum and included links to this topic. Let’s hope this results in a useful collaboration.


Great, thank you! I downloaded the German corpus linked from that discussion already. I know we already have plenty of German voices, but it’ll be good to have on hand.

If anything, maybe that user will have an idea of where to get good phrases.

I read about the layoffs at Mozilla and the possible impact on the TTS / STT projects. That would be a major roadblock. I can understand your reluctance to rely on Mozilla going forward, since the future of the project is unknown at this time.

Maybe the following could be a good set of Dutch phrases?
https://nats.gitlab.io/swc/

Spoken Wikipedia in Dutch. An 8 GB archive is available for download.
Could this be used to train a TTS?

I’m willing to give this a go if I can get some guidance.
I’m no programming expert, but I know my way around Python / C# / C++ and Docker / Linux.


I sent a request to https://www.uitgeverijdefontein.nl
I’ll keep you informed

kind regards,
hugo


Let’s hope we get some support from the publisher. It would be nice.
I did find the following project, which has some public domain works in Dutch.

http://librivox.nl/

The Dutch text is available as well (via Gutenberg or via the librivox site).

Thank you, @hugocoolens, this is a big help. I’ll be digging through the Dutch Oscar corpus a bit today and seeing if there are good sentences there too.

Librivox is awesome, but looking at the publication dates of the Dutch books, I was worried about the use of old words and spellings (as @koan mentioned earlier).

Linguistically, this is an interesting problem. Copyright lasts so long that there is language drift by the time it’s in the public domain.

What about the Mozilla Common Voice project in Dutch? It has about 42 hours of Dutch audio.

Or does it need to be a limited number of voices?

I looked at TensorFlowTTS today. I got it to install just fine, but I’m having a little trouble with the lack of documentation. I’m currently looking into setting up a basic English model to build a better understanding.

For text to speech, it’s best to have a single voice in a quiet, professional environment. And it’s also important that the spoken phrases be phonetically balanced, so there are enough examples to cover corner cases.
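To make "phonetically balanced" a bit more concrete, here is a minimal sketch of one way to pick such a set: greedily choose the sentence that adds the most phonemes not yet covered. This is only an illustration and not necessarily how any Rhasspy sentence list is built. The `sentence_phonemes` mapping and the toy phoneme symbols are assumptions; in practice the phonemes would come from a pronunciation lexicon or a grapheme-to-phoneme tool.

```python
def select_balanced(sentence_phonemes, target_coverage=1.0):
    """Greedily pick sentences until the target share of phonemes is covered.

    sentence_phonemes: dict mapping each sentence to the set of phonemes
    it contains (hypothetical input, produced elsewhere).
    Returns (chosen sentences, achieved coverage between 0 and 1).
    """
    all_phonemes = set().union(*sentence_phonemes.values())
    if not all_phonemes:
        return [], 0.0

    covered = set()
    chosen = []
    remaining = dict(sentence_phonemes)

    while remaining and len(covered) < target_coverage * len(all_phonemes):
        # Pick the sentence that contributes the most new phonemes.
        best = max(remaining, key=lambda s: len(remaining[s] - covered))
        gain = remaining.pop(best) - covered
        if not gain:
            break  # nothing left adds new phonemes
        covered |= gain
        chosen.append(best)

    return chosen, len(covered) / len(all_phonemes)


# Toy example with made-up phoneme sets.
toy = {
    "de kat": {"d", "@", "k", "A", "t"},
    "de": {"d", "@"},
    "kat": {"k", "A", "t"},
    "zee": {"z", "e"},
}
chosen, coverage = select_balanced(toy)
print(chosen, coverage)
```

A real selection would probably also look at phoneme pairs and how often each one occurs, not just whether a phoneme appears at least once.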

I plan to use the Common Voice Dutch corpus to train my own Kaldi model. I’m seeing if I can get ahold of the CGN (Corpus Gesproken Nederlands) as well. Seems like there was some work on a DeepSpeech model using CGN.

I did find the following project, which has some public domain works in Dutch.

http://librivox.nl/

These are old texts in a kind of old-fashioned Dutch. I wouldn’t use them.

kind regards,
Hugo

I sent a request to https://www.uitgeverijdefontein.nl

Unfortunately, I haven’t received any reaction yet. I have, however, written another request to the publisher chicklit. This time I also mentioned that it would be good for their public relations.

You can find the e-mail I wrote below. If anyone has suggestions for improvement, or just wants to use it for other publishing companies, feel free to do so.

Dear Sir/Madam,

On the forum of the open source project Rhasspy (an open source, fully offline set of voice assistant services for many human languages), an initiative has recently started to improve the conversion of phonemes to Dutch. For that reason, we are considering using a selection of sentences taken from freely downloadable e-books. This selection of sentences would then be read aloud by a native speaker, and the result would be used to improve the conversion of text to speech (TTS = text to speech).

We would like to know whether you would permit the use of the freely downloadable e-books on your website for this purpose, because under no circumstances do we want to be confronted with legal copyright issues later on. We also think that cooperating could benefit you as far as public relations are concerned.

Kind regards,

Hugo Coolens

You can find a link to the Rhasspy project here:

https://rhasspy.readthedocs.io/en/latest/

and the discussion I referred to above here:

Which TTS is better for the Dutch language?


I also sent a similar request to “de Digitale bibliotheek voor de Nederlandse letteren”

kind regards,
hugo


I noticed I originally wrote phonemen instead of fonemen…

Thanks for doing this, @hugocoolens! Let me know if you hear anything back.

@koan, @hugocoolens: I’ve just cleaned up the voice recorder. Can you (youse, y’all) verify that it still runs on your Macs?

Dutch Sentences

I also finally have my first try at phonetically balanced Dutch sentences. These were taken from a subset of the Oscar corpus. Can you both take a look and see if they’re any good? Do they say anything bad or have weird words that nobody would know how to pronounce?

Some details on the sentences: there are 1186 sentences that (should) have about 94% phoneme coverage. I restricted sentences to be between 3 and 15 words, and threw out anything with a URL, e-mail address, or number.
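For anyone who wants to reproduce that kind of filtering, here is a rough sketch of the rules described above (3 to 15 words, no URL, no e-mail address, no number). It is not the actual script used for the list; the regular expressions and the `keep_sentence` name are assumptions made for illustration.

```python
import re

# Simple patterns, assumed for illustration only.
URL_RE = re.compile(r"(https?://|www\.)\S+", re.IGNORECASE)
EMAIL_RE = re.compile(r"\S+@\S+\.\S+")
DIGIT_RE = re.compile(r"\d")


def keep_sentence(sentence, min_words=3, max_words=15):
    """Return True if the sentence passes the length and content filters."""
    words = sentence.split()
    if not (min_words <= len(words) <= max_words):
        return False
    if URL_RE.search(sentence):
        return False
    if EMAIL_RE.search(sentence):
        return False
    if DIGIT_RE.search(sentence):
        return False
    return True


candidates = [
    "De kat zit op de mat.",
    "Ja hoor.",
    "Kijk op www.example.nl voor meer informatie.",
    "Stuur een mail naar info@example.nl alsjeblieft.",
    "Er zijn 42 uur aan opnames.",
]
kept = [s for s in candidates if keep_sentence(s)]
print(kept)
```

Note that a URL pattern this simple only catches addresses starting with `http://`, `https://`, or `www.`, so a bare domain name would slip through.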

What about English words in the sentences? Funky, seminar, …

I also saw some URLs in the list.

And there’s an encoding issue with some of the accents and special characters.

I’ll take a closer look tomorrow.
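In case it helps with the accent problem: a common cause is UTF-8 text that was read as Latin-1 somewhere along the way, which turns "é" into two garbage characters. I don't know whether that is what happened here, so treat the following as a sketch under that assumption. The `repair_mojibake` name is made up for this example.

```python
def repair_mojibake(text):
    """Undo UTF-8-read-as-Latin-1 damage if the text looks affected.

    If the text cannot be round-tripped this way, it is returned unchanged.
    """
    try:
        return text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Either the text has characters outside Latin-1, or the bytes are
        # not valid UTF-8, so it was probably fine to begin with.
        return text


# Simulate the damage: encode correctly, then decode with the wrong codec.
broken = "één".encode("utf-8").decode("latin-1")
fixed = repair_mojibake(broken)
print(fixed)
```

Correctly encoded Dutch text such as "één" passes through untouched, because its Latin-1 bytes are not valid UTF-8. Other kinds of damage (for example text misread as Windows-1252) need a more thorough tool, such as the ftfy library.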
