Which TTS is better for the Dutch language?

Thanks for doing this, @hugocoolens! Let me know if you hear anything back.

@koan, @hugocoolens: I’ve just cleaned up the voice recorder. Can you (youse, y’all) verify that it still runs on your Macs?

Dutch Sentences

I also finally have my first try at phonetically balanced Dutch sentences. These were taken from a subset of the Oscar corpus. Can you both take a look and see if they’re any good? Do they say anything bad or have weird words that nobody would know how to pronounce?

Some details on the sentences: there are 1186 sentences that (should) have about 94% phoneme coverage. I restricted sentences to be between 3 and 15 words, and threw out anything with a URL, e-mail address, or number.
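
A minimal Python sketch of filters like the ones described: 3 to 15 words, and no URL, e-mail address, or number. The regular expressions are assumptions, not the ones actually used for the Oscar subset. The URL pattern also tries to catch bare “www.” and “.com”-style addresses, and its list of top-level domains is only a guess.

```python
import re

# Hypothetical patterns; the real filter configuration may differ.
URL_RE = re.compile(r"https?://|www\.|\.(?:com|nl|be|org|net)\b", re.IGNORECASE)
EMAIL_RE = re.compile(r"\S+@\S+\.\w+")
DIGIT_RE = re.compile(r"\d")


def keep_sentence(sentence: str, min_words: int = 3, max_words: int = 15) -> bool:
    """Return True if the sentence passes the length, URL, e-mail and number filters."""
    words = sentence.split()
    if not (min_words <= len(words) <= max_words):
        return False
    if URL_RE.search(sentence) or EMAIL_RE.search(sentence):
        return False
    # Any digit disqualifies the sentence, since numbers have several readings.
    if DIGIT_RE.search(sentence):
        return False
    return True
```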

What about English words in the sentences? Funky, seminar, …

I also saw some URLs in the list.

And there’s an encoding issue with some of the accents and special characters.

I’ll take a closer look tomorrow.


Any suggestions on an automated way to distinguish English words here? I could restrict everything to the N most frequent Dutch words (according to the Dutch language model I have). Or exclude words that show up as frequent English words…
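
One possible heuristic along those lines, sketched in Python: flag a word if it is unknown or very rare in a Dutch frequency list, or if it is much more frequent in English than in Dutch. The frequency dictionaries (occurrences per million tokens), the thresholds, and the function name are all hypothetical. Shared everyday words should pass the ratio test as long as their frequencies are comparable in both languages.

```python
import re

# Runs of letters (any script), optionally joined by hyphens, e.g. "wc-eend".
WORD_RE = re.compile(r"[^\W\d_]+(?:-[^\W\d_]+)*")


def suspicious_words(sentence, dutch_freq, english_freq, min_dutch=1.0, ratio=5.0):
    """Flag words that look non-Dutch.

    dutch_freq / english_freq are hypothetical dicts mapping lowercase words to
    frequency per million tokens.
    """
    flagged = []
    for word in WORD_RE.findall(sentence.lower()):
        nl = dutch_freq.get(word, 0.0)
        en = english_freq.get(word, 0.0)
        if nl < min_dutch:
            flagged.append(word)  # unknown or very rare in Dutch
        elif en > ratio * nl:
            flagged.append(word)  # much more common in English than in Dutch
    return flagged
```

Running the same check against a French frequency list could catch words like “enfin” or “chapeau”.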

Grrrrr…I filtered out “http” but there are some “www.” and ".com"s in there too!

This seems to be an issue with the Oscar corpus itself. I may need to pipe everything through iconv or something to fix UTF-8 issues. All sentences pass through Python, so they’re at least technically UTF-8. Filtering out non-word characters entirely may also be an option.
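
If the odd characters are double-encoded UTF-8 (UTF-8 bytes that were decoded as Windows-1252 somewhere upstream, producing “Ã©” instead of “é”), they can often be repaired in Python without iconv. This is a minimal sketch under that assumption; the ftfy library's `fix_text` handles many more cases. The second function flags, rather than silently strips, sentences with characters outside a small allowed set, which is an illustrative choice.

```python
import re


def fix_mojibake(text: str) -> str:
    """Undo one round of UTF-8 bytes mis-decoded as Windows-1252 (e.g. 'Ã©' -> 'é').

    If re-encoding or decoding fails, the text was probably not double-encoded,
    so it is returned unchanged.
    """
    try:
        return text.encode("cp1252").decode("utf-8")
    except UnicodeError:
        return text


# Allowed: letters in any script (including è, é, ë, ï), whitespace and a few
# punctuation marks. Anything else (digits, odd quotes, symbols) flags the sentence.
ALLOWED_CHAR = re.compile(r"[^\W\d_]|[\s.,!?'’-]")


def has_odd_characters(text: str) -> bool:
    return any(ALLOWED_CHAR.fullmatch(ch) is None for ch in text)
```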

I think this is better reviewed manually.

Can you put this file in a repository? I think we need some iterations to get this into a usable state. I can go through the list then, delete all sentences that are less appropriate, you can add new sentences to keep the phoneme coverage high enough, and I can go through the diff again.


OK, got it here: https://github.com/rhasspy/tts-prompts

Ok, I commented out 257 sentences for various reasons. I don’t know if commenting out or deleting is easiest for you to process it further, but I noticed you commented out some sentences with URLs, so I did the same.

The encoding was OK this time.

By the way, does this “94% phoneme coverage” mean that 6% of the Dutch phonemes will not be spoken by using this set of sentences?


There is some misunderstanding here. I said that someone in my household has a Mac with recording software and a professional mic, but I only tested the old version under Linux (Mint 19.2). I also tried out the new version under Mint 19.2, and it works fine. I used a Jabra headset for the tests.

kind regards,
Hugo


I read Koan’s corrected version. I think he did a great job. I noticed some typos, and I would like to add a few sentences and reject some others to be consistent with the criteria he used.

I noticed a small typo in sentence 118:
( nl_rhasspy_118 “ik was altijd maar moe, maar dat kwam door mijn ogen. „“” )
should be:
( nl_rhasspy_118 “ik was altijd maar moe, maar dat kwam door mijn ogen.” )
Sentence 170 is OK if the speaker does not confuse hè with hé, which are pronounced differently in Dutch.
( nl_rhasspy_170 “Hè, hè, eindelijk goed geslapen.” )
I would leave out sentence 272 because of the French word “chapeau”
( nl_rhasspy_272 “Chapeau dus voor moed en volharding.” )
I would leave out sentence 337 because of the French word “enfin”
( nl_rhasspy_337 “Enfin, waar was ik gebleven.” )
I would leave out sentence 407 because of the French word “enfin”
( nl_rhasspy_407 “Enfin, praten heet dat dan gewoon.” )
Sentence 455 is acceptable for me
; ( nl_rhasspy_455 “Tijdens onze lessen staat de hele dag fruit, water, thee en koffie op tafel.” )
There is a typo in sentence 506:
( nl_rhasspy_506 “De aanhouder wint!)” ) should be: ( nl_rhasspy_506 “De aanhouder wint!” )
I would leave out 537 because of the word “team”
( nl_rhasspy_537 “Hoe hoog komt het team?” )
I would leave out 601 because of the word “enfin”
( nl_rhasspy_601 “Enfin, even zeven, en er kan gedronken worden.” )
I would leave out 606 because of the French word “enfin”
( nl_rhasspy_606 “Enfin, we zien wel.” )
I would leave out 673 because of the word “compromis”
( nl_rhasspy_673 “Een compromis dan maar op zolder.” )
I would leave out 713 because of the word “branche”
( nl_rhasspy_713 “Wat doet deze branche eigenlijk?” )
I would leave out 725 because of the word “chauffeur”
( nl_rhasspy_725 “Aangekomen bij het hotel geef ik de chauffeur aan om even te wachten.” )
I would leave out 780 because of the word “weekend”
( nl_rhasspy_780 “Dit weekend is geen gewoon weekend.” )
Sentence 843 is acceptable for me
; ( nl_rhasspy_843 “Die mensen is het moeilijk uit te leggen.” )
I would leave out 855 because of the word “interview”
( nl_rhasspy_855 “Ook niet als dat interview over Europa zou gaan.” )
There is a typo in 856
( nl_rhasspy_856 “Ik weet niet of ik dat wel wou weten ;) Te laat!” ) should be:
( nl_rhasspy_856 “Ik weet niet of ik dat wel wou weten ; Te laat!” )
There is a typo in 870
( nl_rhasspy_870 “Wij van wc-eend adviseren wc-eend.’” ) should be:
( nl_rhasspy_870 “Wij van wc-eend adviseren wc-eend.” )
I would leave out 875 because of the word “populair”
( nl_rhasspy_875 “Die man is schijnbaar reuze populair, maar niemand van ons kent hem.” )
I would leave out 879 and 880 because of the word “cadeau”
( nl_rhasspy_879 “Dit is een puur cadeau aan jezelf!” )
( nl_rhasspy_880 “Een aanrader dus, een groot cadeau voor jezelf.” )
I would leave out 889 because of the word “meubilair”
( nl_rhasspy_889 “Al het bruin houten meubilair van die tijd.” )
I would leave out 893 because of the word “team”
( nl_rhasspy_893 “Niets kan op tegen een sterk team.” )
I would leave out 923 because of the word “ketchup”
( nl_rhasspy_923 “Tomatenketchup komt soms moeilijk uit de fles.” )
I would leave out 924 because of the word “gechargeerd”
( nl_rhasspy_924 “Wat gechargeerd, maar het scheelt niet veel.” )
I agree with Koan to leave out 934, but he accidentally entered a colon instead of a semicolon:
: ( nl_rhasspy_934 “Het is dé manier om voortdurend de big picture in het oog te houden.” )
I would leave out 990 because of the word “enthousiast”
( nl_rhasspy_990 “Moe maar heel enthousiast kwamen ze terug.” )
I would leave out 1013 because of the word “enfin”
( nl_rhasspy_1013 “Enfin, gaan we wel weer vroeg slapen.” )
I would leave out 1032 because of the word “shirt”
( nl_rhasspy_1032 “Jongen, donkerblauw shirt, rode tekst.” )
Sentence 1035 is acceptable for me:
; ( nl_rhasspy_1035 “Ik vond het echt een top tent.” )
I would leave out 1042 because of the word “enfin”
( nl_rhasspy_1042 “Enfin, dat maakte nu niet meer uit.” )
I would leave out 1050 because of the word “champagne”
( nl_rhasspy_1050 “Ik drink graag champagne.” )
I would leave out 1112 because of the word “enfin”
( nl_rhasspy_1112 “Enfin, waag uw kans!” )
I would leave out 1115 because of the words “college” and “enthousiast”
( nl_rhasspy_1115 “Ook het college werd enthousiast.” )

p.s.1 To avoid any misconceptions: I leave out sentences with words of foreign origin not because those words are unacceptable or unused in contemporary Dutch, but because they could reduce the quality of the phoneme-text conversion for Dutch.

p.s.2 As those words of foreign origin don’t follow the normal rules of Dutch, maybe they could be used for refinements later?

Thank you! Commenting out is preferred since I will use those lines to create a “block list” on the next run.
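
Since commented-out lines are kept, the file can be split into approved prompts and a block list mechanically. A minimal Python sketch, assuming each prompt sits on its own line in the `( id "text" )` form shown in this thread (with straight double quotes in the actual file) and that a leading “;” marks it as rejected. Free-form comment lines, such as rejection reasons, are skipped. The function name is hypothetical.

```python
import re

# Assumed line format: ( nl_rhasspy_118 "sentence text" ), optionally prefixed by ';'
PROMPT_RE = re.compile(r'^\s*(;)?\s*\(\s*(\S+)\s+"(.*)"\s*\)\s*$')


def split_prompts(lines):
    """Return (approved, blocked) dicts mapping prompt id -> sentence text."""
    approved, blocked = {}, {}
    for line in lines:
        m = PROMPT_RE.match(line)
        if m is None:
            continue  # not a prompt line, e.g. a comment giving a reason
        commented, prompt_id, text = m.groups()
        (blocked if commented else approved)[prompt_id] = text
    return approved, blocked
```

A line with a stray “:” instead of “;” (as with sentence 934 above) matches neither case in this sketch, so it is silently skipped rather than blocked.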

Based on @hugocoolens’s feedback, I’m thinking I’ll need a separate block list for “foreign” words. I may have to get more sophisticated, using word likelihoods or excluding “optional” phonemes that are usually only used for foreign words (for example, supposedly /g/ as in “goal” is not native to Dutch).
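
A sketch of the phoneme-based idea: derive the “foreign” block list from the lexicon by collecting every word whose pronunciation uses a phoneme from a hand-picked set of optional phonemes, then reject sentences containing any of those words. The phoneme symbols, the example lexicon entries, the choice of /g/ and /ʃ/ as optional, and the function names are assumptions for illustration.

```python
def foreign_block_list(lexicon, optional_phonemes):
    """Return words whose pronunciation contains any 'optional' (loanword) phoneme.

    lexicon maps a word to its pronunciation as a list of phoneme symbols.
    """
    return sorted(
        word
        for word, phonemes in lexicon.items()
        if any(p in optional_phonemes for p in phonemes)
    )


def is_blocked(sentence, block_list):
    """True if any word of the sentence (lowercased, punctuation stripped) is blocked."""
    blocked = set(block_list)
    return any(w.strip(".,!?;:\"'()").lower() in blocked for w in sentence.split())
```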

No, this refers to phoneme pair coverage. This gets a little tricky, and I hope I’m doing it right. It’s common to use phone pairs (diphones) in a TTS dataset. Most phonemes are just one phone, but I also capture diphthongs, which are pairs of vowels that behave as one unit/phoneme (Dutch has at least three).

So I’m computing coverage based on phoneme pairs, but not all possible ones. Humans can’t produce all possible pairs, and I don’t know how to tell which ones belong in a language, so I’m just counting all pairs in the lexicon as “possible”. Coverage is then relative to all lexicon pronunciations. With hundreds of thousands of words, I’m hoping this will be sufficient.
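
In code, coverage as described might look like the following minimal Python sketch. It assumes pronunciations are lists of phoneme symbols with diphthongs already merged into single symbols, treats every adjacent pair found in the lexicon as “possible”, and counts a sentence's pairs only if they are in that possible set. Function names and the toy lexicon are illustrative, not taken from the actual tool.

```python
def phoneme_pairs(pron):
    """Adjacent phoneme pairs in one pronunciation (list of phoneme symbols)."""
    return set(zip(pron, pron[1:]))


def possible_pairs(lexicon):
    """Every pair that occurs in at least one lexicon pronunciation."""
    possible = set()
    for pron in lexicon.values():
        possible |= phoneme_pairs(pron)
    return possible


def pair_coverage(sentence_prons, possible):
    """Return (fraction of possible pairs covered, set of missing pairs)."""
    covered = set()
    for pron in sentence_prons:
        covered |= phoneme_pairs(pron) & possible
    return len(covered) / len(possible), possible - covered
```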

Thank you! I’ll incorporate these later tonight or tomorrow, and then re-run the optimization.

Some may be necessary to get coverage of “optional” phonemes (like /g/ I mentioned above). I’m trying to stay somewhat close to the phonemes from our existing Kaldi Dutch model because it’s been shown to work well.


I’m not sure this is grammatically valid: I think this should be “staan”. Because of this doubt, I commented out the sentence. People reading these sentences shouldn’t have to think about it or be annoyed by it, because this influences their performance. That’s why I removed so many sentences: the slightest doubt would detract from the quality of their work.

Grammatically correct, but I would say “Aan die mensen”, so this sentence would annoy me if I had to read it :)

I don’t think this is a typo: this is an emoticon, ;). But maybe people will hesitate when they have to “speak” an emoticon, so I’d probably replace this by a full stop or an ellipsis.

I found this too “Northern Dutch”. And it should be “toptent” anyway.

I don’t agree with all the sentences @hugocoolens leaves out, but I agree with his reasoning that the quality of the phoneme-text conversion should be the main factor here. I was guided by the same principle for my deletions. So I think we should just drop any sentence that at least one of us finds unacceptable. I suppose the corpus is big enough to find replacement sentences.


OK Koan, that makes sense

kind regards,
Hugo


Almost ready with an updated prompt list. I took @hugocoolens’s input into consideration and created a list of blocked words. We can add to this list and re-run the optimization.

I’ve added more sentences to the process (now about 13 million), and I’m finding that forcing the system to keep all of the approved sentences is less efficient. I can reduce the needed set from ~1200 to ~900 sentences by allowing it to pick anything, which seems like a big win for our voice talents (~300 fewer sentences to record). It does make more work up front for you guys, though. Thoughts?
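
The thread doesn't say which optimization is used; a common choice for this kind of selection is greedy set cover, sketched below as an assumption. It can optionally start from locked (approved) sentences, which illustrates one plausible reason forcing them in costs extra sentences: they were not chosen for coverage, so they may add few new pairs while still counting toward the total. Names are hypothetical, and scanning millions of candidates every round would need a faster (for example lazy-greedy) variant in practice.

```python
def greedy_select(candidates, possible, locked=(), target=1.0):
    """Greedy set cover over phoneme pairs.

    candidates: dict mapping sentence id -> set of phoneme pairs in that sentence.
    locked: sentence ids that must be kept regardless of their coverage.
    target: fraction of possible pairs to reach before stopping.
    """
    selected = list(locked)
    covered = set()
    for sid in selected:
        covered |= candidates[sid] & possible
    locked_ids = set(selected)
    pool = {sid: pairs & possible for sid, pairs in candidates.items() if sid not in locked_ids}
    goal = target * len(possible)
    while len(covered) < goal and pool:
        # Pick the sentence that adds the most not-yet-covered pairs.
        best = max(pool, key=lambda sid: len(pool[sid] - covered))
        gain = pool.pop(best) - covered
        if not gain:
            break  # no remaining sentence adds anything new
        selected.append(best)
        covered |= gain
    return selected, covered
```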

OK, new prompts are here (930 instead of 1143).

I’ve included some coverage details too, including a printout of all the missing phoneme pairs (like x c), with example words that contain each pair (like aagtje). If we could meaningfully slip a few of these example words into the existing prompts, we could boost the coverage (currently about 93%).
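
A report like this (each missing pair with a few example words) can be produced by scanning the lexicon for words that contain the pair. A minimal sketch with a hypothetical function name; examples are picked alphabetically here, whereas preferring short, frequent words would probably make them easier to slip into prompts.

```python
def missing_pair_examples(missing, lexicon, max_examples=3):
    """Map each missing phoneme pair to up to max_examples lexicon words containing it."""
    examples = {pair: [] for pair in missing}
    for word, pron in sorted(lexicon.items()):
        for pair in set(zip(pron, pron[1:])):
            if pair in examples and len(examples[pair]) < max_examples:
                examples[pair].append(word)
    return examples
```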

The config.ini describes exactly how sentences were filtered, i.e., which regular expressions/words excluded sentences from consideration. Anything commented out in prompt_filter.txt was also excluded.

Do you want me to go through them and give feedback like I did before?

kind regards,
Hugo

@hugocoolens I’ve started from the beginning. Maybe you can start from the end until 450?

OK, should I also give the reasons why as I did last time?

kind regards,
Hugo

Yes, I’m also adding the reasons as comments in the list now; that will probably be helpful to tune the filters.

OK, I’ll do my part today

kind regards,
Hugo

I have gone through the first 450 sentences: https://github.com/rhasspy/tts-prompts/commit/eae987b64172ac0934b8cc60f2103b3cb305fa9f

The quality has improved somewhat, I “only” commented out 19% of the sentences. I explained my reasons in the comments.

I’ll take a look at Hugo’s comments too when he’s done with the second half.

However, if the next iteration will again be a completely new set, I’m not sure we’re going anywhere with this. Each time, a considerable part of the prompts still has to be removed because the sentences are inappropriate, incomplete, or weird, or contain words that are difficult to pronounce.

Is there a way to adapt the generation of the prompts so it takes into account a margin of 20-25% of prompts that may be deleted? Then, after weeding out the roughly 20% less usable sentences, we could check which phoneme pairs are still missing and manually add words containing them to these sentences.
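
One way to build in such a margin, sketched here as an assumption rather than what the tool does: instead of stopping once every pair is covered once, keep selecting greedily until every reachable pair is covered at least k times. A pair then has a better chance of surviving the deletion of some of its sentences, at the cost of more sentences to record; the pairs still missing after review can be recomputed from the remaining sentences. The function name and the choice of k are illustrative.

```python
from collections import Counter


def select_with_margin(candidates, possible, k=2):
    """Greedy selection until every reachable phoneme pair is covered at least k times.

    candidates: dict mapping sentence id -> set of phoneme pairs in that sentence.
    Returns (selected ids, Counter of how often each pair is covered).
    """
    counts = Counter()
    selected = []
    pool = {sid: pairs & possible for sid, pairs in candidates.items()}

    def gain(sid):
        # Number of this sentence's pairs that are still below k occurrences.
        return sum(1 for p in pool[sid] if counts[p] < k)

    while pool:
        best = max(pool, key=gain)
        if gain(best) == 0:
            break  # every reachable pair already has k occurrences
        for p in pool.pop(best):
            counts[p] += 1
        selected.append(best)
    return selected, counts
```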

Thank you! I see you offered sentence modifications. I’ll put the modified sentences back into the next iteration.

Going forward, I can lock the approved prompts in and only vary the commented out ones. This should let us converge on something soon.

Thank you, I appreciate you two taking the time to do this.