Faulty pronunciation of the combination sp when using Kaldi/nl

I noticed that the letter combination “sp” is wrongly pronounced as “st” when using Kaldi/nl, i.e. for the Dutch language. In my opinion this is a bug.

kind regards,
hugo

Can you give an example sentence?

I’m not sure I understand this correctly. Kaldi does speech recognition, so it shouldn’t pronounce words, right?

Examples of mispronounced words:
spieken --> stieken
spelen --> stelen
rhasspy --> rasti

This is how I proceeded:
I went to the page http://rhasspy.local:12101/words
I entered the word “spieken” in the Guess Word box
The resulting guess was:
spik@n (which is correct)
However, when I ask Rhasspy to pronounce this using the Pronounce box, I clearly hear: stieken

But since you mentioned that “Kaldi does speech recognition”, I deduce that it’s actually not Kaldi that makes the mistake, but the part of Rhasspy that translates the phonemes to sound. I guess that’s “espeak”?

To investigate whether buggy behavior in Espeak causes the mispronunciation of “sp”, I tried to change Text to Speech in the Rhasspy settings from Espeak to PicoTTS, NanoTTS, Flite, MaryTTS, OpenTTS,… but each time, after I restarted Rhasspy, the system returned to the original choice of Espeak. Do I have to change my installation so I can effectively try out these other TTS engines? If so, how? FYI: I’m running Rhasspy in a docker installation.

kind regards,
Hugo
p.s. The faulty pronunciation was confirmed by user No_one in this thread:
https://community.rhasspy.org/u/No_one

I noticed I first had to save the new settings and then restart. Maybe it would be better to warn the user when they have changed something but not saved it, because the restart will then happen without the changes.

kind regards,
Hugo

Not always espeak’s fault, but maybe mine. There’s a mapping file in each profile that maps the speech system’s phonemes (Kaldi in this case) to espeak’s phonemes.

I don’t speak any language but English, so I wrote a program to guess. It runs all words in the dictionary through espeak, tries to align phonemes, and then just goes off which alignments have the highest count.
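To make the "highest count" idea concrete, here is a minimal sketch of how such a guess could work. It is not the actual program: the function name `guess_phoneme_map`, the toy word pairs, and the naive position-by-position alignment (only used when both pronunciations have the same number of phonemes) are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def guess_phoneme_map(pairs):
    """Guess a Kaldi -> espeak phoneme map from pronunciation pairs.

    pairs: iterable of (kaldi_phonemes, espeak_phonemes), each a list of strings.
    """
    counts = defaultdict(Counter)
    for kaldi, espeak in pairs:
        # Assumption: naive alignment, only usable when both sides
        # have the same number of phonemes. Other words are skipped.
        if len(kaldi) != len(espeak):
            continue
        for k, e in zip(kaldi, espeak):
            counts[k][e] += 1
    # For each Kaldi phoneme, keep the espeak phoneme seen most often.
    return {k: c.most_common(1)[0][0] for k, c in counts.items()}


# Hypothetical toy data, not real dictionary or espeak output.
pairs = [
    (["s", "p", "i", "k", "@", "n"], ["s", "p", "i", "k", "@", "n"]),
    (["s", "p", "e", "l", "@", "n"], ["s", "p", "e:", "l", "@", "n"]),
    (["s", "t", "e", "l", "@", "n"], ["s", "t", "e:", "l", "@", "n"]),
]
phoneme_map = guess_phoneme_map(pairs)
```

A guess like this is only as good as the alignment, so a phoneme that is rare or often misaligned could end up mapped to the wrong espeak symbol.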

Looking up “spieken” in the base dictionary for the Kaldi NL profile, I see the pronunciation “s p i k @ n”. Going down the phoneme map, this should become “spik@n” for espeak as you said.
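As a rough sketch of that lookup step (the helper `to_espeak` and the identity map below are hypothetical, not the contents of the real mapping file), each space-separated Kaldi phoneme is replaced by its mapped espeak phoneme and the results are joined:

```python
def to_espeak(kaldi_pron, phoneme_map):
    """Convert a space-separated Kaldi pronunciation to an espeak phoneme string."""
    phonemes = kaldi_pron.split()
    # Phonemes missing from the map are passed through unchanged (assumption).
    return "".join(phoneme_map.get(p, p) for p in phonemes)


# Hypothetical map: these phonemes are assumed to map to themselves.
phoneme_map = {"s": "s", "p": "p", "i": "i", "k": "k", "@": "@", "n": "n"}

espeak_pron = to_espeak("s p i k @ n", phoneme_map)
# Double square brackets tell espeak to read the text as phonemes.
espeak_arg = f"[[{espeak_pron}]]"
```

Note that a map like this carries no stress information, so the result has no stress mark unless one is added separately.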

Now, let’s compare this to what espeak believes:

$ espeak -v nl -q -x 'spieken'
 sp'ik@n

So Rhasspy’s guess seems to be missing the accent on the i. Does running this sound right to you?

$ espeak -v nl "[[sp'ik@n]]"

I hear the “p” sound in either case, but I don’t know how the word is really supposed to sound :confused:

I find the pronunciation not very clear. I hear “spieken” (but that’s probably my brain tricking me because I know what word this is), but my girlfriend (who didn’t know what word it was supposed to be) hears “stieken”.

With or without the accent, it makes no difference to me: I still hear stieken.

kind regards,
hugo