Multilanguage Pronunciation Guesser

Evening,

I was trying to get my Rhasspy to switch the input from various game consoles to either my TV or my PC. While doing that I stumbled over a problem that ended with me trying to teach Rhasspy the pronunciation of “playstation”. None of the guesses were even close, and I personally can’t even guess at how to fix it. The guide that lists phonemes with example words is in English, and I would need to guess how those words are pronounced before even trying.

My problem in this case was the “station” part, which is pronounced differently in German.

Click here for sample

Now, there are quite a few regularly used words that keep their English pronunciation, and I don’t think German is the only language that does this. English is also not the only language a pronunciation might be needed from. So I thought it could be possible to select the pronunciation language in a dropdown menu on the words page, have it download the pronunciation table for the selected language if needed, and then look up words in different languages. Or maybe just host a web service that can do that, which would remove the need to download the pronunciation table for a whole language if only one word needs to be looked up.

Any input on whether this idea is possible/needed would be nice.

This has always done my head in (a little bit like the word itself).
As far as I know, phonemes are supposed to be the same sound no matter what language you use, but in practice they are just a very rough guide.
You would probably find millions of variations of the word “station” across all native English speakers, so distilling it down to a phoneme list that actually works is pretty amazing in itself.
It has frustrated me for as long as I have been coding (since 1992) that the computer industry is dominated by American spelling, which is only down to Webster deciding to simplify the spelling.
So “analogue” became “analog” and “colour” became “color”.
Many Americans pronounce “epic” and “epoch” the same.

What I am really getting at here, after having my little rant, is that, as much as it pains me to say, I doubt this would even work if you implemented it.

I think this will be the case for quite a while yet. For now it is really just a situation of building specific workarounds as you find them.

Here’s an example I had to use.

When I want Rhasspy to handle a number range (working in French), my sentences.ini needs to have ( (une:1)|(2…23) ) (heure | heures) to work.

You could try (play stay shun):playstation :slight_smile:

This is something I actually have a working solution for, but getting it into Rhasspy is going to take a lot of work!

My gruut project transforms text into phonemes coded in the International Phonetic Alphabet (IPA). Each language has its own phoneme “inventory”, though there is obviously more overlap between languages with shared histories.

I recently added inline pronunciations to gruut, allowing word pronunciations to be specified in terms of other words and even parts of other words. This isn’t a naive implementation either; in English, gruut will produce different vowels for the “o” in the words “to” and “so”:

gruut en-us text2phonemes --inline '{{ t{o} }}' | jq -r .pronunciation_text
ˈu

gruut en-us text2phonemes --inline '{{ s{o} }}' | jq -r .pronunciation_text
ˈoʊ

For Rhasspy, there are two complications: user interface and mapping between different speech to text models. I personally like the IPA Chart interface, and can imagine having something similar with only the phonemes from the current language’s inventory visible.

Mapping between speech to text models is tedious, but not impossible. I “just” have to go through each model’s phonemes (usually some form of SAMPA) and create a mapping over to IPA. Then, Rhasspy users can just use IPA everywhere :slight_smile:
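To give a rough idea of what such a mapping could look like, here is a minimal sketch. The table is hand-written for illustration and is not the real phoneme inventory of any Rhasspy model; the name `to_ipa` and the convention that a leading `'` marks stress are assumptions made for this example only.

```python
# Illustrative table only: NOT the actual phoneme set of any speech to text model.
# The E -> ɛ, I -> ɪ and S -> ʃ pairs follow the usual SAMPA conventions.
SAMPA_TO_IPA = {
    "p": "p",
    "l": "l",
    "t": "t",
    "n": "n",
    "E": "ɛ",
    "I": "ɪ",
    "S": "ʃ",
}

# Assumption: the model writes primary stress as a leading apostrophe.
STRESS_MARKS = {"'": "ˈ"}


def to_ipa(pronunciation):
    """Convert a space-separated model pronunciation into a list of IPA phonemes."""
    ipa = []
    for symbol in pronunciation.split():
        stress = ""
        if symbol[0] in STRESS_MARKS:
            stress = STRESS_MARKS[symbol[0]]
            symbol = symbol[1:]
        if symbol not in SAMPA_TO_IPA:
            # Fail loudly so gaps in the table are noticed rather than guessed at.
            raise KeyError(f"no IPA mapping for phoneme {symbol!r}")
        ipa.append(stress + SAMPA_TO_IPA[symbol])
    return ipa


print(" ".join(to_ipa("p l 'E I")))
```

One table like this per model would be needed, and the reverse direction (IPA back to the model's symbols) would have to deal with IPA phonemes the model simply does not have.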

Or maybe you are just looking for this:
http://localhost:12101/api/phonemes

I know about that list; it is what I meant in my original post when I said the examples are only given in English. I read through it again when not half asleep and noticed that only some of the examples are English words, but it is still less than ideal to have to guess how the example word is pronounced. It also seems to be missing some common German sounds, like “sch”.

There are already such lists in use in Rhasspy (I knew I had read about them somewhere here), and there is already a list of phonemes used to guess words on the words page. All I am asking for is a way to select which of those language lists is used to guess the pronunciation of a word. Many languages have words imported from English, and having Rhasspy try to guess an English word with German pronunciation never turns out well. If I could tell it to guess this word in English instead of German, that would help. And if such a system were to exist, it would be a shame not to be able to download/activate it for every language one might need to guess a word for, because other languages bleed words over too (German has a bit of French, and Dutch is a mix of English, German and French, at least the part I can understand by knowing those languages).
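Roughly, what I have in mind could be sketched like this. Everything here is hypothetical: the `LEXICONS` tables stand in for the per-language pronunciation tables, `lookup` is a made-up helper, and the pronunciations are written in IPA just for readability, not in the phoneme set any Rhasspy profile actually uses.

```python
# Hypothetical per-language pronunciation tables. Real ones would come from
# each language's dictionary and be far larger.
LEXICONS = {
    "de": {"station": "ʃtaˈtsi̯oːn"},
    "en": {"station": "ˈsteɪʃən", "play": "pleɪ"},
}


def lookup(word, preferred, fallbacks=()):
    """Return (language, pronunciation) from the first table that knows the word.

    The preferred language is tried first, then each fallback in order.
    Returns None if no table contains the word.
    """
    for lang in (preferred, *fallbacks):
        pronunciation = LEXICONS.get(lang, {}).get(word.lower())
        if pronunciation is not None:
            return lang, pronunciation
    return None


# Explicitly ask for the English pronunciation, as a dropdown would.
print(lookup("station", "en"))
# Try German first and fall back to English for imported words.
print(lookup("play", "de", fallbacks=("en",)))
```

This only picks which language's pronunciation to use; the result would still have to be expressed in phonemes that the active profile knows.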

Sorry I didn’t mean to offend by suggesting the list.

Honestly, I only just found it myself and thought it might help you or others in your situation.

Being a native English speaker but using Rhasspy in French, I often have the same problem. But that is just my bad French.

TL;DR:
We would like to know how to use specific foreign words in a language profile in Rhasspy.

I think I understand @Daenara’s issue, as I have been struggling with this for a long time now. A couple of concrete examples: using a German profile, with most words being German, it is not possible to use English words at all, and the IPA translations definitely do not help. For example:

“Play” does not work in a German profile, and the guessed phonemes don’t do the trick (guess = “p l 'E I”). According to Google, the IPA of “play” in English is [ˈpleɪ], which does not compile for a German profile.

The same goes for words like “Shield”, “Pause”, “Android”, “Nvidia” … and many more.
It would be great to figure this out somehow.

If you play around on this page, Play - Pronunciation: HD Slow Audio + Phonetic Transcription, you will see that it is not possible to generate a transcription for “play” in German.
I hope the issue is somewhat clear now.