g2p.fst versus model.crf

mbarnig · June 27, 2021, 1:34pm

The initial version of rhasspy/gruut relied on rhasspy/phonetisaurus-pypi to train a g2p model for a new language. I used it to create the g2p.fst model for the Luxembourgish language, based on a dictionary with the following format:

....
avl	ɑ v l
aachtchen	aː χ t ɕ ə n
aachtche	aː χ t ɕ ə
aachtercher	aː χ t ɐ ɕ ɐ
aachtdeeler	aː χ t d eː l ɐ
aachteck	aː χ t æ k
....

The latest gruut version uses python-crfsuite to train a CRF model for a new language via the gruut/g2p.py script. As far as I understand, the required format for the pronunciation dictionary is now as follows (example for English):

....
a}ˈeɪ '|s}z
a}ˈɑ a}_ b}b e|r}ɚ g}ɡ
a}ˈɑ a}_ c}k h}_ e}ə n}n
a}ˈɑ a}_ c}k h}_ e}ə n}n e|r}ɚ
a}ˈɑ a}_ h}_
a}ˈɑ a}_ k}k e|r}ɚ
a}ə a}_ l}l i}ˈi y}ə a}_ h}_
.....

I have read all the available documentation and explored the gruut code, but I am totally lost!
How can I create a dictionary in the required format for a new language?
Is it possible to use the old g2p.fst models in the latest gruut version?
Are there other ways to make progress?

Thank you for your hints and advice.

synesthesiam · June 27, 2021, 2:16pm

Hi @mbarnig, sorry for the confusion!

I temporarily removed the phonetisaurus dependency to make including gruut in Coqui-TTS easier. I plan to add support back soon with an optional flag like pip install gruut[fst].

I didn’t make it clear in the docs that the format of the input file to gruut/g2p.py is actually the output of phonetisaurus. When you run phonetisaurus train, you specify a path with --model for g2p.fst. But you can also add --corpus g2p.corpus to create the “pronunciation dictionary”.

This g2p.corpus file is the result of phonetisaurus’ alignment of a lexicon. You can see that } separates graphemes from phonemes, and that | joins multiple graphemes or phonemes within one aligned unit (_ is the “empty” phoneme). I train my CRF models using this alignment, which reduces the complexity of the training code for me [1]. My CRF models are not as good as the phonetisaurus FSTs, but they work well enough.
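To make the alignment format concrete, here is a minimal Python sketch that splits one g2p.corpus line back into a word and its phonemes. It is not taken from gruut; the function names parse_alignment and to_lexicon_entry are hypothetical, and it assumes that _ only ever appears on the phoneme side, as in the examples in this thread.

```python
from typing import List, Tuple


def parse_alignment(line: str) -> List[Tuple[List[str], List[str]]]:
    """Split one g2p.corpus line into (graphemes, phonemes) pairs.

    Hypothetical helper, not part of gruut or phonetisaurus.
    """
    pairs = []
    for token in line.split():
        # "}" separates the grapheme side from the phoneme side
        graphemes, phonemes = token.split("}", 1)
        pairs.append((
            graphemes.split("|"),  # "|" joins multiple graphemes
            # "|" joins multiple phonemes; "_" is the empty phoneme
            [p for p in phonemes.split("|") if p != "_"],
        ))
    return pairs


def to_lexicon_entry(line: str) -> Tuple[str, List[str]]:
    """Rebuild the original word and phoneme list from an aligned line."""
    pairs = parse_alignment(line)
    word = "".join(g for graphemes, _ in pairs for g in graphemes)
    phonemes = [p for _, phones in pairs for p in phones]
    return word, phonemes


word, phonemes = to_lexicon_entry("a}ˈɑ a}_ c}k h}_ e}ə n}n e|r}ɚ")
print(word, " ".join(phonemes))
```

Running the same function over a whole corpus file and comparing the result with the original lexicon is a quick way to check that the alignment step did not drop any entries.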

I’ll let you know when I get phonetisaurus support back into gruut!

[1] Plus you can do really cool things with g2p.corpus, like have users specify pronunciations using segments of known words

mbarnig · June 27, 2021, 6:08pm

Thank you very much for the quick answer. This is great.
It was my fault. I didn’t see that phonetisaurus-pypi creates the required corpus:

$ phonetisaurus train --help
usage: phonetisaurus train [-h] [--corpus CORPUS] [--lexicon-word-separator LEXICON_WORD_SEPARATOR] [--lexicon-phoneme-separator LEXICON_PHONEME_SEPARATOR] --model MODEL
                           [--casing {lower,upper,ignore}] [--debug] [--machine {x86_64,armv6l,armv7l,armv8}]
                           lexicon [lexicon ...]

positional arguments:
  lexicon               Path(s) to read one or more phonetic dictionaries

optional arguments:
  -h, --help            show this help message and exit
  --corpus CORPUS       Path to write trained g2p corpus
  --lexicon-word-separator LEXICON_WORD_SEPARATOR
                        Separator regex between words in each lexicon entry (default: \s+)
  --lexicon-phoneme-separator LEXICON_PHONEME_SEPARATOR
                        Separator regex between phonemes in each lexicon entry (default: \s+)
  --model MODEL         Path to g2p model
  --casing {lower,upper,ignore}
                        Case transformation to apply to words
  --debug               Print DEBUG messages to the console
  --machine {x86_64,armv6l,armv7l,armv8}
                        Override detected platform machine type

Now my Luxembourgish corpus is ready (lb-corpus.dict: 14.1 MB):

....
a}ɑ v}v l}l
a|a}aː c|h}χ t}t c|h}ɕ e}ə n}n
a|a}aː c|h}χ t}t c|h}ɕ e}ə
a|a}aː c|h}χ t}t e|r}ɐ c|h}ɕ e|r}ɐ
a|a}aː c|h}χ t}t d}d e|e}eː l}l e|r}ɐ
a|a}aː c|h}χ t}t e}æ c|k}k
....

The next step is training the CRF model, which takes some time. While waiting for the result I visited the voice2json.org website.

Finally, the model-lb.crf file is saved: 474.1 KB.

Time to check if it’s working. Let’s predict the first sentence of the fable De Nordwand an d’Sonn.

python3 gruut/g2p.py predict --model /home/mbarnig/myTTS-Project/g2p/model-lb.crf  --debug "An der Zäit hunn sech den Nordwand an d’Sonn gestridden, wie vun hinnen zwee wuel méi staark wier, wéi e Wanderer, deen an ee waarme Mantel agepak war, iwwert de Wee koum."

>>>

ɑ n d ə ʀ e ts æːɪ t h u n z ə ɕ d ə n ɑ̃ː n ɔ ʀ t v aː n d aː n d z o n g ə ʃ t ʀ i d ə n yː v iə v u n t h i n ə n ts w eː v uə l m ɜɪ ə ʃ t aː ɐ k s v iː ɐ ɲ v ɜɪ aː ʀ v ɑ n d ə ʀ ə ʀ ɑ̃ː d eː n ɑ n eː v aː ʀ m eː m ɑ n t ə l ɑ g ə p aː k v aː ʀ d i v ɐ t d eː v eː k əʊ m

Let’s compare with the phonemes guessed by the old model g2p-lb.fst:

phonetisaurus predict --model /home/mbarnig/myTTS-Project/g2p/g2p-lb.fst "an der zäit hunn sech den nordwand an d’sonn gestridden, wie vun hinnen zwee wuel méi staark wier, wéi e wanderer, deen an ee waarme mantel agepak war, iwwert de wee koum."

>>>

 ɑ n d ɐ ts æːɪ t h u n z ə ɕ d æ n ɔ ʀ d v ɑ n d ɑ n t z o n g ə ʃ t ʀ i d æ n v iə f u n h i n ə n ts w eː v uə l m ɜɪ ʃ t aː ʀ k v iə v ɜɪ ə v ɑ n d ə ʀ ɐ d eː n ɑ n eː v aː ʀ m ə m ɑ n t ə l aː ʁ ə p aː k v ɑ ʀ i v ɐ t d ə v eː k əʊ m

Here is the official Luxembourgish phonemization:

ɑn dɐ ‚ʦæ:ɪt / hun zeɕ dən ’noʀtvɑnt ɑn ‚dzon gə’ʃtʀidən / viə fun hinən ‚ʦve: vuəl ‚meɪ ʃta:ʀk viɐ / veɪ eː ‚vɑndəʀɐ / de:n ɑn eː ‚va:ʀmə ‚mɑntəl ‚a:ɡəpa:k va:ʀ / ivɐt də ‚veː kəʊm

Not quite the same, but I think both models will do a good job in combination with the pronunciation dictionary.
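To put a number on "not quite the same", one option is an edit distance over the space-separated phonemes of two outputs. The sketch below is only an illustration: the function names are hypothetical, and it assumes both strings use the same phoneme inventory and spacing. The official transcription above would first need normalizing (stress marks, the / separators, and ʦ versus ts) before such a comparison is meaningful.

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two phoneme sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(
                prev[j] + 1,             # deletion
                cur[j - 1] + 1,          # insertion
                prev[j - 1] + (r != h),  # substitution, or match at no cost
            ))
        prev = cur
    return prev[-1]


def phoneme_error_rate(ref: str, hyp: str) -> float:
    """Edit distance divided by the number of reference phonemes."""
    ref_phonemes, hyp_phonemes = ref.split(), hyp.split()
    return edit_distance(ref_phonemes, hyp_phonemes) / max(len(ref_phonemes), 1)


# "an der Zäit" as predicted by the FST model and by the CRF model
fst_output = "ɑ n d ɐ ts æːɪ t"
crf_output = "ɑ n d ə ʀ e ts æːɪ t"
print(phoneme_error_rate(fst_output, crf_output))
```

Which output serves as the reference is a free choice here; the FST output is used only because it was the earlier model.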

After a final check of my whole configuration I will soon start the first training round with my small Luxembourgish dataset and the CRF model.