g2p.fst versus model.crf

The initial version of rhasspy/gruut relied on rhasspy/phonetisaurus-pypi to train a g2p model for a new language. I used it to create the g2p.fst model for Luxembourgish, based on a dictionary in the following format:

avl	ɑ v l
aachtchen	aː χ t ɕ ə n
aachtche	aː χ t ɕ ə
aachtercher	aː χ t ɐ ɕ ɐ
aachtdeeler	aː χ t d eː l ɐ
aachteck	aː χ t æ k
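For reference, each line of this lexicon is a word, whitespace (a tab here), then space-separated phonemes. A minimal Python sketch for loading such a file, assuming UTF-8 text and whitespace separators (the phonetisaurus help output later in this thread gives \s+ as the default for both separators). The function name load_lexicon is hypothetical.

```python
import re
from typing import Dict, Iterable, List

# Separator regexes mirroring the phonetisaurus defaults (\s+).
WORD_SEP = re.compile(r"\s+")
PHONEME_SEP = re.compile(r"\s+")


def load_lexicon(lines: Iterable[str]) -> Dict[str, List[List[str]]]:
    """Map each word to its pronunciations (a word may appear several times)."""
    lexicon: Dict[str, List[List[str]]] = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Split off the word; everything after the first separator is phonemes.
        parts = WORD_SEP.split(line, maxsplit=1)
        if len(parts) != 2:
            continue  # skip entries without a pronunciation
        word, phoneme_str = parts
        phonemes = PHONEME_SEP.split(phoneme_str.strip())
        lexicon.setdefault(word, []).append(phonemes)
    return lexicon


sample = "avl\tɑ v l\naachtchen\taː χ t ɕ ə n\naachteck\taː χ t æ k"
lexicon = load_lexicon(sample.splitlines())
print(lexicon["aachtchen"])  # [['aː', 'χ', 't', 'ɕ', 'ə', 'n']]
```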

The latest gruut version uses python-crfsuite to train a CRF model for a new language with the gruut/g2p.py script. As I understand it, the pronunciation dictionary must now be in the following format (English example):

a}ˈeɪ '|s}z
a}ˈɑ a}_ b}b e|r}ɚ g}ɡ
a}ˈɑ a}_ c}k h}_ e}ə n}n
a}ˈɑ a}_ c}k h}_ e}ə n}n e|r}ɚ
a}ˈɑ a}_ h}_
a}ˈɑ a}_ k}k e|r}ɚ
a}ə a}_ l}l i}ˈi y}ə a}_ h}_

I have read all the available documentation and explored the gruut code, but I am totally lost!
How can I create a dictionary for a new language in the required format?
Is it possible to use the old g2p.fst models with the latest gruut version?
Are there other ways to make progress?

Thank you for your hints and advice.


Hi @mbarnig, sorry for the confusion!

I temporarily removed the phonetisaurus dependency to make including gruut in Coqui-TTS easier. I plan to add support back soon with an optional flag like pip install gruut[fst]

I didn’t make it clear in the docs that the format of the input file to gruut/g2p.py is actually the output of phonetisaurus. When you run phonetisaurus train, you specify a path with --model for g2p.fst. But you can also add --corpus g2p.corpus to create the “pronunciation dictionary”.

This g2p.corpus file is the result of phonetisaurus’ alignment of a lexicon. You can see that } separates graphemes and phonemes, and that | indicates multiple graphemes/phonemes (_ is the “empty” phoneme). I train my CRF models using this alignment, which reduces the complexity of the training code for me [1]. My CRF models are not as good as the phonetisaurus FSTs, but they work well enough :slight_smile:
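A short Python sketch of reading one g2p.corpus line under the conventions just described: } between the grapheme and phoneme sides, | joining multiple symbols, _ for the empty phoneme. The helper names are hypothetical; this is not gruut's own parsing code.

```python
from typing import List, Tuple

Alignment = List[Tuple[List[str], List[str]]]


def parse_corpus_line(line: str) -> Alignment:
    """Split a corpus line into (graphemes, phonemes) chunks."""
    chunks: Alignment = []
    for token in line.split():
        # Each token is "graphemes}phonemes"; '|' joins multiple symbols.
        graphemes_str, phonemes_str = token.split("}", maxsplit=1)
        graphemes = graphemes_str.split("|")
        # '_' is the empty phoneme: these graphemes are silent.
        phonemes = [] if phonemes_str == "_" else phonemes_str.split("|")
        chunks.append((graphemes, phonemes))
    return chunks


def to_lexicon_entry(chunks: Alignment) -> Tuple[str, List[str]]:
    """Rebuild the plain word and its phoneme sequence from an alignment."""
    word = "".join(g for graphemes, _ in chunks for g in graphemes)
    phonemes = [p for _, chunk_phonemes in chunks for p in chunk_phonemes]
    return word, phonemes


word, phonemes = to_lexicon_entry(parse_corpus_line("a}ˈɑ a}_ c}k h}_ e}ə n}n"))
print(word, phonemes)  # aachen ['ˈɑ', 'k', 'ə', 'n']
```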

I’ll let you know when I get phonetisaurus support back into gruut!

[1] Plus you can do really cool things with g2p.corpus, like letting users specify pronunciations using segments of known words.

Thank you very much for the quick answer. This is great.
It was my fault. I didn’t see that phonetisaurus-pypi creates the required corpus:

$ phonetisaurus train --help
usage: phonetisaurus train [-h] [--corpus CORPUS] [--lexicon-word-separator LEXICON_WORD_SEPARATOR] [--lexicon-phoneme-separator LEXICON_PHONEME_SEPARATOR] --model MODEL
                           [--casing {lower,upper,ignore}] [--debug] [--machine {x86_64,armv6l,armv7l,armv8}]
                           lexicon [lexicon ...]

positional arguments:
  lexicon               Path(s) to read one or more phonetic dictionaries

optional arguments:
  -h, --help            show this help message and exit
  --corpus CORPUS       Path to write trained g2p corpus
  --lexicon-word-separator LEXICON_WORD_SEPARATOR
                        Separator regex between words in each lexicon entry (default: \s+)
  --lexicon-phoneme-separator LEXICON_PHONEME_SEPARATOR
                        Separator regex between phonemes in each lexicon entry (default: \s+)
  --model MODEL         Path to g2p model
  --casing {lower,upper,ignore}
                        Case transformation to apply to words
  --debug               Print DEBUG messages to the console
  --machine {x86_64,armv6l,armv7l,armv8}
                        Override detected platform machine type
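To script this step, a Python sketch that assembles the phonetisaurus train command using only the options listed in the help above (--model, --corpus, --casing and the positional lexicon paths). It assumes phonetisaurus-pypi is installed with phonetisaurus on the PATH; the helper names and the lexicon file name lb.dict are hypothetical.

```python
import subprocess
from typing import List, Optional, Sequence

CASINGS = ("lower", "upper", "ignore")  # choices listed by --help


def build_train_command(
    lexicons: Sequence[str],
    model: str,
    corpus: Optional[str] = None,
    casing: Optional[str] = None,
) -> List[str]:
    """Assemble a 'phonetisaurus train' argument list from the options in --help."""
    if not lexicons:
        raise ValueError("at least one lexicon path is required")
    cmd = ["phonetisaurus", "train", "--model", model]
    if corpus is not None:
        # Aligned corpus: the file that gruut/g2p.py trains on.
        cmd += ["--corpus", corpus]
    if casing is not None:
        if casing not in CASINGS:
            raise ValueError(f"casing must be one of {CASINGS}")
        cmd += ["--casing", casing]
    cmd += list(lexicons)  # positional lexicon path(s) at the end
    return cmd


def run_training(cmd: List[str]) -> None:
    """Run the command; raises CalledProcessError if phonetisaurus exits with an error."""
    subprocess.run(cmd, check=True)


cmd = build_train_command(["lb.dict"], model="g2p-lb.fst", corpus="lb-corpus.dict")
print(" ".join(cmd))
# phonetisaurus train --model g2p-lb.fst --corpus lb-corpus.dict lb.dict
```

Building the argument list separately from running it keeps the command easy to inspect or log before a long training run.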

Now my Luxembourgish corpus is ready (lb-corpus.dict: 14.1 MB):

a}ɑ v}v l}l
a|a}aː c|h}χ t}t c|h}ɕ e}ə n}n
a|a}aː c|h}χ t}t c|h}ɕ e}ə
a|a}aː c|h}χ t}t e|r}ɐ c|h}ɕ e|r}ɐ
a|a}aː c|h}χ t}t d}d e|e}eː l}l e|r}ɐ
a|a}aː c|h}χ t}t e}æ c|k}k
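With a 14.1 MB corpus it can help to look at which grapheme chunks were aligned to which phonemes before training. A Python sketch that counts (grapheme chunk, phoneme chunk) pairs, run here on the six lines above; the helper name mapping_counts is hypothetical, and the _ entries it reports correspond to silent graphemes.

```python
from collections import Counter
from typing import Iterable


def mapping_counts(lines: Iterable[str]) -> Counter:
    """Count how often each grapheme chunk was aligned to each phoneme chunk."""
    counts: Counter = Counter()
    for line in lines:
        for token in line.split():
            graphemes, phonemes = token.split("}", maxsplit=1)
            # Drop the '|' joiner so "c|h" is reported as "ch".
            counts[(graphemes.replace("|", ""), phonemes.replace("|", " "))] += 1
    return counts


corpus = """a}ɑ v}v l}l
a|a}aː c|h}χ t}t c|h}ɕ e}ə n}n
a|a}aː c|h}χ t}t c|h}ɕ e}ə
a|a}aː c|h}χ t}t e|r}ɐ c|h}ɕ e|r}ɐ
a|a}aː c|h}χ t}t d}d e|e}eː l}l e|r}ɐ
a|a}aː c|h}χ t}t e}æ c|k}k"""

counts = mapping_counts(corpus.splitlines())
for (g, p), n in counts.most_common(5):
    print(f"{g} -> {p}: {n}")
```

Rare pairs at the bottom of the full listing are a quick way to spot suspicious alignments in the corpus.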

The next step is training the CRF model. This takes some time; while waiting for the result, I visited the voice2json.org website.
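For context, a CRF labels each grapheme of a word using features of the surrounding characters. The Python sketch below turns one aligned corpus line into a (features, labels) pair of sequences in a shape that python-crfsuite's Trainer.append accepts. The feature set and the labeling of multi-grapheme chunks are assumptions chosen for illustration, not what gruut/g2p.py actually does.

```python
from typing import List, Tuple


def grapheme_features(graphemes: List[str], i: int) -> List[str]:
    """Hypothetical feature set: the grapheme, a +/-2 window and word edges."""
    feats = ["bias"]
    for offset in (-2, -1, 0, 1, 2):
        j = i + offset
        value = graphemes[j] if 0 <= j < len(graphemes) else "<pad>"
        feats.append(f"g[{offset}]={value}")
    if i == 0:
        feats.append("BOW")  # beginning of word
    if i == len(graphemes) - 1:
        feats.append("EOW")  # end of word
    return feats


def corpus_line_to_sequence(line: str) -> Tuple[List[List[str]], List[str]]:
    """Turn one aligned corpus line into per-grapheme features and labels.

    Assumption: the first grapheme of a chunk such as "c|h}χ" carries the
    chunk's phoneme(s); the remaining graphemes of that chunk are labelled "_".
    """
    graphemes: List[str] = []
    labels: List[str] = []
    for token in line.split():
        grapheme_side, phoneme_side = token.split("}", maxsplit=1)
        chunk = grapheme_side.split("|")
        graphemes.extend(chunk)
        labels.append(phoneme_side)  # may itself be "_" (silent grapheme)
        labels.extend("_" for _ in chunk[1:])
    features = [grapheme_features(graphemes, i) for i in range(len(graphemes))]
    return features, labels


xseq, yseq = corpus_line_to_sequence("a|a}aː c|h}χ t}t c|h}ɕ e}ə n}n")
print(yseq)  # ['aː', '_', 'χ', '_', 't', 'ɕ', '_', 'ə', 'n']

# With python-crfsuite installed, sequences like these would go to a trainer:
# import pycrfsuite
# trainer = pycrfsuite.Trainer(verbose=False)
# trainer.append(xseq, yseq)
# trainer.train("model-lb.crf")
```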

Finally, the model-lb.crf file is saved (474.1 KB).

Time to check whether it works. Let’s predict the first sentence of the fable De Nordwand an d’Sonn.

python3 gruut/g2p.py predict --model /home/mbarnig/myTTS-Project/g2p/model-lb.crf  --debug "An der Zäit hunn sech den Nordwand an d’Sonn gestridden, wie vun hinnen zwee wuel méi staark wier, wéi e Wanderer, deen an ee waarme Mantel agepak war, iwwert de Wee koum."


ɑ n d ə ʀ e ts æːɪ t h u n z ə ɕ d ə n ɑ̃ː n ɔ ʀ t v aː n d aː n d z o n g ə ʃ t ʀ i d ə n yː v iə v u n t h i n ə n ts w eː v uə l m ɜɪ ə ʃ t aː ɐ k s v iː ɐ ɲ v ɜɪ aː ʀ v ɑ n d ə ʀ ə ʀ ɑ̃ː d eː n ɑ n eː v aː ʀ m eː m ɑ n t ə l ɑ g ə p aː k v aː ʀ d i v ɐ t d eː v eː k əʊ m

Let’s compare with the phonemes predicted by the old g2p-lb.fst model:

phonetisaurus predict --model /home/mbarnig/myTTS-Project/g2p/g2p-lb.fst "an der zäit hunn sech den nordwand an d’sonn gestridden, wie vun hinnen zwee wuel méi staark wier, wéi e wanderer, deen an ee waarme mantel agepak war, iwwert de wee koum."


 ɑ n d ɐ ts æːɪ t h u n z ə ɕ d æ n ɔ ʀ d v ɑ n d ɑ n t z o n g ə ʃ t ʀ i d æ n v iə f u n h i n ə n ts w eː v uə l m ɜɪ ʃ t aː ʀ k v iə v ɜɪ ə v ɑ n d ə ʀ ɐ d eː n ɑ n eː v aː ʀ m ə m ɑ n t ə l aː ʁ ə p aː k v ɑ ʀ i v ɐ t d ə v eː k əʊ m
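The CRF run above received the sentence with its original capitalization and punctuation, while the FST run received a lowercased copy. To give both models the same input, a small Python sketch that lowercases a sentence and keeps only word tokens (letters, with ' or ’ allowed inside a word). Whether either predictor actually requires punctuation to be removed is an assumption the thread does not show; prepare_words is a hypothetical helper.

```python
import re
from typing import List

# Letters only (Unicode-aware: ä, é, ...), optionally joined by an apostrophe.
WORD_RE = re.compile(r"[^\W\d_]+(?:['’][^\W\d_]+)*")


def prepare_words(sentence: str) -> List[str]:
    """Lowercase a sentence and return its word tokens without punctuation."""
    return WORD_RE.findall(sentence.lower())


print(prepare_words("An der Zäit hunn sech den Nordwand an d’Sonn gestridden,"))
```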

Here is the official Luxembourgish phonemization:

ɑn dɐ ‚ʦæ:ɪt / hun zeɕ dən ’noʀtvɑnt ɑn ‚dzon gə’ʃtʀidən / viə fun hinən ‚ʦve: vuəl ‚meɪ ʃta:ʀk viɐ / veɪ eː ‚vɑndəʀɐ / de:n ɑn eː ‚va:ʀmə ‚mɑntəl ‚a:ɡəpa:k va:ʀ / ivɐt də ‚veː kəʊm 
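The official transcription uses a slightly different notation from the model outputs: ’ and ‚ as stress marks, / between phrases, an ASCII colon for length and the ligature ʦ. A Python sketch that maps these onto the models' conventions so the strings can be compared; the mapping is an assumption inferred from the samples in this thread, not an official rule, and normalize_official is a hypothetical helper.

```python
import re

STRESS_AND_PAUSES = re.compile(r"[’‚ˈˌ/]")  # stress marks and phrase separators


def normalize_official(text: str) -> str:
    """Bring the official notation closer to the model outputs (assumed mapping)."""
    text = text.replace("ʦ", "ts")  # ligature -> two symbols, as the models write it
    text = text.replace(":", "ː")   # ASCII colon -> IPA length mark
    text = STRESS_AND_PAUSES.sub("", text)
    return " ".join(text.split())   # collapse the gaps left by removed separators


official = "ɑn dɐ ‚ʦæ:ɪt / hun zeɕ dən ’noʀtvɑnt"
print(normalize_official(official))
# Model outputs separate phonemes with spaces, so compare without any spaces:
print("".join(normalize_official(official).split()))
```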

Not quite the same, but I think both models will do a good job in combination with the pronunciation dictionary.
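To put a number on "not quite the same", a Python sketch computing the Levenshtein distance over phoneme tokens between two space-separated outputs, plus a phoneme error rate relative to a reference. This is a generic measure, not something gruut or phonetisaurus reports; the example uses the first few phonemes of the CRF and FST outputs above.

```python
from typing import Sequence


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance between two token sequences (unit costs)."""
    prev = list(range(len(b) + 1))
    for i, token_a in enumerate(a, start=1):
        curr = [i]
        for j, token_b in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,                          # deletion
                curr[j - 1] + 1,                      # insertion
                prev[j - 1] + (token_a != token_b),   # substitution or match
            ))
        prev = curr
    return prev[-1]


def phoneme_error_rate(hypothesis: str, reference: str) -> float:
    """Edit distance divided by the number of reference phonemes."""
    ref = reference.split()
    return edit_distance(hypothesis.split(), ref) / max(len(ref), 1)


crf = "ɑ n d ə ʀ e ts æːɪ t"  # start of the CRF output
fst = "ɑ n d ɐ ts æːɪ t"      # start of the FST output
print(edit_distance(crf.split(), fst.split()))  # 3
```

Running it on the full outputs gives a single figure for how far the two models diverge on this sentence.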

After a final check of my whole configuration, I will soon start the first training round with my small Luxembourgish dataset and the CRF model.
