Sharing Unknown Words

Each time I create a new intent file (all my slots are defined in files), training finds new words from the slots added in the intent file.

So over time I will generate a lot of unknown words, I guess.

Maybe we could share, merge, and improve such files into Rhasspy word definitions for each language somewhere?

It seems some have been added automatically:

l’éclairage ll ei kk ll ai rr aa jj
cuisine kk uy ii zz ii nn
salon ss aa ll on
séjour ss ai jj ou rr
exterieur ai kk ss tt ai rr yy oe rr
sejour ss ei jj ou rr
cusine kk uu zz ii nn
dressing dd rr ai ss ii ii gg
entree an tt rr ii
l’entrée ll an tt rr ei
l’étage ll ei tt aa jj
fenêtre, ff ee nn ai tt rr
rideaux, rr ii dd au
store, ss tt oo rr
volet, vv oo ll ai

And here are some new ones I will try to add:

ampli
aspi
camera
chromecast
d’eau
kodi
l’alarme
l’ampli
lampadaire
monitoring
switch
videoprojecteur
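To see which slot words still need a pronunciation, one could compare them against the dictionary. This is only a minimal sketch: it assumes "unknown" simply means "not present in the dictionary file", and the function names `load_known_words` and `find_unknown_words` are hypothetical helpers, not part of Rhasspy.

```python
def load_known_words(dictionary_lines):
    """Return the set of base words found in CMU-style dictionary lines."""
    known = set()
    for line in dictionary_lines:
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank lines and entries without phonemes
        word = parts[0].lower()
        # "bains(2)" is an alternative pronunciation of "bains"
        if word.endswith(")") and "(" in word:
            word = word[: word.rindex("(")]
        known.add(word)
    return known


def find_unknown_words(candidates, known):
    """Return the candidates that have no pronunciation yet, in input order."""
    return [w for w in candidates if w.lower() not in known]


# Sample data taken from the lists above (assumption: one entry per line).
dictionary_lines = [
    "salon ss aa ll on",
    "cuisine kk uy ii zz ii nn",
    "l’entrée ll an tt rr ei",
]
candidates = ["salon", "ampli", "kodi", "cuisine"]
unknown = find_unknown_words(candidates, load_known_words(dictionary_lines))
print(unknown)
```

With the sample data, only `ampli` and `kodi` are reported, since the other two already have an entry.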

How does this work? Why are some added automatically? Should we bother adding them manually?

I just saw the dictionary.txt file in the config. Is it normal to have some words several times?

bains bb ai nn ss
bains(2) bb in
bains(3) bb in zz

chambre ch an bb rr
chambre(2) ch an bb rr ee
chambres ch an bb rr
chambres(2) ch an bb rr ee
chambres(3) ch an bb rr ee zz
chambres(4) ch an bb rr zz

console kk on ss oo ll
console(2) kk on ss oo ll ee

etc

I also saw the base_dictionary.txt file; maybe we can improve this one for the community.
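If several people wanted to pool their word files, the merge itself could look roughly like the sketch below. It assumes every file uses the same `word phoneme phoneme ...` layout shown above; `parse_entries`, `merge_dictionaries`, and `format_dictionary` are hypothetical names for illustration, not Rhasspy functions. Duplicate pronunciations are dropped and the `(N)` suffixes are renumbered.

```python
def parse_entries(lines):
    """Yield (word, phonemes) pairs, stripping any "(N)" variant suffix."""
    for line in lines:
        parts = line.split()
        if len(parts) < 2:
            continue  # ignore blank or incomplete lines
        word = parts[0]
        if word.endswith(")") and "(" in word:
            word = word[: word.rindex("(")]
        yield word, " ".join(parts[1:])


def merge_dictionaries(*sources):
    """Merge several dictionaries, keeping each distinct pronunciation once."""
    merged = {}
    for lines in sources:
        for word, phonemes in parse_entries(lines):
            variants = merged.setdefault(word, [])
            if phonemes not in variants:
                variants.append(phonemes)
    return merged


def format_dictionary(merged):
    """Render merged entries as lines, numbering variants from (2) upward."""
    out = []
    for word in sorted(merged):
        for index, phonemes in enumerate(merged[word], start=1):
            label = word if index == 1 else f"{word}({index})"
            out.append(f"{label} {phonemes}")
    return out


mine = ["chambre ch an bb rr", "salon ss aa ll on"]
theirs = ["chambre ch an bb rr", "chambre(2) ch an bb rr ee"]
merged_lines = format_dictionary(merge_dictionaries(mine, theirs))
print("\n".join(merged_lines))
```

Someone would still have to review the merged result by hand, since a script like this cannot tell a good pronunciation from a wrong guess.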

According to this doc: https://cmusphinx.github.io/wiki/tutorialdict/ multiple occurrences of the same word are used for alternative pronunciations.
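To review the alternatives for each word side by side, the numbered entries can be grouped under their base word. A small sketch, assuming the `word(N)` naming seen in dictionary.txt above (`group_pronunciations` is a hypothetical helper):

```python
import re
from collections import defaultdict

# Matches "bains" as well as "bains(2)"; the "(N)" suffix is optional.
VARIANT = re.compile(r"^(?P<word>.+?)(?:\((?P<n>\d+)\))?$")


def group_pronunciations(lines):
    """Map each base word to the list of its pronunciations, in file order."""
    grouped = defaultdict(list)
    for line in lines:
        parts = line.split()
        if len(parts) < 2:
            continue  # ignore blank or incomplete lines
        word = VARIANT.match(parts[0]).group("word")
        grouped[word].append(" ".join(parts[1:]))
    return dict(grouped)


lines = [
    "bains bb ai nn ss",
    "bains(2) bb in",
    "bains(3) bb in zz",
    "console kk on ss oo ll",
    "console(2) kk on ss oo ll ee",
]
grouped = group_pronunciations(lines)
for word, variants in grouped.items():
    print(word, variants)
```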

In your case I would delete lines 1 and 3 of the bains entries and keep only line 2, without the (2) (that is, bains bb in), because in French the final “s” is silent.

Same problem for chambres(3) and chambres(4).

It depends on whether you want to handle the “s” liaison between words:

Les chambres_et les_autres pièces

And the tonic accent, as in:

chambr / chambre / chambrE / etc
