Custom converter for numbers: small footprint and therefore very fast

Hello everyone … I ran into the problem of voice recognition of large numbers, where response times increase the larger the number is (see here), and I found a solution that I would like to share. It is based on two features:
1 - a custom converter which translates spoken numbers into digit text
Explanation: we say two hundred and twenty and we get the string "220"

The custom converter (file in /profiles/fr/converters/text2num):

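#!/usr/bin/env python3
import sys
from text_to_num import text2num

# Read the value to convert from standard input
INlst = sys.stdin.readlines()
val = ''.join(INlst)
z = len(INlst)
if z == 0:
    # Nothing was received: print a blank value and stop
    print(" ")
    sys.exit()
# Turn doubled quotes into a space before any other processing
val = val.replace('""', ' ')
# text2num expects hyphenated compounds, so rebuild them
val = val.replace('quatre vingt', 'quatre-vingt ')
val = val.replace('dix sept', ' dix-sept')
val = val.replace('dix huit', ' dix-huit')
val = val.replace('dix neuf', ' dix-neuf')
# Drop the remaining quotes around the value
val = val.replace('"', '')
# Convert the French number words to an integer and print it
valnum = text2num(val, 'fr')
print(str(valnum))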
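
As an illustration only, the clean-up step of the converter above can be isolated into a small function that is easy to test without Rhasspy. This sketch assumes the value arrives on stdin as a JSON string (which the quote stripping above suggests) and falls back to the raw text otherwise. The name `normalize_fr_number` is hypothetical, and the result would still have to be passed to `text2num(..., 'fr')` as in the converter.

```python
import json

# Spoken forms produced by the grammar and the hyphenated forms used for conversion
COMPOUNDS = (
    ("quatre vingt", "quatre-vingt"),
    ("dix sept", "dix-sept"),
    ("dix huit", "dix-huit"),
    ("dix neuf", "dix-neuf"),
)


def normalize_fr_number(raw: str) -> str:
    """Return the number words with quotes removed and compounds hyphenated."""
    raw = raw.strip()
    try:
        # Assumption: the value is a JSON string such as "deux cent vingt"
        value = json.loads(raw)
    except json.JSONDecodeError:
        # Not JSON: treat the input as plain text
        value = raw
    text = str(value)
    for spoken, written in COMPOUNDS:
        text = text.replace(spoken, written)
    # Collapse any repeated whitespace
    return " ".join(text.split())


if __name__ == "__main__":
    print(normalize_fr_number('"quatre vingt dix sept"'))
```

Parsing the JSON instead of stripping quote characters by hand is a design choice of this sketch, not something the original converter does.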

2 - a set of rules for numbers written as text (in /profiles/fr/intents/Chiffres.ini)

[strchiffres]
deux_9 = ( deux | trois | quatre | cinq | six | sept | huit | neuf )
un_une = ( (un | et un) | (une | et une) )
un_9 = ( <un_une> | <deux_9> )
un_23 = [( <un_9> | <dix_19>)] [ vingt [ (<un_une>| deux| trois)] ]
un_59 = [<diz_1mot>] [( <un_9> | <dix_19> ) ]
dix_19 = ( dix | onze | douze | treize | quatorze | quinze | seize | dix sept | dix huit | dix neuf )
diz_1mot = ( vingt | trente | quarante | cinquante | soixante )
diz_2mot = ( soixante dix | quatre vingt )
un_100 = [ (<diz_2mot>| <diz_1mot>) ] [(<un_9> | <dix_19>) ]
cent = cent
mille = mille
million = million
nombre = [ [ <un_9> ] <cent> ] [ [ <un_100> ] <million> ] [ [ <deux_9> ] <cent> ] [ [ <un_100> ] <mille> ] [ [ <deux_9> ] <cent> ] [ <un_100> ]

You then just have to create an intent, for example:

[setValeur]
valeur <strchiffres.nombre> {text2num!text2num}

That's it … I hope it helps a few of you
cordially
Arpagor

Been there done that.

1 can be done with a number of tools (duckling, rustling, rasa, text2num) and works quite well (I'm using my own Node native binding for Rustling to do this).

2 is more complicated. It has the potential to unbalance the language model and steer transcriptions toward strange results. Use with caution. To avoid this you can use the « text_fst » language model type. It should help a bit.

Cheers

Thank you for your comments, but let me clarify:
1 - I am already using the Language Model Type “Text FST” option in Kaldi
2 - Similarly my converter uses Text2Num as you suggest
3 - Can you explain the remarks "it's more complicated" and "it risks unbalancing the language model"?
Everything happens at the NLU stage, and if we compare the intent JSON for

[setVolume]
set Volume (0..100) {volume}

and

[setValue]
set Value <strchiffres.nombre> {value!text2num}

In the end, after conversion, the resulting intent JSON is strictly identical.

Being French, we have a number syntax that agrees in gender (masculine / feminine), for example:

vingt et “une” pommes
vingt et “un” mille pommes

Not to mention the response times when you want values > 100,000!
For a range (1..1000) -> more than 15 sec of training, and language_model.txt goes from 39 KB to 2567 KB

cordially
Arpagor

It should work nicely using the text_fst language model type.

Just wanted to point that out because using the arpa type (which was the only option at the time I did this) will tip the language model toward the more frequent word sequences.

In arpa mode, as the rule repeats « et un | et une » a lot, the probability of having « un » or « une » after « et » is increased to the point where the decoder almost always hears it, whatever the real word is. The same goes for other frequently repeated words in the dataset.
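
To make the effect concrete, here is a toy sketch that estimates P(next word | « et ») from raw bigram counts, before and after adding number sentences that repeat « et un » and « et une ». The sentences and the `bigram_prob` helper are made up for the illustration. Real ARPA models also apply smoothing and backoff, so the actual figures would differ, but the direction of the shift is the same.

```python
from collections import Counter


def bigram_prob(sentences, first, second):
    """Maximum-likelihood estimate of P(second | first) from raw bigram counts."""
    pair_counts = Counter()
    first_counts = Counter()
    for sentence in sentences:
        tokens = sentence.split()
        for a, b in zip(tokens, tokens[1:]):
            pair_counts[(a, b)] += 1
            first_counts[a] += 1
    if first_counts[first] == 0:
        return 0.0
    return pair_counts[(first, second)] / first_counts[first]


# Hypothetical training sentences without any numbers
base = [
    "allume la lampe et la radio",
    "ferme la porte et la fenêtre",
]

# Number sentences repeating "et un" / "et une" for every ten
tens = ("vingt", "trente", "quarante", "cinquante", "soixante")
numbers = [f"{t} et un" for t in tens] + [f"{t} et une" for t in tens]

if __name__ == "__main__":
    print("P(la | et) without numbers:", bigram_prob(base, "et", "la"))
    print("P(la | et) with numbers:   ", bigram_prob(base + numbers, "et", "la"))
    print("P(un | et) with numbers:   ", bigram_prob(base + numbers, "et", "un"))
```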

Using arpa allows for more flexibility but comes at the price of a possibly unbalanced language model.

Using text_fst avoids this but at the cost of more rigid transcriptions. It does not allow any transcription that was not defined in the language model.

Yes, I completely agree, and that is why I had enabled the Text FST option in Kaldi.
But for large numbers (> 99999), the range option (1..10000) is in my opinion too expensive in resources, hence my work on a converter that is much less greedy!
If I go to (0..10000), the language_model.txt file goes from 1660 to 96,474 lines (2,567 KB),
and if I test (0..100000) we go to 1,085,674 lines (31,239 KB), 6 min 45 sec for training and 10 sec to respond to the intent "value 100000".
With the converter it's almost instantaneous.
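
A quick back-of-the-envelope check on the line counts reported above, as a sketch only: it uses nothing but the two figures quoted in this post, and the helper name `lines_per_value` is hypothetical. It suggests the file grows roughly in proportion to the size of the range, at around ten lines per number.

```python
# Line counts reported above for the enumerated ranges (upper bound -> lines)
reported_lines = {
    10_000: 96_474,
    100_000: 1_085_674,
}


def lines_per_value(upper_bound: int, line_count: int) -> float:
    """Average number of language model lines per number in 0..upper_bound."""
    return line_count / (upper_bound + 1)


# How much the file grows when the range is made ten times larger
growth = reported_lines[100_000] / reported_lines[10_000]

if __name__ == "__main__":
    for bound, count in reported_lines.items():
        print(f"0..{bound}: {lines_per_value(bound, count):.1f} lines per number")
    print(f"growth for a 10x larger range: {growth:.1f}x")
```

The averages include the lines that belong to the other intents of the profile, so they slightly overstate the cost per number.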


The grammar you provided seems to only handle numbers from 0 to 100. Have you pushed it to higher maximum values?

What was the impact on training and decoding time (as comparing it to the 0…100000 format would then be incorrect)?

EDIT: Looking at the rule again, I think it is just missing the <cent>, <mille> and <million> rules. It should not impact performance that much.

:+1:

My rule covers up to 999 million 999 thousand 999 -> almost instantaneous response

nombre = [ [ <un_9> ] <cent> ] [ [ <un_100> ] <million> ] [ [ <deux_9> ] <cent> ] [ [ <un_100> ] <mille> ] [ [ <deux_9> ] <cent>] [ <un_100> ]

Sorry, the words <million>, <mille> and <cent> had disappeared because they were between <> (I added a "\" to make them appear!)

I could add billions etc … :)

A small improvement to the numbers rule:

[strchiffres]
un_une = ( (un | et un) | (une | et une) )
un_9 = ( <un_une> | <deux_9> )
deux_9 = ( deux | trois | quatre | cinq | six | sept | huit | neuf )
un_23 = [( <un_9> | <dix_19>)] [ vingt [ (<un_une>| deux| trois)] ]
un_59 = [<diz_1mot>] [( <un_9> | <dix_19> ) ]
dix_19 = ( dix | onze | douze | treize | quatorze | quinze | seize | dix sept | dix huit | dix neuf )
diz_1mot = ( vingt | trente | quarante | cinquante | soixante )
diz_2mot = ( soixante dix | quatre vingt )
un_100 = [ (<diz_2mot>| <diz_1mot>) ] [(<un_9> | <dix_19>) ]
nombre = [ [ <un_9> ] ( cent | cents ) ] [ [ <un_100> ] (million | millions) ] [ [ <deux_9> ] ( cent | cents ) ] [ [ <un_100> ] (mille) ] [ [ <deux_9> ] ( cent | cents ) ] [ <un_100> ]
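
For anyone who wants to check which phrases this rule accepts without retraining, here is a rough sketch that translates it into a Python regular expression. The variable names mirror the rule names above, while `words`, `opt`, `either` and `accepts` are hypothetical helpers. This is only my approximation of the template semantics (`[ ]` optional, `|` alternative), not how Rhasspy compiles the grammar.

```python
import re


def words(*phrases):
    """Alternation of phrases, each followed by a single space."""
    return "(?:" + "|".join(p + " " for p in phrases) + ")"


def opt(pattern):
    """Optional group, the equivalent of [ ... ] in the rule."""
    return "(?:" + pattern + ")?"


def either(*patterns):
    """Alternation of sub-patterns, the equivalent of ( a | b )."""
    return "(?:" + "|".join(patterns) + ")"


deux_9 = words("deux", "trois", "quatre", "cinq", "six", "sept", "huit", "neuf")
un_une = words("un", "et un", "une", "et une")
un_9 = either(un_une, deux_9)
dix_19 = words("dix", "onze", "douze", "treize", "quatorze", "quinze", "seize",
               "dix sept", "dix huit", "dix neuf")
diz_1mot = words("vingt", "trente", "quarante", "cinquante", "soixante")
diz_2mot = words("soixante dix", "quatre vingt")
un_100 = opt(either(diz_2mot, diz_1mot)) + opt(either(un_9, dix_19))
cent = words("cent", "cents")
million = words("million", "millions")
mille = words("mille")

nombre = (
    opt(opt(un_9) + cent)
    + opt(un_100 + million)
    + opt(opt(deux_9) + cent)
    + opt(un_100 + mille)
    + opt(opt(deux_9) + cent)
    + opt(un_100)
)
NOMBRE_RE = re.compile(nombre)


def accepts(sentence: str) -> bool:
    """True if the whole sentence matches the regex translation of the rule."""
    normalized = " ".join(sentence.split()) + " "
    return NOMBRE_RE.fullmatch(normalized) is not None


if __name__ == "__main__":
    for phrase in ("deux cent vingt", "cent mille", "mille million", "vingt cent"):
        print(phrase, "->", accepts(phrase))
```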

Cordially
Arpagor
