Minimum confidence in Kaldi

I don’t really understand how the MBR is calculated. Is this for the whole sentence, or does it take into account the confidence for each word?

For example I just tried it out by saying gibberish:
There is a random intent recognized and the likelihood is only 0.43.
So far so good.

When I then look at the different confidence levels of the words, this looks completely different.
They are in the 0.8 - 0.99 levels.

{"text": "lampe rosa alle lichter", "likelihood": 0.42996500000000004, "seconds": 2.5103948839823715, "siteId": "schreibtisch", "sessionId": "schreibtisch-computer_raspberry-pi-17d8873a-655f-4af0-8f32-6d86d8a3e52a", "wakewordId": null, "asrTokens": [[{"value": "lampe", "confidence": 0.899503, "rangeStart": 0, "rangeEnd": 6, "time": {"start": 0.000986619, "end": 0.270247}}, {"value": "rosa", "confidence": 0.869788, "rangeStart": 6, "rangeEnd": 11, "time": {"start": 0.27686, "end": 1.11}}, {"value": "alle", "confidence": 0.812442, "rangeStart": 11, "rangeEnd": 16, "time": {"start": 1.12572, "end": 1.46169}}, {"value": "lichter", "confidence": 0.999363, "rangeStart": 16, "rangeEnd": 24, "time": {"start": 1.46299, "end": 2.55}}]], "lang": null}
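To make the comparison concrete, here is a rough Python sketch that puts the sentence-level "likelihood" next to a few simple aggregates of the word confidences. The payload is trimmed from the message above; the `summarize` helper and the choice of aggregates (min, mean, product) are only my own illustration, not anything Rhasspy or Kaldi computes.

```python
import json
import math

# Payload trimmed from the message above (timing and session fields dropped).
payload = json.loads("""
{"text": "lampe rosa alle lichter", "likelihood": 0.42996500000000004,
 "asrTokens": [[{"value": "lampe", "confidence": 0.899503},
                {"value": "rosa", "confidence": 0.869788},
                {"value": "alle", "confidence": 0.812442},
                {"value": "lichter", "confidence": 0.999363}]]}
""")


def summarize(payload):
    """Return the sentence-level score next to simple word-level aggregates.

    Hypothetical helper for illustration; the aggregates are assumptions,
    not how the sentence score is actually derived.
    """
    words = [tok["confidence"] for tok in payload["asrTokens"][0]]
    return {
        "sentence_likelihood": payload["likelihood"],
        "min_word": min(words),
        "mean_word": sum(words) / len(words),
        "product_word": math.prod(words),
    }


summary = summarize(payload)
for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```

None of these aggregates comes out near 0.43 (even the product of the word confidences is roughly 0.64), so the sentence score does not look like a simple function of the word confidences.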

I have been playing too, longer sentences seem worse in some ways…

Intent:-
[NonSense]
what color is the lamp post at the corner of the road

Tests:-
“what color is the lamp post at the the corner of the road” = confidence 1
“what color is the lamp at the corner of the road” = confidence 1
“what color is the corner is of the road” = confidence 1

Saying “what time is the color of the road” or “what color is the” simply triggers my [GetTime] intent, which is “what time is it”. The confidence is around 0.9 though, so that can be trapped with a reasonable minimum score.

That seems to be a very wide range of utterances to accept. Each one is only a partial match, and yet they all score perfectly???

Does it match the whole sentence or just the first few words??

A little more, it really dislikes optional stuff it seems…

[SetTimer]
minutes = (1){min} minute | (2..59){min} minutes
seconds = (1){sec} second | (2..59){sec} seconds
set [a] timer for (1){min} and (a half){sec:30!int} minutes

Saying “set a timer for one and a half minutes” works, but the score is very low at around 0.53.


I just upgraded to 2.5.10 in the hope of taking advantage of the confidence values to reduce false positives. This thread, however, is making me think that the results are mixed. Are fewer false positives being reported with silence? It seems that incorrect sentences or gibberish might still be pretty likely to generate false positives.

Here’s the code where this is happening. MBR is computed for the whole sentence. The word confidences are computed from the “one best” result.


There do seem to be other notions of confidence in Kaldi. Perhaps I should be using a different calculation than MBR. From my research, it seemed better than the other sentence-level confidence measure, which was just the likelihood difference between the first and second transcriptions.
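For anyone wondering what that second measure looks like, here is a minimal Python sketch of a "best minus second-best" margin over an n-best list. This is not the Rhasspy or Kaldi code: the function name, the `(text, log_likelihood)` input format, and the scores are all made up for illustration.

```python
def best_vs_second_margin(nbest):
    """Gap in log-likelihood between the best and second-best hypothesis.

    nbest: list of (text, log_likelihood) pairs, in any order.
    Hypothetical helper; a real decoder would supply the n-best list.
    """
    if len(nbest) < 2:
        return float("inf")  # nothing to compare against
    scores = sorted((score for _, score in nbest), reverse=True)
    return scores[0] - scores[1]


# Made-up scores for illustration only, not real decoder output.
clear = [("lampe an", -120.0), ("lampe aus", -135.0)]
ambiguous = [("lampe an", -120.0), ("lampe aus", -120.5)]

print(best_vs_second_margin(clear))      # large gap: the best result stands out
print(best_vs_second_margin(ambiguous))  # small gap: a close competitor exists
```

A small margin only says that one close competitor exists. It ignores everything below second place, which is one reason a measure computed over more of the alternatives can look more attractive.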

I include an “<unk>” (unknown) word during training, but I don’t think it can ever be chosen by Kaldi in Text FST mode. I may need to experiment with putting this into the sentences, so garbage words can become “<unk>” rather than some random word.

Hi,
I just upgraded my production setup to 2.5.10 and set the ASR min confidence to 0.9, and I get a strange result: no intent is recognized at all. Setting min confidence to 0 works fine.

In MQTT Explorer, for “allume la lumière” I have this:
"value": "lumière",
"confidence": 0.673712,

The other words are above 0.95, and the entire sentence corresponds exactly to sentences.ini.
Why such low confidence?

EDIT: even some common words like musique get 0.5 confidence … same for “la” :woozy_face:

Is anyone using ASR min confidence with French?

Ok, seems I don’t understand anything …

ASR min confidence 0.4

The lowest word confidence is 0.6, but the sentence is not recognized:

{"text": "éteins la lumière", "likelihood": 0.018881999999999954, "seconds": 1.695091596999191, "siteId": "salle", "sessionId": "salle-nicolas-01f4a49d-cdd0-4baa-b596-f36d2438bbe7", "wakewordId": null, 
"asrTokens": [[{
"value": "éteins", "confidence": 0.777797, "rangeStart": 0, "rangeEnd": 7, "time": {"start": 0.0, "end": 0.9}}, 
{"value": "la", "confidence": 0.963026, "rangeStart": 7, "rangeEnd": 10, "time": {"start": 0.900449, "end": 1.09095}},
 {"value": "lumière", "confidence": 0.607585, "rangeStart": 10, "rangeEnd": 18, "time": {"start": 1.09095, "end": 1.30153}}]], "lang": null}

sentences:
[TurnOffJeedom]
(coupe | éteins | arrête) (la | le | les) ($device_name){device_name}

slots device_name:
( lumière | lampe ):lumiere

Any insight on this :face_with_monocle: ?

The minimum confidence that you set to 0.4 refers to the “likelihood” value of the whole sentence.
That means your recognized sentence “éteins la lumière”, with a likelihood of about 0.019, did not pass your minimum confidence of 0.4 and was therefore discarded.
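As a rough Python sketch of that check, using the values from the payload posted above: the function name `accept_transcription` and the `>=` comparison are my assumptions about how such a gate could look, not the actual Rhasspy code.

```python
def accept_transcription(payload, min_confidence):
    """Keep a transcription only if the sentence-level score passes.

    Hypothetical helper: compares the payload's "likelihood" field
    (whole sentence) with the threshold and ignores word confidences.
    """
    return payload["likelihood"] >= min_confidence


# Values copied from the payload above; other fields dropped.
payload = {
    "text": "éteins la lumière",
    "likelihood": 0.018881999999999954,
    "asrTokens": [[
        {"value": "éteins", "confidence": 0.777797},
        {"value": "la", "confidence": 0.963026},
        {"value": "lumière", "confidence": 0.607585},
    ]],
}

lowest_word = min(tok["confidence"] for tok in payload["asrTokens"][0])
print("lowest word confidence:", lowest_word)
print("sentence likelihood:", payload["likelihood"])
print("accepted at 0.4:", accept_transcription(payload, 0.4))
```

So the number to watch is the top-level "likelihood" field of the message, not the per-word "confidence" values inside "asrTokens".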

Yes, but why, when every word is above 0.6? And how can I find out this “sentence” confidence?

This most probably has to do with the other possible sentences.
Check for the use of optional words.
In my experience, I was able to solve one particularly stubborn sentence by breaking it up into more separate pieces.