STT Speeds on local pi setup

I haven’t found a Kaldi model for Italian yet, unfortunately. If we can get a big enough speech corpus, we can train one. The Zamia English model was trained on about 1200 hours of audio.

That’s certainly the truth. Using Kaldi, I get about 98 to 99% accuracy.
With pocketsphinx, it was barely 50% at best. :frowning:

Strange, I got very good intent catching with pocketsphinx, near 100%.
Wake word false positives are another story, and actually my main problem with Rhasspy … :sleepy:

Sounds like you have a sensitive mic.
Do you use snowboy?
Can you try setting audio_gain to less than 1?
Or just play with the sensitivity parameter.
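
To give a feel for what a gain below 1 means, here is a rough sketch that scales raw audio samples by a gain factor. It assumes 16-bit signed PCM in native byte order, and `apply_gain` is a made-up helper for illustration only; it is not how snowboy or Rhasspy implement audio_gain internally.

```python
import array


def apply_gain(pcm: bytes, gain: float) -> bytes:
    """Scale 16-bit signed PCM samples by `gain`, clipping to the valid range.

    Hypothetical helper for illustration; assumes native byte order.
    """
    samples = array.array("h")  # "h" = signed 16-bit integers
    samples.frombytes(pcm)
    scaled = array.array(
        "h",
        (max(-32768, min(32767, int(s * gain))) for s in samples),
    )
    return scaled.tobytes()


if __name__ == "__main__":
    loud = array.array("h", [1000, -2000, 32767]).tobytes()
    quieter = array.array("h")
    quieter.frombytes(apply_gain(loud, 0.5))
    print(list(quieter))
```

With a gain of 0.5 every sample is halved, so a hot microphone delivers a quieter signal to the wake word detector, which may cut down on false triggers.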

Pocketsphinx works fine as long as the intents and slots are simple.

From what I experienced, as soon as you start adding complexity it breaks down and the WER (word error rate) increases drastically.
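
For anyone who wants to put a number on this, here is a minimal sketch of how WER is usually computed: the word-level edit distance (substitutions, deletions, insertions) between what was said and what was recognized, divided by the number of words actually said. The function name and example sentences are my own; this is not part of Rhasspy, Kaldi, or pocketsphinx.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the reference words processed so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    print(word_error_rate("turn on the kitchen light", "turn on the light"))
```

A lower WER is better: 0.0 means every word was recognized correctly, and one dropped word out of five gives 0.2.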

I’ve found that Piero Cosi used an Italian model in his Kaldi thesis.
I can try to contact him; maybe he has a model.

Based on a quick look at the paper, his model was trained on about 10 hours of children’s speech. Mozilla Common Voice has about 91 hours available. Maybe Piero is aware of other Italian speech corpora (preferably adult speech)?

I’ve asked him… waiting for an answer :slight_smile:

No French model then? :sleepy:

Don’t despair… here you go for the French model:

:wink:

Thanks! Got Kaldi working now.

I’m not convinced that it is better than pocketsphinx after a few tries, but I will keep it for a few days of testing.

We’ll see. I’m still fighting with snowboy, actually; I really have to find some good settings.

Tried your suggestion while running in a virtual environment. Kaldi in general takes around 8 seconds for processing. With your tweaks the processing time is 2 seconds, but no text is recognized :confused:
Guess I’ll keep using pocketsphinx (4 seconds processing) until the mentioned speed bump for Kaldi is released.

Piero replied that he has a lot of models, and he asked which of these models/files we need:

HMM
tri1 : first triphone system (delta+delta-delta features)
tri2 : an LDA+MLLT system
tri3 : Speaker Adaptive Training (SAT) system

SGMM2 Training
SGMM2 + MMI Training

DNN
Hybrid System (Dan’s DNN)
Combination SGMM + Dan’s DNN
Hybrid System (Karel’s DNN)
Hybrid System (Karel’s DNN), sMBR training

Which of those do we need?

I’ve just tried it and got the same result :(
@fastjack could you explain what you did, step by step?
Btw, in my original Kaldi folder I haven’t found normalization.fst and den.fst… so it looks like I’m missing something 🤔
The folder where I put the new files is rhasspy/profiles/en/kaldi/model/model

Which language model did you download? A German one? I got this issue as well.

A little bit OT, but is there a guide for setting up Kaldi with Rhasspy as a Hass.io add-on? :slight_smile:

No, I’m using the en model.

There is nothing special… just select Kaldi in the Rhasspy settings. You will then be asked to download files, and after that you will need to train. That’s it :nerd_face:

The option is greyed out in my case. I guess that’s because I’m using a German profile?

I’ve heard about this bug, but I can’t reproduce it. The German profile definitely supports Kaldi, but the profile setting seems to be flipped in the web interface unless you start with English and then switch over.

I did not start with an English profile and had no problem selecting Kaldi for speech to text on a German profile. It works way better than pocketsphinx and almost always recognizes my sentences correctly. But it is quite a bit slower than pocketsphinx on a Pi 4 (nearly immediately vs. 2-3 s).