STT speeds on a local Pi setup

synesthesiam · January 14, 2020, 4:59pm

I haven’t found a Kaldi model for Italian yet, unfortunately. If we can get a big enough speech corpus, we can train one. The Zamia English model was trained on about 1200 hours of audio.

FredTheFrog · January 14, 2020, 7:21pm

That’s certainly true. Using Kaldi, I get about 98 to 99% accuracy.
With Pocketsphinx, it was barely 50% at best.

KiboOst · January 14, 2020, 8:18pm

Strange, I get very good intent recognition with Pocketsphinx, near 100%.
Wake word false positives are another story, and actually my main problem with Rhasspy…

frkos · January 14, 2020, 8:24pm

Sounds like you have a sensitive mic.
Do you use Snowboy?
Can you try setting audio_gain to less than 1?
Or just play with the sensitivity parameter.
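To illustrate why an audio_gain below 1 can help: the gain scales the microphone signal before the wake word detector sees it, so values under 1 make everything quieter. Snowboy applies its gain internally; the plain per-sample multiplication below, and the helper name `apply_gain`, are assumptions made for illustration, not Rhasspy or Snowboy code. It assumes 16-bit signed PCM in the machine's byte order.

```python
import array


def apply_gain(pcm_bytes: bytes, gain: float) -> bytes:
    """Scale 16-bit signed PCM samples (machine byte order) by `gain`.

    Hypothetical helper for illustration only; not Rhasspy's own code.
    Results are clipped to the int16 range so gains > 1 cannot wrap around.
    """
    samples = array.array("h")
    samples.frombytes(pcm_bytes)
    scaled = array.array(
        "h", (max(-32768, min(32767, int(s * gain))) for s in samples)
    )
    return scaled.tobytes()


# Example: halve the amplitude of a tiny three-sample chunk.
chunk = array.array("h", [1000, -2000, 32767]).tobytes()
quieter = array.array("h", apply_gain(chunk, 0.5))
print(list(quieter))  # [500, -1000, 16383]
```

Lowering the gain reduces how loud background noise looks to the detector, while the sensitivity parameter changes how readily a match is reported; both are worth trying against false positives.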

fastjack · January 14, 2020, 8:53pm

Pocketsphinx works fine as long as the intents and slots are simple.

From what I’ve experienced, as soon as you start adding complexity it breaks down and the WER increases drastically.
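For reference, word error rate (WER) is the word-level edit distance between what was said and what was recognized (substitutions + deletions + insertions) divided by the number of reference words. Lower is better, so a model that "breaks down" shows a rising WER. A minimal sketch using a standard Levenshtein distance over words (the function name is just for illustration):

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word error rate: (substitutions + deletions + insertions) / reference length.

    Computed with a standard Levenshtein edit distance over words.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # dist[i][j] = edit distance between the first i reference words
    # and the first j hypothesis words.
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        dist[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            substitution = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,  # deletion
                dist[i][j - 1] + 1,  # insertion
                dist[i - 1][j - 1] + substitution,  # match or substitution
            )
    return dist[len(ref)][len(hyp)] / len(ref)


print(word_error_rate("turn on the kitchen light", "turn on the kitchen lights"))  # 0.2
```

Note that WER can exceed 1.0 when the recognizer inserts many extra words.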

adrianofoschi · January 14, 2020, 9:57pm

I’ve found that Piero Cosi used an Italian model in his Kaldi thesis.
I can try to contact him; maybe he has a model.

synesthesiam · January 15, 2020, 3:22am

Based on a quick look at the paper, his model was trained on about 10 hours of children’s speech. Mozilla Common Voice has about 91 hours available. Maybe Piero is aware of other Italian speech corpora (preferably adult speech)?

adrianofoschi · January 15, 2020, 7:26am

I’ve asked him… waiting for an answer.

KiboOst · January 15, 2020, 7:05pm

No French model then?

fastjack · January 15, 2020, 8:38pm

Don’t despair… here’s the French model:

KiboOst · January 15, 2020, 9:28pm

Thanks! Got Kaldi working now.

After a few tries I’m not convinced it’s better than Pocketsphinx, but I’ll keep it for a few days of testing.

We’ll see. I’m actually still fighting with Snowboy; I really have to find some good settings.

shedz · January 16, 2020, 12:59pm

Tried your suggestion while running in a virtual environment. Kaldi generally takes around 8 seconds for processing. With your tweaks, processing takes 2 seconds, but no text is recognized.
Guess I’ll keep using Pocketsphinx (4 seconds of processing) until the mentioned speed bump for Kaldi is released.

adrianofoschi · January 16, 2020, 1:22pm

Piero answered that he has a lot of models, and he asked me which of those models/files we need:

HMM
tri1 : first triphone system (delta+delta-delta features)
tri2 : an LDA+MLLT system
tri3 : Speaker Adaptive Training (SAT) system

SGMM2 Training
SGMM2 + MMI Training

DNN
Hybrid System (Dan’s DNN)
Combination SGMM + Dan’s DNN
Hybrid System (Karel’s DNN)
Hybrid System (Karel’s DNN), sMBR training

Which of those do we need?

frkos · February 10, 2020, 8:23pm

I’ve just tried it and got the same result :(
@fastjack could you explain what you did step by step?
Btw, in my original Kaldi folder I couldn’t find normalization.fst or den.fst… so it looks like I’m missing something 🤔
The folder where I put the new files is rhasspy/profiles/en/kaldi/model/model
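A quick way to check which of the files mentioned above are actually present anywhere under a model folder. The file names and path come from this post; the helper `missing_model_files` is hypothetical, and which files a given Kaldi model really needs depends on how it was built.

```python
from pathlib import Path

# File names mentioned in this thread; which files a given Kaldi model
# actually needs depends on how it was built.
EXPECTED_FILES = ("normalization.fst", "den.fst")


def missing_model_files(model_dir, expected=EXPECTED_FILES):
    """Return the expected file names not found anywhere under model_dir.

    Hypothetical helper for illustration; Rhasspy does not ship it.
    """
    root = Path(model_dir).expanduser()
    if not root.is_dir():
        raise FileNotFoundError(f"not a directory: {root}")
    present = {p.name for p in root.rglob("*") if p.is_file()}
    return [name for name in expected if name not in present]
```

Usage: `missing_model_files("rhasspy/profiles/en/kaldi/model/model")` returns the names from `EXPECTED_FILES` that were not found, or an empty list if everything is there.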

Bozor · February 11, 2020, 9:14pm

Which language model did you download? A German one? I got this issue as well.

Sikk · February 11, 2020, 10:19pm

A little bit off-topic, but is there a guide for setting up Kaldi with Rhasspy as a Hass.io add-on?

frkos · February 12, 2020, 5:26am

No, I’m using the English model.

There’s nothing special… just select Kaldi in the Rhasspy settings. You will then be asked to download files, and after that you need to train. That’s it.
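For reference, selecting Kaldi in the web interface changes the speech-to-text system stored in the profile. Assuming a Rhasspy 2.4-style `profile.json` where this lives under `speech_to_text.system` (treat the exact key as an assumption and compare with your own profile), a sketch of the same change made by hand. The helper `set_stt_system` is hypothetical:

```python
import json
from pathlib import Path


def set_stt_system(profile_path, system="kaldi"):
    """Set speech_to_text.system in a JSON profile, keeping all other keys.

    Assumes a Rhasspy 2.4-style layout: {"speech_to_text": {"system": ...}}.
    Hypothetical helper; the web interface normally makes this change for you.
    """
    path = Path(profile_path)
    profile = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    profile.setdefault("speech_to_text", {})["system"] = system
    path.write_text(json.dumps(profile, indent=4), encoding="utf-8")
    return profile
```

Downloading the Kaldi files and retraining are still needed afterwards.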

Sikk · February 12, 2020, 1:49pm

The option is greyed out in my case. I guess because I’m using a German profile?

synesthesiam · February 12, 2020, 5:02pm

I’ve heard about this bug, but I can’t reproduce it. The German profile definitely supports Kaldi, but the profile setting seems to be flipped in the web interface unless you start with English and then switch over.

moqart · February 12, 2020, 6:56pm

I did not start with an English profile and had no problem selecting Kaldi for speech to text on a German profile. It works way better than Pocketsphinx and almost always recognizes my sentences correctly, but it’s quite a bit slower than Pocketsphinx on a Pi 4 (Pocketsphinx is nearly immediate vs. 2-3 s for Kaldi).