Speech recognition, Kaldi vs DeepSpeech (Open transcription mode)

litinoveweedle · July 29, 2020, 10:09am

Hello,

I think the title says it all. I am using Rhasspy 2.5.4, which still uses the older DeepSpeech 0.6.1, and so far in Open transcription mode I am getting far better results with Kaldi. I am not a native speaker, so my pronunciation is *&(&#!!!, and I think another issue is that the older DeepSpeech comes with only a US English trained model.

My question is: what is your experience? Is there a GB-trained model for DeepSpeech somewhere? Do you know of any plan to upgrade Rhasspy to the newest DeepSpeech version? Or did I miss something completely? Thank you for any comments.

synesthesiam · July 29, 2020, 2:48pm

Hi @litinoveweedle! I haven’t had the time to test out the newest version of DeepSpeech yet. Their development is moving so fast that others have not kept up (like this German model).

There are enough open speech corpora available now that it seems feasible for the Rhasspy project to start training its own models (Kaldi, DeepSpeech, etc.). I may need to finally ask for donations to get a decent CUDA card, like a GTX 1080 Ti, to train the models.