STT speeds on local Pi setup

I just finished setting up my environment, including Home Assistant, Pocketsphinx, and a deCONZ interface. However, the Pocketsphinx decoder's decoding times are around three to five seconds, while everything else takes milliseconds. Terminating Home Assistant and deCONZ on the Pi doesn't seem to change much.
I know I could offload these computations to a home server for a speedup, but since I'm trying to do everything on one Pi, I was wondering whether there are any ways to speed up the process.

Are my times average or unusually high? What are your computation times?
Running on a Pi 3B+ btw…

In my experience, I get similar times with a Raspberry Pi 3B+, but it could depend on the microphone, environment noise, and the size of the WAV recording (more seconds, more size).
I am trying two different setups:

  • ReSpeaker v1 (low cost) microphone: 2–3 seconds
  • ReSpeaker v2 (optimized) microphone: 3–5 seconds (probably slowed by the built-in algorithms or USB)

I get similar times with a Raspberry Pi 2, but I will try with a Raspberry Pi 4.

It often happens that I stop speaking but the recording is still active because of environment noise. To resolve this, I tried decreasing the timeout in the “command” section.
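For anyone looking for that setting: in Rhasspy the silence-detection options live under the “command” section of profile.json. A hedged example is below; the key names (vad_mode, silence_sec, timeout_sec) are my recollection of the webrtcvad settings and the values are illustrative, so check them against your Rhasspy version's profile documentation:

```json
{
  "command": {
    "system": "webrtcvad",
    "webrtcvad": {
      "vad_mode": 3,
      "silence_sec": 0.5,
      "timeout_sec": 15
    }
  }
}
```

A higher vad_mode makes the voice activity detector more aggressive about classifying audio as non-speech, which can help end recordings sooner in noisy rooms.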

Suggestions are appreciated :slight_smile:

PS: for a fair comparison we should specify our environments; I am using Docker.

I get the same with my RPi 3B. It takes three to four seconds…

I’m on an RPi 3B running Hass.io. As a mic I’m using the PS3 camera. Decode times are 1.5–2.5 seconds.

Regarding decoding time: are you using Kaldi? Using a smaller acoustic model (TDNN-250 instead of TDNN-F) reduced decoding times for me without any impact on accuracy.

@synesthesiam Maybe the smaller acoustic model could be provided as the default?

Online decoding should also help to speed things up when Rhasspy supports it natively.

How do I change the acoustic model?

On my RPi 3B+ with Kaldi I have to wait 4 seconds…
But here is some good news :nerd_face:

Until @synesthesiam updates the models provided for the Kaldi profiles with smaller ones (as I think he will :wink: ), and if your language has such a model (English, German, and French do), you can simply replace the following files in the {profile_dir}/kaldi/model/model folder with the TDNN-250 model files:

  • cmvn_opts
  • den.fst
  • final.mdl
  • normalization.fst
  • tree

The models should be available here:

Hope this helps.
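The swap described above can be sketched as a small script. The profile and model paths are placeholders you must point at your own setup, and the backup step is my own addition (not part of the original instructions), so the swap can be undone:

```python
import shutil
from pathlib import Path

# Files to replace with their TDNN-250 counterparts (list from the post above).
MODEL_FILES = ["cmvn_opts", "den.fst", "final.mdl", "normalization.fst", "tree"]


def swap_model(profile_dir: Path, model_src: Path) -> None:
    """Copy TDNN-250 files over the profile's current acoustic model.

    profile_dir: the Rhasspy profile folder (placeholder; adjust for your setup)
    model_src:   folder holding the unpacked TDNN-250 files (assumed flat layout)
    """
    dest = profile_dir / "kaldi" / "model" / "model"
    for name in MODEL_FILES:
        target = dest / name
        backup = target.parent / (target.name + ".bak")
        # One-time backup of the original TDNN-F file (my addition).
        if target.exists() and not backup.exists():
            shutil.copy2(target, backup)
        shutil.copy2(model_src / name, target)
```

After swapping, retrain the profile in the Rhasspy UI to be safe (the post above suggests a retrain may be needed).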


It does. Thanks a lot, @fastjack!

Maybe you could create a new topic showing how to install Kaldi and set it up in Rhasspy?
I'm not sure I have it all :wink:

There is not much to do, I'm afraid… Once your profile is set up using the Kaldi Rhasspy profile, simply download the model from the link and swap the mentioned files. No config changes are necessary (maybe a retrain, though?).

I'm pretty sure @synesthesiam will include these models natively in the near future, as the narrow, specific language model trained by Rhasspy mitigates the “lightness” of the acoustic model.

To install Kaldi, you just do that, then?
https://kaldi-asr.org/doc/install.html

Ah, I see what you mean… I did not build Kaldi, as I'm using the Docker image, so Kaldi is already installed with Rhasspy. The model files are downloaded by Rhasspy into the profile folder.

If you are not using Docker, you will have to build Kaldi using the link you provided (which can be pretty complex, though, as Kaldi does not build easily on ARM).

Ah, I didn't know that! I use Docker, so I will give it a try! Thanks :beers:
Does the built-in number range (1…100) work with Kaldi?

Everything works with Kaldi :wink:
Much better than Pocketsphinx.
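For reference, Rhasspy's sentences.ini templates support number ranges directly. A minimal example follows; the intent name, optional words, and slot tag are mine, and the (1..100) range syntax is from Rhasspy's training documentation:

```ini
[SetVolume]
set [the] volume to (1..100){volume}
```

The {volume} tag puts the recognized number into the intent's slots, so it works the same way regardless of whether Pocketsphinx or Kaldi does the transcription.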


Thanks again, now I know what to do this evening :rofl:

Hi @fastjack
Thanks for sharing… I will try it.
But do I need to replace the Kaldi files after every Rhasspy update?

I'm afraid so… If you download the Kaldi profile using the web UI, you'll have to replace the files again. This is a quick fix to speed up ASR transcription.


Does Kaldi support the Italian language?

Not at the moment, it seems: https://rhasspy.readthedocs.io/en/latest/reference/