Vosk seems to use Kaldi under the hood, and the acoustic models it provides are the same ones used by Rhasspy’s Kaldi service.
I don’t think you’ll see any difference in accuracy between Vosk and Rhasspy’s Kaldi implementation.
@Aymux Regarding ASR, the Hermes protocol is pretty simple: start a decoding session when you receive
hermes/asr/startListening, subscribe to
hermes/audioServer/<siteId>/audioFrame to get the audio chunks to push into Vosk, and when Vosk detects an endpoint, send
hermes/asr/textCaptured. I never really understood the
hermes/asr/stopListening topic though.
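Here is a rough sketch of that flow in Python, in case it helps. It is not tested against a real Rhasspy setup. The class name AsrSession and the publish callback are made up for illustration. I'm assuming each audioFrame payload is a small WAV chunk, that the JSON payloads carry siteId and sessionId fields, and that the recognizer object exposes AcceptWaveform() and Result() the way Vosk's KaldiRecognizer does. Treating stopListening as "drop the session" is only my guess.

```python
import io
import json
import wave


class AsrSession:
    """Hypothetical bridge between Hermes ASR topics and a Vosk-style recognizer."""

    def __init__(self, recognizer_factory, publish):
        # recognizer_factory: callable returning an object with
        #   AcceptWaveform(bytes) -> bool and Result() -> JSON string
        # publish: callable(topic, payload) used to send MQTT messages
        self.recognizer_factory = recognizer_factory
        self.publish = publish
        self.sessions = {}  # siteId -> (sessionId, recognizer)

    def on_message(self, topic, payload):
        if topic == "hermes/asr/startListening":
            # Start a new decoding session for this site.
            msg = json.loads(payload)
            site_id = msg.get("siteId", "default")
            self.sessions[site_id] = (msg.get("sessionId"), self.recognizer_factory())
        elif topic == "hermes/asr/stopListening":
            # Assumption: stopListening simply cancels the running session.
            msg = json.loads(payload)
            self.sessions.pop(msg.get("siteId", "default"), None)
        elif topic.startswith("hermes/audioServer/") and topic.endswith("/audioFrame"):
            site_id = topic.split("/")[2]
            if site_id in self.sessions:  # ignore audio when not listening
                self._feed(site_id, payload)

    def _feed(self, site_id, wav_bytes):
        session_id, recognizer = self.sessions[site_id]
        # Assumption: the frame is a WAV chunk, so strip the header to get raw PCM.
        with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
            pcm = wav.readframes(wav.getnframes())
        # AcceptWaveform returns True once the recognizer detects an endpoint.
        if recognizer.AcceptWaveform(pcm):
            text = json.loads(recognizer.Result()).get("text", "")
            del self.sessions[site_id]
            self.publish(
                "hermes/asr/textCaptured",
                json.dumps({"text": text, "siteId": site_id, "sessionId": session_id}),
            )
```

With the real libraries, the factory could be something like lambda: KaldiRecognizer(model, 16000) (the sample rate is an assumption) and publish could be the publish method of a paho-mqtt client, with on_message called from the client's message callback.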
It may be a good idea to replace the bash Kaldi scripts with Vosk for better integration (the documentation is not very clear though).
@synesthesiam What do you think?
The speaker recognition is based on Kaldi xvectors. I’m wondering about the accuracy of the predictions… This is interesting.