Hi Michael, all
Reading here: https://rhasspy.readthedocs.io/en/latest/speech-to-text/, I see Vosk and Coqui STT are missing, and I'd suggest listing/integrating them.
I'm pretty enthusiastic because latencies are very low (~50 ms to ~500 ms on my PC). See some tests on my project voskjs, a simple Vosk-api Node.js wrapper. Language coverage is similar to DeepSpeech (now Coqui STT), but latencies range from twice as fast to an order of magnitude faster (e.g. using small models with grammars, you get a few tens of milliseconds for sentences of a few words)!
In voskjs I implemented a simple HTTP demo server, voskjshttp. I'd be glad to extend it for use with Rhasspy. Can you confirm that the integration could be done with the "Remote HTTP Server" as described here: https://rhasspy.readthedocs.io/en/latest/speech-to-text/#remote-http-server and here: https://rhasspy.readthedocs.io/en/latest/reference/#http-api ?
It's all pretty clear, but I have a question: using the "Remote HTTP Server" way, the client cannot specify any parameters in the POST request. Right?
If confirmed, with a minor update to voskjshttp, Rhasspy users could use it to test Vosk ASR.
Coqui STT, as you know, is a recent DeepSpeech fork made by the DeepSpeech core team developers after the Mozilla "suspension" of DeepSpeech (no polemics intended). I made CoquiSTTjs, a simple/draft Node.js wrapper.
BTW, Coqui STT, like DeepSpeech and pretty much all ASRs, is CPU-consuming and unfortunately runs on a single thread; see: https://github.com/coqui-ai/STT/discussions/1870. I'll try to implement a multi-thread/multi-process server architecture as part of my CoquiSTTjs project. If I succeed, I'll try to implement a "Remote HTTP Server" interface too.