Including the Sentence leading to the hotword for Increased Functionality

shedz · January 12, 2020, 2:30pm

I’m trying to build a project on a Pi 3 in which the last sentence leading up to the hotword is also taken into account.
My plan so far is to keep speech-to-text switched on at all times, transcribing what is said and overwriting the transcript after each pause in speech. Once the hotword has been spoken, the sentence before it is analysed for commands as well as the one that follows.
Thus it would be possible to say “Turn the light on, Rhasspy” as well as “Rhasspy, turn the light on.” I would also test the limits of the Pi 3 to see whether response times stay within an acceptable range.
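As a rough sketch of that idea (not Rhasspy's API; the hotword string, the `COMMANDS` table, and all function names are made up for illustration), the transcript could be split at the hotword and both halves checked for a known command:

```python
from typing import Optional, Tuple

HOTWORD = "rhasspy"  # assumption: the hotword appears as a plain word in the transcript

# Hypothetical phrase-to-intent table, only for this example.
COMMANDS = {
    "turn the light on": "light_on",
    "turn the light off": "light_off",
}


def normalize(text: str) -> str:
    """Lowercase the text and replace punctuation with spaces."""
    cleaned = "".join(c if c.isalnum() or c.isspace() else " " for c in text.lower())
    return " ".join(cleaned.split())


def split_on_hotword(transcript: str) -> Optional[Tuple[str, str]]:
    """Return (text before hotword, text after hotword), or None if no hotword."""
    words = normalize(transcript).split()
    if HOTWORD not in words:
        return None
    i = words.index(HOTWORD)
    return " ".join(words[:i]), " ".join(words[i + 1:])


def match_command(sentence: str) -> Optional[str]:
    """Return the intent of the first known phrase contained in the sentence."""
    for phrase, intent in COMMANDS.items():
        if phrase in sentence:
            return intent
    return None


def resolve(transcript: str) -> Optional[str]:
    """Check the sentence after the hotword first, then the one before it."""
    parts = split_on_hotword(transcript)
    if parts is None:
        return None
    before, after = parts
    return match_command(after) or match_command(before)
```

Checking the trailing sentence first is an arbitrary choice here; if both sides contain a command, some other rule (for example, the most recent one) might suit better.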
Once completed I would make it available for the Rhasspy community in case anybody wanted to toy with it.
I created this topic to discuss the best way to implement this and to collect any advice anyone is willing to share.
Thanks in advance

sepia-assistant · January 17, 2020, 10:48am

What systems like Amazon Alexa typically do is buffer the audio; when they detect a hot-word/wake-word, they transcribe the buffered audio to get the full sentence.
I’ve thought about this a lot in connection with SEPIA, but it’s a very tricky, time-consuming task to get right.
The reason wake-word detection and speech recognition are kept separate is resource management. WW detection runs with few resources but, as a result, can only recognize very few words. ASR can handle a very large vocabulary but is very slow and resource-intensive.
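The buffering part on its own is simple; a minimal sketch, assuming audio arrives as fixed-size byte frames and using a made-up `AudioRingBuffer` class (not taken from SEPIA or Rhasspy), could look like this:

```python
from collections import deque


class AudioRingBuffer:
    """Keeps only the most recent audio frames (hypothetical helper)."""

    def __init__(self, max_frames: int) -> None:
        # A deque with maxlen silently drops the oldest frame when full.
        self._frames = deque(maxlen=max_frames)

    def push(self, frame: bytes) -> None:
        """Store one frame as it comes in from the microphone."""
        self._frames.append(frame)

    def snapshot(self) -> bytes:
        """Return the buffered audio, oldest frame first, as one byte string."""
        return b"".join(self._frames)
```

When the wake-word detector fires, `snapshot()` would give the audio that preceded it, which could then be handed to the ASR. The hard parts (choosing the buffer length, finding the sentence start, keeping latency low) are not covered by this.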

shedz · February 12, 2020, 12:06pm

Maybe DeepSpeech will be the solution. It provides almost acceptable offline transcription times on the Pi 4 (1.9 s, where 1.6 s would still be acceptable) with a good vocabulary. Times can improve further within a restricted context and thus a limited vocabulary.
Benchmarks