I have had Rhasspy running for a while now, but I still have too many issues that prevent an acceptable WAF. My setup:
- Intel Celeron N4100
- 4GB Ram
- Ubuntu (Rhasspy direct install, no docker)
- PSEye Mic
- AEC with PulseAudio WebRTC
- Porcupine with “snowboy”, confidence 0.7
- Kaldi with ARPA, no open transcript, DE model, confidence 0
My specific problems:
1. Wakeword and intent recognition require very clear pronunciation
2. Intent recognition barely works when speaking fast
3. Intent recognition requires a short pause after the wakeword, so no “fluent speaking”
4. Wakeword has a very high false negative rate when watching movies/series
I already tried snowboy as the wakeword engine. It has a somewhat lower false negative rate (especially when watching movies), but a substantially higher false positive rate. I know that really solving (3) would require a feature that Rhasspy does not have: buffering the audio stream and running Kaldi on audio that starts a few ms in the “past”. For (4), I should probably go for a better AEC algorithm (maybe https://github.com/SaneBow/PiDTLN ?).
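To make the buffering idea for (3) concrete, here is a rough sketch of what I mean. It is not Rhasspy code and none of these names exist in Rhasspy: `PreRollBuffer`, `asr_input`, and the `preroll_ms` / `chunk_ms` values are all made up for illustration. The idea is to keep the last few audio chunks in a ring buffer while waiting for the wakeword, and to prepend them to the audio sent to speech recognition once the wakeword fires.

```python
from collections import deque


class PreRollBuffer:
    """Keeps only the most recent audio chunks (hypothetical helper, not a Rhasspy API)."""

    def __init__(self, preroll_ms=300, chunk_ms=30):
        # Number of whole chunks that fit into the pre-roll window (at least one).
        self.chunks = deque(maxlen=max(1, preroll_ms // chunk_ms))

    def push(self, chunk: bytes) -> None:
        # Older chunks fall out automatically once maxlen is reached.
        self.chunks.append(chunk)

    def drain(self) -> bytes:
        # Return the buffered audio in order and empty the buffer.
        data = b"".join(self.chunks)
        self.chunks.clear()
        return data


def asr_input(stream, detected_at, preroll_ms=300, chunk_ms=30):
    """Simulate what the recognizer would receive.

    `stream` is a list of raw audio chunks and `detected_at` is the index of
    the chunk at which the wakeword engine reports a detection (both are
    assumptions for this sketch).
    """
    buf = PreRollBuffer(preroll_ms, chunk_ms)
    out = []
    for i, chunk in enumerate(stream):
        if i < detected_at:
            buf.push(chunk)          # still waiting for the wakeword
            continue
        if i == detected_at:
            out.append(buf.drain())  # audio from the "past"
        out.append(chunk)            # live audio from detection onwards
    return b"".join(out)
```

With 30 ms chunks and a 60 ms pre-roll, a detection at chunk 5 would hand the recognizer chunks 3 and 4 as well, so a command spoken right after the wakeword is not cut off at the start. The pre-roll length would need tuning, since too much of it would feed part of the wakeword itself into Kaldi.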
Hence, my questions:
- Has anyone encountered similar issues (especially 1 and 2) and fine-tuned their setup accordingly?
- Is there a way to resolve 3?
- Does anyone have a really good AEC setup running (for 4)?