I have to admit, this setup is running on a Celeron N4100, not a Pi. For validation I simply checked the difference between the mic activation levels (raw vs. AEC), and it is pretty clearly working… Wake word false positive rate after 4-6 h of series/movie watching was zero (at least, none due to speaker sound).
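For reference, the check was nothing fancy, roughly along these lines (a minimal sketch; the WAV file names are placeholders for two captures recorded in parallel from the raw and the echo-cancelled source while the speakers were playing):

```python
# Rough AEC sanity check: compare the overall RMS level of the raw mic capture
# with the echo-cancelled capture recorded at the same time.
# File names are placeholders - record them e.g. from the two PulseAudio sources.
import wave
import numpy as np

def rms_dbfs(path: str) -> float:
    """Return the overall RMS level of a 16-bit PCM WAV file in dBFS."""
    with wave.open(path, "rb") as w:
        assert w.getsampwidth() == 2, "expects 16-bit PCM"
        samples = np.frombuffer(w.readframes(w.getnframes()), dtype=np.int16)
    rms = np.sqrt(np.mean(samples.astype(np.float64) ** 2))
    return 20 * np.log10(max(rms, 1e-9) / 32768.0)

raw = rms_dbfs("mic_raw.wav")         # capture from the untouched mic source
aec = rms_dbfs("mic_echocancel.wav")  # capture from the echo-cancelled source
print(f"raw: {raw:.1f} dBFS, AEC: {aec:.1f} dBFS, difference: {raw - aec:.1f} dB")
```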
If you are interested in examples from the x86 machine, I can provide some.
On a side note: I didn't test the Speex method, which supposedly consumes fewer resources.
No need, though your input is welcome; I found it to work on x86, as it seemed to on my desktop.
It's just that on a Pi it only really works with quite low levels of 'echo' in the mic stream.
AEC only cancels what you play; it doesn't handle third-party noise. I presume you watched on the device itself?
Yes, you are right - I suppressed a known source (watched on the same device). The docs are not 100% clear on that, but it seems like the PulseAudio module is also supposed to do noise cancellation. However, I wouldn't assume it can suppress unknown, loud voice sources; I don't see how such a module would be able to differentiate between a "human" voice and a "TV" voice.
Maybe microphone beamforming is an option for you?
Yeah, the noise cancellation is a bit pants compared to newer tech like Nvidia Voice; the older methods leave audio artifacts that can reduce recognition.
The Arch Wiki docs are probably the best reference for PulseAudio AEC.
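For anyone landing here later, the usual starting point along the lines of the Arch Wiki is loading module-echo-cancel in default.pa roughly like this (a sketch; the source/sink names are arbitrary, and the webrtc aec_args, including the noise_suppression switch, are worth checking against your PulseAudio version):

```
# ~/.config/pulse/default.pa (or /etc/pulse/default.pa)
.ifexists module-echo-cancel.so
load-module module-echo-cancel use_master_format=1 aec_method=webrtc aec_args="analog_gain_control=0\ digital_gain_control=1\ noise_suppression=1" source_name=echocancel sink_name=echocancel1
set-default-source echocancel
set-default-sink echocancel1
.endif
```

After restarting PulseAudio, the echocancel source is the one the wake word engine should listen on.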
I have been playing with the new TFLite state-of-the-art KWS models, and with a custom dataset of my voice the crnn_state model does an amazing job of picking up just my voice; I just haven't got enough words of mine in the 'unknown' category.
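In case it is useful: once a model is exported to .tflite, a quick non-streaming test is only a few lines (a minimal sketch; the model path, the assumed [1, 16000] input shape and the label order are placeholders from my setup, and a streaming variant would additionally carry state tensors between calls):

```python
# Minimal keyword-spotting inference on an exported TFLite model.
# Model path, expected input shape and label list are placeholders.
import numpy as np
import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path="kws_model.tflite")
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]
out = interpreter.get_output_details()[0]

# One second of 16 kHz mono audio scaled to [-1, 1] (silence as a stand-in here).
audio = np.zeros(16000, dtype=np.float32)
interpreter.set_tensor(inp["index"], audio.reshape(inp["shape"]))
interpreter.invoke()
scores = interpreter.get_tensor(out["index"])[0]

labels = ["silence", "unknown", "my_wake_word"]  # placeholder label order
print(labels[int(np.argmax(scores))], float(np.max(scores)))
```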