Looks like solid research. It may also be possible to run it on a Pi, as it has 3M parameters, similar to DTLN-aec 256.
I have forgotten what load DTLN-aec finally had with the 128 when you completed your great implementation.
I posted it purely as an interesting neural implementation, but when you start loading up with NS, EC, DOA, beamformer & KWS for, say, a satellite, it does start to add up.
A beamformer for 2 mics basically needs to delay one channel and sum…
https://nrr.mit.edu/sites/default/files/documents/Conventional%20Beamforming%20-%20Introduction.html
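To make the delay-and-sum idea concrete, here is a rough sketch in plain Python. It is not taken from the linked page; the function names, the far-field assumption, the whole-sample delay and the 343 m/s speed of sound are all my own assumptions for illustration.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value


def steering_delay_samples(mic_spacing_m, angle_deg, sample_rate):
    """Whole-sample delay between two mics for a far-field source.

    angle_deg is measured from broadside (0 = straight ahead of the pair).
    """
    delay_s = mic_spacing_m * math.sin(math.radians(angle_deg)) / SPEED_OF_SOUND
    return round(delay_s * sample_rate)


def delay_and_sum(mic_a, mic_b, delay):
    """Delay one channel by `delay` samples, then average the two channels.

    A positive delay holds back mic_a, a negative delay holds back mic_b.
    Samples shifted in from before the start are treated as silence.
    """
    n = min(len(mic_a), len(mic_b))
    out = []
    for i in range(n):
        ia = i - max(delay, 0)
        ib = i - max(-delay, 0)
        a = mic_a[ia] if ia >= 0 else 0.0
        b = mic_b[ib] if ib >= 0 else 0.0
        out.append(0.5 * (a + b))
    return out
```

With an assumed 60 mm spacing, `steering_delay_samples(0.06, 90, 48000)` gives 8 samples while the same call at 16000 gives only 3, so the steering resolution is much coarser at the KWS rate. A real implementation would probably want fractional delays, which this sketch skips.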
It’s annoyed me for a while, as we don’t need a super accurate DOA & beamformer; we just need a basic DOA & beamformer that can work in conjunction with noise-resilient models, so that the overall accuracy is much higher than the sum of the parts.
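By "basic DOA" I mean something about as simple as the sketch below: brute-force cross-correlation over a handful of lags, then converting the best lag to an angle. The names, the 343 m/s constant and the far-field geometry are assumptions for illustration, and a real version would likely use something like GCC-PHAT rather than raw correlation.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value


def estimate_lag(mic_a, mic_b, max_lag):
    """Return the lag (in samples) at which mic_b best matches mic_a.

    A positive result means the sound reached mic_b later than mic_a.
    Lags are tried smallest-magnitude first, so silence yields 0.
    """
    n = min(len(mic_a), len(mic_b))
    best_lag, best_score = 0, float("-inf")
    for lag in sorted(range(-max_lag, max_lag + 1), key=abs):
        score = 0.0
        for i in range(n):
            j = i + lag
            if 0 <= j < n:
                score += mic_a[i] * mic_b[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def lag_to_angle_deg(lag, mic_spacing_m, sample_rate):
    """Convert a sample lag to an angle from broadside, in degrees."""
    s = lag * SPEED_OF_SOUND / (sample_rate * mic_spacing_m)
    s = max(-1.0, min(1.0, s))  # clamp so noisy lags cannot break asin
    return math.degrees(math.asin(s))
```

The lag that comes out of `estimate_lag` is also the delay a delay-and-sum beamformer needs, so the DOA step and the steering step can share the same number. Note that a single 2-mic pair cannot tell front from back with this method.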
Both likely want to run at a higher sampling rate than the 16k of KWS, which is no problem as most hardware already resamples from 44.1k (or 48k or whatever).
But to have a steerable beamformer that can provide 4-position 90° attenuation would be a great leap.
Google have definitely gone this route with lower end 2 mic hardware.
Also, with distributed wireless satellites, KWS should be stream-selectable based on the best KW hit, giving a more pro conference-style array at Pi3A+ prices.
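The stream selection could be as simple as the sketch below, assuming each satellite reports a KWS confidence score for the same wake event. The function name, the score range of 0 to 1 and the 0.5 threshold are hypothetical, not from any existing project.

```python
def pick_best_stream(hits, min_score=0.5):
    """Pick the satellite whose KWS score is highest for one wake event.

    hits maps a satellite id to its keyword confidence (assumed 0..1).
    Returns None when no satellite clears min_score.
    """
    best_id, best_score = None, min_score
    for sat_id, score in hits.items():
        if score >= best_score:
            best_id, best_score = sat_id, score
    return best_id


# Example: the kitchen satellite heard the keyword best, so use its stream.
chosen = pick_best_stream({"lounge": 0.62, "kitchen": 0.91, "hall": 0.40})
```

In practice the server would need a short window to collect the hits from all satellites before choosing, which is left out here.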