My experience so far

I got the ESP32-S3-BOX-Lite, and it seems to work just as well as the non-Lite version, so maybe that firmware can be hacked to use 2x I2S mics instead.
I haven't really done anything dev-wise yet.
I think the KWS might be non-streaming, with a fairly slow rolling window, as it seems to show the effects of that sort of method, where you can get false positives if the window and timing are off.
I'm not exactly sure of that, though, as it seems to work much better in certain positions, so it could have been more down to position.
It does work quite well and can cope with a bit of third-party noise, but at full blast its AEC doesn't seem that great.
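
To show what I mean by the window and timing being off, here is a rough Python sketch. It is only a toy model of a non-streaming detector that classifies fixed windows every `hop` milliseconds; the function names and all the numbers are made up for illustration, and I don't know that the box firmware actually works this way.

```python
def window_starts(total_ms, window_ms, hop_ms):
    """Start times (ms) of every full window that fits in the audio."""
    return list(range(0, total_ms - window_ms + 1, hop_ms))


def best_keyword_coverage(kw_start_ms, kw_len_ms, window_ms, hop_ms, total_ms):
    """Largest fraction of the keyword that lands inside any single window."""
    kw_end_ms = kw_start_ms + kw_len_ms
    best = 0.0
    for start in window_starts(total_ms, window_ms, hop_ms):
        end = start + window_ms
        # Length of the overlap between this window and the keyword.
        overlap = max(0, min(end, kw_end_ms) - max(start, kw_start_ms))
        best = max(best, overlap / kw_len_ms)
    return best


if __name__ == "__main__":
    # Hypothetical case: 800 ms keyword spoken at 1600 ms, 1 s window.
    for hop in (1000, 500, 100):
        cov = best_keyword_coverage(1600, 800, 1000, hop, 4000)
        print(f"hop={hop} ms: best window sees {cov:.0%} of the keyword")
```

With a slow hop the keyword can end up split across two windows so the model never sees it whole, which is the sort of timing sensitivity I suspect; a shorter hop closes that gap at the cost of more inference runs.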

The internal amp and speaker are tiny and it does sound like a Barbie toy, so I would only use it for bleeps and announcement sounds rather than any form of media or voice output.

@Ferberto Have they dropped in price?

https://www.digikey.co.uk/en/products/detail/espressif-systems/ESP32-S3-BOX-LITE/15967391

The standard clone dev kits have now dropped to $7.