Launching ESP32-S3-BOX

orrious · February 17, 2023, 7:23pm

Hey folks, has there been any further discussion about using the ESP32-S3 with a microphone array as a satellite? An ESP32-S3 is currently under $4.00 each, with plenty of availability, whereas an RPi Zero 2 W is supposed to be $15 but is actually going for north of $45.00, if you can find one.

rolyan_trauts · February 17, 2023, 8:03pm

Yeah, I got confused because Farnell were sending me emails saying they are going to supply 2x Pi02, and I never noticed that the delivery date was Feb 2024, but that is when Farnell expect to send them to me.

Actually, strangely, since a discussion with @jacopo about SpeexAec, it has always bugged me that Espressif provides blobs for much of their libs rather than hackable open source.
The AEC on the ESP32-S3-Box strangely uses a hardware loopback to a 3rd ADC channel in TDM mode.
I slept on it and rethought running AEC on the Pi: it's not an RTOS, yet as long as a single device provides in/out audio on the same clock, they stay in sync and the delay between them is not a problem.
Which makes it strange from Espressif, as the ESP32-S3 runs a true RTOS and that delay is static. Because of the AEC conversation I have been thinking today that you might not actually need the hardware loopback, because you can put the delay code in the ADC driver, which is open source, and maybe have AEC running on their esp32-s3-box-lite hardware example or any standard S3 with enough PSRAM and a 2-channel ADC.

I am thinking the ESP32-S3-Box has a hardware loopback purely because it's a very bloated all-in-one demo, and because of the difficulty of accurately measuring the delay: any code change adds instructions and so changes the delay, so measuring it is a bit of a chicken-and-egg problem.
So they went for a hardware loopback to get the smallest tail filter length, which needs the least processing and PSRAM; a dedicated wireless KWS would have far more to spare.
But on a Pi it doesn't matter, as a single clock keeps in/out audio in sync; a bigger filter length (tail) is just used so the 'echo' falls within each filter chunk, which compensates for the delay latency.
The bigger the filter length, the more work, and the time taken to sync also reduces attenuation, so in the crowded S3-Box Espressif went for a hardware loopback, but I am now thinking it's not needed.
So much of that demo can be cast off, returning a lot of clock cycles and RAM to a dedicated ESP32-S3 network KWS.
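
The trade-off between a known static delay and the filter tail length can be sketched in a few lines. This is only an illustrative pure-Python NLMS echo canceller, not Espressif's AEC or SpeexAec; the names `nlms_echo_cancel`, `bulk_delay`, `simulate` and `residual_power` are made up for the example, and the "room" is assumed to be a single attenuated copy of the speaker signal delayed by 40 samples.

```python
import random

def nlms_echo_cancel(mic, ref, taps, bulk_delay=0, mu=0.5, eps=1e-6):
    """Subtract an adaptively estimated echo of `ref` from `mic`.

    bulk_delay is a known, fixed playback-to-capture delay in samples
    (assumed constant here). Pre-delaying the reference by that amount
    lets a short filter (small `taps`) still cover the echo.
    """
    w = [0.0] * taps                      # adaptive filter weights
    out = []
    for n in range(len(mic)):
        # Latest `taps` reference samples, shifted back by bulk_delay.
        x = [ref[n - bulk_delay - k] if n - bulk_delay - k >= 0 else 0.0
             for k in range(taps)]
        echo_est = sum(wk * xk for wk, xk in zip(w, x))
        e = mic[n] - echo_est             # residual after echo removal
        norm = sum(xk * xk for xk in x) + eps
        step = mu * e / norm              # normalised LMS update step
        w = [wk + step * xk for wk, xk in zip(w, x)]
        out.append(e)
    return out

def residual_power(signal, tail=500):
    """Mean power of the last `tail` samples, i.e. after adaptation."""
    seg = signal[-tail:]
    return sum(s * s for s in seg) / len(seg)

def simulate(delay=40, n=3000, seed=0):
    """Toy data: the mic hears only a delayed, attenuated copy of ref."""
    rng = random.Random(seed)
    ref = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    mic = [0.6 * ref[i - delay] if i >= delay else 0.0 for i in range(n)]
    return mic, ref

if __name__ == "__main__":
    mic, ref = simulate(delay=40)
    for taps, bulk in [(8, 40), (8, 0), (64, 0)]:
        res = residual_power(nlms_echo_cancel(mic, ref, taps, bulk))
        print(f"taps={taps:3d} bulk_delay={bulk:3d} residual={res:.3e}")
```

In this toy setup an 8-tap filter only cancels the echo when the reference is pre-delayed by the known 40 samples; without that, the tail has to grow (64 taps here) to reach the echo, which costs more work per sample.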

I am hoping, with Hass announcing that 2023 is the Year of Voice, that one of the gurus from ESPHome or Tasmota might start looking at the bigger ESP32-S3, as Espressif already provides some key ingredients with AEC & BSS as part of their audio framework.
TFLite for Micro also has AGC in its micro frontend, and we just need the models, as any quantised model can fit the framework and work with an argmax setting.

github.com

tensorflow/tensorflow/blob/master/tensorflow/lite/experimental/microfrontend/audio_microfrontend.cc

/* Copyright 2018 The TensorFlow Authors. All Rights Reserved.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
==============================================================================*/
#include "flatbuffers/flexbuffers.h"  // from @flatbuffers
#include "tensorflow/lite/context.h"
#include "tensorflow/lite/experimental/microfrontend/lib/frontend.h"
#include "tensorflow/lite/experimental/microfrontend/lib/frontend_util.h"
#include "tensorflow/lite/kernels/internal/tensor.h"
#include "tensorflow/lite/kernels/kernel_util.h"

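
As a rough illustration of what "work with an argmax setting" could mean for a quantised KWS model, here is a minimal sketch in plain Python. It does not use the TFLite Micro API; `classify`, the label names and the threshold are hypothetical, and the scale of 1/256 with a zero point of -128 is assumed to be the quantisation of an int8 softmax output.

```python
def classify(q_scores, labels, scale=1.0 / 256, zero_point=-128, threshold=0.7):
    """Pick the winning keyword from a quantised output vector.

    q_scores are the raw int8 outputs of a (hypothetical) KWS model.
    They are dequantised to approximate probabilities, the argmax is
    taken, and None is returned when the winner is not confident enough.
    """
    probs = [(q - zero_point) * scale for q in q_scores]
    best = max(range(len(probs)), key=probs.__getitem__)   # argmax
    if probs[best] < threshold:
        return None
    return labels[best]

if __name__ == "__main__":
    labels = ["silence", "hey_box", "unknown"]   # made-up label set
    print(classify([-128, 100, -100], labels))   # confident frame
    print(classify([-20, 0, -108], labels))      # ambiguous frame
```

Because only the index of the largest score and a threshold matter, the same post-processing applies to any model with one output per keyword.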

So yeah, maybe not the $4.00 ones, but one with 8 MB PSRAM and a 2-channel ADC with mics, or I2S mics (I am a big fan of the MAX9814 & ADC though).

You need pre- and post-filter AGC (pre so you don't clip, then post to boost the filtered voice), but yeah, it's all there. Even though I have been thinking about it, it really needs an ESP32 guru to just meld the libs together, as it's all available. (I cheat with the MAX9814 to get a pre-filter AGC in silicon; like ADC modules, they are freely available on eBay & AliExpress.)
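
The pre/post arrangement can be sketched as two gain stages around whatever filtering sits in the middle. This is a simplified frame-based illustration, not the TFLite micro frontend or Espressif code; `agc_gain`, `process`, the target levels and the smoothing factor are all assumptions, and `voice_filter` stands in for the AEC/BSS stage.

```python
def agc_gain(frame, target_peak, max_gain, prev_gain, smooth=0.9):
    """Return the next gain so the frame peak approaches target_peak.

    Gain drops immediately (to avoid clipping) and rises slowly
    (to avoid pumping); max_gain caps how far quiet audio is boosted.
    """
    peak = max((abs(s) for s in frame), default=0.0)
    wanted = max_gain if peak == 0 else min(max_gain, target_peak / peak)
    if wanted < prev_gain:
        return wanted
    return smooth * prev_gain + (1.0 - smooth) * wanted

def apply_gain(frame, gain):
    return [s * gain for s in frame]

def process(frames, voice_filter, pre_target=0.9, post_target=0.5, post_max=8.0):
    """Pre-AGC (attenuate only) -> filter -> post-AGC (may boost)."""
    pre_g, post_g = 1.0, 1.0
    out = []
    for frame in frames:
        # Pre stage: max_gain of 1.0 means it can only turn the level down.
        pre_g = agc_gain(frame, pre_target, 1.0, pre_g)
        filtered = voice_filter(apply_gain(frame, pre_g))
        # Post stage: bring the filtered voice back up towards post_target.
        post_g = agc_gain(filtered, post_target, post_max, post_g)
        out.append(apply_gain(filtered, post_g))
    return out

if __name__ == "__main__":
    loud = [[2.0, -1.0, 0.5]]
    print(process(loud, voice_filter=lambda f: f))
```

With a MAX9814 in front of the ADC, the pre stage is effectively done in hardware and only the post stage would remain in software.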

Currently, due to Pi stock, I have been playing with the OrangePi02, which is prob $30 delivered and a touch faster than a Pi3b+, plus a $15 plugable USB stereo ADC sound card.

The S3 has got quite diverse now from.

To the budget '$4' type I found first. The code is flash:PSRAM, so N8R2 means 8 MB:2 MB respectively; ideally the big one at 16 MB:8 MB might be the best bet until the awaited guru manages to squeeze all the functions into something smaller. But yeah, ESP32 prices with 10x the ML/DSP performance of the standard ESP32, on an RTOS.
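
For anyone comparing listings, the flash:PSRAM code can be decoded mechanically. A small sketch, assuming the plain `N<flash>R<psram>` form only (codes with extra suffix letters are not handled); `parse_memory_code` is a made-up helper name.

```python
import re

def parse_memory_code(code):
    """Split a module code like 'N8R2' into (flash_mb, psram_mb).

    A code with no R part (e.g. 'N8') is treated as having no PSRAM.
    """
    m = re.fullmatch(r"N(\d+)(?:R(\d+))?", code.strip().upper())
    if m is None:
        raise ValueError(f"unrecognised memory code: {code!r}")
    flash_mb = int(m.group(1))
    psram_mb = int(m.group(2) or 0)
    return flash_mb, psram_mb

if __name__ == "__main__":
    for code in ("N8R2", "N16R8", "N8"):
        flash_mb, psram_mb = parse_memory_code(code)
        print(f"{code}: {flash_mb} MB flash, {psram_mb} MB PSRAM")
```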

The 8 MB PSRAM ones can be found for under $10 and are extremely tiny.