Budget $10 AEC my annoying voice and gpu_fft

fastjack · April 24, 2020, 6:07pm

This is awesome!

I’m probably gonna order a Respeaker 2mic hat soon to try to reproduce your example ($10 is much better than the $70 for a Respeaker Mic Array v2) and see if it performs on par with the more expensive hardware solution.

Unfortunately I’m no C guru (maybe @maxbachmann can help?)

Either way, I’ll be eagerly tracking your progress, as I too think that AEC is paramount.

rolyan_trauts · April 24, 2020, 6:12pm

The respeaker is https://www.seeedstudio.com/ReSpeaker-2-Mics-Pi-HAT.html $9.90

The only downside is that the drivers are pinned to a certain kernel and only seem to work with Raspbian.
The PS3eye driver, with its ctl faults and wrong sampling rate reports, works, but it has problems on every distro.

I have been moaning at Seeed on the forums, as 64-bit Ubuntu on the Pi4 shows considerable improvements and would be really nice to have, especially since the shiny new 20.04 LTS is out with the 5.4 kernel.

There are some slightly cheaper ones floating around on ebay & aliexpress that a search for “wm8960 pi” will prob return.

Give us a shout when yours turns up; mine took quite some time to arrive due to the current situation.
I have been playing in general with some of the cheaper modules, and the 50W TPA3116 modules for $5 are pretty damn good.
I am really interested in the master/satellite model Rhasspy can do, and in what you think and try yourself.

These are the other sort and about same price delivered.

https://www.ebay.co.uk/i/133306675973?ul_noapp=true

I think the LEDS are a bit beefier on those if of interest.

I do have one of these on the way, which is nice with the 4 mics and pixel ring, but unfortunately it will require a 2nd output card and will be prone to EC-killing clock drift.

Andrew49 · April 24, 2020, 6:29pm

Could you provide a link, please?

rolyan_trauts · April 24, 2020, 6:41pm

This was the one I purchased, and I am not sure how it differs from my Sure Electronics one.

The Sure has far better build quality and the audio is better, but is it worth the price…?

I have one of these on the way just out of curiosity as it has mute and the circuit layout looks better with a smattering of higher quality audio caps. It might well be just the right cost / quality cross point.

I will let you know, but to be honest the cheap and cheerful one is pretty damn good for what it is.

Ignore the 100 watt figure as that is at 10% THD into 2 ohm, but yeah, on 24V you will get 40 watt+.
The Wondom is just pure quality, but for this purpose is it worth it?

The 1 watt per channel of the 2mic might be enough. It’s a shame it’s driven from the headphone amp rather than line out, as it may need DC blocking caps.
But the Cirrus Logic PDF posted above has all the tech details and may also help with what is at first quite a bewildering array of controls in alsamixer.

The amps I coupled to a

Which is about $10+ and gives a great and loud full range.

rolyan_trauts · April 24, 2020, 7:05pm

Much of it is about construction, as audio is very directional: if the mics sit on top of a radiating speaker, as Google and Amazon do it, with some sort of vibration isolation, you can improve echo feedback greatly.
I had the speaker pointing at me with the mic in the middle, sat on a desk as an open board next to the speaker, so much of it was vibration through the desk.

So I am absolutely sure a designed enclosure can make something much better.
I also want to try to get a filter to ‘clip’ the lower portion; I always get confused about what compressors, expanders and noise gates actually do.

maxbachmann · April 24, 2020, 9:40pm

When compiling on a raspberry pi 3 with gcc you should be able to pass

-mcpu=cortex-a53 -mfpu=neon-vfpv4

the flags for it should be

CT_ARCH_CPU="cortex-a53"
CT_ARCH_FPU="neon-vfpv4"

For the pi4 it is the cortex-a72

I did look into python bindings using the python C API for RapidFuzz recently, so that’s something I could add I guess. @fastjack and I talked about this here as well.

rolyan_trauts · April 24, 2020, 10:19pm

I should have just tried this and kept quiet about how woefully unknowledgeable I am, but can you just export the CT_FLAGS or do they have to be part of the configure script?
I will give it a whirl @maxbachmann, and if we can tempt you to have a look at speexdsp I have a hunch it may be much less painful for you.

Do you have any opinion on inserting the gpu-fft code into the speexdsp code for the FFT routines?
The 2nd trial run on the Pi3 didn’t do nearly as well as the Pi4, but it was compiled to use the FFTW3 libs instead of the internal FFT, and I didn’t spend too much time setting up the card.

Enough EC to make recognition possible is all that is needed; I am not expecting hardware DSP results, but I think many could improve on what are likely my poor attempts.
I will have a read about what you mention above.

maxbachmann · April 24, 2020, 11:25pm

You can set the CFLAGS environment variable before running make

export CFLAGS="-mcpu=cortex-a53 -mfpu=neon-vfpv4"
make all

I have never really worked with code on the GPU. For my problems so far, the added complexity was not really worth diving into yet another hell when running my code across multiple platforms ^^

rolyan_trauts · April 24, 2020, 11:42pm

I will have a play with some stuff, but as for complexity I am not sure, as I was hoping someone could hack something; I am not worthy of being called a hacker but merely molest a few lines sometimes.
You already have the code in /opt/vc/src/hello_pi/hello_fft and I was thinking it might be quite easy to grab a chunk of that math VPU voodoo and just drop it into the speexdsp fft voodoo.

Much is already written http://www.aholme.co.uk/GPU_FFT/Main.htm
I will attempt a molestation but don’t hold any hopes.
I will do my usual and collect some stuff .
http://www.peteronion.org.uk/FFT/FastFourier.html
It might well be a fruitless effort as the sample array I guess would be the tail size and we are at the smaller end.
https://community.arm.com/developer/tools-software/graphics/b/blog/posts/optimizing-fast-fourier-transformation-on-arm-mali-gpus
Thankfully I now understand why clock drift kills the analysis, thanks to peteronion for that.
We might also see a perf gain by going 64-bit, and I have almost got someone to hack the driver code so it will run on Ubuntu 20.04.
That is going to be my main approach, as a mild OC and 64-bit might easily give 30%, especially on the Pi4.
Currently (and we should call it EA, Echo Attenuation) it is excellent; in conjunction with VAD there is a working model here, but I think it can be improved greatly.
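
One way to put a number on "echo attenuation" is to compare the power of the raw mic capture with the power of the EC output while only the speaker is playing. A minimal sketch, assuming both stretches of audio are already available as lists of samples; the function name and the sample values are made up for illustration and this is not how speexdsp reports anything internally.

```python
import math

def attenuation_db(before, after):
    """Echo attenuation in dB between two sample sequences.

    before: mic samples without echo cancellation (speaker playing, nobody talking);
            must contain some signal, otherwise the ratio is undefined
    after:  the same stretch of audio after echo cancellation
    """
    power_before = sum(s * s for s in before) / len(before)
    power_after = sum(s * s for s in after) / len(after)
    if power_after == 0:
        return math.inf  # echo removed completely
    return 10 * math.log10(power_before / power_after)

# Hypothetical data: the echo is reduced to a tenth of its amplitude.
raw = [1000, -1000, 1000, -1000]
cleaned = [100, -100, 100, -100]
print(round(attenuation_db(raw, cleaned), 1))  # 20.0
```

A tenfold drop in amplitude is a hundredfold drop in power, hence 20 dB. The same measurement on two recordings of the same test wave file would give a rough way to compare setups.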

romkabouter · April 25, 2020, 7:03am

haha, I always use the exact same wave file as test

Good post, I am not good at compile parameters either.

maxbachmann · April 25, 2020, 11:53am

Yeah me neither
Simply had a quick look at the arm docs https://gcc.gnu.org/onlinedocs/gcc/ARM-Options.html

rolyan_trauts · April 25, 2020, 2:52pm

I have done a bit of research and even got a reply back from Andrew Holme, which was very gracious of him.
gpu_fft dates from the Pi 1 days and shouldn’t deliver more than the NEON routines, so that answers that.

It really doesn’t matter, because the echo attenuation levels are excellent enough that with VAD processing you should easily be able to identify voice activity above any voice that may be in the echo stream.

I have been told repeatedly that software AEC on the Pi just doesn’t work and is high load, and that is just untrue.
I cannot remember if VAD is the RMS of a voice bandpass or if it’s some sort of LMS, but because we have extremely good echo attenuation without the need for DSP hardware, there is a plethora of methods you could use to duck/cork playing media on voice detection.
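
The simpler of those two ideas, an RMS threshold per frame, can be sketched as below. This assumes 16-bit samples that have already been band-passed to the voice range; the frame size and threshold are arbitrary illustrative values, and real VAD implementations are more involved than this.

```python
import math

def frame_rms(frame):
    """Root-mean-square level of one frame of samples."""
    return math.sqrt(sum(s * s for s in frame) / len(frame))

def detect_voice(samples, frame_size=160, threshold=500.0):
    """Return one flag per full frame: True where the RMS exceeds the threshold.

    frame_size=160 is 10 ms at 16 kHz. The threshold is a made-up value that
    would need tuning against the residual echo level left after EC.
    """
    flags = []
    for start in range(0, len(samples) - frame_size + 1, frame_size):
        frame = samples[start:start + frame_size]
        flags.append(frame_rms(frame) > threshold)
    return flags

# Hypothetical input: one quiet frame (residual echo), then one loud frame (voice).
quiet = [50, -50] * 80
loud = [3000, -3000] * 80
print(detect_voice(quiet + loud))  # [False, True]
```

A True flag is the point where playing media could be ducked or corked; the better the echo attenuation, the lower the threshold can sit without the playback itself tripping it.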

The cost of multiple units to create satellite systems was a huge barrier to me, and prob to many, as mic systems alone costing the price of a standard-size Google Home is a reality cul-de-sac.
It’s just a shame Raspberry and Broadcom have continued with VC4, as the above would probably be very different with a more modern Mali GPU.
It doesn’t matter though, as with a Pi3A+ and a $10 soundcard you can build a complete, working, media-capable voice AI for about $50-70, which was the price of the mic alone.

@maxbachmann has sent me enough info that my OCD will continue to try a few more optimisations.

I have been on the Respeaker forum and github asking kindly for a kernel 5.4 driver and Ubuntu install as currently the drivers firmly fix you to Raspbian and also a fixed kernel version.

I think now that we have a new and shiny Ubuntu 20.04 LTS, the substantial performance improvements for a Pi4 will do no harm for a Pi3, even if they are much less noticeable.
Going to 64-bit on a Pi4 might get a 20%+ perf improvement with FFT. The Pi is strong with 64-bit Ubuntu, so if anyone else fancies making a request on github please do so.
I have a feeling Respeaker have upstreamed to kernel 5.5, hence the replies I got.
The lib API naming structure has changed as we have moved through kernels, and with a smattering of knowledge and a bit of trial and error it should be possible to have a 4.19 & 5.4 branch of the driver, as it’s going to be a long time before the next LTS.

Both the Pi3 & Pi4 OC well, with the Pi4 being an OC monster: get yourself an armour case, put a 30-40mm 12V fan on the 5V, and you can have a safe, stable 1.8GHz clock that is still silent. The speed and noise level of a 12V fan on 5V works out just about perfect, without the extra cost and hassle of fan control.

I have been really surprised by the lack of audio pre-processing tools, and that a standard hasn’t been created so that models are recorded against it.
These are machines we are talking to and not your mum, so why we are processing to give crystal clear human parameters is confusing to me, but hey.

I think it’s likely we might get a 5.4 kernel driver relatively soon, especially if github and the respeaker forum get requests for one.

Thanks to some simple info from @maxbachmann I am going to play with some FFT libs just to see if that avenue is a cul-de-sac, but there is still webrtc, which is supposed to be a better implementation.
I do have reservations about what Freedesktop did with the drift compensation additions, but I really need to test it and will get to that eventually.

The speexdsp alsa-plugins are not in the Raspbian distro because alsa-plugins requires a later version of speexdsp than the one included.
You can compile them, and alsa-plugin speexdsp EC should then be as easy as creating an asound.conf. I have tried that, and unlike the working EC based on speexdsp, the alsa-plugins strangely don’t seem to work.

For the working version of EC I posted an install guide here.

voice · May 2, 2020, 12:34am

Hello. I’m one of those who bought the ps3 eye because of a popular Snips review and other recommendations. On top of that, the Respeaker v2 (4 mics) is limited to 16kHz audio playback (for the AEC to work) and it is too expensive.

Why do you say it is a bad choice? My experience with it is that having 4 microphones in a mic array is very interesting. I’m fine-tuning a pulseaudio configuration that can do echo cancellation, noise reduction, beamforming and voice activation in software. I know people usually dislike pulseaudio, but it is currently a very powerful software solution. So my RPi4 wastes about 10% of the CPU all the time while running the Precise wakeword. My setup isn’t perfect yet (the Mycroft Precise wakeword takes too many tries to kick in), but when you listen to the recorded audio files, you can hear that pulseaudio is doing wonders. Really.

So, my setup is the following: 1) Create a filter for pulseaudio with echo cancellation and add an increase in volume around 350%; 2) Set that as the default source for pulseaudio; 3) Set alsa to use pulse for recording; 4) Use the arecord option (it seems that there is no support for pulseaudio directly).

In order to try to squeeze more performance, I’ve installed Debian AArch64 on the RPI4. Everything that matters is working operating system wise.

My problem is still the wake word. Although the pulseaudio magic is improving the audio a lot, I wonder if it is excessive…I’m still trying to tune the parameters (use VAD? use noise reduction? Volume increase of how much? etc).

What I don’t like about those pi hats is that you have to use a non-free driver (and sometimes tools).

rolyan_trauts · May 2, 2020, 1:07am

I am exactly the same but it was my initial reading of the Mycroft introduction that got me to source my PS3eye.

The PS3eye driver is a hack, and its name gives a hint to its purpose and platform.
Dmesg will complain on each boot, and alsactl will fail if you try to load or access ctl values.
It’s confusing more than anything.

But the main problem with separate cards for capture and playback is that they run from 2 different clock sources that will drift.
The killer here is the way an FFT filter works, as it sort of creates a spectrogram by measuring the intensity in each frequency band.
Any drift means the increment can drop into the wrong band, and things can start to go drastically, exponentially wrong.
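
To get a feel for the numbers, here is some rough arithmetic on how far two free-running clocks slip apart. The 100 ppm mismatch is an assumption for illustration, not a measurement of any of the cards mentioned in this thread.

```python
def drift_samples(sample_rate_hz, ppm_difference, seconds):
    """Samples by which playback and capture slip apart after `seconds`."""
    return sample_rate_hz * ppm_difference * seconds / 1_000_000

def seconds_until_drift(sample_rate_hz, ppm_difference, samples):
    """Time taken for the two streams to slip apart by `samples` samples."""
    return samples * 1_000_000 / (sample_rate_hz * ppm_difference)

rate = 16000  # Hz
ppm = 100     # assumed mismatch between the two clock sources

print(drift_samples(rate, ppm, 60))         # 96.0 samples after one minute
print(seconds_until_drift(rate, ppm, 160))  # 100.0 seconds to slip a 10 ms frame
```

So under that assumption the reference the canceller is given moves by a whole 10 ms frame every couple of minutes, and the filter has to keep re-adapting to a target that never stays still. A single card with one clock for both directions has no such slip.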

Pulseaudio is bloat, but I have no objection to it, and the code of webrtc_audio_processing is supposedly vastly superior to speex.
I cannot run pulseaudio with my respeaker 2mic, and I so want to, but the driver service crashes, and why the hell a driver has a service is beyond me.
There is no comparison between AEC with a PS3eye + webrtc and a Respeaker 2mic + speex: it is vastly better with the Respeaker 2mic + speex, even though speex is supposedly the inferior code.

Webrtc-audio-processing does have drift compensation, but it is relatively sucksville against the worst-case scenario of USB & I2S sources for playback and capture.
It’s not just the ps3eye: running USB & I2S of the inbuilt is just about the worst method, and will generally kill any EC code due to huge clock drifts.

I so wish to get pulseaudio to run with the respeaker 2 mic, as I am itching to see how webrtc performs when it’s not hamstrung by the worst Linux audio drift scenario there is, exacerbated by the relatively slow clocks of the SoCs we use.

There are some other alternatives: on a Pi3 it is possible to use 2 I2S channels for the mics and 2 I2S channels for the DAC.
With a search on aliexpress you can find very cheap modules for the Pi3, and with a google you could prob get it running quite easily, as they share the same I2S clock.
Shame it supposedly doesn’t work on a Pi4.

That made me think, hold on, what’s USB like when it’s on the same clock? I am testing that at this very moment, or would be, as I found a cheap USB sound card that has the rarity of a stereo ADC.
So I am just testing how a USB sound card copes with clock drift and EC, as it avoids the Respeaker I2S drivers, and so does DIY I2S.
Passive mics on a soundcard might be less than stellar, but if clock drift isn’t a problem then use powered mic modules and wire them up yourself.

From low-cost electret ones, to ones with GPIO-controlled hardware gain and ALC, to MEMS, all could produce excellent results.

If you can get the I2S method running on a Pi3 then I would hazard a guess that clock drift is not a problem.
That just made me think that an all-in-one USB card of any kind might not be all that bad; I just don’t recommend separate playback & capture hardware that has multiple, different clock sources.

For many, it is a huge problem if voice AI cannot ‘barge in’ over whatever media is playing.
It’s untrue that software AEC does not work on a Pi; you need a Pi3 or above, but it definitely works as long as you don’t kill it with clock drift.

It’s not just the PS3eye: recommending separate capture & playback hardware isn’t a good idea, and when you couple that with the other problems of the PS3eye, why did someone recommend what is essentially junk for many voice AI situations?

Adafruit & Sparkfun do all the above and also have a wealth of info and support.

The Enermax AP001E DreamBass USB Soundcard & Syba Sd-aud20101 have stereo ADC and a thing on aliexpress often called a S1 and looks like this is also supposedly stereo on mic input.

I still have to verify that, but ebay/amazon all seem to sell them.
A stereo ADC and 2 mics can make a big leap in sensitivity, and it is also possible to do software DoA (Direction of Arrival).
You can even have mics in parallel on a channel, but overall adding mics and channels gives diminishing returns: above 4, the simple physics of geometry and the speed of sound come into play, and packing more into a small space doesn’t make the DSP better unless it gets faster in step with the increase in mic quantity and the reduction in spacing.
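
A rough sketch of the software DoA idea for two mics: find the lag (in samples) at which the two channels line up best, then turn that delay into an angle using the mic spacing and the speed of sound. The 6 cm spacing, the 16 kHz rate and the test signal are assumptions for illustration, and this brute-force correlation is only the simplest way to estimate the delay.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, at roughly room temperature

def best_lag(left, right, max_lag):
    """Lag in samples (positive = right channel delayed) with the highest correlation."""
    best, best_score = 0, float("-inf")
    n = len(left)
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i in range(n):
            j = i + lag
            if 0 <= j < n:
                score += left[i] * right[j]
        if score > best_score:
            best, best_score = lag, score
    return best

def angle_degrees(lag, sample_rate_hz, mic_spacing_m):
    """Angle of arrival measured from straight ahead (broadside to the mic pair)."""
    delay_s = lag / sample_rate_hz
    x = SPEED_OF_SOUND * delay_s / mic_spacing_m
    x = max(-1.0, min(1.0, x))  # clamp: coarse lag steps can overshoot the geometry
    return math.degrees(math.asin(x))

# Hypothetical capture: the same pulse reaches the right mic 2 samples later.
left = [0, 0, 1, 3, 1, 0, 0, 0, 0, 0]
right = [0, 0, 0, 0, 1, 3, 1, 0, 0, 0]
lag = best_lag(left, right, 4)
print(lag)  # 2
print(angle_degrees(lag, 16000, 0.06))
```

With these assumed numbers the full sweep from one side to the other spans fewer than three samples of delay each way (0.06 m / 343 m/s is about 175 µs, or 2.8 samples at 16 kHz), which is the geometry and speed-of-sound limit in action: closer spacing needs a faster sample rate to resolve the same angles.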

I also hate the 2mic drivers as they are as flaky as hell, but they might be all we have, as the alternatives don’t work well enough for EC to be considered working.
Also, you need the minimum processing power of somewhere around a Pi3.

Audio and voice attract a lot of interest, and as the SoCs and CPUs we use for AI get stronger, software methods will become a much more valid alternative; with hardware DSP also getting ever cheaper, choice becomes more of a consideration.
I haven’t seen anyone yet with a multicore x86 monster, a Turing GPU and hardware DSP as a master, but there is a whole range of solutions, and the satellite/master hierarchy is an extremely efficient and effective model.

If there are any C/Python gurus out there please have a look at the pulseaudio and alsaplugin implementations as they are not that great and alsa-plugin webrtc_audio_processing would be really great.

rolyan_trauts · May 13, 2020, 1:40pm

So I got the Enermax USB and it works well; DC noise is lower than the respeaker, but the DreamBass, as it’s called, is too much for me.
It took some time, as the 1st mic I ordered without much attention was a 4-pole. I do have some powered mic modules now, but I am impressed with how it handles a cheap passive mic.
The bass is not a major thing, as I am thinking that if I add approx a 1uF DC blocking cap to the amp input it will attenuate that bass a bit.
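
How much bass a series cap removes depends on the amp’s input resistance, through the usual first-order high-pass formula f = 1 / (2πRC). A quick sketch; the 1uF comes from above, but the 10k input resistance is a guess and not a figure from any datasheet in this thread.

```python
import math

def highpass_cutoff_hz(resistance_ohm, capacitance_farad):
    """-3 dB corner of a series capacitor feeding a resistive input."""
    return 1.0 / (2.0 * math.pi * resistance_ohm * capacitance_farad)

def gain_db(freq_hz, cutoff_hz):
    """Gain of a first-order high-pass at freq_hz (0 dB means untouched)."""
    ratio = freq_hz / cutoff_hz
    return 20.0 * math.log10(ratio / math.sqrt(1.0 + ratio * ratio))

cutoff = highpass_cutoff_hz(10_000, 1e-6)  # assumed 10k input, 1uF cap
print(round(cutoff, 1))                    # 15.9
print(round(gain_db(cutoff, cutoff), 1))   # -3.0 at the corner frequency
```

With that assumed 10k the corner lands around 16 Hz, which would barely touch audible bass; the corner moves up in proportion as the cap or the input resistance gets smaller, so the real input resistance of the amp is the number to check first.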

I also got a Syba SD-AUD20101, but be aware: although the packaging and label all say so, Linux reports this to be a C-Media chip with a mono ADC and not the VIA chip expected.

So with speexdsp aec and alsa here is the test.

Brilliant I thought as no horrid respeaker drivers, stereo ADC and some pretty damn good AEC results.
So off to pulseaudio and webrtc.

May 13 14:14:30 raspberrypi pulseaudio[608]: E: [alsa-source-USB Audio] module-echo-cancel.c: Doing resync
May 13 14:14:30 raspberrypi pulseaudio[608]: E: [alsa-source-USB Audio] module-echo-cancel.c: Playback too far ahead (37856), drop
May 13 14:14:49 raspberrypi pulseaudio[608]: E: [alsa-source-USB Audio] module-echo-cancel.c: Doing resync
May 13 14:14:49 raspberrypi pulseaudio[608]: E: [alsa-source-USB Audio] module-echo-cancel.c: Playback too far ahead (131103), dro
May 13 14:14:49 raspberrypi pulseaudio[608]: E: [alsa-source-USB Audio] module-echo-cancel.c: Doing resync
May 13 14:14:49 raspberrypi pulseaudio[608]: E: [alsa-source-USB Audio] module-echo-cancel.c: Playback too far ahead (75977), drop
May 13 14:14:50 raspberrypi pulseaudio[608]: E: [alsa-source-USB Audio] module-echo-cancel.c: Playback too far ahead (62480), drop
May 13 14:14:51 raspberrypi pulseaudio[608]: E: [alsa-source-USB Audio] module-echo-cancel.c: Playback too far ahead (13404), drop
May 13 14:14:52 raspberrypi pulseaudio[608]: E: [alsa-source-USB Audio] module-echo-cancel.c: Playback too far ahead (10231), drop
May 13 14:14:53 raspberrypi pulseaudio[608]: E: [alsa-source-USB Audio] module-echo-cancel.c: Playback too far ahead (9131), drop
May 13 14:14:54 raspberrypi pulseaudio[608]: E: [alsa-source-USB Audio] module-echo-cancel.c: Playback too far ahead (9136), drop
May 13 14:14:55 raspberrypi pulseaudio[608]: E: [alsa-source-USB Audio] module-echo-cancel.c: Playback too far ahead (9651), drop
May 13 14:14:56 raspberrypi pulseaudio[608]: E: [alsa-source-USB Audio] module-echo-cancel.c: Playback too far ahead (9915), drop
May 13 14:14:57 raspberrypi pulseaudio[608]: E: [alsa-source-USB Audio] module-echo-cancel.c: Playback too far ahead (9814), drop
May 13 14:14:58 raspberrypi pulseaudio[608]: E: [alsa-source-USB Audio] module-echo-cancel.c: Playback too far ahead (10011), drop
May 13 14:15:00 raspberrypi pulseaudio[608]: E: [alsa-source-USB Audio] module-echo-cancel.c: Playback after capture (-580), drop
May 13 14:15:03 raspberrypi pulseaudio[608]: E: [alsa-source-USB Audio] module-echo-cancel.c: Playback after capture (-158), drop
May 13 14:15:20 raspberrypi pulseaudio[608]: E: [alsa-sink-USB Audio] alsa-sink.c: ALSA woke us up to write new data to the device
May 13 14:15:20 raspberrypi pulseaudio[608]: E: [alsa-sink-USB Audio] alsa-sink.c: Most likely this is a bug in the ALSA driver 's
May 13 14:15:20 raspberrypi pulseaudio[608]: E: [alsa-sink-USB Audio] alsa-sink.c: We were woken up with POLLOUT set -- however a

We still get resync messages, and to be honest if someone else wants to find out why with PA, please do, but listen to this.

I have no idea why it’s so bad, and now I need to check my ps3eye, as I am sure even that produced better webrtc results.
But to be honest I am not that bothered, as without the driver problems of the respeaker 2mic my £6 + p&p usb sound card with speex aec and alsa is working extremely well.

I could never get the latency of the ec as it always errors, but the latency of the hardware is much lower and I actually had the delay set to 75.

pi@raspberrypi:~ $ alsabat -Dplughw:Dongle --roundtriplatency
alsa-utils version 1.1.8


Start round trip latency
Entering playback thread (ALSA).
Set period size: 45  buffer size: 90
Get period size: 45  buffer size: 90
Playing generated audio sine wave
Entering capture thread (ALSA).
Set period size: 45  buffer size: 90
Get period size: 45  buffer size: 90
Recording ...
Test1, round trip latency 10ms
Test2, round trip latency 10ms
Test3, round trip latency 10ms
Test4, round trip latency 10ms
Test5, round trip latency 11ms
Final round trip latency: 10ms
Playback completed.
Capture completed.

Return value is 0

fastjack · May 13, 2020, 3:37pm

Interesting! Can you provide links to the hardware you are using?

rolyan_trauts · May 13, 2020, 3:56pm

https://www.scan.co.uk/products/enermax-ap001e-dreambass-usb-soundcard-plus-earphones-genie-with-integrated-80-hz-plus6-db-bass-boos

The syba SD-AUD20101 should also be the same and not a c-media chip.
There is that s1 thing I still have to check but the AEC isn’t bad.

I have been meaning to chat to you fastjack about a CNN KWS as wondered what your opinion would be.

But it seems any single-clock-source playback/capture card works OK with speexdsp; in fact I just gave it a go with a 10ms delay and yeah, it was a bit better than the above.
Prob any usb soundcard will work.
But the VIA chip in the above is actually a pretty good chip, just a bit old now.

Also, with some tests it seems to work better with the card’s AGC turned off.
I think the static volume helps the AEC, as with AGC the waveform is prob bouncing in amplitude, but AGC does make another noticeable difference.
It would prob be better to use software AGC after the input and AEC process if AGC is to be used.

I still don’t know why the pulseaudio AEC produced such bad results, as I finally got away from the respeaker drivers, but the results are woeful.

Going to give it another go and see if it can be improved.

fastjack · May 13, 2020, 8:04pm

I’d be happy to discuss KWS.

I think the wakeword is indeed the last piece of the puzzle. The ASR (Kaldi, DeepSpeech), NLU (Rhasspy, Snips, Rasa) and TTS (PicoTTS, Mbrola, MaryTTS) all have good open source solutions.

For KWS:

Precise seems hard to train (needs so much data…).
Porcupine does not work well for non-English languages, needs recurring third-party renewal of the model and is closed source.
Snowboy is shutting down and is closed source.

Can you create a new topic to discuss what you have in mind?

Cheers

rolyan_trauts · May 14, 2020, 3:38pm

I also tested the not-VIA Syba and can only say it’s prob just a cheap C-Media inside, but it’s not the VIA on the label.
Capture is less sensitive but seems pretty reasonable, though it will be just mono input.

rolyan_trauts · May 27, 2020, 7:06am

@fastjack

PS a write up on the 4mic clones that are doing the rounds; hate those respeaker drivers though.