PulseAudio beamforming rethink

I have been playing around with the PulseAudio beamformer and am actually getting great results.

I think last time, because the main focus was AEC (which is pretty lousy, failing at a fairly low noise threshold) and because the beamformer wasn't steerable (what's the point in that?), the idea was binned.

I think I might have been using a PS3 Eye, which might have been the problem, as its mics are only something like 10 or 20 mm apart, and I was probably also using a 16 kHz sample rate.

The speed of sound is 343 m/s, so dividing that by 16 kHz gives 0.0214 m, or about 21 mm per sample. Mics spaced that closely can only ever differ by around a single sample of delay at best (a 20 mm spacing spans less than one sample), which is likely why my tests stunk.
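As a quick sanity check on that arithmetic (343 m/s and the 10-20 mm spacings are from above):

```shell
# Per-sample spatial resolution: speed of sound / sample rate.
awk 'BEGIN { printf "%.1f mm per sample at 16 kHz\n", 343 / 16000 * 1000 }'
# A 20 mm mic spacing therefore spans under one sample of delay:
awk 'BEGIN { printf "%.2f samples max delay across 20 mm\n", 0.020 / (343 / 16000) }'
```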

So here is something much easier: two files from a 2-mic array running PulseAudio beamforming, with a USB speaker at 45 degrees behind it, and another mic to the left facing the 2-mic array, the speaker, and me.

So nobeam.wav should give you an idea of the real levels of my voice and of the noise from the speaker, while beamform.wav is the result you get from the PulseAudio beamformer with a 2-mic, 96 mm array @ 16 kHz.

https://drive.google.com/open?id=1hOoNGuWBm0tlZMdFJjC3qmHVQSW0U3c9

https://drive.google.com/open?id=1rlNIv0BDxjCZKQGqXFbzVmBvfIaVKFQc

If you listen to the second, you will realise that, hey, we do have beamforming, and it works really well.
I think last time the ReSpeaker 2-mic caused all sorts of problems with PulseAudio, and I stupidly had it lying flat rather than vertical and facing the voice actor, so I wrote off PulseAudio beamforming as a bad job when it is really rather excellent.

So I have been having a rethink, as I can steer this by merely unloading the echo-cancel module and reloading it with new steering coordinates.
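A minimal sketch of that unload-and-reload steering, assuming PulseAudio's module-echo-cancel with the webrtc method. The mic_geometry matches the 96 mm 2-mic array described above; the target_direction values (azimuth, elevation, radius, with angles in radians) are just an example, not the ones I used:

```shell
# Unload the current echo-cancel instance, then reload it steered to a new
# direction. mic_geometry is x,y,z per mic in metres; target_direction is
# azimuth,elevation,radius -- both passed through aec_args to webrtc.
pactl unload-module module-echo-cancel
pactl load-module module-echo-cancel aec_method=webrtc \
    aec_args='"beamforming=1 mic_geometry=-0.048,0,0,0.048,0,0 target_direction=1.57,0,1"'
```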

I added some rough examples of a KWS-triggered TDoA in GitHub - StuartIanNaylor/g-kws: Google Streaming KWS Arm Install Guide, as the biggest problem is that the beam bounces around on any voice, so fixing it on the keyword works really well.

PS: if you have a ReSpeaker USB, there is a freeze command, so you can also lock the beamforming on a KW hit and hold it with a streaming KWS for the duration.
https://wiki.seeedstudio.com/ReSpeaker_Mic_Array_v2.0/#faq
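If memory serves, that freeze is exposed through the tuning.py script in Seeed's usb_4_mic_array repo as the FREEZEONOFF parameter; treat the exact names here as assumptions and check the wiki page above:

```shell
# Hypothetical sketch: freeze the adaptive beamformer on a KW hit and
# release it once the streaming KWS finishes (FREEZEONOFF: 1 = freeze).
python tuning.py FREEZEONOFF 1
# ... stream the utterance ...
python tuning.py FREEZEONOFF 0
```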

It's a shame the 2-mic HATs seem to fight with PulseAudio, but maybe someone will come up with a solution to that. To be honest, I prefer electrets simply soldered to a stereo 3.5 mm jack plug and mounted in grommets in any enclosure.

For testing I used just a bit of aluminium angle, which ended up shorter than intended because the split part of the cable was a tad too short, but it is just 2x electrets plugged directly into a stereo USB sound card.


The only bad thing is the short range in the Y direction, though I presume with a 2-mic setup you don't get Y geometry anyway, just X.
It's not all that bad, as you can run 2x inference (one on a single mic, one on the beamformer) and choose the best stream.

I still sort of think the best and easiest way is a single unidirectional electret on USB, as non-DIYers can just use a $5 eBay mic.

Having a system that totally lacks support for zones and multi-mic zones is extremely limiting, as distributed low-cost KWS devices acting as a wide-array KWS is one of the cheapest and best setups to implement.

Wow, your result is so good. I notice the WebRTC beamforming algo is much better than a lot of others, but my attenuation level is not as good as yours. It seems the hardware setup matters a lot.

How far are you and the speaker away from the mics?

It's probably the unidirectional mics, as that was part of my test: the rear ports cancel sound from behind by applying pressure to the rear of the mic diaphragm.

So you get a mixture of unidirectional mic directivity and software beamforming, which was something I was also testing.
It was purely a quick desktop test, with a USB speaker at 45 degrees and approximately 60-70mm from the array.
So, being behind the array, the speaker got attenuated by both the mic type and the software.

I think the PulseAudio beamformer is no better than any other, but PulseAudio runs the beamforming @ 32 kHz.

So the speed of sound is 343 m/s; divide that by 32,000 and it works out at about 10.7 mm per sample.
The array is 96 mm, so that gives about 9 samples of resolution. Another thing I was wondering is whether array dimensions matter.
My test was far from empirical, but I'm pretty sure they do; so it's not just the hardware, the layout and the sample rate of the hardware are related and both matter.
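Redoing the earlier arithmetic for the 32 kHz path:

```shell
# Per-sample resolution at 32 kHz, and how many samples the 96 mm array spans.
awk 'BEGIN {
    res = 343 / 32000 * 1000                              # mm per sample
    printf "%.1f mm per sample, %.1f samples across 96 mm\n", res, 96 / res
}'
```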

It's such a shame the ReSpeaker mics are not on snap-offs where you could solder on headers and use jumper wires; for mounting and isolation that would be a massive improvement, while left unsnapped it would still retain a simple design.

I don't think the WebRTC beamforming algo is any better than others, as there are only a few types of algorithm available; probably the 32 kHz feed of PulseAudio is higher than the sample rate you have tried with others?
Also, even though I find the WebRTC_Audio_Processing code a bit of a mind twister, I think it is linear-array only.

I may get round to hacking a crystal so it is shared across two stereo ADCs, as I totally failed to find a quad ADC that runs from a single clock without clock drift.
My eyes and hands are not great, as age and MS have taken a toll, and even though I am interested in how a 3x triangular array of directional mics would work, it has been on the to-do list for some while.

SpeechBrain has a Colab of various beamformers tested with a 16 kHz sample, SpeechBrain: Speech Processing, under the Multi-microphone Beamforming section.
It's by Grondin F. & Aris W., who are part of the university team behind ODAS.

@sanebow Haven't heard from you for a while, but finally with PyTorch 1.10 pocketfft is now used and Intel MKL is no longer part of the build: https://github.com/KumaTea/pytorch-aarch64/issues/6

I haven't tried them yet, but I'm just going to give the https://github.com/KumaTea/pytorch-aarch64 wheels a go with SpeechBrain and see what the load and the effect are compared with Google Colab.

https://speechbrain.github.io/tutorial_processing.html