Speaker Cancellation

You are missing card 0 from your copy and paste, but I presume that's the loopback.
But it looks like the ReSpeaker is either not installed, or installed but not found.

PS: is this Docker or just a host install?

If I'm not mistaken, you are using the normal 4-mic ReSpeaker board, which doesn't have an audio out, so you will not find it in your aplay output.
In that case, your sound output is probably routed through the 3.5mm jack on the Pi itself.

Yeah, the ReSpeaker is not listed by aplay -l. Does that mean I cannot use the echo-cancelling program with my mic?
It's Raspbian installed with the Rhasspy .deb, so I guess host?

Yeah, it uses the clock sync between input and output to keep the AEC aligned; AEC only works with cards or USB devices that both play and record.
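
A quick way to check is to compare the playback and capture card lists; for AEC the same card index has to appear in both (the card name below is just an example):

aplay -l     # lists playback devices
arecord -l   # lists capture devices
# e.g. "card 1: Device [USB Audio Device]" must show up in both lists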

I sort of hate that 4-mic; I have one in a box, and (doh, it was late at night) as always I was going on about how they are a whole load of nothing with a pixel ring.
Nice pixel ring, but otherwise pretty useless.

Have a word with @JGKK, as I think he also knows about a USB dongle, or the https://www.raspberrypi.org/products/iqaudio-codec-zero/ looks pretty amazing if it ever gets stocked.

Without high-speed DSP beamforming algorithms, multiple omnidirectional mics are useless: even when summed they can create 1st-order filters, so they can be actively detrimental rather than just pointless.
They are just more of the same, so you have a mic and a pixel ring; the 4-mic is one of those products they make because they can and because they sell, even if they know they are pretty pointless.

Alright, thank you so much for explaining.
I think I’ll buy a ReSpeaker Mic Array v2.0 instead as it actually has EC and noise suppression built-in. Hopefully it will filter out loud music and background noise well.

A unidirectional electret mic has an element of natural echo rejection as well as directionality, which, with the addition of AEC, probably makes for the best mic a builder can manage with the hardware we have, unless you're going to spend a lot of $ on a high-end DSP mic.

A cheap USB sound card is a few $, and with a MAX9814 mic module it provides a double dose of AGC that can give really good far-field results on a budget.
I will see if I can get @JGKK to comment, as I am always chanting this mantra, but maybe he will give an honest review of a recent test he has done.

It would be awesome if a MAX9814 microphone with the EC program could be almost as good as a ReSpeaker Mic Array v2.0. Would love to try it out sometime to see.
I found this guide which seems very helpful for beginners.

The ReSpeaker Mic Array is not that great going by other reviews. I have an Anker PowerConf, which is a step above the XMOS-based ReSpeaker, and it's OK, but like all of them it fails when noise=near & voice=far.
noise=near & voice=far happens far more often than you would expect in a domestic situation (hifi, radio and TV, even the washing machine) and is very dependent on the placement of a single device.
AEC can cope with the noise the device itself plays, but 3rd-party noise gives it no reference input and is a problem for all of them.

It's better to have multiple devices and use the stream with the best KW hit confidence, as with correct placement one of them is likely to be nearer to voice=near & noise=far.

It's just a shame Rhasspy doesn't support this, or multiple KWS mic instances in a single device, as unidirectional mics could face out at different angles.

https://www.scan.co.uk/products/enermax-ap001e-dreambass-usb-soundcard-plus-earphones-genie-with-integrated-80-hz-plus6-db-bass-boos
is probably the best USB sound card on the market for the price, as S24_3LE-capable cards at 48kHz tend to be at prosumer studio prices, but this one is £7 from Scan.
It's stereo, and you can phantom-power electret mics from it.
So you can add another mic if Rhasspy ever does do multiple KWS instances.
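
If you want to check what formats and rates a card actually supports before buying another, alsa-utils can dump them (the card index here is an example):

arecord -D hw:1 --dump-hw-params -d 1 /dev/null
# look for S24_3LE in the FORMAT line and 48000 in the RATE line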

For mono mic cards I will give examples from Raspberry Pi shops, but you can find them cheaper on eBay.

Or there is the CM108 module, which seems to be popular.

Unidirectional electrets with good sensitivity need some sourcing, but

Which is what I did, as otherwise you end up buying x25 from AliExpress, as those seemed to have really good sensitivity.

MAX9814: the ones with the onboard LDO regulator seem to be better for SNR.

They end up about the same price as a 2-mic HAT, or approximately half the price of the Raspberry Pi Codec Zero HAT, but generally I think they are better.

https://speechbrain.github.io/ is a speech toolkit that comes with beamforming algorithms; it's currently going through a beta / bug-testing stage and should go public soon.
I have not looked at it enough to test, but probably will, as if it will run then multi-mic does become valid.

Hey folks,

Did anyone try AEC on the ReSpeaker Core v2? They supply librespeaker with DSP algorithms from the 3rd-party provider Alango. Specifically, there's a VepAecBeamformingNode which should do AEC automatically (according to the docs). But I'm wondering how? It's not really clear how to supply an output audio stream to this library so it can correctly apply AEC. When I tried to play a random music file while actively listening, it didn't seem like AEC worked at all. As it's a C++ library, I guess the music should be played from code and somehow plugged into the librespeaker node chain. Note that Alango delegates tech support to Seeed Studio, and Seeed Studio hasn't replied for a couple of years, either on GitHub or via their forum / email.

P.S. I already have code that successfully does WWD, NS, AGC, DOA and BF. Not perfect, but good enough to use this board for common smart home tasks. The only important thing left is the acoustic echo canceller. Any thoughts/suggestions would be greatly appreciated.

I think the ReSpeaker blurb is, if not snake oil, very close to it, as yeah, the Alango EC wasn't free and you just have an API for it.
https://github.com/voice-engine/ec works pretty well; in fact it seems to be the only EC that works on Pi-level hardware, as the PulseAudio webrtc EC only works at really low echo levels and then fails.

It's based on SpeexDSP; you have to have audio in/out on the same card, and it's a bit of a hack, piping through a loopback adapter, but doesn't the ReSpeaker Core have a hardware one?
If not, you can just modprobe a kernel one.
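
Loading the kernel loopback is a one-liner, plus one more to make it permanent:

sudo modprobe snd-aloop                      # creates an ALSA loopback card
aplay -l | grep -i loopback                  # confirm it appeared
echo "snd-aloop" | sudo tee -a /etc/modules  # load it on every boot
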
On Raspberry Pi OS, speexdsp & speex are missing from libasound2-plugins, as for some reason it's an old version.
But there are instructions here.

I did some simple one-liner helper scripts here

It's echo attenuation rather than full cancellation, but it works well enough that barge-in should work.

PS: that ReSpeaker Core v2 is somewhere between a Zero & a 3 in oomph?
If you're doing something yourself, have a look at Google's state-of-the-art KWS for TensorFlow Lite:
https://github.com/StuartIanNaylor/google-kws runs at about 70% load on a Zero, so it should work well on a Core v2, but TFL is heavily optimized for 64-bit, and a Pi 3 running Aarch64 will likely greatly outperform it.

Without beamforming algorithms, the mic array on that board is a whole lot of pointless; even when you sum the channels it is likely to be detrimental, as they will form 1st-order high-pass filters based on the distance apart.
You should probably use a single channel.
I don't think it's got the oomph to run ODAS.

Thanks for the extensive reply!

According to the specs, the ReSpeaker Core v2 has 8-channel ADCs: the 6-microphone array plus 2 loopback channels (hardware loopback).
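
A quick way to sanity-check that is to capture all 8 channels while playing something; the loopback channels should contain the playback signal (the device name, rate and format below are assumptions):

arecord -D hw:0 -c 8 -f S16_LE -r 16000 -d 5 8ch.wav
# inspect 8ch.wav (e.g. in Audacity): the 2 loopback channels should carry whatever was playing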

The main problem I see with an external EC is the actual integration with my code. librespeaker already does beamforming, wake word detection, noise suppression, automatic gain control, etc. for me. I can't simply run an external service that does AEC separately. It should be tightly coupled to the existing chain of operations.

As far as I understood, the above EC writes its output to a file, and librespeaker works with mics directly. Well, technically it can read an 8-channel wav file as an input. So theoretically, if the above EC had top processing priority, it could do AEC and save the output to a wav, which could be used as the input for librespeaker in the next step. However, I'm not really sure about the real-time performance of such an approach, as it would lead to constant I/O operations against the SD card. Ideally, this EC code should behave as middleware, without saving intermediate results to a file. But I'm not sure if that's even technically possible.

The files are not really files; they are buffers (FIFOs) in /tmp that it uses.

Also, I'm not sure you do have beamforming working unless you have got ODAS working; maybe it was just me, but from memory the API is there yet it's missing the Alango libs so it cannot work (no libs for the API to link to).
If you check the forum link for the software EC, it creates a virtual source based on the file FIFO, where whatever is played is subtracted from the mic input.

So you use that as your mic input: once installed, the EC works in the background, and for config you just select that PCM.
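
As an illustration of the mechanism only (the PCM name, device and FIFO path below are made-up examples, not the project's actual config), an ALSA file plugin can tee whatever is played into a FIFO that the canceller then uses as its echo reference:

pcm.ec_playback {
    type file               # pass audio to the slave and copy it to a file/FIFO
    slave.pcm "hw:0,0"      # the real output device
    file "/tmp/ec.input"    # the FIFO the canceller reads as its echo reference
    format "raw"
}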

@sskorol I had a quick stalk :slight_smile: and you are definitely no dummy, so maybe you're the man to implement this?

Also, I'm not sure you do have beamforming working unless you have got ODAS working; maybe it was just me, but from memory the API is there yet it's missing the Alango libs so it cannot work (no libs for the API to link to).

I'm not using the ODAS lib. And in terms of the Alango libs: I talked to Alango support and they told me that Seeed Studio is their partner and built a custom framework around the Alango VEP package. As far as I understood, librespeaker is exactly what they call a "framework". And yes, it's not bundled and applied by default on the ReSpeaker Core v2. You have to install the librespeaker-dev deb package, which includes the corresponding headers, so that you can use them in your C++ code the way it's described in the docs.

In general, I like how KWS, NS and AGC work (based on VepAecBeamformingNode). Beamforming seems to work where we have a single input source (it can be partially tracked by their DOA manager, which is part of Alango as well). It gives quite an accurate direction for a detected wake word. On the other hand, it doesn't seem to correctly focus on the required direction when we have an additional input source, e.g. a TV. Ideally, it should eliminate the other sources when the wake word is detected and its direction is well known.

P.S. I asked Alango for the required details, but they provide tech support only for commercial projects. Moreover, as Seeed Studio has built its own framework, it's not Alango's responsibility anymore. Unfortunately, it seems like a dead end, given that Seeed Studio representatives don't reply at all.

You got further than me, as that dead end of information halted my course.

This VepAecBeamformingNode provides beamforming, AEC, NR and DOA algorithms from Alango.

I could get no info on the algorithms from Alango, but reading it again, it does say "provides".
I am not sure if there is more info on the ReSpeaker site now, or if it just makes more sense now that my knowledge has evolved.

I haven't got an SBC to test with; I think my 4-mic should work, but it's gathering dust somewhere.
If you are sure you have got beamforming working, maybe:
int ref_channel_index should be the mono audio-out channel that the AEC uses, as beamforming does not need a ref channel.

If you enable bool enable_wav_log, does that give you a log to work on?

Thanks for the link. Will check it later. However, according to the installation section, it seems more like server-side software. When I see keywords like CUDA, it seems it wasn't designed for hardware like the ReSpeaker. I tried the Vosk ASR toolkit, which is based on Kaldi. It does MFCC and other stuff as well. But I wasn't impressed with its CPU performance at all; at least on RPi hardware it works very slowly. So I've chosen NVIDIA Jetson boards, which allow building and using all that math/ML stuff with GPU support.

Yes, it saves wav files for each channel. However, I haven't yet tried it in the context of AEC processing. I will enable the log, play some audio, and check what's going on there, maybe later today. Thanks for the tips.

With the ReSpeaker librespeaker I think you are trailblazing; as far as I know you are the 1st to get the beamforming working.
You might find https://speechbrain.github.io/ of interest if you're on a Jetson, as PyTorch audio seems vendor-locked to either Intel MKL or the NVIDIA equivalents.
It does do various kinds of beamforming; dunno about the load.

In case anyone is interested, I have speaker cancellation running on PulseAudio. I have this on plain Ubuntu, no Docker, but it should be the same on any Linux host running PulseAudio.

Add this:

load-module module-echo-cancel use_master_format=true aec_method=webrtc rate=48000 source_name=echoCancel_source sink_name=echoCancel_sink
set-default-source echoCancel_source
set-default-sink echoCancel_sink

to your pulse config (usually /etc/pulse/default.pa; you may have another in ~/.config/pulse). This sets AEC as the default on boot. use_master_format=true preserves the speaker setup (e.g. multi-channel).
Additionally, you need to ensure that Rhasspy starts after PulseAudio. If you use systemd, add this to your service config file:

After=syslog.target network.target pulseaudio.service
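
For context, a minimal sketch of a unit file with that line in place (the ExecStart path and flags are examples; adjust to your install):

[Unit]
Description=Rhasspy
After=syslog.target network.target pulseaudio.service

[Service]
ExecStart=/usr/bin/rhasspy --profile en
Restart=on-failure

[Install]
WantedBy=default.target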

In case you have a multi-channel speaker setup (>2), you may also want to disable stereo upmixing so that stereo sources are not mirrored to the rear speakers. Usually, this is done via

enable-remixing = no

in your PulseAudio daemon.conf (e.g. /etc/pulse/daemon.conf), but this will not work if your mic is multi-channel (like the PS3 Eye). In that case, you can use:

remixing-use-all-sink-channels = no
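
To try it without rebooting, the same module can be loaded at runtime and verified (using the names from the config above):

pactl load-module module-echo-cancel use_master_format=true aec_method=webrtc rate=48000 source_name=echoCancel_source sink_name=echoCancel_sink
pactl list short sources | grep echoCancel   # the cancelled source should be listed
pactl list short sinks | grep echoCancel     # and the matching sink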

The problem with the PulseAudio webrtc AEC is that it will set up fine, but when you test the output it fails at even moderate echo levels on a Pi.
I am not sure if it's the clock speed of a Pi 3/4 or maybe the data length, as you can install it but the results are very poor.
It actually does cancellation rather than attenuation, but when the echo level hits a (quite low) threshold it totally fails and all the echo enters the stream.

If you have an Echo/Home-type arrangement with the speaker and mic in close proximity, the last time I ran it it was near useless, as the threshold was hit most of the time and it did nothing.
Maybe we have had an update in Raspbian now that PulseAudio is the default on the desktop.
I never really got to the source of the problem, but in the PulseAudio code latencies were hardcoded by platform type (desktop, Mac & Android or something; it was quite a while ago).
I wondered if it was the typical thing where Arm running full Linux gets lumped in as an x86 desktop.

The easiest way is to test and post a raw input vs the resultant AEC output. Do you have an example of it working?
It might have been fixed, but I think we are still on the same version that we were on over a year ago.
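
For that kind of comparison, something like this captures both streams while music plays (echoCancel_source is from the config above; the raw source name is a placeholder, check pactl list short sources for yours):

parecord -d echoCancel_source cancelled.wav &
P1=$!
parecord -d alsa_input.your_mic.analog-stereo raw.wav &   # placeholder raw mic source name
P2=$!
sleep 10
kill $P1 $P2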