Alright, thank you so much for explaining.
I think I’ll buy a ReSpeaker Mic Array v2.0 instead as it actually has EC and noise suppression built-in. Hopefully it will filter out loud music and background noise well.
A unidirectional electret mic has an element of natural echo rejection as well as directionality, and with the addition of software AEC it probably makes the best mic a builder can manage with the hardware we have, unless you're going to spend a lot of $ on a high-end DSP mic.
A cheap USB sound card is a few $ and, with a MAX9814 mic module, provides a double dose of AGC that can give really good far-field results on a budget.
I will see if I can get @JGKK to comment, as I am always chanting this mantra, but maybe he will give an honest review of a recent test he has done.
It would be awesome if a MAX9814 microphone with the EC program could be almost as good as a ReSpeaker Mic Array v2.0. Would love to try it out sometime to see.
I found this guide which seems very helpful for beginners.
The Respeaker Mic Array is not that great going by other reviews. I have an Anker PowerConf, which has an XMOS chip a step above the Respeaker's, and it's OK, but like all of them it fails when noise=near & voice=far.
noise=near & voice=far happens far more often than you would expect in a domestic situation, from hi-fi, radio and TV to even the washing machine, and is very dependent on the placement of a single device.
AEC can cope with the noise the device itself plays, but 3rd-party noise has no reference signal and is a problem for all of them.
It's better to have multiple devices and use the stream with the best KW hit confidence, as with correct placement there is a better chance of getting nearer to voice=near & noise=far.
Just a shame rhasspy doesn't support this, or multiple KWS mic instances in a single device, as unidirectional mics could face out at different angles.
https://www.scan.co.uk/products/enermax-ap001e-dreambass-usb-soundcard-plus-earphones-genie-with-integrated-80-hz-plus6-db-bass-boos
Is prob the best USB sound card on the market for the price, as S24_3LE cards at 48kHz tend to be prosumer studio prices, but this one is £7 from Scan.
It's stereo and you can phantom-power electret mics with it.
So you can add another mic if rhasspy ever does do multiple KWS instances.
For mono mic cards I will give examples from Raspberry Pi shops, but you can find them cheaper on eBay.
Or the CM108 module seems to be popular.
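If you want to sanity-check whichever cheap card you end up with, listing the capture devices and making a short test clip is enough; the card/device numbers below are just an example and yours may differ.
arecord -l                                                      # list capture cards/devices
arecord -D plughw:1,0 -f S16_LE -r 48000 -c 1 -d 5 test.wav     # 5 second mono test clip
aplay test.wav                                                  # play it back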
Unidirectional electrets with good sensitivity need some sourcing, otherwise you end up buying x25 from AliExpress (which is what I did), as those seemed to have really good sensitivity.
For the MAX9814, the ones with the onboard LDO regulator seem to be better for SNR.
They end up about the same price as a ReSpeaker 2-Mic HAT, or approx half the price of the Raspberry Pi Codec Zero HAT, but generally I think they are better.
https://speechbrain.github.io/ is a speech toolkit that comes with beamforming algs; it's currently going through a beta / bug-testing stage and should go public soon.
I have not looked at it enough to test, but prob will, as if it will run then multi-mic does become valid.
Hey folks,
Did anyone try AEC on Respeaker Core v2? They supply librespeaker with DSP algorithms from 3rd-party provider Alango. Specifically, there’s a VepAecBeamformingNode which should do AEC automatically (according to docs). But I’m wondering how? It’s not really clear how to supply an output audio stream to this library to be able to correctly apply AEC. When I tried to play a random music file while active listening, it didn’t seem like AEC works at all. As it’s a C++ library, I guess music should be played from code and somehow plugged into librespeaker node chain. Note that Alango delegates tech support to Seeed Studio and Seeed Studio doesn’t reply for a couple of years neither on GitHub nor on their forum / email.
P.S. I already have code that successfully does WWD, NS, AGC, DOA and BF. Not perfect, but good enough to use this board for common smart home tasks. The only important thing left is the acoustic echo canceller. Any thoughts/suggestions would be greatly appreciated.
I think the Respeaker blurb is, if not snake oil, very close to it, as the Alango EC wasn't free and you just have an API for it.
https://github.com/voice-engine/ec works pretty well; in fact it seems to be the only EC that works on Pi-level hardware, as the PulseAudio webrtc AEC only works at really low echo levels and then fails.
It's based on SpeexDSP; you have to have audio in/out on the same card, and it's a bit of a hack piping through a loopback adapter, but doesn't the Respeaker Core have a hardware one?
If not, you can just modprobe a kernel one.
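Something like this should do it; snd-aloop is the standard ALSA loopback module, and the second line just makes it load on every boot.
sudo modprobe snd-aloop
echo snd-aloop | sudo tee /etc/modules-load.d/aloop.conf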
On Raspberry Pi OS, speexdsp & speex are missing from libasound2-plugins, as for some reason it's an old version.
But instructions here.
I did some simple one-liner helper scripts here
It's echo attenuation rather than full cancellation, but it works well enough that barge-in should work.
PS: the Respeaker Core v2 is somewhere between a Zero & a Pi 3 in oomph?
If you're doing something yourself, have a look at Google's state-of-the-art KWS for TensorFlow Lite:
https://github.com/StuartIanNaylor/google-kws runs at about 70% on a Zero, so it should work well on a Core v2, but it is heavily optimized for 64-bit, and a Pi 3 running TFL on AArch64 will likely greatly outperform it.
Without beamforming algs, the mic array on that board is a whole lot of pointless; even when you sum the channels it's likely to be detrimental, as they will form 1st-order high-pass filters based on the distance apart.
You prob should use a single channel.
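If you do want to pull a single channel out of the array for testing, something roughly like this works; the device name is just a placeholder for whatever the board shows in arecord -l, and it assumes sox is installed.
arecord -D plughw:CARD=<array_card> -f S16_LE -r 16000 -c 8 -d 5 mics.wav
sox mics.wav single.wav remix 1    # keep only channel 1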
I don’t think it’s got the oomph to run ODAS.
Thanks for the extensive reply!
According to the specs, the Respeaker Core v2 has 8-channel ADCs: 6 channels for the microphone array and 2 hardware loopback channels.
The main problem I see with an external EC is the actual integration to my code. librespeaker already does beamforming, wake word detection, noise suppression, automatic gain control, etc for me. I can’t simply run an external service that does AEC separately. It should be tightly coupled to the existing chain of operations.
As far as I understood, the above EC writes the output to a file, and librespeaker works with mics directly. Well, technically it can read an 8-channel wav file as an input. So theoretically, if the above EC had top processing priority, it could do AEC and save the output to wav, which could be used as an input for librespeaker in the next step. However, I’m not really sure about the real-time performance of such an approach, as it would lead to constant I/O operations against the SD card. Ideally, this EC code should behave as middleware without saving intermediate results to a file. But I’m not sure if it’s even technically possible.
The files are not really files; they are buffers in /tmp that it uses.
Also, I’m not sure you do have beamforming working unless you have got ODAS working; maybe it was just me, but from memory the API is there but it is missing the Alango libs so it cannot work (no libs for the API to link to).
If you check the forum link for the software EC, it creates a virtual source based on the file FIFO, where what is played is subtracted from the mic input.
So you use that as your mic input; once installed, the EC works in the background and for config you just select that PCM.
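A crude sanity check once ec is running: play something through its playback PCM while recording from its capture PCM, then listen to how much of the playback leaks back in. The PCM names below are placeholders for whatever the ec setup created on your system.
aplay -D <ec_playback_pcm> music.wav &
arecord -D <ec_capture_pcm> -f S16_LE -r 16000 -c 1 -d 10 aec_test.wav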
@sskorol I had a quick stalk and you are defo no dummy; maybe you’re the man to implement this?
I’m not using ODAS lib. And in terms of Alango libs: I talked to Alango support and they told me that Seeed Studio is their partner and built a custom framework around Alango VEP package. As far as I understood, librespeaker is exactly what they call “framework”. And yes, it’s not bundled and applied by default on Respeaker Core v2. You have to install librespeaker-dev deb package which includes corresponding headers so that you can use them in your C++ code the way it’s described in docs.
In general, I like how KWS, NS and AGC work (based on VepAecBeamformingNode). Beamforming seems to be working when we have a single source input (it can be partially tracked by their DOA manager, which is part of Alango as well). It gives quite an accurate direction for a detected wake word. On the other hand, it doesn’t seem to correctly focus on the required direction when we have an additional input source, e.g. a TV. Ideally, it should eliminate other sources when the wake word is detected and its direction is well known.
P.S. I asked Alango for the required details, but they provide tech support only for commercial projects. Moreover, as Seeed Studio has built its own framework, it’s not the responsibility of Alango anymore. Unfortunately, it seems like a dead-end, assuming the fact that Seeed Studio representatives don’t reply at all.
You got further than me, as that dead end of info halted my course.
This VepAecBeamformingNode provides beamforming, AEC, NR and DOA algorithms from Alango.
I could get no info on the algs from Alango, but reading it again it does say “provides”.
I am not sure if there is more info now on the Respeaker site, or if it just makes more sense now that my knowledge has evolved.
I haven’t got an SBC to test with, and I think my 4-mic should work, but it’s gathering dust somewhere.
If you are sure you got beamforming working, maybe:
int ref_channel_index should be the mono audio-out channel that the AEC uses, as for beamforming a ref channel is not needed.
If you enable bool enable_wav_log, does that give you a log to work on?
Thanks for the link. Will check it later. However, according to the installation section, it seems more like server-side software. At least when I see keywords like CUDA, it doesn’t seem to have been designed for hardware like the Respeaker. I tried the Vosk ASR toolkit, which is based on Kaldi. It does MFCC and other stuff as well. But I wasn’t impressed with the CPU performance at all. At least on RPi hardware it works very slowly. So I’ve chosen NVIDIA Jetson boards that allow building and using all that math/ML stuff with GPU support.
Yes, it saves wav-files for each channel. However, I haven’t yet tried it in the context of AEC processing. Will enable log, play some audio and check what’s going on there maybe later today. Thanks for the tips.
With the Respeaker librespeaker I think you are trailblazing; as far as I know you are the 1st to get the beamforming working.
You might find https://speechbrain.github.io/ of interest if you’re on a Jetson, as PyTorch audio seems vendor-locked to either Intel MKL or NVIDIA libraries.
It does do various beamforming; dunno about load.
In case anyone is interested, I have speaker cancellation running on PulseAudio. I have this on plain Ubuntu, no Docker, but it should be the same on any Linux PulseAudio host.
Add this:
load-module module-echo-cancel use_master_format=true aec_method=webrtc rate=48000 source_name=echoCancel_source sink_name=echoCancel_sink
set-default-source echoCancel_source
set-default-sink echoCancel_sink
to your pulse config (usually /etc/pulse/default.pa - you may have another in ~/.config/pulse). This sets AEC as the default on boot. use_master_format=true preserves the speaker setup (e.g. multi-channel).
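Before making it permanent you can also load the same module live with pactl to check it behaves (pactl load-module prints a module index you can later pass to pactl unload-module):
pactl load-module module-echo-cancel use_master_format=true aec_method=webrtc rate=48000 source_name=echoCancel_source sink_name=echoCancel_sink
pactl list short sources    # echoCancel_source should now be listed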
Additionally, you need to ensure that rhasspy starts after pulse. In case you use systemd, use this in your service config file:
After=syslog.target network.target pulseaudio.service
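For example, a minimal drop-in could look like the below, assuming your service is literally called rhasspy.service (adjust the name to whatever you use); create it with systemctl edit rhasspy and run systemctl daemon-reload afterwards.
# /etc/systemd/system/rhasspy.service.d/override.conf
[Unit]
After=syslog.target network.target pulseaudio.service
Wants=pulseaudio.service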
In case you have a multi-channel speaker setup (>2), you may also want to disable stereo upmix so that stereo sources are not mirrored to the rear speakers. Usually, this is done via
enable-remixing = no
in your pulse audio daemon.conf (e.g. /etc/pulse/daemon.conf), but this will not work if your mic is multi-channel (like the PS3 Eye). In this case, you can use:
remixing-use-all-sink-channels = no
The problem with the PulseAudio webrtc AEC is that it will set up fine, but when you test the output it fails at even moderate levels of echo on a Pi.
I am not sure if a Pi 3/4 lacks the clock speed, or maybe it is a data-length issue, as you can install it but the results are very poor.
It actually does cancellation rather than attenuation but when the echo level hits a threshold (quite low) it totally fails and all echo enters the stream.
If you have an Echo/Home type arrangement with the speaker and mic in close proximity, the last time I ran it it was near useless, as the threshold was hit most of the time and it did nothing.
Maybe we have had an update in Raspbian now that PulseAudio is the default on the desktop.
I never really got to the source of the problem, but in the PulseAudio code latency was assigned by platform type (Desktop, Mac & Android or something; it was quite a while ago), and those latencies were hardcoded per platform.
I wondered if it was the typical thing where Arm running full Linux gets lumped in as an x86 desktop.
The easiest way is to test and post a raw input vs the resultant AEC result; do you have an example of it working?
Might have been fixed, but I think we are still on the same version that we were over a year ago.
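Something along these lines would do for a comparison; the source names are placeholders (pactl list short sources will show the real ones, and echoCancel_source is the name from the config above). Record from the raw mic source and the echo-cancelled source while the same audio is playing, stop both with Ctrl-C, then compare the two files.
pactl list short sources
parecord --device=<raw_source> --file-format=wav raw.wav &
parecord --device=echoCancel_source --file-format=wav aec.wav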
I have to admit, this setup is running on a Celeron N4100 processor, not a Pi. For validation, I simply checked the difference between the mic activation levels (raw and AEC) and it is pretty clearly working. The wake-word false-positive rate after 4-6h of series/movie watching was zero (at least, none due to speaker sound).
If you are interested in examples from the x86 setup, I can provide some.
On a side note: I didn’t test Speex, which supposedly consumes fewer resources.
No need, as your input is welcome, but I also found it to work on x86 when I tried it with my desktop.
It’s just that on a Pi, where it does actually work, it only copes with really low levels of ‘echo’ in the mic stream.
AEC only cancels what you play; it doesn’t handle 3rd-party noise. I presume you watched on the device itself?
Yes, you are right - I suppressed a known source (watched on the same device). The docs are not 100% clear on that, but it seems like the PulseAudio module is also supposed to do noise cancellation. However, I wouldn’t assume it is able to suppress unknown, loud voice sources. I don’t see how such a module would be able to differentiate between “human” and “TV” voice.
Maybe microphone beamforming is an option for you?
Yeah, the noise cancellation is a bit pants apart from newer tech like Nvidia’s RTX Voice, as the older methods leave audio artifacts that can reduce recognition.
The Arch docs are prob best for PA AEC.
I have been playing with the new TFL state-of-the-art KWS models, and with a custom dataset of my voice the crnn-state model does an amazing job of just picking up my voice; I just haven’t got enough of my words in ‘unknown’.