Voiceengine ec with stereo playback support

sanebow · May 13, 2021, 2:44pm

The voiceengine ec is a very nice echo cancellation option on the Pi.

I noticed that it only supports mono playback, even though the speexdsp echo canceller it depends on actually supports multichannel echo cancellation. So I modified it a bit and made a fork. I also fixed some bugs and made it more controllable:

Add -p to specify the number of playback channels, so stereo audio can be played.
Add -l to set the frame length (in ms). The default is 10 ms, but the Speex documentation recommends 20 ms.
Change the filter length option -f to ms. Originally it was a frame count.
Change the delay option -d to ms. Originally it was a frame count (possibly a bug, since the docs say it is in ms).
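With these options given in milliseconds, the values still have to become sample counts, because SpeexDSP takes its frame size and filter length in samples. A minimal sketch of that conversion, assuming a plain rate × ms / 1000 rule; `ms_to_samples` is a hypothetical helper, not code from the fork, and this sketch rejects fractional results where the fork may round instead:

```python
def ms_to_samples(ms: float, rate: int) -> int:
    """Convert a duration in ms to a whole number of samples at `rate` Hz."""
    samples = rate * ms / 1000
    if samples != int(samples):
        # e.g. 1 ms at 44100 Hz is 44.1 samples, not a whole sample count
        raise ValueError(f"{ms} ms is not a whole number of samples at {rate} Hz")
    return int(samples)


rate = 16000  # ec's default sample rate (-r 16000)
for option, ms in [("-l", 10), ("-l", 16), ("-l", 20), ("-f", 256)]:
    print(f"{option} {ms} ms -> {ms_to_samples(ms, rate)} samples")
```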

Usage:
 ec [options]
Options:
 -i PCM            playback PCM (default)
 -o PCM            capture PCM (default)
 -r rate           sample rate (16000)
 -c channels       recording channels (2)
 -p channels       playback channels (1)
 -b size           buffer size (262144)
 -d delay          system delay in ms between playback and capture (0)
 -f filter_len_ms  AEC filter length in ms (256)
 -l frame_len_ms   frame length in ms (10)
 -s                save audio to /tmp/playback.raw, /tmp/recording.raw and /tmp/out.raw
 -D                daemonize
 -h                display this help text

If you are using the voiceengine ec, please try this and give feedback.

rolyan_trauts · May 13, 2021, 9:08pm

It's been a while since I looked, but once you get it going it does a really good job of attenuation, even at quite high levels of echo.
It struggles with high-energy content such as bass, but they all seem to.

I thought the PulseAudio WebRTC echo canceller was completely broken, but it does work: it cancels, but only up to quite a low threshold, and then it fails.

I seem to remember there is a utility in the subfolders to test the delay parameter, and I think it works in frames.
It took me some serious head scratching to work out how to use it, and I still wasn't sure whether the delay should be measured from the start of the frame or its center.

github.com/voice-engine/ec/blob/master/util/get_delay.py


import sys
import wave
import numpy as np


if len(sys.argv) != 3:
    print('Usage: {} near.wav far.wav'.format(sys.argv[0]))
    sys.exit(1)


near = wave.open(sys.argv[1], 'rb')
far = wave.open(sys.argv[2], 'rb')
rate = near.getframerate()

channels = near.getnchannels()

N = rate

(This file has been truncated.)
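The idea behind a delay finder like get_delay.py can be illustrated with plain cross-correlation: slide the far-end (playback) signal against the near-end (microphone) signal and keep the lag with the largest correlation. This is a brute-force, pure-Python sketch on synthetic data, not the code of get_delay.py (which uses NumPy and is only excerpted above); `estimate_delay` is a hypothetical helper. It returns a lag in samples, which is then converted to ms:

```python
import random


def estimate_delay(near, far, max_lag):
    """Return the lag in samples (0..max_lag) that maximises the cross-correlation
    sum(near[n + lag] * far[n]), i.e. how far the mic signal trails the playback."""
    best_lag, best_score = 0, float("-inf")
    for lag in range(max_lag + 1):
        score = sum(near[n + lag] * far[n]
                    for n in range(len(far)) if n + lag < len(near))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


# Synthetic test: the "mic" hears the playback 80 samples later at half level.
rate = 16000
random.seed(0)
far = [random.uniform(-1.0, 1.0) for _ in range(2000)]  # playback (far end)
true_delay = 80
near = [0.0] * true_delay + [0.5 * x for x in far]      # recording (near end)

lag = estimate_delay(near, far, max_lag=200)
print(lag, "samples =", lag * 1000 / rate, "ms")  # 80 samples = 5.0 ms
```

The loop costs O(len(far) × max_lag), which is fine for a demo; for real recordings an FFT-based correlation (as NumPy makes easy) is the usual choice. Because the lag is found per sample, the result does not depend on where a frame starts.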

Also, looking at the code (and again, this is from memory), I seem to think the frame and tail lengths should be powers of 2, since that is optimal for the FFT; the Speex docs do mention this.
So rather than deriving them by division, maybe offer a choice in power-of-2 steps.
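Following the "steps of ^2" idea, a front end could let the user choose power-of-two sizes in samples and show the equivalent in ms for the current rate, instead of converting an arbitrary ms value. A sketch of a hypothetical helper (`power_of_two_choices` is not an ec option), assuming sizes are picked in samples:

```python
def power_of_two_choices(rate, min_samples=128, max_samples=4096):
    """Return (samples, ms) pairs for every power-of-two size in [min_samples, max_samples]."""
    if min_samples <= 0 or min_samples & (min_samples - 1):
        raise ValueError("min_samples must be a positive power of two")
    choices = []
    n = min_samples
    while n <= max_samples:
        choices.append((n, n * 1000 / rate))  # size in samples, duration in ms
        n *= 2
    return choices


for samples, ms in power_of_two_choices(16000):
    print(f"{samples:5d} samples = {ms:g} ms")
```

At 16 kHz this lists 128 samples (8 ms) up to 4096 samples (256 ms), so a 16 ms frame (256 samples) and a 256 ms tail (4096 samples) both land on the grid. At 44.1 or 48 kHz the ms values stop being round numbers, which is one argument for choosing the size in samples.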

Also, I always had to pipe through a loopback, because unless you record continuously, ec halts and you have to restart it; but I was always using simple audio hardware without a hardware loopback.

It's a great little utility that can make a huge difference to 'barge-in', and it is underused because it's far from easy to set up.

I think it's the frame length that is set in ms, so its size in samples depends on the sample rate, but for the FFT it really should be a power of 2.

Also, strangely, on Raspberry Pi OS and Debian SpeexDSP is still a release candidate even after all this time, so I usually update it: https://github.com/StuartIanNaylor/Alsa-plugins-speex-update

I also wondered whether Speex could be recompiled and optimised for 64-bit and other FFT libraries, but my efforts failed; the makefile seems quite unforgiving and is outside my comfort zone.

That delay utility could be made a little friendlier: I did work it out eventually, but you have to convert from frames to ms (it may even have been samples).
Setup could probably be automated as well. As you say, it really is a nice echo canceller on the Pi, and it runs with a reasonably low load, even on a Pi 3A+.

sanebow · May 14, 2021, 1:26am

Yes, I also noticed that in the Speex documentation. Right now I set the frame length to 16 ms so it's a power of 2 (256 samples at 16 kHz). For the tail, I tested different values, and there doesn't seem to be much difference whether it's a power of 2 or not.

I also tried get_delay.py before, but the correlation result didn't seem very accurate. Currently I use ec's -s option to dump the playback and recording, load them into Audacity, and measure the delay myself.
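To script that measurement instead of doing it by eye in Audacity, the -s dumps can be loaded directly. This sketch assumes the raw files are interleaved signed 16-bit little-endian PCM (the layout you would pick in Audacity's File > Import > Raw Data), with /tmp/recording.raw holding the -c recording channels and /tmp/playback.raw the -p playback channels; those format details and the `read_raw_channel` helper are assumptions, not taken from ec's source:

```python
import struct


def read_raw_channel(path, channels, channel=0):
    """Return one channel of an interleaved, signed 16-bit little-endian raw file.

    ASSUMPTION: the ec -s dumps use this layout; check the sample format in ec's source.
    """
    with open(path, "rb") as f:
        data = f.read()
    frame_bytes = 2 * channels                     # one 16-bit sample per channel
    usable = len(data) - len(data) % frame_bytes   # ignore a trailing partial frame
    samples = struct.unpack(f"<{usable // 2}h", data[:usable])
    return list(samples[channel::channels])        # de-interleave
```

With the defaults (-c 2, -p 1) that would be `read_raw_channel("/tmp/recording.raw", channels=2)` for the first mic channel and `read_raw_channel("/tmp/playback.raw", channels=1)` for the reference; the two lists can then be correlated or plotted at the same sample rate.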

rolyan_trauts · May 14, 2021, 1:49am

I never checked how accurate it was; it took long enough just to work it out, and I went with that value, since as long as the delay is approximately right it seems to make little difference.

I don't think the filter length (tail) matters much in terms of being a power of 2; it's more about getting the size balanced right.
The frame_len I think should be a power of 2, and my old ears thought 128 or 256 sounded optimal, but that was last year and I would have to look again.