DTLN: Realtime Machine Learning based Noise Suppression / AEC on Raspberry Pi

Compliments!!! :slight_smile:

Yeah! Really cutting edge Deep Noise Suppression Challenge 5th place winner.

I just didn’t think an RTXvoice-like app would run on a Pi 3, so great work quantising a tflite model without the Pb.
Obviously it’s not RTXvoice, but the quality is amazingly good compared to what we had and isn’t all that far away.

Good job packaging it for deployment.

I really need to run some tests, as I’m curious whether being able to set the latency means it can work with higher-latency outputs such as Bluetooth or any wireless RTP, which other NS/AEC solutions often fail with.
I will give feedback as always :slight_smile:

I did some optimization and fixed some bugs. I also took some measurements on my Pi 3B+, and this is the result (note that < 8 ms is required for it to work in real time):

This is a sample before/after of the realtime AEC (with ReSpeaker 2-Mics Pi HAT):

I use the 256 model for AEC. It works well in most cases but still swallows my voice sometimes. Given the same input, the 512 model does noticeably better, but the Pi 3B+ cannot run it in real time. Maybe traditional AEC + DTLN NS is the most practical choice.

The 1st track is the 256 model on the Pi; the 2nd track is the 512 model run on my MacBook. Highlighted is a word I spoke, which was attenuated by the 256 model but kept by the 512 model.

I never did work out the training routine, as the code seems to be just for NS, but I presume that, like all models, you can greatly improve it by providing or supplementing data from the intended use.
I am not sure what language the dataset was in, but the intonation of the dataset will likely affect how the speaker’s voice comes through.

It would also be interesting to see what a Pi 4 could do, especially with an easy OC to 1.8 GHz.
On a Pi 3 you can probably get to 1.5 GHz, but the factory clock is already quite high, and 0.1 GHz is obviously less than the easy 0.3 GHz of a Pi 4.

Been a while and here’s some progress on this project.

A large portion of my time was spent configuring a virtual ALSA device for AEC. Recently I did more research and made it more usable, so I created a separate repo for it with a detailed explanation.

I have optimized the multiprocessing to make it faster. Now on my Pi 4 I can run the 512 model in under 4 ms.
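For context on the timing figures: the ALSA settings in this thread use a period of 128 frames at 16 kHz, and 128 / 16000 s = 8 ms, which is where the real-time limit comes from. Each block has to be processed in less time than the audio it covers. Below is a minimal sketch for checking a processing function against that budget. `process_block` is a hypothetical stand-in for the model inference call, not PiDTLN's actual API, and the helper names are made up for illustration.

```python
import time

SAMPLE_RATE = 16000   # Hz, rate used by the aec devices in this thread
BLOCK_SHIFT = 128     # frames per block (the ALSA period_size in this thread)


def budget_ms(block_shift=BLOCK_SHIFT, sample_rate=SAMPLE_RATE):
    """Audio duration of one block in milliseconds."""
    return 1000.0 * block_shift / sample_rate


def mean_time_ms(func, block, runs=100):
    """Average wall-clock time of func(block) over several runs, in ms."""
    start = time.perf_counter()
    for _ in range(runs):
        func(block)
    return 1000.0 * (time.perf_counter() - start) / runs


def process_block(block):
    # Hypothetical placeholder: a real test would run the TFLite model here.
    return [0.5 * x for x in block]


if __name__ == "__main__":
    block = [0.0] * BLOCK_SHIFT
    took = mean_time_ms(process_block, block)
    print(f"budget {budget_ms():.1f} ms, measured {took:.3f} ms, "
          f"realtime: {took < budget_ms()}")
```

Averaging hides occasional slow blocks, so a real measurement would probably also want to track the worst case.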

With the new flexible alsa-aec parameters, I also managed to experiment with AEC while using playback and capture devices on different cards, even with Bluetooth playback (using bluealsa and setting defaults.pcm.aec.playback_pcm "bluealsa" for alsa-aec). It works, but the result is not perfect. Still worth cheering, as most traditional AEC requires input and output on the same sound card.

Hey @sanebow! Awesome job :+1:

I just tested PiDTLN with alsa-aec but I have some errors:

[WARNING] pyfftw is not installed, use np.fft
INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
Expression 'alsa_snd_pcm_sw_params( self->pcm, swParams )' failed in 'src/hostapi/alsa/pa_linux_alsa.c', line: 2131
Expression 'PaAlsaStreamComponent_FinishConfigure( &self->capture, hwParamsCapture, inParams, self->primeBuffers, realSr, inputLatency )' failed in 'src/hostapi/alsa/pa_linux_alsa.c', line: 2731
Expression 'PaAlsaStream_Configure( stream, inputParameters, outputParameters, sampleRate, framesPerBuffer, &inputLatency, &outputLatency, &hostBufferSizeMode )' failed in 'src/hostapi/alsa/pa_linux_alsa.c', line: 2843
PortAudioError: Error opening Stream: Unanticipated host error [PaErrorCode -9999]: 'Invalid argument' [ALSA error -22]

Here’s my asound.conf file:

defaults.pcm.aec.capture_hw.rate 48000
defaults.pcm.aec.playback_hw.card defaults.pcm.card 1

defaults.pcm.aec.capture_hw.rate 16000
defaults.pcm.aec.capture_hw.card defaults.pcm.card 1

pcm.!default {
  type asym
  capture.pcm "hw:CARD=Device"
  playback.pcm "hw:CARD=Device"
}

I can play with aplay -D aec and record with arecord -D aec_internal -f S16_LE -r 16000 -c 2 rec.wav -V stereo.

Any pointer as to what I have done wrong would be cool :slight_smile:

@fastjack you seem to have set both sample rates, whereas I presume you just need one: choose either 48k or 16k.
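To spot this kind of clash quickly, a small script can flag any `defaults.` key that is assigned more than once. This is only a rough sketch: it assumes simple one-line `key value` assignments like the ones in the config above, it is not a full ALSA config parser, and `find_duplicate_keys` is just a name made up for the example.

```python
from collections import defaultdict


def find_duplicate_keys(conf_text):
    """Return {key: [values]} for 'defaults.' keys assigned more than once."""
    seen = defaultdict(list)
    for line in conf_text.splitlines():
        line = line.split("#", 1)[0].strip()   # drop comments and whitespace
        if not line.startswith("defaults."):
            continue
        parts = line.split(None, 1)            # key, then the rest as value
        if len(parts) == 2:
            seen[parts[0]].append(parts[1])
    return {key: values for key, values in seen.items() if len(values) > 1}


if __name__ == "__main__":
    sample = (
        "defaults.pcm.aec.capture_hw.rate 48000\n"
        "defaults.pcm.aec.capture_hw.rate 16000\n"
    )
    for key, values in find_duplicate_keys(sample).items():
        print(f"{key} is set {len(values)} times: {values}")
```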

Depending on your hardware, many cards will only do 44.1k or 48k via hw:, but I presume you can use plughw: so it will do automatic conversion. 16k sometimes isn’t supported directly via hw:.

I often use https://www.volkerschatz.com/noise/alsacap.tgz rather than hunting around with cat /proc/asound/cards

:man_facepalming:

Thanks @rolyan_trauts :blush:

Nice spotting @rolyan_trauts. alsacap is a nice tool, and the author also has a nice article demystifying ALSA, which helped me a lot.

It’s quite strange that when using the multi plugin we can’t use plug to wrap hardware devices for automatic rate conversion, as it results in xruns. We have to use a rate plugin to set the target rate explicitly. Maybe a bug in ALSA?

I remember you have an Arch install running the latest ALSA. At your convenience, could you please test whether this bug has been fixed in more recent ALSA versions?

Will try, but I’ve been a bit busy with some house DIY, and boy, the pollen count has set off my hay fever and left me even more useless than normal.

I have a gut feeling that with the multi plugin it may be a no for plughw:, like you say, and that you have to use rate explicitly.

Alsa can be so much fun :nerd_face:

Hey all!

Still no luck with python3 aec.py -m 128 -i aec_internal -o aec_internal

sounddevice.PortAudioError: Error opening InputStream: Unanticipated host error [PaErrorCode -9999]: 'Invalid argument' [ALSA error -22]

I can record and play correctly from/to aec_internal using ALSA arecord and aplay. Same for aec.

arecord -D aec_internal -f S16_LE -r 16000 -c 2 rec.wav -V stereo
Recording WAVE 'rec.wav' : Signed 16 bit Little Endian, Rate 16000 Hz, Stereo
             +#################### 62%|00%+    
aplay -D aec_internal -r 16000 test.wav -V mono
Playing WAVE 'test.wav' : Signed 16 bit Little Endian, Rate 16000 Hz, Mono
#############                                    + | 97%

Any idea?

Note: I’m using Python 3.7.3 on Raspbian GNU/Linux 10 (buster)

Post your /etc/asound.conf and the output of aplay -l and arecord -l, and I will try to have a look, as it seems to be just ALSA.

Here you go:

asound.conf

defaults.pcm.aec.playback_hw.card "1"
defaults.pcm.aec.playback_hw.rate 48000

defaults.pcm.aec.capture_hw.card "1"
defaults.pcm.aec.capture_hw.rate 48000

aplay -l

**** List of PLAYBACK Hardware Devices ****
card 0: Headphones [bcm2835 Headphones], device 0: bcm2835 Headphones [bcm2835 Headphones]
  Subdevices: 8/8
  Subdevice #0: subdevice #0
  Subdevice #1: subdevice #1
  Subdevice #2: subdevice #2
  Subdevice #3: subdevice #3
  Subdevice #4: subdevice #4
  Subdevice #5: subdevice #5
  Subdevice #6: subdevice #6
  Subdevice #7: subdevice #7
card 1: Device [USB PnP Sound Device], device 0: USB Audio [USB Audio]
  Subdevices: 1/1
  Subdevice #0: subdevice #0
card 2: Loopback [Loopback], device 0: Loopback PCM [Loopback PCM]
  Subdevices: 8/8
  Subdevice #0: subdevice #0
  Subdevice #1: subdevice #1
  Subdevice #2: subdevice #2
  Subdevice #3: subdevice #3
  Subdevice #4: subdevice #4
  Subdevice #5: subdevice #5
  Subdevice #6: subdevice #6
  Subdevice #7: subdevice #7
card 2: Loopback [Loopback], device 1: Loopback PCM [Loopback PCM]
  Subdevices: 8/8
  Subdevice #0: subdevice #0
  Subdevice #1: subdevice #1
  Subdevice #2: subdevice #2
  Subdevice #3: subdevice #3
  Subdevice #4: subdevice #4
  Subdevice #5: subdevice #5
  Subdevice #6: subdevice #6
  Subdevice #7: subdevice #7

arecord -l

**** List of CAPTURE Hardware Devices ****
card 1: Device [USB PnP Sound Device], device 0: USB Audio [USB Audio]
  Subdevices: 1/1
  Subdevice #0: subdevice #0
card 2: Loopback [Loopback], device 0: Loopback PCM [Loopback PCM]
  Subdevices: 8/8
  Subdevice #0: subdevice #0
  Subdevice #1: subdevice #1
  Subdevice #2: subdevice #2
  Subdevice #3: subdevice #3
  Subdevice #4: subdevice #4
  Subdevice #5: subdevice #5
  Subdevice #6: subdevice #6
  Subdevice #7: subdevice #7
card 2: Loopback [Loopback], device 1: Loopback PCM [Loopback PCM]
  Subdevices: 8/8
  Subdevice #0: subdevice #0
  Subdevice #1: subdevice #1
  Subdevice #2: subdevice #2
  Subdevice #3: subdevice #3
  Subdevice #4: subdevice #4
  Subdevice #5: subdevice #5
  Subdevice #6: subdevice #6
  Subdevice #7: subdevice #7
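When comparing setups, it can help to boil listings like the two above down to just the card/device pairs. Here is a rough sketch that parses the text format shown above into hw: names. It assumes the exact English line format printed by aplay -l and arecord -l in this post, and `parse_device_list` is a made-up helper name, not part of any ALSA tooling.

```python
import re

# Matches lines such as:
#   card 1: Device [USB PnP Sound Device], device 0: USB Audio [USB Audio]
CARD_RE = re.compile(r"^card (\d+): (\S+) \[(.*?)\], device (\d+):")


def parse_device_list(text):
    """Return a list of (card, card_id, device, hw_name) tuples."""
    devices = []
    for line in text.splitlines():
        match = CARD_RE.match(line.strip())
        if match:
            card, card_id, device = match.group(1, 2, 4)
            devices.append((int(card), card_id, int(device), f"hw:{card},{device}"))
    return devices


if __name__ == "__main__":
    listing = """\
card 1: Device [USB PnP Sound Device], device 0: USB Audio [USB Audio]
  Subdevices: 1/1
  Subdevice #0: subdevice #0
card 2: Loopback [Loopback], device 1: Loopback PCM [Loopback PCM]
  Subdevices: 8/8
"""
    for entry in parse_device_list(listing):
        print(entry)
```

The card ID (the second field) is what names like hw:CARD=Device refer to, so a listing with two cards sharing one ID would be worth a second look.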

I may have seen the same error with my first version of the ALSA config. I thought the current version had fixed it, but obviously it didn’t. I am not sure what causes the error, but you could try the following:
Comment out these two lines under both playback_hw and capture_hw in alsa-aec.conf

        period_time 8000
        periods 32

If it still doesn’t work, try to replace them with this:

   period_time 0
   period_size 128     # you may change this to different values like 1024 to test
   buffer_size 4096

Finally, if none of that works, try testing with a different sound card.

Forgot to mention. Are you also running PulseAudio? If so you may stop it during testing:
systemctl --global stop pulseaudio

I’ve changed the lines in 50-aec.conf but with the changes, arecord crashes.

arecord -D aec_internal -f S16_LE -r 16000 -c 2 rec.wav -V stereo -v
Recording WAVE 'rec.wav' : Signed 16 bit Little Endian, Rate 16000 Hz, Stereo
arecord: set_params:1310: Broken configuration for this PCM: no configurations available

Without the changes, arecord works.

arecord -D aec_internal -f S16_LE -r 16000 -c 2 rec.wav -V stereo -v
Recording WAVE 'rec.wav' : Signed 16 bit Little Endian, Rate 16000 Hz, Stereo
Route conversion PCM
  Transformation table:
    0 <- 0
    1 <- 1
Its setup is:
  stream       : CAPTURE
  access       : RW_INTERLEAVED
  format       : S16_LE
  subformat    : STD
  channels     : 2
  rate         : 16000
  exact rate   : 16000 (16000/1)
  msbits       : 16
  buffer_size  : 4096
  period_size  : 128
  period_time  : 8000
  tstamp_mode  : NONE
  tstamp_type  : MONOTONIC
  period_step  : 1
  avail_min    : 128
  period_event : 0
  start_threshold  : 1
  stop_threshold   : 4096
  silence_threshold: 0
  silence_size : 0
  boundary     : 536870912
Slave: Multi PCM
  Channel bindings:
    0: slave 0, channel 0
    1: slave 1, channel 0
Its setup is:
  stream       : CAPTURE
  access       : MMAP_COMPLEX
  format       : S16_LE
  subformat    : STD
  channels     : 2
  rate         : 16000
  exact rate   : 16000 (16000/1)
  msbits       : 16
  buffer_size  : 4096
  period_size  : 128
  period_time  : 8000
  tstamp_mode  : NONE
  tstamp_type  : MONOTONIC
  period_step  : 1
  avail_min    : 128
  period_event : 0
  start_threshold  : 1
  stop_threshold   : 4096
  silence_threshold: 0
  silence_size : 0
  boundary     : 536870912
Slave #0: Rate conversion PCM (48000)
Converter: libspeex (external)
Protocol version: 10002
Its setup is:
  stream       : CAPTURE
  access       : MMAP_INTERLEAVED
  format       : S16_LE
  subformat    : STD
  channels     : 1
  rate         : 16000
  exact rate   : 16000 (16000/1)
  msbits       : 16
  buffer_size  : 4096
  period_size  : 128
  period_time  : 8000
  tstamp_mode  : NONE
  tstamp_type  : MONOTONIC
  period_step  : 1
  avail_min    : 128
  period_event : 0
  start_threshold  : 1
  stop_threshold   : 4096
  silence_threshold: 0
  silence_size : 0
  boundary     : 536870912
Slave: Direct Snoop PCM
Its setup is:
  stream       : CAPTURE
  access       : MMAP_INTERLEAVED
  format       : S16_LE
  subformat    : STD
  channels     : 1
  rate         : 48000
  exact rate   : 48000 (48000/1)
  msbits       : 16
  buffer_size  : 12288
  period_size  : 384
  period_time  : 8000
  tstamp_mode  : NONE
  tstamp_type  : MONOTONIC
  period_step  : 1
  avail_min    : 384
  period_event : 0
  start_threshold  : 3
  stop_threshold   : 12288
  silence_threshold: 0
  silence_size : 0
  boundary     : 1610612736
Hardware PCM card 1 'USB PnP Sound Device' device 0 subdevice 0
Its setup is:
  stream       : CAPTURE
  access       : MMAP_INTERLEAVED
  format       : S16_LE
  subformat    : STD
  channels     : 1
  rate         : 48000
  exact rate   : 48000 (48000/1)
  msbits       : 16
  buffer_size  : 12288
  period_size  : 384
  period_time  : 8000
  tstamp_mode  : ENABLE
  tstamp_type  : MONOTONIC
  period_step  : 1
  avail_min    : 384
  period_event : 0
  start_threshold  : 1
  stop_threshold   : 1610612736
  silence_threshold: 0
  silence_size : 0
  boundary     : 1610612736
  appl_ptr     : 0
  hw_ptr       : 0
Slave #1: Hardware PCM card 2 'Loopback' device 1 subdevice 4
Its setup is:
  stream       : CAPTURE
  access       : MMAP_INTERLEAVED
  format       : S16_LE
  subformat    : STD
  channels     : 1
  rate         : 16000
  exact rate   : 16000 (16000/1)
  msbits       : 16
  buffer_size  : 4096
  period_size  : 128
  period_time  : 8000
  tstamp_mode  : NONE
  tstamp_type  : MONOTONIC
  period_step  : 1
  avail_min    : 128
  period_event : 0
  start_threshold  : 1
  stop_threshold   : 4096
  silence_threshold: 0
  silence_size : 0
  boundary     : 536870912
  appl_ptr     : 0
  hw_ptr       : 0
                            +  ### 19%|00%+
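Reading the dump above, the frame counts follow from plain arithmetic on the period_time 8000 and periods 32 settings: one period is period_time × rate frames, and the buffer is periods × period_size. A small sketch that reproduces the reported values for both the 16 kHz and the 48 kHz side (the helper names are made up here, not ALSA API):

```python
def period_frames(period_time_us, rate):
    """Frames in one period of period_time_us microseconds at the given rate."""
    return period_time_us * rate // 1_000_000


def buffer_frames(period_time_us, periods, rate):
    """Total buffer size in frames for a fixed number of periods."""
    return periods * period_frames(period_time_us, rate)


if __name__ == "__main__":
    for rate in (16000, 48000):
        print(rate, period_frames(8000, rate), buffer_frames(8000, 32, rate))
    # 16000 128 4096
    # 48000 384 12288
```

So the 48 kHz hardware side and the 16 kHz rate-converted side cover the same 8 ms per period, just with different frame counts.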

Wait… I just reproduced your error. I noticed that I also have a USB sound card named “Device”, which may be the same as yours. I took it out and tested with your config, and the same error was there. Let me do some testing on my side and I will let you know if there’s a fix.

Should the default capture card now be the loopback, since the chain is now usb->capture->aec->loopback, and not the USB card?

@fastjack This error seems to be related to some unknown PortAudio bug. I now have a workaround and have just uploaded a new version. Please test with this latest version (v0.3):

The fix works! :+1: Thanks!

On with the AEC testing now :slight_smile: