PS3 eye users, show your configuration!

voice · May 2, 2020, 1:04am

I’m trying to find a decent configuration for the PlayStation Eye. I’m using PulseAudio filters for echo cancellation, noise reduction, voice detection, a high-pass filter (which removes low-frequency noise) and beamforming (using the 4 mics in the array to help single out the voice from the noise).

Setting up the AEC filters:
pactl load-module module-echo-cancel use_master_format=1 aec_method='webrtc' aec_args='"analog_gain_control=0 digital_gain_control=1 noise_suppression=1 high_pass_filter=1 voice_detection=1 beamforming=1 mic_geometry=-0.03,0,0,-0.01,0,0,0.01,0,0,0.03,0,0"'
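The nested quoting in that command (single quotes protecting double quotes, so that pactl receives aec_args="...") is easy to get wrong, and mic_geometry is just a list of x,y,z triples in metres. Here is a minimal Python sketch that rebuilds the same command. It assumes the evenly spaced linear layout implied by the values above (2 cm pitch, centred on 0), not a measured PS3 Eye geometry, and both helper names are hypothetical.

```python
import shlex

def linear_mic_geometry(n_mics: int, spacing_m: float) -> str:
    """Hypothetical helper: x,y,z triples (metres) for mics evenly spaced on the x axis, centred on 0."""
    centre = (n_mics - 1) / 2
    coords = []
    for i in range(n_mics):
        x = round((i - centre) * spacing_m, 6)  # round away float noise such as 0.030000000000000002
        coords += [x, 0, 0]                     # y = z = 0 for a linear array
    return ",".join(f"{c:g}" for c in coords)

def echo_cancel_command(aec_args: dict) -> list:
    """Hypothetical helper: argv for pactl; the embedded double quotes reach PulseAudio unchanged."""
    args = " ".join(f"{key}={value}" for key, value in aec_args.items())
    return ["pactl", "load-module", "module-echo-cancel", "use_master_format=1",
            "aec_method=webrtc", f'aec_args="{args}"']

cmd = echo_cancel_command({
    "analog_gain_control": 0, "digital_gain_control": 1, "noise_suppression": 1,
    "high_pass_filter": 1, "voice_detection": 1, "beamforming": 1,
    "mic_geometry": linear_mic_geometry(4, 0.02),
})
# Paste-able shell form; alternatively run cmd directly with subprocess.run(cmd, check=True).
print(shlex.join(cmd))
```

Passing the list to subprocess skips the shell entirely, so the only quoting left is the pair of double quotes PulseAudio itself expects around the aec_args value.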

Set the default source (audio input) for PulseAudio:
pacmd set-default-source alsa_input.usb-OmniVision_Technologies__Inc._USB_Camera-B4.09.24.1-01.multichannel-input.echo-cancel

Raise the source volume to 350% (I’m still trying to find a good value):
pactl set-source-volume alsa_input.usb-OmniVision_Technologies__Inc._USB_Camera-B4.09.24.1-01.multichannel-input.echo-cancel 350%
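For a sense of how large 350% is: PulseAudio’s software volume uses a cubic curve (amplitude factor = (percent/100)^3), so values above 100% grow quickly. A small sketch, assuming the whole boost is applied in software (with a hardware mixer, the split between hardware and software gain can differ):

```python
import math

def pa_percent_to_linear(percent: float) -> float:
    """Amplitude factor for a PulseAudio software volume given in percent (cubic curve)."""
    return (percent / 100.0) ** 3

def pa_percent_to_db(percent: float) -> float:
    """Same volume in dB: 20*log10((p/100)**3) == 60*log10(p/100)."""
    return 60.0 * math.log10(percent / 100.0)

for p in (100, 150, 200, 350):
    print(f"{p}% -> x{pa_percent_to_linear(p):.3f} amplitude, {pa_percent_to_db(p):+.1f} dB")
```

Under that assumption, 350% is about +32.6 dB (a factor of roughly 43 in amplitude), which can push loud input into clipping.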

Finally, configure Rhasspy 2.4.20 to use arecord (instead of pyaudio):
"microphone": { "system": "arecord", "arecord": { "device": "pulse", "chunk_size": 960 } }

When I use parecord to get a sound file, it works very well:
parecord --channels=1 --format=s16le --rate=16000 -d alsa_input.usb-OmniVision_Technologies__Inc._USB_Camera-B4.09.24.1-01.multichannel-input.echo-cancel test.wav
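Before feeding a recording like that to Rhasspy, it can help to confirm that the file really has the format requested on the parecord line (mono, 16 kHz, 16-bit). A minimal check using Python’s standard wave module; the function name is hypothetical:

```python
import wave

def wav_format_problems(path, rate=16000, channels=1, sample_width_bytes=2):
    """Return a list of mismatches between the WAV header and the expected format."""
    problems = []
    with wave.open(path, "rb") as w:
        if w.getframerate() != rate:
            problems.append(f"rate {w.getframerate()} != {rate}")
        if w.getnchannels() != channels:
            problems.append(f"channels {w.getnchannels()} != {channels}")
        if w.getsampwidth() != sample_width_bytes:
            problems.append(f"sample width {w.getsampwidth()} bytes != {sample_width_bytes}")
        if w.getnframes() == 0:
            problems.append("no audio frames")
    return problems

# Usage: print(wav_format_problems("test.wav") or "format OK")
```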

My system: Raspberry Pi 4 (4 GB) with Debian 64-bit (AArch64) and the official Raspberry Pi 64-bit kernel.

Well, the wake word part is not good (I don’t know if the STT is any better), but standalone recording seems to be good.

What are your setups? Are they working (better than mine, at least)?

rolyan_trauts · May 2, 2020, 2:19am

The WebRTC AEC is really good, but it’s debatable whether the beamforming works, and how it could without a changing DoA (Direction of Arrival).

I asked the PulseAudio developers and they kindly replied.

Hey Stuart,
The webrtc library doesn’t implement DOA. I’m not sure how much doing steering (changing target_direction) dynamically works either. Unfortunately, the team has dropped beamforming upstream altogether, so when we next update the library, this support will be lost.

Best regards,
Arun

On Sat, 14 Mar 2020, at 4:57 AM, Stuart Naylor wrote:

Hi

Just a question but with the webrtc beamforming does it update the
target_direction?

target_direction

The target position relative to the centre of the mic array, for
beamforming. The value is a list of three numbers (a spherical point
https://en.wikipedia.org/wiki/Spherical_coordinate_system): “a,e,r”.
‘a’ is the azimuth of the target in radians. Zero radians azimuth
points to the right of the mic array, and positive angles move in a
counter-clockwise direction. ‘e’ is the elevation of the target in
radians. Zero radians elevation means that the target is on the same
level horizontally as the center of the array, and positive angles go
upwards. ‘r’ is the radius, i.e. the distance from the center of the
array (in meters).
That is, is Direction Of Arrival part of webrtc and handled
automatically, or do you have to provide updates to module-echo-cancel
with “a,e,r”? I have always wondered how you could do that without
loading and unloading the module each time.

Apologies, but I cannot find any info on this, and from what I can make
of the code it seems DOA (Direction Of Arrival) isn’t implemented?

Many, many thanks if you can give any info?

Stuart
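The quoted documentation describes target_direction as azimuth and elevation in radians plus a radius in metres. A small sketch that converts from degrees; the helper name is hypothetical. By the stated convention (zero azimuth to the right of the array, positive angles counter-clockwise), 90° points broadside to a linear array laid out along the x axis, as with the mic_geometry used in this thread. Going by the reply above, webrtc does not estimate DoA, so whatever you pass here stays fixed.

```python
import math

def target_direction(azimuth_deg: float, elevation_deg: float = 0.0, radius_m: float = 0.0) -> str:
    """Format an "a,e,r" target_direction value: angles converted to radians, radius in metres."""
    a = round(math.radians(azimuth_deg), 4)
    e = round(math.radians(elevation_deg), 4)
    return f"{a:g},{e:g},{radius_m:g}"

print(target_direction(90))         # broadside to an x-axis array
print(target_direction(45, 10, 2))  # 45° to the left of broadside, 10° up, 2 m away
```

The first call gives the 1.5708,0,0 value used later in this thread.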

You can also keep your setup as it is and create an asound.conf, so that all your ALSA devices and commands go through PulseAudio via pulseaudio-alsa:

# Use PulseAudio by default
pcm.!default {
  type pulse
  fallback "sysdefault"
  hint {
    show on
    description "Default ALSA Output (currently PulseAudio Sound Server)"
  }
}

ctl.!default {
  type pulse
  fallback "sysdefault"
}

# vim:set ft=alsaconf:

The ArchLinux wiki as usual is a great source of info.
https://wiki.archlinux.org/index.php/PulseAudio

You also have AGC enabled (digital_gain_control=1), so the volume should be levelled automatically, but add agc_start_volume=85 after it: if the start volume is too low, the AGC never gets enough signal to kick in. Possible values are 0-255.
voice_detection=1 is another great mystery: even when it is enabled and does detect voice, it does nothing you can use.
I have searched everywhere, from the PulseAudio D-Bus interface to the stream name, and there is no message or signal available that would let you actually use the VAD.
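A minimal sketch of the argument string suggested here, with agc_start_volume placed right after digital_gain_control and range-checked against the 0-255 limit mentioned above. The helper is hypothetical; PulseAudio does its own validation when the module loads.

```python
def aec_args_string(agc_start_volume: int = 85, **extra: int) -> str:
    """Build an aec_args value with AGC enabled and an explicit starting volume."""
    if not 0 <= agc_start_volume <= 255:
        raise ValueError(f"agc_start_volume must be 0-255, got {agc_start_volume}")
    opts = {
        "analog_gain_control": 0,
        "digital_gain_control": 1,             # AGC levels the volume automatically
        "agc_start_volume": agc_start_volume,  # too low and AGC may never kick in
        **extra,                               # e.g. noise_suppression=1
    }
    return " ".join(f"{key}={value}" for key, value in opts.items())

print(aec_args_string(85, high_pass_filter=1, noise_suppression=1))
```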

There is also mobile=0; when enabled (mobile=1) it uses a different AEC section of the code, which may be better suited to a Pi voice AI.
I never got that far, and maybe when it is enabled some of the functions that seem pointless kick in somehow.

Much of this I discovered by browsing the code.

rolyan_trauts · May 2, 2020, 3:34am

PS: I also noticed that you are running Debian AArch64 on the Pi 4; the performance gain from just switching to 64-bit, at least on a Pi 4, is considerable.
I just ran the FFT benchmarks from http://www.roylongbottom.org.uk/Raspberry%20Pi%20Benchmarks.htm

 ###################################################

processor	: 0
model name	: ARMv7 Processor rev 3 (v7l)
BogoMIPS	: 108.00
Features	: half thumb fastmult vfp edsp neon vfpv3 tls vfpv4 idiva idivt vfpd32 lpae evtstrm crc32 
CPU implementer	: 0x41
CPU architecture: 7
CPU variant	: 0x0
CPU part	: 0xd08
CPU revision	: 3

processor	: 1
model name	: ARMv7 Processor rev 3 (v7l)
BogoMIPS	: 108.00
Features	: half thumb fastmult vfp edsp neon vfpv3 tls vfpv4 idiva idivt vfpd32 lpae evtstrm crc32 
CPU implementer	: 0x41
CPU architecture: 7
CPU variant	: 0x0
CPU part	: 0xd08
CPU revision	: 3

processor	: 2
model name	: ARMv7 Processor rev 3 (v7l)
BogoMIPS	: 108.00
Features	: half thumb fastmult vfp edsp neon vfpv3 tls vfpv4 idiva idivt vfpd32 lpae evtstrm crc32 
CPU implementer	: 0x41
CPU architecture: 7
CPU variant	: 0x0
CPU part	: 0xd08
CPU revision	: 3

processor	: 3
model name	: ARMv7 Processor rev 3 (v7l)
BogoMIPS	: 108.00
Features	: half thumb fastmult vfp edsp neon vfpv3 tls vfpv4 idiva idivt vfpd32 lpae evtstrm crc32 
CPU implementer	: 0x41
CPU architecture: 7
 

From File /proc/version
Linux version 4.19.97-v7l+ (dom@buildbot) (gcc version 4.9.3 (crosstool-NG crosstool-ng-1.22.0-88-g8460611)) #1294 SMP Thu Jan 30 13:21:14 GMT 2020
 

 ###################################################

   armv8 64 Bit FFT Benchmark Version 1.0 Sat May  2 03:52:22 2020

  Size                     milliseconds
    K     Single Precision              Double Precision
    1     0.095     0.093     0.093     0.097     0.095     0.095
    2     0.206     0.257     0.205     0.318     0.317     0.316
    4     0.700     0.696     0.695     0.925     0.918     0.917
    8     1.774     1.767     1.764     2.183     2.136     2.138
   16     2.469     2.490     2.454     2.830     2.755     2.832
   32     5.546     5.440     5.408     7.082     7.032     7.016
   64     9.030     8.909     8.888    33.621    34.841    34.208
  128    58.459    57.295    57.051   143.013   145.200   137.666
  256   290.174   273.022   271.806   408.030   411.823   413.427
  512   824.471   835.709   821.872  1060.246  1051.617  1049.761
 1024  1665.303  1678.618  1674.195  1988.812  1997.958  1990.857

        1024 Square Check Maximum Noise Average Noise
        SP   9.999520e-01  3.346482e-06  4.565234e-11
        DP   1.000000e+00  1.133294e-23  1.428110e-28

               End at Sat May  2 03:52:43 2020
From File /proc/cpuinfo
processor       : 0
BogoMIPS        : 108.00
Features        : fp asimd evtstrm crc32 cpuid
CPU implementer : 0x41
CPU architecture: 8
CPU variant     : 0x0
CPU part        : 0xd08
CPU revision    : 3

processor       : 1
BogoMIPS        : 108.00
Features        : fp asimd evtstrm crc32 cpuid
CPU implementer : 0x41
CPU architecture: 8
CPU variant     : 0x0
CPU part        : 0xd08
CPU revision    : 3

processor       : 2
BogoMIPS        : 108.00
Features        : fp asimd evtstrm crc32 cpuid
CPU implementer : 0x41
CPU architecture: 8
CPU variant     : 0x0
CPU part        : 0xd08
CPU revision    : 3

processor       : 3
BogoMIPS        : 108.00
Features        : fp asimd evtstrm crc32 cpuid
CPU implementer : 0x41
CPU architecture: 8
CPU variant     : 0x0
CPU part        : 0xd08
CPU revision    : 3

Hardware        : BCM2835
Revision        : b03111
Serial          : 10000000c6e9e69d
Model           : Raspberry Pi 4 Model B Rev 1.1


From File /proc/version
Linux version 5.4.0-1008-raspi (buildd@bos02-arm64-039) (gcc version 9.3.0 (Ubuntu 9.3.0-10ubuntu1)) #8-Ubuntu SMP Wed Apr 8 11:13:06 UTC 2020


 ###################################################

  Raspberry Pi 3 Running time > 30 seconds before results display

   armv8 64 Bit FFT Benchmark Version 1.0 Sat May  2 03:27:36 2020

  Size                     milliseconds
    K     Single Precision              Double Precision
    1     0.092     0.089     0.089     0.097     0.094     0.093
    2     0.191     0.192     0.190     0.124     0.122     0.123
    4     0.312     0.310     0.310     0.327     0.327     0.327
    8     0.682     0.666     0.667     0.779     0.733     0.732
   16     1.498     1.493     1.483     1.683     1.665     1.668
   32     3.507     3.436     3.434     4.208     4.033     4.045
   64     8.079     7.934     7.883    26.427    25.807    25.763
  128    45.571    45.315    46.318    96.315    96.052    96.324
  256   184.814   164.349   164.180   209.539   219.269   218.994
  512   525.712   500.734   520.485   677.491   681.140   678.169
 1024  1349.151  1345.309  1343.874  1650.913  1643.969  1644.503

        1024 Square Check Maximum Noise Average Noise
        SP   9.999520e-01  3.346482e-06  4.565234e-11
        DP   1.000000e+00  1.133294e-23  1.428110e-28

               End at Sat May  2 03:27:52 2020

 ###################################################

processor	: 0
model name	: ARMv7 Processor rev 3 (v7l)
BogoMIPS	: 108.00
Features	: half thumb fastmult vfp edsp neon vfpv3 tls vfpv4 idiva idivt vfpd32 lpae evtstrm crc32 
CPU implementer	: 0x41
CPU architecture: 7
CPU variant	: 0x0
CPU part	: 0xd08
CPU revision	: 3

processor	: 1
model name	: ARMv7 Processor rev 3 (v7l)
BogoMIPS	: 108.00
Features	: half thumb fastmult vfp edsp neon vfpv3 tls vfpv4 idiva idivt vfpd32 lpae evtstrm crc32 
CPU implementer	: 0x41
CPU architecture: 7
CPU variant	: 0x0
CPU part	: 0xd08
CPU revision	: 3

processor	: 2
model name	: ARMv7 Processor rev 3 (v7l)
BogoMIPS	: 108.00
Features	: half thumb fastmult vfp edsp neon vfpv3 tls vfpv4 idiva idivt vfpd32 lpae evtstrm crc32 
CPU implementer	: 0x41
CPU architecture: 7
CPU variant	: 0x0
CPU part	: 0xd08
CPU revision	: 3

processor	: 3
model name	: ARMv7 Processor rev 3 (v7l)
BogoMIPS	: 108.00
Features	: half thumb fastmult vfp edsp neon vfpv3 tls vfpv4 idiva idivt vfpd32 lpae evtstrm crc32 
CPU implementer	: 0x41
CPU architecture: 7
 

From File /proc/version
Linux version 4.19.97-v7l+ (dom@buildbot) (gcc version 4.9.3 (crosstool-NG crosstool-ng-1.22.0-88-g8460611)) #1294 SMP Thu Jan 30 13:21:14 GMT 2020
 

 ###################################################

   armv8 64 Bit FFT Benchmark Version 3c.0 Sat May  2 03:52:56 2020

  Size                     milliseconds
    K     Single Precision              Double Precision
    1     0.184     0.111     0.109     0.042     0.039     0.039
    2     0.293     0.229     0.229     0.100     0.098     0.098
    4     0.700     0.575     0.572     0.230     0.225     0.224
    8     1.565     1.298     1.297     0.519     0.501     0.500
   16     3.248     2.892     2.940     1.241     1.160     1.189
   32     7.138     6.480     6.413     3.290     3.270     3.217
   64     6.736     6.016     5.976     9.558     9.542     9.514
  128    18.369    17.257    17.299    24.430    24.419    24.385
  256    43.173    41.501    40.405    55.989    53.840    55.412
  512    99.337    94.574    95.281   135.116   134.872   134.614
 1024   229.091   220.074   220.129   318.806   302.227   302.217

        1024 Square Check Maximum Noise Average Noise
        SP   9.999520e-01  3.346482e-06  4.565234e-11
        DP   1.000000e+00  1.133294e-23  1.428110e-28

               End at Sat May  2 03:53:00 2020

From File /proc/cpuinfo
processor       : 0
BogoMIPS        : 108.00
Features        : fp asimd evtstrm crc32 cpuid
CPU implementer : 0x41
CPU architecture: 8
CPU variant     : 0x0
CPU part        : 0xd08
CPU revision    : 3

processor       : 1
BogoMIPS        : 108.00
Features        : fp asimd evtstrm crc32 cpuid
CPU implementer : 0x41
CPU architecture: 8
CPU variant     : 0x0
CPU part        : 0xd08
CPU revision    : 3

processor       : 2
BogoMIPS        : 108.00
Features        : fp asimd evtstrm crc32 cpuid
CPU implementer : 0x41
CPU architecture: 8
CPU variant     : 0x0
CPU part        : 0xd08
CPU revision    : 3

processor       : 3
BogoMIPS        : 108.00
Features        : fp asimd evtstrm crc32 cpuid
CPU implementer : 0x41
CPU architecture: 8
CPU variant     : 0x0
CPU part        : 0xd08
CPU revision    : 3

Hardware        : BCM2835
Revision        : b03111
Serial          : 10000000c6e9e69d
Model           : Raspberry Pi 4 Model B Rev 1.1


From File /proc/version
Linux version 5.4.0-1008-raspi (buildd@bos02-arm64-039) (gcc version 9.3.0 (Ubuntu 9.3.0-10ubuntu1)) #8-Ubuntu SMP Wed Apr 8 11:13:06 UTC 2020


 ###################################################

  Raspberry Pi 3 Running time > 15 seconds before results display

   armv8 64 Bit FFT Benchmark Version 3c.0 Sat May  2 03:26:02 2020

  Size                     milliseconds
    K     Single Precision              Double Precision
    1     0.120     0.088     0.136     0.039     0.035     0.035
    2     0.214     0.190     0.189     0.103     0.101     0.101
    4     0.665     0.348     0.204     0.234     0.229     0.230
    8     0.503     0.448     0.449     0.530     0.513     0.512
   16     1.065     0.989     0.990     1.281     1.178     1.196
   32     2.360     2.229     2.207     3.430     3.433     3.366
   64     5.625     5.397     5.308    10.370    10.295    10.309
  128    16.170    15.531    15.535    24.314    24.370    24.452
  256    36.808    35.887    35.724    53.699    53.808    53.929
  512    77.890    75.945    76.038   120.495   122.108   121.565
 1024   175.636   171.202   171.853   271.290   263.455   265.213

        1024 Square Check Maximum Noise Average Noise
        SP   9.999520e-01  3.346482e-06  4.565234e-11
        DP   1.000000e+00  1.133294e-23  1.428110e-28

               End at Sat May  2 03:26:06 2020

In each pair, the first run is Raspbian 32-bit and the second is after moving to Ubuntu 20.04 64-bit, which gives a performance gain hovering around 20%.
For voice AI, your SoC, distro and compile options can have a large effect on algorithms such as AEC.
Also, now that some of the temperature problems have been sorted, the Pi 4, even this first revision, is pretty safe to overclock.
I run mine with a 12 V DC 40 mm fan on the 5 V pin for silent extra cooling, and 1.8 GHz should be a relatively safe and stable overclock.
The results above were at stock clock on a freshly flashed image.
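To put a number on “around 20%”, here is a quick Python sketch that averages the three single-precision 1024K timings from each run above. It assumes the pairing described in this post (first run of each benchmark version = Raspbian 32-bit, second = Ubuntu 20.04 64-bit).

```python
from statistics import mean

# Single-precision 1024K FFT times in ms, copied from the runs above.
# Assumed pairing (per the post): (Raspbian 32-bit, Ubuntu 20.04 64-bit).
RUNS = {
    "v1.0": ([1665.303, 1678.618, 1674.195], [1349.151, 1345.309, 1343.874]),
    "v3c.0": ([229.091, 220.074, 220.129], [175.636, 171.202, 171.853]),
}

def speedup(old_ms, new_ms):
    """How many times faster the new run is, using mean times."""
    return mean(old_ms) / mean(new_ms)

for version, (raspbian32, ubuntu64) in RUNS.items():
    s = speedup(raspbian32, ubuntu64)
    print(f"{version}: {s:.2f}x faster, {1 - 1 / s:.0%} less time")
```

The ratios come out at about 1.24x and 1.29x, consistent with the figure quoted.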

The Pi 4 definitely can run software EC, but there is better audio hardware available at a very similar cost to the PS3 Eye.

voice · May 3, 2020, 1:42am

I’m having a hard time figuring out good configuration parameters. I’ve enabled every switch that is available, but Rhasspy (Precise) still refuses to “wake” most of the time (though not always).

pactl load-module module-echo-cancel use_master_format=1 aec_method='webrtc' \
  aec_args='"analog_gain_control=0 digital_gain_control=1 noise_suppression=1 high_pass_filter=1 beamforming=1 agc_start_volume=170 mic_geometry=-0.03,0,0,-0.01,0,0,0.01,0,0,0.03,0,0 target_direction=1.5708,0,0"'

Does anyone have a good enough result with other parameters? My guess is that the sound is still too low, or too distorted, I don’t know.

PS: Maybe the PS3 Eye is not the best option, but it is what I have now… And it seems that there are people using it with good results.

rolyan_trauts · May 3, 2020, 2:34am

The beamforming doesn’t work and the AEC with the PS3eye is minimal at best.

Open two SSH windows.
Play an example WAV with paplay in one.
Record with parecord in the other.
When playback completes, listen to the recording.
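The two-window test can also be scripted. A minimal Python sketch, assuming paplay and parecord are installed and that you pass the name of your echo-cancel source (for example echoCancelSource from the script in this post, or the echo-cancel source named earlier in the thread); the function names are hypothetical. Stopping parecord with SIGINT is the same as pressing Ctrl+C in the second window.

```python
import signal
import subprocess
import time

def parecord_cmd(source: str, out_wav: str) -> list:
    """Record mono 16 kHz 16-bit audio from a PulseAudio source into a WAV file."""
    return ["parecord", "--channels=1", "--format=s16le", "--rate=16000",
            "-d", source, out_wav]

def record_while_playing(play_wav: str, out_wav: str, source: str) -> None:
    """Play play_wav on the default sink while recording source, then stop recording."""
    recorder = subprocess.Popen(parecord_cmd(source, out_wav))
    try:
        time.sleep(0.5)                                   # give parecord a moment to start
        subprocess.run(["paplay", play_wav], check=True)  # blocks until playback ends
    finally:
        recorder.send_signal(signal.SIGINT)               # like Ctrl+C in the second window
        recorder.wait()

# Usage (then listen to after_aec.wav, e.g. with paplay):
# record_while_playing("file_example_WAV_10MG.wav", "after_aec.wav", "echoCancelSource")
```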

I used it, and generally, especially with AEC, the results were terrible.
I have no idea why the PS3 Eye is recommended; yes, it has 4 mics, but apart from that it’s pretty awful for what you are trying to do.

You are getting what others are getting, and whether that makes it a good recommendation or not I will let you decide.

Also, your command doesn’t seem to set the sink and source.
echoCancelEnable.sh

#!/bin/bash
aecArgs="$*"
# If no "aec_args" are passed on to the script, use this "aec_args" as default:
[ -z "$aecArgs" ] && aecArgs="analog_gain_control=0 digital_gain_control=1"
newSourceName="echoCancelSource"
newSinkName="echoCancelSink"

# "module-switch-on-connect" with "ignore_virtual=no" (needs PulseAudio 12 or higher) is needed to automatically move existing streams to a new (virtual) default source and sink.
if ! pactl list modules short | grep -q "module-switch-on-connect.*ignore_virtual=no"; then
	echo 'Load module "module-switch-on-connect" with "ignore_virtual=no"'
	pactl unload-module module-switch-on-connect 2>/dev/null
	pactl load-module module-switch-on-connect ignore_virtual=no
fi

# Reload "module-echo-cancel"
echo "Reload \"module-echo-cancel\" with \"aec_args=$aecArgs\""
pactl unload-module module-echo-cancel 2>/dev/null
# Quote the whole aec_args=... word so the shell passes it to pactl as a single argument.
if pactl load-module module-echo-cancel use_master_format=1 aec_method=webrtc "aec_args=\"$aecArgs\"" source_name="$newSourceName" sink_name="$newSinkName"; then
	# Set a new default source and sink, if module-echo-cancel has loaded successfully.
	pacmd set-default-source "$newSourceName"
	pacmd set-default-sink "$newSinkName"
fi

Or via /etc/pulse/default.pa

### Enable Echo/Noise-Cancellation
load-module module-echo-cancel use_master_format=1 aec_method=webrtc aec_args="analog_gain_control=0 digital_gain_control=1 agc_start_volume=85 high_pass_filter=1 noise_suppression=1 voice_detection=1 beamforming=1 mic_geometry=-0.03,0,0,-0.01,0,0,0.01,0,0,0.03,0,0" source_name=echoCancel_source sink_name=echoCancel_sink

set-default-source echoCancel_source
set-default-sink echoCancel_sink

But that still only applies to PulseAudio clients:
aplay and other commands that use ALSA will still not go through PulseAudio unless pulse is set as the ALSA default.

I also have a PS3 Eye on my desk, but after my experience with it I am not going to bother using it.

voice · May 3, 2020, 2:50am

What is the idea? To let the microphones record what is being played by the speakers?

Anyway, when I use the above settings the resulting recording isn’t terrible. It actually removes most of the background noise (I didn’t try with music). The only thing is that the sound seems to have a reverb effect, or something like it, applied to it.
I’ve tried it without beamforming and the resulting WAV is not too bad. There is no distortion, but there is much more background noise.

Did you have success using it for the wake word part? I’m using Precise, but I have to say “Hey Mycroft” around 6 times before it triggers.

It’s true, but my script sets the created virtual source as the default source and the RPI4 sink as the default sink. Also, my .asoundrc sets pulse as the default card. When I use arecord or parecord the resulting wav is the same (with the current applied pulse filters).

rolyan_trauts · May 3, 2020, 2:51am

No, that’s why I am not using it, as media playback is a common use for me.

The idea is so you can listen to the results of the microphone recording after AEC.

Also do the same with aplay/arecord.

wget https://file-examples.com/wp-content/uploads/2017/11/file_example_WAV_10MG.wav

With pulseaudio-alsa installed, this asound.conf should force ALSA through PulseAudio:

# Use PulseAudio by default
pcm.!default {
  type pulse
  fallback "sysdefault"
  hint {
    show on
    description "Default ALSA Output (currently PulseAudio Sound Server)"
  }
}

ctl.!default {
  type pulse
  fallback "sysdefault"
}

# vim:set ft=alsaconf:

voice · May 3, 2020, 2:56am

Sorry, I forgot to tell you that I already had this configuration. (I edited the post)

rolyan_trauts · May 3, 2020, 2:57am

It’s OK, but I doubt you will get any good results with the AEC; the AGC should work well, though.
I think the drift compensation in webrtc_audio_processing needs a higher clock, with more oomph than we have on a Pi 3/4.

If you run journalctl -b and scroll to the end, you will see it constantly resyncing.
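A rough way to quantify that is to count matching journal lines. A minimal Python sketch; matching on the word “resync” (case-insensitive) is an assumption about how the messages are worded, so adjust the pattern to whatever your journal actually shows. Both helpers are hypothetical.

```python
import re
import subprocess

def count_matching(lines, pattern=r"resync"):
    """Count log lines matching pattern (case-insensitive)."""
    rx = re.compile(pattern, re.IGNORECASE)
    return sum(1 for line in lines if rx.search(line))

def journal_this_boot():
    """Lines of `journalctl -b` for the current boot (may need journal read access)."""
    out = subprocess.run(["journalctl", "-b", "--no-pager"],
                         capture_output=True, text=True, check=True)
    return out.stdout.splitlines()

# Usage: print(count_matching(journal_this_boot()))
```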

I think it’s likely there is just far too much variable latency and drift between the onboard audio and the USB PS3 Eye.

Or it’s the hard-coded platform hacks that don’t handle an Arm SoC like the Pi correctly.

github.com/freedesktop/pulseaudio-webrtc-audio-processing/blob/master/webrtc/modules/audio_processing/aec/echo_cancellation.c

/*
 *  Copyright (c) 2012 The WebRTC project authors. All Rights Reserved.
 *
 *  Use of this source code is governed by a BSD-style license
 *  that can be found in the LICENSE file in the root of the source
 *  tree. An additional intellectual property rights grant can be found
 *  in the file PATENTS.  All contributing project authors may
 *  be found in the AUTHORS file in the root of the source tree.
 */

/*
 * Contains the API functions for the AEC.
 */
#include "webrtc/modules/audio_processing/aec/include/echo_cancellation.h"

#include <math.h>
#ifdef WEBRTC_AEC_DEBUG_DUMP
#include <stdio.h>
#endif
#include <stdlib.h>


// Measured delays [ms]
// Device                Chrome  GTP
// MacBook Air           10
// MacBook Retina        10      100
// MacPro                30?
//
// Win7 Desktop          70      80?
// Win7 T430s            110
// Win8 T420s            70
//
// Daisy                 50
// Pixel (w/ preproc?)           240
// Pixel (w/o preproc?)  110     110

// The extended filter mode gives us the flexibility to ignore the system's
// reported delays. We do this for platforms which we believe provide results
// which are incompatible with the AEC's expectations. Based on measurements
// (some provided above) we set a conservative (i.e. lower than measured)
// fixed delay.
//
// WEBRTC_UNTRUSTED_DELAY will only have an impact when |extended_filter_mode|
// is enabled. See the note along with |DelayCorrection| in
// echo_cancellation_impl.h for more details on the mode.
//
// Justification:
// Chromium/Mac: Here, the true latency is so low (~10-20 ms), that it plays
// havoc with the AEC's buffering. To avoid this, we set a fixed delay of 20 ms
// and then compensate by rewinding by 10 ms (in wideband) through
// kDelayDiffOffsetSamples. This trick does not seem to work for larger rewind
// values, but fortunately this is sufficient.
//
// Chromium/Linux(ChromeOS): The values we get on this platform don't correspond
// well to reality. The variance doesn't match the AEC's buffer changes, and the
// bulk values tend to be too low. However, the range across different hardware
// appears to be too large to choose a single value.
//
// GTP/Linux(ChromeOS): TBD, but for the moment we will trust the values.
#if defined(WEBRTC_CHROMIUM_BUILD) && defined(WEBRTC_MAC)
#define WEBRTC_UNTRUSTED_DELAY
#endif

#if defined(WEBRTC_UNTRUSTED_DELAY) && defined(WEBRTC_MAC)
static const int kDelayDiffOffsetSamples = -160;
#else
// Not enabled for now.
static const int kDelayDiffOffsetSamples = 0;
#endif

#if defined(WEBRTC_MAC)
static const int kFixedDelayMs = 20;
#else
static const int kFixedDelayMs = 50;
#endif
#if !defined(WEBRTC_UNTRUSTED_DELAY)
static const int kMinTrustedDelayMs = 20;
#endif
static const int kMaxTrustedDelayMs = 500;

// Maximum length of resampled signal. Must be an integer multiple of frames
// (ceil(1/(1 + MIN_SKEW)*2) + 1)*FRAME_LEN
// The factor of 2 handles wb, and the + 1 is as a safety margin
// TODO(bjornv): Replace with kResamplerBufferSize
#define MAX_RESAMP_LEN (5 * FRAME_LEN)

static const int kMaxBufSizeStart = 62;  // In partitions
static const int sampMsNb = 8;           // samples per ms in nb
static const int initCheck = 42;

#ifdef WEBRTC_AEC_DEBUG_DUMP
int webrtc_aec_instance_count = 0;
#endif
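To relate the delay constants in that excerpt to samples: the comments treat 8 kHz as narrowband (sampMsNb = 8 samples per ms) and the 10 ms wideband rewind corresponds to the -160 of kDelayDiffOffsetSamples, i.e. 16 kHz. A trivial conversion sketch (the helper is hypothetical):

```python
def ms_to_samples(ms: float, rate_hz: int) -> int:
    """Number of samples spanning ms milliseconds at rate_hz."""
    return round(ms * rate_hz / 1000)

for name, ms in [("kFixedDelayMs (non-Mac)", 50), ("kMinTrustedDelayMs", 20),
                 ("kMaxTrustedDelayMs", 500), ("10 ms rewind", 10)]:
    print(f"{name}: {ms} ms = {ms_to_samples(ms, 8000)} samples @ 8 kHz, "
          f"{ms_to_samples(ms, 16000)} samples @ 16 kHz")
```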

To be honest I’m not really sure, but I did give up trying with it.

lilbuh · May 4, 2020, 9:25am

Hey all, I’m running a Raspberry Pi with Raspbian. Could anyone tell me how you installed PulseAudio WITHOUT the X11 desktop, since I’m running it headless?

rolyan_trauts · May 4, 2020, 11:44am

Probably the easiest way is to do what everyone says not to do and run PulseAudio system-wide.

Tip:
It is strongly suggested not to edit system-wide configuration files, but rather edit user ones. Create the ~/.config/pulse directory, then copy the system configuration files into it and edit according to your need.
Make sure you keep user configuration in sync with changes to the packaged files in /etc/pulse/. Otherwise, PulseAudio may refuse to start due to configuration errors.
There is usually no need to add your user to the audio group, as PulseAudio uses udev and logind to give access dynamically to the currently “active” user. Exceptions would include running the machine headless so that there is no currently “active” user.

It’s the same for Docker, but there you are generally running a server on the host and connecting over the network.
I am not all that keen on PulseAudio for headless use, but yes, it can be done; you will have to google it.
But look at a system-wide install rather than a user-based one, as that is essentially how you are running.

You can also still run it user-based with a CLI autologin. But in this instance it’s a “meh” to PulseAudio from me.

voice · May 5, 2020, 1:00am

Sorry for the delay in answering… My RPi 4 setup (Debian Buster aarch64 + Pi 64-bit kernel) doesn’t show any resyncing going on, but I have disabled beamforming for the moment. I still think the sound is unbelievably good, but there is a kind of distortion that happens, so I’ve disabled beamforming to try to get better results from everything. Currently, the Precise (Hey Mycroft) detection rate is around 1 out of 6 (with a lot of effort to speak clearly). That just sucks. And after the wake word is detected, the STT phase (PocketSphinx) is even worse. With only two intents (What time is it and Hello), it still can’t decide what to do most of the time. If it just detected mic volume and picked a random intent, it would feel a lot better than timing out all the time.

When I use my old notebook’s single mic after pressing “tap to record” in the web UI, most of the time it correctly recognizes the intent. So it seems to me that either: 1) there is some problem while passing control from the wake word stage to the STT stage, or 2) the PS Eye is too bad.

However, option 2 seems unlikely, because when I turned off beamforming the mic picked up my voice very well (after raising the volume to 350%, that is) with low background noise, at 0.5 m to 4 m distance. That’s without any distortion. With beamforming ON, the low background noise disappears, but there is audible distortion to the voice (it sounds like heavy compression).

So, I’m stuck again.

voice · May 5, 2020, 1:09am

Well, I started with the Raspbian Lite image in a headless configuration. Then I just installed the alsa-utils packages and then almost all the PulseAudio packages that I thought would be handy, but no graphical tools at all.

I then migrated to pure Debian Buster (not Raspbian) because I’d rather have as little non-free software as possible, and because of better CPU support (not only 64 bits, but better use of CPU features). After the basic system was working (with SSH), I just apt-get installed the ALSA and PulseAudio packages.

Now, to get it working, I decided not to run it as root. I just used the standard configuration, which starts it via socket activation: when the rhasspy user tries to get the audio device, systemd spawns a PulseAudio daemon for that user and it stays there forever.
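With a per-user, socket-activated setup like this, clients find the daemon through a Unix socket in the user’s runtime directory. A small Python sketch that checks whether that socket exists for a given user; it assumes the usual systemd layout (XDG_RUNTIME_DIR = /run/user/<uid>) and PulseAudio’s default socket name pulse/native, and both helper names are hypothetical.

```python
import os
import stat
from pathlib import Path

def pulse_socket_path(uid: int) -> Path:
    """Default per-user PulseAudio socket location under /run/user/<uid> (assumed layout)."""
    return Path("/run/user") / str(uid) / "pulse" / "native"

def pulse_socket_ready(uid: int) -> bool:
    """True if a Unix socket exists at that path; False if missing or not accessible."""
    try:
        return stat.S_ISSOCK(pulse_socket_path(uid).stat().st_mode)
    except OSError:
        return False

print(pulse_socket_ready(os.getuid()))  # run as the user that owns the daemon, e.g. rhasspy
```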

Take a look at Archlinux’s excellent documentation: https://wiki.archlinux.org/index.php/PulseAudio

rolyan_trauts · May 5, 2020, 1:10am

I was the same: I thought it was this great, fantastic 4-mic array, and a reuse of old technology rather than this consumer dumping culture…
Brilliant, I thought, I like that, and it’s cheap; but the results turned out much worse, and the problems much bigger, than the recommendation led me to expect.
I spent ages battling with, and confused by, the PS3 Eye; it does work, but it seems to cause so many problems that I cannot see it as a good recommendation. Alternatives are sparse, though.

I am trying to source and document alternatives at the moment, and apologise if that is a bit late for you, but I was exactly the same.
But the clincher for me is that it’s untrue that software AEC isn’t possible on an Arm SoC, though it does seem to hold for products such as the PS3 Eye.

I think I might have some similarly priced solutions that could work much better, but I’ve been hampered by deliveries due to the current situation.
If I don’t find alternative solutions, then, despite my dislike of the cost of USB hardware-DSP audio, that might be the only solution.
But I’m still following a hunch that it is untrue, as the 2-Mic ReSpeaker proves, but its drivers are so lacklustre that, just like the PS3 Eye, I would not recommend it.

voice · May 5, 2020, 1:27am

Are you sure the problem you were having was hardware related? On my setup the sound recorded from the ps eye sounds much better than from both my notebook’s external mic and my headset connected to the notebook’s analog audio ports. At least to my human ears.

I’m thinking that there might be timing problems (Rhasspy delays too much or too little when relaying the sound to Precise and then to PocketSphinx, and things get all messed up, or something).

I’ve connected the PS3 eye to the USB 2.0 port. I don’t see hardware errors or obvious glitches in the recorded sound, either.

rolyan_trauts · May 5, 2020, 1:32am

All my testing with the PS3 Eye was with Mycroft, and there I found exactly the same.
With debugging and journalctl, I could often see resyncing, sample rate mismatches and ctl problems.
Not sure what it is, but use alternative hardware with horrid drivers, i.e. the ReSpeaker, and those problems don’t seem to exist; instead you get locked into a specific kernel, with drivers that seem to take exception to a range of other pretty standard Linux packages.
So both were a cul-de-sac for me.

Maybe it’s Precise?

Run PulseAudio from the CLI with verbose logging for debugging; I think it’s pulseaudio --start -vv, but check pulseaudio -h, as I used to know and have now forgotten.
Or edit the service with systemctl --user --full edit pulseaudio.
Use vanilla Raspbian with PulseAudio and webrtc, play an example WAV in one CLI terminal, record in another, then play the recording back.
Listen and test your results without all the additional Rhasspy or Mycroft overhead.
Strangely, it works great at times and not at others, and I never did work out why.

voice · May 5, 2020, 1:49am

I’m going to try that. What I’m getting now is some hope. If I choose “Hold to record” in the browser, even from very far and almost whispering, Rhasspy STT seems fine. But Hey Mycroft doesn’t like to trigger. I’m trying to change the settings. What does " trigger_level - number of events to trigger activation (default 3)" mean? An event as in the DNN detecting parts of the phonemes?

rolyan_trauts · May 5, 2020, 3:20am

I like your dogged determination, as I gave up; maybe you will fix it.
I don’t like some of the other problems with ALSA ctl that stop even simple commands like alsactl store from working.
It’s not very noob-friendly as an introduction, yet it’s recommended to noobs (me), and maybe I should have persevered; but, as I said, the AEC kills it for me, as any separate cards for playback and capture will give lesser results or none at all.

voice · May 6, 2020, 6:13pm

OK. Finally I got PulseAudio and Precise working with the PlayStation Eye!

It turns out that the default configuration for Precise is too strict. I’ve set the “trigger level” to 1 (instead of 3) and the sensitivity to 0.9 (instead of 0.5). Now it almost always responds to my voice. I’d say that it also correctly gets my intent 80% of the time. Both hold even at a distance of 4 m, when there is no music playing.
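Roughly, trigger_level is how many audio chunks the network has to flag before Precise fires, and sensitivity sets the per-chunk threshold (a chunk counts when its probability exceeds 1 - sensitivity), which is why lowering one and raising the other helps so much. The Python below is my own simplified model of that logic, loosely based on reading precise-runner’s TriggerDetector; it is not the library’s code, and the cooldown length is an assumed value.

```python
from dataclasses import dataclass

@dataclass
class SimpleTrigger:
    """Simplified wake word trigger model (not precise-runner's actual class)."""
    sensitivity: float = 0.5  # chunk counts as "active" if prob > 1 - sensitivity
    trigger_level: int = 3    # fire once the activation count exceeds this
    cooldown: int = 8         # assumed number of chunks ignored after firing
    activation: int = 0

    def update(self, prob: float) -> bool:
        if self.activation < 0:            # cooling down after a detection
            self.activation += 1
            return False
        if prob > 1.0 - self.sensitivity:  # this chunk looks like the wake word
            self.activation += 1
            if self.activation > self.trigger_level:
                self.activation = -self.cooldown
                return True
        elif self.activation > 0:          # decay on non-matching chunks
            self.activation -= 1
        return False

def first_trigger(probs, **settings):
    """Index of the first chunk that fires, or None."""
    detector = SimpleTrigger(**settings)
    for i, prob in enumerate(probs):
        if detector.update(prob):
            return i
    return None

weak = [0.3] * 5  # a hypothetical run of moderately confident chunks
print(first_trigger(weak))                                    # -> None (defaults never fire)
print(first_trigger(weak, sensitivity=0.9, trigger_level=1))  # -> 1
```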

However, the beamforming part is still missing. When I turn on beamforming in PulseAudio, the accuracy seems to drop a bit: I get around 70% for the wake word and around 40% for the intents. And when I turn on music, it drops to around 20% for the wake word and zero for the intents.

I’m thinking that maybe I’ve misunderstood how this PulseAudio filter works.

I thought that it would work like this:
Sink1: some PulseAudio-defined sink (audio output)
Source1: some PulseAudio-defined source (audio input)
Sink1 defined as the RPi 4 output hardware (44.1 kHz 16-bit)
Source1 defined as the PS3 Eye 4 mics
Virtual source defined as the result of applying PulseAudio AEC and beamforming to Source1, also taking as input the sound currently being played on Sink1. This source is called echo_cancel and it is already in mono 16 kHz 16-bit format.
Then echo_cancel is set as the new default source, and Sink1 is still used as the default sink.

Did I get this wrong?
In all the examples, it seems that when people load the echo-cancel module, they also define a new source name and a new sink name! How is that possible? The new source_name I get: it’s where you must read audio from to get the background-cancelled, filtered voice. But where does the new sink name come from? Does this mean that I have to send all audio output to this new sink? Or do I have to change the default sink to this new one?

What I want to do:

Have one sink with high-quality audio (44.1 kHz 16-bit stereo is just fine);
Somehow process the 4 mics plus what is currently in the output buffer to generate a cleaner “voice” input for Rhasspy. For this I’m trying the PulseAudio AEC features.
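On where the new sink fits in: an echo canceller can only subtract what it knows is being played, so the playback has to reach it as a reference signal. As far as I understand module-echo-cancel, that is what the extra sink is for: audio sent to it is played on the master sink and also fed to the canceller. The toy below illustrates why, using a plain NLMS adaptive filter, a textbook AEC building block. WebRTC’s canceller is far more elaborate (frequency-domain, with delay estimation and nonlinear suppression), and the echo path here is made up, so treat this only as an illustration.

```python
import random

def nlms_echo_cancel(mic, ref, taps=8, mu=0.5, eps=1e-6):
    """Subtract an adaptive estimate of ref's echo from mic (normalised LMS)."""
    w = [0.0] * taps  # estimated echo path (speaker -> mic impulse response)
    x = [0.0] * taps  # recent reference samples, newest first
    out = []
    for m, r in zip(mic, ref):
        x = [r] + x[:-1]
        echo_est = sum(wi * xi for wi, xi in zip(w, x))
        e = m - echo_est  # residual: what the canceller leaves behind
        step = mu * e / (eps + sum(xi * xi for xi in x))
        w = [wi + step * xi for wi, xi in zip(w, x)]
        out.append(e)
    return out

def energy(samples):
    return sum(s * s for s in samples)

random.seed(0)
ref = [random.uniform(-1, 1) for _ in range(4000)]  # "music" sent to the sink
path = [0.6, 0.3, -0.2]                             # made-up room echo path
mic = [sum(h * ref[n - k] for k, h in enumerate(path) if n - k >= 0)
       for n in range(len(ref))]                    # mic hears only the echo here

with_ref = nlms_echo_cancel(mic, ref)
without_ref = nlms_echo_cancel(mic, [0.0] * len(ref))  # playback bypassed the canceller
print(energy(with_ref[-1000:]) / energy(mic[-1000:]))     # tiny: echo removed
print(energy(without_ref[-1000:]) / energy(mic[-1000:]))  # 1.0: nothing removed
```

In the second case the canceller never sees the reference, so the microphone signal passes through untouched, which is what happens when music goes straight to the hardware sink instead of the echo-cancel sink.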

rolyan_trauts · May 7, 2020, 4:26am

Dunno. All was fine until I started using Docker and the ReSpeaker 2-Mic, and PulseAudio just seemed to be a problematic nightmare, so I stopped.

From memory, use_master_format=1 basically means it uses the format and settings of the original master.
You have a sink format and a source format, but I guess you could pipe paplay into parec and load your results in Audacity to check.
You can set sink_master and source_master, and it’s best to have a look at the code.

On beamforming you are definitely correct, so much so that it’s being removed in the next release, as it already has been upstream.