PS3 eye users, show your configuration!

What is the idea? To let the microphones record what is being played by the speakers?

Anyway, when I use the above settings the resulting recording isn't terrible. It actually removes most of the background noise (I didn't try with music). The only thing is that the sound seems to have a reverb effect, or something like it, applied to it.
I've tried it without beamforming and the resulting wave is not too bad. There is no distortion, but there is much more background noise.

Did you have success in using it for the wake word part? I'm using Precise, but I have to say "Hey Mycroft" around 6 times before it triggers.

It's true, but my script sets the created virtual source as the default source and the RPI4 sink as the default sink. Also, my .asoundrc sets pulse as the default card. When I use arecord or parecord the resulting wav is the same (with the currently applied pulse filters).
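The relevant bit of the script is just something like this (the device and source names are from my setup, so check yours with pactl list short sinks/sources):

pactl set-default-source echo_cancel_source                                # the virtual source created by module-echo-cancel
pactl set-default-sink alsa_output.platform-bcm2835_audio.analog-stereo   # the RPI4 onboard output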

No, that's why I'm not using it, as media playback is a common use for me.

The idea is that you can listen to the results of the microphone recording after AEC.

Also do the same with aplay/arecord.

wget https://file-examples.com/wp-content/uploads/2017/11/file_example_WAV_10MG.wav
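In other words, something along these lines, assuming your ALSA default goes through pulse (the file name is just the example wav above):

aplay file_example_WAV_10MG.wav &                      # play the test file through the default output
arecord -f S16_LE -r 16000 -c 1 -d 10 aec_test.wav     # record 10 seconds from the default source at the same time
aplay aec_test.wav                                     # listen to how much of the playback leaked through after AEC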

pulseaudio-alsa and this asound.conf should force ALSA through pulse:

# Use PulseAudio by default
pcm.!default {
  type pulse
  fallback "sysdefault"
  hint {
    show on
    description "Default ALSA Output (currently PulseAudio Sound Server)"
  }
}

ctl.!default {
  type pulse
  fallback "sysdefault"
}

# vim:set ft=alsaconf:

Sorry, I forgot to tell you that I already had this configuration. (I edited the post)

It's OK, but I doubt you will get any good results with the AEC; the AGC should work well though.
I think the drift compensation of webrtc_audio_processing needs a higher clock with more oomph than we have on a Pi 3/4.

If you run journalctl -b and scroll to the end, you will see it constantly resyncing.

I think it's likely there is just far too much variable latency and drift between the onboard I2S and the USB PS3Eye.

Or it's the hard-coded platform hacks that don't deal with an Arm SoC like the Pi correctly.

// Measured delays [ms]
// Device                Chrome  GTP
// MacBook Air           10
// MacBook Retina        10      100
// MacPro                30?
//
// Win7 Desktop          70      80?
// Win7 T430s            110
// Win8 T420s            70
//
// Daisy                 50
// Pixel (w/ preproc?)           240
// Pixel (w/o preproc?)  110     110

// The extended filter mode gives us the flexibility to ignore the system's
// reported delays. We do this for platforms which we believe provide results
// which are incompatible with the AEC's expectations. Based on measurements
// (some provided above) we set a conservative (i.e. lower than measured)
// fixed delay.
//
// WEBRTC_UNTRUSTED_DELAY will only have an impact when |extended_filter_mode|
// is enabled. See the note along with |DelayCorrection| in
// echo_cancellation_impl.h for more details on the mode.
//
// Justification:
// Chromium/Mac: Here, the true latency is so low (~10-20 ms), that it plays
// havoc with the AEC's buffering. To avoid this, we set a fixed delay of 20 ms
// and then compensate by rewinding by 10 ms (in wideband) through
// kDelayDiffOffsetSamples. This trick does not seem to work for larger rewind
// values, but fortunately this is sufficient.
//
// Chromium/Linux(ChromeOS): The values we get on this platform don't correspond
// well to reality. The variance doesn't match the AEC's buffer changes, and the
// bulk values tend to be too low. However, the range across different hardware
// appears to be too large to choose a single value.
//
// GTP/Linux(ChromeOS): TBD, but for the moment we will trust the values.
#if defined(WEBRTC_CHROMIUM_BUILD) && defined(WEBRTC_MAC)
#define WEBRTC_UNTRUSTED_DELAY
#endif

#if defined(WEBRTC_UNTRUSTED_DELAY) && defined(WEBRTC_MAC)
static const int kDelayDiffOffsetSamples = -160;
#else
// Not enabled for now.
static const int kDelayDiffOffsetSamples = 0;
#endif

#if defined(WEBRTC_MAC)
static const int kFixedDelayMs = 20;
#else
static const int kFixedDelayMs = 50;
#endif
#if !defined(WEBRTC_UNTRUSTED_DELAY)
static const int kMinTrustedDelayMs = 20;
#endif
static const int kMaxTrustedDelayMs = 500;

// Maximum length of resampled signal. Must be an integer multiple of frames
// (ceil(1/(1 + MIN_SKEW)*2) + 1)*FRAME_LEN
// The factor of 2 handles wb, and the + 1 is as a safety margin
// TODO(bjornv): Replace with kResamplerBufferSize
#define MAX_RESAMP_LEN (5 * FRAME_LEN)

static const int kMaxBufSizeStart = 62;  // In partitions
static const int sampMsNb = 8;           // samples per ms in nb
static const int initCheck = 42;

#ifdef WEBRTC_AEC_DEBUG_DUMP
int webrtc_aec_instance_count = 0;
#endif

To be honest I'm not really sure, but I did give up trying with it.

Hey all :slight_smile: I'm running a Raspberry Pi with Raspbian. Could anyone tell me how you installed PulseAudio WITHOUT the X11 desktop, since I'm running it headless?

Probably the easiest way is to do what everyone says not to do, and run PulseAudio system-wide.

Tip:
It is strongly suggested not to edit system-wide configuration files, but rather edit user ones. Create the ~/.config/pulse directory, then copy the system configuration files into it and edit according to your need.
Make sure you keep user configuration in sync with changes to the packaged files in /etc/pulse/. Otherwise, PulseAudio may refuse to start due to configuration errors.
There is usually no need to add your user to the audio group, as PulseAudio uses udev and logind to give access dynamically to the currently “active” user. Exceptions would include running the machine headless so that there is no currently “active” user.

It's the same for Docker, but generally you're using a server on the host and connecting via the network layer.
I am not all that keen on PulseAudio for headless, but yeah, it can be done; you will have to google.
But check out a system-wide install rather than user-based, as that is essentially how you are running.

You can also still run user-based with a CLI autologin. But it's a 'meh' to PulseAudio in this instance from me.
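If you do go system-wide, the usual recipe (unsupported by upstream, so treat this as a sketch rather than a recommendation) is a system unit along these lines:

# /etc/systemd/system/pulseaudio.service
[Unit]
Description=PulseAudio system-wide daemon
After=sound.target

[Service]
ExecStart=/usr/bin/pulseaudio --system --daemonize=no --disallow-exit --log-target=journal
Restart=on-failure

[Install]
WantedBy=multi-user.target

# then:
sudo systemctl enable --now pulseaudio.service
sudo usermod -aG pulse-access pi    # every user/service that needs audio joins pulse-access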

Sorry for the delay in answering… My RPi4 setup (Debian Buster aarch64 + Pi 64-bit kernel) doesn't show any resyncing going on, but I have disabled beamforming for the moment. I still think the sound is amazingly good, but there is a kind of distortion that happens, so I've disabled it to try to get better results from everything. Currently, the Precise (hey Mycroft) detection rate is around 1 out of 6 (with a lot of effort to speak correctly). That just sucks. But after the wake word is detected, the STT phase (PocketSphinx) is even worse. With only two intents (What time is it and Hello) it still can't decide what to do most of the time. If it just detected mic volume and picked a random intent, it would feel a lot better than timing out all the time.

When I use my old notebook's single mic after pressing "tap to record" in the web UI, most of the time it correctly recognizes the intent. So it seems to me that either: 1) there is some problem while "passing the command" from the wake word to the STT thing, or 2) the PS Eye is too bad.

However, option 2 seems unlikely, because when I turned off beamforming the mic picked up my voice very well (after a 350% increase in volume, that is) with low background noise, at 0.5 m to 4 m distance. That's without any distortion. With beamforming ON, the low background noise disappears, but there is audible distortion to the voice (it sounds like heavy compression).

So, I’m stuck again.

Well, I started with the Raspbian Lite image in a headless configuration. Then I just installed the ALSA utility packages and then almost all the PulseAudio packages that I thought would be handy, but no graphical tools at all.

I then migrated to pure Debian Buster (not Raspbian) because I'd rather have as little non-free software as possible, and because of better CPU support (not only 64-bit, but better CPU features). After the basic system was working (with SSH) I just apt-get installed the ALSA and PulseAudio packages.

Now, to get it working, I decided not to run it as root. I just used the standard configuration, telling it to start with socket access. When the rhasspy user tries to get the audio device, systemd spawns a PulseAudio daemon for that user and it stays there forever.
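For anyone wanting the same without X11, this is roughly all it takes on Debian/Raspbian with systemd (assuming the packaged pulseaudio user units; "rhasspy" is just my service user, so substitute your own):

sudo loginctl enable-linger rhasspy              # keep rhasspy's user instance (and its pulseaudio) running without a login
# then, logged in as the rhasspy user:
systemctl --user enable --now pulseaudio.socket  # socket-activate pulseaudio when a client first connects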

Take a look at Archlinux’s excellent documentation: https://wiki.archlinux.org/index.php/PulseAudio

I was the same: I thought it was this great, fantastic 4-mic array and a reuse of old technology rather than this consumer dumping culture…
Brilliant, I like that, I thought, and it's cheap; but the results turned out to be much less, and the problems much more, than what you'd expect from a recommendation.
I spent ages battling, confused, with the PS3eye. It does work, but it seems to cause so many problems that I cannot see it as a good recommendation, but alternatives are sparse.

I am trying to source and document alternatives at this moment, and apologise if that might be a bit late for you, but I was exactly the same.
The clincher for me is that it's untrue that software AEC isn't possible on an Arm SoC, but it is very true of a product such as the PS3eye.

I think I might have some similarly priced solutions that might work much better, but I've been hampered by deliveries due to the current situation.
If I can't find alternative solutions then, apart from my dislike of the cost of USB hardware DSP audio, they might be the only option.
But I'm still following a hunch that it is untrue, as the 2-mic Respeaker proves, but its drivers are so lacklustre that, just like the PS3eye, I would not make it a recommendation.

Are you sure the problem you were having was hardware related? On my setup, the sound recorded from the PS Eye sounds much better than from both my notebook's external mic and my headset connected to the notebook's analog audio ports. At least to my human ears.

I'm thinking that there might be timing problems (Rhasspy delays too much or too little in relaying the sound to Precise and then to PocketSphinx, and things get all messed up or something).

I’ve connected the PS3 eye to the USB 2.0 port. I don’t see hardware errors or obvious glitches in the recorded sound, either.

All my testing with the PS3eye was with Mycroft, and there I found exactly the same.
For me, with debugging and journalctl, I could often see resyncing, sample rate mismatches and ctl problems.
Not sure what it is, but use alternative hardware with horrid drivers (aka Respeaker) and those problems don't seem to exist; instead you get locked into a specific kernel, with drivers that seem to take exception to a range of otherwise pretty standard Linux packages.
So both were a cul-de-sac for me.

Maybe it's Precise?

Run pulseaudio from the CLI with pulseaudio --start -vv (is it -vv for the debugging? Do a -h; as usual I knew but have now forgotten).
Or edit it with systemctl --user --full edit pulseaudio.
Use vanilla Raspbian with PulseAudio and webrtc, with an example wav playing in one CLI terminal; record in another, then play it back.
Listen and test your results without all the additional Rhasspy or Mycroft overhead.
Strangely, it works great at times and not so great at others, and I never did work it out.
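Concretely, something like this (kill the running daemon first so the verbose log stays in the foreground):

pulseaudio -k      # stop the current daemon
pulseaudio -vvv    # restart it in the foreground with verbose logging
# then run the paplay / parecord test from a second terminal and watch the log for resyncs and latency changes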

I'm going to try that. What I'm getting now is some hope. If I choose "Hold to record" in the browser, even from very far away and almost whispering, Rhasspy STT seems fine. But "Hey Mycroft" doesn't like to trigger. I'm trying to change the settings. What does "trigger_level - number of events to trigger activation (default 3)" mean? An event as in the DNN detecting parts of the phonemes?

I like your dogged determination, as I gave up; maybe you will fix it.
I don't like some of the other problems with the ALSA ctl that stop even simple commands like alsactl store.
It's not very noob-friendly as an introduction, and it gets introduced to noobs (me). Maybe I should have persevered, but the AEC, like I said, kills it for me, as any separate card for playback/capture will give lesser results than none at all.

OK. Finally I got PulseAudio and Precise working with the PlayStation Eye!

It turns out that the default configuration for Precise is too strict. I've set the trigger level to 1 (instead of 3) and the sensitivity to 0.9 (instead of 0.5). Now it almost always responds to my voice. I'd say that it also correctly gets my intent 80% of the time. Both of those even at a distance of 4 m, when there is no music playing.
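For anyone wanting to copy this: the two values live in the wake section of the Rhasspy profile. Mine looks roughly like the fragment below, but the exact nesting may differ between Rhasspy versions, so treat it as a sketch:

"wake": {
  "system": "precise",
  "precise": {
    "sensitivity": 0.9,
    "trigger_level": 1
  }
}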

However, the beamforming part is still missing. When I turn on beamforming in PulseAudio, the accuracy seems to drop a bit: I get around 70% accuracy for the wake word and around 40% for the intents. But when music is playing, it drops to around 20% for the wake word and zero for the intents.

I’m thinking that maybe I’ve understood all wrong about how this pulseaudio filter works.

I thought that it would interpret things like this:
Sink1: some pulseaudio-defined sink (audio output)
Source1: some pulseaudio-defined source (audio input)
Sink1 defined as the RPi4 output hardware (44.1 kHz, 16-bit)
Source1 defined as the PS3 Eye 4 mics
Virtual source defined as the result of applying the pulseaudio AEC and beamforming to Source1, while also taking some input of the currently playing sound from Sink1. This source is called echo_cancel and it is already in mono 16 kHz 16-bit format.
Then echo_cancel is set as the new default source and the default Sink1 is still used.

  1. Did I get this wrong?
  2. In all the examples, it seems that when people define the echo-cancel module, they also define a new source and a new sink name! How is that possible? The new source_name I get: it's where you must read audio from to get the background-cancelled and filtered voice. But where does the new sink_name come from? Does this mean that I have to use this new sink name to send all audio output? Or do I maybe have to change the default sink to this new one? (See the example load line below.)
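To make question 2 concrete, this is roughly the kind of load line I mean; the two master device names are just placeholders for whatever pactl list short sinks/sources reports on your system:

pactl load-module module-echo-cancel \
    aec_method=webrtc \
    source_master=<your PS3 Eye 4-channel source> \
    sink_master=<your RPi4 output sink> \
    source_name=echo_cancel_source \
    sink_name=echo_cancel_sink

Anything that records from echo_cancel_source gets the processed audio, and, as far as I understand it, anything played to echo_cancel_sink is what the canceller uses as its playback reference.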

What I want to do:

  1. Have one sink with high quality audio (44.1 kHz, 16-bit stereo is just fine);
  2. Somehow process the 4 mics + what is currently in the output buffer in order to generate a cleaner "voice" input for Rhasspy. For this I'm trying the PulseAudio AEC features.

Dunno, as all was fine until I started using Docker and the Respeaker 2-mic; PulseAudio just seemed to be a problematic mare, so I stopped.

From memory, I thought use_master_format=1 basically means it uses the format and settings from the original master.
You have a sink format and a source format, but I guess you could play with paplay, record with parec, and load your results into Audacity to check.
You can set sink_master and source_master, and it's best to have a look at the code.
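A quick way to check what formats the masters and the echo-cancel devices actually ended up with (the sample spec is listed per device):

pactl list short sinks     # index, name, driver, sample spec (format/channels/rate), state
pactl list short sources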

On beamforming you're definitely correct, so much so that in the next release it's being removed, as it already has been upstream.

I tried to send the pulseaudio guy an email, but I guess it went to the spam folder. I would like to beg him/them to leave it there. It does wonders for the audio, even though it also distorts the voice a little.

OK, I'm quite pleased now with my current Raspberry Pi 4 + PulseAudio configuration. For anyone reading this, I'm posting my findings and workarounds for a great microphone capture experience and reasonable audio quality.

Overview:

  1. PulseAudio is the main sound architecture. I use an ALSA configuration only to tell the ALSA utils to use the pulse devices, so that I get automatic audio mixing, converting and adjusting, plus the PulseAudio filters.
  2. Rhasspy uses arecord to get the sounds, and that is fine because it is actually using pulse behind the scenes.
  3. The 4-mic PlayStation Eye is the source of sounds, and the 4 mics are processed together with the currently playing audio so that there is echo cancellation and noise removal from the voice input. Beamforming also works to an extent, but it causes a little distortion to the resulting voice. In my humble tests, the results were better with it turned off.
  4. The way it works is: you load an echo cancellation module which gets the currently playing audio AND the sound from the mics to do its magic. BUT all applications MUST use the module's created SOURCE for all voice capture AND the module's SINK for all sound output. The quirk I've found with this is that I couldn't find a way to force the echo cancellation module's sink to be stereo. So I had to use a virtual sink (the workaround).
  5. In the end the audio quality is as good as your speakers and audio hardware (mine is the default Raspberry Pi SoC output + a good old stereo system). Maybe in the future I'll get another audio output device.

The workaround is simple:

  1. Create a virtual sink and let the echo cancellation module think that it is the output sink;
  2. Load the echo cancellation module and tell it to read from the 4-mic PlayStation Eye source and also from the virtual sink (yes, it reads from a sink). The module creates a new source (aec_source) and a new sink (this sink is unfortunately only mono, called aec_sink).
  3. Load a combine module, which creates a new sink that accepts high quality stereo sound and sends it to two places: A) the real stereo speakers; B) the aec_sink (downmixing the sound only for processing).
  4. Tell the applications (MPD, Rhasspy, etc.) to use the sink created in step 3 for all audio output and the aec_source to read any audio.

When I find out how to post the script here, I'll do it.
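Until then, here is a rough sketch of what the script runs. The two hardware device names at the top are placeholders for whatever pactl list short sinks/sources reports on your system, and the virtual names (aec_out, aec_source, aec_sink, stereo_out) are just my own labels:

#!/bin/sh
# Placeholders: set these to the names reported by `pactl list short sources` / `pactl list short sinks`
MIC_SOURCE="alsa_input.usb-OmniVision_Technologies_Inc._USB_Camera-B4.09.24.1-01.multichannel-input"
OUT_SINK="alsa_output.platform-bcm2835_audio.analog-stereo"

# 1. Virtual sink the echo canceller will treat as its output (its audio goes nowhere)
pactl load-module module-null-sink sink_name=aec_out

# 2. Echo cancellation: capture from the PS3 Eye, use the virtual sink as the playback reference.
#    Creates aec_source (processed mic) and aec_sink (mono playback reference).
pactl load-module module-echo-cancel \
    aec_method=webrtc \
    source_master="$MIC_SOURCE" \
    sink_master=aec_out \
    source_name=aec_source \
    sink_name=aec_sink

# 3. Combined stereo sink: everything played here goes to the real speakers AND, downmixed, to aec_sink
pactl load-module module-combine-sink \
    sink_name=stereo_out \
    slaves="$OUT_SINK",aec_sink \
    channels=2

# 4. Defaults, so MPD, Rhasspy, etc. pick them up automatically
pactl set-default-sink stereo_out
pactl set-default-source aec_source

The slightly counter-intuitive part is step 2: aec_sink forwards its audio into the null sink, so nothing extra comes out of the speakers; it only exists to give the canceller its reference signal, which the combine sink in step 3 feeds with a downmixed copy of everything being played.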

You probably need to do it in the Docker version, as Rhasspy is heavily Docker-oriented.

I tried the PulseAudio AEC with a PS3eye, and really you need to post some comparison wavs with-AEC/no-AEC, as for me there was little difference on a Pi with the PS3eye and the 3.5mm Pi output.

I did some tests with the supposedly lesser quality speexdsp and the Respeaker 2-mic that produced noticeable AEC results, which wasn't the case via the PS3eye and PulseAudio.
It's clock drift between the PS3eye and the Pi 3.5mm, but I could never get PulseAudio to run with the Respeaker 2-mic drivers; I presume it is better, but I could never prove it.

PS: extant doesn't mean it works. The default beam setting, if omitted as it often is, starts at 90° to the right of the mic. That is some beamforming in action that is pointed 90 degrees off target, and it makes little tangible difference to the recorded audio.
It's already been removed upstream from webrtc, and the next PulseAudio will follow suit.
There is no DoA input or feedback in the code, and it's absolutely impossible for any beamforming to work without a direction of arrival (angle) to beamform to, unless the position is fixed; but hey, it is extant until the next version.

I am always struggling to get my PS Eye working. Every now and then it stops working and I usually have a hard time getting it back. Sometimes it works as "default", sometimes as the "USB Camera-B4.09.24.1 Multichannel" audio input. HA detects it, but it is always a pain…
When it works, the performance is not very good, despite supposedly taking advantage of noise cancellation and so on with the 4-mic array (I believe).

I found this thread, with a lot of fancy configurations, that would probably give me more reliability, less struggling, and also improve its performance. But you all know what you are doing, which files you are changing and so on. I have only used the UI up to now.
So, I would like to know if there is some place or guide where I could learn exactly what to do to solve the issues that I mentioned above… I see command lines, but I don't know where they should be applied.
Thanks to anyone that could show me the path.

Using: Proxmox > Ubuntu 18.04.3 > Docker

The PS3 Eye is a 4-channel omnidirectional broadside array where the clever stuff was in software inside the PS3.
It never had official Windows or Linux drivers; it was reverse-engineered and the result is hacky.
It cannot even do an alsactl store without error.

The errors you see have never been resolved in what is now almost two decades, and even if they were, the closed-source algorithms in the PS3 have never been available apart from when it is used with a PS3.

You have a broadside array: when you sum your channels, the result depends on the SNR and sensitivity of each electret, which nobody knows.
Even the wiki is incorrect guesswork, as https://en.wikipedia.org/wiki/PlayStation_Eye says:

The PlayStation Eye microphone array operates with each channel processing 16-bit samples at a sampling rate of 48 kilohertz, and a signal-to-noise ratio of 90 decibels.

That is a total WTF: considering decibels are a logarithmic scale, you will be hard pushed to find any microphone that goes past 70 dB SNR! I guess the silicon could, but I doubt the result is anywhere near that.

So it's an omnidirectional broadside array that has attenuation on the sides and also acts as a 12 dB? (because of 4 mics summed) low-pass filter for audio from the side.
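For what it's worth, the usual back-of-the-envelope numbers for simply summing four identical omni capsules (my arithmetic, not anything from the PS3 Eye docs): the on-axis signal adds coherently, 20·log10(4) ≈ 12 dB, while uncorrelated self-noise adds as power, 10·log10(4) ≈ 6 dB, so the net on-axis SNR gain is only about 6 dB; off-axis sound gets progressively comb-filtered, which is that low-pass-like attenuation from the sides.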
It's all in the above app note, but no: just because it's an array means you have the antenna, but you don't have the control software, and there isn't any open-source equivalent available for the Pi.
It has zero noise cancellation; it's just a mic array, just another microphone, without the software that was in the PS3.