Respeaker 2 Hat - Buster - Rhasspy + HA as service install + picotts + snips wakeword

So, if one sends the output through the Raspberry Pi's line-out device, the Respeaker won't be able to work, or will perform as if it doesn't have AEC enabled, right?

Yes, you are correct. The Respeaker needs the playback audio data to be able to remove it from the input audio data.

I was thinking of getting one myself, but this is a cold shower. What does the 16 kHz audio sound like? Is the difference obvious? Is it at least stereo?

The Respeaker Mic Array v2 outputs stereo at 16 kHz, 16-bit. You can hear the difference between 16 kHz and 48 kHz when playing music through earphones: the sound is somewhat muffled. To me, it is not obvious when playing through loudspeakers. For a voice assistant that can play radio streams and Spotify, it'll do for me. I prefer lesser audio quality with great voice interactivity.
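For context on why 16 kHz sounds duller: the highest frequency a sample rate can represent is half the rate (the Nyquist limit), so 16 kHz capture tops out well below the treble content of music. A quick sketch of the arithmetic:

```shell
# Nyquist limit: the highest representable frequency is sample_rate / 2.
echo $(( 16000 / 2 ))   # 16 kHz audio tops out at 8000 Hz, hence the muffled treble
echo $(( 48000 / 2 ))   # 48 kHz audio tops out at 24000 Hz
```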

I tried Seeed's 48 kHz firmware, which improves the playback quality (though with strange "shhh" sounds that are not acceptable), and the AEC is almost non-existent. I notified them on their forum, and their response was that the XMOS chip is not capable of processing AEC at this sample rate.

If I have enough time, I'll give the PulseAudio AEC module another go, because I'd really like not to be dependent on a specific mic in the future.

Hope this helps. :slight_smile:

Is it an option to output the audio both to the Respeaker and to your normal speakers, so the XMOS chip can do its processing at 16 kHz while you listen at a higher sample rate?

I was not able to make it work unfortunately. I doubt it is possible.

Hm, OK. At least for me that makes the Respeaker pretty useless, since there is no way I will listen to all my music at 16 kHz just for the AEC. Especially since the Respeaker already has quite some latency from all the preprocessing and USB communication :sweat_smile:
I guess I will give PulseAudio AEC with a faster mic using I2S a try when I find the time as well.

You can also try Seeed's software AEC for a lighter solution using ALSA instead of the heavier PulseAudio service.

I was not able to make it work correctly either, but I did not put a lot of effort into it.

I wonder if something like this could be incorporated directly into the Rhasspy audio input chain using a Python/C++ binding.

What do you think ?

I will try it out when I have some time. If it works well, I can absolutely create Python bindings using the Python C API so we can integrate it better into Rhasspy.

It does work. The coder might have made one mistake and, like the best of us, slipped up and probably got things the wrong way round.
The frame_size is a single FFT pass and should be a power of 2, but the code does it the other way round and sets the frame size to a division of the sample rate to give 10 ms worth of samples. The tail_length has been provided as a power of 2 (4096), which could instead have been 100 ms+ worth of the sample rate.
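To put rough numbers on that (a sketch, assuming the 16 kHz rate the code uses):

```shell
rate=16000
# 10 ms worth of samples at 16 kHz, which is what the code sets as the frame:
echo $(( rate * 10 / 1000 ))     # 160 samples, sitting between the powers of two 128 and 256
# The 4096-sample tail_length expressed as time at the same rate:
echo $(( 4096 * 1000 / rate ))   # 256 ms of echo tail
```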

It probably just needs someone to check over some simple compile optimisations.
I did try the fftw3 libs in the speexdsp compile the second time (Pi 3), but whether through my inexperience, or because the internal FFT routine is specific to what is needed, it either didn't make things better or ran slightly worse.
Is there an overhead when calling external libs compared to internal code? I presume so.

I haven't a clue, but with my lack of knowledge: if the FFT can be batched, then maybe gpu_fft would be a major boost.

Generally though, the WebRTC AEC is regarded as better, and it's a much more complex piece of code.
It's a shame that it doesn't have Python wrappers in its entirety, as from what I have seen it's just the VAD that seems to crop up. Is this complete or just VAD?

It's part of PulseAudio, but they have done this thing for drift compensation that I am not sure is a positive or a negative for embedded.
It was done to allow WebRTC to run on separate sound cards for playback and capture and cope with clock drift.
I have a hunch that if that was removed, and vanilla webrtc_audio_processing in its entirety with AEC worked in a similar way to the FIFO system above, we might get something much better than Speex.
But the speexdsp is running quite well, and this was something many said would never work.

It's not about being cheap; it's about enabling multiple satellites in wide-array network microphone systems, which even with a Pi 3A+ and a $10 sound card isn't really cheap when you're talking 2 or 4 of them.
But wow, it's at least possible, rather than $70 USB mics before you even have an SoC. $140 is much more palatable than $500+ if you want a 4x speaker/mic setup, or $70 vs $250 for 2x.

I have even been thinking, with Snapcast, that a room might also have an echo channel that is not just what the device is playing but what the whole room might be playing, sourced from HDMI ARC.
Then, in a domestic situation, if we can get AEC working to a reasonable level (and maybe the WebRTC clock drift compensation is needed after all), the common source of domestic media noise interference is negated.
I would love to see a native WebRTC AEC running. I'm not sure about the drift compensation freedesktop implemented with the Pi, but WebRTC rocks.

I will set up PulseAudio tomorrow, see how it runs with an el cheapo sound card with AEC, and post some audio and opinions.

We have no audio preprocessing in what we are doing unless it is bought in through hardware, which I think is probably unnecessary.
There are filters we can attach ('notch' for frequency, AEC and compress) that could give really good results, especially if the model was recorded the same way too.
It would be really good to settle on a standard audio preprocessor and have it in the project, as it may vastly improve accuracy with noise and media.

PS: not going to edit that, as my dyslexia got things wrong in a sentence about things being the wrong way round. Change the code and ignore my rambles :slight_smile:

@fastjack I haven't got one, but doesn't the Respeaker have a loopback, so you could actually loop back what you play on another card? Why is another question, but you could, and still get AEC?
Actually, yeah, maybe you could get all audiophile and use a USB DAC for audio out but loop back that audio for AEC? (It still plays through a WM8960, as the 2-mic Respeaker does.)

Just do as it says.

sudo apt-get -y install libasound2-dev libspeexdsp-dev
git clone
cd ec

Install the alsa fifo

git clone
cd alsa_plugin_fifo
make && sudo make install

Copy the asound.conf to /etc/

Run ec in cli console ./ec -i plughw:1 -o plughw:1 -d 75 -f 2048
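For reference, assuming -d is the delay in milliseconds and -f is the filter length in samples (an assumption; check ec's own usage text for the exact meaning), those values work out at 16 kHz to:

```shell
rate=16000
# -f 2048: filter length in samples, expressed as echo-tail time:
echo $(( 2048 * 1000 / rate ))   # 128 ms of echo tail
# -d 75: delay in milliseconds, expressed in samples:
echo $(( 75 * rate / 1000 ))     # 1200 samples of playback/capture offset
```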

In another cli console arecord -fS16_LE -r16000 rec.wav

In another cli console aplay file_example_WAV_10MG.wav

When you stop recording you will notice ec ends.
So to make it permanent, you will just have to sort out the pcm names and device names, for example:

sudo modprobe snd-aloop
arecord -D my-ec-output | aplay -D aloop,dev0

Make aloop,dev1 the default capture and ec will always record, but only kick in on media play.
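As a sketch of what making the loopback the default capture could look like in /etc/asound.conf (the card numbers here are assumptions carried over from the examples above, so adjust them to your own setup):

```
# Hypothetical /etc/asound.conf fragment: default capture reads the far side
# of the snd-aloop device that "aplay -D aloop,dev0" writes into.
pcm.!default {
    type asym
    playback.pcm "plughw:1"          # your playback card (assumption)
    capture.pcm  "hw:Loopback,1,0"   # reads back what was written to dev0
}
```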
I think there is probably a slight mistake in the code (ec.c) where the small 10 ms filter frame should be a power of 2 (128, 256 or 512), as FFTs like it that way. 10 ms of 16000 Hz is between 128 and 256, but Speex seems to prefer 20 ms, which is maybe why I think 256 sounds a little better; 512 seems to work OK as well.
He has actually set the filter tail to a power of 2, which doesn't matter so much, but there's no harm in keeping to powers of 2 :slight_smile:
So maybe do a hard edit and make.

So it all works via ALSA, kicks in only when media is played, and with the loopback it will not need a restart at the end of each recording.

The only thing I did differently: I noticed the Raspbian speex & speexdsp packages are quite old (a release candidate), so I just downloaded the tar, compiled and installed the 1.2.0 release before making ec.
ec_hw doesn't seem to work with my 4-mic linear array, but I'm not surprised, as I am unsure whether it has the loopback channel stated in the sales literature; I think they are mixing it up with the USB 4-mic.

If someone can provide a working latency measurement of that arrangement, it would be really appreciated, as my 75 ms delay is a total guess.
I tried alsabat --roundtriplatency and couldn't seem to get it to work, though it does with straight hardware. It probably wouldn't be correct anyway, because we pipe through a loopback after ec.

Once more my annoying voice, but a default install of ec without any tinkering on a Pi 4 should sound like this.


So I bought a Respeaker 2 hat to try some of this with. Are you aware of any module or alsa settings that would disable power saving on the card? I’m getting a pretty nasty click/pop at the beginning of each audio file as the card plays it.

I can hack around this by running the following in another terminal:

$ aplay -f S8 /dev/zero

And this of course keeps the card “awake”. It works, but it’s a bit of a hack.
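If you want that hack to survive reboots, one option is a small systemd unit (the unit name and path here are made up for illustration, not an official recipe):

```
# Hypothetical /etc/systemd/system/keep-card-awake.service
[Unit]
Description=Keep the sound card awake to avoid start-of-playback pops

[Service]
ExecStart=/usr/bin/aplay -f S8 /dev/zero
Restart=always

[Install]
WantedBy=multi-user.target
```

Enable it with sudo systemctl enable --now keep-card-awake.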

No clicks or pops on play at all here, but I am running from the 3.5 mm jack and, to be honest, have never tried the built-in output if that's what you are using.
The worst thing about the card is the drivers, as they are a s-storm of a mess.
Such a strange conception to do a driver that is also a service and a kernel downgrade, all-in-one for several cards.
Try the EC though, as the mission for software EC is an honourable crusade! :slight_smile:
I have a USB VIA VT1620A to try next. I'm happy about the EC results, not so happy about the driver situation, but hoping Respeaker sort that out, as they are doing work on it at the moment.
I have to stay away from the GitHub repo, as the more I delved into what they were doing, the more incredulous I became.

Also, Hawkeye, the alsa-plugins also implement speexdsp.
This is an old asound.conf; I should give it a go again.

pcm.!default {
    type asym
    playback.pcm "plughw:CARD=ALSA,DEV=0"
    capture.pcm "cap"
}

pcm.array {
    type hw
    card 1
}

pcm.cap {
    type plug
    slave {
        pcm "array"
        channels 4
    }
    route_policy sum
}

pcm.echo {
    type speex
    slave.pcm "cap"
    echo yes
    frames 256
    filter_length 1600
    denoise false
}

pcm.agc {
    type speex
    slave.pcm "echo"
    agc 1
    denoise yes
    dereverb yes
}

Again, it's never simple: the repo version of libspeexdsp-dev is older than alsa-plugins expects, so we don't get the speexdsp plugins.
You have to download from Speex and compile, then build alsa-plugins 1.8.1 again, and then they appear.

Dunno about the pops and clicks on play, as I never had that problem, but if you're using the onboard amp I think you must also provide separate power via USB, as the amp pulls too much for the Pi's regulators. I've never tried it, so it's a guess, but a PSU splitter cable would probably be needed.

Also, have you ever set up a LADSPA plugin? I want to set up an expander after EC to drop the noise floor, as it might give some excellent results.

I'm actually using the 3.5 mm out on the Respeaker. I have an issue open on their GitHub; I'll see if they say anything there.

I agree, the kernel downgrade is a huge mess.

I have a feeling ec on my Pi Zero will bring it to its knees, but I'm going to give it a try anyway. Thanks!

Yeah, I've only tried it on the Pi 3A+ and Pi 4B 2GB.

The results will be interesting though.
But no clicks on play here. The only thing I haven't done is try a Zero.

Perhaps it's only an issue on the Zero. I have a 3B+ I could try it with. But aplay -f S8 /dev/zero does the job for me and uses minimal CPU.


Unless it's something to do with the dmix.
I usually cast off the Respeaker asound.conf setup and use straight plughw:
I do get occasional clicks from it, but they are random, really occasional, and not on play.
Also, I have it set up with no driver installed; it's pretty silent that way. When installed, dunno.
One annoying thing it does do, if you have a 50-watt amp connected, is set all volumes to 100% by default.

PS: the alsa-plugins speexdsp echo seems to suck eggs.

Has anybody managed to get the EC working with Snapclient as the playback device?
Also, which device would you choose in the Rhasspy UI as the aplay device for TTS with EC on?
It doesn't show me ECI and ECO, and the default speaker also doesn't give any output, so I cannot get TTS working with EC on.

What worked was installing it and testing it on the Pi 3A+ with the 2-Mic HAT from ReSpeaker with aplay and arecord directly, and the results are awesome. The voice is so much clearer.

My end goal would be to have only one device per room doing both multiroom media playback (via Snapcast) and voice assistant duty, ideally with the $10 2-Mic HAT.
However, when I try to feed Snapclient as the sound source, I only get error messages that I do not understand.

From memory, Snapclient just plays on the default device, but it might be docker syndrome again: if you are using Docker, your setup needs to be in the container, and you may be mixing host and container up, as what happens on one might not be how the other is set up.

Snapcast is great, but really only as long as eco and eci are set up correctly; that asound.conf needs to be either shared via the docker run command or created in the container.

Is it a container thing?

I did some one-liners and this repo to make things easier; it may help.

I keep threatening to do an image one time

Yes, you can select the sound card Snapclient should use. I tried all of them, including eci and eco. But as soon as I play any content, it just logs tons of errors. I don't think it's a container thing, as Snapclient is just installed on the machine. It might be a container thing for the Rhasspy install, though.