Does Respeaker Mic Array v2.0 (USB) work out of the box on raspbian or does it need driver installation?
I bought one; I will receive it in the next few days.
No driver installation is needed. It works out of the box as both an ALSA input and output device.
Enjoy 
Fastjack did you modify webrtcvad default parameters for respeaker mic array v2.0?
Are you using a satellite configuration?
I did not modify the default Respeaker config.
Though I have not yet tested the full Rhasspy chain (mic->wakeword->asr->nlu->intent->tts->speaker).
I got good results with the snips software.
For now, I’m concentrating on the ASR and NLU part as this seems like the most complex stuff to get right.
Once I get satisfied with the ASR and NLU results I’ll move on to the wakeword.

Ok, thank you!
Sorry to resurrect this thread, but I’m trying to get the same setup: snips-satellite (I like snips’ hotword detection) with rhasspy handling the rest (I dislike the rest of modern-day snips).
Do you run the snips part on the same Raspberry Pi as the rhasspy software? I’m using the PS3 Eye mic and run into “arecord: main:828: audio open error: Device or resource busy” errors when I have snips-satellite running.
Any idea how to solve that?
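The usual ALSA answer to “Device or resource busy” is the dsnoop plugin, which opens the hardware once and lets several processes capture from it at the same time. A minimal sketch for /etc/asound.conf (the card/device numbers and the PCM name are assumptions; check yours with arecord -l):

```
# share one capture device between snips-satellite and rhasspy
pcm.mic_shared {
    type dsnoop
    ipc_key 1024          # any unique integer
    slave {
        pcm "hw:1,0"      # the PS3 Eye; verify with `arecord -l`
    }
}
```

Point both snips-satellite and rhasspy at mic_shared instead of the raw hw device and they should be able to record simultaneously.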
Hey, do you have to use it as an output device in order for the AEC and the rest of its magic to work? Can I use, for instance, a regular line out from the board for output?
The AEC needs the audio output in order to remove it from the input signal.
Sending the playback to two outputs at the same time is quite problematic due to clock synchronisation and latency issues between audio devices (I’m not savvy enough to get past that).
So, if one sends the output through the Raspberry Pi’s line out, the Respeaker won’t be able to work, or will perform as if it doesn’t have AEC enabled, right?
Yes, you are correct. Respeaker needs the playback audio data to be able to remove it from input audio data.
I was thinking of getting one myself, but this is a cold shower. What does the 16kHz audio sound like? Is the difference obvious? Is it at least stereo?
The Respeaker Mic Array v2 outputs stereo at 16kHz/16-bit. You can hear the difference between 16kHz and 48kHz when playing music through earphones; the sound is a bit muffled. To me, it is not obvious when playing through loudspeakers. For a voice assistant that can play radio streams and Spotify, it’ll do for me. I prefer lesser audio quality with great voice interactivity.
I tried Seeed’s 48kHz firmware, which improves the playback quality (though with strange “shhh” sounds that are not acceptable), and the AEC is almost nonexistent. I notified them on their forum and their response was that the XMOS chip is not capable of processing AEC at this sample rate.
If I have enough time, I’ll give PulseAudio AEC module another go because I’d really like to not be dependent on a specific Mic in the future.
Hope this helps. 
Is it an option to output the audio to both the Respeaker and your normal speakers, so the XMOS chip can do its processing at 16kHz while you listen at a higher sample rate?
I was not able to make it work unfortunately. I doubt it is possible.
Hm, ok. At least for me that makes the Respeaker pretty useless, since there is no way I will listen to all my music at 16kHz just for the AEC. Especially since the Respeaker already has quite some latency from all the preprocessing and USB communication.
I guess I will also give PulseAudio AEC with a faster mic using I2S a try when I find the time.
You can also try Seeed software AEC for a lighter solution using ALSA instead of the heavier PulseAudio service.
I was not able to make it work correctly either, but I did not put a lot of effort into it.
I wonder if something like this could be incorporated directly into the Rhasspy audio input chain using a Python/C++ binding.
What do you think?
I will try it out when I have some time. If it works well, I can absolutely create Python bindings using the Python C API so we can integrate it better into Rhasspy.
It does work. The coder might have made one mistake and, like the best of us, slipped up and probably got things the wrong way round.
The frame_size is a single FFT pass and should be a power of 2, but the code does it the other way round and sets the frame size to a division of the sample rate to give 10 ms worth. The tail_length has been provided as a power of 2 (4096), which is actually 100 ms+ at the sample rate.
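To make the frame-size arithmetic concrete, here is a quick check in the shell (16 kHz is the sample rate assumed throughout this thread):

```shell
# 10 ms of audio at 16 kHz, as the code currently computes the frame size
frames=$((16000 / 100))
echo "$frames"                     # prints 160 -- not a power of 2
# the neighbouring powers of two and their durations at 16 kHz
echo "$((128 * 1000 / 16000)) ms"  # prints "8 ms"
echo "$((256 * 1000 / 16000)) ms"  # prints "16 ms"
```

So the “10 ms” frame falls between the 128- and 256-sample powers of two, which is why rounding it one way or the other is the obvious edit to try.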
It probably just needs someone to check over some simple compile optimisations.
I did try the fftw3 libs in the speexdsp compile the second time (Pi 3), but either through my inexperience, or because the internal FFT routine is specific to what is needed, it either didn’t make things better or ran slightly worse.
Is there an overhead when calling external libs from internal code? I presume so.
I haven’t a clue, but if the FFT can be batched then maybe gpu_fft would be a major boost.
Generally though, the WebRTC AEC is regarded as better, and it’s a much more complex piece of code.
It’s a shame that it doesn’t have Python wrappers in its entirety; from what I have seen, it’s just the VAD that seems to crop up. Is this complete or just the VAD?
It’s part of PulseAudio, but they have done this thing for drift compensation that I am not sure is a positive or a negative for embedded.
It was done to allow WebRTC to run across separate sound cards for playback and capture and cope with clock drift.
I have a hunch that if that was removed, and vanilla webrtc_audio_processing ran in its entirety with AEC working in a similar way to the FIFO system above, we might get something much better than Speex.
But the speexdsp is running quite well and this was something many said would never work.
It’s not about being cheap; it’s about enabling multiple satellites in wide-array network microphone systems, which even with a Pi 3A+ and a $10 sound card isn’t really cheap when you’re talking about 2 or 4 of them.
But wow, it’s at least possible, rather than $70 USB mics before you even have a SoC. $140 is much more palatable than $500+ if you want a 4x speaker/mic setup, or $70 vs $250 for 2x.
I have even been thinking, with snapcast, that a room might also have an echo channel which is not just what the device is playing but what the whole room might be playing, sourced from HDMI ARC.
Then, in a domestic situation, if we can get AEC working to a reasonable level (and maybe the WebRTC clock drift compensation is needed after all), the common source of domestic media noise interference is negated.
But I would love to see a native WebRTC AEC running. I’m not sure about the drift compensation freedesktop implemented with the Pi, but WebRTC rocks.
I will set up PulseAudio tomorrow, see how it runs with an el cheapo sound card with AEC, and post some audio and my opinion.
We have no audio preprocessing in what we are doing unless it is bought in through hardware, which I think is probably unnecessary.
There are filters we can attach (notch filtering by frequency, AEC and compression) that could give really good results, especially if a model is recorded the same way as well.
It would be really good to settle on a standard audio preprocessor and have it in the project, as it may vastly improve accuracy with noise and media.
PS: not going to edit that, as my dyslexia got things wrong in a sentence about things being the wrong way round; change the code and ignore my rambles.
@fastjack I haven’t got one, but doesn’t the Respeaker have a loopback channel, so you could actually loop back what you play on another card? Why is another question, but you could, and still get AEC?
Actually, yeah, maybe you could go all audiophile and use a USB DAC for audio out, but loop back that audio for AEC? (It still plays through a WM8960, like the 2-mic Respeaker does.)
Just do as it says.
sudo apt-get -y install libasound2-dev libspeexdsp-dev
git clone https://github.com/voice-engine/ec.git
cd ec
make
Install the ALSA FIFO plugin:
git clone https://github.com/voice-engine/alsa_plugin_fifo.git
cd alsa_plugin_fifo
make && sudo make install
Copy the asound.conf to /etc/
Run ec in a CLI console: ./ec -i plughw:1 -o plughw:1 -d 75 -f 2048
In another console: arecord -fS16_LE -r16000 rec.wav
In another console: aplay file_example_WAV_10MG.wav
When you stop recording you will notice that ec exits.
So to make it permanent you will just have to sort out the PCM and device names, but as an example:
sudo modprobe snd-aloop
arecord -D my-ec-output | aplay -D aloop,dev0
Make aloop,dev1 the default capture and ec will always record, but only kick in on media play.
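For the “make it permanent” part, the defaults can be pinned in /etc/asound.conf along these lines. This is only a sketch: the card number and the Loopback device/subdevice naming are assumptions and will differ per setup.

```
# default playback stays on the real card (fed through ec),
# default capture reads the far end of the snd-aloop loopback
pcm.!default {
    type asym
    playback.pcm "plughw:1"           # real output card; adjust
    capture.pcm  "plughw:Loopback,1"  # other end of snd-aloop; adjust
}
```

With that in place, any program that records from the default device gets the echo-cancelled stream without being told about ec at all.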
I think there is probably a slight mistake in the code (ec.c) where the small 10 ms filter_frame should be a power of 2 (128, 256 or 512), as FFTs like it that way. 10 ms at 16000 Hz falls between 128 and 256 samples, but Speex seems to think 20 ms, which is maybe why I think 256 sounds a little better; 512 seems to work OK as well.
He has actually set the filter_tail to a power of 2, and that doesn’t matter so much, but there is no harm in keeping to the power of 2.
So maybe do a hard edit and make.
So it all works via ALSA, kicks in only when media is played, and with the loopback it will not need a restart on each recording end.
The only thing I did differently: I noticed Raspbian’s speex & speexdsp packages are quite old (a release candidate), so I downloaded the tar and compiled and installed the 1.2.0 release before making ec.
ec_hw doesn’t seem to work with my 4-mic linear, but I’m not surprised, as I am unsure it has the loopback channel stated in the sales literature; I think they are mixing it up with the USB 4-mic.
If someone can provide a working latency measurement of that arrangement, it would be really appreciated, as my 75 ms delay is a total guess.
I tried alsabat --roundtriplatency and couldn’t seem to get it to work, but it does work with straight hardware. It also probably wouldn’t be correct anyway, because we have a pipe through a loopback after ec.
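For reference, the straight-hardware measurement that did work looks something like this (the device names are examples; alsabat plays a test tone on the playback PCM and times its return on the capture PCM, so mic and speaker need to be close together):

```shell
# round-trip latency of the raw card, playback and capture both on card 1
alsabat --roundtriplatency -P plughw:1 -C plughw:1
```

Measuring the full ec/loopback chain this way would need the capture side pointed at the loopback device, and as noted above that reading would include the FIFO buffering too.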
Once more my annoying voice, but a default install of ec without any tinkering on a Pi 4 should sound like this.
So I bought a Respeaker 2 hat to try some of this with. Are you aware of any module or ALSA settings that would disable power saving on the card? I’m getting a pretty nasty click/pop at the beginning of each audio file as the card plays it.
I can hack around this by running the following in another terminal:
$ aplay -f S8 /dev/zero
And this of course keeps the card “awake”. It works, but it’s a bit of a hack.
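One way to make that workaround at least survive reboots is a tiny systemd unit (the unit name here is made up; it just runs the same S8/dev-zero trick in the background):

```
# /etc/systemd/system/alsa-keepalive.service
[Unit]
Description=Keep the sound card awake to avoid the wake-up pop

[Service]
ExecStart=/usr/bin/aplay -f S8 /dev/zero
Restart=always

[Install]
WantedBy=multi-user.target
```

Enable it with sudo systemctl enable --now alsa-keepalive.service. Still a hack, but at least a self-starting one; a proper fix would be a power-save setting in the card’s driver, if it exposes one.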