Now that my radio streaming is working, and I’ve just managed to mute mopidy on hotword detection and unmute it after playFinished is published, I’ve got another problem.
As soon as the radio is running I need to speak directly into the mic, although the speakers are about a meter away. I don’t understand why that is. Is there any way to improve this?
You need to play the music/radio stream through the Respeaker 3.5mm jack audio output to get the embedded AEC algorithm to kick in during playback and remove the playback from the audio capture.
Without AEC it won’t work, even if you increase the hotword sensitivity.
Check out these recent topics about AEC for more detail:
Also check that you are capturing from the dedicated ASR input channel of the Respeaker (with noise suppression, AEC and beam forming) and not from the raw mic input channels.
The 1 channel firmware only provides this special channel as input.
The 6 channel firmware provides this special channel and 4 raw mic channels. The last channel is the playback loopback and is not used by Rhasspy.
For simplicity’s sake, either flash the 1 channel firmware or configure your ALSA devices to capture from this specific channel.
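If you would rather keep the 6 channel firmware and pick the processed channel in your own code instead of in ALSA, a minimal sketch could look like the following. It assumes the capture arrives as interleaved signed 16-bit PCM with 6 channels and that channel 0 is the processed one, as described above; `extract_channel` is a hypothetical helper name, not part of Rhasspy or the Respeaker tools.

```python
import array


def extract_channel(pcm_bytes, channel=0, num_channels=6):
    """Return one channel from interleaved signed 16-bit PCM bytes.

    Assumption: 6 interleaved channels with channel 0 carrying the
    processed (AEC / noise suppression / beam forming) signal.
    """
    samples = array.array("h")      # 16-bit signed samples
    samples.frombytes(pcm_bytes)    # raises ValueError on a partial sample
    # Every num_channels-th sample, starting at `channel`, belongs to
    # the channel we want. Byte order does not matter for selection.
    return samples[channel::num_channels].tobytes()
```

The result is mono 16-bit PCM at the same sample rate, which can be passed on to whatever consumes the audio.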
I haven’t got a USB 4 mic, but on that mic AEC is built in. I also think it has a loopback channel that you can add to the ‘echo’ channel to be removed.
The software EC is for those without hardware EC, but your mic should have that.
I think on the standard settings it’s just channel 0 that has the processing; the other channels are just raw mic.
So unless you pull channel 0 only, you could be mixing the echo in again.
You really need to have a look at http://wiki.seeedstudio.com/ReSpeaker-USB-Mic-Array/
As I don’t have a USB 4 mic I don’t know for sure, but I think @fastjack does have one.
There are far too many settings for someone who has no hands-on experience with that particular mic.
There really is a lot going on with that mic.
pi@raspberrypi:~/usb_4_mic_array $ python tuning.py -p
name type max min r/w info
-------------------------------
AECFREEZEONOFF int 1 0 rw Adaptive Echo Canceler updates inhibit.
0 = Adaptation enabled
1 = Freeze adaptation, filter only
AECNORM float 16 0.25 rw Limit on norm of AEC filter coefficients
AECPATHCHANGE int 1 0 ro AEC Path Change Detection.
0 = false (no path change detected)
1 = true (path change detected)
AECSILENCELEVEL float 1 1e-09 rw Threshold for signal detection in AEC [-inf .. 0] dBov (Default: -80dBov = 10log10(1x10-8))
AECSILENCEMODE int 1 0 ro AEC far-end silence detection status.
0 = false (signal detected)
1 = true (silence detected)
AGCDESIREDLEVEL float 0.99 1e-08 rw Target power level of the output signal.
[−inf .. 0] dBov (default: −23dBov = 10log10(0.005))
AGCGAIN float 1000 1 rw Current AGC gain factor.
[0 .. 60] dB (default: 0.0dB = 20log10(1.0))
AGCMAXGAIN float 1000 1 rw Maximum AGC gain factor.
[0 .. 60] dB (default 30dB = 20log10(31.6))
AGCONOFF int 1 0 rw Automatic Gain Control.
0 = OFF
1 = ON
AGCTIME float 1 0.1 rw Ramps-up / down time-constant in seconds.
CNIONOFF int 1 0 rw Comfort Noise Insertion.
0 = OFF
1 = ON
DOAANGLE int 359 0 ro DOA angle. Current value. Orientation depends on build configuration.
ECHOONOFF int 1 0 rw Echo suppression.
0 = OFF
1 = ON
FREEZEONOFF int 1 0 rw Adaptive beamformer updates.
0 = Adaptation enabled
1 = Freeze adaptation, filter only
FSBPATHCHANGE int 1 0 ro FSB Path Change Detection.
0 = false (no path change detected)
1 = true (path change detected)
FSBUPDATED int 1 0 ro FSB Update Decision.
0 = false (FSB was not updated)
1 = true (FSB was updated)
GAMMAVAD_SR float 1000 0 rw Set the threshold for voice activity detection.
[−inf .. 60] dB (default: 3.5dB 20log10(1.5))
GAMMA_E float 3 0 rw Over-subtraction factor of echo (direct and early components). min .. max attenuation
GAMMA_ENL float 5 0 rw Over-subtraction factor of non-linear echo. min .. max attenuation
GAMMA_ETAIL float 3 0 rw Over-subtraction factor of echo (tail components). min .. max attenuation
GAMMA_NN float 3 0 rw Over-subtraction factor of non- stationary noise. min .. max attenuation
GAMMA_NN_SR float 3 0 rw Over-subtraction factor of non-stationary noise for ASR.
[0.0 .. 3.0] (default: 1.1)
GAMMA_NS float 3 0 rw Over-subtraction factor of stationary noise. min .. max attenuation
GAMMA_NS_SR float 3 0 rw Over-subtraction factor of stationary noise for ASR.
[0.0 .. 3.0] (default: 1.0)
HPFONOFF int 3 0 rw High-pass Filter on microphone signals.
0 = OFF
1 = ON - 70 Hz cut-off
2 = ON - 125 Hz cut-off
3 = ON - 180 Hz cut-off
MIN_NN float 1 0 rw Gain-floor for non-stationary noise suppression.
[−inf .. 0] dB (default: −10dB = 20log10(0.3))
MIN_NN_SR float 1 0 rw Gain-floor for non-stationary noise suppression for ASR.
[−inf .. 0] dB (default: −10dB = 20log10(0.3))
MIN_NS float 1 0 rw Gain-floor for stationary noise suppression.
[−inf .. 0] dB (default: −16dB = 20log10(0.15))
MIN_NS_SR float 1 0 rw Gain-floor for stationary noise suppression for ASR.
[−inf .. 0] dB (default: −16dB = 20log10(0.15))
NLAEC_MODE int 2 0 rw Non-Linear AEC training mode.
0 = OFF
1 = ON - phase 1
2 = ON - phase 2
NLATTENONOFF int 1 0 rw Non-Linear echo attenuation.
0 = OFF
1 = ON
NONSTATNOISEONOFF int 1 0 rw Non-stationary noise suppression.
0 = OFF
1 = ON
NONSTATNOISEONOFF_SR int 1 0 rw Non-stationary noise suppression for ASR.
0 = OFF
1 = ON
RT60 float 0.9 0.25 ro Current RT60 estimate in seconds
RT60ONOFF int 1 0 rw RT60 Estimation for AES. 0 = OFF 1 = ON
SPEECHDETECTED int 1 0 ro Speech detection status.
0 = false (no speech detected)
1 = true (speech detected)
STATNOISEONOFF int 1 0 rw Stationary noise suppression.
0 = OFF
1 = ON
STATNOISEONOFF_SR int 1 0 rw Stationary noise suppression for ASR.
0 = OFF
1 = ON
TRANSIENTONOFF int 1 0 rw Transient echo suppression.
0 = OFF
1 = ON
VOICEACTIVITY int 1 0 ro VAD voice activity status.
0 = false (no voice activity)
1 = true (voice activity)
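The max/min columns in that listing are linear values, while the info text quotes the defaults in dB. A small sketch of the conversions, assuming the usual convention that the `10log10` entries (such as AGCDESIREDLEVEL) are power ratios and the `20log10` entries (such as AGCMAXGAIN) are amplitude ratios; the function names are my own:

```python
import math


def power_to_db(value):
    """Linear power ratio -> dB, e.g. AGCDESIREDLEVEL, AECSILENCELEVEL."""
    return 10.0 * math.log10(value)


def amplitude_to_db(value):
    """Linear gain factor -> dB, e.g. AGCGAIN, AGCMAXGAIN, MIN_NS."""
    return 20.0 * math.log10(value)


def db_to_power(db):
    """dB -> linear power ratio (inverse of power_to_db)."""
    return 10.0 ** (db / 10.0)


def db_to_amplitude(db):
    """dB -> linear gain factor (inverse of amplitude_to_db)."""
    return 10.0 ** (db / 20.0)
```

So to ask for a maximum AGC gain of, say, 20 dB you would pass `db_to_amplitude(20)` as the value, since the device expects the linear factor and not the dB figure.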
I’m quite happy though, as it gives some example wavs, and to be honest it’s not much better than software, if at all. But if you have the hardware you shouldn’t need the software.
This always confuses me: things like the PulseAudio webrtc-aec plugin have VAD, but nothing tells you where you can access that status.
If you have AEC running then echo should be attenuated and VAD should only pick up on spoken voice that isn’t echo, which matters because media voice can be a real problem.
If you can work out how to turn on EC, only use channel 0 for recording and access the VAD status, then you need to write something to mute media on VAD, and then you’re barging in like a good one.
@fastjack I think something must be wrong with the setup as, yeah, they should just work, but don’t you have to create an asound.conf and just pull channel 0?
I did find the VAD example on the wiki:
from tuning import Tuning
import usb.core
import usb.util
import time

# Find the ReSpeaker USB mic array by its USB vendor/product ID
dev = usb.core.find(idVendor=0x2886, idProduct=0x0018)
# print(dev)

if dev:
    Mic_tuning = Tuning(dev)
    print(Mic_tuning.is_voice())
    # Poll the VAD status once a second until Ctrl+C
    while True:
        try:
            print(Mic_tuning.is_voice())
            time.sleep(1)
        except KeyboardInterrupt:
            break
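Building on that polling loop, the "mute media on VAD" idea could be sketched roughly as below. This is only a sketch under assumptions: `is_voice` is any callable returning a truthy value while voice is active (for example `Mic_tuning.is_voice`), and `mute` / `unmute` are hypothetical callbacks you would supply yourself for mopidy or whatever player you use. The hold time is there so the media does not flap on and off between words.

```python
import time


def duck_on_voice(is_voice, mute, unmute, hold_seconds=2.0,
                  poll_seconds=0.1, max_polls=None,
                  clock=time.monotonic, sleep=time.sleep):
    """Mute while voice is active; unmute after hold_seconds of silence.

    max_polls, clock and sleep exist only so the loop can be bounded
    and driven by a fake clock when trying it out without hardware.
    Returns True if the media is still muted when the loop ends.
    """
    muted = False
    last_voice = None
    polls = 0
    while max_polls is None or polls < max_polls:
        now = clock()
        if is_voice():
            last_voice = now          # remember the latest voice activity
            if not muted:
                mute()
                muted = True
        elif muted and now - last_voice >= hold_seconds:
            unmute()                  # silence has lasted long enough
            muted = False
        polls += 1
        sleep(poll_seconds)
    return muted
```

With the real mic this would be called with `max_polls=None` so it runs until interrupted. Whether the VAD status is reliable enough while media is playing depends on the AEC actually working, as discussed above.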
With the 1 channel firmware (which is the factory default I think… not really sure though) you only get one input channel (which is the processed one with NS, beam forming and AEC). So all good.
If the firmware installed on the Respeaker is the 6 channel one, you have to set up asound.conf to create a PCM that only forwards channel 0 (the processed input signal).
Well that is really weird as the output doesn’t show current settings.
Presumably you turn it on with sudo python tuning.py ECHOONOFF 1
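The same thing could probably be done from Python, which also lets you see the value before and after. This is a sketch that assumes the `Tuning` object from tuning.py exposes `read(name)` and `write(name, value)` methods; I have not checked that against every version of the script, so treat those two calls as an assumption. `ensure_enabled` is a hypothetical helper name.

```python
def ensure_enabled(tuning, name="ECHOONOFF"):
    """Switch an on/off parameter to 1 if needed; return (before, after).

    Assumption: `tuning` has read(name) and write(name, value) methods,
    as the Tuning class in tuning.py appears to.
    """
    before = tuning.read(name)
    if before != 1:
        tuning.write(name, 1)
    after = tuning.read(name)   # read back to confirm the device took it
    return before, after
```

Called as `ensure_enabled(Mic_tuning)` it would return something like `(0, 1)` if echo suppression was off and is now on.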
Also, I don’t know how you tell whether you have the 1 channel or the 6 channel firmware running.
But I guess if you plug it into a USB port and record 6 channels in Audacity, then if there is audio in any channel other than the first, you have the 6 channel firmware.
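If you don’t want to eyeball it in Audacity, the same check can be sketched in Python on a recording saved as a WAV file. It assumes a 16-bit WAV recorded from the mic with whatever tool you like; `channel_peaks` is a hypothetical helper name. It reports the peak sample value per channel, so channels that stay at or near 0 carry no audio.

```python
import array
import sys
import wave


def channel_peaks(wav_path):
    """Return the peak absolute sample value for each channel of a 16-bit WAV."""
    with wave.open(wav_path, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("expected 16-bit samples")
        num_channels = wav.getnchannels()
        samples = array.array("h")
        samples.frombytes(wav.readframes(wav.getnframes()))
    if sys.byteorder == "big":
        samples.byteswap()          # WAV sample data is little-endian
    # Samples are interleaved, so channel ch is every num_channels-th value
    return [max((abs(s) for s in samples[ch::num_channels]), default=0)
            for ch in range(num_channels)]
```

The length of the returned list is simply the channel count stored in the file, so a file that only has one channel already tells you something before you look at the levels.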
Then you have to flash the mic with the firmware you want.
I had the same problem with SpeexDSP and clock drift, but I had picked a very bad source for the mic: the clock drift between a USB mic and the Pi 3.5mm output completely killed AEC, even the WebRTC AEC that is supposed to have drift compensation.
After successful AEC with Speex and a Respeaker 2 mic, where clock drift is not an issue, I was extremely interested in trying WebRTC even if it meant installing PulseAudio.
PulseAudio and the Respeakers don’t seem to like each other, and if I have to be honest I like the hardware of the 2 mic but think the drivers stink.
The Pi 3.5mm is a bit of a stinker, but I am thinking that if you get 2x I2S microphones you may well be able to use them in conjunction with the Pi 3.5mm, or even a DAC on the other I2S, as I think they all share the Pi clock.
It’s a shame I2S mics don’t seem to work on the Pi 4, but they do on the 3/Zero, or so Adafruit say.
I will come back to you on those results.
Also, WebRTC audio processing is on PyPI and someone has built a webrtcvad off it, but it would be amazing if someone could do a similar job to the voiceen/ec project and create a similar WebRTC ALSA FIFO EC with all the bells and whistles of the audio processing lib.
WebRTC is supposed to have a superior AEC algorithm to the Speex one.