Tricks to reduce false detections / wake-ups

ChrizZz · December 17, 2021, 10:27pm

For some weeks I have been testing Rhasspy, with the idea of using it in my house with satellites in most rooms. Some rooms would get good speakers and others cheap ones, depending on the type of room, because I also want to use the satellites as multiroom speakers based on Squeezelite. In my test environment, Rhasspy runs as a Docker container on a Raspberry Pi 4 (4 GB) that acts purely as a base station, with no microphone or speaker connected. There are 2 satellites: a Raspberry Pi 4 (4 GB) with a ReSpeaker 4-Mic Array and a Raspberry Pi Zero 2 with a ReSpeaker 2-Mics Pi HAT.

The Rhasspy settings are as follows, with Porcupine for wake word detection and Kaldi for speech-to-text:

{
    "dialogue": {
        "satellite_site_ids": "rhasspy-mobile-chris,atom1,pi4sat,pi0sat",
        "system": "rhasspy",
        "volume": "0.65"
    },
    "handle": {
        "remote": {
            "url": "http://IP_RHASSPY_BASE:PORT/intent"
        },
        "satellite_site_ids": "rhasspy-mobile-chris,atom1,pi4sat,pi0sat",
        "system": "remote"
    },
    "intent": {
        "satellite_site_ids": "rhasspy-mobile-chris,atom1,pi4sat,pi0sat",
        "system": "fsticuffs"
    },
    "mqtt": {
        "enabled": true,
        "host": "IP:Port",
        "password": "xxx",
        "site_id": "pi4base",
        "username": "USERNAME"
    },
    "sounds": {
        "error": "${RHASSPY_PROFILE_DIR}/wav/error.wav",
        "recorded": "${RHASSPY_PROFILE_DIR}/wav/recorded.wav",
        "system": "hermes",
        "wake": "${RHASSPY_PROFILE_DIR}/wav/wake.wav"
    },
    "speech_to_text": {
        "kaldi": {
            "cancel_probability": "0.01",
            "min_confidence": "0.4",
            "unknown_words_probability": "0.000001"
        },
        "satellite_site_ids": "rhasspy-mobile-chris,atom1,pi4sat,pi0sat",
        "system": "kaldi"
    },
    "text_to_speech": {
        "espeak": {
            "voice": "de"
        },
        "larynx": {
            "vocoder": "vctk_medium"
        },
        "satellite_site_ids": "rhasspy-mobile-chris,atom1,pi4sat,pi0sat",
        "system": "nanotts"
    },
    "wake": {
        "porcupine": {
            "keyword_path": "jarvis_raspberry-pi.ppn",
            "sensitivity": "0.5",
            "udp_audio": "172.21.0.2:20000"
        },
        "satellite_site_ids": "rhasspy-mobile-chris,atom1,pi4sat,pi0sat",
        "snowboy": {
            "model": "kalypso.pmdl",
            "sensitivity": "0.5",
            "udp_audio": "172.21.0.2:20000"
        },
        "system": "porcupine"
    }
}

But I am really struggling with the reliability of the wake-ups and of the sentence recognition. If the TV is on or I am in a video conference, I get multiple false wake-ups and also wrong detections of what was said. Rhasspy tells me the time or the weather even though nobody was talking about these topics.

Because of this, I tried Snowboy with the wake word “Kalypso”, but the detection rate dropped. I also tried “Kalypso” with Porcupine, but this didn’t work: Rhasspy never woke up. I changed the settings back to Porcupine with “Jarvis” and tried my luck with the wake word sensitivity, but “0.5” is the lowest value I can use. If I change it to “0.4”, Rhasspy never wakes up. I also tried changing the “Unknown Words Probability”, “Silence Probability” and “Cancel Probability”, but this made no difference to the overall result.
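For anyone repeating these experiments, here is a minimal Python sketch of how one might generate profile variants with different settings instead of editing the JSON by hand. The key paths (`wake.porcupine.sensitivity`, `speech_to_text.kaldi.min_confidence`) come from the profile above; the helper names `with_setting` and `sensitivity_variants` are made up for illustration and are not part of Rhasspy. Keep in mind that with Porcupine a higher sensitivity means fewer missed wake-ups but more false alarms, so lowering the value trades false wake-ups for misses.

```python
import copy
import json


def with_setting(profile, dotted_key, value):
    """Return a copy of `profile` with the nested key set to `value`."""
    updated = copy.deepcopy(profile)
    node = updated
    *parents, leaf = dotted_key.split(".")
    for key in parents:
        # Create intermediate sections if the profile does not have them yet.
        node = node.setdefault(key, {})
    # The profile above stores numbers as strings, so keep that form.
    node[leaf] = str(value)
    return updated


def sensitivity_variants(profile, values):
    """Yield (value, profile_copy) pairs, one per candidate sensitivity."""
    for value in values:
        yield value, with_setting(profile, "wake.porcupine.sensitivity", value)


if __name__ == "__main__":
    # Trimmed-down stand-in for the "wake" section of the profile above.
    base = {
        "wake": {
            "system": "porcupine",
            "porcupine": {
                "keyword_path": "jarvis_raspberry-pi.ppn",
                "sensitivity": "0.5",
            },
        }
    }
    for value, variant in sensitivity_variants(base, [0.4, 0.45, 0.5]):
        print(value, json.dumps(variant["wake"]["porcupine"]))
```

Each variant would still have to be saved to the satellite's profile and tested by hand (how many real wake-ups are caught, how many false ones happen with the TV on); the sketch only takes care of the bookkeeping.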

Is there something else I could try or change to improve the quality, or is this the best result I can get with this hardware and the available open-source solutions?

romkabouter · December 17, 2021, 11:33pm

If you need better wake-word accuracy with background noise, you need more advanced things like noise suppression and acoustic echo cancelling.

There are various topics here on the forum, but I am not sure there is an “easy” way to get that going.
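To give a rough idea of what acoustic echo cancelling does, here is a toy Python sketch using a normalized LMS (NLMS) adaptive filter. It is only an illustration of the principle, not something to plug into Rhasspy: the function names, the filter length and the simulated "room" (speaker signal arriving at the microphone 2 samples late at 60% level) are all assumptions made up for the example. Real setups rely on dedicated DSP components and have to deal with clock drift, long room echoes and people talking at the same time.

```python
import random


def nlms_echo_cancel(mic, reference, taps=8, step=0.5, eps=1e-8):
    """Subtract an adaptively estimated echo of `reference` from `mic`."""
    weights = [0.0] * taps
    history = [0.0] * taps  # most recent reference samples, newest first
    residual = []
    for mic_sample, ref_sample in zip(mic, reference):
        history = [ref_sample] + history[:-1]
        # Current guess of how the speaker signal shows up in the microphone.
        echo_estimate = sum(w * x for w, x in zip(weights, history))
        error = mic_sample - echo_estimate
        # Normalize the update by the reference power (the "N" in NLMS).
        power = sum(x * x for x in history) + eps
        weights = [w + step * error * x / power
                   for w, x in zip(weights, history)]
        residual.append(error)
    return residual


def energy(samples):
    """Sum of squared samples."""
    return sum(s * s for s in samples)


if __name__ == "__main__":
    rng = random.Random(0)
    # What the speaker plays (e.g. TTS or music), simulated as noise.
    reference = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
    # Hypothetical room: the microphone hears it 2 samples later at 60% level.
    mic = [0.0, 0.0] + [0.6 * r for r in reference[:-2]]
    cleaned = nlms_echo_cancel(mic, reference)
    print("echo energy before:", energy(mic[1000:]))
    print("echo energy after: ", energy(cleaned[1000:]))
```

The point is that the filter needs the reference signal, i.e. what the device itself is playing. That is why echo cancelling can help with the satellite's own speaker output but does nothing against a TV in the same room; for that you would be looking at noise suppression instead.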

SLatour007 · December 22, 2021, 2:01pm

It may sound counterintuitive, but I had better results with a low-quality microphone than with the ReSpeaker. It seems like the high-quality microphone captures too much ambient sound, which makes detection difficult.

ChrizZz · December 22, 2021, 7:41pm

Can you share which microphone you are using?

SLatour007 · December 22, 2021, 10:54pm

Microsoft LifeCam Cinema (USB webcam). It must be 5 to 7 years old.