For a few weeks I have been testing Rhasspy. My goal is to use it throughout my house, with satellites in most rooms: good speakers in some rooms and cheap speakers in others, depending on the type of room, because I also want to use the satellites as multiroom speakers based on Squeezelite. In my test environment, Rhasspy runs as a Docker container on a Raspberry Pi 4 4GB, which acts only as a base station with no microphone or speaker connected. There are two satellites: a Raspberry Pi 4 4GB with a ReSpeaker 4-Mic Array and a Raspberry Pi Zero 2 with a ReSpeaker 2-Mics Pi HAT.
These are my Rhasspy settings, with Porcupine for wake word detection and Kaldi for speech to text:
{
  "dialogue": {
    "satellite_site_ids": "rhasspy-mobile-chris,atom1,pi4sat,pi0sat",
    "system": "rhasspy",
    "volume": "0.65"
  },
  "handle": {
    "remote": {
      "url": "http://IP_RHASSPY_BASE:PORT/intent"
    },
    "satellite_site_ids": "rhasspy-mobile-chris,atom1,pi4sat,pi0sat",
    "system": "remote"
  },
  "intent": {
    "satellite_site_ids": "rhasspy-mobile-chris,atom1,pi4sat,pi0sat",
    "system": "fsticuffs"
  },
  "mqtt": {
    "enabled": true,
    "host": "IP:Port",
    "password": "xxx",
    "site_id": "pi4base",
    "username": "USERNAME"
  },
  "sounds": {
    "error": "${RHASSPY_PROFILE_DIR}/wav/error.wav",
    "recorded": "${RHASSPY_PROFILE_DIR}/wav/recorded.wav",
    "system": "hermes",
    "wake": "${RHASSPY_PROFILE_DIR}/wav/wake.wav"
  },
  "speech_to_text": {
    "kaldi": {
      "cancel_probability": "0.01",
      "min_confidence": "0.4",
      "unknown_words_probability": "0.000001"
    },
    "satellite_site_ids": "rhasspy-mobile-chris,atom1,pi4sat,pi0sat",
    "system": "kaldi"
  },
  "text_to_speech": {
    "espeak": {
      "voice": "de"
    },
    "larynx": {
      "vocoder": "vctk_medium"
    },
    "satellite_site_ids": "rhasspy-mobile-chris,atom1,pi4sat,pi0sat",
    "system": "nanotts"
  },
  "wake": {
    "porcupine": {
      "keyword_path": "jarvis_raspberry-pi.ppn",
      "sensitivity": "0.5",
      "udp_audio": "172.21.0.2:20000"
    },
    "satellite_site_ids": "rhasspy-mobile-chris,atom1,pi4sat,pi0sat",
    "snowboy": {
      "model": "kalypso.pmdl",
      "sensitivity": "0.5",
      "udp_audio": "172.21.0.2:20000"
    },
    "system": "porcupine"
  }
}
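Before tuning detection, it may be worth ruling out a simple configuration slip. In a base/satellite setup like this one, each service on the base also responds to the site ids listed in its `satellite_site_ids` entry, so a satellite missing from one section would silently lose that service. The following is only a minimal sketch that checks the profile above (loaded with `json.load`); the section list and the helper names `parse_ids` and `check_satellite_ids` are my own assumptions, not part of Rhasspy.

```python
import json

# Sections that carry a satellite_site_ids entry in the profile above
# (assumption: adjust this list if your profile differs).
SATELLITE_SECTIONS = ["dialogue", "handle", "intent",
                      "speech_to_text", "text_to_speech", "wake"]


def parse_ids(value):
    """Split a comma-separated site id string into a set, ignoring blanks."""
    return {part.strip() for part in value.split(",") if part.strip()}


def check_satellite_ids(profile):
    """Return a list of human-readable problems found in a profile dict."""
    problems = []
    base_id = profile.get("mqtt", {}).get("site_id")
    seen = {}
    for section in SATELLITE_SECTIONS:
        raw = profile.get(section, {}).get("satellite_site_ids")
        if raw is None:
            problems.append(f"{section}: no satellite_site_ids")
            continue
        ids = parse_ids(raw)
        seen[section] = ids
        # The base station should not list its own site id as a satellite.
        if base_id in ids:
            problems.append(f"{section}: base site_id {base_id!r} listed as a satellite")
    # Every section should list the same set of satellites.
    if len({frozenset(ids) for ids in seen.values()}) > 1:
        problems.append("satellite_site_ids differ between sections")
    return problems


if __name__ == "__main__":
    with open("profile.json") as f:  # hypothetical path to the profile shown above
        for problem in check_satellite_ids(json.load(f)):
            print(problem)
```

For the profile as posted, every section lists the same four satellites and none of them is `pi4base`, so this check should come back empty; it mainly becomes useful once more satellites are added.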
But I am really struggling with the reliability of wake word detection and with the recognition of sentences. When the TV is on or I am in a video conference, I get many false wake-ups and also wrong recognitions of what was said: Rhasspy tells me the time or the weather although nobody was talking about these topics.
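Since the intents go to a remote handler (`"handle": {"system": "remote"}`), one workaround on the handler side might be to ignore intents that were recognized with low confidence. The sketch below assumes the handler receives a Hermes-style intent message with `intent.intentName`, `intent.confidenceScore` and `siteId` fields; please verify these field names against your own MQTT traffic. The function name `should_handle` and the default threshold of 0.8 are hypothetical. Note also that with a grammar-based recognizer such as fsticuffs the confidence score may be close to 1.0 for any grammar match, in which case this filter would not help much.

```python
import json


def should_handle(raw_payload, min_confidence=0.8):
    """Decide whether a recognized intent should trigger an action.

    raw_payload: JSON text (str or bytes) of a Hermes-style intent message.
    min_confidence: hypothetical threshold; tune it against your own logs.
    """
    msg = json.loads(raw_payload)
    intent = msg.get("intent") or {}
    # No intent name means nothing was recognized: never act on it.
    if not intent.get("intentName"):
        return False
    # Missing scores are treated as 0.0, i.e. rejected.
    return float(intent.get("confidenceScore", 0.0)) >= min_confidence
```

For example, a `GetTime` intent from `pi4sat` with a confidence score of 0.55 would be dropped with the default threshold but accepted with `min_confidence=0.5`.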
Because of this, I tried Snowboy with the wake word “Kalypso”, but the detection rate dropped. I also tried “Kalypso” with Porcupine, but this didn’t work: Rhasspy never woke up. I changed the settings back to Porcupine with “Jarvis” and experimented with the wake word sensitivity, but “0.5” is the lowest value I can use. If I change it to “0.4”, Rhasspy never wakes up. I also tried changing the “Unknown Words Probability”, “Silence Probability” and “Cancel Probability” settings, but the overall result did not change.
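One way to make the sensitivity trade-off measurable instead of guessing is to record one clip in which the wake word is spoken a known number of times and a second clip of TV or video-call audio without it. Replay both at each sensitivity setting and count the wake-ups. The helper below is only a sketch of the bookkeeping. The names `Trial`, `summarize` and `pick` are hypothetical, the counting procedure itself is an assumption, and the numbers in the usage example are placeholders, not measurements.

```python
from dataclasses import dataclass


@dataclass
class Trial:
    sensitivity: float
    detections_on_wake_clip: int   # wake word was spoken `spoken` times in this clip
    detections_on_background: int  # TV / video-call audio without the wake word


def summarize(trials, spoken, background_hours):
    """Return (sensitivity, miss_rate, false_alarms_per_hour) rows sorted by sensitivity."""
    rows = []
    for t in trials:
        # Cap at `spoken`: extra detections on the wake clip are not extra hits.
        hits = min(t.detections_on_wake_clip, spoken)
        miss_rate = 1.0 - hits / spoken
        false_alarms_per_hour = t.detections_on_background / background_hours
        rows.append((t.sensitivity, miss_rate, false_alarms_per_hour))
    return sorted(rows)


def pick(trials, spoken, background_hours, max_miss_rate=0.1):
    """Among settings that miss at most max_miss_rate, return the one with the fewest false alarms."""
    ok = [r for r in summarize(trials, spoken, background_hours) if r[1] <= max_miss_rate]
    return min(ok, key=lambda r: (r[2], r[1])) if ok else None


# Placeholder counts: 20 spoken wake words, 2 hours of TV audio.
trials = [Trial(0.4, 2, 0), Trial(0.5, 19, 6), Trial(0.6, 20, 14)]
print(pick(trials, spoken=20, background_hours=2.0))
```

With these placeholder counts, 0.4 is excluded because it misses almost every wake word, and 0.5 is chosen over 0.6 because it produces fewer false alarms per hour while staying within the allowed miss rate.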
Is there anything else I could try or change to improve the quality, or is this the best result I can get with this hardware and the available open-source solutions?