Almost there but just can't deploy :(

DaveKearley · April 22, 2021, 11:37am

Been playing with Rhasspy on a Pi 3B+ and 2-mic HAT for a month or so now. I wanted to use it to replace an Alexa so we get an offline voice assistant.

No matter what I try, I just can’t get it good enough to deploy. Take today, for example: I wanted to turn the Sky box on (which also turns the TV on) with the sentence “turn sky on”. There is also a sentence “turn plex on”, which turns our media player on.

The 1st attempt gave no match; the 2nd attempt gave the affirmative beep but then turned Plex on instead of Sky.

No matter what I try, it seems impossible to set the ASR confidence threshold high enough to block false matches while still letting real commands through. It needs to be up around 0.95, but a sentence like “set a timer for 25 minutes” always fails because it scores around 0.55. The actual intent is:
[SetTimer]
minutes = (1){min} minute | (2…59){min} minutes
seconds = (1){sec} second | (2…59){sec} seconds
set [a] timer for
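A minimal sketch of why a single threshold hurts here, using the scores quoted above (about 0.55 for the timer sentence against a 0.95 threshold). `accept_intent` is a hypothetical helper that only mimics the comparison; it is not Rhasspy's own code.

```shell
#!/bin/bash
# Hypothetical helper that mimics a confidence-threshold check.
# Prints "accept" if score >= threshold, otherwise "reject".
accept_intent() {
  local score=$1 threshold=$2
  # Bash arithmetic is integer-only, so let awk compare the decimals.
  awk -v s="$score" -v t="$threshold" 'BEGIN { print ((s >= t) ? "accept" : "reject") }'
}

# The timer sentence from the post scores around 0.55.
for threshold in 0.95 0.70 0.50; do
  echo "threshold $threshold -> timer (0.55): $(accept_intent 0.55 "$threshold")"
done
```

The numbers in the post leave no value that works for both cases: blocking false matches needs about 0.95, while the timer sentence only passes at 0.55 or lower.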

It also can’t recognise “bedtime” and just opts for “what’s the time”, giving me the time instead of turning everything off.

Please don’t take this as a dig. I think Rhasspy is a brilliant project and something that’s really needed today, but I just cannot get it to work well enough to deploy it and turn off Alexa. I’m not sure what else I can try next; I’m not that skilled with Pi code, so I’m more of a cut-and-paste user.

Have any of you guys got it good enough to use in the real world?

romkabouter · April 22, 2021, 5:43pm

Maybe it helps if you rephrase “sky” as “skybox” and “bedtime” as something with “sleep” in it?

DaveKearley · April 22, 2021, 6:04pm

Yeah, I can play around a bit, but I chose those phrases because that’s what we’ve used on Alexa for over a year, so they’ve sort of stuck.

It does seem to default to time which is odd.

Daenara · April 22, 2021, 7:27pm

That could be because it weighs all the sentences it knows against each other, and the timer has tons of sentences because it generates different combinations of minutes and seconds (maybe even all of them) for training. Sentences with fewer slots are represented less, so it defaults to the timer when it has no idea. I had the same problem a while ago, and I think I fixed it by switching Kaldi’s language model type to text FST, but that should be the default now. You can check that and see if switching helps.
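A rough count shows how lopsided the training sentences can get. The `minutes` and `seconds` rules in the SetTimer template each expand to 59 phrasings; how they combine depends on the last template line, which is cut off in the post, so the combination below (minutes, seconds, or minutes and seconds, with an optional “a”) is an assumption.

```shell
#!/bin/bash
# Rough count of the sentences the SetTimer template could expand to.
# ASSUMPTION: the (truncated) last template line accepts minutes, seconds,
# or minutes and seconds; adjust the formula to match your actual template.

minute_phrasings=$(( 1 + (59 - 2 + 1) ))  # "1 minute" + "2..59 minutes" = 59
second_phrasings=$(( 1 + (59 - 2 + 1) ))  # "1 second" + "2..59 seconds" = 59
optional_a=2                              # "set a timer" and "set timer"

durations=$(( minute_phrasings + second_phrasings + minute_phrasings * second_phrasings ))
timer_sentences=$(( optional_a * durations ))

echo "SetTimer sentences: $timer_sentences"
echo "'turn sky on' sentences: 1"
```

Several thousand timer sentences against a single “turn sky on” is the kind of imbalance described above; how much it actually skews recognition depends on how Rhasspy builds the language model, so treat the number as an illustration only.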

DaveKearley · April 22, 2021, 7:33pm

Yes, it’s on text FST, so I guess that’s already the default.

It never chooses the timer; its favourite wrong choice is to just tell me the time. In fact, with the ASR confidence threshold above 0.7 or so, it’s impossible to set a timer, as the timer sentences score way lower.

Daenara · April 22, 2021, 7:47pm

Well, mine defaults to the weather, but then it doesn’t know much more than that at this point.

You could try switching around the order of the intents and see if that helps, or play around with the wording of the timer or the time intent; maybe they are just too similar. You could also save recordings of yourself asking for each intent and compare them. It could be that your pronunciation of “timer” is closer to what the model expects for “time”; after all, the model wasn’t trained on your voice specifically. I had that issue with the default “computer” wake word available for Precise and had to train my own for it to understand me. To test for that, maybe use TTS as the mic input and see if it still happens, or temporarily use open transcription and see what it actually hears when it isn’t matching to the closest intent.
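A sketch of the TTS-as-microphone test, assuming Rhasspy 2.5’s HTTP API on its default port 12101 (`/api/text-to-speech` returns WAV audio, `/api/speech-to-text` returns the transcription). It speaks each sentence with Rhasspy’s own TTS (which may also play through the speaker), feeds that audio back into speech-to-text, and prints what was heard. `api_url` and `roundtrip` are hypothetical helper names.

```shell
#!/bin/bash
# Sketch: use Rhasspy's own TTS audio instead of the microphone and print
# the transcription. Assumes the Rhasspy 2.5 HTTP API on the default port;
# set RHASSPY_URL if yours differs.
RHASSPY_URL=${RHASSPY_URL:-http://localhost:12101}

api_url() {
  # e.g. api_url speech-to-text -> http://localhost:12101/api/speech-to-text
  echo "$RHASSPY_URL/api/$1"
}

roundtrip() {
  local sentence=$1 wav
  wav=$(mktemp)
  # 1. Synthesize the sentence with Rhasspy's TTS and save the WAV.
  curl -s -X POST --data "$sentence" "$(api_url text-to-speech)" -o "$wav"
  # 2. Send that WAV to speech-to-text and show what the ASR heard.
  echo "said:  $sentence"
  echo "heard: $(curl -s -X POST -H 'Content-Type: audio/wav' --data-binary @"$wav" "$(api_url speech-to-text)")"
  rm -f "$wav"
}

# Only try the round trip if the Rhasspy web server answers.
if curl -s -o /dev/null --max-time 2 "$RHASSPY_URL/"; then
  for s in "turn sky on" "turn plex on" "bedtime" "set a timer for 25 minutes"; do
    roundtrip "$s"
  done
else
  echo "Rhasspy not reachable at $RHASSPY_URL"
fi
```

If the TTS audio comes back correctly but your own recordings do not, that points at the microphone or pronunciation rather than the sentences. With open transcription off, the result should still be limited to the trained sentences, so a mismatch shows which sentence the audio lands closest to.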

DaveKearley · April 22, 2021, 7:51pm

That’s a good idea, I’ll try open transcription and see what goes on.

DaveKearley · April 23, 2021, 7:15am

I tried turning on open transcription; it downloaded some files but seems to make no difference?

Where is the transcribed text output?

grizewald · April 23, 2021, 9:20am

If you are experiencing poor recognition, have you recorded some audio and listened to the result? If your microphone isn’t delivering good quality audio, then you’ve failed to clear the first hurdle.

DaveKearley · April 23, 2021, 10:01am

It tends to sound muffled unless I’m within 1 m of it. I have the 2-mic HAT but no idea if there are better mics; it’s also handy because it drives the speaker directly.

JGKK · April 23, 2021, 11:31am

The thing with the ReSpeaker 2-mic HAT is that you really have to tune the settings to get better performance, especially to improve its far-field capabilities.
You have to manually turn on things like the noise gate and the automatic gain control, as they are not on by default. Afterwards, tune settings like attack, decay and gain by recording and listening back a ton.
If you google the spec sheet for the audio chip it uses, it tells you what each setting does and in which increments.
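A sketch of that record-and-listen loop: it steps one mixer control through a few values and records a 5-second clip at each, so the clips can be compared side by side. `CONTROL_NUMID`, `VALUES` and `clip_name` are hypothetical; the default numid 28 is only a placeholder, so look up the numid of the control you actually want to tune first.

```shell
#!/bin/bash
# Sketch: record one clip per value of a mixer control for comparison by ear.
# CONTROL_NUMID is a placeholder (hypothetical default); find the right one
# with: amixer -c seeed2micvoicec contents
CARD=seeed2micvoicec
CONTROL_NUMID=${CONTROL_NUMID:-28}
VALUES=${VALUES:-"1 3 5 7"}

clip_name() {
  # clip_name 28 5 -> numid28_value5.wav
  echo "numid${1}_value${2}.wav"
}

if arecord -l 2>/dev/null | grep -q "$CARD"; then
  for v in $VALUES; do
    amixer -c "$CARD" cset numid="$CONTROL_NUMID" "$v" > /dev/null
    echo "numid=$CONTROL_NUMID set to $v; speak from across the room..."
    # 5 s of 16 kHz, 16-bit mono audio, the format Rhasspy works with.
    arecord -D "plughw:CARD=$CARD,DEV=0" -f S16_LE -r 16000 -c 1 -d 5 "$(clip_name "$CONTROL_NUMID" "$v")"
  done
else
  echo "sound card $CARD not found"
fi
```

The control is left at the last value tried, so set it back to the one that sounded best afterwards.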

DaveKearley · April 23, 2021, 11:33am

Really?

That sounds like an advanced course in one go - I have no idea how to do any of that

JGKK · April 23, 2021, 11:41am

You can do everything in alsamixer. Just type alsamixer into your shell, then press F6 to choose the right card and F4 to show only the capture settings.
As a starting point, you can also create a bash file somewhere on your Rhasspy system, for example called settings.sh, with this content:

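#!/bin/bash
# Starting point for tuning the ReSpeaker 2-mic HAT (ALSA card "seeed2micvoicec").
# The numids are driver-specific: check which control each one refers to with
# amixer -c seeed2micvoicec contents

amixer -c "seeed2micvoicec" cset numid=1 0,0
amixer -c "seeed2micvoicec" cset numid=10 235,235
amixer -c "seeed2micvoicec" cset numid=26 3
amixer -c "seeed2micvoicec" cset numid=27 3
amixer -c "seeed2micvoicec" cset numid=28 3
amixer -c "seeed2micvoicec" cset numid=29 1
amixer -c "seeed2micvoicec" cset numid=30 0
amixer -c "seeed2micvoicec" cset numid=32 7
amixer -c "seeed2micvoicec" cset numid=33 7
amixer -c "seeed2micvoicec" cset numid=34 31
amixer -c "seeed2micvoicec" cset numid=35 on

# Save the current mixer state (needs root) so it can be restored after a reboot
alsactl store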

and then run it with sudo bash settings.sh.
This is my current starting point for the 2-mic HAT. The script uses amixer’s cset to set parameters directly, such as switching AGC on and setting the noise gate and target gain.
When you run it, the shell output shows what each command did.
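The numids in the script mean little on their own. A small sketch that lists each numid next to its control name by parsing `amixer contents`, assuming the usual `numid=N,iface=MIXER,name='...'` line format; `list_controls` is a hypothetical helper.

```shell
#!/bin/bash
# Sketch: print "numid name" pairs from `amixer contents` output on stdin.
list_controls() {
  # Header lines look like: numid=1,iface=MIXER,name='Capture Volume'
  # Value lines (starting with "  ;" or "  :") are skipped.
  sed -n "s/^numid=\([0-9]*\),iface=[A-Z]*,name='\(.*\)'.*$/\1 \2/p"
}

if amixer -c seeed2micvoicec contents > /dev/null 2>&1; then
  amixer -c seeed2micvoicec contents | list_controls
else
  echo "card seeed2micvoicec not found"
fi
```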
For some exciting reading, here is the spec sheet for the chip that the 2-mic HAT uses:

DaveKearley · April 23, 2021, 12:51pm

Thanks, those code snippets gave me these settings in alsamixer; I presume it worked?

I’ll do some testing later today

DaveKearley · April 24, 2021, 7:24am

It’s a tiny bit better, but I downloaded a WAV or two, and in QuickTime Player on the Mac the audio is barely there; it’s extremely quiet.

I don’t have Audacity yet, as the Mac needs an OS update, so I can’t do any serious analysis, but it is very, very quiet.
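Without Audacity, a rough loudness check is still possible from the shell. This sketch prints the peak sample of a 16-bit PCM WAV and converts it to dBFS (0 dBFS is full scale; more negative is quieter). It assumes a canonical 44-byte header and little-endian samples read on a little-endian machine; `wav_peak` and `wav_peak_dbfs` are hypothetical names.

```shell
#!/bin/bash
# Sketch: peak level of a 16-bit PCM WAV using only od and awk.
# ASSUMPTION: canonical 44-byte header, little-endian samples.
wav_peak() {
  od -An -v -t d2 -j 44 "$1" |
    awk '{ for (i = 1; i <= NF; i++) { v = ($i < 0) ? -$i : $i; if (v > m) m = v } }
         END { print m + 0 }'
}

wav_peak_dbfs() {
  # Peak relative to 16-bit full scale (32768), in dB.
  awk -v p="$(wav_peak "$1")" 'BEGIN { if (p == 0) print "-inf"; else printf "%.1f\n", 20 * log(p / 32768) / log(10) }'
}

# Usage: bash peak.sh recording.wav
if [ $# -gt 0 ]; then
  wav_peak_dbfs "$1"
fi
```

Comparing this number between clips recorded at different mixer settings gives a quick, objective sense of whether a change made the capture louder.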

In my mixer pic above, should “capture” be at zero?

JGKK · April 24, 2021, 7:46am

If AGC is on, capture will not do anything. You will have to tune the ALC target gain and ALC max gain for more loudness. But even with the settings above, it’s not quiet for me at all.