Training a better STT model?

Hello @synesthesiam and everyone!
First off, thanks for this amazing piece of software!
I posted the same question on the Home Assistant forum before I knew there was a Rhasspy one, sorry :slight_smile:

I’m using the virtualenv installation on an RPi 3, with a PS3 Eye mic via PyAudio, with the Italian language model.

I’d like to know if there is any way to improve speech recognition by providing more examples for more reliable word matching, as recognition has been frustratingly unreliable for me.
In fact, the commands accendi (turn on) and spegni (turn off) get mixed up very often (70% of the time, even with no noise in the room). I’m puzzled, since they sound very different, and I’m not even talking with my hands (which is very difficult for an Italian :smile:).
Jokes aside, the phonemes for these custom words look, and sound, reasonably close to the real pronunciation, but I’m wondering whether and how I can provide more examples to get a better model.
Thanks!

Hi, I’m Italian too. I’m using Pocketsphinx (probably you are too) and I have the same problems…
I’ve heard that Kaldi is better than Pocketsphinx, but there isn’t an Italian model for Kaldi yet :frowning:
I’m trying to obtain one with @synesthesiam :slight_smile:

As a quick fix, try increasing your PS3 Eye volume.
I see a big difference even with the en profile :crazy_face:

That’s great! Maybe we can share our custom word pronunciations. Meanwhile I’ll read up here on the different phonemes, so that my approach to enriching the custom words is less based on trial and error!

Hi!
I didn’t think of that!
But I actually have no idea how to do it, since I’m using PyAudio and not ALSA, where I could raise the mic level via alsa-config. Or am I wrong? :wink:
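On Linux, PyAudio sits on top of PortAudio, which normally talks to ALSA, so the ALSA capture level usually still applies when recording through PyAudio. To check whether raising it actually helps, it can be useful to measure the input level before and after the change. Below is a minimal Python sketch, assuming the audio arrives as 16-bit signed little-endian mono PCM (for example, chunks read from a PyAudio stream opened with `format=pyaudio.paInt16, channels=1`); the function name `level_dbfs` is hypothetical and not part of Rhasspy or PyAudio.

```python
import math
import sys
from array import array


def level_dbfs(pcm: bytes) -> float:
    """RMS level of 16-bit signed little-endian mono PCM, in dB relative to full scale.

    0 dBFS corresponds to a full-scale square wave; silence (or no data) gives -inf.
    """
    samples = array("h")  # signed 16-bit integers
    samples.frombytes(pcm)  # raises ValueError if len(pcm) is odd
    if sys.byteorder == "big":
        samples.byteswap()  # PCM data is little-endian; fix up on big-endian hosts
    if not samples:
        return float("-inf")
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square == 0:
        return float("-inf")
    return 20 * math.log10(math.sqrt(mean_square) / 32768)
```

Readings closer to 0 dBFS mean a louder signal. Comparing speech recorded at your usual distance before and after changing the capture volume (for example with `alsamixer`) shows whether the change made a difference. The sketch deliberately uses `array` instead of the `audioop` module, which is deprecated in recent Python versions.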

Uhm, sadly, adding more pronunciations to the custom words didn’t help.
Moreover, I’m getting an incredible number of false positives for the wake word (almost continuously if people are talking), but I think this could be fixed by tweaking the sensitivity or, better, by switching from Pocketsphinx to Mycroft.
Apart from the false positives, when I was testing alone I also noticed weird behaviour in the wake word recognition timing:
After training, the first time I speak it recognizes the wake word, and the “Voice command Begin” starts right as I dictate the command. But when I use the wake word again after a few seconds, nothing appears to happen; as soon as I speak again, it starts recognizing the command too late, missing the first part of it. Something like this:

< mycustomwakeword >
wakeword detected
< do something >
as soon as I speak, Voice command is detected

but after a few seconds:
< mycustomwakeword >
nothing
< do something >
wake word detected and Voice command detected
and it only gets the last part of the sentence.

And it keeps failing to catch up with my dictation.
Does anyone have any insight on this?

Maybe it’s not the best way, but I would try the Docker installation…
Btw, I saw you mentioned Home Assistant. Is it hassio?

I have an RPi 3B+, a PS3 Eye and Rhasspy as a hassio add-on… I use it every day to control my devices and see absolutely no issues :upside_down_face:

Hi,
I’m not quite sure what you’re talking about: could you elaborate on your answer a bit?

Maybe it’s not the best way, but I would try the Docker installation…

For what?

Btw, I saw you mentioned Home Assistant. Is it hassio?

Yeah, I mentioned that I posted this issue on that forum before realizing there is a Rhasspy one :wink: And no, my aim is to use a custom-made dashboard in Node-RED to control all my MQTT-enabled devices.

I have an RPi 3B+, a PS3 Eye and Rhasspy as a hassio add-on… I use it every day to control my devices and see absolutely no issues

Are you also running Rhasspy in Italian without so many false positives?
Thanks for your time :wink:

Hi @Vik
I mean this type of installation:
https://rhasspy.readthedocs.io/en/latest/installation/#docker

to make sure the problem is not in the virtual env.

And no, I’m using the en profile… but the delays you mentioned are really strange 🙈

Is your Rhasspy RPi connected via wired Ethernet or WiFi?

Thanks, @frkos :slight_smile: I’ll try that!
@FredTheFrog it’s currently wired, so I doubt it has anything to do with network delays or packet loss.
No matter how long I wait, it only recognizes the hotword as soon as I speak the actual command.
I’m not sure whether that started happening when I trained many different pronunciations for the same word, since I didn’t test it intensively before that.

I couldn’t get pocketsphinx to work for wake word detection at all.

As a separate issue, there was also a recent version of Rhasspy where, after I used the text-to-speech function, it would log that it was detecting my wake word (Snowboy) but wouldn’t actually wake or recognize commands until I restarted. That has been fixed in more recent versions, so I wonder what version you’re using?

Hi @OC2019OC,
I had to manually add my Pocketsphinx custom hotword (and its phonemes) to dictionary.txt (and I have to do it again every time I re-train the model). You may have success this way.
I’m away at the moment and have no way to SSH into my Pi, but I was using the latest version available, pulled two weeks ago.
I still have to try the installation via Docker.
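Until there is a cleaner procedure, the manual dictionary step described above can at least be scripted so that it is safe to run repeatedly. This is a minimal Python sketch of a hypothetical helper, `ensure_dictionary_entries`, that appends any missing word/pronunciation pairs to a Pocketsphinx-style dictionary (one `word phoneme phoneme ...` entry per line, with alternate pronunciations written as `word(2)`, `word(3)` and so on). The dictionary path and the phoneme strings are placeholders: take the real pronunciation from your profile’s phoneme set rather than inventing one.

```python
from pathlib import Path


def ensure_dictionary_entries(dict_path, entries):
    """Append missing entries to a Pocketsphinx-style pronunciation dictionary.

    `entries` maps word -> space-separated phonemes. A word counts as present
    if any line starts with it, including alternates such as "word(2)".
    Returns the list of words that were appended (empty if nothing changed).
    """
    path = Path(dict_path)
    text = path.read_text(encoding="utf-8") if path.exists() else ""
    present = set()
    for line in text.splitlines():
        fields = line.split()
        if fields:
            present.add(fields[0].split("(")[0])  # "word(2)" -> "word"
    added = [word for word in entries if word not in present]
    if added:
        if text and not text.endswith("\n"):
            text += "\n"  # don't glue the new entry onto the last line
        text += "".join(f"{word} {entries[word]}\n" for word in added)
        path.write_text(text, encoding="utf-8")
    return added
```

It would be run after each training and before restarting, e.g. `ensure_dictionary_entries("/path/to/dictionary.txt", {"mywakeword": "<phonemes>"})`. Because it returns the words it added, an empty list means the dictionary already contained them.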

I had actually tried adding the wake word to the custom words, but it still didn’t work. Snowboy just worked much better, immediately, for me.

I think the text-to-speech problem I had was in 2.4.15. Like I said, after I had Rhasspy speak, the log would show the hotword as detected, but Rhasspy wouldn’t do anything about it (no wake, no wake sound, no voice command recognized, etc.). So it knew it was hearing the wake word but did nothing. If you’re getting the voice command recognized too, it sounds like a different issue from mine. I think it would still recognize and act on intents that I typed into the UI. It was solved in 2.4.16 for me.

Is it the log entries that aren’t keeping up with your speech, or the response behavior/actions/sound of the Rhasspy system? My log is always behind.

Hello everyone,
I installed the latest version with the Docker method, and I’m still experiencing the same poor recognition quality (still getting the turn on and turn off commands crossed most of the time) and the same delay with the Pocketsphinx custom wake word.
I suspect I’m doing something wrong with the custom hotword: I always have to add it to the dictionary.txt file after each training (it gets deleted every time, even if I put it in my custom words file along with its phonemes) and then restart the engine.
After each restart, the hotword works promptly only on the first wake attempt. From the second attempt on, the hotword (and wake sound) only trigger when I pronounce the command, which often gets truncated at the beginning; maybe that explains the confusion between the turn on and turn off commands.
I think I might see some improvement if someone could step in and advise me on how to properly set up the custom hotword, so that I don’t need to edit dictionary.txt after training and before restarting the engine.
Likely a good place to start, don’t you think @synesthesiam ? :slight_smile:

I still have problems with the wake word. I have tried the following approaches:

You always have to play around with the sensitivity to find a compromise between being sensitive enough and triggering false positives.
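For the Pocketsphinx wake word, that sensitivity is a keyword-spotting threshold in the profile. The sketch below updates it in a Rhasspy `profile.json`; the key layout (`wake.system` plus `wake.pocketsphinx.keyphrase` and `wake.pocketsphinx.threshold`) is an assumption based on the Rhasspy 2.4 documentation and may differ in other versions, and `set_pocketsphinx_wake` is a hypothetical helper. In Pocketsphinx keyword spotting, a larger threshold (e.g. `1e-20` rather than `1e-40`) generally means fewer detections and fewer false positives, and a smaller one the opposite.

```python
import json
from pathlib import Path


def set_pocketsphinx_wake(profile_path, keyphrase, threshold):
    """Enable Pocketsphinx wake word detection with the given keyphrase/threshold.

    ASSUMPTION: Rhasspy 2.4-style profile layout:
      {"wake": {"system": "pocketsphinx",
                "pocketsphinx": {"keyphrase": ..., "threshold": ...}}}
    Other settings already in the file are preserved.
    """
    path = Path(profile_path)
    profile = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    wake = profile.setdefault("wake", {})
    wake["system"] = "pocketsphinx"
    settings = wake.setdefault("pocketsphinx", {})
    settings["keyphrase"] = keyphrase
    # Larger values (e.g. 1e-20) usually mean fewer false positives but more misses;
    # smaller values (e.g. 1e-40) make detection more eager.
    settings["threshold"] = threshold
    path.write_text(json.dumps(profile, indent=2), encoding="utf-8")
    return profile
```

Trying a few values one at a time (restarting Rhasspy after each) and noting the false-positive rate is one way to find the compromise described above.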

As for STT: Kaldi improved everything for me! If you can get a model in your language, you will probably be satisfied.

Silly question: are you saving the custom words file before you train and before you restart?

Training has occasionally taken up to 45 seconds for me (even with very little content or few changes), so I’ve had to wait longer than I wanted before restarting. If you’re training or restarting too quickly, that could cause the issue you’re reporting.

Another possibility: many people are having trouble with PocketSphinx wake words (including me: I never got it to work, even with wake words found in the dictionary and an English profile). Have you tried Porcupine as a wake word system? That is the first thing I would try, to figure out whether the PocketSphinx wake word is causing your problem.

Next, some people are reporting better speech recognition with Kaldi than with PocketSphinx (both worked fine for me with English profiles). I’m not sure if this is an option for your language.

I hope someone can help you fix this! Good luck!

Hi,
I add my custom wake word after training but before restarting.
If I remember correctly, Kaldi is not available for Italian; if I’m wrong, I’ll give it a try!
Meanwhile, does anyone know the right procedure for adding a custom wake word to Pocketsphinx? I feel mine is a dirty workaround :wink:
Thanks!
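The train, patch, restart sequence can be driven from a script, so that the restart only happens after training has actually returned. A minimal sketch, assuming Rhasspy’s HTTP API accepts `POST /api/train` and `POST /api/restart` on the default web port 12101 (as in the 2.4 documentation; check your version) and that `/api/train` only responds once training has finished. `train_patch_restart` and `post` are hypothetical helpers; `patch` is any function you pass in, such as one that re-adds the wake word to `dictionary.txt`.

```python
import urllib.request

BASE_URL = "http://localhost:12101"  # assumption: Rhasspy's default web/API port


def post(path, base_url=BASE_URL, timeout=120.0, opener=urllib.request.urlopen):
    """POST an empty body to a Rhasspy HTTP API endpoint and return the reply text."""
    request = urllib.request.Request(base_url + path, data=b"", method="POST")
    # urlopen raises urllib.error.HTTPError on 4xx/5xx, which aborts the sequence
    with opener(request, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


def train_patch_restart(patch, base_url=BASE_URL, opener=urllib.request.urlopen):
    """Train, apply a post-training fix-up, then restart, strictly in that order."""
    results = [("train", post("/api/train", base_url, timeout=300.0, opener=opener))]
    patch()  # e.g. re-add the custom wake word to dictionary.txt
    results.append(("restart", post("/api/restart", base_url, timeout=60.0, opener=opener)))
    return results
```

Usage would look like `train_patch_restart(my_patch)`, where `my_patch` is your own dictionary edit. If training fails with an error response, the exception stops the script before patching or restarting, and the generous timeout leaves room for training runs that take tens of seconds on a Pi. The `opener` parameter exists only so the HTTP call can be swapped out when testing.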

I went back to installing Rhasspy via virtual env, both because I couldn’t find a way to watch its log in real time (I had to stick to the log tab in its web UI) and because it was failing to connect to my MQTT server, which was perfectly reachable on the same RPi.
With this fresh install on a Raspberry Pi running Debian Buster, I tried to replicate my issues without adding too much to the “equation”:
I downloaded the IT profile and trained the assistant without touching the default sentences/words, hit the “wake” button, and spoke the command “accendi (turn on) lampada (lamp) soggiorno (living room)”. Frustratingly, it recognized “spegnere (turn off) lampada soggiorno”… and of course the opposite happens too. I’m starting to think I have a serious speech impairment :zipper_mouth_face:

Now, looking at the custom words and their pronunciations, I’m not sure how close they are supposed to be to how the words actually sound. Some of them are quite right and a lot of them are way off, but overall, when I push the “Pronounce” button, it all sounds English to me (a glitching English-speaking robot at an Italian restaurant, actually :rofl:).

Is anyone else using it in a language other than English experiencing this? Or am I being naive, and Pocketsphinx phonemes are supposed to be an approximation of English sounds anyway, no matter which language model we use?

Despite all this, the only disastrous unreliability I experience is with “turn on” vs. “turn off”; other words are recognized acceptably well, which really confuses me.

As for Pocketsphinx wake word detection, I didn’t change anything there either; I just enabled it with its default “okay rhasspy” and got this:

WARNING:PocketsphinxWakeListener:okay not in dictionary
WARNING:PocketsphinxWakeListener:rhasspy not in dictionary

(but it added both words to the custom words list)
How can we use the Pocketsphinx wake word at all?

I really hope that testing both issues on a fresh install provides some useful feedback.

P.S. Is anyone else getting random crashes during command recognition?

DEBUG:WebrtcvadCommandListener:Voice command started
corrupted double-linked list
./run-venv.sh: line 28: 4422 Aborted python3 app.py "$@"