Remote Server Not Recognizing Words

I’m trying to set up a remote server for STT, since my RPi 3B+ takes too long to process through kaldi. However, the remote server (an Intel NUC) is having a difficult time recognizing words. It hears “weather” as “lamp” or “red”, “TV” as “tell on”, etc. I am using the same setup as on my RPi 3B+, which recognizes the words fine (it just takes a long time). Is there any way I can improve the recognition? I have tried both pocketsphinx and kaldi, and added custom words, but the failures persist.
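
(For what it’s worth, the custom words are just CMU-style pronunciations in the profile’s custom_words.txt - the entries look roughly like this, quoting the format from memory:)

```
weather  W EH DH ER
tv  T IY V IY
```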

Any errors in the logs? I had some trouble initially too (mainly openfst not finding a library and crashing), but I have pretty much the same setup now and it works well.

What type of NUC do you have? I had one that was too old initially, so most of the AI libraries used here are compiled with processor features that it doesn’t support. I tried to compile tensorflow for that NUC once and it failed after 3 days of compiling (then I switched to cross compiling). But I’ve retired it now as it’s too much effort. If your NUC is not a Celeron like mine and has an i3, i5, or i7, it should actually be fine.

It’s an NUC7JY with an Intel® Pentium® Silver J5005 CPU. Did you have to do anything specific to get it to work, or did it start recognizing better with the new NUC?

Oh, that’s a much more modern NUC than mine - that should work fine as a server. I use an i5-3320M based ThinkPad X230 with 8GB RAM as the server now (running Manjaro Linux) - it doesn’t listen to a microphone itself, it’s just a normal rhasspy virtualenv installation with wake word detection switched off and kaldi and openfst enabled (the same machine also runs a MaryTTS docker image). So my NUC is sitting on my desk currently, waiting for another project.

However, I have my raspberry pi in the kitchen connected to that server, just using the speech-to-text web link. The raspberry pi 3 still runs its own wake word detection based on porcupine, but text to speech, speech to text, and intent recognition happen on the X230 server. And the speed increase is very significant - down from 5s-10s to less than 1s. And all this already works on 2.4.19.
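
For reference, the client side of that is just the remote speech-to-text setting in the profile - from memory, profile.json looks something like this (adjust the IP to your server):

```json
{
  "speech_to_text": {
    "system": "remote",
    "remote": {
      "url": "http://192.168.x.x:12101/api/speech-to-text"
    }
  }
}
```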

Anything in your logs (on pi or NUC) that indicate that something is going wrong?

Yeah, the server itself is working fine and the response is almost instantaneous. The only indication of a problem is the raw_text in the client log, which displays what the server is transcribing (as described in the OP).

Are you running the docker or a virtual environment (on which distribution) on the NUC?

And definitely try using kaldi and openfst, and really make sure training was successful.
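
If I remember right, you can also trigger training over the HTTP API and watch the log while it runs, which shows failures more clearly than the GUI (the container name here is just a guess - use whatever you passed to docker run):

```bash
# Kick off training via Rhasspy's HTTP API (12101 is the default port)
curl -X POST http://localhost:12101/api/train

# Follow the container log to catch any training errors
docker logs -f rhasspy
```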

Using Docker on Ubuntu 16.04. I tried both kaldi and pocketsphinx, and left openfst enabled by default. All training seemed to work properly (i.e. no problems in the GUI).

Hmm, docker again - here I have the virtualenv and an up-to-date 2020 OS. Sorry, all I can tell you at this point is that it works on my setup - I know that’s not a big help, but it is possible to use a speech-to-text server with rhasspy. Maybe someone else has an idea? I really need to write down my setup and publish it to the “show us” category.

Good luck, please let us know when you find the error.

Thanks, maybe I’ll try removing the image and rebuilding. By chance, what version of rhasspy are you on? Do the client and server need to be on the same version?

I have both on 2.4.19, and yes, I think I had some trouble with the setup when I was still on 2.4.18.

Maybe that’s my issue - the server is .19, whereas the client is .17.

Ah, try an upgrade - maybe that’s an easy solve.

Hmm, now I’m just being dumb. How do I upgrade to the newest version? I removed my old docker image and pulled the latest, but when I run the docker command from the docs, 2.4.17 starts up instead of .19.

Make sure you really removed the old docker image?
docker ps doesn’t show stopped containers - try docker ps -a and docker rm to clear out the old container, then docker images and docker rmi to delete the stale image?
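
Something like this should get you a clean 2.4.19 - the image name is the stock one from the docs, and the container name and profile path are guesses, so adjust them to your setup:

```bash
# Remove the old container first (find its name with `docker ps -a`)
docker stop rhasspy && docker rm rhasspy

# Delete the cached image so the next pull is really fresh
docker rmi synesthesiam/rhasspy-server:latest
docker pull synesthesiam/rhasspy-server:latest

# Start it again with the run command from the docs
docker run -d -p 12101:12101 \
    --restart unless-stopped \
    -v "$HOME/.config/rhasspy/profiles:/profiles" \
    --device /dev/snd:/dev/snd \
    synesthesiam/rhasspy-server:latest \
    --user-profiles /profiles \
    --profile en
```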

I had to clear my cache; after that it popped up to .19. No word yet on whether the out-of-sync version numbers were the cause of the recognition failure, but I’ll report back.

Edit: no difference using the same versions - the remote server still isn’t recognizing words. I even tried deleting the remote server and starting over, but got the same outcome.

Edit #2: It seems to have a problem with the very end of a sentence (with the exception of “what’s the temperature”). I’m wondering if it’s not sending the full WAV file, or if something is getting cut off. Is there any setting for something like that?
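
One way I figure I can check is to bypass the client entirely and post a recording straight to the server’s STT endpoint - something like this (the IP and port are from my setup):

```bash
# Record ~3 seconds of 16 kHz, 16-bit mono audio (the format Rhasspy expects)
arecord -r 16000 -f S16_LE -c 1 -d 3 test.wav

# Send the complete WAV directly to the remote server and print what it hears
curl -X POST --data-binary @test.wav \
     -H 'Content-Type: audio/wav' \
     http://192.168.86.46:12101/api/speech-to-text
```

If the transcription comes back correct this way, the audio is probably getting truncated somewhere between the client and the server.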

Edit #3: Here is a sample of an error I am getting:

```
[DEBUG:5895183] APlayAudioPlayer: ['aplay', '-q', '/usr/share/rhasspy/etc/wav/beep_error.wav']
[INFO:5894577] quart.serving: 192.168.86.24:58146 GET /api/events/intent 1.1 101 - 1702785
[DEBUG:5894566] SnowboyWakeListener: loaded -> listening
[DEBUG:5894564] DialogueManager: ready -> asleep
[INFO:5894563] DialogueManager: Automatically listening for wake word
[DEBUG:5894560] DialogueManager: handling -> ready
[DEBUG:5894558] WebSocketObserver: {"text": "", "intent": {"name": "", "confidence": 0}, "entities": [], "raw_text": "", "speech_confidence": 1, "wakeId": "snowboy/alexa.umdl", "siteId": "default", "slots": {}}
[DEBUG:5894556] DialogueManager: recognizing -> handling
[DEBUG:5894555] DialogueManager: {'text': '', 'intent': {'name': '', 'confidence': 0}, 'entities': [], 'raw_text': '', 'speech_confidence': 1, 'wakeId': 'snowboy/alexa.umdl', 'siteId': 'default'}
[ERROR:5894551] FsticuffsRecognizer: in_loaded
Traceback (most recent call last):
  File "/usr/share/rhasspy/rhasspy/intent.py", line 183, in in_loaded
    assert recognitions, "No intent recognized"
AssertionError: No intent recognized
[DEBUG:5894546] DialogueManager: decoding -> recognizing
[DEBUG:5894544] DialogueManager: (confidence=1)
[DEBUG:5894540] urllib3.connectionpool: http://192.168.86.46:12101 "POST /api/speech-to-text?profile=en HTTP/1.1" 200 0
[DEBUG:5894398] urllib3.connectionpool: Starting new HTTP connection (1): 192.168.86.46:12101
```

Is your Kaldi configuration set with “Open transcription mode” ticked?
If so, that would probably account for what you are describing, so turn it off and try again.

Nope, not set to open transcription. And I just tried rebuilding the remote server (deleted and rebuilt the docker image). The same thing happens: it hears “weather” as “lamp” or “red” (“what’s the weather” is my test phrase, but it fails on other things too).

I’m a beginner with all of this. I just installed Rhasspy on a remote server (Debian Bullseye) - how can I get audio from my local Windows 10 machine?

Any help would be very much appreciated!