[rhasspy-asr-google] ASR Google Cloud STT

Hello! This is mainly to let @synesthesiam know (but anyone else interested too) that I’ve started porting the Google Cloud STT module to Rhasspy 2.5.

@synesthesiam, before I start with questions :angel: I’d like to know whether you’ve already started working on it, so we don’t duplicate effort (I’ve already spent a couple of hours on it and forgot to ask you first, eheh). Thanks!


Here is also the Hermes module:

The whole thing seems to be working (some stuff is still missing though, mainly scripts and other boilerplate).

What I’m missing is the min_confidence handling. The pocketsphinx module just passes the likelihood to the Rhasspy “core”, but then every transcription is accepted without the confidence ever being checked. Is that right? Am I missing something?
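For what it’s worth, a minimum-confidence gate could be as simple as the sketch below. The `Transcription` class and the `accept` function are purely illustrative, not Rhasspy’s actual API:

```python
from dataclasses import dataclass

@dataclass
class Transcription:
    text: str
    likelihood: float  # 0.0-1.0 score reported by the ASR engine

def accept(result: Transcription, min_confidence: float = 0.0) -> bool:
    """Reject transcriptions whose likelihood is below the threshold."""
    return result.likelihood >= min_confidence

print(accept(Transcription("turn on the light", 0.92), min_confidence=0.5))  # True
print(accept(Transcription("mumble", 0.21), min_confidence=0.5))             # False
```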


Thank you for getting started on this! Once it’s in a workable state, let’s talk about the best way to get it into users’ hands.

One integration approach is to add it as a git submodule in rhasspy-voltron and add a configuration section in the web interface.

Another approach is to leave it separate, and come up with an easy way for users to add external services to their Rhasspy installation (probably via Docker). The user would then select “Hermes MQTT” for speech to text, and probably do additional configuration via the service itself (or something in their profile).

The first approach is probably easier for users, but the second approach gives you more control over defaults, how configuration works, and which version of your service is in use, without having to go through a pull request.
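As a rough illustration of the second approach: an external Hermes STT service ultimately just publishes a `textCaptured` message over MQTT, which the Rhasspy core picks up. A minimal sketch of building that payload follows; the field names match the Snips/Rhasspy Hermes convention as I understand it, so treat them as assumptions rather than a spec reference:

```python
import json

def text_captured_payload(text, likelihood, site_id="default", session_id=None):
    """Build the JSON payload an ASR service publishes on hermes/asr/textCaptured."""
    return json.dumps({
        "text": text,
        "likelihood": likelihood,   # engine confidence, assumed 0.0-1.0
        "seconds": 0.0,             # decode time; just a placeholder here
        "siteId": site_id,
        "sessionId": session_id,
    })

payload = text_captured_payload("turn on the light", 0.87, session_id="abc123")
print(json.loads(payload)["text"])  # turn on the light
```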

Let me know your thoughts.

No, I just haven’t found a good solution to the different ways each speech system represents “confidence”. I’m thinking it will ultimately have to be a setting within each service.
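One possible shape for such a per-service setting, sketched purely as an assumption (the engine names and scales here are illustrative, not how any engine actually reports scores):

```python
def normalize_confidence(engine: str, raw: float) -> float:
    """Map engine-specific scores onto a common 0.0-1.0 scale."""
    if engine == "google":
        return raw                       # assume the engine already reports 0.0-1.0
    if engine == "pocketsphinx":
        return max(0.0, min(1.0, raw))   # clamp: the likelihood scale can drift
    return 1.0  # engines that report nothing: assume full confidence

print(normalize_confidence("pocketsphinx", 1.7))  # 1.0
```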

I would prefer this approach. I’d like to stick to the existing working model for now.

The code is working; I’ve been using it for a few weeks now. I’ve only tested it on a Python venv-based installation, though: no Docker, no Debian packaging, and so on. I guess I should test those before it’s included in the voltron repo?

I’ve made some progress since then, too. I modified the dialogue manager to send an NluIntent when it receives an NluIntentParsed (remember this?). This implies the dialogue manager is the only central entity that knows about confidence, so it should be the one to enforce the minimum confidence (that’s my thinking, at least). What do you think?
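A sketch of what that could look like inside the dialogue manager. The class and field names are illustrative, not the real Rhasspy message types; the idea is just that the component converting NluIntentParsed into NluIntent is the natural place for the threshold check:

```python
from dataclasses import dataclass

@dataclass
class NluIntentParsed:
    input: str
    intent_name: str
    confidence: float

def handle_intent_parsed(parsed: NluIntentParsed, min_confidence: float):
    """Decide whether to fire the intent or treat it as not recognized."""
    if parsed.confidence >= min_confidence:
        return ("hermes/intent/" + parsed.intent_name, parsed)
    # Below threshold: report "not recognized" instead of firing the intent.
    return ("hermes/nlu/intentNotRecognized", parsed)

topic, _ = handle_intent_parsed(NluIntentParsed("turn on the light", "LightOn", 0.9), 0.5)
print(topic)  # hermes/intent/LightOn
```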
