[rhasspy-asr-google] ASR Google Cloud STT

Hello! I’m here to inform mainly @synesthesiam (but everyone else interested too) that I’ve started porting the Google Cloud STT module to Rhasspy 2.5.

@synesthesiam before I start with questions :angel: I’d like to know if you already started working on it (to avoid doing the same job, I already spent a couple of hours and forgot to ask you eheh). Thanks!


Here is also the Hermes module:

The whole thing seems to be working (some stuff is still missing though, mainly scripts and other boilerplate).

I’m still missing min_confidence handling. The pocketsphinx module just passes the likelihood to the Rhasspy “core”, but then any transcription is accepted without even considering the confidence. Is that right? Am I missing something?


Thank you for getting started on this! Once it’s in a workable state, let’s talk about the best way to get it into users’ hands.

One integration approach is to add it as a git submodule in rhasspy-voltron and add a configuration section in the web interface.

Another approach is to leave it separate, and come up with an easy way for users to add external services to their Rhasspy installation (probably via Docker). The user would then select “Hermes MQTT” for speech to text, and probably do additional configuration via the service itself (or something in their profile).

The first approach is probably easier for users, but the second approach gives you more control to change defaults, how configuration works, and which version of your service is in use without having to go through a pull request.

Let me know your thoughts.

No, I’ve just not found a good solution to the different ways that each speech system represents “confidence”. I’m thinking ultimately it will just have to be a setting within each service.
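A per-service threshold of that kind might look roughly like the following minimal Python sketch. The names `Transcription` and `accept_transcription` are hypothetical and are not part of Rhasspy or any of its modules, and the sketch assumes the service reports confidence on a 0 to 1 scale, which will not hold for every speech system.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Transcription:
    """Hypothetical container for an ASR result (not a Rhasspy class)."""

    text: str
    confidence: float  # assumed to be in [0, 1] for this particular service


def accept_transcription(
    result: Transcription, min_confidence: float = 0.0
) -> Optional[str]:
    """Return the text if its confidence reaches the threshold, else None."""
    if result.confidence < min_confidence:
        return None
    return result.text
```

With the default threshold of 0 every transcription passes, which matches the behaviour described above where confidence is not considered at all.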

I would prefer this approach. I’d like to stick to the existing working model for now.

The code is working; I’ve been using it for a few weeks now. I have only tested it on a Python venv-based installation, though: no Docker, no Debian packaging and so on. I guess I should test those before inclusion in the voltron repo?

I’ve also done some work on this since then. I made some modifications to make the dialogue manager send NluIntent when receiving an NluIntentParsed (remember this?). This implies that the dialogue manager is the only central entity that knows about confidence, so it should handle minimum confidence in dialogue (that’s what I thought at least). What do you think?


Thank you @daniele_athome for looking into this!
When I look at the project, the readme states “Currently a work in progress. Stay tuned.” Would it be possible to update it with a getting started guide, so people can more easily try it out and discover potential bugs before we add it to the Rhasspy project? I would love to try it out.


Some people have been asking me that as well, so here are some untested instructions. They are untested because I use my own fork of Rhasspy to ease integration (and to add my own patches), but it should work just fine with the original Rhasspy source code.

  1. clone both my repositories inside rhasspy-voltron
  2. add two lines to the file RHASSPY_DIRS with the two new folder names
  3. add the below lines to rhasspy/__main__.py (approximately after line 8):
  4. add the following to rhasspy-server-hermes/profiles/default.json and rhasspy-profile/profiles/default.json:
"google": {
  "credentials": "api-credentials.json",
  "min_confidence": 0,
  "language_code": "en_US",
  "compatible": true
}
  5. run the installation process from scratch (see Rhasspy docs)
  6. obtain a JSON credentials file from Google Cloud Console and put it into your profile folder
  7. configure your profile (you have to add this manually, no UI is available for this):
"speech_to_text": {
  "google": {
    "credentials": "full_path_to_credentials_json_file",
    "language_code": "yourlanguage_yourcountry",
    "min_confidence": 0.7,
    "webrtcvad": {your webrtcvad configuration, refer to Rhasspy docs}
  },
  "system": "google"
}
  8. start Rhasspy!
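
Since the profile has to be edited by hand, a quick sanity check before starting Rhasspy may save some debugging. The following is only a sketch: `check_google_stt_profile` is a hypothetical helper, not part of Rhasspy or of this module, and the assumption that min_confidence lies between 0 and 1 is inferred from the example values above rather than from the module's code.

```python
from pathlib import Path
from typing import Any, Dict, List


def check_google_stt_profile(profile: Dict[str, Any]) -> List[str]:
    """Return a list of problems found; an empty list means nothing obvious is wrong."""
    problems: List[str] = []

    stt = profile.get("speech_to_text", {})
    if stt.get("system") != "google":
        problems.append('speech_to_text.system is not "google"')

    google = stt.get("google", {})

    # The credentials entry should point at the JSON file from Google Cloud Console.
    credentials = google.get("credentials")
    if not credentials:
        problems.append("credentials is missing")
    elif not Path(credentials).is_file():
        problems.append(f"credentials file not found: {credentials}")

    # Assumption: min_confidence is a number in [0, 1]; booleans are rejected explicitly.
    min_conf = google.get("min_confidence", 0)
    is_number = isinstance(min_conf, (int, float)) and not isinstance(min_conf, bool)
    if not is_number or not 0 <= min_conf <= 1:
        problems.append("min_confidence should be a number between 0 and 1")

    if not google.get("language_code"):
        problems.append("language_code is missing")

    return problems
```

To use it, load your profile with `json.load` and pass the resulting dictionary to the function; each returned string names one setting to look at. The webrtcvad section is not checked here because its contents depend on your own configuration.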