Start questions

Hello. I have four questions:

  1. If I install it with Docker and, hypothetically, the creator decides to delete Rhasspy, will I lose everything, or will it keep running at home?
  2. Does everything work well without internet?
  3. Which STT and TTS do you recommend?
  4. Can you install it without Docker? That is, from the source code, so that I can see and touch everything.


1: It keeps running; the container is on your device.
2: Yes, except for downloading the language models during startup.
3: Depends on your requirements.
4: Yes, use the virtual env:

  1. Can you give me some examples, please?
  2. Is it possible to install it without a virtual environment?
    I have not read the documentation yet, but I suppose you have to start the environment every time you turn on the Raspberry Pi.

By the way, thank you very much for your response.

I use Pocketsphinx as STT; it seems to work best in my setup. But you just have to experiment a bit.
For TTS I use Google Wavenet, because I like the (Dutch) voices the best.

4: No, Rhasspy uses a venv to run if you do not want to use Docker.
If you use Docker (the simplest way in my opinion), Rhasspy will also start automatically, provided you have set up Docker correctly.

I appreciate your help. In the end, I will use Docker; it is much simpler and less error-prone.

But I have a question about something you said above that I cannot find in the documentation.

Does Google WaveNet work offline, or does Google charge money for using it?
I guess it’s the best because Google made it, or am I wrong?

Google Wavenet is contacted only for new sentences (a new combination of text, language and sample rate).
The resulting audio is cached and replayed from the cache whenever the same text needs to be spoken again.
But if Google Wavenet shuts down AND you want to speak new text, that will not be possible.
It has a free tier.
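
To make the caching idea concrete, here is a minimal Python sketch. It is not Rhasspy's actual code: the function names, the `.wav` file layout and the `synthesize` callback (a stand-in for whatever calls the online TTS service) are all assumptions for illustration. It only shows the principle that the service is needed for new text, language and sample rate combinations, while repeated text is served from disk.

```python
import hashlib
from pathlib import Path


def cache_key(text, language, sample_rate):
    """Build a stable file name from the (text, language, sample rate) triple."""
    raw = f"{text}|{language}|{sample_rate}".encode("utf-8")
    return hashlib.sha256(raw).hexdigest()


def speak(text, language, sample_rate, cache_dir, synthesize):
    """Return (audio_bytes, from_cache).

    `synthesize` is a hypothetical callback standing in for the online
    TTS request; it is only called when the sentence is not cached yet.
    """
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    path = cache_dir / (cache_key(text, language, sample_rate) + ".wav")

    if path.exists():
        # Same text, language and sample rate as before: no network needed.
        return path.read_bytes(), True

    # New sentence: this is the only step that needs the online service.
    audio = synthesize(text, language, sample_rate)
    path.write_bytes(audio)
    return audio, False
```

With this layout, a sentence that was spoken once keeps working even if `synthesize` later starts failing, while any new sentence would raise whatever error `synthesize` raises.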

Check here for how to start with a venv, and for all the other installation methods.

Hi @principe_mestizo,

Which language will you be using Rhasspy for? In general, I recommend Kaldi for STT and nanoTTS for TTS on a Raspberry Pi. If you have a Pi 4, you might give Larynx TTS a try (use “Low Quality” for speed).

If you follow the docs to create a virtual environment, a script gets generated which automatically activates the environment.

I’d highly recommend sticking with Docker, though. It’s much easier to set up and to keep updated.

The languages are Spanish and English. Why do you recommend Kaldi?

I assumed DeepSpeech was better because it comes from a Mozilla team (I do not really know). I will have to try them both; I have only tried the Mozilla one so far, and everything is going well. Would Kaldi be better?

I use Raspberry Pi 4s (4 GB and 8 GB of RAM).

By default, Kaldi is set to generate a “Text FST” instead of something called an n-gram model (which is what DeepSpeech uses).

Kaldi’s Text FST will only ever recognize sentences from your trained voice commands, whereas the n-gram model will accept “similar” sentences. Depending on your use case, you may want the extra strictness from Kaldi.

What do you mean by “similar”? Words that sound the same, or synonyms of the words?

Obviously I do not want false positives, but when I say something, I want it to understand me the first time.

Not similar words, but similar sentences. The words are fixed by what’s in sentences.ini, but with the n-gram model, the sentences are matched according to a 3-word context window.

So if you have lots of slots with phrases like “the living room” and “the downstairs bathroom”, it might accept “the downstairs room” even if you never had that in a slot. With Kaldi + text FST, that will not happen.
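
Here is a toy Python sketch of that difference. It is not how Kaldi or DeepSpeech are implemented: the two training sentences are made up, and the scoring uses a crude fixed backoff penalty (my assumption for illustration; real toolkits use proper smoothing). The strict matcher stands in for the text FST, and the scorer stands in for an n-gram model with a 3-word window.

```python
from collections import Counter

# Hypothetical training sentences, standing in for what sentences.ini expands to.
TRAINING = [
    "turn on the light in the living room",
    "turn on the light in the downstairs bathroom",
]


def ngrams(words, n):
    """All runs of n consecutive words."""
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]


def build_counts(sentences, max_n=3):
    """Count every 1-, 2- and 3-word sequence seen in training."""
    counts = Counter()
    for sentence in sentences:
        words = sentence.split()
        for n in range(1, max_n + 1):
            counts.update(ngrams(words, n))
    return counts


def strict_accepts(sentence, sentences=TRAINING):
    """Text-FST-like behaviour: only trained sentences are accepted."""
    return sentence in sentences


def ngram_score(sentence, counts, max_n=3, backoff=0.4):
    """Toy score: 1.0 for fully seen contexts, penalised when backing off,
    0.0 as soon as a word is outside the trained vocabulary."""
    words = sentence.split()
    score = 1.0
    for i in range(len(words)):
        longest = min(max_n, i + 1)
        # Try the longest context first, then back off to shorter ones.
        for n in range(longest, 0, -1):
            gram = tuple(words[i - n + 1:i + 1])
            if counts[gram] > 0:
                score *= backoff ** (longest - n)
                break
        else:
            return 0.0  # word never seen in training
    return score


counts = build_counts(TRAINING)
mixed = "turn on the light in the downstairs room"
print(strict_accepts(mixed))           # strict matcher: never trained
print(ngram_score(mixed, counts) > 0)  # n-gram scorer: still possible
```

The mixed sentence gets a lower score than a trained one but not zero, because every word is known and most of its 3-word windows were seen. A sentence with a word that was never trained, such as “kitchen” here, scores zero in both approaches.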
