Is it possible to install it without a virtual environment?
I have not read the documentation yet, but I suppose you have to activate the environment every time you turn on the Raspberry Pi.
By the way, thank you very much for your response.
I use pocketsphinx for STT; it seems to work best in my setup, but you will have to experiment a bit.
For TTS I use Google Wavenet, because I like the (Dutch) voices the best.
4: No, Rhasspy runs in a venv if you do not want to use Docker.
If you use Docker (the simplest way in my opinion), Rhasspy will also start automatically, provided you have set up Docker correctly.
Google Wavenet is only contacted for new sentences (a new combination of text, language and sample rate).
The result is cached and replayed from the cache whenever the same text needs to be spoken again.
But if Google Wavenet shuts down and you want to speak new text, that will not be possible.
It has a free tier.
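To make the caching idea concrete, here is a minimal Python sketch of a cache keyed on text, language and sample rate. It is not Rhasspy's actual code: `cache_key`, `speak_cached` and `fake_synthesize` are hypothetical names, and the fake synthesizer just stands in for whatever call really reaches the cloud service.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Callable, List


def cache_key(text: str, language: str, sample_rate: int) -> str:
    """Build a stable file name from the three values that define a sentence."""
    raw = f"{language}|{sample_rate}|{text}"
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()


def speak_cached(
    text: str,
    language: str,
    sample_rate: int,
    cache_dir: Path,
    synthesize: Callable[[str, str, int], bytes],
) -> bytes:
    """Return audio from the cache, calling `synthesize` only for new sentences."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    wav_path = cache_dir / (cache_key(text, language, sample_rate) + ".wav")
    if wav_path.exists():
        # Same text, language and sample rate: replay without going online.
        return wav_path.read_bytes()
    audio = synthesize(text, language, sample_rate)  # needs the online service
    wav_path.write_bytes(audio)
    return audio


# Hypothetical stand-in for the real cloud call; it records every request.
calls: List[str] = []


def fake_synthesize(text: str, language: str, sample_rate: int) -> bytes:
    calls.append(text)
    return b"RIFF" + text.encode("utf-8")


with tempfile.TemporaryDirectory() as tmp:
    first = speak_cached("hallo wereld", "nl-NL", 22050, Path(tmp), fake_synthesize)
    second = speak_cached("hallo wereld", "nl-NL", 22050, Path(tmp), fake_synthesize)
```

The second call never reaches `fake_synthesize`, which is why already-spoken sentences keep working when the service is unreachable, while any new text still needs it.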
Check here for how to start with venv and all the other installation methods.
Which language will you be using Rhasspy for? In general, I recommend Kaldi for STT and nanoTTS for TTS on a Raspberry Pi. If you have a Pi 4, you might give Larynx TTS a try (use “Low Quality” for speed).
If you follow the docs to create a virtual environment, a rhasspy.sh script gets generated which automatically activates the environment.
I’d highly recommend sticking with Docker, though. It’s much easier to set up and to keep updated.
My languages are Spanish and English. Why do you recommend Kaldi?
I assumed DeepSpeech would be better because it comes from a Mozilla team, but I do not really know. I should try both; so far I have only tried the Mozilla one and everything is going well. Would Kaldi be better?
By default, Kaldi is set to generate a “Text FST” instead of something called an n-gram model (which is what DeepSpeech uses).
Kaldi’s Text FST will only ever recognize sentences from your trained voice commands, whereas the n-gram model will accept “similar” sentences. Depending on your use case, you may want the extra strictness from Kaldi.
Not similar words, but similar sentences. The words are fixed by what’s in sentences.ini, but with the n-gram model, the sentences are matched according to a 3-word context window.
So if you have lots of slots with phrases like “the living room” and “the downstairs bathroom”, it might accept “the downstairs room” even if you never had that in a slot. With Kaldi + text FST, that will not happen.
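A toy Python sketch of the difference, under simplifying assumptions: the strict matcher only accepts the exact training sentences, while the trigram matcher accepts any sentence whose every 3-word window was seen in training. The training sentences and function names are made up for illustration, and this is not how Kaldi or DeepSpeech are implemented. Real n-gram toolkits typically also apply smoothing and backoff, which this sketch leaves out.

```python
from typing import Iterable, List, Set, Tuple

Trigram = Tuple[str, str, str]


def trigrams(sentence: str) -> List[Trigram]:
    """Split a sentence into overlapping 3-word windows, padded at both ends."""
    words = ["<s>", "<s>"] + sentence.split() + ["</s>"]
    return [(words[i], words[i + 1], words[i + 2]) for i in range(len(words) - 2)]


def train_trigrams(sentences: Iterable[str]) -> Set[Trigram]:
    """Collect every 3-word window that occurs in the training sentences."""
    seen: Set[Trigram] = set()
    for sentence in sentences:
        seen.update(trigrams(sentence))
    return seen


def accepts_exact(sentence: str, training: Iterable[str]) -> bool:
    """Strict matching: only the trained sentences themselves are accepted."""
    return sentence in set(training)


def accepts_trigram(sentence: str, seen: Set[Trigram]) -> bool:
    """Window matching: accepted if every 3-word window was seen in training."""
    return all(t in seen for t in trigrams(sentence))


# Hypothetical voice commands, standing in for what sentences.ini would expand to.
TRAINING = [
    "turn on the living room light",
    "switch off the living room lamp",
]
SEEN = train_trigrams(TRAINING)

novel = "turn on the living room lamp"  # never trained as a whole sentence
strict_result = accepts_exact(novel, TRAINING)
window_result = accepts_trigram(novel, SEEN)
```

Here `novel` is stitched together from windows of two different trained sentences, so the trigram matcher accepts it and the strict matcher does not. A sentence containing a word that was never trained is rejected by both.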