Starter questions

principe_mestizo · April 14, 2021, 8:40pm

Hello. I have four questions:

If I install it with Docker and, hypothetically, the creator decides to delete Rhasspy, will I lose everything, or will it keep running at home?
Does everything work without internet?
Which STT and TTS engines do you recommend?
Can it be installed without Docker? That is, from the source code, so that I can see and touch everything.

Greetings.

romkabouter · April 14, 2021, 8:48pm

1: It keeps running; the container is on your device.
2: Yes, with the exception of downloading language models during startup.
3: It depends on your requirements.
4: Yes, use the virtual environment: https://rhasspy.readthedocs.io/en/latest/installation/#virtual-environment

principe_mestizo · April 14, 2021, 9:00pm

Can you give me some examples, please?
Is it possible to install it without a virtual environment?
I haven't read the documentation yet, but I suppose you have to activate the environment every time you turn on the Raspberry Pi.

By the way, thank you very much for your response.

romkabouter · April 14, 2021, 9:04pm

I use pocketsphinx as STT; it seems to work best in my setup. But you will just have to experiment a bit.
For TTS I use Google Wavenet, because I like its (Dutch) voices best.

4: No, if you do not want to use Docker, Rhasspy runs in a venv.
If you use Docker (the simplest way, in my opinion), Rhasspy will also start automatically, provided Docker is set up correctly.

principe_mestizo · April 15, 2021, 5:40am

I appreciate your help. In the end I'll use Docker; it is much simpler and less error-prone.

But I have a question about something you said above that I cannot find in the documentation.

Is Google WaveNet offline, or does Google charge money for using it?
I guess it's the best because Google made it, or am I wrong?

romkabouter · April 15, 2021, 7:25am

Google Wavenet is only used for new sentences (a new combination of text, language and sample rate).
The result is cached and replayed from the cache if the same text needs to be spoken again.
But if Google Wavenet shuts down AND you want to speak new text, that will not be possible.
It has a free tier.
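The caching behaviour described above can be sketched roughly as follows. This is a minimal illustration, not Rhasspy's actual code: `CachedTTS`, `cache_key` and the `synthesize` callback are hypothetical names, and the callback stands in for whatever call actually reaches Google Wavenet. The key is built from the text, language and sample rate, so only a new combination needs the online service.

```python
import hashlib
from typing import Callable, Dict

# Hypothetical signature for the online synthesizer: (text, language, sample_rate) -> audio bytes.
SynthesizeFn = Callable[[str, str, int], bytes]


def cache_key(text: str, language: str, sample_rate: int) -> str:
    """Build a stable key from the three inputs that determine the audio."""
    raw = f"{text}\x00{language}\x00{sample_rate}".encode("utf-8")
    return hashlib.sha256(raw).hexdigest()


class CachedTTS:
    """Calls the online service only for combinations it has not seen before."""

    def __init__(self, synthesize: SynthesizeFn) -> None:
        self._synthesize = synthesize
        self._cache: Dict[str, bytes] = {}

    def speak(self, text: str, language: str, sample_rate: int) -> bytes:
        key = cache_key(text, language, sample_rate)
        if key not in self._cache:
            # Cache miss: this is the only point where the online service is needed.
            self._cache[key] = self._synthesize(text, language, sample_rate)
        # Cache hit (or just filled): replay the stored audio.
        return self._cache[key]


if __name__ == "__main__":
    calls = []

    def fake_wavenet(text: str, language: str, sample_rate: int) -> bytes:
        calls.append(text)
        return f"<audio for {text}>".encode("utf-8")

    tts = CachedTTS(fake_wavenet)
    tts.speak("Good morning", "nl-NL", 22050)
    tts.speak("Good morning", "nl-NL", 22050)  # replayed from the cache
    print(len(calls))  # 1
```

This toy cache lives in memory and is lost on restart; a real one would be stored on disk. If the service becomes unreachable, phrases spoken before still come back from the cache, while a new combination fails with whatever error `synthesize` raises.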

Check here for how to start with a venv and all the other installation methods.

synesthesiam · April 15, 2021, 1:23pm

Hi @principe_mestizo,

Which language will you be using Rhasspy for? In general, I recommend Kaldi for STT and nanoTTS for TTS on a Raspberry Pi. If you have a Pi 4, you might give Larynx TTS a try (use “Low Quality” for speed).

If you follow the docs to create a virtual environment, a rhasspy.sh script gets generated which automatically activates the environment.

I’d highly recommend sticking with Docker, though. It’s much easier to set up and to keep updated.

principe_mestizo · April 15, 2021, 4:43pm

The languages are Spanish and English. Why do you recommend Kaldi?

I supposed DeepSpeech would be better because it comes from a Mozilla team (I don't know for sure). I have to try them both; so far I have only tried Mozilla's and everything is going well. Would Kaldi be better?

I use Raspberry Pi 4s (4 GB and 8 GB of RAM).

synesthesiam · April 15, 2021, 6:13pm

By default, Kaldi is set to generate a “Text FST” instead of something called an n-gram model (which is what DeepSpeech uses).

Kaldi’s Text FST will only ever recognize sentences from your trained voice commands, whereas the n-gram model will accept “similar” sentences. Depending on your use case, you may want the extra strictness from Kaldi.
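To illustrate that strictness, here is a toy acceptor built from a list of trained sentences. It is a plain word trie, not Kaldi or OpenFst code, and `SentenceAcceptor` is a hypothetical name. It only answers whether a word sequence is one of the trained sentences, which is the set of outputs a text FST restricts recognition to.

```python
from typing import Dict, Iterable

END = "</s>"  # marker meaning "a trained sentence may end here"


class SentenceAcceptor:
    """Toy word-level acceptor (a trie) standing in for a grammar/text FST."""

    def __init__(self, sentences: Iterable[str]) -> None:
        self._root: Dict[str, dict] = {}
        for sentence in sentences:
            node = self._root
            for word in sentence.lower().split():
                node = node.setdefault(word, {})
            node[END] = {}

    def accepts(self, sentence: str) -> bool:
        """True only if the words follow one complete trained sentence."""
        node = self._root
        for word in sentence.lower().split():
            if word not in node:
                return False  # no trained sentence continues with this word
            node = node[word]
        return END in node


if __name__ == "__main__":
    fst = SentenceAcceptor([
        "turn on the living room light",
        "turn on the downstairs bathroom light",
    ])
    print(fst.accepts("turn on the living room light"))      # True
    print(fst.accepts("turn on the downstairs room light"))  # False
    print(fst.accepts("turn on the living room"))            # False, incomplete
```

In a real recognizer the decoder searches for the path through the grammar that best matches the audio, so it would settle on one of the trained sentences rather than output anything outside them.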

principe_mestizo · April 15, 2021, 8:24pm

What do you mean by “similar”? Words that sound the same, or synonyms of the words?

Obviously I do not want false positives, but if I say something, I want it to understand me on the first try.

synesthesiam · April 16, 2021, 2:09am

Not similar words, but similar sentences. The words are fixed by what’s in sentences.ini, but with the n-gram model, the sentences are matched according to a 3-word context window.

So if you have lots of slots with phrases like “the living room” and “the downstairs bathroom”, it might accept “the downstairs room” even if you never had that in a slot. With Kaldi + text FST, that will not happen.
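The effect of a 3-word window can be sketched with a toy trigram scorer. This is an assumption-heavy illustration, not Rhasspy's language model: it uses "stupid backoff" (falling back from trigram to bigram to single-word counts with a fixed 0.4 penalty) instead of the properly normalised smoothing real n-gram models use, and `TrigramScorer` is a hypothetical name. A word sequence that was never trained can still get a non-zero score, but a word outside the training vocabulary cannot.

```python
from collections import Counter
from typing import Iterable, List

BOS, EOS = "<s>", "</s>"
ALPHA = 0.4  # fixed back-off penalty from "stupid backoff"; an assumption here


def pad(sentence: str) -> List[str]:
    """Lower-case, split into words, and add sentence-boundary markers."""
    return [BOS, BOS] + sentence.lower().split() + [EOS]


class TrigramScorer:
    """Toy trigram model with stupid backoff (scores, not true probabilities)."""

    def __init__(self, sentences: Iterable[str]) -> None:
        self.tri: Counter = Counter()      # (u, v, w) counts
        self.tri_ctx: Counter = Counter()  # (u, v) history counts
        self.bi: Counter = Counter()       # (v, w) counts
        self.bi_ctx: Counter = Counter()   # v history counts
        self.uni: Counter = Counter()      # w counts
        for sentence in sentences:
            words = pad(sentence)
            for i in range(2, len(words)):
                u, v, w = words[i - 2], words[i - 1], words[i]
                self.tri[(u, v, w)] += 1
                self.tri_ctx[(u, v)] += 1
                self.bi[(v, w)] += 1
                self.bi_ctx[v] += 1
                self.uni[w] += 1
        self.total = sum(self.uni.values())

    def word_score(self, u: str, v: str, w: str) -> float:
        """Score word w given the two previous words (the 3-word window)."""
        if self.tri[(u, v, w)] > 0:
            return self.tri[(u, v, w)] / self.tri_ctx[(u, v)]
        if self.bi[(v, w)] > 0:
            # Back off to a 2-word window, with a penalty.
            return ALPHA * self.bi[(v, w)] / self.bi_ctx[v]
        if self.total == 0:
            return 0.0
        # Back off to single-word frequency; unknown words score 0.
        return ALPHA * ALPHA * self.uni[w] / self.total

    def score(self, sentence: str) -> float:
        words = pad(sentence)
        result = 1.0
        for i in range(2, len(words)):
            result *= self.word_score(words[i - 2], words[i - 1], words[i])
        return result


if __name__ == "__main__":
    lm = TrigramScorer([
        "turn on the living room light",
        "turn on the downstairs bathroom light",
    ])
    print(lm.score("turn on the living room light"))          # 0.5
    print(lm.score("turn on the downstairs room light") > 0)  # True, never trained
    print(lm.score("turn on the garage light"))               # 0.0, "garage" unknown
```

The trained sentence still scores higher than the recombined one, so the n-gram model prefers sentences it was trained on, but it does not rule out "the downstairs room" the way a text FST does.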