Mozilla TTS performance

This isn't really a Rhasspy question, but I'm interested in using this text-to-speech engine because the quality is very good. My problem is the rendering speed. For example, this phrase:

curl -G --output - --data-urlencode 'text=Welcome to the world of speech synthesis!' '' | aplay

takes about 10 seconds to render before playback starts. That is with the synesthesiam mozilla-tts container running on an AMD FX-6300 6-core processor with 16 GB of RAM.
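To separate the synthesis delay from playback, the request can be timed on its own, without piping into aplay. Here is a minimal sketch, assuming the server address is stored in a `TTS_URL` variable (a placeholder, not part of the original command) and using a hypothetical helper named `time_tts`. It relies on curl's `-w '%{time_total}'` write-out variable, which reports the total request time in seconds, so the figure covers rendering plus transfer of the audio:

```shell
# time_tts URL TEXT
# Hypothetical helper: requests synthesized audio, discards it, and
# prints the total request time in seconds (rendering plus transfer).
# The URL is an assumption; pass the endpoint of your own TTS server.
time_tts() {
  url="$1"
  text="$2"
  # -G sends the URL-encoded text as a query string, as in the command above.
  # -s hides the progress meter, -o /dev/null throws the audio away.
  curl -G -s -o /dev/null -w '%{time_total}\n' \
    --data-urlencode "text=$text" "$url"
}

# Example call (TTS_URL must be set by you):
# time_tts "$TTS_URL" 'Welcome to the world of speech synthesis!'
```

Running it a few times with the same phrase should show whether the 10 seconds is a steady cost or mostly a first-request warm-up.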

I know it's quite old hardware, but has anyone else used this TTS service and gotten near-instant results (a delay of 1-2 seconds or less) on newer-generation processors? Or are there any tweaks I can make to speed up rendering on my hardware?

I’ve been working on adding Larynx, a fork of Mozilla TTS, to Rhasspy. Most of the models I’ve trained are smaller and faster than the LJSpeech one included in that Docker image.

Larynx is available in 2.5.8, but I don’t have an English voice trained just yet. When I do, you might want to give that a try and see if it works better on your system 🙂

