I’m having performance issues when generating audio samples using Larynx and MaryTTS. When I switch back to NanoTTS, performance is very good and the generation of the sample is almost instantaneous. But I would like to have better quality, so I tried both Larynx and MaryTTS.
When using the MaryTTS web UI, samples generate very quickly, with what feels like real-time or close to real-time performance. However, when going through Rhasspy, generating the same sample takes several seconds longer.
Larynx is even slower. Generating the sample for “what time is it?” at the low quality setting takes about 7 seconds, and I see the CPU spiking throughout that period.
This is all on a Pi 4.
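In case it helps anyone reproduce this, here is a rough sketch of how the two MaryTTS paths could be timed side by side. The host names, ports, and query parameters are assumptions based on default installs (MaryTTS on port 59125 with its `/process` endpoint, Rhasspy on port 12101 with `/api/text-to-speech`) and may need adjusting. `time_call` and `fetch` are just hypothetical helper names for this example.

```python
import time
import urllib.parse
import urllib.request


def time_call(fn, repeats=3):
    """Call fn() `repeats` times; return each call's wall-clock duration in seconds."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        durations.append(time.perf_counter() - start)
    return durations


def fetch(url, data=None):
    """GET the URL (or POST when data is given) and read the whole response body."""
    with urllib.request.urlopen(url, data=data, timeout=60) as resp:
        return resp.read()


if __name__ == "__main__":
    text = "what time is it?"

    # Assumption: MaryTTS server running locally on its default port.
    mary_url = "http://localhost:59125/process?" + urllib.parse.urlencode({
        "INPUT_TEXT": text,
        "INPUT_TYPE": "TEXT",
        "OUTPUT_TYPE": "AUDIO",
        "AUDIO": "WAVE_FILE",
        "LOCALE": "en_US",
    })

    # Assumption: Rhasspy web server on its default port; play=false is meant
    # to return the WAV data instead of playing it, so playback is not timed.
    rhasspy_url = "http://localhost:12101/api/text-to-speech?play=false"

    print("MaryTTS direct:", time_call(lambda: fetch(mary_url)))
    print("via Rhasspy:   ", time_call(lambda: fetch(rhasspy_url, data=text.encode("utf-8"))))
```

Running each request a few times should also show whether only the first call is slow (for example, a warm-up cost) or whether every call through Rhasspy carries the extra delay.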
I have two questions:
- Is this expected behaviour for Larynx? I thought I had read somewhere that performance is close to real time on a Pi 4.
- Does anyone have any idea how I might explain or address the performance difference I am seeing between the MaryTTS web UI and using MaryTTS through Rhasspy?
Any pointers would be much appreciated! Thanks!