Any improvements on German voice quality?

Damn · December 16, 2021, 11:54am

Hi,

I tried using rhasspy about a year ago, but had several issues with the performance of tts engines for the German language.

most of them sound like crap (early 80’s computers), and the only professional-sounding voices, like larynx, had extreme performance problems, which meant waiting a lifetime for rhasspy to create the answer with larynx.

because all the other voices sounded crappy to me I decided to wait for better performance…

has any progress been made on the performance of larynx, or is there actually another tts voice which sounds somewhat professional (human)?

my rhasspy server is running on an Intel NUC and the satellite is running on a pi3 with a 4mic hat.

thanks for your answers!

Dan

jrb5665 · December 16, 2021, 12:10pm

I’m using the latest version of the opentts engine and it is very good for English.

I even wrote a little script to take an ebook in txt format, break it into sentences and read it to me.
With the small quality setting, the tts took about half as long to generate the audio as it took to actually play it, and only slightly longer with medium, so it was reading almost in real time.

I ended up letting the script create an mp3 file from it (it took about 3.5 days to encode 267 hours of audio at medium) and I have been listening to it for about a week or more now when I can.
I have been listening to The Wheel of Time and there are only a few words it gets wrong regularly, and only because it cannot determine the correct pronunciation from the context (it always says the word “bow” as “bo” instead of “bough”).
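
A minimal sketch of what such a script might look like (this is not the actual script, just an illustration). The sentence splitting is deliberately naive, and `synthesize` is a hypothetical callable standing in for whatever tts engine is used; it is assumed to take one sentence and return the audio for it.

```python
import re


def split_sentences(text):
    """Naively split plain text into sentences.

    Whitespace (including line breaks) is collapsed first, then the text is
    split after '.', '!' or '?' when followed by whitespace. Abbreviations
    such as "Mr. Smith" will be split incorrectly; this is only a sketch.
    """
    normalized = " ".join(text.split())
    if not normalized:
        return []
    return re.split(r"(?<=[.!?])\s+", normalized)


def read_book(text, synthesize):
    """Run every sentence through `synthesize` and collect the results.

    `synthesize` is a placeholder (an assumption, not a real API): any
    function that takes a sentence and returns audio data for it.
    """
    return [synthesize(sentence) for sentence in split_sentences(text)]
```

Feeding the engine one sentence at a time is what keeps the delay short: playback of the first sentence can start while the next one is still being generated.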

Damn · December 16, 2021, 12:24pm

thanks @jrb5665

forgot to mention that I’m using German because it’s my native language…
but I’ll give it a try - maybe it also works with German

just edited my first post to reflect that

can anyone tell if any progress has been made on larynx performance (German)?

rejoe2 · December 16, 2021, 1:42pm

In the meantime there has been this announcement, Preview of New TTS Voices, and as far as I recall, there were some small hints on larynx speed:

Lower the requested quality.
Don’t get disappointed after the first waiting period. It might speed up a lot after the first use.
(This is from memory and might not be right or include all of the tips; perhaps have a look at the recent announcement threads in particular.)

Damn · December 16, 2021, 1:46pm

thanks a lot @rejoe2

I’ll have a look and try to implement these changes

Damn · December 16, 2021, 2:10pm

ok I installed opentts via docker and it’s running fine…
but the only available German tts for opentts is espeak:German, which really sounds awful

so I think my main problem with rhasspy is the lack of an acceptable German voice
I tried some English voices and the sound quality was way better on some of them…

I also tried some English versions of the opentts larynx voices and it’s right, decreasing the quality leads to acceptable performance, but I would like to use a German voice

any other suggestions are welcome

rejoe2 · December 16, 2021, 2:32pm

espeak really is awesome…

Wrt. larynx: Did you give “thorsten” a try? That’s afaik the only version based on natural voice recordings (The Master Plan - #35 by synesthesiam).

Damn · December 16, 2021, 2:40pm

yes, I tried that, but even decreasing the sound quality doesn’t lead to acceptable performance, sadly…
and it’s not available for opentts

romkabouter · December 16, 2021, 10:02pm

Why not try wavenet? Output is cached, so once a response has been generated, it is played from the cache the next time.

I do not know how many different responses you have, but they are probably not very random.

Damn · December 17, 2021, 9:22am

hi @romkabouter

Opentts would have been a big compromise for me because I don’t like my data handled by internet services… this is why I used snips and this is why I’m trying to get rhasspy running.

But Google’s wavenet needs a Google account to work and I don’t send data to a registered Google account…

thanks anyway

romkabouter · December 17, 2021, 9:43am

I understand that, but I have two arguments here:

You only send the text you want Google to create a spoken response for. So YOU are in control of what is being sent.
The spoken response is cached, so it is only sent and created ONCE.

In my opinion, that outweighs the crappy TTS of other systems, but that is obviously a choice only you can make.
I think expecting high-quality TTS from an offline system is setting a high requirement.
When I used Snips I found the quality not to be very high, and that was English (Dutch was not supported), so I can only imagine that German was not very good either.
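
To illustrate the caching argument, here is a rough sketch of the idea. It is not Rhasspy’s actual implementation; `CachedTTS` is a made-up name, and `synthesize` is a hypothetical callable that stands in for the request to the remote service and is assumed to return the audio as bytes.

```python
import hashlib
from pathlib import Path


class CachedTTS:
    """Wrap a tts function so that each (voice, text) pair is synthesized once."""

    def __init__(self, synthesize, cache_dir):
        # `synthesize(text, voice) -> bytes` is an assumption for this sketch;
        # it represents the one call that would leave the local network.
        self.synthesize = synthesize
        self.cache_dir = Path(cache_dir)
        self.cache_dir.mkdir(parents=True, exist_ok=True)

    def speak(self, text, voice):
        # The cache key depends only on the voice and the exact text.
        key = hashlib.sha256(f"{voice}\n{text}".encode("utf-8")).hexdigest()
        path = self.cache_dir / f"{key}.wav"
        if path.exists():
            # Cache hit: nothing is sent anywhere.
            return path.read_bytes()
        # Cache miss: the text is sent once and the result is stored locally.
        audio = self.synthesize(text, voice)
        path.write_bytes(audio)
        return audio
```

With a fixed set of responses, every sentence reaches the remote service at most once; after that, playback is served entirely from the local cache directory.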

Damn · December 17, 2021, 10:49am

I understand what you’re talking about…

and as I said, I would use opentts, even if it’s not local, if the sound quality were better for German. But I cannot understand why an account is required to use a service from Google.

it’s not the safest option to trust a global monopolist like Google imo…

maybe I’ve got to think about it…