Most realistic voice?

DaveKearley · April 8, 2021, 2:59pm

What's the best voice for TTS, please? Most sound similar, and one did not work at all (Larynx, I think).

Are there any semi-human ones available?

Thanks

AlmostSerious · April 8, 2021, 4:17pm

In my opinion, the best voice currently available is Google Wavenet. However, those voices are not generated locally but in the Google Cloud.
In second place, at least for me, is Larynx TTS. If it doesn't work for you, check whether you are already using Rhasspy 2.5.10, which got rid of the AVX requirement on the CPU.

To try out the Wavenet:

Listen to the Larynx Voices available in Rhasspy:

DaveKearley · April 8, 2021, 4:28pm

Thanks,

Any idea how I get Larynx to function?

Currently it's set to en-kathleen, but I have no idea what goes in the “voice” field, and it just hangs when tested, leaving the LEDs on 100% plus a timeout error.

I really don't want to use any external resources.

tjiho · April 8, 2021, 4:43pm

In French, the best CPU/quality ratio is NanoTTS. It takes almost no time to generate audio on a Raspberry Pi 3, and the quality is OK.
Larynx is great, but it takes 30 s to generate the audio file (and it has strange pronunciations for some words).

DaveKearley · April 8, 2021, 4:50pm

Interesting, thanks. That's a lot of overhead and not really workable for a Pi then, I think.

tjiho · April 8, 2021, 4:53pm

Did you install Rhasspy from source? To test Larynx, you can run something like:
/path/to/rhasspy/.venv/bin/python3 -m larynx.server --voices-dir /home/pi/.config/rhasspy/profiles/fr/tts/larynx

(you need to replace fr with your profile name)

romkabouter · April 8, 2021, 5:55pm

For me it's Google Wavenet. Larynx is my top choice when you do not want anything generated by cloud services.
With Google Wavenet, the audio is only generated once per sentence and is played from the cache when the same sentence is spoken again.
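
A minimal sketch of that "generate once, then play from cache" idea, in case it helps. This is not Rhasspy's actual code: `cached_tts`, `synthesize` and `cache_dir` are hypothetical names, and `synthesize` stands in for whatever call produces the audio bytes.

```python
import hashlib
from pathlib import Path
from typing import Callable


def cached_tts(sentence: str, synthesize: Callable[[str], bytes], cache_dir: Path) -> bytes:
    """Return audio for a sentence, calling synthesize only on a cache miss."""
    cache_dir.mkdir(parents=True, exist_ok=True)

    # Key the cache on the exact sentence text (hypothetical naming scheme).
    key = hashlib.sha256(sentence.encode("utf-8")).hexdigest()
    cache_file = cache_dir / f"{key}.wav"

    if cache_file.exists():
        # Same sentence as before: reuse the stored audio.
        return cache_file.read_bytes()

    # New sentence: generate the audio once and store it for next time.
    audio = synthesize(sentence)
    cache_file.write_bytes(audio)
    return audio
```

Hashing the sentence keeps file names safe regardless of punctuation, at the cost of treating "Hello" and "hello" as different sentences.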

DaveKearley · April 9, 2021, 5:48am

Thanks all, I'll try some of these as soon as I can get some time on it again. It's on NanoTTS at the moment and that's pretty good.

rolyan_trauts · April 9, 2021, 8:39am

There are many high-quality TTS systems, even singing ones: https://nv-adlr.github.io/Mellotron

The challenge is to make them lightweight. The likes of Larynx or https://github.com/TensorSpeech/TensorFlowTTS do a fairly good job of that, but for me there is nothing good about any cloud service.

It's more about what hardware you can use locally than what you can use remotely, and if you are just using a Pi, then Larynx or TensorFlowTTS on a Pi 4 is probably the best you will get currently.

DaveKearley · April 9, 2021, 10:28am

Thanks, I'll try Larynx again soon.

synesthesiam · April 9, 2021, 9:02pm

I’m hoping we can fix these as time goes on. Luckily, it shouldn’t require re-training the model, just re-ordering the pronunciation dictionary so the correct pronunciation is picked.

For English, I use part-of-speech and tense to determine pronunciation – for example “I read (RED) the book” versus “I read (REED) books”. Would this be helpful in French too?
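
A rough sketch of what a tense-aware lookup could look like. This is only an illustration, not gruut's or Larynx's actual implementation: the `LEXICON` layout, the tag names and the respellings are all made up, and the tense tag is assumed to come from a separate part-of-speech tagger.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical lexicon: each word maps to (tag, pronunciation) pairs.
# Tags and respellings are invented for this example.
LEXICON: Dict[str, List[Tuple[Optional[str], str]]] = {
    "read": [("past", "RED"), ("present", "REED")],
    "book": [(None, "BUUK")],
}


def pronounce(word: str, tag: Optional[str] = None) -> str:
    """Pick the pronunciation whose tag matches, else the first entry."""
    entries = LEXICON[word.lower()]
    for entry_tag, pronunciation in entries:
        if entry_tag == tag:
            return pronunciation
    # No tagged match: fall back to the first (default) pronunciation.
    return entries[0][1]
```

With this layout, `pronounce("read", "past")` picks the RED entry and `pronounce("read", "present")` picks REED. When no tag is known, the first entry wins, which is why the order of the dictionary entries matters.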

tjiho · April 9, 2021, 10:12pm

I just opened a pull request on gruut that inverts the ordering. What do you think about it?

In French, I did some tests and it improves the pronunciation of many words.

synesthesiam · April 12, 2021, 8:09pm

I’m re-ordering the lexicon according to pronunciation frequencies from the French Kaldi model I trained. Once I roll this update into Rhasspy, I’ll see what you think.

@tjiho, could you also provide some feedback on this discussion regarding liaisons in French, please? https://github.com/rhasspy/larynx/issues/7
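
For anyone curious, re-ordering by frequency could be sketched roughly like this. It is a simplified illustration, not the actual gruut code: `reorder_lexicon` is a hypothetical helper, and the lexicon and counts are assumed to already be loaded into plain dictionaries.

```python
from typing import Dict, List


def reorder_lexicon(
    lexicon: Dict[str, List[str]],
    counts: Dict[str, Dict[str, int]],
) -> Dict[str, List[str]]:
    """Sort each word's pronunciations so the most frequent comes first."""
    reordered = {}
    for word, pronunciations in lexicon.items():
        word_counts = counts.get(word, {})
        # sorted() is stable, so pronunciations with equal counts
        # (including words with no counts at all) keep their order.
        reordered[word] = sorted(
            pronunciations,
            key=lambda p: word_counts.get(p, 0),
            reverse=True,
        )
    return reordered
```

Since the first pronunciation is the one that gets picked by default, moving the most frequent one to the front changes the output without re-training the voice.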

manju-rn · April 13, 2021, 5:04am

Tested Larynx and can confirm that cmu_slp and cmu_aup sound good for an Indian voice. cmu_slp has a slightly unusual accent on a few words, but it is understandable.

DaveKearley · April 13, 2021, 8:09pm

Any ideas why I cannot run Larynx with any of the voices?
They all give me this error or something very similar…

[ERROR:2021-04-13 20:06:24,198] rhasspyserver_hermes: [ONNXRuntimeError] : 3 : NO_SUCHFILE : Load model from /profiles/en/tts/larynx/en-us/harvard-glow_tts/generator.onnx failed:Load model /profiles/en/tts/larynx/en-us/harvard-glow_tts/generator.onnx failed. File doesn't exist
Traceback (most recent call last):
  File "/usr/lib/rhasspy/.venv/lib/python3.7/site-packages/quart/app.py", line 1821, in full_dispatch_request
    result = await self.dispatch_request(request_context)
  File "/usr/lib/rhasspy/.venv/lib/python3.7/site-packages/quart/app.py", line 1869, in dispatch_request
    return await handler(**request_.view_args)
  File "/usr/lib/rhasspy/rhasspy-server-hermes/rhasspyserver_hermes/__main__.py", line 1692, in api_text_to_speech
    results = await asyncio.gather(*aws)
  File "/usr/lib/rhasspy/rhasspy-server-hermes/rhasspyserver_hermes/__main__.py", line 1678, in speak
    say_chars_per_second=say_chars_per_second,
  File "/usr/lib/rhasspy/rhasspy-server-hermes/rhasspyserver_hermes/__init__.py", line 625, in speak_sentence
    raise TtsException(say_response.error)
rhasspyserver_hermes.TtsException: [ONNXRuntimeError] : 3 : NO_SUCHFILE : Load model from /profiles/en/tts/larynx/en-us/harvard-glow_tts/generator.onnx failed:Load model /profiles/en/tts/larynx/en-us/harvard-glow_tts/generator.onnx failed. File doesn't exist
[ERROR:2021-04-13 20:06:24,195] rhasspyserver_hermes: TtsError(error="[ONNXRuntimeError] : 3 : NO_SUCHFILE : Load model from /profiles/en/tts/larynx/en-us/harvard-glow_tts/generator.onnx failed:Load model /profiles/en/tts/larynx/en-us/harvard-glow_tts/generator.onnx failed. File doesn't exist", site_id='Voice1', context='cd33bd7c-eaab-410a-9298-36bc65966ae3', session_id='')
[DEBUG:2021-04-13 20:06:24,185] rhasspyserver_hermes: Handling TtsError (topic=hermes/error/tts, id=a3b5fe98-46a3-444e-82c3-ff3d5e56b55c)
[DEBUG:2021-04-13 20:06:24,148] rhasspyserver_hermes: Publishing 142 bytes(s) to hermes/tts/say
[DEBUG:2021-04-13 20:06:24,147] rhasspyserver_hermes: -> TtsSay(text='Arse biscuits', site_id='Voice1', lang='harvard', id='cd33bd7c-eaab-410a-9298-36bc65966ae3', session_id='', volume=1.0)
[DEBUG:2021-04-13 20:06:24,143] rhasspyserver_hermes: TTS timeout will be 30 second(s)

rolyan_trauts · April 13, 2021, 8:13pm

Seems like the profiles folder isn’t shared to docker

docker run -d -p 12101:12101 \
    --name rhasspy \
    --network host \
    --restart unless-stopped \
    -v "$HOME/.config/rhasspy/profiles:/profiles" \
    --device /dev/snd:/dev/snd \
    rhasspy/rhasspy \
    --user-profiles /profiles \
    --profile en

Are you missing that -v "$HOME/.config/rhasspy/profiles:/profiles"?
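
One way to check this from the host is to translate the path in the error message back to the host side of the volume mapping and see whether the file is really there. A small sketch, assuming the `-v` mapping shown above; `host_path_for` is a hypothetical helper, not part of Rhasspy.

```python
from pathlib import Path, PurePosixPath


def host_path_for(container_path: str, host_dir: Path, container_dir: str = "/profiles") -> Path:
    """Translate a path inside the container to the matching path on the host.

    Raises ValueError if container_path is not under container_dir.
    """
    relative = PurePosixPath(container_path).relative_to(container_dir)
    return host_dir / Path(*relative.parts)
```

For example, calling it with the `generator.onnx` path from the error and `Path.home() / ".config" / "rhasspy" / "profiles"` as `host_dir` gives the host-side path, and `.exists()` on the result tells you whether the model file is actually on disk. If it is missing on the host as well, the mapping is probably fine and the voice files themselves are not there.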

DaveKearley · April 13, 2021, 8:15pm

I used this…

start docker…
docker run -d -p 12101:12101 \
    --name rhasspy \
    --network host \
    --restart unless-stopped \
    -v "$HOME/.config/rhasspy/profiles:/profiles" \
    --device /dev/snd:/dev/snd \
    rhasspy/rhasspy \
    --user-profiles /profiles \
    --profile en

rolyan_trauts · April 13, 2021, 8:51pm

Dunno, Dave, that seems correct. Are the profiles and everything all in $HOME/.config/rhasspy/profiles, so that they are shared to /profiles in the container?

synesthesiam · April 13, 2021, 8:53pm

After you select a voice and restart, make sure you check the top of the web page for a Download button. The models all together are over 1 GB in size, so I have Rhasspy just download the voices you select.

DaveKearley · April 13, 2021, 9:10pm

That's the one. It's working now, thanks again.