Is a natural-sounding Dutch female text-to-speech voice setup possible?

Hi, I have Rhasspy up and running in a Docker container on my Raspberry Pi, together with the Node-RED integration. I can control all my smart devices via speech, very cool!!

The only downside is the text-to-speech quality. I have set up eSpeak for that, and it sounds like a robot. You can hardly understand what it says, sorry…

On my phone I use the Google text-to-speech engine com.google.android.tts:nl-nl in Tasker, which reads my notifications out loud. That’s perfect and sounds very natural.

Question 1: Is there a comparably natural Dutch female voice available offline for Rhasspy?

Question 2: I read something about MBROLA voices and found a female Dutch voice there, but the installation is not very clear to me yet. Do I only have to copy the file into my Docker container? No further config?

Question 3: Otherwise I have to use the same engine that is available on Android, the online Google voice engine. I suppose that is the Google WaveNet implementation?

I hope some people already have some knowledge about this and can tell me the current status before I try every possible speech engine.

Thanks in advance!

Set up Google Wavenet, it works well :slight_smile:

Sentences are cached, so the text-to-speech service is called only once per sentence. After that, the same text is played from the cache.

@romkabouter great to hear about the cache.
If I remove some (unnecessary) variables from my current notifications, I don’t need to call the online service so often. So it’s then almost offline. Great solution, I will set it up that way!

Even with variables, it is probably a limited set of sentences and not random text every time.
So the cache keeps building until every combination is in it.
Changing the voice and/or sample rate triggers a new call to the service (cache entries are keyed by an MD5 hash that includes the voice and sample rate), so choose wisely :smiley:
I suggest using 16000 or 22050 as the output sample rate.
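
To illustrate why a different voice or sample rate means a new call to the service, here is a minimal sketch of such a cache key. The function name cache_key and the way the fields are joined are assumptions for illustration only, not Rhasspy’s actual code.

```python
import hashlib


def cache_key(text: str, voice: str, sample_rate: int) -> str:
    """Hypothetical cache key: an MD5 hash over voice, sample rate and text."""
    # The separator and field order are assumptions made for this sketch.
    payload = f"{voice}|{sample_rate}|{text}"
    return hashlib.md5(payload.encode("utf-8")).hexdigest()


first = cache_key("The light is on", "nl-NL-Wavenet-B", 16000)
again = cache_key("The light is on", "nl-NL-Wavenet-B", 16000)
other_rate = cache_key("The light is on", "nl-NL-Wavenet-B", 22050)

print(first == again)       # same sentence and settings: cache hit
print(first == other_rate)  # different sample rate: different key, new request
```

With a key like this, every sentence has to be synthesized again after a settings change, which is why it pays to pick the voice and sample rate once.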

I use temperature values with two decimals in my notifications, so it takes a while before all those unique combinations are cached :stuck_out_tongue_winking_eye:
So I will change those to whole-number values.
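
A quick sketch of how much this matters, assuming readings between 18.00 and 22.00 degrees and a made-up notification sentence (both are only examples, not my real flow):

```python
# Hypothetical readings from 18.00 to 22.00 degrees in steps of 0.01.
readings = [hundredths / 100 for hundredths in range(1800, 2201)]

# One sentence per reading, spoken with two decimals.
two_decimal_sentences = {f"It is {t:.2f} degrees" for t in readings}

# The same readings rounded to whole degrees before building the sentence.
integer_sentences = {f"It is {round(t)} degrees" for t in readings}

print(len(two_decimal_sentences))  # 401 unique sentences to synthesize and cache
print(len(integer_sentences))      # 5 unique sentences
```

So with whole degrees the cache for this range is complete after only a handful of calls to the online service.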

Yes, that sounds like a good plan to me.

I’m a bit further.
I enabled the Google service, created the credentials, and placed the generated JSON file in my local profile location.
I also had to run MaryTTS.
I defined the right voice, matching my language, from https://cloud.google.com/text-to-speech/docs/voices (can I also use the voice type Standard?)

But now I get this error. What is going wrong?

Traceback (most recent call last):
  File "/usr/lib/rhasspy/rhasspy-tts-wavenet-hermes/rhasspytts_wavenet_hermes/__init__.py", line 125, in handle_say
    "audio_config": audio_config,
  File "/usr/lib/rhasspy/.venv/lib/python3.7/site-packages/google/cloud/texttospeech_v1/services/text_to_speech/client.py", line 353, in synthesize_speech
    response = rpc(request, retry=retry, timeout=timeout, metadata=metadata)
  File "/usr/lib/rhasspy/.venv/lib/python3.7/site-packages/google/api_core/gapic_v1/method.py", line 145, in __call__
    return wrapped_func(*args, **kwargs)
  File "/usr/lib/rhasspy/.venv/lib/python3.7/site-packages/google/api_core/grpc_helpers.py", line 59, in error_remapped_callable
    six.raise_from(exceptions.from_grpc_error(exc), exc)
  File "<string>", line 3, in raise_from
google.api_core.exceptions.InvalidArgument: 400 Request contains an invalid argument.

[DEBUG:2021-01-25 00:04:32,945] rhasspytts_wavenet_hermes: -> TtsError(error='400 Request contains an invalid argument.', site_id='default', context='9ca50550-71d2-4c27-9e20-473677fc00841', session_id='')

Must the site_id be a number, or should the session_id not be empty?

You do not need MaryTTS when you use Google Wavenet as TTS.
What are your Google Wavenet settings? It seems this error is not coming from Rhasspy.

Please give some info on your setup and settings.

I added MaryTTS because the logs showed an error that port 59125 was unreachable. I found out that this is the MaryTTS port. After that, the error was gone.

This is the text_to_speech block in profile.json. I placed the downloaded credentials file at the defined path tts/googlewavenet/credentials.json

 "text_to_speech": {
    "system": "wavenet",
    "wavenet": {
      "cache_dir": "tts/googlewavenet/cache",
      "credentials_json": "tts/googlewavenet/credentials.json",
      "gender": "FEMALE",
      "language_code": "nl-NL",
      "sample_rate": 44100,
      "url": "https://texttospeech.googleapis.com/v1/text:synthesize",
      "voice": "nl-NL-Wavenet-A"
    }
  }

I don’t know which Google Wavenet settings you need to see.
Are there API calls I can use to check things (for example, whether the authentication works correctly)?

Can you post screenshots of the Rhasspy config?

The wavenet settings should be:

"wavenet": {
    "sample_rate": "16000",
    "voice": "nl-NL-Wavenet-B"
}

No other settings are needed.
Which Rhasspy version are you using?

I use version 2.5.9 in Docker.

I used the settings from the documentation page https://rhasspy.readthedocs.io/en/latest/text-to-speech/#google-wavenet so that page is outdated.

Before, the dropdown with voices was empty. Now it is filled (maybe it was some setting in Google Wavenet). I selected the correct voice, set the sample rate, and now it works! Great!

What difference does the sample rate make? I don’t hear it. Is it the quality (and WAV file size)?

@romkabouter tnx for your help/time!

Yes, the file size is bigger and the quality higher.
But, as you already found, 16000 or maybe 22050 is good enough for voice, and 44100 will not give much quality gain.
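
As a rough size comparison, here is a small sketch assuming uncompressed 16-bit mono PCM audio and ignoring the WAV header. Both are assumptions, and the helper pcm_bytes is made up for this example; I have not checked the exact format of the cached files.

```python
def pcm_bytes(sample_rate: int, seconds: float,
              bytes_per_sample: int = 2, channels: int = 1) -> int:
    """Size of raw PCM audio in bytes (assumes 16-bit mono by default)."""
    return int(sample_rate * seconds * bytes_per_sample * channels)


# Size of a 3-second sentence at each sample rate mentioned in this thread.
for rate in (16000, 22050, 44100):
    print(rate, pcm_bytes(rate, 3.0))
# 16000 96000
# 22050 132300
# 44100 264600
```

Under those assumptions a cached sentence at 44100 is about 2.8 times as large as the same sentence at 16000.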

The documentation is outdated indeed.