Multi-platform MaryTTS Docker Image with 19 Voices

synesthesiam · May 18, 2020, 5:44pm

I’ve updated my MaryTTS Docker Image with support for armv7 and arm64 systems as well as 19 voices across 8 languages!

For Raspberry Pis, I’ve added a command-line option to restrict which voice(s) are loaded. This can reduce the memory footprint significantly, making room for more services to run:

$ docker run -it -p 59125:59125 synesthesiam/marytts:5.2 --voice cmu-slt-hsmm
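
To confirm which voices actually got loaded after restricting them, one option is to ask the running server. This is only a sketch: it assumes the container is reachable on localhost:59125 and that the server's /voices listing returns one voice per line in the form "name locale gender type". The helper names parse_voices and fetch_voices are made up for illustration.

```python
import urllib.request


def parse_voices(listing):
    """Parse a voices listing into a list of dicts.

    Assumes one voice per line: "<name> <locale> <gender> <type>".
    """
    voices = []
    for line in listing.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or unexpected lines
        voices.append(
            {"name": fields[0], "locale": fields[1], "gender": fields[2]}
        )
    return voices


def fetch_voices(host="localhost", port=59125):
    """Ask a running MaryTTS server which voices it has loaded."""
    url = "http://{}:{}/voices".format(host, port)
    with urllib.request.urlopen(url, timeout=10) as response:
        return parse_voices(response.read().decode("utf-8"))


# Example (needs a running server):
#   for voice in fetch_voices():
#       print(voice["name"], voice["locale"])
```

With the --voice option from the command above, the listing should shrink to just the voice(s) you asked for.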

Supported languages are:

English (US/GB, 7 voices)
German (2 voices)
French (4 voices)
Italian (1 voice)
Russian (1 voice)
Swedish (1 voice)
Telugu (1 voice)
Turkish (1 voice)

Command-line users can also use the same Docker image as a text-to-WAV tool.

koan · May 18, 2020, 6:29pm

Cool, that sounds much better than eSpeak for my Rhasspy setup

CrankyCoder · May 18, 2020, 6:52pm

I have not played with MaryTTS that much. However, I do have a question… have you, or anyone here, ever created a custom voice for something like MaryTTS?

I have always wondered about that. I used to work in a field where we would hire people to do voice-overs, professional phone greetings, etc., and would find people on voices.com and the like. I always wondered if it were possible to create your own (in a somewhat easy fashion), so you could find someone who has the voice you want for your assistant, have them do whatever is needed to create the voice, and then build the voice model.

synesthesiam · May 18, 2020, 7:34pm

Yes. I’ve successfully done this for an English voice, and I’ve been trying for a long time to create a Dutch voice.

There are two main challenges I’ve found:

You need voice data aligned properly with text. One WAV file per sentence seems to work well. This is great if you can find a pre-aligned public domain book, but it’s more difficult otherwise (especially if you’re not a native speaker). You can partially automate the process with aeneas for 38 languages, at least.
MaryTTS needs to have a language-specific Java module before you can train your voice (French example). I already have phonetic dictionaries for all of Rhasspy’s supported languages, but MaryTTS seems to need some Java code too.
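
As a rough illustration of the "one WAV file per sentence" idea, here is a sketch that cuts a long recording into sentence-sized pieces once you have an alignment. It assumes a JSON sync map of the kind aeneas can write (a "fragments" list whose entries have "begin", "end" and "lines"). The function name and the sentence_0001.wav / sentence_0001.txt output layout are made up for the example, not something MaryTTS requires.

```python
import json
import wave
from pathlib import Path


def split_by_sync_map(wav_path, sync_map_path, out_dir):
    """Cut one long recording into one WAV plus one text file per fragment."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    with open(sync_map_path, encoding="utf-8") as f:
        fragments = json.load(f)["fragments"]

    written = []
    with wave.open(str(wav_path), "rb") as src:
        rate = src.getframerate()
        nframes = src.getnframes()
        for index, fragment in enumerate(fragments, start=1):
            text = " ".join(fragment["lines"]).strip()
            # Convert seconds to frame offsets, clamped to the file length.
            begin = min(int(float(fragment["begin"]) * rate), nframes)
            end = min(int(float(fragment["end"]) * rate), nframes)
            if not text or end <= begin:
                continue  # skip empty or zero-length fragments
            src.setpos(begin)
            frames = src.readframes(end - begin)
            stem = "sentence_{:04d}".format(index)
            with wave.open(str(out_dir / (stem + ".wav")), "wb") as dst:
                dst.setparams(src.getparams())  # frame count is fixed on close
                dst.writeframes(frames)
            (out_dir / (stem + ".txt")).write_text(text + "\n", encoding="utf-8")
            written.append(stem)
    return written
```

The cut points are only as good as the alignment, so it is still worth listening to a sample of the resulting files before training on them.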

I would be very interested in finding a way we could crowdfund some new MaryTTS voices! If we can get the data and have it properly aligned, there’s a good chance I can handle the Java/training part and get it over the finish line.

daywalker1180 · May 21, 2020, 7:43pm

Very cool, and it works out of the box. It does indeed sound even better than picoTTS and way better than the robotic eSpeak.

But since I felt that the German voices (I’m from Austria) are a little slow, I wanted to speed them up using the Rate - durScale parameter of the Audio Effects, as denoted here:

I played around with this value from 0.5 to 2.0, but it seems to have no effect.
I marked the checkbox next to “Rate” of course.
The voice always sounds the same, and the duration of the audio output is always the same too, regardless of this setting.

Is this param not working or am I doing something wrong?
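
One way to check objectively whether a setting such as durScale changed anything is to compare the length of the audio that comes back. A minimal sketch, assuming the server returned a complete WAV file whose header carries the real frame count; wav_duration_seconds is a made-up helper name:

```python
import io
import wave


def wav_duration_seconds(wav_bytes):
    """Return the playing time of an in-memory WAV file in seconds."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        return wav.getnframes() / wav.getframerate()
```

Synthesizing the same sentence with two different values and comparing the two durations shows whether the effect was applied at all.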

Thanks and regards
Martin

kookic · May 21, 2020, 7:52pm

Hi,
I use MaryTTS outside of Rhasspy. I adapted the code to use the effects included in MaryTTS and modify the voices. Here is an example (with the MaryTTS server already started):

import http.client
import os
import sys
import urllib.parse


class maryclient:

    def __init__(self):
        self.host = "127.0.0.1"
        self.port = 59125
        self.input_type = "TEXT"
        self.output_type = "AUDIO"
        self.audio = "WAVE_FILE"
        self.locale = "fr"
        self.voice = "upmc-pierre-hsmm"
        self.volume = "1.0"

    def generate(self, message):
        """Given a message in message,
           return a response in the appropriate
           format (here: the bytes of a WAV file)."""
        # Each effect has a "_selected" switch and a "_parameters" string.
        # Key names must not contain stray spaces, otherwise they no longer
        # match the names the server looks for.
        raw_params = {"INPUT_TEXT": message,
                      "INPUT_TYPE": self.input_type,
                      "OUTPUT_TYPE": self.output_type,
                      "LOCALE": self.locale,
                      "AUDIO": self.audio,
                      "VOICE": self.voice,
                      "effect_Volume_selected": "on",
                      "effect_Volume_parameters": "amount:" + self.volume,
                      "effect_TractScaler_selected": "on",
                      "effect_TractScaler_parameters": "amount:1.2",
                      "effect_F0Scale_selected": "on",
                      "effect_F0Scale_parameters": "scale:1.0",
                      "effect_F0Add_selected": "on",
                      "effect_F0Add_parameters": "add:50.0",
                      "effect_Rate_selected": "off",
                      "effect_Rate_parameters": "durScale:1.5",
                      "effect_Robot_selected": "off",
                      "effect_Robot_parameters": "amount:100",
                      "effect_Whisper_selected": "off",
                      "effect_Whisper_parameters": "amount:100",
                      "effect_Stadium_selected": "off",
                      "effect_Stadium_parameters": "amount:100.0",
                      }

        params = urllib.parse.urlencode(raw_params)
        headers = {}

        # Open connection to self.host, self.port.
        conn = http.client.HTTPConnection(self.host, self.port)
        conn.request("POST", "/process", params, headers)
        response = conn.getresponse()
        if response.status != 200:
            print(response.getheaders())
            raise RuntimeError("{0}: {1}".format(response.status,
                                                 response.reason))
        return response.read()


def dire(t):
    """Synthesize t, slow it down slightly with sox, and play it."""
    client = maryclient()
    the_sound = client.generate(t)
    w = "/home/output_mytts_wav.wav"
    with open(w, "wb") as f:
        f.write(the_sound)
    os.system("sox /home/output_mytts_wav.wav /home/out.wav tempo 0.95")  # slow down
    os.system("play /home/out.wav")
    os.system("rm /home/out.wav")


# If there is one argument, speak it; otherwise call the function
# from other code, e.g. mytts.dire("patatipatata")
if len(sys.argv) == 2:
    dire(sys.argv[1])

#dire("bonjour monsieur.")

synesthesiam · May 21, 2020, 7:52pm

I’ve had the same problem using the MaryTTS effects interface. It works for me when I change the Input Type to be RAWMARYXML and input something like this:

<?xml version="1.0" encoding="UTF-8" ?>
<maryxml version="0.4"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns="http://mary.dfki.de/2002/MaryXML"
xml:lang="en-US">
<p>
  <prosody rate="+30%">
    <s>
      Welcome to the world of speech synthesis!
    </s>
  </prosody>
</p>
</maryxml>

The rate attribute of the prosody element lets you increase or decrease the speaking rate by some percentage. A pitch attribute works here as well.
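
If you want to do the same thing from a script rather than the web page, something along these lines should work. It is a sketch, not tested against every voice: it assumes the server is on localhost:59125, that the /process endpoint accepts INPUT_TYPE=RAWMARYXML in a form-encoded POST, and it uses cmu-slt-hsmm with locale en_US as an example. build_maryxml and synthesize are made-up names.

```python
import urllib.parse
import urllib.request
from xml.sax.saxutils import escape, quoteattr

MARYXML_TEMPLATE = """<?xml version="1.0" encoding="UTF-8" ?>
<maryxml version="0.4"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns="http://mary.dfki.de/2002/MaryXML"
xml:lang={lang}>
<p>
  <prosody rate={rate}>
    <s>{text}</s>
  </prosody>
</p>
</maryxml>"""


def build_maryxml(text, rate="+30%", lang="en-US"):
    """Wrap text in a MaryXML document with a prosody rate change."""
    return MARYXML_TEMPLATE.format(
        lang=quoteattr(lang),   # quoteattr adds the surrounding quotes
        rate=quoteattr(rate),
        text=escape(text),      # escape &, < and > in the sentence
    )


def synthesize(text, rate="+30%", lang="en-US", locale="en_US",
               voice="cmu-slt-hsmm", host="localhost", port=59125):
    """POST the MaryXML to a running server and return the WAV bytes."""
    params = {
        "INPUT_TEXT": build_maryxml(text, rate, lang),
        "INPUT_TYPE": "RAWMARYXML",
        "OUTPUT_TYPE": "AUDIO",
        "AUDIO": "WAVE_FILE",
        "LOCALE": locale,
        "VOICE": voice,
    }
    data = urllib.parse.urlencode(params).encode("ascii")
    url = "http://{}:{}/process".format(host, port)
    with urllib.request.urlopen(url, data=data, timeout=30) as response:
        return response.read()
```

Calling synthesize("Welcome to the world of speech synthesis!", rate="+30%") and writing the result to a .wav file mirrors the XML example above.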

gitmirko · May 23, 2020, 5:41pm

Hi,
I’m trying to use MaryTTS (with this image) as the TTS system for Rhasspy in a master/satellite setup. On the satellite I’m getting the following error: TtsException: file does not start with RIFF id

[ERROR:2020-05-23 17:30:36,748] rhasspyserver_hermes: file does not start with RIFF id
Traceback (most recent call last):
  File "/usr/lib/rhasspy-voltron/.venv/lib/python3.7/site-packages/quart/app.py", line 1821, in full_dispatch_request
    result = await self.dispatch_request(request_context)
  File "/usr/lib/rhasspy-voltron/.venv/lib/python3.7/site-packages/quart/app.py", line 1869, in dispatch_request
    return await handler(**request_.view_args)
  File "/usr/lib/rhasspy-voltron/rhasspy-server-hermes/rhasspyserver_hermes/__main__.py", line 1554, in api_text_to_speech
    session_id=session_id,
  File "/usr/lib/rhasspy-voltron/rhasspy-server-hermes/rhasspyserver_hermes/__init__.py", line 527, in speak_sentence
    raise TtsException(say_response.error)
rhasspyserver_hermes.TtsException: file does not start with RIFF id
[ERROR:2020-05-23 17:30:36,745] rhasspyserver_hermes: TtsError(error='file does not start with RIFF id', site_id='satellite', context='40c42b71-85b9-4132-958e-7599fc99fda8', session_id='')
[DEBUG:2020-05-23 17:30:36,738] rhasspyserver_hermes: Handling TtsError (topic=hermes/error/tts, id=a7de9354-dfc5-40f3-b20e-18b6dd3eea30)
[DEBUG:2020-05-23 17:30:36,732] rhasspyserver_hermes: Handling AudioPlayBytes (topic=hermes/audioServer/satellite/playBytes/40c42b71-85b9-4132-958e-7599fc99fda8, id=a7de9354-dfc5-40f3-b20e-18b6dd3eea30)
[DEBUG:2020-05-23 17:30:36,650] rhasspyserver_hermes: Publishing 122 bytes(s) to hermes/tts/say
[DEBUG:2020-05-23 17:30:36,649] rhasspyserver_hermes: -> TtsSay(text='Test Mirko', site_id='satellite', lang=None, id='40c42b71-85b9-4132-958e-7599fc99fda8', session_id='')
[DEBUG:2020-05-23 17:27:51,483] rhasspyserver_hermes: Handling TtsSayFinished (topic=hermes/tts/sayFinished, id=db2524c8-75e2-4250-8ba0-bd85820effbb)
[DEBUG:2020-05-23 17:27:50,324] rhasspyserver_hermes: Handling AudioPlayBytes (topic=hermes/audioServer/satellite/playBytes/e2fce021-a91a-4824-843e-94e2585c6a70, id=db2524c8-75e2-4250-8ba0-bd85820effbb)
[DEBUG:2020-05-23 17:27:50,252] rhasspyserver_hermes: Publishing 122 bytes(s) to hermes/tts/say
[DEBUG:2020-05-23 17:27:50,251] rhasspyserver_hermes: -> TtsSay(text='Test Mirko', site_id='satellite', lang=None, id='e2fce021-a91a-4824-843e-94e2585c6a70', session_id='')

Edit: after pulling the latest MaryTTS Docker image, I now have a debug log available: https://pastebin.com/raw/eMB2W7ec

I found the problem:

2020-05-23 17:30:36,709 [I/O dispatcher 2] DEBUG marytts.Voice Could not find default voice for locale de_DE
2020-05-23 17:30:36,710 [I/O dispatcher 2] ERROR marytts.server Processing failed.

The right locale is just de, and now MaryTTS is working as expected.
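
For anyone hitting the same thing, a small guard in client code can catch both symptoms. This is only a sketch with made-up helper names: pick_locale falls back from a region-specific locale such as de_DE to the bare language code when only that is in the list of locales the server offers (you have to supply that list yourself), and looks_like_wav checks that a response really is a WAV file before it is handed to an audio player.

```python
def pick_locale(requested, available):
    """Return the best match for `requested` among the server's locales.

    Tries the exact locale first ("de_DE"), then the bare language ("de").
    Returns None when nothing matches.
    """
    normalized = requested.replace("-", "_")
    candidates = [normalized, normalized.split("_")[0]]
    for candidate in candidates:
        if candidate in available:
            return candidate
    return None


def looks_like_wav(data):
    """True if data starts with a RIFF/WAVE header rather than an error body."""
    return len(data) >= 12 and data[:4] == b"RIFF" and data[8:12] == b"WAVE"
```

A response that fails looks_like_wav is usually an error message from the server, which is worth logging as text instead of trying to play it.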

Best regards
Mirko