No audio playback from MaryTTS

I got MaryTTS set up and can access the web interface at http://localhost:59125, but when I run a command that works with eSpeak, I don’t hear anything from MaryTTS. Here is my Rhasspy log:

[ERROR:2020-08-12 11:53:00,478] rhasspyserver_hermes: file does not start with RIFF id
Traceback (most recent call last):
File “/usr/lib/rhasspy/.venv/lib/python3.7/site-packages/quart/app.py”, line 1821, in full_dispatch_request
result = await self.dispatch_request(request_context)
File “/usr/lib/rhasspy/.venv/lib/python3.7/site-packages/quart/app.py”, line 1869, in dispatch_request
return await handler(**request_.view_args)
File “/usr/lib/rhasspy/rhasspy-server-hermes/rhasspyserver_hermes/main.py”, line 1612, in api_text_to_speech
results = await asyncio.gather(*aws)
File “/usr/lib/rhasspy/rhasspy-server-hermes/rhasspyserver_hermes/main.py”, line 1598, in speak
session_id=session_id,
File “/usr/lib/rhasspy/rhasspy-server-hermes/rhasspyserver_hermes/init.py”, line 596, in speak_sentence
raise TtsException(say_response.error)
rhasspyserver_hermes.TtsException: file does not start with RIFF id
[ERROR:2020-08-12 11:53:00,476] rhasspyserver_hermes: TtsError(error=‘file does not start with RIFF id’, site_id=‘default’, context=‘06094371-412b-4155-802c-3c7be7deeb44’, session_id=’’)
[DEBUG:2020-08-12 11:53:00,465] rhasspyserver_hermes: Handling TtsError (topic=hermes/error/tts, id=80ba8b41-f9dc-4b72-a4fe-ce9ee8f43181)
[DEBUG:2020-08-12 11:53:00,456] rhasspyserver_hermes: Handling AudioPlayBytes (topic=hermes/audioServer/default/playBytes/06094371-412b-4155-802c-3c7be7deeb44, id=80ba8b41-f9dc-4b72-a4fe-ce9ee8f43181)
[DEBUG:2020-08-12 11:53:00,416] rhasspyserver_hermes: Publishing 125 bytes(s) to hermes/tts/say
[DEBUG:2020-08-12 11:53:00,415] rhasspyserver_hermes: -> TtsSay(text=‘It is 11 53 AM.’, site_id=‘default’, lang=‘en’, id=‘06094371-412b-4155-802c-3c7be7deeb44’, session_id=’’)
[DEBUG:2020-08-12 11:53:00,398] rhasspyserver_hermes: Sent 370 char(s) to websocket
[DEBUG:2020-08-12 11:53:00,396] rhasspyserver_hermes: Handling NluIntent (topic=hermes/intent/GetTime, id=9ccda5d8-603a-4c1c-adbd-ae86acdfd33f)
[DEBUG:2020-08-12 11:53:00,391] rhasspyserver_hermes: <- NluIntent(input=‘what time is it’, intent=Intent(intent_name=‘GetTime’, confidence_score=1.0), site_id=‘default’, id=‘2e289aae-b3af-483c-a716-c7c936533980’, slots=[], session_id=‘2e289aae-b3af-483c-a716-c7c936533980’, custom_data=None, asr_tokens=[[AsrToken(value=‘what’, confidence=1.0, range_start=0, range_end=4, time=None), AsrToken(value=‘time’, confidence=1.0, range_start=5, range_end=9, time=None), AsrToken(value=‘is’, confidence=1.0, range_start=10, range_end=12, time=None), AsrToken(value=‘it’, confidence=1.0, range_start=13, range_end=15, time=None)]], asr_confidence=None, raw_input=‘what time is it’, wakeword_id=None, lang=None)
[DEBUG:2020-08-12 11:53:00,377] rhasspyserver_hermes: Publishing 204 bytes(s) to hermes/nlu/query
[DEBUG:2020-08-12 11:53:00,376] rhasspyserver_hermes: -> NluQuery(input=‘what time is it’, site_id=‘default’, id=‘2e289aae-b3af-483c-a716-c7c936533980’, intent_filter=None, session_id=‘2e289aae-b3af-483c-a716-c7c936533980’, wakeword_id=None, lang=None)
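
For context on the error text: "file does not start with RIFF id" is the message Python's standard `wave` module raises when the bytes it is given are not a WAV file. The sketch below reproduces that message outside Rhasspy. It assumes (this is not confirmed by the log) that the TTS response body is handed to `wave` more or less unchanged; the helper names and the sample error body are made up for illustration.

```python
import io
import wave


def wav_problem(data: bytes) -> str:
    """Return an empty string if data parses as WAV, else the parser's complaint."""
    try:
        with wave.open(io.BytesIO(data), "rb"):
            return ""
    except wave.Error as err:
        return str(err)
    except EOFError:
        return "data too short to be a WAV file"


def make_silent_wav(n_frames: int = 100) -> bytes:
    """Build a tiny valid mono 16-bit WAV in memory, for comparison."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(1)
        wav_file.setsampwidth(2)
        wav_file.setframerate(16000)
        wav_file.writeframes(b"\x00\x00" * n_frames)
    return buffer.getvalue()


# Hypothetical stand-in for a non-audio reply, e.g. an HTTP error page.
print(repr(wav_problem(b"Processing failed.")))
print(repr(wav_problem(make_silent_wav())))
```

If that assumption holds, the message means the TTS backend answered with something other than audio, so the backend's own log is the next place to look.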

In the MaryTTS docker log, I noticed that Rhasspy appears to be sending the text without quotes (so not a string?). I am not sure whether this causes a problem when Mary segments the text. Here is the log:

2020-08-12 15:53:00,444 [I/O dispatcher 5] INFO marytts.server Connection from null

2020-08-12 15:53:00,444 [I/O dispatcher 5] DEBUG marytts.server New synthesis request: /process

2020-08-12 15:53:00,444 [I/O dispatcher 5] DEBUG marytts.server INPUT_TEXT=It is 11 53 AM.

2020-08-12 15:53:00,444 [I/O dispatcher 5] DEBUG marytts.server VOICE=cmu-slt-hsmm

2020-08-12 15:53:00,444 [I/O dispatcher 5] DEBUG marytts.server LOCALE=en

2020-08-12 15:53:00,444 [I/O dispatcher 5] DEBUG marytts.server INPUT_TYPE=TEXT

2020-08-12 15:53:00,444 [I/O dispatcher 5] DEBUG marytts.server OUTPUT_TYPE=AUDIO

2020-08-12 15:53:00,444 [I/O dispatcher 5] DEBUG marytts.server AUDIO=WAVE

2020-08-12 15:53:00,444 [I/O dispatcher 5] DEBUG marytts.server No audio effects requested

2020-08-12 15:53:00,444 [I/O dispatcher 5] INFO marytts.R 12 New request (input type “TEXT”, output type “AUDIO”, voice “cmu-slt-hsmm”, effect “”, audio “WAVE”)

2020-08-12 15:53:00,445 [I/O dispatcher 5] DEBUG marytts.IO Setting text input: It is 11 53 AM.

2020-08-12 15:53:00,445 [I/O dispatcher 5] INFO marytts.server Read: It is 11 53 AM.

2020-08-12 15:53:00,445 [I/O dispatcher 5] DEBUG marytts.R 12 Now converting the following input data from TEXT to RAWMARYXML:

2020-08-12 15:53:00,445 [I/O dispatcher 5] DEBUG marytts.IO Writing Text output:

It is 11 53 AM.

2020-08-12 15:53:00,445 [I/O dispatcher 5] DEBUG marytts.R 12 Determining which modules to use

2020-08-12 15:53:00,445 [I/O dispatcher 5] DEBUG marytts.ModuleRegistry Module TextToMaryXML converts TEXT into RAWMARYXML (locale en, voice cmu-slt-hsmm)

2020-08-12 15:53:00,445 [I/O dispatcher 5] DEBUG marytts.ModuleRegistry found path through modules

2020-08-12 15:53:00,445 [I/O dispatcher 5] INFO marytts.R 12 Handling request using the following modules:

2020-08-12 15:53:00,445 [I/O dispatcher 5] INFO marytts.R 12 - TextToMaryXML (marytts.modules.TextToMaryXML)

2020-08-12 15:53:00,445 [I/O dispatcher 5] DEBUG marytts.R 12 Handing the following data to the next module:

2020-08-12 15:53:00,445 [I/O dispatcher 5] DEBUG marytts.IO Writing Text output:

It is 11 53 AM.

2020-08-12 15:53:00,445 [I/O dispatcher 5] INFO marytts.R 12 Next module: TextToMaryXML

2020-08-12 15:53:00,445 [I/O dispatcher 5] DEBUG marytts.TextToMaryXML textNodeString=`It is 11 53 AM.’

2020-08-12 15:53:00,445 [I/O dispatcher 5] DEBUG marytts.R 12 Now splitting the following RAWMARYXML data into chunks:

2020-08-12 15:53:00,446 [I/O dispatcher 5] DEBUG marytts.IO <?xml version="1.0" encoding="UTF-8"?>

It is 11 53 AM.

2020-08-12 15:53:00,447 [I/O dispatcher 5] DEBUG marytts.R 12 Now converting the following input data from RAWMARYXML to AUDIO:

2020-08-12 15:53:00,447 [I/O dispatcher 5] DEBUG marytts.IO <?xml version="1.0" encoding="UTF-8"?>

It is 11 53 AM.

2020-08-12 15:53:00,447 [I/O dispatcher 5] DEBUG marytts.R 12 Determining which modules to use

2020-08-12 15:53:00,448 [I/O dispatcher 5] DEBUG marytts.ModuleRegistry Module JTokeniser converts RAWMARYXML into TOKENS (locale en, voice cmu-slt-hsmm)

2020-08-12 15:53:00,448 [I/O dispatcher 5] DEBUG marytts.ModuleRegistry Module Preprocess converts TOKENS into WORDS (locale en, voice cmu-slt-hsmm)

2020-08-12 15:53:00,448 [I/O dispatcher 5] DEBUG marytts.ModuleRegistry Module OpenNLPPosTagger converts WORDS into PARTSOFSPEECH (locale en, voice cmu-slt-hsmm)

2020-08-12 15:53:00,448 [I/O dispatcher 5] ERROR marytts.server Processing failed.

java.lang.UnsupportedOperationException: No known way of generating output (AUDIO) from input(RAWMARYXML), no processing path through modules.

at marytts.server.Request.processOneChunk(Request.java:524)

at marytts.server.Request.processOrLookupOneChunk(Request.java:403)

at marytts.server.Request.process(Request.java:337)

at marytts.server.http.SynthesisRequestHandler.process(SynthesisRequestHandler.java:261)

at marytts.server.http.SynthesisRequestHandler.handleClientRequest(SynthesisRequestHandler.java:91)

at marytts.server.http.BaseHttpRequestHandler.handle(BaseHttpRequestHandler.java:138)

at org.apache.http.nio.protocol.BufferingHttpServiceHandler$RequestHandlerAdaptor.handle(BufferingHttpServiceHandler.java:189)

at org.apache.http.nio.protocol.SimpleNHttpRequestHandler.handle(SimpleNHttpRequestHandler.java:51)

at org.apache.http.nio.protocol.AsyncNHttpServiceHandler.processRequest(AsyncNHttpServiceHandler.java:453)

at org.apache.http.nio.protocol.AsyncNHttpServiceHandler.requestReceived(AsyncNHttpServiceHandler.java:225)

at org.apache.http.nio.protocol.BufferingHttpServiceHandler.requestReceived(BufferingHttpServiceHandler.java:127)

at org.apache.http.impl.nio.DefaultNHttpServerConnection.consumeInput(DefaultNHttpServerConnection.java:161)

at org.apache.http.impl.nio.DefaultServerIOEventDispatch.inputReady(DefaultServerIOEventDispatch.java:147)

at org.apache.http.impl.nio.reactor.BaseIOReactor.readable(BaseIOReactor.java:161)

at org.apache.http.impl.nio.reactor.AbstractIOReactor.processEvent(AbstractIOReactor.java:335)

at org.apache.http.impl.nio.reactor.AbstractIOReactor.processEvents(AbstractIOReactor.java:315)

at org.apache.http.impl.nio.reactor.AbstractIOReactor.execute(AbstractIOReactor.java:275)

at org.apache.http.impl.nio.reactor.BaseIOReactor.execute(BaseIOReactor.java:104)

at org.apache.http.impl.nio.reactor.AbstractMultiworkerIOReactor$Worker.run(AbstractMultiworkerIOReactor.java:542)

at java.lang.Thread.run(Thread.java:748)

2020-08-12 15:53:00,448 [I/O dispatcher 5] DEBUG marytts.http Returning HTTP status 500: Processing failed.

java.lang.UnsupportedOperationException: No known way of generating output (AUDIO) from input(RAWMARYXML), no processing path through modules.

at marytts.server.Request.processOneChunk(Request.java:524)

at marytts.server.Request.processOrLookupOneChunk(Request.java:403)

at marytts.server.Request.process(Request.java:337)

at marytts.server.http.SynthesisRequestHandler.process(SynthesisRequestHandler.java:261)

at marytts.server.http.SynthesisRequestHandler.handleClientRequest(SynthesisRequestHandler.java:91)

at marytts.server.http.BaseHttpRequestHandler.handle(BaseHttpRequestHandler.java:138)

at org.apache.http.nio.protocol.BufferingHttpServiceHandler$RequestHandlerAdaptor.handle(BufferingHttpServiceHandler.java:189)

at org.apache.http.nio.protocol.SimpleNHttpRequestHandler.handle(SimpleNHttpRequestHandler.java:51)

at org.apache.http.nio.protocol.AsyncNHttpServiceHandler.processRequest(AsyncNHttpServiceHandler.java:453)

at org.apache.http.nio.protocol.AsyncNHttpServiceHandler.requestReceived(AsyncNHttpServiceHandler.java:225)

at org.apache.http.nio.protocol.BufferingHttpServiceHandler.requestReceived(BufferingHttpServiceHandler.java:127)

at org.apache.http.impl.nio.DefaultNHttpServerConnection.consumeInput(DefaultNHttpServerConnection.java:161)

at org.apache.http.impl.nio.DefaultServerIOEventDispatch.inputReady(DefaultServerIOEventDispatch.java:147)

at org.apache.http.impl.nio.reactor.BaseIOReactor.readable(BaseIOReactor.java:161)

at org.apache.http.impl.nio.reactor.AbstractIOReactor.processEvent(AbstractIOReactor.java:335)

at org.apache.http.impl.nio.reactor.AbstractIOReactor.processEvents(AbstractIOReactor.java:315)

at org.apache.http.impl.nio.reactor.AbstractIOReactor.execute(AbstractIOReactor.java:275)

at org.apache.http.impl.nio.reactor.BaseIOReactor.execute(BaseIOReactor.java:104)

at org.apache.http.impl.nio.reactor.AbstractMultiworkerIOReactor$Worker.run(AbstractMultiworkerIOReactor.java:542)

at java.lang.Thread.run(Thread.java:748)

2020-08-12 15:53:00,448 [I/O dispatcher 5] INFO marytts.server Request couldn’t be handled successfully.

2020-08-12 15:53:00,450 [I/O dispatcher 5] INFO marytts.server Connection closed: [closed]

2020-08-12 15:54:18,746 [I/O dispatcher 2] INFO marytts.server Connection closed: [closed]

2020-08-12 15:54:18,746 [I/O dispatcher 3] INFO marytts.server Connection closed: [closed]

Edit: Clarified findings in MaryTTS docker log

I got a log of a good MaryTTS operation by opening this URL in my web browser:

localhost:59125/process?INPUT_TYPE=TEXT&AUDIO=WAVE_FILE&OUTPUT_TYPE=AUDIO&LOCALE=en_us&INPUT_TEXT=It%20is%2011%2053%20AM.

This produces a wave file with audio.

I then compared this log to the log of the operation sent from Rhasspy, and the only difference I can find is that Rhasspy is for some reason sending “en” as the locale and not “en_us”. I have “en_us” set in my profile.json, but Rhasspy is still sending “en”. Is this possibly a bug?
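
A comparison like this can also be done mechanically. The sketch below parses the browser URL and diffs it against the parameters that the MaryTTS log reports for the Rhasspy request. The `http://` scheme and the `diff_params` helper are assumptions added for illustration; the parameter values are copied from this thread.

```python
from urllib.parse import parse_qsl, urlsplit

# URL that produced audio when opened in the browser (scheme added).
BROWSER_URL = (
    "http://localhost:59125/process?INPUT_TYPE=TEXT&AUDIO=WAVE_FILE"
    "&OUTPUT_TYPE=AUDIO&LOCALE=en_us&INPUT_TEXT=It%20is%2011%2053%20AM."
)

# Parameters the MaryTTS log shows for the failing request from Rhasspy.
RHASSPY_PARAMS = {
    "INPUT_TEXT": "It is 11 53 AM.",
    "VOICE": "cmu-slt-hsmm",
    "LOCALE": "en",
    "INPUT_TYPE": "TEXT",
    "OUTPUT_TYPE": "AUDIO",
    "AUDIO": "WAVE",
}


def diff_params(left: dict, right: dict) -> dict:
    """Map each differing key to its (left, right) values; None means absent."""
    keys = sorted(set(left) | set(right))
    return {k: (left.get(k), right.get(k)) for k in keys if left.get(k) != right.get(k)}


browser_params = dict(parse_qsl(urlsplit(BROWSER_URL).query))
for name, (browser, rhasspy) in diff_params(browser_params, RHASSPY_PARAMS).items():
    print(f"{name}: browser={browser!r} rhasspy={rhasspy!r}")
```

On the values quoted here, the diff lists LOCALE, and also AUDIO and VOICE. The AUDIO difference may only reflect how MaryTTS prints that parameter in its log, and nothing in this thread establishes whether VOICE matters.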

This is what I have in my profile.json, and it’s been fine since 2.5.0 pre.
You could start by trying caps for the last bit, or en-GB?
Edit: or it might be the underscore.

"text_to_speech": {
    "marytts": {
        "locale": "en-GB",
        "url": "http://myserver.local:59125/process",
        "voice": "dfki-prudence-hsmm"
    },

I am absolutely stumped. I tried every combination of en-us, en_us, en-US, and en_US, and nothing worked. The log still shows that the locale submitted by Rhasspy is “en”. I tried changing the voice setting in Rhasspy and that did nothing either. It’s like Rhasspy is ignoring its profile.json.

Figured it out! Everything in Rhasspy was correct. In my Home Assistant configuration.yaml, I had this code:

rest_command:
  espeak:
    url: http://localhost:12101/api/text-to-speech?voice=en
    method: POST
    headers:
      content_type: text/plain
    payload: '{{ message }}'

Of course, the voice=en parameter in the URL was screwing up the request to MaryTTS. I set it to en_us and everything works fantastic now!
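
For anyone who wants to test the same call without Home Assistant, here is a minimal Python sketch of the corrected request. The endpoint, the `voice` query parameter, and the plain-text body are taken from the rest_command above; the function names and the timeout are illustrative assumptions.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Rhasspy text-to-speech endpoint, as used in the rest_command above.
RHASSPY_TTS_URL = "http://localhost:12101/api/text-to-speech"


def build_tts_request(message: str, voice: str = "en_us") -> Request:
    """Build the POST request; the voice travels as a query parameter."""
    query = urlencode({"voice": voice})
    return Request(
        f"{RHASSPY_TTS_URL}?{query}",
        data=message.encode("utf-8"),
        method="POST",
        headers={"Content-Type": "text/plain"},
    )


def speak(message: str, voice: str = "en_us") -> int:
    """Send the request to a running Rhasspy instance and return the HTTP status."""
    with urlopen(build_tts_request(message, voice), timeout=10) as response:
        return response.status


request = build_tts_request("It is 11 53 AM.")
print(request.get_method(), request.full_url)
```

Calling `speak("It is 11 53 AM.")` needs Rhasspy listening on port 12101; building the request alone does not touch the network, which makes it easy to check the URL before sending anything.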
