TTS over HTTP is returning plain text instead of WAV

pingless · January 9, 2023, 2:41pm

Hi,
I’m following the “Server with Satellites” tutorial (tutorials/#server-with-satellites), and I can’t get TTS to work over HTTP.

As far as I can tell, the satellite is sending a siteId parameter (the URL it is POSTing to is http://<myserver>:12101/api/text-to-speech?play=false&siteId=<mysatellite>). With that parameter present, the server responds in plain text instead of WAV audio: the body is just the input text, and the header is content-type: text/html; charset=utf-8.
As a result, the satellite fails to speak with an error message of Expected audio/wav content type, got text/html; charset=utf-8 and wave.Error: file does not start with RIFF id (stacktrace below).
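
The 11 bytes in the AudioPlayBytes log line match the length of the text "Hello World", which is consistent with the plain-text body being passed on as if it were audio. A minimal sketch of a check that tells the two cases apart, using only the Python standard library (the helper names looks_like_wav and make_silent_wav are made up for illustration and are not part of Rhasspy):

```python
import io
import wave


def looks_like_wav(data: bytes) -> bool:
    """Return True if the bytes can be opened as a WAV file by the wave module."""
    try:
        with io.BytesIO(data) as buf, wave.open(buf, "rb"):
            return True
    except (wave.Error, EOFError):
        # wave.Error: e.g. "file does not start with RIFF id"
        # EOFError: payload too short to contain a RIFF header
        return False


def make_silent_wav(num_frames: int = 160) -> bytes:
    """Build a tiny 16 kHz, 16-bit mono WAV of silence for comparison."""
    with io.BytesIO() as buf:
        with wave.open(buf, "wb") as wav_file:
            wav_file.setnchannels(1)
            wav_file.setsampwidth(2)
            wav_file.setframerate(16000)
            wav_file.writeframes(b"\x00\x00" * num_frames)
        return buf.getvalue()
```

Calling looks_like_wav(b"Hello World") hits the same wave.Error as the first traceback below, while looks_like_wav(make_silent_wav()) succeeds.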

When I make the same POST manually without the siteId parameter (api/text-to-speech?play=false), I correctly get WAV data. However, I cannot figure out how to get the satellite to avoid sending its siteId property as part of the POST, and I’m not sure if this would even be correct behaviour.
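
A minimal sketch of that comparison in Python, standard library only. The function names are hypothetical, and the base URL and site ID in the usage note are taken from the profiles below. It POSTs the text once with and once without siteId and reports the Content-Type and whether the body starts with a RIFF header:

```python
from typing import Optional, Tuple
import urllib.parse
import urllib.request


def tts_url(base: str, site_id: Optional[str] = None) -> str:
    """Build the text-to-speech URL, optionally with a siteId parameter."""
    params = {"play": "false"}
    if site_id is not None:
        params["siteId"] = site_id
    return f"{base}/api/text-to-speech?{urllib.parse.urlencode(params)}"


def post_tts(url: str, text: str, timeout: float = 30.0) -> Tuple[str, bytes]:
    """POST the text as the request body; return (Content-Type, body)."""
    request = urllib.request.Request(url, data=text.encode("utf-8"), method="POST")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.headers.get("Content-Type", ""), response.read()


def compare_responses(base: str, site_id: str, text: str = "Hello World") -> None:
    """Print what the server returns without and with the siteId parameter."""
    for sid in (None, site_id):
        url = tts_url(base, sid)
        content_type, body = post_tts(url, text)
        is_riff = body[:4] == b"RIFF"
        print(f"{url}\n  content-type={content_type!r} bytes={len(body)} riff={is_riff}")
```

With my setup this would be run as compare_responses("http://10.0.1.3:12101", "pi1").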

I can see someone had a similar issue here, but the solution in that thread seemed to be a workaround of “use MQTT instead”.
Issue #262 on GitHub seems to describe the exact problem, but it is still open and unresolved. I tried downgrading to 2.5.9, but that did not solve the issue.

Can someone help me figure out what I might be doing wrong, or whether this is a bug in Rhasspy?

Thank you!

Satellite: Raspberry Pi 3 running rhasspy 2.5.11 in docker
Server: Ubuntu running rhasspy 2.5.11 in docker

Log output and stack trace on the satellite:

rhasspy  | [DEBUG:2023-01-09 14:31:05,871] rhasspyserver_hermes: TTS timeout will be 30 second(s)
rhasspy  | [DEBUG:2023-01-09 14:31:05,876] rhasspyserver_hermes: -> TtsSay(text='Hello World', site_id='pi1', lang=None, id='a1c19df1-f005-4ba9-b636-c7899ee29125', session_id='', volume=1.0)
rhasspy  | [DEBUG:2023-01-09 14:31:05,877] rhasspyserver_hermes: Publishing 132 bytes(s) to hermes/tts/say
rhasspy  | [DEBUG:2023-01-09 14:31:05,895] rhasspyremote_http_hermes: <- TtsSay(text='Hello World', site_id='pi1', lang=None, id='a1c19df1-f005-4ba9-b636-c7899ee29125', session_id='', volume=1.0)
rhasspy  | [DEBUG:2023-01-09 14:31:05,898] rhasspyremote_http_hermes: http://10.0.1.3:12101/api/text-to-speech
rhasspy  | [WARNING:2023-01-09 14:31:09,193] rhasspyremote_http_hermes: Expected audio/wav content type, got text/html; charset=utf-8
rhasspy  | [DEBUG:2023-01-09 14:31:09,198] rhasspyremote_http_hermes: -> AudioPlayBytes(11 byte(s)) to hermes/audioServer/pi1/playBytes/a1c19df1-f005-4ba9-b636-c7899ee29125
rhasspy  | [DEBUG:2023-01-09 14:31:09,202] rhasspyserver_hermes: Handling AudioPlayBytes (topic=hermes/audioServer/pi1/playBytes/a1c19df1-f005-4ba9-b636-c7899ee29125, id=80fe37d5-3890-4b30-87e0-dab711acedfc)
rhasspy  | [DEBUG:2023-01-09 14:31:09,204] rhasspyserver_hermes: Handling AudioPlayBytes (topic=hermes/audioServer/pi1/playBytes/a1c19df1-f005-4ba9-b636-c7899ee29125, id=48eeae67-bbbe-4e7d-a76a-03c2f882d9b8)
rhasspy  | [DEBUG:2023-01-09 14:31:09,204] rhasspyremote_http_hermes: -> TtsSayFinished(site_id='pi1', id='a1c19df1-f005-4ba9-b636-c7899ee29125', session_id='')
rhasspy  | [DEBUG:2023-01-09 14:31:09,208] rhasspyremote_http_hermes: Publishing 80 bytes(s) to hermes/tts/sayFinished
rhasspy  | [DEBUG:2023-01-09 14:31:09,208] rhasspyspeakers_cli_hermes: <- AudioPlayBytes(11 byte(s))
rhasspy  | [DEBUG:2023-01-09 14:31:09,209] rhasspyspeakers_cli_hermes: ['aplay', '-q', '-t', 'wav']
rhasspy  | [ERROR:2023-01-09 14:31:09,220] rhasspyspeakers_cli_hermes: handle_play
rhasspy  | Traceback (most recent call last):
rhasspy  |   File "/usr/lib/rhasspy/rhasspy-speakers-cli-hermes/rhasspyspeakers_cli_hermes/__init__.py", line 248, in convert_to_wav
rhasspy  |     with io.BytesIO(sound_bytes) as sound_io, wave.open(sound_io, "rb"):
rhasspy  |   File "/usr/lib/python3.7/wave.py", line 510, in open
rhasspy  |     return Wave_read(f)
rhasspy  |   File "/usr/lib/python3.7/wave.py", line 164, in __init__
rhasspy  |     self.initfp(f)
rhasspy  |   File "/usr/lib/python3.7/wave.py", line 131, in initfp
rhasspy  |     raise Error('file does not start with RIFF id')
rhasspy  | wave.Error: file does not start with RIFF id
rhasspy  |
rhasspy  | During handling of the above exception, another exception occurred:
rhasspy  |
rhasspy  | Traceback (most recent call last):
rhasspy  |   File "/usr/lib/rhasspy/rhasspy-speakers-cli-hermes/rhasspyspeakers_cli_hermes/__init__.py", line 255, in convert_to_wav
rhasspy  |     audio_data, sample_rate = soundfile.read(sound_file)
rhasspy  |   File "/usr/lib/rhasspy/.venv/lib/python3.7/site-packages/soundfile.py", line 257, in read
rhasspy  |     subtype, endian, format, closefd) as f:
rhasspy  |   File "/usr/lib/rhasspy/.venv/lib/python3.7/site-packages/soundfile.py", line 629, in __init__
rhasspy  |     self._file = self._open(file, mode_int, closefd)
rhasspy  |   File "/usr/lib/rhasspy/.venv/lib/python3.7/site-packages/soundfile.py", line 1184, in _open
rhasspy  |     "Error opening {0!r}: ".format(self.name))
rhasspy  |   File "/usr/lib/rhasspy/.venv/lib/python3.7/site-packages/soundfile.py", line 1357, in _error_check
rhasspy  |     raise RuntimeError(prefix + _ffi.string(err_str).decode('utf-8', 'replace'))
rhasspy  | RuntimeError: Error opening <_io.BytesIO object at 0x75ec35a0>: File contains data in an unknown format.
rhasspy  |
rhasspy  | During handling of the above exception, another exception occurred:
rhasspy  |
rhasspy  | Traceback (most recent call last):
rhasspy  |   File "/usr/lib/rhasspy/rhasspy-speakers-cli-hermes/rhasspyspeakers_cli_hermes/__init__.py", line 81, in handle_play
rhasspy  |     wav_bytes = self.convert_to_wav(sound_bytes)
rhasspy  |   File "/usr/lib/rhasspy/rhasspy-speakers-cli-hermes/rhasspyspeakers_cli_hermes/__init__.py", line 265, in convert_to_wav
rhasspy  |     temp_file.name, backends=self.audioread_backends
rhasspy  |   File "/usr/lib/rhasspy/.venv/lib/python3.7/site-packages/audioread/__init__.py", line 116, in audio_open
rhasspy  |     raise NoBackendError()
rhasspy  | audioread.exceptions.NoBackendError

Satellite profile:

{
    "dialogue": {
        "system": "rhasspy"
    },
    "handle": {
        "system": "hass"
    },
    "home_assistant": {
        "access_token": "<redacted>",
        "url": "<redacted>"
    },
    "intent": {
        "remote": {
            "url": "http://10.0.1.3:12101/api/text-to-intent"
        },
        "system": "remote"
    },
    "microphone": {
        "pyaudio": {
            "device": "1",
            "siteId": "pi1"
        },
        "system": "pyaudio"
    },
    "mqtt": {
        "enabled": "",
        "host": "<redacted>",
        "password": "<redacted>",
        "site_id": "pi1",
        "username": "<redacted>"
    },
    "sounds": {
        "error": "${RHASSPY_PROFILE_DIR}/sounds/xp-critical-stop.wav",
        "recorded": "${RHASSPY_PROFILE_DIR}/sounds/xp-hw-remove.wav",
        "system": "aplay",
        "wake": "${RHASSPY_PROFILE_DIR}/sounds/xp-hw-insert.wav"
    },
    "speech_to_text": {
        "remote": {
            "url": "http://10.0.1.3:12101/api/speech-to-text"
        },
        "system": "remote"
    },
    "text_to_speech": {
        "larynx": {
            "default_voice": "northern_english_male"
        },
        "remote": {
            "url": "http://10.0.1.3:12101/api/text-to-speech"
        },
        "system": "remote"
    },
    "wake": {
        "porcupine": {
            "keyword_path": "computer_raspberry-pi.ppn",
            "sensitivity": "0.7"
        },
        "system": "porcupine"
    }
}

Server profile:

{
    "intent": {
        "satellite_site_ids": "pi1",
        "system": "fsticuffs"
    },
    "mqtt": {
        "enabled": "true",
        "host": "<redacted>",
        "password": "<redacted>",
        "site_id": "root",
        "username": "<redacted>"
    },
    "sounds": {
        "aplay": {
            "device": "front:CARD=PCH,DEV=0"
        }
    },
    "speech_to_text": {
        "satellite_site_ids": "pi1",
        "system": "kaldi"
    },
    "text_to_speech": {
        "larynx": {
            "default_voice": "northern_english_male",
            "vocoder": "vctk_medium"
        },
        "nanotts": {
            "language": "en-GB"
        },
        "satellite_site_ids": "pi1",
        "system": "larynx"
    }
}
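
To rule out a simple mismatch between the two profiles, here is a small sketch that checks whether the satellite's mqtt.site_id is listed in satellite_site_ids for each server service. It assumes satellite_site_ids is a comma-separated string, and the function name is made up for illustration:

```python
from typing import Dict, List, Sequence


def services_missing_site_id(
    satellite_profile: Dict,
    server_profile: Dict,
    services: Sequence[str] = ("intent", "speech_to_text", "text_to_speech"),
) -> List[str]:
    """Return the server services whose satellite_site_ids lack the satellite's site_id."""
    site_id = satellite_profile.get("mqtt", {}).get("site_id", "")
    missing = []
    for service in services:
        raw = server_profile.get(service, {}).get("satellite_site_ids", "")
        # Assumption: multiple IDs are separated by commas
        listed = [part.strip() for part in raw.split(",") if part.strip()]
        if site_id not in listed:
            missing.append(service)
    return missing
```

For the two profiles above this returns an empty list, since pi1 is listed under intent, speech_to_text and text_to_speech, so the site IDs themselves appear to line up.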