MQTT issue with external TTS service

Hi,
I have coupled Rhasspy to my TTS service via MQTT (=Hermes MQTT is selected as TTS service), and have trouble convincing Rhasspy that this has finished.

From the Web interface of Rhasspy I send a text:
[DEBUG:2021-12-28 09:39:22,800] rhasspyserver_hermes: → TtsSay(text=‘Das ist noch ein fünfter Versuch’, site_id=‘default’, lang=None, id=‘3ae8e3ed-363b-42dd-866c-2b24dbb97727’, session_id=’’, volume=1.0)

This is sent out properly:
[DEBUG:2021-12-28 09:39:22,800] rhasspyserver_hermes: Publishing 157 bytes(s) to hermes/tts/say

On my home automation system this is handled correctly
2021.12.28 09:39:22 1: [speakTablet] name=Tab2.EG text=Das ist noch ein fünfter Versuch

and also answered properly to Rhasspy
2021.12.28 09:39:22 1: set SpeakMQTT_Client publish hermes/tts/sayFinished {“siteId”:“default”,“id”:“3ae8e3ed-363b-42dd-866c-2b24dbb97727”}

Rhasspy recognized a sayFinished message
DEBUG:2021-12-28 09:39:22,830] rhasspyserver_hermes: Handling TtsSayFinished (topic=hermes/tts/sayFinished, id=970ec854-df84-4261-9b2d-73990f8c9167)

  • but strangely, the id has changed. This is definitely not done by my system.

Anyway, of course Rhasspy does not see the proper answer and after 30 seconds goes in timeout error.

Is there any clue what happens here?

Regards

pah

The only clue might be that regardless you think your TTS service does not send that message, it somehow does.
Strange but I cannot think of something else. I do not think Rhasspy changes the id somewhere

Well, thanks for trying - but the post is not about what I “think”, rather, this is proven.

I coupled another MQTT device to the MQTT Server, subscribing to everything, such that I can receive any traffic.

Test 1: TTS in Rhasspy set to service nanoTTS (or any other TTS Service), and my own TTS service subscribing to /hermes/tts/say. Then using the web UI of Rhasspy to speak something. ==> hermes/tts/say is received by nanoTTS as well as by my own service. Both texts spoken correctly. Identical messages /hermes/tts/sayFinished as well as hermes/audioServer/default/playFinished are sent by the two TTS services, received and processed by Rhasspy, everything is ok. Test 1 is proof that the reply messages from my own service are correct.

2021.12.29 15:20:21 4: MQTT2_DEVICE_Parse: MQTT2_XXX hermes/tts/say => {
2021.12.29 15:20:21 1: {“text”: “Das ist Test Nummer 1234”, “siteId”: “default”, “lang”: null, “id”: “1a6eef5c-5314-43ae-baa4-b4be931c4d82”, “sessionId”: “”, “volume”: 1.0}

2021.12.29 15:20:21 4: MQTT2_DEVICE_Parse: MQTT2_XXX hermes/tts/sayFinished =>
2021.12.29 15:20:21 1: {“siteId”: “default”, “id”: “1a6eef5c-5314-43ae-baa4-b4be931c4d82”, “sessionId”: “”}
2021.12.29 15:20:21 4: MQTT2_DEVICE_Parse: MQTT2_XXX hermes/audioServer/default/playFinished =>
2021.12.29 15:20:21 1: {“id”: “1a6eef5c-5314-43ae-baa4-b4be931c4d82”, “sessionId”: “1a6eef5c-5314-43ae-baa4-b4be931c4d82”} (This is from my TTS service)

2021.12.29 15:20:21 4: MQTT2_DEVICE_Parse: MQTT2_XXX hermes/tts/sayFinished =>
2021.12.29 15:20:21 1: {“siteId”: “default”, “id”: “1a6eef5c-5314-43ae-baa4-b4be931c4d82”, “sessionId”: “”}
2021.12.29 15:20:25 4: MQTT2_DEVICE_Parse: MQTT2_XXX hermes/audioServer/default/playFinished =>
2021.12.29 15:20:25 1: {“id”: “1a6eef5c-5314-43ae-baa4-b4be931c4d82”, “sessionId”: “1a6eef5c-5314-43ae-baa4-b4be931c4d82”} (This is from NanoTTS)

Test 2: TTS in Rhasspy set to “Hermes MQTT”, and my own TTS service subscribing to /hermes/tts/say. Then using the web UI of Rhasspy to speak something. ==> hermes/tts/say is received by my own service. Text spoken correctly. Messages /hermes/tts/sayFinished as well as hermes/audioServer/default/playFinished are sent by my own service, received but not processed by Rhasspy, Failure, Rhasspy goes into timeout after 30 seconds.

2021.12.29 15:23:40 1: {“text”: “Das ist Test Nummer 1236”, “siteId”: “default”, “lang”: null, “id”: “9b7a2deb-dd30-4280-b69e-a545591db943”, “sessionId”: “”, “volume”: 1.0}
2021.12.29 15:23:40 4: MQTT2_DEVICE_Parse: MQTT2_XXX hermes/tts/sayFinished =>
2021.12.29 15:23:40 1: {“siteId”: “default”, “id”: “9b7a2deb-dd30-4280-b69e-a545591db943”, “sessionId”: “”}
2021.12.29 15:23:40 4: MQTT2_DEVICE_Parse: MQTT2_XXX hermes/audioServer/default/playFinished =>
2021.12.29 15:23:40 1: {“id”: “9b7a2deb-dd30-4280-b69e-a545591db943”, “sessionId”: “9b7a2deb-dd30-4280-b69e-a545591db943”} (this from my own TTS service is not accepted)

Conclusion: Need to check if the waiting for return messages is correct when TTS in Rhasspy is set to “Hermes MQTT”. I’m willing to do this check, need some time to find the proper location in the code.

Regards

pah

No, it is not. Because this message has a sessionId. When doing a speak command from the webui, only an id will be filled, not a sessionId.
The first message on sayFinished is the one from nanoTTS (2021.12.29 15:20:21 1:.

Sadly I cannot tell where the second sayFinished (with the sessionId) is originating from, maybe some strange behavior when returning a message to sayFinished with a sessionId, while the ttsSay message did not have one

Hi,

is it possible, that you have more than one rhasspy instance? Probaly all in the default site?

Only an idea….

Greetings

Andreas

Could it be that rhasspy-server-hermes is taking over the job from rhasspy-dialog-hermes? Possibly a faulty emergency run from old times?

Thanks for the suggestion. But no, this is not possible - binding to the port would fail.

Regards

pah

Hmmm. Could not verify this up to now, have to check.

Regards

pah

OK, I re-started this with a clean installation.

Setting: Rhasspy installed, working properly, Intents recognized, all services tested ok.
TTS Service Chosen is nanoTTS.
Coupled an external MQTT Client to the Rhasspy external MQTT server, subscribed to
hermes/tts/# rhasspy/tts/# hermes/hotword/#help.
On receiving a hermes/tts/say message, it does a speech message and responds with two MQTT messages

For logging, I have added a mosquitto_sub client as
mosquitto_sub -v -t hermes/hotword/# -t hermes/dialogueManager/# -t hermes/asr/# -t hermes/audioServer/+/playFinished -t hermes/nlu/# -t hermes/error/# -t hermes/tts/# -t hermes/feedback/#

Test 1: Type a text into the Web UI Home page of Rhasspy, klick “speak”. Result: I get two speech outputs, one from nanoTTS locally, one from my external system. Rhasspy is completely ok. mosquitto_sub gives me

hermes/tts/say {“text”: “es ist 17 uhr 31”, “siteId”: “default”, “lang”: null, “id”: “54b2cc5f-37e7-4512-b17e-a3de038924fe”, “sessionId”: “”, “volume”: 1.0}

hermes/tts/sayFinished {“siteId”:“default”,“sessionId”:"",“id”:“54b2cc5f-37e7-4512-b17e-a3de038924fe”}
hermes/audioServer/default/playFinished {“sessionId”:“54b2cc5f-37e7-4512-b17e-a3de038924fe”,“id”:“54b2cc5f-37e7-4512-b17e-a3de038924fe”}

hermes/tts/sayFinished {“siteId”: “default”, “id”: “54b2cc5f-37e7-4512-b17e-a3de038924fe”, “sessionId”: “”}
hermes/audioServer/default/playFinished {“id”: “54b2cc5f-37e7-4512-b17e-a3de038924fe”, “sessionId”: “54b2cc5f-37e7-4512-b17e-a3de038924fe”}

One set of MQTT messages comes from my service, the other one from nanoTTS

Test 2: Chosing as the TTS service in the Rhasspy Web frontend “Hermes TTS”. Type that text again. I only get one voice output, on my system. However: Rhasspy does NOT accept the MQTT messages from my service, rather waits for some (!) return and goes into timeout after 30 seconds.

mosquitto_sub gives me

hermes/tts/say {“text”: “es ist 17 uhr 32”, “siteId”: “default”, “lang”: null, “id”: “1d84dddd-52a9-4d65-87c6-d69bd58b16fb”, “sessionId”: “”, “volume”: 1.0}

hermes/tts/sayFinished {“siteId”:“default”,“id”:“1d84dddd-52a9-4d65-87c6-d69bd58b16fb”,“sessionId”:""}
hermes/audioServer/default/playFinished {“sessionId”:“1d84dddd-52a9-4d65-87c6-d69bd58b16fb”,“id”:“1d84dddd-52a9-4d65-87c6-d69bd58b16fb”}

To very precise, what are observable differences apart from the disappointing result:

  1. Of course, the saved settings in profile.json differ in one line
    “system”: “nanotts” in Test 1 versus “system”: “hermes” in Test 2
  2. The JSON response from my system does not contain superfluous space characters, and the pairs are in a slightly different order. None of these should be relevant for JSON.

All suggestions made here in reply to my first post have been carefully checked and excluded. Final conclusion: In Rhasspy 2.5.11 seems to be a bug when the TTS system is set to “hermes” - proper responses are not recognized.

Regards

pah

Good find, looks like a defect indeed. I think the best way to proceed is to open an issue on github. The way you have written it down should be a good start to be able to reproduce it.

Now I’ve been able to reproduce the error. It only occurs with my installation if the “Speak” command is used on the WebGUI of the base.
The result is that RIFF data is indeed really being waited for. This looks more like an error in the processing of the web request.

[ERROR:2022-01-11 19:13:09,638] rhasspyserver_hermes:
Traceback (most recent call last):
File “/usr/lib/rhasspy/usr/local/lib/python3.7/site-packages/quart/app.py”, line 1821, in full_dispatch_request
result = await self.dispatch_request(request_context)
File “/usr/lib/rhasspy/usr/local/lib/python3.7/site-packages/quart/app.py”, line 1869, in dispatch_request
return await handler(**request_.view_args)
File “/usr/lib/rhasspy/rhasspy-server-hermes/rhasspyserver_hermes/main.py”, line 1699, in api_text_to_speech
results = await asyncio.gather(*aws)
File “/usr/lib/rhasspy/rhasspy-server-hermes/rhasspyserver_hermes/main.py”, line 1685, in speak
say_chars_per_second=say_chars_per_second,
File “/usr/lib/rhasspy/rhasspy-server-hermes/rhasspyserver_hermes/init.py”, line 617, in speak_sentence
handle_finished(), messages, message_types, timeout_seconds=timeout_seconds
File “/usr/lib/rhasspy/rhasspy-server-hermes/rhasspyserver_hermes/init.py”, line 995, in publish_wait
result_awaitable, timeout=timeout_seconds
File “/usr/lib/rhasspy/usr/local/lib/python3.7/asyncio/tasks.py”, line 449, in wait_for
raise futures.TimeoutError()
concurrent.futures._base.TimeoutError
[DEBUG:2022-01-11 19:12:39,649] rhasspyserver_hermes: Handling TtsSayFinished (topic=hermes/tts/sayFinished, id=bf0de8b5-a8c3-44b0-a520-3da7b1fa8ae1)
[DEBUG:2022-01-11 19:12:39,649] rhasspyserver_hermes: Handling TtsSayFinished (topic=hermes/tts/sayFinished, id=01758859-a148-432a-b4a7-7adc255b5db7)
[DEBUG:2022-01-11 19:12:39,648] rhasspyserver_hermes: Handling TtsSayFinished (topic=hermes/tts/sayFinished, id=645e0969-1bb6-4b23-aad6-b9ab792adabb)
[DEBUG:2022-01-11 19:12:39,648] rhasspyserver_hermes: Handling TtsSayFinished (topic=hermes/tts/sayFinished, id=db95c2ae-dec4-4dab-b5c5-d194b288d70d)
[DEBUG:2022-01-11 19:12:39,606] rhasspyserver_hermes: Publishing 139 bytes(s) to hermes/tts/say
[DEBUG:2022-01-11 19:12:39,605] rhasspyserver_hermes: → TtsSay(text=‘Das ist ein Test’, site_id=‘boden’, lang=None, id=‘2c22cf7f-1ab8-4ca3-980f-c8c33de72e99’, session_id=’’, volume=1.0)
[DEBUG:2022-01-11 19:12:39,603] rhasspyserver_hermes: TTS timeout will be 30 second(s)

So quickly adjusted the “small TTS” so that it looks like audio data is being sent:

#!/usr/bin/perl -w
use Net::MQTT::Simple “localhost”;
use JSON;
use strict;
use Net::MQTT::Simple;

my $mqtt = Net::MQTT::Simple->new(“localhost”);
$mqtt->run(
“hermes/tts/say” => sub {
my ($topic, $message) = @_;
my $tts_json = JSON->new->allow_nonref->decode( $message );
my $siteId = $tts_json->{siteId};
my $id = $tts_json->{id};
my $sessionId = $tts_json->{sessionId};
my $text = ‘"’.$tts_json->{text}.’"’;
my $ausgabe = {id => $id, sessionId => $sessionId, siteId => $siteId};
my $sayFinish = JSON->new->utf8->encode( $ausgabe );
my $bytes = “leer”;
my $bytesausgabe = {siteId => $siteId, requestId => $id, wav_bytes => $bytes};
my $playBytes = JSON->new->utf8->encode( $bytesausgabe );
qx(pico2wave -l “de-DE” -w test.wav $text);
$mqtt->publish(“hermes/tts/sayFinished” => $sayFinish);
qx(aplay test.wav);
my $playausgabe = {id => $id, sessionId => $sessionId};
my $playFinish = JSON->new->utf8->encode( $playausgabe );
$mqtt->publish(“hermes/audioServer/$siteId/playBytes/$id” => $playBytes);
$mqtt->publish(“hermes/audioServer/$siteId/playFinished” => $playFinish);
},
);

Now it’s also possible via the web interface of the basis:

Greetings, Jens

p.s. “Audio Playing” must be deactivated, otherwise the audio server will rightly complain

wave.Error: file does not start with RIFF id

waits for the info of the audio file to be played. This info should come from the audio server.

from rhasspyhermes.audioserver import (
AudioPlayBytes,

Here I ask you to switch the query to “TtsSayFinished” (according to Reference) or “AudioPlayFinished” (for testing the “Audio Playing”, if it defined). At least for the web api. The normal MQTT API is not affected by this.

https://github.com/rhasspy/rhasspy-server-hermes/blob/master/rhasspyserver_hermes/main.py#L1688f.
https://github.com/rhasspy/rhasspy-server-hermes/blob/master/rhasspyserver_hermes/init.py#L621ff.