Can't get TTS to work on Satellite

I have a setup with the Rhasspy add-on on Home Assistant and a satellite Rhasspy with a mic and speaker. Intents are handled by Home Assistant and are working, and I can see the response coming back to my satellite. I want TTS running on the satellite, but I am not hearing any audio, even though audio works when I say the wake word. I see AudioToggleOff in the log; I am assuming this is the culprit.

[DEBUG:2021-03-25 22:02:14,849] rhasspydialogue_hermes: Recognized NluIntent(input='what time is it', intent=Intent(intent_name='GetTime', confidence_score=1.0), site_id='Living_room', id=None, slots=[], session_id='Living_room-blueberry_raspberry-pi-380ac37b-93a8-4e62-bba3-486acc77bed8', custom_data=None, asr_tokens=[[AsrToken(value='what', confidence=1.0, range_start=0, range_end=4, time=None), AsrToken(value='time', confidence=1.0, range_start=5, range_end=9, time=None), AsrToken(value='is', confidence=1.0, range_start=10, range_end=12, time=None), AsrToken(value='it', confidence=1.0, range_start=13, range_end=15, time=None)]], asr_confidence=None, raw_input='what time is it', wakeword_id='blueberry_raspberry-pi', lang=None)
[DEBUG:2021-03-25 22:02:14,866] rhasspyserver_hermes: -> AudioToggleOff(site_id='Living_room')
[DEBUG:2021-03-25 22:02:14,868] rhasspyserver_hermes: Publishing 25 bytes(s) to hermes/audioServer/toggleOff
[DEBUG:2021-03-25 22:02:14,871] rhasspyserver_hermes: -> TtsSay(text='Time Intent worked from Home Assistant', site_id='Living_room', lang=None, id='5e7a4104-8335-4192-8a70-6fea3710af88', session_id='', volume=1.0)
[DEBUG:2021-03-25 22:02:14,875] rhasspyspeakers_cli_hermes: <- AudioToggleOff(site_id='Living_room')
[DEBUG:2021-03-25 22:02:14,877] rhasspyserver_hermes: Publishing 167 bytes(s) to hermes/tts/say
[DEBUG:2021-03-25 22:02:14,877] rhasspyspeakers_cli_hermes: Disabled audio
[ERROR:2021-03-25 22:02:37,524] rhasspydialogue_hermes: Session timed out for site Living_room: Living_room-blueberry_raspberry-pi-380ac37b-93a8-4e62-bba3-486acc77bed8
[DEBUG:2021-03-25 22:02:37,527] rhasspydialogue_hermes: -> AsrStopListening(site_id='Living_room', session_id='Living_room-blueberry_raspberry-pi-380ac37b-93a8-4e62-bba3-486acc77bed8')
[DEBUG:2021-03-25 22:02:37,528] rhasspydialogue_hermes: Publishing 113 bytes(s) to hermes/asr/stopListening
[DEBUG:2021-03-25 22:02:37,536] rhasspydialogue_hermes: -> DialogueSessionEnded(termination=DialogueSessionTermination(reason=<DialogueSessionTerminationReason.TIMEOUT: 'timeout'>), session_id='Living_room-blueberry_raspberry-pi-380ac37b-93a8-4e62-bba3-486acc77bed8', site_id='Living_room', custom_data='blueberry_raspberry-pi')
[DEBUG:2021-03-25 22:02:37,536] rhasspydialogue_hermes: Publishing 191 bytes(s) to hermes/dialogueManager/sessionEnded
[DEBUG:2021-03-25 22:02:37,542] rhasspydialogue_hermes: -> HotwordToggleOn(site_id='Living_room', reason=<HotwordToggleReason.DIALOGUE_SESSION: 'dialogueSession'>)
[DEBUG:2021-03-25 22:02:37,543] rhasspydialogue_hermes: Publishing 54 bytes(s) to hermes/hotword/toggleOn
[DEBUG:2021-03-25 22:02:37,562] rhasspyremote_http_hermes: <- AsrStopListening(site_id='Living_room', session_id='Living_room-blueberry_raspberry-pi-380ac37b-93a8-4e62-bba3-486acc77bed8')
[DEBUG:2021-03-25 22:02:37,563] rhasspyremote_http_hermes: <- AsrStopListening(site_id='Living_room', session_id='Living_room-blueberry_raspberry-pi-380ac37b-93a8-4e62-bba3-486acc77bed8')
[WARNING:2021-03-25 22:02:37,564] rhasspyremote_http_hermes: Session not found for Living_room-blueberry_raspberry-pi-380ac37b-93a8-4e62-bba3-486acc77bed8
[DEBUG:2021-03-25 22:02:37,568] rhasspywake_porcupine_hermes: <- HotwordToggleOn(site_id='Living_room', reason=<HotwordToggleReason.DIALOGUE_SESSION: 'dialogueSession'>)
[DEBUG:2021-03-25 22:02:37,569] rhasspywake_porcupine_hermes: Enabled
[DEBUG:2021-03-25 22:02:37,579] rhasspywake_porcupine_hermes: Receiving audio
[DEBUG:2021-03-25 22:02:44,911] rhasspyserver_hermes: -> AudioToggleOn(site_id='Living_room')
[DEBUG:2021-03-25 22:02:44,912] rhasspyserver_hermes: Publishing 25 bytes(s) to hermes/audioServer/toggleOn
[ERROR:2021-03-25 22:02:44,916] rhasspyserver_hermes: 
Traceback (most recent call last):
  File "/usr/lib/rhasspy/usr/local/lib/python3.7/site-packages/quart/app.py", line 1821, in full_dispatch_request
    result = await self.dispatch_request(request_context)
  File "/usr/lib/rhasspy/usr/local/lib/python3.7/site-packages/quart/app.py", line 1869, in dispatch_request
    return await handler(**request_.view_args)
  File "/usr/lib/rhasspy/rhasspy-server-hermes/rhasspyserver_hermes/__main__.py", line 1700, in api_text_to_speech
    results = await asyncio.gather(*aws)
  File "/usr/lib/rhasspy/rhasspy-server-hermes/rhasspyserver_hermes/__main__.py", line 1686, in speak
    volume=volume,
  File "/usr/lib/rhasspy/rhasspy-server-hermes/rhasspyserver_hermes/__init__.py", line 599, in speak_sentence
    handle_finished(), messages, message_types
  File "/usr/lib/rhasspy/rhasspy-server-hermes/rhasspyserver_hermes/__init__.py", line 971, in publish_wait
    result_awaitable, timeout=timeout_seconds
  File "/usr/lib/rhasspy/usr/local/lib/python3.7/asyncio/tasks.py", line 449, in wait_for
    raise futures.TimeoutError()
concurrent.futures._base.TimeoutError
[DEBUG:2021-03-25 22:02:44,921] rhasspyspeakers_cli_hermes: <- AudioToggleOn(site_id='Living_room')
[DEBUG:2021-03-25 22:02:44,923] rhasspyspeakers_cli_hermes: Enabled audio

Living_room is the satellite RPi.

How are you planning to send TTS to the Satellite?
Try setting it to Hermes MQTT if you are processing all of it on the Server and want to send it to the Satellite.

Furthermore, for the Satellite to play TTS you need the Audio Playing component set up, as it will basically just play a sound file through the sound card. Audio Recording is not required for TTS.

I set up my master Rhasspy to send TTS over HTTP to my satellite Rhasspy. You can see this happening in the log using NanoTTS, which creates a WAV file. The problem is that I don’t hear that sound, and I see AudioToggleOff in the log.
Audio playing on the satellite is working fine and is set up using aplay. When I say the wake word I can hear the beep acknowledging it, so it’s working.
I enabled audio recording on the satellite so it would listen to commands.
I am not using MQTT; are you suggesting that I enable a shared MQTT broker?

Anyone? I am stuck.

Hi B0ndo2,
you’re not alone. I’m trying to get a server (Pi 3B+) and a satellite (ReSpeaker Core v2) running.
I’m also using the HTTP connection between server and satellite, and it’s maybe the same as in your
configuration. Wake word on the satellite works, command transfer to the server and analysis are OK, but TTS
back to the satellite throws a timeout.

best regards
Luke

Yes, this is what is happening with the above configuration. Since I posted this I made some changes (on the base server, TTS is local with NanoTTS and audio playing is remote), still not working: I am now processing TTS on the base server and posting a WAV file to /api/play-wav on the satellite, but I'm getting a new error that it doesn’t like the WAV file.
Can you try it?

What is your Home Assistant config?

You mean the Rhasspy add-on in Home Assistant? It is in the screenshot in the OP. Intents in HA are configured and working fine, but the problem is the feedback using Rhasspy TTS.

No, I was asking how Home Assistant is responding to the intent.
You probably have some command posting the text as payload to a URL?

A bit hard to see (for me at least), but the first screenshot is the server, right?
The second is the satellite, correct?
The log is from the satellite?
Which settings are in the “Rhasspy Text-to-Speech URL” fields?

Assuming on the server you have the url set to the satellite, I think the setting on the satellite should be Google Wavenet or Espeak or something.

That way, the server posts the received text to the HTTP endpoint (the satellite), and the satellite should process the text and play it.

I’ll check the docs on this as well :slight_smile:

I think I’m incorrect: Remote HTTP sends text to the endpoint and expects audio back.
So when you want to play the audio on your sat, the audio should be sent to the sat.

I see you have the Text to Speech set to Disabled; did you also try setting it to Remote HTTP?
Reading the docs, I expect the text received on the satellite to be sent to the server, and the audio coming back to be played.
The setting on the server should then be set to Espeak or any of the other systems.

I have a spare Pi, so I will try this as well. It is a bit unclear what is actually happening, so I might be able to find the correct settings :slight_smile:
I am using an ESP32 as a satellite now with Hermes and it’s working fine.

Intents are working fine in HA; it’s the TTS that is not working. Right now I have Google TTS in my HA configuration and a VLC media player, which I use for the feedback from the intent in HA; here is the config in HA. The speech part is being sent to Rhasspy, and this is where the error happens.

intent:
intent_script:
  GetTime:  
    speech:
      text: It's {{ states.sensor.time }}
    action:
      service: tts.google_say
      data:
        entity_id: media_player.pi
        message: It's {{ now().strftime('%I:%M %p') }}

The first screenshot is the server running in HA as an add-on, with TTS enabled as Remote HTTP; the error in the OP is what I get. I tried changing TTS on the master to NanoTTS and playback to Remote HTTP (using the API endpoint on the satellite), and that gives another error on my satellite (complaining that the WAV file is bad). The master (server) has no audio output capability, and I need the feedback to be played on the satellite that sent the command.
What I tried (and failed):

  1. The satellite captures the command while STT and intent recognition happen on the master (works fine). The master receives the speech, processes it, and sends the intent to HA (works fine). HA completes the intent action and sends the text back to the master; the master then uses Remote HTTP to the satellite to process TTS (fails as in the log in the OP).

  2. The satellite captures the command while STT and intent recognition happen on the master (works fine). The master receives the speech, processes it, and sends the intent to HA (works fine). HA completes the intent action and sends the text back to the master; the master does TTS, and the audio playback is sent via Remote HTTP to the satellite (fails at the satellite, which complains about an invalid WAV file).

Ok, rather than trying this I might have another approach.
I have this working with events

  • remove the speech: from the config.

  • add this as another action:

      service: rest_command.rhasspy_speak
      data_template:
        payload: It's {{ states.sensor.time }}
    

Add this in your configuration.yaml:

rest_command:
  rhasspy_speak:
    url: 'http://<satellite_ip>:12101/api/text-to-speech'
    method: 'POST'
    payload: '{{ payload }}' 
    content_type: text/plain

  • set the text to speech on the sat to Espeak or some other system (able to process the text)

I am using something similar; you can dynamically set the satellite_ip, but please try this.

The theory is:

  • the text is sent as payload to the text-to-speech engine on the satellite.
  • the sat plays the audio generated from its engine
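
Putting those pieces together, the intent side might look like this (a sketch only; rhasspy_speak and the payload are the names used in the snippets above):

```yaml
# Sketch: GetTime intent firing the rest_command defined in configuration.yaml.
# Names (rhasspy_speak, sensor.time) are taken from the snippets in this thread.
intent_script:
  GetTime:
    action:
      - service: rest_command.rhasspy_speak
        data_template:
          payload: "It's {{ states.sensor.time }}"
```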

From your idea I think option 1 should work, but it clearly does not.

This suggestion was one of the ideas I wanted to try; I will try it tonight or tomorrow at the latest. The only drawback is that it adds more configuration in HA that I would rather keep in Rhasspy.
Do you have any idea why the second option is failing?

The second option seems like a defect to me. Both are defects, I think.

There is another drawback with your setup, and that is when you want to install another satellite.
With both options, that is not possible, because you can enter only one endpoint.

However, with the suggestion I made, you can dynamically set the host like this:

rest_command:
  rhasspy_speak:
    url: 'http://{{ siteid }}:12101/api/text-to-speech'
    method: 'POST'
    payload: '{{ payload }}' 
    content_type: text/plain

Change the added action to:

  service: rest_command.rhasspy_speak
  data_template:
    payload: It's {{ states.sensor.time }}
    siteid: '{{trigger.event.data._intent.siteId}}'

I am using events and automations, so I have trigger.event.data. For intents that should be something different, but I do not know the format yet.

Makes sense. I read somewhere that the way to send the speech to the correct satellite (with more than one) is to use a shared MQTT broker and set the siteId, but I don’t have a broker in my setup.

I tested it without the siteid (using the IP address) and it works. I am not sure how siteid will work, since this is the name I set in Rhasspy and not the IP address (do I have to set up a DNS name for it?).

Yes, you would need to keep the host in sync with the siteId.

But in my opinion, working with events is much more flexible.
You can use this in an automation
siteid: '{{trigger.event.data._intent.siteId}}'
to get the siteid.
You can then use this:
/api/text-to-speech?siteId={{ siteid }}
to get Rhasspy to use the correct siteId.
When you have multiple satellites, that is very useful.
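
As a sketch, an event-based automation tying this together could look like the following. The event name rhasspy_GetTime is an assumption (Rhasspy prefixes intent events with rhasspy_ when intent handling is set to send events); adjust it to whatever event your setup fires:

```yaml
# Hypothetical automation: speak the answer on whichever satellite asked.
# rhasspy_GetTime is an assumed event name; _intent.siteId comes from the event data.
automation:
  - alias: "Answer GetTime on the requesting satellite"
    trigger:
      - platform: event
        event_type: rhasspy_GetTime
    action:
      - service: rest_command.rhasspy_speak
        data_template:
          payload: "It's {{ states.sensor.time }}"
          siteid: "{{ trigger.event.data._intent.siteId }}"
```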

I do not know how you can get to the intent with templates to get the siteId.

I haven’t played with events, so I will have to look into it. Now that I have TTS working this way, I am not sure what I am gaining (or losing) by not using the built-in way of Rhasspy. To be honest, NanoTTS sounds horrible compared to using the free Google TTS (which uses Google Translate).

Using text-to-speech?siteId={{ siteid }} is also Rhasspy built-in :wink:

The whole url in the rhasspy_speak will be:
url: 'http://<serverip>:12101/api/text-to-speech?siteId={{ siteid }}'

When adding a new satellite, no config change is needed.
But if it is working now, I would only change it if you need extra satellites or just want to experiment :slight_smile:
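
For reference, the whole rest_command with that URL would then be (a sketch; <serverip> is a placeholder, and routing by siteId assumes the server and satellites share a Hermes MQTT broker, as discussed above):

```yaml
# One rest_command pointed at the server; the server forwards playback to the
# satellite named by siteId.
rest_command:
  rhasspy_speak:
    url: 'http://<serverip>:12101/api/text-to-speech?siteId={{ siteid }}'
    method: 'POST'
    payload: '{{ payload }}'
    content_type: text/plain
```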

Also, for Rhasspy TTS you can set it to Google Wavenet; much better and, in my opinion, the best.
It caches spoken text, so it will play the same sentence from a cached WAV file.