Could you guide me or provide pointers (URL links) to tutorials or examples on how to use Rhasspy 3 without Home Assistant (HA)?
Using Rhasspy 3 seems quite straightforward with Docker and the Wyoming protocol, but I must admit that I don't quite understand how to use this new "Wyoming" protocol in a non-Home Assistant environment.
If you have any information on using Wyoming directly or through Node-RED, for instance, as I do not use Home Assistant, I would be very interested.
Additionally, if you know how to connect Docker containers like Piper or Faster-Whisper to an MQTT server, I would appreciate any simple solutions you can suggest.
One has to write the client that understands the events generated by the components. This is what the Home Assistant Assist feature does.
Another one you can look at is the Wyoming satellite; the event processor is in satellite.py.
What do you mean by "connect to MQTT"? The components send events over a TCP stream (socket.io-like), so one would have to create a bridge. Audio-chunk events over MQTT don't make sense, though, and would probably be too slow as well.
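If you do want the non-audio events on MQTT, the natural spot is the satellite's event forwarding (everything except audio gets forwarded to an "event service", see forward_event in satellite.py). Here is a rough, untested sketch of such a bridge, assuming the `wyoming` and `paho-mqtt` Python packages; the port (10700) and topic layout are made up:

```python
# Rough sketch, untested: a Wyoming "event service" that republishes
# non-audio events to MQTT. Assumes the `wyoming` and `paho-mqtt`
# packages; the port (10700) and topic layout are invented.
import asyncio
import json

import paho.mqtt.publish as publish
from wyoming.audio import AudioChunk
from wyoming.event import Event
from wyoming.server import AsyncEventHandler, AsyncServer

MQTT_HOST = "localhost"          # your broker
TOPIC_PREFIX = "wyoming/events"  # invented topic layout


class MqttBridgeHandler(AsyncEventHandler):
    """Republishes each forwarded Wyoming event as JSON on MQTT."""

    async def handle_event(self, event: Event) -> bool:
        if AudioChunk.is_type(event.type):
            return True  # skip raw audio: too heavy/slow for MQTT
        publish.single(
            f"{TOPIC_PREFIX}/{event.type}",
            json.dumps(event.data or {}),
            hostname=MQTT_HOST,
        )
        return True  # keep the connection open


async def main() -> None:
    # Point the satellite's event forwarding at this URI
    server = AsyncServer.from_uri("tcp://0.0.0.0:10700")
    await server.run(MqttBridgeHandler)


if __name__ == "__main__":
    asyncio.run(main())
```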
All of the components hear all of the events, and only process the ones they want.
VAD hears the audio events and emits an "I hear a voice" event.
HOTWORD hears the audio events, waits until VAD's event arrives before processing for the hotword, and then sends a "hotword X detected" event.
The client doesn't send audio to the ASR until the hotword event is heard.
Etc., etc.
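The hotword stage on its own looks roughly like this (untested sketch; assumes the `wyoming` Python package and wyoming-openwakeword's default port 10400):

```python
# Untested sketch: stream arecord audio to a Wyoming wake-word service
# and wait for its Detection event. Port 10400 is openwakeword's default.
import asyncio
import subprocess

from wyoming.audio import AudioChunk, AudioStart
from wyoming.client import AsyncTcpClient
from wyoming.wake import Detect, Detection

RATE, WIDTH, CHANNELS = 16000, 2, 1
CHUNK_BYTES = 2048


async def stream_mic(client: AsyncTcpClient, mic: subprocess.Popen) -> None:
    loop = asyncio.get_running_loop()
    while True:
        # arecord streams raw PCM non-stop; read it off-thread
        chunk = await loop.run_in_executor(None, mic.stdout.read, CHUNK_BYTES)
        if not chunk:
            break
        await client.write_event(AudioChunk(RATE, WIDTH, CHANNELS, chunk).event())


async def wait_for_hotword() -> str:
    mic = subprocess.Popen(
        ["arecord", "-r", str(RATE), "-c", str(CHANNELS), "-f", "S16_LE", "-t", "raw"],
        stdout=subprocess.PIPE,
    )
    client = AsyncTcpClient("localhost", 10400)
    await client.connect()
    await client.write_event(Detect().event())  # any wake word
    await client.write_event(AudioStart(RATE, WIDTH, CHANNELS).event())
    streamer = asyncio.create_task(stream_mic(client, mic))
    try:
        while True:
            event = await client.read_event()
            if event is None:
                raise ConnectionError("wake service disconnected")
            if Detection.is_type(event.type):
                return Detection.from_event(event).name  # "hotword X detected"
    finally:
        streamer.cancel()
        mic.terminate()
        await client.disconnect()


if __name__ == "__main__":
    print("detected:", asyncio.run(wait_for_hotword()))
```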
Each component registers itself when used, so the client can query its features and decide what to do.
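That registration is the describe/info handshake. Querying a service looks something like this (untested sketch; the Info field names are taken from the `wyoming` package's dataclasses, so double-check them):

```python
# Untested sketch: ask a Wyoming service what it can do.
import asyncio

from wyoming.client import AsyncTcpClient
from wyoming.info import Describe, Info


async def query(host: str, port: int) -> None:
    client = AsyncTcpClient(host, port)
    await client.connect()
    try:
        await client.write_event(Describe().event())
        while True:
            event = await client.read_event()
            if event is None:
                raise ConnectionError("service disconnected")
            if Info.is_type(event.type):
                info = Info.from_event(event)
                for asr in info.asr:  # ASR programs and their models
                    print("ASR:", asr.name, [m.name for m in asr.models])
                for tts in info.tts:  # TTS programs and their voices
                    print("TTS:", tts.name, [v.name for v in tts.voices])
                return
    finally:
        await client.disconnect()


asyncio.run(query("localhost", 10300))  # e.g. the faster-whisper container
```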
With Docker, you are just running a bunch of services, each defining the TCP URI to connect to.
They don't know about each other at the Docker level.
The client connects to whatever it wants; it reads from one and writes to another, and the services write back their responses.
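For example, a compose file along these lines (image names from the Rhasspy Docker Hub repos; check the current docs for exact tags, model names, and arguments) just exposes two TCP ports for a client to connect to:

```yaml
# Sketch only; model/voice names and paths are examples
services:
  whisper:
    image: rhasspy/wyoming-whisper
    command: --model tiny-int8 --language en
    volumes:
      - ./whisper-data:/data
    ports:
      - 10300:10300
  piper:
    image: rhasspy/wyoming-piper
    command: --voice en_US-lessac-medium
    volumes:
      - ./piper-data:/data
    ports:
      - 10200:10200
```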
The audio input service reads from the audio program (usually arecord, which streams audio buffers non-stop) and writes audio-chunk events on its TCP pipe…
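On the ASR side the round trip is write-everything-then-read. Roughly (untested sketch; the host, file name, and port are placeholders, 10300 being faster-whisper's default):

```python
# Untested sketch: send a WAV file to a Wyoming ASR service and
# read back the transcript event.
import asyncio
import wave

from wyoming.asr import Transcribe, Transcript
from wyoming.audio import AudioChunk, AudioStart, AudioStop
from wyoming.client import AsyncTcpClient


async def transcribe(path: str) -> str:
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        width = wav.getsampwidth()
        channels = wav.getnchannels()
        audio = wav.readframes(wav.getnframes())

    client = AsyncTcpClient("localhost", 10300)
    await client.connect()
    try:
        await client.write_event(Transcribe().event())
        await client.write_event(AudioStart(rate=rate, width=width, channels=channels).event())
        step = rate * width * channels  # ~1 second per chunk
        for i in range(0, len(audio), step):
            await client.write_event(
                AudioChunk(rate=rate, width=width, channels=channels,
                           audio=audio[i:i + step]).event()
            )
        await client.write_event(AudioStop().event())  # tells the ASR to finalize

        while True:
            event = await client.read_event()
            if event is None:
                raise ConnectionError("ASR service disconnected")
            if Transcript.is_type(event.type):
                return Transcript.from_event(event).text
    finally:
        await client.disconnect()


if __name__ == "__main__":
    print(asyncio.run(transcribe("test.wav")))
```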
Thank you, @rexxdad, for your answers and suggestions. I've only skimmed the explanations so far, so I'll take a closer look at satellite.py.
What I’m still missing in my understanding is where the Wyoming hub is located. I’m probably writing this post too quickly, so I’ll take the time to read the documentation more thoroughly later.
But basically, the satellite listens and sends audio to be transcribed… so how do you retrieve the result? And how does wyoming-piper know that the satellite exists, so it can play the sound?
From what I understand, it’s Home Assistant that controls and routes all the data streams. What I don’t get is how to organize everything without having a Home Assistant instance running.
In the satellite, the event handler for events coming back from the server:
```python
async def event_from_server(self, event: Event) -> None:
    """Called when an event is received from the server."""
    if AudioChunk.is_type(event.type):
        # TTS audio
        await self.event_to_snd(event)
    elif AudioStart.is_type(event.type):
        # TTS started
        await self.event_to_snd(event)
        await self.trigger_tts_start()
    elif AudioStop.is_type(event.type):
        # TTS stopped
        await self.event_to_snd(event)
        await self.trigger_tts_stop()
    elif Detect.is_type(event.type):
        # Wake word detection started
        await self.trigger_detect()
    elif Detection.is_type(event.type):
        # Wake word detected
        _LOGGER.debug("Wake word detected")
        await self.trigger_detection(Detection.from_event(event))
    elif VoiceStarted.is_type(event.type):
        # STT start
        await self.trigger_stt_start()
    elif VoiceStopped.is_type(event.type):
        # STT stop
        await self.trigger_stt_stop()
    elif Transcript.is_type(event.type):
        # STT text
        _LOGGER.debug(event)
        await self.trigger_transcript(Transcript.from_event(event))
    elif Synthesize.is_type(event.type):
        # TTS request
        _LOGGER.debug(event)
        await self.trigger_synthesize(Synthesize.from_event(event))
    elif Error.is_type(event.type):
        _LOGGER.warning(event)
        await self.trigger_error(Error.from_event(event))

    # Forward everything except audio to event service
    if not AudioChunk.is_type(event.type):
        await self.forward_event(event)
```
There is some config that says whether TTS should be processed here at the satellite.

Wyoming hub? The Wyoming core runs everywhere… it handles the communication to/from the other side.
The satellite connects to one 'server' that has shared services running, maybe ASR and TTS.
Audio in, text out. If TTS is needed, then the synthesize service needs to process the text to create WAV data to be played.
The ASR sends back the text 'transcript'.
The client decides it needs TTS here and calls the synthesize service… I assume it can tell what is local vs. on the server.
I haven't got that far yet in my analysis.
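That said, the synthesize round trip should look roughly like this (untested sketch; piper's default port 10200, and 16-bit output for aplay, are assumptions):

```python
# Untested sketch: send a `synthesize` event to a Wyoming TTS service
# and pipe the returned audio-start/audio-chunk/audio-stop stream into aplay.
import asyncio
import subprocess

from wyoming.audio import AudioChunk, AudioStart, AudioStop
from wyoming.client import AsyncTcpClient
from wyoming.tts import Synthesize


async def speak(text: str) -> None:
    client = AsyncTcpClient("localhost", 10200)  # piper's default port
    await client.connect()
    await client.write_event(Synthesize(text=text).event())

    player = None
    try:
        while True:
            event = await client.read_event()
            if event is None:
                break
            if AudioStart.is_type(event.type):
                start = AudioStart.from_event(event)
                # Start aplay with the announced format (S16_LE assumes width 2)
                player = subprocess.Popen(
                    ["aplay", "-r", str(start.rate), "-c", str(start.channels),
                     "-f", "S16_LE", "-t", "raw"],
                    stdin=subprocess.PIPE,
                )
            elif AudioChunk.is_type(event.type) and player is not None:
                player.stdin.write(AudioChunk.from_event(event).audio)
            elif AudioStop.is_type(event.type):
                break
    finally:
        if player is not None:
            player.stdin.close()
            player.wait()
        await client.disconnect()


asyncio.run(speak("Hello from Wyoming"))
```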