Rhasspy3 without Home Assistant

Hello,

Could you guide me or provide pointers (URL links) to tutorials or examples on how to use Rhasspy 3 without Home Assistant (HA)?
Using Rhasspy 3 seems quite simplified with Docker and the Wyoming protocol, but I must admit that I don’t quite understand how to simply use this new “Wyoming” protocol in a non-Home Assistant environment.

If you have any information on using Wyoming directly or through Node-RED, for instance, as I do not use Home Assistant, I would be very interested.
Additionally, if you know how to connect Docker containers like Piper or Faster-Whisper to an MQTT server, I would appreciate any simple solutions you can suggest.

Thank you very much.

one has to write the client that understands the events generated by the components.

this is the home assistant assist feature

another you can look at is the wyoming satellite.

the event processor is in satellite.py

what do you mean by connect to mqtt?
they send events over a tcp stream, socket.io like, so one would have to create a bridge. audio chunk events over mqtt don’t make sense tho. probably too slow as well
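
to make that concrete, here is a minimal sketch (not Rhasspy's own code) of reading one of those tcp streams, assuming the wyoming package from PyPI and a service on localhost:10400 (both assumptions). a bridge would just republish event.type / event.data somewhere else and drop the audio payloads.

import asyncio

from wyoming.client import AsyncTcpClient


async def main() -> None:
    # assumption: some Wyoming service (wake word, ASR, ...) listens here
    client = AsyncTcpClient("localhost", 10400)
    await client.connect()
    try:
        while True:
            event = await client.read_event()
            if event is None:  # service closed the connection
                break
            # every message is an Event: a type string, a small JSON dict,
            # and an optional binary payload (audio). a bridge could publish
            # event.type / event.data to MQTT here and skip audio payloads.
            print(event.type, event.data)
    finally:
        await client.disconnect()


asyncio.run(main())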

all of the components hear all the events, and only process the ones they want.

VAD hears the audio events, and emits an ‘I hear a voice’ event
HOTWORD hears the audio events, waits until VAD’s event arrives before processing for the hotword, and then sends HOTWORD x detected
the client doesn’t send audio to the ASR until the hotword event is heard

etc etc

each component registers itself when used, so the client can query features and decide what to do.

no VAD? then don’t wait for its event, etc
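
as a hedged sketch of that feature query (the Describe/Info events from the wyoming PyPI package; host/port are assumptions):

import asyncio

from wyoming.client import AsyncTcpClient
from wyoming.info import Describe, Info


async def describe(host: str, port: int) -> None:
    # ask one service what it can do; it answers a Describe with an Info
    client = AsyncTcpClient(host, port)
    await client.connect()
    try:
        await client.write_event(Describe().event())
        while True:
            event = await client.read_event()
            if event is None:
                break
            if Info.is_type(event.type):
                # event.data lists the asr/tts/wake/snd/... programs this
                # service provides; the client builds its pipeline from that
                print(event.data)
                break
    finally:
        await client.disconnect()


asyncio.run(describe("localhost", 10200))  # e.g. a piper TTS container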

with docker, you are just running a bunch of services which define their tcp url to connect with

they don’t know about each other at the docker level

the client connects to whichever services it wants. it reads from one and writes to another. the services write back their responses

the audio input service reads from the audio program (usually arecord which is streaming audio buffers non stop), and writes audio_chunk events on its tcp pipe…
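
roughly like this sketch (not the real wyoming-mic-external code; device, port and chunk size are assumptions, and it leans on the wyoming package's helpers):

import asyncio

from wyoming.audio import AudioChunk
from wyoming.event import async_write_event

RATE, WIDTH, CHANNELS = 16000, 2, 1
SAMPLES_PER_CHUNK = 1024


async def handle_client(reader, writer) -> None:
    # spawn arecord streaming raw PCM non stop
    arecord = await asyncio.create_subprocess_exec(
        "arecord", "-D", "plughw:1,0", "-r", str(RATE), "-c", str(CHANNELS),
        "-f", "S16_LE", "-t", "raw",
        stdout=asyncio.subprocess.PIPE,
    )
    try:
        while True:
            # cut the stream into buffers and ship each as an audio-chunk event
            audio = await arecord.stdout.readexactly(SAMPLES_PER_CHUNK * WIDTH)
            chunk = AudioChunk(rate=RATE, width=WIDTH, channels=CHANNELS, audio=audio)
            await async_write_event(chunk.event(), writer)
    finally:
        arecord.terminate()  # stop the recorder when the client goes away
        writer.close()


async def main() -> None:
    server = await asyncio.start_server(handle_client, "0.0.0.0", 10600)
    async with server:
        await server.serve_forever()


asyncio.run(main())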

Hello,

Thank you, @rexxdad, for your answers and suggestions. I’ll take a look at satellite.py, as I’ve quickly gone through the explanations.

What I’m still missing in my understanding is where the Wyoming hub is located. I’m probably writing this post too quickly, so I’ll take the time to read the documentation more thoroughly later.

But basically, the satellite listens and sends an audio file to be transcribed… but how do you retrieve the information? How does Wyoming-piper know that the satellite exists to play the sound?

From what I understand, it’s Home Assistant that controls and routes all the data streams. What I don’t get is how to organize everything without having a Home Assistant instance running.

in satellite.py, the event handler for events back from the server:

    async def event_from_server(self, event: Event) -> None:
        """Called when an event is received from the server."""
        if AudioChunk.is_type(event.type):
            # TTS audio
            await self.event_to_snd(event)
        elif AudioStart.is_type(event.type):
            # TTS started
            await self.event_to_snd(event)
            await self.trigger_tts_start()
        elif AudioStop.is_type(event.type):
            # TTS stopped
            await self.event_to_snd(event)
            await self.trigger_tts_stop()
        elif Detect.is_type(event.type):
            # Wake word detection started
            await self.trigger_detect()
        elif Detection.is_type(event.type):
            # Wake word detected
            _LOGGER.debug("Wake word detected")
            await self.trigger_detection(Detection.from_event(event))
        elif VoiceStarted.is_type(event.type):
            # STT start
            await self.trigger_stt_start()
        elif VoiceStopped.is_type(event.type):
            # STT stop
            await self.trigger_stt_stop()
        elif Transcript.is_type(event.type):
            # STT text
            _LOGGER.debug(event)
            await self.trigger_transcript(Transcript.from_event(event))
        elif Synthesize.is_type(event.type):
            # TTS request
            _LOGGER.debug(event)
            await self.trigger_synthesize(Synthesize.from_event(event))
        elif Error.is_type(event.type):
            _LOGGER.warning(event)
            await self.trigger_error(Error.from_event(event))

        # Forward everything except audio to event service
        if not AudioChunk.is_type(event.type):
            await self.forward_event(event)

there is some config that says whether TTS should be processed here at the satellite
wyoming hub? the wyoming core runs everywhere…it handles the communication to/from the other side

sat connects to one ‘server’ that has shared services running, maybe ASR and TTS
audio in, text out. if TTS is needed, then the synth service needs to process it to create wav data to be played

ASR sends back the text ‘transcript’
client decides it needs TTS here, calls the synthesize service… I assume it can tell what is local vs server
I haven’t got there yet in my analysis
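
as a hedged sketch of that round trip (ports are assumptions, the wyoming PyPI package does the event plumbing): send a synthesize event to the TTS service and forward the audio events it returns to the snd service.

import asyncio

from wyoming.audio import AudioChunk, AudioStart, AudioStop
from wyoming.client import AsyncTcpClient
from wyoming.tts import Synthesize


async def speak(text: str) -> None:
    tts = AsyncTcpClient("localhost", 10200)   # e.g. wyoming-piper
    snd = AsyncTcpClient("localhost", 10601)   # e.g. a wyoming snd service
    await tts.connect()
    await snd.connect()
    try:
        await tts.write_event(Synthesize(text=text).event())
        while True:
            event = await tts.read_event()
            if event is None:
                break
            # forward the synthesized audio straight to the player
            if AudioStart.is_type(event.type) or AudioChunk.is_type(event.type):
                await snd.write_event(event)
            elif AudioStop.is_type(event.type):
                await snd.write_event(event)
                break
    finally:
        await tts.disconnect()
        await snd.disconnect()


asyncio.run(speak("hello from wyoming without home assistant"))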

No, I can’t quite get it to work; I’m definitely missing some understanding here :wink: I have Wyoming for the wake word, TTS, and STT… I can get responses from TTS, even though I can’t easily read the return file. But getting all these components to work together without Home Assistant is where I’m stuck. There seems to be a missing routing layer between all these services or some kind of documentation for beginners like me on how to interact with all these elements. (It was much simpler with version 2.5.)
My satellite seems operational, but honestly, it’s not useful since it doesn’t know what to do. Aside from making curl requests and handling responses, I can’t get anything concrete. It’s really unclear how to interact without a central hub.

To summarize, I’m just missing the brick that Home Assistant operates. Do you know if there is a Wyoming module project for Node-RED? Such a module could replace the missing link.

wyoming nodered implementation ??

so, your app wants text commands.

so you start up the components that can deliver that
and listen to the asr events that give you the text.

and I’m guessing you want to respond to the user

so you have some text to send: send it to the TTS converter, and then send that audio to the player wherever the user is.

there is no magic here
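
for example, a small coordinator sketch (ports are assumptions, the wyoming PyPI package is assumed; a real one would use VAD to end the utterance instead of a fixed window): stream mic audio to the wake word service, and only after a detection stream it to the ASR and take the transcript text.

import asyncio
import time

from wyoming.asr import Transcribe, Transcript
from wyoming.audio import AudioChunk, AudioStart, AudioStop
from wyoming.client import AsyncTcpClient
from wyoming.wake import Detect, Detection


async def listen_once() -> str:
    mic = AsyncTcpClient("localhost", 10600)    # audio in
    wake = AsyncTcpClient("localhost", 10400)   # hotword
    asr = AsyncTcpClient("localhost", 10555)    # speech to text
    for client in (mic, wake, asr):
        await client.connect()

    detected = asyncio.Event()

    async def watch_wake() -> None:
        # the wake service only speaks up when it hears the hotword
        while not detected.is_set():
            event = await wake.read_event()
            if event is None:
                break
            if Detection.is_type(event.type):
                detected.set()

    wake_task = asyncio.create_task(watch_wake())
    try:
        # some wake services also want to be told to start detecting
        await wake.write_event(Detect().event())

        # phase 1: feed the wake word service until it fires
        first_chunk = None
        while not detected.is_set():
            event = await mic.read_event()
            if event is not None and AudioChunk.is_type(event.type):
                first_chunk = AudioChunk.from_event(event)
                await wake.write_event(event)
        if first_chunk is None:
            return ""

        # phase 2: feed the ASR for a few seconds, then ask for the text
        await asr.write_event(Transcribe().event())
        await asr.write_event(
            AudioStart(
                rate=first_chunk.rate,
                width=first_chunk.width,
                channels=first_chunk.channels,
            ).event()
        )
        end = time.monotonic() + 5.0
        while time.monotonic() < end:
            event = await mic.read_event()
            if event is not None and AudioChunk.is_type(event.type):
                await asr.write_event(event)
        await asr.write_event(AudioStop().event())

        while True:
            event = await asr.read_event()
            if event is None:
                return ""
            if Transcript.is_type(event.type):
                return Transcript.from_event(event).text
    finally:
        wake_task.cancel()
        for client in (mic, wake, asr):
            await client.disconnect()


print(asyncio.run(listen_once()))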

time to get moving on my app replacement with wyoming. I finally just got it myself!

each function is a little service server

mic outputs audio_chunks
vad takes in audio_chunks and signals voice or not
hwd takes in audio and signals hotword detected
asr takes in audio chunks and returns text
tts takes in text and produces audio chunks
snd takes in audio chunks and plays on speakers
the intent handlers take text and DO something with it (turn on/off a light; see the tiny dispatch sketch below)
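
a trivial, entirely made-up example of that last piece, just to show there is nothing special about it: take the transcript text and decide what to do with it.

def handle_intent(text: str) -> str:
    # hypothetical handler: map spoken text to an action and a reply
    words = text.lower()
    if "light" in words and "on" in words:
        # call your own automation layer / GPIO / API here
        return "turning the light on"
    if "light" in words and "off" in words:
        return "turning the light off"
    return "sorry, I did not understand"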

here is a good description of the services and their inputs and outputs

but each thing doesn’t make the decision about the next… some router/manager/collaboration component needs to do that
HA has one built in
Satellite is targeted at remote devices using central voice reco and surfacing text for intent handling

my current app (smart-mirror) uses snowboy for hw detection and google for speech reco
and has its own intent handlers
but snowboy is getting harder and harder to build, so there has to be a replacement

I think I can make a manager for all these with the satellite as an example.
just have to decide if I want to do a local docker type implementation, custom build the docker-compose file (and docker has to be installed locally)

or the native install type, where I spin up the components locally (so have to have python and the parts installed on local machine)

the connection and buffer handling is the same either way
and I can get TTS which I want to provide…
now I need another service which can inject the tts request without the upper app knowing where the requests are serviced. (it’s all written in nodejs, inside a browser)

I see that satellite really has only two local services, mic and speaker/snd
and everything else goes to the event service (server)
audio => wakeword triggers, stt-> Transcribe, (transcript is consumed on the HA side)
audio <= from tts (synthesize)

@pito so I have mine working… have a look

another dockerable container that does all the audio flows
mic -> hw -> asr (-> tts -> snd for testing)

my need is to get triggered when hotword is detected and get the text spoken after that
so I have a socketio server which the container connects to and will emit those events when they happen

today there is a background JS app started which does all that, spinning up the mic/snowboy/asr to emit on stdout, read by the node code…

no doc yet, and still a few bumps…

git clone https://github.com/sdetweil/wyoming-sonus (wyoming voice control to replace sonus under smart-mirror)
script/setup to get all the python stuff installed
example run

script/run --config=./sonushandler/config.json --debug

config.json points to where the other services are, and has settings to customize them

the docker-compose for starting this

version: "3.8"
services:
#
# this is audio in 
#  
  microphone:
    image: "$mic_image"
    ports:
      - "10600:10600"
    devices:
      - /dev/snd:/dev/snd
    group_add:
      - audio
    command:
      - "--device"
      - "$mic_device"
#
# this is audio out
#      
  playback:
    image: "$play_image"
    ports:
      - "10601:10601"
    devices:
      - /dev/snd:/dev/snd
    group_add:
      - audio
    command:
      - "--device"
      - "$play_device"

#
#  these are wakeword services, select one
#      
  openwakeword:
    image: "rhasspy/wyoming-openwakeword"
    ports: 
      - "10400:10400"
    volumes:  
      - $HOME/$SMPATH/services/data/ww/openwakeword:$wakeword_config_folder 
    command:   
      - "--preload-model"
      - "ok_nabu"
      - "-custom-model-dir"
      - "$wakeword_config_folder"
  snowboy:
    image: "rhasspy/wyoming-snowboy"
    ports: 
      - "10400:10400"
    volumes:  
      - $HOME/$SMPATH/services/data/ww/snowboy:$wakeword_config_folder  
    command:   
      - "--sensitivity"
      - "0.5"
      - "--data-dir"
      - "$wakeword_config_folder"    
      - "--debug"
  porcupine:
    image: "rhasspy/wyoming-porcupine1"  
    ports:
      - "10400:10400"
    volumes:  
      - $HOME/$SMPATH/services/data/ww/porcupine:$wakeword_config_folder
    command:   
      - "--sensitivity"
      - "0.5"
      - "--system"
      - "$SYSTEM_TYPE"
      - "--data-dir"
      - "$wakeword_config_folder"      
#
# these are speech to text (stt), select one
#      
  asr_google:
    image: $asr_google_image
    ports:
      - "10555:10555"
    volumes:
      - $HOME/$SMPATH/services/data/asr/google:/config

#
# these are text to speech  (tts)
# 
  tts_piper:
    image: rhasspy/wyoming-piper 
    ports:
      - "10200:10200" 
    volumes:
      - $HOME/$SMPATH/services/data/tts/piper:/data      
    command:
      - "--voice"
      - "en_US-lessac-medium" 
          
#
# this is the coordinator 
#
  sonus:
    image: $sonus_image 
    #ports:
    #  - "$sonus_app_port:8080"
    volumes:
      - $HOME/$SMPATH/services/data:$sonus_config_path
    command:
      - "--config"
      - "$sonus_config_path/$sonus_config_file"
      - "--socket_address"
      - "$sonus_app_address"
    depends_on:   
      # audio in
      - microphone
      # hotword
      - snowboy
      # speech to text
      - asr_google    
      # text to speech  
      - tts_piper
      # audio play out
      - playback      

and the .env file in the same folder that docker-compose will use

SMPATH=temp/sssss/smart-mirror
SYSTEM_TYPE=Linux
# play
#play_image=3399c39ea9a9
play_image=wyoming_snd_external:latest
play_device=plughw:0,8
# mic 
#mic_image=3a09c0bb8f9c
mic_image=rhasspy/wyoming-mic-external:latest
mic_device=plughw:1,0
asr_google_image=wyoming-google:1.0.0
sonus_image=rhasspy/wyoming-sonus:latest
sonus_app_address=http://sams:$sonus_app_port
sonus_config_path=/config
sonus_config_file=/config.json
wakeword_config_folder=/custom

I built all the components locally to test out and see what was going on… used the local image id or name (in the env file)
then the sonus: service is this github project
so I will have a nodejs app that configures the compose and config files
and talks to the sonus container to get notices of hotword and text
and also allow smart-mirror plugins to send text to be spoken over the socketio.

right now it speaks the text out of the asr after synthesizing the audio
you can compose up whichever services you want
docker-compose -f smart-mirror-services.yml up microphone playback asr_google tts_piper snowboy
while I have been developing…

the asr_google speech reco is here: https://github.com/sdetweil/wyoming-google

found a bug in mic (doesn’t stop the recorder when the client disconnects)
then when the client reconnects, oops, recorder won’t start (device in use). so you have to cycle the mic container between tests

I stole the task processors from satellite and added more for asr/tts and snd
all the workflow work is done in handle_event

I updated it to provide the connection to the outside app… changed to events over socketio

see the readme

I just added support for partial transcript results (my google stt service can do that, and my app uses it)