Recently I’ve been implementing a POC for interacting with Rhasspy using chat (XMPP) by configuring it as a satellite. It worked out surprisingly well when using Hermes (I could even send replies back via chat!!), although I ran into a small issue. I knew of two ways to approach this:
- send a DialogueStartSession and, after receiving DialogueSessionStarted, send an AsrTextCaptured with the text received from chat
- send a NluQuery with the text received from chat (emulating the text input in Rhasspy web UI)
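To make the two options concrete, here is a rough sketch of the MQTT messages each one would involve. This is not the code from my POC: the helper names are made up for illustration, the topic names and JSON keys follow the Hermes protocol as I understand it, and the contents of the `init` block are an assumption.

```python
import json

# Topic names as I understand the Hermes MQTT convention (assumption).
START_SESSION_TOPIC = "hermes/dialogueManager/startSession"
TEXT_CAPTURED_TOPIC = "hermes/asr/textCaptured"
NLU_QUERY_TOPIC = "hermes/nlu/query"


def start_session_message(site_id):
    """Option 1, step 1: ask the dialogue manager to open a session."""
    payload = {
        "siteId": site_id,
        # The init block is a guess at a minimal "action" session.
        "init": {"type": "action", "canBeEnqueued": False},
    }
    return START_SESSION_TOPIC, json.dumps(payload)


def text_captured_message(text, site_id, session_id):
    """Option 1, step 2: after sessionStarted arrives, inject the chat text
    as if the ASR had transcribed it."""
    payload = {
        "text": text,
        "likelihood": 1.0,   # chat text is exact, so full confidence
        "seconds": 0.0,      # no audio was transcribed
        "siteId": site_id,
        "sessionId": session_id,
        "wakewordId": None,  # no wake word was involved
    }
    return TEXT_CAPTURED_TOPIC, json.dumps(payload)


def nlu_query_message(text, site_id):
    """Option 2: send the chat text straight to intent recognition."""
    payload = {"input": text, "siteId": site_id}
    return NLU_QUERY_TOPIC, json.dumps(payload)
```

Each helper returns a (topic, payload) pair that can be handed to whatever MQTT client is in use. Note that option 2 carries no session id at all, which is why it loses session management.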
Using option 2 seemed easier, but I’d lose session management, which is essential when the interaction has multiple steps. So I went for option 1.
However, AsrTextCaptured triggers playback of the “recorded” wav file, which costs precious response time. I could have sent a fake PlayFinished to trick Rhasspy into going ahead, but I just wanted to skip that step, so I created a convention for this:
Whenever AsrTextCaptured.wakewordId is null, Rhasspy will assume that the request didn’t come from voice, so it will skip playing the “recorded” wav file.
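As a minimal sketch of that convention (this is not the actual change in my fork, and the function name is made up), the check on an incoming AsrTextCaptured payload boils down to:

```python
def should_play_recorded_sound(text_captured):
    """Return True only when the captured text came from voice.

    `text_captured` is the decoded JSON payload of an AsrTextCaptured
    message. A null or missing wakewordId is treated as "not from voice".
    """
    return text_captured.get("wakewordId") is not None
```

With this, a chat message sent with wakewordId set to null skips the sound, while a voice request that carries a wake word id still plays it.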
I implemented that in my fork. Currently, though, only the Google ASR module can fill the wakewordId field. Core developers, could this be a reasonable way to go? Can we use this or some other approach? Or am I going about this totally the wrong way?
The POC is based on AppDaemon. I’ll publish it soon to my public repository.
Related old topic: Chat support out of the box