Integration with Whisper and the Free Life Planner

Hi all,

Many thanks to you all for making an Alexa-killer. I’ve written a life planning system that uses a lot of temporal planning tech, NLU, LLMs, theorem proving, etc. to help people improve their real-world security posture: for instance, to help them save money, plan for financial contingencies, prep for emergencies, and maintain their homes (or get into housing if need be). I previously integrated Alexa, which took 4 days. I am now working up the courage to integrate Rhasspy, which is more impressive but also has a lot more moving parts, so I’m a bit overwhelmed.

I need large-vocabulary speech recognition, along the lines of Whisper, and I saw that there is a Whisper integration. I don’t want to use any grammar, or if I must, only the equivalent of “.*”. I then need to call the Free Life Planner’s API somehow with the transcribed voice command and return the answer to the TTS. Any conversational dialog features would also be a plus.

Are there any tutorials for Whisper integration with Rhasspy, or for something that does the same thing but is closer to real time? Is there anyone who could answer questions if I get stuck? I am sorry to be daunted. I personally prefer installing from a .deb, but I can use Docker; I guess I’m mainly afraid of API changes that would force me to rework everything. That has been the nice thing about Alexa: its API hasn’t really changed. However, my main FLP machine was recently compromised, so I air-gapped it, hence the need to move up the Rhasspy integration timeline.

Thanks again for making Rhasspy available,

Andrew

P.S. Here are links to the Free Life Planner in case anyone is interested (I’m working on releasing the latest update ASAP, but it will probably take a lot of time):

ggerganov/whisper.cpp on GitHub (a port of OpenAI's Whisper model in C/C++) looks possibly interesting/useful in the context of Rhasspy. See also yoheinakajima/babyagi, sw5park/Otto, and Torantulino/Auto-GPT on GitHub, as well as https://vicuna.lmsys.org/.

Good news: I finally attempted the integration with FLP, and it went really smoothly; it only took a few hours. I used Code Llama a lot, for example to discover that the open transcription mode exists. However, a couple of those hours were painful because I had trouble figuring out where to put the handler scripts that talk to FLP, for both intent recognition and intent handling. Since I’m using the Debian package, I had to put them in /home//.config/rhasspy/profiles/en/ and then use their full path names in the Intent Recognition and Intent Handler program fields. If anyone is interested, I will probably upload the configuration and the steps I took to GitHub in a bit. The quality of Kaldi is good, but I may eventually try Whisper. I will now update the Reference Manual to reflect the fact that I’m using Rhasspy. It’s a great system overall, and it will help me to stay secure. Thanks!!!
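
For anyone wondering what handler scripts of this kind might look like, here is a minimal sketch. It is not the actual FLP integration. It assumes Rhasspy's command-style programs, where the recognizer receives the transcription on standard in and prints intent JSON, and the handler receives intent JSON and may add a speech.text field for Rhasspy to speak. The FLP_URL endpoint, the "q" parameter, the FlpQuery intent name, and the recognize/handle mode argument are all hypothetical placeholders, since FLP's real API is not described in this thread.

```python
#!/usr/bin/env python3
"""Sketch of command-style recognition and handling programs for Rhasspy."""
import json
import sys
import urllib.parse
import urllib.request
from typing import Callable

# Hypothetical endpoint and parameter name; replace with the real FLP API.
FLP_URL = "http://localhost:8080/query"


def recognize(text: str) -> dict:
    """Wrap any transcription in a single catch-all intent (the ".*" case)."""
    return {
        "intent": {"name": "FlpQuery", "confidence": 1.0},
        "entities": [],
        "text": text.strip(),
    }


def query_flp(text: str) -> str:
    """POST the transcribed command to the (assumed) FLP endpoint."""
    data = urllib.parse.urlencode({"q": text}).encode("utf-8")
    with urllib.request.urlopen(FLP_URL, data=data, timeout=30) as resp:
        return resp.read().decode("utf-8").strip()


def handle(intent: dict, ask: Callable[[str], str] = query_flp) -> dict:
    """Send the command text to FLP and attach the answer as speech."""
    try:
        answer = ask(intent.get("text", ""))
    except OSError:
        # Network errors and timeouts from urllib are OSError subclasses.
        answer = "Sorry, the planner did not respond."
    intent["speech"] = {"text": answer}
    return intent


def main(mode: str) -> None:
    """Read from standard in and print JSON to standard out."""
    if mode == "recognize":
        result = recognize(sys.stdin.read())
    else:
        result = handle(json.load(sys.stdin))
    json.dump(result, sys.stdout)


# Only act when a mode argument is given, so the file can also be imported.
if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1])
```

With a script like this saved in the profile directory, the Intent Recognition program field would point at it with the argument recognize and the Intent Handler field with the argument handle (again, an assumed convention rather than anything Rhasspy requires).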

I used the Whisper plugin, and it’s working great. BTW, has anyone tried integrating Bark TTS?

I’m going to attempt to use a Local Command option for that.
