Building a Virtual Assistant on Twitch

Posting this in Help because I know I’ll have questions as I’m building/streaming. I’ll be streaming development of a virtual assistant using Rhasspy, Home Assistant, and Node-RED a few days a week if anyone is interested in following along (and helping :slight_smile:)

I’m also chatting about the virtual assistant on Discord in the #virtual-assistant channel.


Hi there,

I checked the video, but it doesn’t seem to have a plan; it’s mostly you figuring out what Rhasspy is.
If that is the goal, fine, but I think you should be better prepared if you want to create this kind of instructional video.
There is also a lot of reading you do on screen, which is not very appealing for the viewer :slight_smile:
Also, please explain why you use Rhasspy, Home Assistant, and Node-RED. Node-RED is great, but it adds an extra layer, since almost all of it (if not everything) can be done in Home Assistant as well.

Good luck with the next video :slight_smile:

Hey! Thanks for watching. I’ve got a Trello board here with plans and costs:

In the first video, from Tuesday, I compare Rhasspy and Mycroft. The Trello board should give you a good idea of “why Node-RED and HA”. Basically, I’m going to be creating “skills” based on external APIs and services, and I want Home Assistant to do what it does best: assist the home.

Hey Matt, Rhasspy author here (Mike). I’m enjoying watching your videos! This is giving me some great feedback about what I need to fix with the web UI for first time users :slight_smile:

Most people probably want to set up Rhasspy like an Alexa, so I think there needs to be a “one-click” method when you first start up that just selects all of the recommended systems. You fortunately found the “best” setup for most people – Porcupine, Kaldi, and Fsticuffs (+ Dialogue Manager).

Answers to some questions I saw so far:

  • Why is the fallback TTS for Google Wavenet gone?
    • In 2.5, each TTS system is a separate service now (everything was previously one big Python program). We haven’t gotten multiple services working together yet, but it’s on the roadmap. This should enable fallback TTS as well as (hopefully) multiple speech to text languages simultaneously.
  • Why is there no Websocket API for text to speech, etc.?
    • There is, sort of. You can publish any MQTT message into /api/mqtt via Websocket.
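To make the `/api/mqtt` passthrough concrete, here is a minimal sketch in Python. It assumes the websocket endpoint accepts a JSON envelope of the form `{"topic": ..., "payload": ...}` (check your Rhasspy version’s docs for the exact shape) and uses the standard Hermes `hermes/tts/say` topic; the helper name and site ID are illustrative:

```python
import json

def make_tts_message(text, site_id="default"):
    """Build the JSON envelope for a Hermes TTS request to publish
    through Rhasspy's /api/mqtt websocket passthrough. The envelope
    shape ({"topic": ..., "payload": ...}) is an assumption."""
    return json.dumps({
        "topic": "hermes/tts/say",
        "payload": {"text": text, "siteId": site_id},
    })

# Sending it would use any websocket client (not shown here), e.g.
# connect to ws://<rhasspy-host>:12101/api/mqtt and send the string
# returned by make_tts_message("Hello from the websocket").
```

The same envelope works for any other Hermes topic, which is why Mike describes this as a “sort of” websocket API: it is a raw MQTT bridge rather than a dedicated TTS endpoint.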

Let me know if you have any more questions, or get stuck. Thanks for having the patience to work through the difficult parts of setting up Rhasspy, and I look forward to seeing more videos!

P.S. Something that is totally not obvious: the timeouts that you were seeing from text to speech were because you hit “Speak” before the TTS service had started. There’s no indication of this besides the text vomit from the logs, so I apologize :laughing:

Mike! I appreciate you dropping in and leaving a note. I’m glad the streams are useful for this perspective!

Your feedback makes sense :smile: Fallback TTS would be great! In the meantime, I may have Node-RED trigger a WAV file when no internet connection is found, saying that online services aren’t available. Since the Wavenet files are cached, I can pull it from there.
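A minimal sketch of that offline-fallback idea, written in Python for readability (in practice it would live in a Node-RED function or exec node). The cache path and filename are hypothetical:

```python
import socket
from pathlib import Path

# Hypothetical cache location and filename -- adjust to wherever your
# Wavenet WAV files are actually cached.
CACHE_DIR = Path("/profiles/en/tts/wavenet-cache")
OFFLINE_WAV = CACHE_DIR / "no_internet.wav"

def have_internet(host="8.8.8.8", port=53, timeout=2.0):
    """Cheap connectivity probe: try to open a TCP socket to a public
    DNS server. Returns False on any network error or timeout."""
    try:
        socket.create_connection((host, port), timeout=timeout).close()
        return True
    except OSError:
        return False

def choose_audio(online):
    """Use the online TTS when connected; otherwise fall back to the
    pre-cached "can't reach online services" WAV."""
    return "wavenet" if online else str(OFFLINE_WAV)
```

A flow would call `have_internet()` before each online TTS request and play `choose_audio(False)` through the local audio output when the check fails.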

I ended up using MQTT for TTS at the very end of the stream! After the stream I had some problems getting Rhasspy to run as a service; I kept getting “MQTT not connected” errors. I ended up running it in Docker instead and got it back to the exact spot where I left off last stream. I’m going to talk a bit about that in the next stream (probably later today).

Thanks again for the feedback from the streams. Always feel free to pop in and throw out ideas or be like “wth are you thinking?!” :joy:

@synesthesiam any idea why intentNotRecognized isn’t coming through when I say gibberish? It seems like it always tries to match the speech to an intent, but getting intentNotRecognized through the Dialogue Manager will be key for features I’m planning to add. I’m using Snowboy, Kaldi, Fsticuffs, and Rhasspy (Dialogue Management).

Yes. The Kaldi service doesn’t currently get transcription likelihoods back from the decoder, so it has no way to reject a sentence below threshold. This is something I’m hoping to get into the next release.

Pocketsphinx can do this, but it’s not as fast or accurate as Kaldi.
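The rejection logic Mike describes could look something like the sketch below. The Hermes topic names are Rhasspy’s real ones, but the threshold value, function names, and routing are simplified illustrations, not the actual service code:

```python
def accept_transcription(likelihood, threshold=0.8):
    """Accept a transcription only if the decoder reported a likelihood
    at or above the threshold. A missing likelihood (as with the current
    Kaldi service) means we can never reject. Threshold is illustrative."""
    return likelihood is not None and likelihood >= threshold

def pick_topic(likelihood, intent_name):
    """Route a recognized intent to its Hermes topic, or signal
    intentNotRecognized so the Dialogue Manager can react (e.g. the
    gibberish case described above)."""
    if accept_transcription(likelihood) and intent_name:
        return f"hermes/intent/{intent_name}"
    return "hermes/dialogueManager/intentNotRecognized"
```

Until the Kaldi service reports likelihoods, `likelihood` is effectively always missing or fixed, which is why gibberish still gets force-matched to the closest intent.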


> Yes. The Kaldi service doesn’t currently get transcription likelihoods back from the decoder, so it has no way to reject a sentence below threshold. This is something I’m hoping to get into the next release.

If there is a place to vote for this feature, I would :wink:

I’ve opened an issue here:
