Rhasspy 2.5.5 Released

Since 2.5, development has continued along at a steady pace. The overall architecture has stabilized, and we’re starting to see more success stories and skills/apps pop up :slight_smile:

If you haven’t yet, check out @koan’s Awesome Rhasspy page for the latest and greatest from the community.

Rhasspy 2.5.5

I’m trying to get back into the habit of keeping a CHANGELOG instead of always just moving on to the next shiny thing :robot:

Raven

The biggest addition since 2.5.0 has been Raven, a new wakeword system based on Snips’ wake word detector. Big thanks to @fastjack for his node-personal-wakeword, which provided many juicy implementation details.

I just put up a Raven tutorial in the docs, explaining how to train it with your custom wake word. Make sure to restart Rhasspy after recording your examples!

You can record multiple wake words as well, and each one should only be activated by the person who recorded it, so you can use the wake word as a kind of “speaker detection”.

Long term, Raven is largely intended to be a way of recording examples of yourself saying your wake word. If Raven works well enough for you, that’s great. If not, you can use it for a few weeks to build up a database of audio examples that can then serve as training material for a more sophisticated system (like Mycroft Precise).

More TTS

There is now support for nanoTTS, Google Wavenet, and OpenTTS (which supports MozillaTTS via a Docker image).

I’m in the process of training my own MozillaTTS model on a small dataset kindly donated by my mother :heart: to get an idea of how it works. MozillaTTS development is moving so fast that it’s hard to find a stable TTS/vocoder combination.

If I can get a decent English model trained, I’ll work towards training models for Rhasspy’s other supported languages. Community voice data donations would be very welcome! I used the CMU Arctic prompts, since they’re supposedly phonetically balanced. Does anyone know of an equivalent prompt set for other languages?

Coming Soon

A few things did not make it into this release, but are on the horizon:

  • SnipsNLU intent recognition
    • Mostly working, but I’m having difficulty getting it packaged into the Docker image and Debian installers.
  • Energy-based silence detection
    • Lets you use audio energy or energy ratio to decide when a voice command is over
    • Thresholds can be tuned to work fine with background fan noise
  • “Sounds-like” pronunciations
    • Lets you describe how a word is pronounced using known words or even just pieces of known words
  • Strict grammar mode for Kaldi
    • Ensures that the Kaldi STT system only ever outputs sentences from your sentences.ini
    • Less flexible, but might work better for really large assistants with many slot values
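
To give a rough idea of what energy-based silence detection means, here is a minimal Python sketch. It is not the code that will ship in Rhasspy: the function names, the threshold values, and the frame counts are all made up for illustration, and it assumes 16-bit little-endian mono PCM frames. It only uses absolute energy thresholds. An energy-ratio variant would compare each frame against a reference level (for example, the loudest frame seen so far) instead.

```python
import math
import struct
from typing import Iterable, Optional


def frame_energy(frame: bytes) -> float:
    """Root-mean-square energy of one frame of 16-bit little-endian mono PCM."""
    count = len(frame) // 2
    if count == 0:
        return 0.0
    samples = struct.unpack(f"<{count}h", frame[: count * 2])
    return math.sqrt(sum(s * s for s in samples) / count)


def find_command_end(
    frames: Iterable[bytes],
    speech_threshold: float = 1000.0,   # hypothetical: energy that counts as speech
    silence_threshold: float = 300.0,   # hypothetical: set above your fan noise
    silence_frames: int = 15,           # quiet frames in a row that end the command
) -> Optional[int]:
    """Return the index of the frame where the voice command ends, or None."""
    in_speech = False
    quiet_run = 0
    for index, frame in enumerate(frames):
        energy = frame_energy(frame)
        if not in_speech:
            # Wait for the speaker to start before looking for silence.
            if energy >= speech_threshold:
                in_speech = True
            continue
        if energy < silence_threshold:
            quiet_run += 1
            if quiet_run >= silence_frames:
                return index
        else:
            # Any louder frame resets the silence counter.
            quiet_run = 0
    return None
```

Tuning for background noise then comes down to picking a `silence_threshold` that sits above the steady noise floor (a fan, for instance) but below the energy of normal speech.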