Rhasspy 2.5.6 Released

After a few quiet months, Rhasspy 2.5.6 has finally been released! This release includes contributions from many community members, credited below :slight_smile:

Two of the biggest features in this release:

  • Multi-site support for dialogue manager
    • For MQTT base/satellite setups - allows each satellite to maintain an independent dialogue session. Previously, the newest session would cancel the previous one.
    • A possible bug has already been found
  • “Text FST” language model type for Kaldi
    • For very large Rhasspy grammars, the default (ARPA) language models may behave poorly by adding extra words
    • In “Text FST” mode, the ASR system can only produce sentences from your sentences.ini file (no extra words or different word order). This is faster to train and more accurate if you only care about using your exact voice commands.
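
For context, a minimal sentences.ini sketch (the intent name and wording here are made up for illustration) shows the kind of strict grammar that “Text FST” mode compiles directly into the recognizer:

```ini
[LightState]
states = (on | off)
turn <states>{state} [the] light
```

With this grammar in “Text FST” mode, only exact paths through it (“turn on the light”, “turn off light”, etc.) can ever be transcribed.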

Thanks to everyone for contributing, testing, answering questions, and helping to make Rhasspy better!

Lastly, as I mentioned in The Master Plan, we’ll be looking for volunteers soon to donate their voices for new Rhasspy text to speech voices. We have volunteers for English, Dutch, and German currently. If you speak a different language, have a good microphone (like a Blue Yeti Nano), and are willing to license your recordings as public domain or Creative Commons, please let me know :+1:


Changelog

Added

  • Multi-site support for dialogue manager
  • Add “Text FST” language model type for Kaldi for strict grammar-based recognition
  • UDP audio settings in web UI for Pocketsphinx wake word system
  • Rudimentary SSML support in Google Wavenet TTS (digitalfiz)

Changed

  • JSON output from all services is no longer forced to be ASCII
  • fuzzywuzzy performance improvement by using sqlite database (maxbachmann)
  • Lots of documentation improvements (koen)
  • Strip commas from replaced numbers (“one thousand, one hundred”)
  • Improve rhasspy-nlu performance (maxbachmann)
  • Simplify Google Wavenet voice selection UI (Romkabouter)
  • Fix local command when not using absolute path (DeadEnd)

Thank you for the good work.
After upgrading from 2.5.5 to 2.5.6 on a server (rpi 3b+) satellite (rpi rb+) installation, I get this error: NluException: file is not a database

Great, I have updated the Hassio Addon here:


Thanks for the latest update!
After upgrading from 2.5.5 to the latest (2.5.6), I got the following error message after saving a new sentence:
“TrainingFailedException: file is not a database”

Hello,

After a day of testing, it seems that the new satellite management (multi-site support for the dialogue manager) works well :slight_smile:

Thank you, once again !


What is being used to create the new TTS voices?

If you’re using fuzzywuzzy, you may need to delete your existing intent_examples.json file and retrain. A fix for this should be coming soon.
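
A sketch of that workaround, assuming the default profile location (an “en” profile under `~/.config/rhasspy/profiles`) and the default web port 12101; adjust both for your setup:

```python
# Delete the stale fuzzywuzzy examples file and retrain Rhasspy.
# Paths and port are assumptions based on a default install.
from pathlib import Path
from urllib import request

profile_dir = Path.home() / ".config" / "rhasspy" / "profiles" / "en"
examples = profile_dir / "intent_examples.json"

# Remove the stale examples file if present
if examples.exists():
    examples.unlink()

# Retrain through Rhasspy's HTTP API
try:
    request.urlopen(
        request.Request("http://localhost:12101/api/train", method="POST"),
        timeout=10,
    )
except OSError:
    print("Rhasspy is not reachable; retrain from the web UI instead")
```

You can of course do the same thing by deleting the file by hand and clicking “Train” in the web UI.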

A fork of MozillaTTS with the PyTorch backend.


Thanks, I will try it.


Thanks!!! :slight_smile: Do you have some links? I would love to play with some of that.

Hi Michael, all

That’s my first post here.

I love your project and I’m trying to disseminate it: https://twitter.com/solyarisoftware/status/1314151250716491778?s=20

I have a question. You say:

Multi-site support for dialogue manager

but what do you mean by “dialogue manager”?
I couldn’t find anything about it in the (great) documentation here:
https://rhasspy.readthedocs.io/. My fault?

BTW 1, I’m obsessed with dialog management and I’m working on NaifJs, my open-source state-machine-based DM.

BTW 2, I’m from Italy and I’ll try to contribute as a volunteer as soon as I understand the call to action in practice.

Thanks again for your beautiful project!
Giorgio


Hi Giorgio, good to see you here :slight_smile: Thanks for your help in promoting Rhasspy!

The “dialogue manager” in Rhasspy is very close to what Snips had, which is not very sophisticated. It is essentially a session manager, where a client application can choose which intents are active at each step and invoke the text to speech system to prompt the user.

In Snips and previously in Rhasspy, only one session could be active at a time. As of 2.5.6, there can now be one session active per site (each client reports a siteId). This is supposed to allow people with multiple Rhasspy satellites in different rooms to interact with them simultaneously, assuming there’s a central server running Rhasspy’s speech-to-text, intent recognition, etc. services.
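
A conceptual sketch of that per-site bookkeeping (this is not Rhasspy’s actual code; the class and field names are made up):

```python
# Sessions keyed by siteId: starting a session on one satellite
# no longer cancels a session running on a different satellite.
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class DialogueSession:
    session_id: str
    site_id: str
    active_intents: list = field(default_factory=list)


class DialogueManager:
    def __init__(self) -> None:
        # One active session per site instead of one global session
        self._sessions: Dict[str, DialogueSession] = {}

    def start_session(self, site_id: str, session_id: str) -> DialogueSession:
        # A new session only replaces the session on the *same* site
        session = DialogueSession(session_id=session_id, site_id=site_id)
        self._sessions[site_id] = session
        return session

    def get_session(self, site_id: str) -> Optional[DialogueSession]:
        return self._sessions.get(site_id)

    def end_session(self, site_id: str) -> None:
        self._sessions.pop(site_id, None)


manager = DialogueManager()
manager.start_session("livingroom", "abc-123")
manager.start_session("kitchen", "def-456")  # living room session keeps running
```

Before 2.5.6, the behavior was effectively a single-slot version of this dictionary, which is why the newest session used to cancel the previous one.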

This looks pretty cool. Rhasspy works well with Node-RED, so it would seem that a NaifJs bot could be made to work by listening to the websocket events. With open transcription enabled, you could catch the spoken text at /api/events/text (websocket) and send the response back via /api/text-to-speech (HTTP).
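
A sketch of that bridge in Python, assuming the default Rhasspy port, that the websocket event is JSON with `text` and `siteId` fields, and the third-party `websockets` and `requests` libraries; `my_bot_response` stands in for a real dialogue engine:

```python
# Listen for transcriptions on /api/events/text (websocket) and
# reply through /api/text-to-speech (HTTP POST).
import json

RHASSPY_URL = "http://localhost:12101"  # assumed default Rhasspy web port


def parse_text_event(raw: str):
    """Extract the transcription and site id from a /api/events/text message."""
    event = json.loads(raw)
    return event.get("text", ""), event.get("siteId", "default")


def my_bot_response(text: str) -> str:
    """Placeholder for a real dialogue engine (e.g. a NaifJs state machine)."""
    return "You said: " + text


async def bridge():
    # Third-party libraries, assumed installed: pip install websockets requests
    import requests
    import websockets

    ws_url = RHASSPY_URL.replace("http://", "ws://") + "/api/events/text"
    async with websockets.connect(ws_url) as ws:
        async for raw in ws:
            text, site_id = parse_text_event(raw)
            # Speak the bot's reply on the satellite that heard the command
            requests.post(
                RHASSPY_URL + "/api/text-to-speech",
                params={"siteId": site_id},
                data=my_bot_response(text).encode("utf-8"),
            )
```

Run `bridge()` under `asyncio.run()` with open transcription enabled, and anything the user says gets routed through your bot and spoken back on the originating satellite.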

I’ve almost completed a new Italian Kaldi speech model, thanks to a connection made by @adrianofoschi and public speech data from Common Voice and M-AI Labs.

My “call to action” for Italian is for single-person recordings to make a nice text-to-speech voice using a fork of MozillaTTS (in progress). I have a process for finding seemingly good sentences/prompts for a person to read, but I need help to verify that they aren’t weird or offensive – they come from the Internet after all.

Here are some samples of the Dutch voice I’m working on now: https://drive.google.com/file/d/1Fp9lc5eGpe5Xw8ACHXIgVGX5zTow7rpM/view?usp=sharing (credit to @rdh). Forgive the slightly robotic sound, it’s only been training for a day and is maybe 10-15% done :slight_smile:


Yes, I finally pushed a version up today! I’m calling it larynx after the technical term for the human voice box.

This uses my MozillaTTS fork as a git submodule. The only big changes to MozillaTTS are to use gruut for cleaning/phonemizing text. Note: right now only U.S. English and Dutch are supported. More languages are coming soon!


Hi Michael,
thanks for your detailed feedback!

Thanks for your help in promoting Rhasspy!

I “have to” give back. Promoting is just a first step :slight_smile:

  • Your work on no-cloud ASR/TTS “unification” of other open-source projects fills a gap.
  • You did a great job simplifying and popularizing the basic concepts, with beautiful graphics.
  • Rhasspy “resurrects” the good but defunct :frowning: Snips and, last but not least, it could act as a common OS for open-source/open-hardware smart speakers.

The “dialogue manager” in Rhasspy is very close to what Snips had, which is not very sophisticated. It is essentially a session manager, where a client application can choose which intents are active at each step and invoke the text to speech system to prompt the user.

In Snips and previously in Rhasspy, only one session could be active at a time. As of 2.5.6, there can now be one session active per site (each client reports a siteId). This is supposed to allow people with multiple Rhasspy satellites in different rooms to interact with them simultaneously, assuming there’s a central server running Rhasspy’s speech-to-text, intent recognition, etc. services.

Thanks for the clarification. I like the multi-room architecture!

This looks pretty cool. Rhasspy works well with Node-RED, so it would seem that a NaifJs bot could be made to work by listening to the websocket events. With open transcription enabled, you could catch the spoken text at /api/events/text (websocket) and send the response back via /api/text-to-speech (HTTP).

Well, I just open-sourced NaifJs. Currently the project is still very rough, at a very alpha stage. The architecture I foresee would be pretty independent of the “caller” channel, so yes, I’ll build an HTTP / websocket (or maybe Hermes protocol, using github.com/snipsco/hermes-protocol/tree/master/platforms/hermes-javascript) channel adapter as you suggest. I’ll look into the Node-RED integration. :ok_hand:

Digression: in NaifJs there is something similar to the intent-based approach (which you integrated into Rhasspy as an option), but it’s intended to be embedded in the internal nodes’ (states’) pattern matching (in practice, I offer simple regexps as an option to dialogue developers). I will think about the integration and give you feedback in a separate post :slight_smile:

I’ve almost completed a new Italian Kaldi speech model, thanks to a connection made by @adrianofoschi and public speech data from Common Voice and M-AI Labs.

That’s great! I didn’t know.

My “call to action” for Italian is for single-person recordings to make a nice text-to-speech voice using a fork of MozillaTTS (in progress). I have a process for finding seemingly good sentences/prompts for a person to read, but I need help to verify that they aren’t weird or offensive – they come from the Internet after all.

So it seems to me that you are looking for a voice actor who will record a list of utterances you prepared, right?

I guess that different voice signatures (female/male/other) would be a bonus.

Anyway, I’m personally available (and I could involve friends) to double-check and validate the content of the recorded Italian voices. Please let me know. I might even be willing to record a voice… (not sure) :face_with_hand_over_mouth:

Maybe a manifesto or detailed description of that “call to action” would help. I’m available to share your needs on social media and on my blog, convcomp dot it, with a dedicated article.

Sorry again for digressions here.
Thanks
giorgio


Any tutorials on how to create models? I work for a TV station and actually have access to TONS of closed-captioned media, and I would be interested in seeing if I can create a voice model from some of this data.

Which language are you planning to work with?

Programming language or speaking language?
I have no real preference; typically something like Python is where I lean for Linux stuff.

I meant speaking language :wink:

Would be English :slight_smile:


OK, I’ve added a Larynx tutorial to get you started. This is a first draft, so hopefully everything works!
