Rhasspy 2.5.6 Released

Thank you for the good work.
After upgrading from 2.5.5 to 2.5.6 on a server (rpi 3b+) satellite (rpi rb+) installation, I get this error: NluException: file is not a database

Great, I have updated the Hassio Addon here:

Thanks for the latest update!
After upgrading from 2.5.5 to the latest (2.5.6), I got the following error message after saving a new sentence:
“TrainingFailedException: file is not a database”

Hello,

After a day of testing, it seems that the new satellite management (multi-site support for the dialogue manager) works well :slight_smile:

Thank you, once again !

What is being used to create the new tts voices?

If you’re using fuzzywuzzy, you may need to delete your existing intent_examples.json file and retrain. A fix for this should be coming soon.

A fork of MozillaTTS with the PyTorch backend.

Thanks, I will try it.

Thanks!!! :slight_smile: Do you have some links? I would love to play with some of that.

Hi Michael, hi all,

That’s my first post here.

I love your project and I’m trying to disseminate it: https://twitter.com/solyarisoftware/status/1314151250716491778?s=20

I have a question. You say:

> Multi-site support for dialogue manager

but what do you mean by “dialogue manager”?
I couldn’t find anything about it in the (great) documentation here:
https://rhasspy.readthedocs.io/. My fault?

BTW 1: I’m obsessed with dialog management, and I’m working on NaifJs, my open-source, state-machine-based dialogue manager.

BTW 2: I’m from Italy, and I’ll try to contribute as a volunteer as soon as I understand the call to action in practice.

Thanks again for your beautiful project!
Giorgio

Hi Giorgio, good to see you here :slight_smile: Thanks for your help in promoting Rhasspy!

The “dialogue manager” in Rhasspy is very close to what Snips had, which is not very sophisticated. It is essentially a session manager, where a client application can choose which intents are active at each step and invoke the text to speech system to prompt the user.

In Snips and previously in Rhasspy, only one session could be active at a time. As of 2.5.6, there can now be one session active per site (each client reports a siteId). This is supposed to allow people with multiple Rhasspy satellites in different rooms to interact with them simultaneously, assuming there’s a central server running Rhasspy’s speech-to-text, intent recognition, etc. services.
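
If you’re curious what a session looks like on the wire, here’s a rough sketch of a client driving the dialogue manager over MQTT. The topic names are the standard Hermes ones, but the siteId, intent name, and prompts are just made-up examples, so treat it as an illustration rather than a reference:

```python
# Sketch: starting and continuing a dialogue session over Hermes MQTT.
# Assumes an MQTT broker on localhost:1883 shared with Rhasspy
# (paho-mqtt 1.x style client).
import json
import paho.mqtt.client as mqtt

client = mqtt.Client()
client.connect("localhost", 1883)

# Start a session on a specific satellite: the siteId is what lets
# 2.5.6 keep one independent session per room.
client.publish(
    "hermes/dialogueManager/startSession",
    json.dumps({
        "siteId": "livingroom",
        "init": {"type": "action", "text": "What can I do for you?"},
    }),
)

# Later, continue the session and restrict which intents are active
# for the next user utterance via intentFilter.
client.publish(
    "hermes/dialogueManager/continueSession",
    json.dumps({
        "sessionId": "<session id from the sessionStarted message>",
        "text": "Which light?",
        "intentFilter": ["ChangeLightState"],
    }),
)
```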

This looks pretty cool. Rhasspy works well with Node-RED, so it would seem that a NaifJs bot could be made to work by listening to the websocket events. With open transcription enabled, you could catch the spoken text at /api/events/text (websocket) and send the response back via /api/text-to-speech (HTTP).
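
For example, a minimal Python sketch of that loop (assuming Rhasspy’s web server on its default port 12101; websockets and requests here are just stand-ins for whatever client library a NaifJs adapter would actually use):

```python
# Sketch: catch transcriptions from Rhasspy's websocket event stream
# and speak a reply back through the HTTP API.
import asyncio
import json

import requests
import websockets

RHASSPY = "localhost:12101"  # default Rhasspy web server port

def make_reply(text: str) -> str:
    # Placeholder for the dialogue manager (e.g. a NaifJs-style bot).
    return f"You said: {text}"

async def main():
    async with websockets.connect(f"ws://{RHASSPY}/api/events/text") as ws:
        async for message in ws:
            event = json.loads(message)
            reply = make_reply(event.get("text", ""))
            # POST plain text to have Rhasspy speak it.
            requests.post(f"http://{RHASSPY}/api/text-to-speech", data=reply)

asyncio.run(main())
```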

I’ve almost completed a new Italian Kaldi speech model, thanks to a connection made by @adrianofoschi and public speech data from Common Voice and M-AI Labs.

My “call to action” for Italian is for single-person recordings to make a nice text-to-speech voice using a fork of MozillaTTS (in progress). I have a process for finding seemingly good sentences/prompts for a person to read, but I need help to verify that they aren’t weird or offensive – they come from the Internet after all.

Here are some samples of the Dutch voice I’m working on now: https://drive.google.com/file/d/1Fp9lc5eGpe5Xw8ACHXIgVGX5zTow7rpM/view?usp=sharing (credit to @rdh). Forgive the slightly robotic sound, it’s only been training for a day and is maybe 10-15% done :slight_smile:

Yes, I finally pushed a version up today! I’m calling it larynx after the technical term for the human voice box.

This uses my MozillaTTS fork as a git submodule. The only big changes to MozillaTTS are to use gruut for cleaning/phonemizing text. Note: right now only U.S. English and Dutch are supported. More languages are coming soon!
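
If you want a quick feel for what gruut does with a sentence, here’s a minimal sketch (the `sentences` API shown is from a recent gruut release, so double-check it against the version the submodule pins):

```python
# Sketch: tokenizing and phonemizing text with gruut.
from gruut import sentences

for sent in sentences("This is a test.", lang="en-us"):
    for word in sent:
        if word.phonemes:
            print(word.text, "->", " ".join(word.phonemes))
```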

Hi Michael,
thanks for your detailed feedback!

> Thanks for your help in promoting Rhasspy!

I feel I “have to” give something back; promoting is just the first level :slight_smile:

  • Your work on no-cloud ASR/TTS “unification” of other open-source projects fills a gap.
  • You did a great job of simplifying and disseminating the basic concepts, with beautiful graphics.
  • Rhasspy “resurrects” the good but defunct :frowning: Snips and, last but not least, it could act as a common OS for open-source/open-hardware smart speakers.

> The “dialogue manager” in Rhasspy is very close to what Snips had, which is not very sophisticated. It is essentially a session manager, where a client application can choose which intents are active at each step and invoke the text to speech system to prompt the user.

> In Snips and previously in Rhasspy, only one session could be active at a time. As of 2.5.6, there can now be one session active per site (each client reports a siteId). This is supposed to allow people with multiple Rhasspy satellites in different rooms to interact with them simultaneously, assuming there’s a central server running Rhasspy’s speech-to-text, intent recognition, etc. services.

Thanks for the clarification. I like the multi-room architecture!

> This looks pretty cool. Rhasspy works well with Node-RED, so it would seem that a NaifJs bot could be made to work by listening to the websocket events. With open transcription enabled, you could catch the spoken text at /api/events/text (websocket) and send the response back via /api/text-to-speech (HTTP).

Well, I just open-sourced NaifJs. Currently the project is still very rough, at a very alpha stage. The architecture I foresee is pretty independent from the “caller” channel, so yes, I’ll build an HTTP / websocket (or maybe Hermes protocol, using github.com/snipsco/hermes-protocol/tree/master/platforms/hermes-javascript) channel adapter as you suggest. I’ll look into the Node-RED integration. :ok_hand:

Digression: in NaifJs there is something similar to the intent-based approach (which you integrated in Rhasspy as an option), but it is intended to be embedded in the internal nodes’ (states’) pattern matching (in practice, I offer simple regexps as an option to dialogue developers). I will think about the integration and send you feedback in a separate post :slight_smile:
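
Just to give an idea of the pattern, here’s a toy sketch (this is not the actual NaifJs API, only a generic illustration of per-state regexp matching in a state-machine dialogue):

```python
# Sketch: a toy state-machine dialogue where each state owns its
# regexp patterns (illustration only, not the NaifJs API).
import re

STATES = {
    "start": [
        (re.compile(r"\b(turn|switch) on\b", re.I), "which_light", "Which light?"),
    ],
    "which_light": [
        (re.compile(r"\b(kitchen|bedroom)\b", re.I), "start", "Done, light is on."),
    ],
}

def step(state: str, utterance: str):
    """Match the utterance against the current state's patterns."""
    for pattern, next_state, reply in STATES[state]:
        if pattern.search(utterance):
            return next_state, reply
    return state, "Sorry, I didn't get that."

state = "start"
state, reply = step(state, "please turn on the light")  # -> "Which light?"
state, reply = step(state, "the kitchen one")           # -> "Done, light is on."
```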

> I’ve almost completed a new Italian Kaldi speech model, thanks to a connection made by @adrianofoschi and public speech data from Common Voice and M-AI Labs.

That’s great! I didn’t know.

> My “call to action” for Italian is for single-person recordings to make a nice text-to-speech voice using a fork of MozillaTTS (in progress). I have a process for finding seemingly good sentences/prompts for a person to read, but I need help to verify that they aren’t weird or offensive – they come from the Internet after all.

So it seems to me that you are looking for a voice actor who will read a list of utterances you prepared, right?

I guess that different voice signatures (female/male/other) would be a bonus.

Anyway, I’m personally available (and I could involve friends) to double-check / validate the content of the Italian voice recordings. Please let me know. I might even record a voice myself… (not sure) :face_with_hand_over_mouth:

Maybe a manifesto / detailed description of that “call to action” would help. I’m happy to share your needs on social media and on my blog, convcomp dot it, with a dedicated article.

Sorry again for digressions here.
Thanks
giorgio

Any tutorials on how to create models? I work for a TV station and actually have access to TONS of closed-captioned media, and I would be interested in seeing if I can create a voice model from some of this data.

Which language are you planning to work with?

Programming language or speaking language?
I have no real preference; typically something like Python is where I lean for Linux stuff.

I meant speaking language :wink:

It would be English :slight_smile:

OK, I’ve added a Larynx tutorial to get you started. This is a first draft, so hopefully everything works!

Thanks for this new release, Rhasspy is an awesome project.

The “The Master Plan” link leads to a random issue on GitHub instead of the forum post.
