Hello, I want to help Rhasspy become a better tool, but I'm not a programmer, so coding is out of my league…
Anyway, in the interview with the Matrix Labs people, you said people could help by donating their voice, right? I've registered at https://voice.mozilla.org/es and I'm helping by reading and listening to the sentences there.
Will this help improve performance in Rhasspy?
Will Rhasspy be able to integrate the voice.mozilla work?
Is there any other platform you think I could help on by reading and listening?
Thanks. If there's anything else I could help with, just let me know.
This will probably help improve a lot more than just Rhasspy.
Even if Rhasspy does not use Mozilla Common Voice data directly, it will help create better acoustic models for Spanish speakers.
This is awesome
Well, as far as I remember, it is planned to integrate DeepSpeech. Common Voice is kind of the basis for training DeepSpeech, so contributing helps a lot.
Having a public dataset like Common Voice can only be a benefit.
The less exposure your native language gets in global discourse, the more important it is to donate some time, your voice, and your ear for a while and add to your native language model.
Sure. I'll be doing some Spanish, some Catalan, and even some English. As I understand it, it's okay to donate your voice even if your accent is not native. I guess if it's not good enough, someone will just not validate my recording, right?
If anyone thinks of anything I could do to help, please go ahead and tell me. I'll do my best.
By the way, I'm having an issue right now with one word: "enciende", which means "turn on". If I say that word slowly, it usually gets it, but if I speak at my normal speed, Rhasspy understands something else. Is there any way to fix this temporarily?
Also, one question: does the wake word pronunciation have to be in English? It is quite hard for a Spanish speaker to pronounce the word "porcupine" correctly… I'm using Snowboy, and most of the time it works.
Looking at the Alexa FAQ:
For example, we use your requests to Alexa to train our speech recognition and natural language understanding systems using machine learning. Training Alexa with real world requests from a diverse range of customers is necessary for Alexa to respond properly to the variation in our customers’ speech patterns, dialects, accents, and vocabulary and the acoustic environments where customers use Alexa.
@synesthesiam Maybe Rhasspy could provide a service to record utterances as WAV files, each associated with a JSON file containing the language, the recognized ASR transcript, and the NLU intent, in order to generate unit tests (or even enrich voice datasets for acoustic model training). Anyone wishing to participate could run this opt-in service. Just thinking…
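To make the idea concrete, here is a minimal sketch (plain Python, with hypothetical file names and JSON fields; nothing here is an actual Rhasspy API) of what recording one utterance as a WAV plus a JSON sidecar could look like:

```python
import json
import wave
from pathlib import Path

def save_utterance(out_dir, name, audio_bytes, language, transcript, intent):
    """Save raw 16 kHz mono PCM audio as a WAV file plus a JSON sidecar."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    wav_path = out / f"{name}.wav"
    with wave.open(str(wav_path), "wb") as wav:
        wav.setnchannels(1)      # mono
        wav.setsampwidth(2)      # 16-bit samples
        wav.setframerate(16000)  # 16 kHz, a common ASR sample rate
        wav.writeframes(audio_bytes)

    meta = {
        "language": language,
        "transcript": transcript,  # what the ASR heard
        "intent": intent,          # what the NLU recognized
    }
    json_path = out / f"{name}.json"
    json_path.write_text(json.dumps(meta, ensure_ascii=False, indent=2))
    return wav_path, json_path

# Example: one second of silence tagged with a made-up intent
wav_path, json_path = save_utterance(
    "dataset", "utt-0001", b"\x00\x00" * 16000, "es",
    "enciende la luz", {"name": "LightOn", "slots": {"state": "on"}},
)
```

A folder of such WAV/JSON pairs could later be replayed as unit tests or offered to a shared dataset, and extra metadata (age range, country of origin, etc.) is just another key in the JSON.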
Hi @ivanpianobcn, welcome
Likely yes, if you contribute to a language that Rhasspy currently doesn't have a good model for. Spanish and Catalan (as you mentioned) lack anything besides a Pocketsphinx model. With enough data, a DeepSpeech model could be built, which would hopefully perform better.
LibriVox is another great platform; it focuses on free audiobooks. I've done a bit of work training new MaryTTS voices from that dataset. The hardest part for me, as a speaker of only English, was getting proper alignments of speech and text. The process can be semi-automated, but it's hard to do simple sanity checks when you don't know the spoken word or script!
I like this idea; I'm just not sure where to put the data or how to ensure that people's privacy is protected. More metadata would help with training, like the person's age, gender, and anything that would influence accent (e.g., country of origin).
But I worry that such data could be exploited if it got into the wrong hands. If someone with experience in this sort of thing has any thoughts, I'd like to hear them.
For the Mycroft Precise dataset that @ulno and I have talked about, the plan is to let Mycroft.ai handle that burden.
I don't think there can be any problem, as the opt-in will have to contain a disclaimer that is agreed to.
But the "skill" to opt in is only there to transfer KWS recordings, and how that is created, plus the opt-in that is agreed to, just needs to be fit for purpose.
Part of the setup is locale and language data, so it quickly and easily creates model classification and capture.
It also doesn't have to start shipping recordings out to remote model repos; it can just be an option to collate recordings so that an owner can easily build a dataset for their own use over time with little effort.
I have never done it myself, but I frequently read that custom training with your own data quickly improves accuracy.
I am not sure why this is not implemented more often as easy-to-use skills rather than the command-line tutorials and methods usually given to accomplish this.
You could use a high-load KWS as an authoritative server to train low-load, sparse KWS models; the setup is hierarchical by nature, but those tiers often don't seem to be used.
Not sure of the implementation, but it's a really good suggestion in general.
I was tunnel-visioned on KWS, but yeah, the same applies to STT as well; what you send to public repos then gets slightly more complex.
Still, you could create a local store with approval required before anything is sent, but the options and methods to collate and use data/models are much harder than they need to be, and should at least be much easier to use.
As a first step, the data can be stored locally as "unit tests". Unit tests are great for building confidence in the ASR and NLU when making changes. I've hacked together the ones I created with the Snips console to ensure that I don't break anything when tinkering with my intents, ASR, and NLU systems.
When a real platform respecting users' privacy exists (something like Mozilla Common Voice for intents, maybe), people who have stored their unit tests locally could easily upload them to contribute. This is a longer-term goal though, as building such a platform could be time consuming, and the legal side can be… hum.
With all the Rhasspy users speaking utterances with correct transcriptions in multiple languages around the globe at all times, it's kind of a waste of dataset material… Storing a few hundred utterances as WAV (or maybe a smaller lossless format) with a JSON file should not occupy too much disk space, and it may be invaluable for the open-source voice assistant community in the long run.
Again… Just thinking…
I completely agree with you, as so much of what we do could be used to great effect with very little effort.
Also, if you have set up an opt-in config, STT is keyword-authorised on each sentence.
My initial thoughts, as usual, were the wrong way round: at the point of KWS there is just an open mic, and it requires the KWS trigger to become an STT sentence, which is also a vocal opt-in.
With STT there are ways to automate and increase model data that don't inflict so much pain and effort on users.
Another engine can validate the STT output: if two different engines recognise the same intent, then that sentence likely doesn't need human input.
The sentences that the two engines recognise differently should be categorised for human review, which greatly reduces the amount of input needed.
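A sketch of that two-engine agreement check, with toy stand-in engines (nothing here reflects a real STT API):

```python
def route_for_review(utterances, engine_a, engine_b):
    """Split utterances into auto-accepted and needs-human-review,
    based on whether two independent engines agree on the intent."""
    accepted, review = [], []
    for wav in utterances:
        if engine_a(wav) == engine_b(wav):
            accepted.append(wav)  # both engines agree: likely correct
        else:
            review.append(wav)    # disagreement: queue for a human
    return accepted, review

# Toy engines that disagree on one file
engine_a = lambda wav: {"u1.wav": "LightOn", "u2.wav": "LightOff"}[wav]
engine_b = lambda wav: {"u1.wav": "LightOn", "u2.wav": "LightOn"}[wav]

accepted, review = route_for_review(["u1.wav", "u2.wav"], engine_a, engine_b)
print(accepted, review)  # ['u1.wav'] ['u2.wav']
```

Only the `review` pile would ever need a human ear, so the manual validation load shrinks to the cases where the engines actually disagree.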
You could even trickle away at the fair-usage policies of some big-data services to help improve open-source private models, if you so wished.
There are a whole load of possible methods that, with a little thought, participation, and willingness from the community, shouldn't be excluded just because some care is required to dodge the potential pitfalls of the unknown.
If your opt-in says, "Hey, we are community software with no commercial interest; if we make some blunders we apologise, but we require you to accept this disclaimer of responsibility"…
After that, that's it, apart from our intrinsic responsibility to treat the community with respect.
My main worry comes from the scenario where a user has opted in, but for some reason beyond their control, another entity (a company, a family member, etc.) issues a legal takedown notice for data collected in some specific time period.
It probably won’t happen, but I wish there was a way of doing this via peer-to-peer (torrents) so there’s no central server that could be taken down by lawyers.
@synesthesiam Dark model… I guess you could Tor it.
I think it would probably be easier to couple it to a database where the opt-in config includes identity data, such as an email address.
Then, if this ever happens, however unlikely, it's still very easy to remove submissions from the submission store and database.
I think much of it comes down to the disclaimer and access to the raw data, but the ability to opt out can also be achieved easily.
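For illustration, a minimal sketch of such an opt-out, using an in-memory SQLite table keyed by email (the schema and names are made up):

```python
import sqlite3

# In-memory store of submissions keyed by the opt-in identity (email).
db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE submissions (
    id INTEGER PRIMARY KEY,
    email TEXT NOT NULL,
    wav_path TEXT NOT NULL
)""")
db.executemany(
    "INSERT INTO submissions (email, wav_path) VALUES (?, ?)",
    [("alice@example.org", "utt-0001.wav"),
     ("alice@example.org", "utt-0002.wav"),
     ("bob@example.org", "utt-0003.wav")],
)

def opt_out(db, email):
    """Remove every submission tied to one identity; return how many."""
    cur = db.execute("DELETE FROM submissions WHERE email = ?", (email,))
    db.commit()
    return cur.rowcount

removed = opt_out(db, "alice@example.org")
remaining = db.execute("SELECT COUNT(*) FROM submissions").fetchone()[0]
print(removed, remaining)  # 2 1
```

A real deployment would also delete the WAV files the rows point at, but the point is that a takedown or opt-out becomes a single keyed delete rather than a manual hunt.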
As far as I understand it so far, they aim to provide a wake word ("Hey Firefox") for a lot of languages some time soon:
Contribute to Common Voice’s first target segment