What is needed to support Hebrew voice recognition in Rhasspy?
For speech recognition, I need recordings of different people speaking short phrases (one or two sentences) along with their text transcriptions. Around 20+ hours would be a good start, with more being better.
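To check how close a collection is to that 20-hour mark, something like the rough sketch below could total up the audio. It assumes a layout that nobody here has agreed on: one WAV file per phrase, with a same-named `.txt` file holding its transcription. The function name `corpus_hours` and the directory layout are only for illustration.

```python
import wave
from pathlib import Path


def corpus_hours(corpus_dir):
    """Return (hours of transcribed audio, WAV names lacking a transcription).

    Assumed layout (hypothetical): phrase.wav next to phrase.txt.
    """
    total_seconds = 0.0
    missing = []
    for wav_path in sorted(Path(corpus_dir).glob("*.wav")):
        txt_path = wav_path.with_suffix(".txt")
        # Skip recordings whose transcription is absent or empty
        if not txt_path.is_file() or not txt_path.read_text(encoding="utf-8").strip():
            missing.append(wav_path.name)
            continue
        with wave.open(str(wav_path), "rb") as wav_file:
            # Duration in seconds = number of frames / frames per second
            total_seconds += wav_file.getnframes() / wav_file.getframerate()
    return total_seconds / 3600.0, missing
```

Calling `corpus_hours("my_recordings")` (a placeholder folder name) returns the number of hours plus a list of recordings that still need text, so only transcribed audio counts toward the goal.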
Mozilla Common Voice doesn’t appear to have any Hebrew yet, unfortunately. I found the CoSIH corpus, which has a good amount (still downloading so not sure how much or the quality).
Would you be interested in working together to add Hebrew support to Rhasspy?
I am not that techy.
I need to understand more to see if I’m capable of doing it.
No need to be techy.
The big need is transcribed speech data, and the fact that there’s none for Hebrew on Mozilla Common Voice means you have a unique opportunity to contribute!
The first step is adding Hebrew sentences to Mozilla’s Sentence Collector for people to read. These must be in the public domain (CC0). See the how-to for suggestions of where to get sentences from.
The next step is getting people to read those sentences using the Common Voice website. A variety of speakers is important (gender, age), and the more data the better. Do you know anyone else who would be willing to contribute?
I’ve sent an e-mail to the guy who runs NLPH, which has speech resources for Hebrew. Let’s see if he has the time or interest to help us out.
I can give it a try.
Let me know if you get an answer from this guy.
Do I need any special equipment to record sentences?
No, any regular microphone will be fine.
If you’d like to help me train a Hebrew text to speech voice in the future, however, that would need a nice microphone like the Blue Yeti Nano. But one step at a time.
I’ll let you know. Any other internet communities you can think of to find volunteers?
I will dig into it this coming weekend.
He hasn’t responded, unfortunately. Maybe I’ll try to reach out to him directly on GitHub.
Meanwhile, I am trying to find volunteers.
Got a response from Shay. He’s no longer working in the field, but he did offer to make a post on the Facebook group to ask for volunteers. @dror-israel, maybe you could help to coordinate with the group?
I don’t have a Facebook account, but I will contact them.
Lol, me neither.
When you do contact them, let them know that I’m also working on getting speech data from LibriVox audiobooks (they have a number of Hebrew books). So we won’t be starting from nothing!