Hebrew voice recognition

What is needed in order to support Hebrew voice recognition in Rhasspy?

Hi @dror-israel,

For speech recognition, I need recordings of different people speaking short phrases (one or two sentences) along with their text transcriptions. Around 20 hours would be a good start, and more is better.

Mozilla Common Voice doesn’t appear to have any Hebrew yet, unfortunately. I found the CoSIH corpus, which has a good amount of data (it's still downloading, so I'm not sure yet how much there is or what the quality is like).

Fortunately, there is already a Hebrew phonological analysis, but I can’t tell if the Hebrew Wiktionary has pronunciations in the International Phonetic Alphabet.

Would you be interested in working together to add Hebrew support to Rhasspy?

I'm not that techy.
I need to understand more to see if I'm capable of doing it.

No need to be techy 🙂

The big need is transcribed speech data, and the fact that there’s none for Hebrew on Mozilla Common Voice means you have a unique opportunity to contribute!

The first step is adding Hebrew sentences to Mozilla’s Sentence Collector for people to read. These must be in the public domain (CC0). See the how-to guide for suggestions on where to get sentences.

The next step is getting people to read those sentences using the Common Voice website. A variety of speakers is important (gender, age), and the more data the better. Do you know anyone else who would be willing to contribute?

I’ve sent an e-mail to the guy who runs NLPH, which has speech resources for Hebrew. Let’s see if he has the time or interest to help us out.

I can give it a try.
Let me know if you get an answer from him.
Do I need any special equipment to record sentences?

No, any regular microphone will be fine 🙂

If you’d like to help me train a Hebrew text-to-speech voice in the future, however, that would need a good microphone like the Blue Yeti Nano. But one step at a time.

I’ll let you know. Can you think of any other online communities where we could find volunteers?

I will dig into it this coming weekend.


He hasn’t responded, unfortunately. Maybe I’ll try to reach out to him directly on GitHub.

Meanwhile, I am trying to find volunteers.

Got a response from Shay. He’s no longer working in the field, but he did offer to make a post on the Facebook group to ask for volunteers. @dror-israel, maybe you could help to coordinate with the group?

I don’t have Facebook, but I will contact them.

Lol, me neither.

When you do contact them, let them know that I’m also working on getting speech data from LibriVox audiobooks (they have a number of Hebrew books). So we won’t be starting from nothing!