DeepSpeech and TTS from Mozilla


Hi everybody,
is it possible to integrate DeepSpeech or the TTS project from Mozilla?
The version 0.6 looks quite interesting.

Yes! Plans are to include an English and a German model for ASR. Not sure about TTS yet.

3 Likes

Hi, in my spare time I am testing Rhasspy and was trying to get it to work with DeepSpeech. I got it working, but it is a long way from production ready. If anybody wants to help me out, you're always welcome.

2 Likes

There is also a French model for DeepSpeech 0.6 here:

Enjoy :wink:

1 Like

Nice! Do you have any experience tuning the parameters of DeepSpeech?

Unfortunately not.

I tested the model last week but the WER is still pretty high for open transcription.
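
For anyone new to the metric: WER (word error rate) is just the word-level edit distance between the reference and the transcript, divided by the reference word count. Here's a minimal pure-Python sketch of the computation (not tied to any DeepSpeech tooling, just to make the numbers concrete):

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edits (sub/ins/del) to turn ref[:i] into hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = dp[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            dp[i][j] = min(sub, dp[i - 1][j] + 1, dp[i][j - 1] + 1)
    return dp[len(ref)][len(hyp)] / max(len(ref), 1)

# One substitution ("the" -> "a") out of five reference words:
print(word_error_rate("turn on the kitchen light",
                      "turn on a kitchen light"))  # 0.2
```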

With a small language model created by Rhasspy it should be pretty good though.

2 Likes

So is it possible to get this working? I wouldn’t mind spending some time playing with it if someone points me a bit in the right direction.

1 Like

I believe @Rowesca1 above said they got it working. Any more details you can share, @Rowesca1?

I had a simple version working; however, due to work and other things I had no time to work on it anymore. I hope to pick it up again real soon.

2 Likes

Could you at least explain a bit what you did, so I can try to reproduce it and continue? I will then share my progress :slight_smile:

1 Like

I’d be up for working on this too. Don’t mind starting from scratch, but would probably be helpful if we could get your starting point @Rowesca1.

1 Like

I recently came across this project. They seem to have some Docker builds with DeepSpeech that might be a good starting point.

1 Like

I’ve tried deepspeech within ProjectAlice (another free software voice activated home assistant). But, unfortunately for me, I didn’t see much improvement in accuracy.

Work has started here: https://github.com/rhasspy/rhasspy-asr-deepspeech-hermes

2 Likes

Since I’ve tried both ProjectAlice and Rhasspy, I can say that maybe I had a microphone configuration problem. With Rhasspy I couldn’t get the wakeword to function, and with ProjectAlice I couldn’t get the STT part to work. Now I have everything working with Rhasspy, and I still have to try ProjectAlice again (with DeepSpeech). So, it might be good, and the problem I saw was elsewhere.

DeepSpeech is OK, but unless you use a Pi 4 it's going to have some quite big latency accumulation.
A Pi 3 runs at less than 0.5× realtime, and a Pi Zero might not be that much worse since inference is single-threaded anyway, but core for core it falls short of even the Pi 3+'s less-than-0.5× realtime.
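
To make the "×-realtime" talk concrete: the realtime factor (RTF) is just processing time divided by audio duration, and when RTF is above 1.0 the delay grows with every second the user speaks. A quick sketch (numbers are illustrative, not benchmarks):

```python
def realtime_factor(processing_seconds: float, audio_seconds: float) -> float:
    """RTF < 1.0 means faster than realtime; RTF > 1.0 means the decoder lags."""
    return processing_seconds / audio_seconds

def latency_after_utterance(audio_seconds: float, rtf: float) -> float:
    """Extra wait after the user stops speaking (ignoring fixed startup cost)."""
    return max(0.0, audio_seconds * (rtf - 1.0))

# e.g. a board needing 21 s to decode 10 s of audio runs at ~0.48x realtime:
rtf = realtime_factor(21.0, 10.0)
print(rtf)                                # 2.1
print(latency_after_utterance(10.0, rtf)) # user waits ~11 s after speaking
```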

Maybe deepspeech 0.7 is bringing improvements?

Also I think that hardware at the Pi level is going to have issues with most kinds of good STT engine anyway. There's always going to be a tradeoff between accuracy and performance.

Not yet, as it runs in a single thread due to the model.
It is supposed to become multithreaded, and development was already romping ahead towards 0.8 when I last looked.

Really there should be no tradeoff between accuracy and performance, and we are probably looking at edge devices in the wrong way.
You might get a single edge server running STT & TTS connected to multiple room devices, which I call shelf devices as they reside on a shelf at the edge.

Shelf devices should be cross-compatible, and I really don't like what's happened with Rhasspy Satellite, as it's a repo bloat of huge size, packed full of non-standard protocols that just don't need to be there.
A shelf device just needs KWS and audio capability plus a connection to a threaded per-room edge server, using common standard RTP and MQTT methods in ultra-simple, low-load systems.

The adoption of Hermes protocols from the defunct Snips, to accommodate a few developers rather than encapsulate user-based software ideals, has probably had a similar effect on Rhasspy: maybe not defunct, but for some of us derailed, and strangely for very little user advantage. It's probably time to stop trying to force the creation of non-standard IP.
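
For reference, the Hermes protocol mentioned above streams microphone audio as small WAV chunks over MQTT, on topics like `hermes/audioServer/<siteId>/audioFrame`. A stdlib-only sketch of packing one such frame (the 16 kHz / 16-bit / mono parameters are typical Rhasspy defaults, and the `livingroom` site ID is made up):

```python
import io
import wave

def pack_audio_frame(pcm: bytes, rate: int = 16000) -> bytes:
    """Wrap raw 16-bit mono PCM samples in a WAV container."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)   # mono
        wav.setsampwidth(2)   # 16-bit samples
        wav.setframerate(rate)
        wav.writeframes(pcm)
    return buf.getvalue()

# 1024 samples of silence -> one frame, which a client would publish on
# e.g. the topic "hermes/audioServer/livingroom/audioFrame"
frame = pack_audio_frame(b"\x00\x00" * 1024)
print(frame[:4])  # b'RIFF'
```

The payload is a complete standalone WAV file per chunk, so any MQTT client library can act as a shelf-device audio source without knowing anything else about the stack.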

It already works faster than realtime on a Pi 4, so with that diversification of use a Pi 4 already makes an edge server.
Shelf devices don't need to run STT or TTS, so no tradeoff is required.

But yeah, I'm not keen on Rhasspy Satellite, and I'm still waiting for Rhasspy Rover, the small Galra drone :slight_smile:

How can there not be a tradeoff between accuracy and performance, when good speech-to-text engines clearly need more resources/power than weaker ones, especially if more hardware-hungry ML models are involved?

I don’t know about your setup, but I personally want to avoid having a more powerful server running 24/7 in my home; that’s why I want to keep the whole voice assistant on a single Raspberry Pi. This worked pretty well with Snips already, but you still clearly can’t compare this to the big cloud solutions like Siri & Alexa in terms of accuracy and reliability.

Like I say, a Pi 4 already runs faster than realtime, and the engine is slated to become multithreaded.
On a Pi 4 or above it's very possible to run very reasonable STT & TTS, and as I also said, with shelf devices the room endpoints don't actually need to run STT or TTS at all.

Also, it doesn't matter what Snips did or didn't do well, as Snips is dead.

1 Like