DeepSpeech and TTS from Mozilla


Hi everybody,
is it possible to integrate DeepSpeech or the TTS project from Mozilla?
The version 0.6 looks quite interesting.

Yes! Plans are to include an English and a German model for ASR. Not sure about TTS yet.

3 Likes

Hi, in my spare time I am testing Rhasspy and was trying to get it to work with DeepSpeech. I got it working, but it is a long way from production ready. If anybody wants to help me out, you're always welcome.

2 Likes

There is also a French model for DeepSpeech 0.6 here:

Enjoy :wink:

1 Like

Nice! Do you have any experience tuning the parameters of DeepSpeech?

Unfortunately not.

I tested the model last week but the WER is still pretty high for open transcription.
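
For anyone new to the metric: WER (word error rate) is just the word-level edit distance between the reference and the transcript, divided by the reference word count. Here's a minimal pure-Python sketch of the computation (not tied to any DeepSpeech tooling, just to make the numbers concrete):

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edits (sub/ins/del) to turn ref[:i] into hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = dp[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            dp[i][j] = min(sub, dp[i - 1][j] + 1, dp[i][j - 1] + 1)
    return dp[len(ref)][len(hyp)] / max(len(ref), 1)

# One substitution ("the" -> "a") out of five reference words:
print(word_error_rate("turn on the kitchen light",
                      "turn on a kitchen light"))  # 0.2
```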

With a small language model created by Rhasspy it should be pretty good though.

2 Likes

So is it possible to get this working? I wouldn’t mind spending some time playing with it if someone points me a bit in the right direction.

1 Like

I believe @Rowesca1 above said they got it working. Any more details you can share, @Rowesca1?

I had a simple version working; however, due to work and other things I had no time to work on it anymore. I hope to pick it up again real soon.

2 Likes

Could you at least explain a bit what you did, so I can try to reproduce it and continue? I will then share my progress :slight_smile:

1 Like

I’d be up for working on this too. Don’t mind starting from scratch, but would probably be helpful if we could get your starting point @Rowesca1.

1 Like

I recently came across this project. They seem to have some Docker builds with DeepSpeech that might be a good starting point.

1 Like

I’ve tried deepspeech within ProjectAlice (another free software voice activated home assistant). But, unfortunately for me, I didn’t see much improvement in accuracy.

Work has started here: https://github.com/rhasspy/rhasspy-asr-deepspeech-hermes

2 Likes

Since I’ve tried both ProjectAlice and Rhasspy, I can say that maybe I had a microphone configuration problem. With Rhasspy I couldn’t get the wakeword to function, and with ProjectAlice I couldn’t get the STT part to work. Now I have everything working with Rhasspy, and I still have to try ProjectAlice again (with DeepSpeech). So, it might be good, and the problem I saw was elsewhere.

DeepSpeech is OK, but unless you use a Pi 4 it's going to have some quite big latency accumulation.
A Pi 3 runs at less than 0.5× realtime, and a Pi Zero might not be that much worse since inference is single-threaded anyway, but core for core it falls short of even the Pi 3+'s less-than-0.5× realtime.
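
To make the "×-realtime" talk concrete: the realtime factor (RTF) is just processing time divided by audio duration, and when RTF is above 1.0 the delay grows with every second the user speaks. A quick sketch (numbers are illustrative, not benchmarks):

```python
def realtime_factor(processing_seconds: float, audio_seconds: float) -> float:
    """RTF < 1.0 means faster than realtime; RTF > 1.0 means the decoder lags."""
    return processing_seconds / audio_seconds

def latency_after_utterance(audio_seconds: float, rtf: float) -> float:
    """Extra wait after the user stops speaking (ignoring fixed startup cost)."""
    return max(0.0, audio_seconds * (rtf - 1.0))

# e.g. a board needing 21 s to decode 10 s of audio runs at ~0.48x realtime:
rtf = realtime_factor(21.0, 10.0)
print(rtf)                                # 2.1
print(latency_after_utterance(10.0, rtf)) # user waits ~11 s after speaking
```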

Maybe deepspeech 0.7 is bringing improvements?

Also I think that hardware at the Pi level is going to have issues with most kinds of good STT engine anyway. There's always going to be a tradeoff between accuracy and performance.

Not yet, as it runs in a single thread due to the model.
It is supposed to become multithreaded, and development was already romping ahead towards 0.8 when I last looked.

Really there should be no tradeoff between accuracy and performance, and we are probably looking at edge devices in the wrong way.
You might get a single edge server running STT & TTS connected to multiple room devices, which I call shelf devices as they reside on a shelf at the edge.

Shelf devices should be cross-compatible, and I really don't like what's happened with Rhasspy Satellite, as it's a repo bloat of huge size, packed full of non-standard protocols that just don't need to be there.
A shelf device just needs KWS and audio capability plus a connection to a threaded per-room edge server, using common standard RTP and MQTT methods in ultra-simple, low-load systems.

The adoption of Hermes protocols from the defunct Snips, to accommodate a few developers rather than encapsulate user-based software ideals, has probably had a similar effect on Rhasspy: maybe not defunct, but for some of us derailed, and strangely for very little user advantage. It's probably time to stop trying to force the creation of non-standard IP.
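
For reference, the Hermes protocol mentioned above streams microphone audio as small WAV chunks over MQTT, on topics like `hermes/audioServer/<siteId>/audioFrame`. A stdlib-only sketch of packing one such frame (the 16 kHz / 16-bit / mono parameters are typical Rhasspy defaults, and the `livingroom` site ID is made up):

```python
import io
import wave

def pack_audio_frame(pcm: bytes, rate: int = 16000) -> bytes:
    """Wrap raw 16-bit mono PCM samples in a WAV container."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)   # mono
        wav.setsampwidth(2)   # 16-bit samples
        wav.setframerate(rate)
        wav.writeframes(pcm)
    return buf.getvalue()

# 1024 samples of silence -> one frame, which a client would publish on
# e.g. the topic "hermes/audioServer/livingroom/audioFrame"
frame = pack_audio_frame(b"\x00\x00" * 1024)
print(frame[:4])  # b'RIFF'
```

The payload is a complete standalone WAV file per chunk, so any MQTT client library can act as a shelf-device audio source without knowing anything else about the stack.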

It already works faster than realtime on a Pi 4, so with that diversification of use a Pi 4 already makes an edge server.
Shelf devices don't need to run STT or TTS, so no tradeoff is required.

But yeah, I'm not keen on Rhasspy Satellite, and I'm still waiting for Rhasspy Rover, the small Galra drone :slight_smile:

How can there not be a tradeoff between accuracy and performance, when good speech-to-text engines clearly need more resources/power than weaker ones, especially if more hardware-hungry ML models are involved?

I don’t know about your setup, but I personally want to avoid having a more powerful server running 24/7 in my home; that’s why I want to keep the whole voice assistant on a single Raspberry Pi. This worked pretty well with Snips already, but you still clearly can’t compare this to the big cloud solutions like Siri & Alexa in terms of accuracy and reliability.

Like I say, a Pi 4 already runs faster than realtime, and the engine is slated to become multithreaded.
On a Pi 4 or above it's very possible to run very reasonable STT & TTS, and as I also said, with shelf devices the room endpoints don't actually need to run STT or TTS at all.

Also, it doesn't matter what Snips did or didn't do well, as Snips is dead.

1 Like