DeepSpeech on Pi 3: single-threaded only and a bit slow

rolyan_trauts · May 11, 2020, 11:26pm

I have been playing with version 0.7 of DeepSpeech, and it's true that on the Pi 4 it's faster than realtime, but the Pi 3 remains below 0.5x realtime.
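To make "faster than realtime" concrete: speed relative to realtime is seconds of audio divided by seconds of processing. Below 1.0 the decoder falls behind. Here is a minimal sketch. The clip length and decode time in the example are made-up numbers for illustration, and the `timed` helper is just a generic stopwatch, not part of DeepSpeech's API.

```python
import time
from typing import Callable, Tuple, TypeVar

T = TypeVar("T")


def realtime_speed(audio_seconds: float, elapsed_seconds: float) -> float:
    """Seconds of audio processed per second of wall-clock time.

    > 1.0 means faster than realtime; 0.5 means half realtime speed.
    """
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return audio_seconds / elapsed_seconds


def timed(fn: Callable[[], T]) -> Tuple[T, float]:
    """Run fn() and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = fn()
    return result, time.perf_counter() - start


# Hypothetical numbers for illustration only: a 5 s clip that takes 12.5 s to decode.
print(realtime_speed(5.0, 12.5))  # 0.4 -> slower than realtime
```

In practice you would wrap whatever transcription call you are using in `timed(lambda: ...)` and divide the clip length by the elapsed time.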

It's sort of strange: DeepSpeech, at least with TFLite on a Pi, runs single-threaded, and it's something to do with model limitations.
A shame really, as with these tailored Pi TFLite wheels PINTO manages 2.5x performance gains by using multiple cores, and even the good old Pi 3 could be faster than realtime.
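The claim that the Pi 3 could get past realtime can be sanity-checked with Amdahl's law. That framing is mine, not something from the wheels' documentation. If a fraction p of the work parallelises over n cores, the overall speedup is 1 / ((1 - p) + p / n). A 2.5x gain on four cores implies p = 0.8. Applying 2.5x to a Pi 3 running just under 0.5x realtime lands above 1.0x. The 0.45x baseline below is an assumed stand-in for "below 0.5x", not a measurement.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Amdahl's law: overall speedup when parallel_fraction of the work scales over cores."""
    if not 0.0 <= parallel_fraction <= 1.0 or cores < 1:
        raise ValueError("need 0 <= parallel_fraction <= 1 and cores >= 1")
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / cores)


def parallel_fraction_for(speedup: float, cores: int) -> float:
    """Invert Amdahl's law: the parallel fraction needed for a given speedup on n cores."""
    # speedup = 1 / (1 - p + p/n)  =>  p = (1 - 1/speedup) / (1 - 1/n)
    return (1.0 - 1.0 / speedup) / (1.0 - 1.0 / cores)


p = parallel_fraction_for(2.5, 4)
print(round(p, 3))                     # 0.8
print(round(amdahl_speedup(p, 4), 3))  # 2.5
# Assumed Pi 3 baseline of 0.45x realtime, scaled by the 2.5x multithreaded gain:
print(round(0.45 * 2.5, 3))            # 1.125 -> faster than realtime
```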

I posted the above because the stuff he provides is really great.

voice · May 12, 2020, 1:38am

[quote=“rolyan_trauts, post:1, topic:961”]
Prebuilt binary for TensorflowLite’s standalone installer. Fast tuning with MultiThread. For RaspberryPi. A very lightweight installer. - PINTO0309/TensorflowLite-bin
[/quote] Emphasis mine

Hey, I thought there were no DNNs that could run inference in parallel (multithreaded). That would be great, but it didn’t work?

I’m currently looking at two things: 1) a fast and nice TTS (I found really nice ones from Mozilla, but none capable of running on a Pi yet; maybe ForwardTacotron plus some fast vocoder), and 2) a better STT than PocketSphinx.

Have you played with TTS too, or only STT?

rolyan_trauts · May 12, 2020, 11:16am

I am not sure about DNN parallel inference, as I think it’s more a case of existing DNN frameworks, such as Caffe, TensorFlow and Torch, only providing a single-level priority, one-DNN-per-process execution model and sequential inference interfaces.
I’m not sure it’s true that DNN parallelism isn’t possible; it may just be that heavyweight research is backed by heavyweight hardware. But even Google TPUs have cores?
There are articles on it, but yeah, there seems to be a lack of frameworks. I really don’t know.
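One way to get parallelism inside a single inference is intra-op parallelism. A large operation, such as a dense layer's matrix-vector product, is split into independent slices, one per worker, and the partial results are joined. I can't confirm that TFLite's multithreaded kernels partition work exactly like this. The sketch below only shows the general idea in pure Python. Because of CPython's GIL, these threads illustrate the partitioning rather than deliver a real speedup.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]


def matvec_rows(weights: Matrix, x: Sequence[float], start: int, stop: int) -> List[float]:
    """Compute rows [start, stop) of weights @ x; each slice is independent of the others."""
    return [sum(w * xi for w, xi in zip(weights[r], x)) for r in range(start, stop)]


def parallel_matvec(weights: Matrix, x: Sequence[float], workers: int = 4) -> List[float]:
    """Split the output rows of a dense layer across workers and join the partial results."""
    n = len(weights)
    chunk = max(1, -(-n // workers))  # ceiling division so every row is covered
    bounds = [(s, min(s + chunk, n)) for s in range(0, n, chunk)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in submission order, so row order is preserved.
        parts = list(pool.map(lambda b: matvec_rows(weights, x, *b), bounds))
    return [y for part in parts for y in part]


W = [[1, 2], [3, 4], [5, 6], [7, 8], [9, 10]]
print(parallel_matvec(W, [1, 1], workers=2))  # [3, 7, 11, 15, 19]
```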

I have been eagerly awaiting DeepSpeech and was also really happy to hear about the single-function VAD/KWS/STT engine, but then started to realise that it’s actually a bit of a stinky option for many of the envisaged uses for the Pi.
I guess the assumption is that we will be using some form of TPU/NPU accelerator.
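Without an accelerator, the usual alternative to one combined engine is a very cheap stage in front that decides when the expensive model should run at all. Below is a minimal energy-based VAD sketch. It is a simple stand-in technique, not what DeepSpeech ships. It assumes 16-bit mono PCM already decoded to a list of ints, and the frame size and threshold are arbitrary example values you would need to tune.

```python
import math
from typing import List, Sequence


def frame_rms(samples: Sequence[int]) -> float:
    """Root-mean-square amplitude of one frame of 16-bit PCM samples."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def speech_frames(pcm: Sequence[int], frame_len: int = 480, threshold: float = 500.0) -> List[bool]:
    """Flag each frame as speech (True) if its RMS energy exceeds the threshold.

    frame_len=480 is 30 ms at 16 kHz; both defaults are illustrative, not tuned values.
    """
    return [frame_rms(pcm[i:i + frame_len]) > threshold for i in range(0, len(pcm), frame_len)]


# Toy signal: one silent frame followed by one loud square-wave frame.
silence = [0] * 480
loud = [2000, -2000] * 240
print(speech_frames(silence + loud))  # [False, True]
```

Only the frames flagged `True` would be handed to the STT model, so the heavy network sits idle during silence.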

I haven’t bothered with Tacotron, especially Tacotron 2, as it seems to need some heavy lifting, as does Mellotron https://nv-adlr.github.io/Mellotron
But harmonising singing TTS is just amazing, and with luck AI will kill Justin Bieber.
Here is Mellotron doing Adele:

http://docs.google.com/uc?export=open&id=1MMQiBMoc390VAHW78aE9krELfdNi-avy

SVOX Pico is by far the best lightweight TTS, but it is closed source with limited language models.
So I’m left wondering how the hell SVOX did that, and why the others are so ‘robotic’ or seem so heavy.
If I got it wrong about Tacotron and Tacotron 2 with respect to Pi load, please comment, but it is frustrating that commercial TTS does seem to have the lightweight options.

But back to the CNNs we do use: I’m beginning to think that huge model collection is fubar and that continuous learning is likely to be a thing. Custom training and pruning can greatly improve accuracy, and with low-cost cloud TPU access, for a few dollars and several minutes you can have your new model returned.
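On the pruning point, the simplest form is magnitude pruning. You zero the weights with the smallest absolute values until a target sparsity is reached, then fine-tune to recover accuracy. This sketch shows only the zeroing step on a flat list of weights. Real toolkits (for example the TensorFlow Model Optimization toolkit) apply it per layer and on a schedule during training. The 50% target here is an arbitrary example.

```python
from typing import List, Sequence


def magnitude_prune(weights: Sequence[float], sparsity: float) -> List[float]:
    """Zero out the smallest-magnitude weights so that `sparsity` of them become 0.0.

    Ties at the cut-off are broken by position, so exactly round(sparsity * n) weights are zeroed.
    """
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must be in [0, 1]")
    n_prune = round(sparsity * len(weights))
    # Indices ordered from smallest to largest absolute value (sorted() is stable).
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = set(order[:n_prune])
    return [0.0 if i in pruned else w for i, w in enumerate(weights)]


w = [0.9, -0.05, 0.3, -0.7, 0.01, 0.2]
print(magnitude_prune(w, 0.5))  # [0.9, 0.0, 0.3, -0.7, 0.0, 0.0]
```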

I posted as much in