/api/speech-to-intent with compressed audio (ogg)

setzer22 · February 21, 2020, 6:38pm

Hi all!

First of all, I’d like to take the chance to thank the developers for this amazing project!

I’m running Rhasspy on an external VPS, and I’m developing an application that sends WAV files over HTTP to Rhasspy’s /api/speech-to-intent endpoint.

I was wondering, though, whether there’s a way to make Rhasspy accept compressed audio such as Ogg on that (or a similar) endpoint.

Is this feature available? This would allow me to save huge amounts of bandwidth.

Thanks!

synesthesiam · May 27, 2020, 8:39pm

Not currently possible, but might be a good idea. What’s the best way to convert ogg to wav?

JGKK · May 27, 2020, 8:44pm

I would think ffmpeg on the command line is probably the best option.
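
As a rough sketch of what that could look like when driven from Python: the helper names below are made up for illustration, and the target format (16 kHz, mono, 16-bit PCM) is an assumption about what the speech recognizer expects, so adjust it to your setup. It also assumes ffmpeg is installed and on the PATH.

```python
import subprocess
from pathlib import Path


def build_ffmpeg_command(src, dst, rate=16000, channels=1):
    """Build the ffmpeg argument list for converting `src` to a PCM WAV file.

    Hypothetical helper; rate and channel count are assumed defaults.
    """
    return [
        "ffmpeg",
        "-y",                    # overwrite the output file if it exists
        "-i", str(src),          # input file (ogg, 3gp, mp4, ...)
        "-ar", str(rate),        # output sample rate in Hz
        "-ac", str(channels),    # number of output channels
        "-acodec", "pcm_s16le",  # 16-bit little-endian PCM
        "-f", "wav",             # force the WAV container
        str(dst),
    ]


def convert_to_wav(src, dst):
    """Run ffmpeg and return the output path; raises CalledProcessError on failure."""
    subprocess.run(build_ffmpeg_command(src, dst), check=True)
    return Path(dst)
```

Calling `convert_to_wav("recording.ogg", "recording.wav")` would then produce a file that can be POSTed as before.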

Dakes · September 23, 2020, 7:54pm

I’d also like to see this. I wrote a Tasker script that records voice when I press my Bixby button, but unfortunately Tasker only records 3gpp and mp4. If I trigger an external app instead, it sometimes takes a very long time for the app to start or stop.

synesthesiam · September 26, 2020, 3:40pm

I’m thinking we could do this pretty easily with ffmpeg if the HTTP Content-Type field is filled out appropriately when you POST to /api/speech-to-intent (or /api/speech-to-text).

It looks like ffmpeg supports 3gp audio, so this should work as long as we include it in the Docker image.
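
To make the idea concrete, here is a minimal sketch of dispatching on Content-Type. This is not the actual Rhasspy code: the function names, the list of accepted MIME types, and the 16 kHz mono 16-bit output format are all assumptions for illustration. It writes the upload to a temporary file rather than piping it, since some containers (mp4/3gp) may need a seekable input.

```python
import subprocess
import tempfile
from pathlib import Path

# Types passed through untouched (assumed list).
WAV_TYPES = {"audio/wav", "audio/x-wav", "audio/wave"}

# Hypothetical allow-list: media type -> file suffix handed to ffmpeg.
CONVERTIBLE_TYPES = {
    "audio/ogg": ".ogg",
    "audio/3gpp": ".3gp",
    "audio/mp4": ".mp4",
}


def media_type(content_type):
    """Strip parameters such as '; codecs=opus' and normalize case."""
    return content_type.split(";", 1)[0].strip().lower()


def to_wav_bytes(body, content_type):
    """Return WAV bytes for a request body, converting with ffmpeg if needed."""
    kind = media_type(content_type)
    if kind in WAV_TYPES:
        return body  # already WAV, nothing to do
    if kind not in CONVERTIBLE_TYPES:
        raise ValueError(f"unsupported Content-Type: {content_type!r}")

    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / ("input" + CONVERTIBLE_TYPES[kind])
        dst = Path(tmp) / "output.wav"
        src.write_bytes(body)
        subprocess.run(
            ["ffmpeg", "-y", "-loglevel", "error", "-i", str(src),
             "-ar", "16000", "-ac", "1", "-acodec", "pcm_s16le", str(dst)],
            check=True,  # raise if ffmpeg exits with an error
        )
        return dst.read_bytes()
```

The HTTP handler would call `to_wav_bytes(request_body, content_type_header)` and feed the result into the existing WAV pipeline, returning an error response when `ValueError` is raised.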