/api/speech-to-intent with compressed audio (ogg)

Hi all!

First of all, I’d like to take the chance to thank the developers for this amazing project!

I’m running Rhasspy on an external VPS, and I’m developing an application that sends WAV files to it over HTTP using Rhasspy’s /api/speech-to-intent endpoint.
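
For anyone following along, here is a minimal sketch of what that upload could look like from Python. The base URL (port 12101) and the helper names build_request and speech_to_intent are assumptions for illustration, not something taken from the setup described above; adjust them to your own server.

```python
import json
import urllib.request


def build_request(wav_bytes, base_url="http://localhost:12101"):
    """Build a POST request carrying raw WAV data.

    base_url is an assumed address; point it at your own Rhasspy server.
    """
    url = base_url.rstrip("/") + "/api/speech-to-intent"
    return urllib.request.Request(
        url,
        data=wav_bytes,
        headers={"Content-Type": "audio/wav"},
        method="POST",
    )


def speech_to_intent(wav_path, base_url="http://localhost:12101"):
    """Send a WAV file and return the decoded JSON response."""
    with open(wav_path, "rb") as wav_file:
        request = build_request(wav_file.read(), base_url)
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling speech_to_intent("command.wav", "http://my-vps:12101") would then return the parsed response as a dict, assuming the server is reachable at that (hypothetical) address.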

I was wondering, though, whether there’s a way to make Rhasspy accept compressed audio such as Ogg on that (or a similar) endpoint.

Is this feature available? It would save me a lot of bandwidth compared to uploading uncompressed WAV.

Thanks!

Not currently possible, but it might be a good idea. What’s the best way to convert Ogg to WAV?

Probably ffmpeg, called from the command line.
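
A rough sketch of that conversion, wrapped in Python so it could be called from a server process. The helper names are hypothetical, and the 16 kHz, mono, 16-bit PCM output is an assumption about what the speech recognizer wants, so change those values if your profile differs. It also assumes ffmpeg is installed and on the PATH.

```python
import subprocess


def ffmpeg_to_wav_args(src, dst, rate=16000, channels=1):
    """Return the ffmpeg argument list that converts src to a PCM WAV file."""
    return [
        "ffmpeg",
        "-y",                 # overwrite dst if it already exists
        "-i", src,            # input file; ffmpeg detects the container itself
        "-ar", str(rate),     # output sample rate in Hz (assumed 16 kHz)
        "-ac", str(channels), # output channel count (assumed mono)
        "-c:a", "pcm_s16le",  # 16-bit little-endian PCM
        dst,
    ]


def convert_to_wav(src, dst):
    """Run ffmpeg; raises CalledProcessError if the conversion fails."""
    subprocess.run(ffmpeg_to_wav_args(src, dst), check=True)
```

The equivalent shell command would be: ffmpeg -y -i input.ogg -ar 16000 -ac 1 -c:a pcm_s16le output.wav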

I’d also like to see this. I wrote a Tasker script that records my voice when I press the Bixby button, but unfortunately Tasker can only record 3gpp and mp4. Triggering an external recording app instead is not a good workaround, because the app sometimes takes a very long time to start or stop.

I’m thinking we could do this pretty easily with ffmpeg if the HTTP Content-Type field is filled out appropriately when you POST to /api/speech-to-intent (or /api/speech-to-text).
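
Something along these lines, as a sketch only: the Content-Type table, the helper names, and the 16 kHz mono output are all assumptions for illustration, not Rhasspy’s actual code. The upload is written to a temporary file rather than piped, since mp4/3gp containers generally need seekable input.

```python
import os
import subprocess
import tempfile

# Hypothetical mapping from Content-Type to a file suffix for the temp file.
SUFFIX_BY_TYPE = {
    "audio/ogg": ".ogg",
    "audio/3gpp": ".3gp",
    "video/3gpp": ".3gp",
    "audio/mp4": ".mp4",
    "video/mp4": ".mp4",
}
WAV_TYPES = {"audio/wav", "audio/x-wav", "audio/wave"}


def suffix_for(content_type):
    """Return a suffix for types that need conversion, or None for WAV."""
    # Drop parameters such as "; codecs=opus" and normalise case.
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type in WAV_TYPES:
        return None
    if media_type not in SUFFIX_BY_TYPE:
        raise ValueError("unsupported Content-Type: " + content_type)
    return SUFFIX_BY_TYPE[media_type]


def to_wav_bytes(body, content_type):
    """Return WAV data for a request body, converting with ffmpeg if needed."""
    suffix = suffix_for(content_type)
    if suffix is None:
        return body  # already WAV, pass through unchanged
    with tempfile.TemporaryDirectory() as tmp_dir:
        src = os.path.join(tmp_dir, "input" + suffix)
        dst = os.path.join(tmp_dir, "output.wav")
        with open(src, "wb") as src_file:
            src_file.write(body)
        # -vn drops any video stream; output format is assumed 16 kHz mono PCM.
        subprocess.run(
            ["ffmpeg", "-y", "-loglevel", "error", "-i", src, "-vn",
             "-ar", "16000", "-ac", "1", "-c:a", "pcm_s16le", dst],
            check=True,
        )
        with open(dst, "rb") as dst_file:
            return dst_file.read()
```

The endpoint handler would then call to_wav_bytes(request_body, content_type_header) before handing the audio to the existing WAV code path, and return an HTTP 4xx error when ValueError is raised.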

It looks like ffmpeg supports 3gp audio, so this should work as long as we include it in the Docker image.