Implement web interface for audio streaming - could alleviate need for satellite rhasspy instances

RandomRhasspyUser · November 11, 2020, 11:35pm

If rhasspy had a “client” webpage with the ability to stream audio back and forth to the base, it could enable practically any device with a web browser and audio to become a satellite without the need for multiple rhasspy devices (or platform-specific applications), and optionally allow the complete processing of speech-to-text and intents directly on the base.

Alexey · November 13, 2020, 1:15pm

I support the proposal.
Such a web microphone could be embedded in any smart home control system, since they all have a web interface.

fishertimj · November 14, 2020, 7:11pm

I’m working on this on my own right now: Possible to use Tablet as Mic?

RandomRhasspyUser · November 14, 2020, 7:30pm

Fantastic, I was mulling over doing this myself - I will hold off for the time being.

It seems the easiest strategy would be opening audio via websockets and publishing the audio chunks to MQTT hermes/audioServer/SITEID on the backend while also subscribing to the necessary tts stream for command playback on the device.
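To make the backend half of that idea concrete, here is a minimal sketch of what the relay could do with each chunk arriving from the browser. It assumes the browser sends raw 16-bit mono PCM at 16 kHz and that, as I understand the Hermes protocol, audio frames go out as small WAV payloads on `hermes/audioServer/<siteId>/audioFrame`. The function names and the `publish` callback are made up for illustration; in practice `publish` would be whatever MQTT client you use.

```python
import io
import wave
from typing import Callable

# Assumed audio format for the browser stream (not confirmed in this thread).
SAMPLE_RATE = 16000  # samples per second
SAMPLE_WIDTH = 2     # bytes per sample (16-bit)
CHANNELS = 1         # mono


def audio_frame_topic(site_id: str) -> str:
    """Build the MQTT topic for audio frames from one site."""
    return f"hermes/audioServer/{site_id}/audioFrame"


def pcm_to_wav(pcm: bytes) -> bytes:
    """Wrap a raw PCM chunk in a WAV container, held entirely in memory."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(CHANNELS)
        wav_file.setsampwidth(SAMPLE_WIDTH)
        wav_file.setframerate(SAMPLE_RATE)
        wav_file.writeframes(pcm)
    return buffer.getvalue()


def forward_chunk(
    pcm: bytes,
    site_id: str,
    publish: Callable[[str, bytes], None],
) -> None:
    """Convert one websocket chunk and hand it to the (hypothetical) publisher."""
    publish(audio_frame_topic(site_id), pcm_to_wav(pcm))
```

The websocket handler would call `forward_chunk` once per received message. Playback in the other direction (subscribing to the TTS audio and pushing it back down the websocket) is not covered here.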

Hopefully the frontend could be flexible enough to have a basic webpage with the websocket javascript code, then allow for user-customizable javascript/css/html for all skinning possibilities.

Obviously it is a fairly large undertaking, I am looking forward to your work!

rolyan_trauts · November 14, 2020, 7:32pm

Linto are doing this; they just created some MFCC libs in Dart so they will run in a browser.

fishertimj · November 14, 2020, 8:02pm

Yep! Basically the plan! My first iteration will have this all communicating through Node-RED (via websocket) so I can exercise some extra control over a few aspects, but regardless, once I’m done, I’ll share the code and the rest of you much-smarter folks can incorporate it into documentation or a cookbook and go from there.

It’s going well. I’m not that far from demoing. Issue is finding enough time!!

I’m very excited about this! It solves so many problems for me.

fishertimj · November 14, 2020, 8:03pm

I may eventually find that the best way forward is to rely on a framework of sorts but for now, I’m working on building this all with vanilla JS. The efficiency nerd in me can’t help but try.

VoxAbsurdis · May 18, 2021, 4:17am

Did anything end up happening with this? It’s an AMAZING idea! I love that basically any device with a web browser and a microphone+speaker could become a satellite with virtually no setup!

fawad · May 18, 2021, 5:27am

Yes, I agree, this is something I have been hunting for quite some time. It will be ultra exciting even if it can only be achieved with a bit of setup.

VoxAbsurdis · May 31, 2021, 1:28am

@fishertimj I can totally understand the time constraints… Can I suggest, though, just making the code public now, even if it’s not ready/functional/cleaned up etc.? Just post a public git repo with a suitably dire warning message (“Here be buggy, unfinished dragons…”) and leave it at that if that’s what you have time for.

Then people can at least see what you’re trying to do and possibly help out, even if you don’t currently have the time to finish. It’s more important that the community has a place to start to encourage people to join in, than it is to have a finished, working product.

If people can see the code then they can always ask you a quick question or two, and keep the ball rolling.

Again, it’s totally understandable if your life circumstances don’t let you finish this. Just don’t let the community lose the start that you’ve already made! This idea is just too good to let it live and die with one person!

MDL · June 17, 2021, 4:30pm

Thinking about how to implement this in JS, a few questions came up.
If wake word detection (wwd) runs on the base, the client would constantly have to stream audio chunks over websocket/MQTT, right? And how would this compare to doing the same via UDP?

Without the need for wwd, would a simple activation button for intent recognition be an option?
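For the button idea, a rough sketch of what a press could trigger on the backend: as far as I know, the Hermes dialogue manager starts a session when it receives a JSON message on `hermes/dialogueManager/startSession` with an `init` of type `action`, which skips the wake word entirely. Treat the exact payload fields as an assumption to check against the Rhasspy docs; the helper name is made up.

```python
import json
from typing import Tuple

# Topic believed to start a dialogue session in the Hermes protocol.
START_SESSION_TOPIC = "hermes/dialogueManager/startSession"


def start_session_message(site_id: str) -> Tuple[str, str]:
    """Return (topic, JSON payload) that a button press could publish."""
    payload = {
        "siteId": site_id,
        # "action" asks the dialogue manager to listen for a command right away.
        "init": {"type": "action", "canBeEnqueued": False},
    }
    return START_SESSION_TOPIC, json.dumps(payload)
```

The client would still need to stream audio for that site while the session is open, but only for the duration of the command rather than constantly.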

VoxAbsurdis · June 22, 2021, 1:56am

I would suggest that audio only needs to be streamed when the volume reaches a certain level. I mean, you could just always stream, and that is probably the easiest technically, but I’m sure the data sent over the network could be reduced at the client level in some minimal way.
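Something like this is what I have in mind, as a minimal sketch: compute the RMS level of each chunk and only forward chunks above a threshold. It assumes little-endian 16-bit mono PCM, and the threshold of 500 is an arbitrary placeholder that would need tuning per microphone. The same logic would port directly to JS on the client.

```python
import array
import math
import sys


def rms(pcm: bytes) -> float:
    """Root-mean-square level of a little-endian 16-bit PCM chunk."""
    usable = len(pcm) - (len(pcm) % 2)  # ignore a trailing odd byte
    samples = array.array("h")
    samples.frombytes(pcm[:usable])
    if sys.byteorder == "big":
        samples.byteswap()  # the input is assumed to be little-endian
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def should_stream(pcm: bytes, threshold: float = 500.0) -> bool:
    """True when the chunk is loud enough to be worth sending."""
    return rms(pcm) >= threshold
```

One caveat: a hard gate like this can clip the start of a word, so it would probably be worth keeping the last few quiet chunks in a small buffer and sending them first once the level crosses the threshold.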

fishertimj · July 19, 2021, 3:40pm

Won’t let this die! I’m jumping back into all this in the next several weeks as my new home build is getting to the stage where I need to wrap this up!

fawad · July 19, 2021, 3:56pm

Please keep us updated, I am ultra interested in this

VoxAbsurdis · July 19, 2021, 9:36pm

YES!!! @fishertimj , you are my new(?) hero! Anything I can do to help (testing, UX design, graphic design, moral support, a left kidney, ??? ) just let me know! I am desperate to see this happen.

I noticed that in the Rhasspy 2.5.11 preview thread, it has a feature stated as:

Wake word systems can receive raw UDP audio from multiple sites, and forward it to MQTT (see wake…udp_site_info

Would this be helpful, I wonder?

IonU · August 23, 2021, 2:33am

@fishertimj How did the house build go? Did you ever get that browser version running?