Hey, I hope everyone is good.
I am planning to have a working satellite app on a phone or tablet.
This is my line of thought about what needs to be done:
1) Detect that the user wants to interact with the system.
2) Capture the user's spoken audio and transcribe it to text.
3) Send that text to the server.
4) Evaluate that text against the intents on the Rhasspy engine.
5) Execute whatever is defined for the matched intent.
Ok, for 1) I won’t need an activation word as I plan to use a press-to-talk button.
For 2) I think native speech-to-text solutions may work fine, like the one offered by the Flutter package speech_to_text.
If everything goes fine with 2) I’ll have a nice text representation of the user command so I’ll need to send it to the backend.
At first I think the options we have here are MQTT or a REST Webhook. Any advice on this would be appreciated.
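To make the REST option concrete, here is a minimal Python sketch (Python only for brevity; the same request can be made from Dart). It assumes a Rhasspy 2.5 server on its default web port 12101 and uses its `/api/text-to-intent` endpoint, which takes the transcript as a plain-text POST body and returns the recognized intent as JSON. The host name and the helper names `build_intent_request` and `recognize_intent` are made up for illustration.

```python
import json
import urllib.request

# Assumption: Rhasspy 2.5 web server reachable on its default port 12101.
RHASSPY_URL = "http://localhost:12101"


def build_intent_request(text: str, base_url: str = RHASSPY_URL) -> urllib.request.Request:
    """Build a POST to /api/text-to-intent with the transcript as plain text."""
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/text-to-intent",
        data=text.encode("utf-8"),
        headers={"Content-Type": "text/plain; charset=utf-8"},
        method="POST",
    )


def recognize_intent(text: str, base_url: str = RHASSPY_URL, timeout: float = 5.0) -> dict:
    """Send the transcript to Rhasspy and return the parsed JSON response."""
    request = build_intent_request(text, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    # Needs a running Rhasspy instance; the sentence is only an example.
    result = recognize_intent("turn on the living room lamp")
    print(result.get("intent", {}).get("name"))
```

As far as I know, Rhasspy also passes the recognized intent on to whatever intent handling is configured unless `?nohass=true` is added to the URL, so this one call may already cover steps 3 to 5.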
So for 5) if we use MQTT, it is just a matter of using the broker already configured on Rhasspy and that would be it. If using REST, a script or a call to a service would be needed in the middle.
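For the MQTT route, a rough sketch of what the app would publish, assuming Rhasspy speaks the Hermes protocol on that broker: a JSON message on `hermes/nlu/query` with the transcript in `input`. The site id `phone-satellite` and the helper `build_nlu_query` are hypothetical names; the actual publish is left to whatever MQTT client the app uses.

```python
import json
import uuid

# Hermes topic on which NLU queries are expected.
NLU_QUERY_TOPIC = "hermes/nlu/query"


def build_nlu_query(text: str, site_id: str = "phone-satellite") -> tuple[str, str]:
    """Return the (topic, JSON payload) pair for a Hermes NLU query."""
    payload = {
        "input": text,            # transcript from the phone's speech-to-text
        "siteId": site_id,        # hypothetical id for this satellite
        "id": str(uuid.uuid4()),  # request id, echoed back in the reply
    }
    return NLU_QUERY_TOPIC, json.dumps(payload)


if __name__ == "__main__":
    topic, payload = build_nlu_query("turn on the living room lamp")
    # With an MQTT client (e.g. paho-mqtt, not imported here) this would be
    # roughly: client.publish(topic, payload)
    print(topic, payload)
```

The reply should come back on `hermes/nlu/intentParsed` or `hermes/nlu/intentNotRecognized`, so the app would subscribe to those and match on the `id` it sent.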
Does any of this make sense?
Whisper.cpp has an Android app that likely just needs some GUI polish and customising to your needs.
There is an iOS one too, so I guess the hard part is done.
Hi Rolyan, thank you for the links.
That recognition model looks really great.
But I am assuming that the native recognition on Android and iOS is good enough, and I would like to use Flutter as the dev platform, so I think the recognition part is covered by the module I mentioned in the first post.
What I’m wondering is whether the flow I described makes sense as it is, or whether I need to reconsider part (if not all) of it.
I am not sure about the intent side of things. I posted the above because the native speech-to-text on a phone is often a web service, on Android at least; I am less sure about iOS as I have never had an iPhone.
I got a Pixel 6a because native on-device speech-to-text is a rarity; usually you get the cloud service provided by the manufacturer or Google.
I keep meaning to have a look at Android dev, hence the Pixel 6a (the Google Tensor NPU version), but I haven’t even tried sideloading the app. Pointing out that Whisper has open-source apps was about my limit, so apologies if I can offer no more.