Ask Google/Amazon/Wolfram Alpha - controlled freeform text -> controlled spying

I would like to be able to forward a free-form question to one of the official cloud services by prefacing my question with a trigger phrase.

For example:
Porcupine [beep], ask Google: What is the tallest building in the World?

So it would be cool to at least have the option of allowing free-form text in the grammar, or of skipping transcription and sending “What is the tallest building in the World?” to the service as the recorded audio.

Is that already possible? If not, what is missing to do this?

No idea if there is a better way, but you could try an “ask Google” intent that starts a custom script. The script would use the API to record and transcribe the second half of the sentence and then submit it to Google. Kaldi can be configured to understand free-form speech, but it is slightly slow. I think I read something about using version 2.5 to run two STT systems for projects like this, but I don’t think anyone has tried it yet.

I also thought about two systems. Giving Kaldi a bit more processing power is no trouble either - it seems to run fine and fast on my third-generation Intel i5 with an SSD.

How would I define an intent for ask google that accepts any language after that?

That is something I have no idea about, because even when you use Kaldi via the web API you are bound to the language of the Kaldi profile. I don’t think Rhasspy is able to detect languages on its own, and I am not sure there is any software that runs locally and can detect the language from a voice sample.

Sorry, I wasn’t clear. Replace “any language” with “any combination of words” (in the currently selected language).

Well, it is possible to have a custom script run for an intent. So my idea is to create an intent [AskGoogle] and run a Python script when that intent is recognized. In that script you can then use the web API to record what you want to ask Google, have the web API transcribe it, send the text to Google and return the answer to your question. (Don’t ask me how exactly, I am just starting out myself, but I read about using the web API from a script somewhere in this forum a week or so ago.) It might not be instant, so you might have to pause before asking the question, or play a sound from your script so you know when to ask, but it should get the job done.
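
A minimal sketch of what such an intent script could look like. It assumes Rhasspy's command-style intent handling, where the program receives the recognized intent as JSON on stdin and prints JSON back on stdout, and it assumes the intent name sits under `intent.name` in that JSON; check both against the docs for your version. `handle_intent`, `run` and the `on_ask_google` callback are hypothetical names made up for this example.

```python
import json


def handle_intent(intent, on_ask_google):
    """Call on_ask_google() if the recognized intent is AskGoogle.

    Assumption: the intent name is stored under intent["intent"]["name"].
    The intent is returned unchanged so it can be echoed back to Rhasspy.
    """
    name = intent.get("intent", {}).get("name", "")
    if name == "AskGoogle":
        on_ask_google()
    return intent


def run(stdin, stdout, on_ask_google):
    """Read one intent as JSON from stdin, dispatch it, write JSON to stdout."""
    intent = json.load(stdin)
    result = handle_intent(intent, on_ask_google)
    json.dump(result, stdout)
```

To use it as the handler program, end the file with something like `run(sys.stdin, sys.stdout, my_ask_google_function)` (after `import sys`), where `my_ask_google_function` does the recording, transcription and lookup.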

Ah, nice idea.
So I would do something like:
Porcupine [beep1], Ask Google a question: [play “What do you want to ask Google?” via TTS, start recording with VAD, send that to Google, play answer]

I think that should work - will give it a try this week and report back.
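
A rough sketch of that flow, not tested against a real setup. It assumes Rhasspy's web server listens on `localhost:12101` and offers `/api/text-to-speech` (POST plain text) and `/api/speech-to-text` (POST WAV data, returns the transcription); verify these against the HTTP API reference for your version. The `record` and `answer` callables are hypothetical placeholders for "record with VAD" and "send to Google", which this sketch does not implement.

```python
import urllib.request

RHASSPY_URL = "http://localhost:12101"  # assumption: default web port


def post(path, data, content_type):
    """POST raw bytes to the Rhasspy web server and return the response body."""
    req = urllib.request.Request(
        RHASSPY_URL + path,
        data=data,
        headers={"Content-Type": content_type},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read()


def rhasspy_speak(text):
    """Speak text via the (assumed) text-to-speech endpoint."""
    post("/api/text-to-speech", text.encode("utf-8"), "text/plain")


def rhasspy_transcribe(wav_bytes):
    """Transcribe WAV audio via the (assumed) speech-to-text endpoint."""
    body = post("/api/speech-to-text", wav_bytes, "audio/wav")
    return body.decode("utf-8").strip()


def ask_question(speak, record, transcribe, answer):
    """Prompt, record, transcribe, query, and speak the reply.

    speak(text), record() -> WAV bytes, transcribe(wav) -> text and
    answer(question) -> text are passed in so each step can be swapped.
    Returns (question, reply), or None if nothing was understood.
    """
    speak("What do you want to ask Google?")
    wav = record()
    question = transcribe(wav)
    if not question:
        speak("Sorry, I did not catch that.")
        return None
    reply = answer(question)
    speak(reply)
    return question, reply
```

It would be called as `ask_question(rhasspy_speak, my_record, rhasspy_transcribe, my_answer)`, with `my_record` and `my_answer` supplied by you. Passing the steps in as functions also makes it easy to try the flow with dummy functions before wiring up audio or a cloud service.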

This will need the Dialogue Manager, so you need version 2.5 (which is in pre-release).
