Ask Google/Amazon/Wolfram Alpha - controlled freeform text -> controlled spying

ulno · March 23, 2020, 2:21pm

I would like to be able to forward a free-form question to one of the official cloud services by prefixing my question.

For example:
Porcupine [beep], ask Google: What is the tallest building in the World?

So at least it would be cool to have the option of free form text in the grammar or stop it from transcribing and really send “What is the tallest building in the World?” as your recorded voice.

Is that already possible or what is missing to do this?

Daenara · March 23, 2020, 3:30pm

No idea if there is a better way, but you could try an intent for “ask Google” that starts a custom script, which uses the API to record and transcribe the second half and then submits it to Google. Kaldi can be configured to understand free speech, but it is slightly slow. I think I read something about using 2.5 to run two STT systems for projects like this, but I don’t think anyone has tried it yet.

ulno · March 23, 2020, 4:57pm

I also thought about two systems. Giving Kaldi a bit more processing power is no trouble either: it seems to run really fine and fast on my third-generation Intel i5 with an SSD.

How would I define an intent for ask google that accepts any language after that?

Daenara · March 23, 2020, 5:01pm

That is something I have no idea about, because even using Kaldi via the web API has you bound by the profile language of Kaldi. I don’t think Rhasspy is able to detect languages on its own, and I am not sure there is a piece of software that can detect the language from a voice sample and runs locally.

ulno · March 23, 2020, 5:20pm

Sorry, I wasn’t clear. Replace “any language” with “any combination of words” (in the currently selected language).

Daenara · March 23, 2020, 5:27pm

Well, it is possible to have a custom script running for an intent. So my idea is to create an intent [AskGoogle] and run a Python script if that intent is recognized. In that script you can then use the web API to record what you want to ask Google (don’t ask me how, I am just starting out myself, but I read about using it in a script somewhere in this forum a week or so ago). Then you can have the web API transcribe it, send it to Google, and return the answer to your question. It might not be instant, so you might have to pause before asking the question, or play a sound from your script so you know when to ask, but it should get the job done.
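
A rough sketch of what such a script could look like, not a tested setup. It assumes Rhasspy's HTTP endpoints /api/speech-to-text and /api/text-to-speech on the default port 12101, that the question has already been recorded as WAV bytes, and that ask_cloud is a hypothetical placeholder for whichever cloud service receives the question. The transcription is still limited by the profile's speech-to-text settings, so free-form questions would need something like Kaldi's open transcription.

```python
import urllib.request

# Assumption: Rhasspy's web server runs locally on its default port.
RHASSPY_URL = "http://localhost:12101"


def http_post(url, data, content_type):
    """POST raw bytes and return the response body as text."""
    request = urllib.request.Request(
        url, data=data, headers={"Content-Type": content_type}, method="POST"
    )
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")


def transcribe(wav_bytes, post=http_post):
    """Send recorded WAV audio to Rhasspy and return the transcription."""
    return post(RHASSPY_URL + "/api/speech-to-text", wav_bytes, "audio/wav").strip()


def speak(text, post=http_post):
    """Have Rhasspy speak the given text through its text-to-speech system."""
    post(RHASSPY_URL + "/api/text-to-speech", text.encode("utf-8"), "text/plain")


def ask_cloud(question):
    """Hypothetical placeholder: forward the question to a cloud service."""
    raise NotImplementedError("plug in Google, Wolfram Alpha, etc. here")


def handle_ask_google(wav_bytes, ask=ask_cloud, post=http_post):
    """Transcribe the recorded question, ask the service, speak the answer."""
    question = transcribe(wav_bytes, post)
    answer = ask(question)
    speak(answer, post)
    return question, answer
```

The post and ask parameters are only there so the flow can be tried out with stand-ins before any real service is wired in.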

ulno · March 23, 2020, 5:31pm

Ah, nice idea.
So I would do something like:
Porcupine [beep1], Ask Google a question: [play “what do you want to ask google?” via tts, start recording with vad, send that to google, play answer]

I think that should work - will give it a try this week and report back.

romkabouter · March 23, 2020, 7:06pm

This will need the Dialogue Manager, so you need version 2.5 (which is in pre-release)
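
For illustration, a small sketch of how a handler might close a dialogue session with a spoken answer in 2.5. It assumes the Hermes MQTT topic hermes/dialogueManager/endSession with sessionId and text fields, and the intent name AskGoogle from earlier in this thread. Actually publishing the message needs an MQTT client, which is left out here.

```python
import json


def end_session_message(intent_payload, answer):
    """Build the (topic, payload) pair that ends a session by speaking `answer`."""
    message = {
        "sessionId": intent_payload["sessionId"],  # session the intent arrived in
        "text": answer,  # spoken before the session is closed
    }
    return "hermes/dialogueManager/endSession", json.dumps(message)


# Example of an intent message as it might arrive on hermes/intent/AskGoogle.
# The values are made up for illustration.
example_intent = {
    "sessionId": "abc-123",
    "siteId": "default",
    "input": "ask google",
    "intent": {"intentName": "AskGoogle", "confidenceScore": 1.0},
    "slots": [],
}

topic, payload = end_session_message(example_intent, "Here is the answer.")
print(topic)
print(payload)
```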

tuxedo78 · May 29, 2020, 3:42pm

Indeed, it’s possible with 2.5.0

I got a working proof of concept (based on the rhasspy-hermes-app framework proposed here and the Google Assistant SDK samples).

@ulno did you manage to get something working on your side as well?

ulno · May 29, 2020, 5:41pm

I haven’t worked on it again yet, but this does encourage me to give it another shot. I will try to do it from Node-RED, though (if possible).

kookic · May 29, 2020, 7:18pm

Yes, it works. I do it in 2.4 with speech recognition, Wolfram, Wikipedia, note writing…:

‘hey snowboy’ … ding … listen to me …
import os
# WAVEFILE is the path of the WAV file to record into (set elsewhere in the script)
os.system('curl -X POST "http://localhost:12101/api/start-recording"')
"""
Ask the question, with silence management.
I do this with sox.
"""
form = " -r 8000 -e signed -b 16 -c 1 "
os.system("/usr/bin/sox -t alsa default" + form + WAVEFILE + " silence 1 0.1 1% 1 0.1 1%")
os.system('curl -X POST "http://localhost:12101/api/stop-recording"')
"""
Now send the WAV file to speech recognition,
recover the text,
and send the text to Wolfram, Wikipedia, or a translator.
"""
But the most complicated part may be that Rhasspy occupies the microphone; I had to solve this problem.
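
As a possible variation on the recording step above, the sox call can be built as an argument list and run with subprocess, which avoids the spacing pitfalls of string concatenation. This is only a sketch: the function names are made up for illustration, and the Wolfram part assumes the Wolfram|Alpha Short Answers API with an app id of your own.

```python
import subprocess
import urllib.parse


def build_sox_command(wav_path, sample_rate=8000):
    """Return the sox argument list for recording until silence is detected."""
    return [
        "/usr/bin/sox",
        "-t", "alsa", "default",  # input: default ALSA device
        "-r", str(sample_rate), "-e", "signed", "-b", "16", "-c", "1",  # output format
        wav_path,
        # Start after 0.1 s above 1% volume, stop after 0.1 s below 1% volume.
        "silence", "1", "0.1", "1%", "1", "0.1", "1%",
    ]


def record_question(wav_path):
    """Record one spoken question into wav_path (blocks until silence)."""
    subprocess.run(build_sox_command(wav_path), check=True)


def wolfram_short_answer_url(question, app_id):
    """Build a Short Answers API request URL for the transcribed question."""
    query = urllib.parse.urlencode({"appid": app_id, "i": question})
    return "https://api.wolframalpha.com/v1/result?" + query
```

The transcribed text would then be passed to wolfram_short_answer_url and the resulting URL fetched with any HTTP client.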

DanielW · May 29, 2020, 8:27pm

I was thinking about the same problem. Kaldi works well enough in “Open Transcription mode”.

Shouldn’t it be possible to implement an “any text” slot in Rhasspy? I don’t really know how those things work, but to me it doesn’t seem that complicated.

davosian · July 18, 2021, 10:20am

Has anyone been successful in getting this to work reliably? If so, would you mind sharing your setup in more detail? I am using Rhasspy 2.5 and Node-RED for intent handling.

Thanks!