Recognizing untrained sentences / words


Is there a way to get a slot for an unknown part of a sentence?

My idea is to build a reminder intent. I can define the start of the sentence, but I can never know in advance what I will ask to be reminded of.

Examples:

Hey Rhasspy, remind me to call my wife in two hours
Hey Rhasspy, call me back in 30 minutes to shutdown the oven
Hey Rhasspy, bring me back to take my son out of freezer in ten minutes

etc …

Any ideas?

I do it like this:
Hey Rhasspy.
Listen{actionstt} to me…
At that point I run speech to text (with Google speech recognition) and get back the sentence:
‘add 3 carrots and 2 leeks to the list’
Then I process that phrase according to the words I care about.
This way we can do Wikipedia queries, file writing, YouTube searches …
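
The keyword-based processing described above could look roughly like the following. This is only a minimal sketch: the function name `parse_list_request` and the English phrasing "add ... to the list" are assumptions for illustration, not the poster's actual code.

```python
import re


def parse_list_request(transcript):
    """Return ("add", items) if the free-form transcript looks like
    'add <items> to the list', otherwise (None, [])."""
    match = re.match(
        r"^\s*add\s+(.+?)\s+to\s+the\s+list\s*$", transcript, re.IGNORECASE
    )
    if not match:
        return None, []
    # Split the captured phrase on commas or the word "and".
    parts = re.split(r"\s*,\s*|\s+and\s+", match.group(1))
    items = [part.strip() for part in parts if part.strip()]
    return "add", items


action, items = parse_list_request("add 3 carrots and 2 leeks to the list")
print(action, items)
```

The transcript comes from the open speech-to-text step, so nothing here depends on the trained sentences; only the trigger phrase is recognized by Rhasspy.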


@kookic I like this. Would you be willing to share some of your configs to show how you do things like adding items to the list? This would be an awesome first addition to my “out of the box” setup, which is not doing much yet lol.

Kind of following up on this… Is there a history of requests for which no intent was identified?

It would be interesting to go through unrecognized requests and see what the rest of the family is trying to get Rhasspy to do. Maybe I need to add a joke-telling intent if my kids keep asking for jokes, etc.

Would it work with Kaldi?

I'd prefer to keep everything offline, especially where Google is concerned … :roll_eyes:

Of course, but I didn't get good results with Kaldi, and I also use translation and wiki lookups with Yandex and Wolfram, so a bit of online is involved anyway.

A bit of code:

    import os
    import speech_recognition as sr

    def dire():
        """Record 5 seconds of audio; return the Google transcription, or None on failure."""
        global CLE_GOOGLE  # your Google API key, defined elsewhere
        r = sr.Recognizer()
        wavefile = "input.wav"
        # record your voice for 5 seconds
        os.system("arecord -d 5 -f cd -t wav " + wavefile)
        with sr.AudioFile(wavefile) as source:  # called sr.WavFile in older releases
            audio = r.record(source)
        try:
            retourG = r.recognize_google(audio, language='fr-FR', key=CLE_GOOGLE)
            print('...Google transcription: ' + retourG)
            # sanitize_string is a helper that is not shown here;
            # return a str (not bytes) so it can be concatenated later
            return sanitize_string(retourG)
        except (LookupError, sr.UnknownValueError):
            print('Cannot understand audio!')
            return None

    recup_the_sentence = dire()

Example sentences (in French; the carnet value is a = add, e = erase, l = list):
il (faut | faudrait) faire{carnet:a} les courses
(efface | supprime){carnet:e} [toutes] les courses
(ajoute | rajoute){carnet:a} des courses
qu'est-ce qu'il [nous] faut{carnet:l} pour les courses
(tu peux me lire){carnet:l} les courses

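Retrieve the slot value and pass it to the handler (mytts is the TTS module, not shown here):

    def course(v):
        if v == "a":  # ajoute (add)
            mytts.dire("Je técouteu, tu veux quoi?")  # "I'm listening, what do you want?"
            # dire() records 5 seconds of audio and transcribes it
            retourG = dire()
            if not retourG:
                print("recording problem")
                return
            print('...Google transcription: ' + retourG)
            with open('course.txt', 'a') as mon_fichier:
                mon_fichier.write(retourG + "\n")
            mytts.dire("Okay, jai ajoutai, " + retourG)  # "Okay, I added ..."

        if v == "e":  # efface (erase)
            with open('course.txt', 'w') as mon_fichier:
                pass  # opening in 'w' mode truncates the file
            mytts.dire("Okay, j'ai tout effaçé.")  # "Okay, I erased everything."

        if v == "l":  # lecture (read out)
            with open('course.txt', 'r') as mon_fichier:
                mytts.dire("Bien, pour les courses, il te faut: ")  # "For the shopping, you need:"
                # the original call was cut off here; reading the file back is assumed
                mytts.dire(mon_fichier.read())
            print("End of shopping list.")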

OK, so you don’t say the untrained part in the same sentence but in a second sentence.

I seriously doubt it would work with Kaldi. It always “understands” trained words, even when what was said is phonetically far from them. It seems it just HAS to find something known.

Yes, that’s why I wasted no time with Kaldi and use Google.
What about Pocketsphinx…?

One way might be to run a second instance of Rhasspy with open transcription enabled and a special wake word. Not ideal, but at least possible :slight_smile:

When Rhasspy 2.5 comes out (someday, I promise!) you could theoretically run two STT services connected to the same MQTT broker. We’d need to think about how to adjust the dialogue manager to handle multiple responses and decide which one to choose. Just a thought.


Just a few nights of work. :wink: Thanks for everything.

Just so I understand this correctly: is it currently not possible to configure Rhasspy to just pass custom words to the intent handler? So it is not possible to create a handler for something like:

  • play queen on spotify (where “queen” is not part of the trained sentence)
  • who is barack obama (where “barack obama” is not part of the trained sentence)

I’m quite new to the whole voice command ecosystem, so I might be asking for something really difficult without knowing it, but it would be nice to be able to create a more custom handler.

I came here for help with this too. I really need to be able to tell my assistant to add things to a list or even fun little things like Simon says.

Something like

add [*]{thing} to the list

simon says [*]{thing_to_say}

This is difficult with the way Rhasspy approaches speech recognition. Where the “wildcard” sits in the sentence also affects the complexity.

Let’s take @digitalfiz’s example:

add [*]{thing} to the list

simon says [*]{thing_to_say}

The SimonSays example is easier to implement because the wildcard occurs at the end of the sentence. I would implement this by having the first speech system (e.g., Kaldi) recognize “simon says”, then I would clip the audio from there to the end and send that off to a second system (e.g., DeepSpeech) which is listening for generic English.
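
The clipping step described above could be sketched with Python's standard `wave` module. This is only an illustration under assumptions: `clip_wav_from` is a hypothetical helper name, the end time of "simon says" is assumed to come from the first recognizer's word timings, and the hand-off to the second recognizer is not shown.

```python
import io
import wave


def clip_wav_from(wav_bytes, start_seconds):
    """Return a new WAV (as bytes) holding the audio from start_seconds to the end."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as src:
        params = src.getparams()
        # Convert the time offset to a frame index, clamped to the file length.
        start_frame = min(int(start_seconds * src.getframerate()), src.getnframes())
        src.setpos(start_frame)
        frames = src.readframes(src.getnframes() - start_frame)

    out = io.BytesIO()
    with wave.open(out, "wb") as dst:
        dst.setparams(params)
        # The frame count in the header is corrected when the writer closes.
        dst.writeframes(frames)
    return out.getvalue()
```

For example, if the first system reports that "says" ends at 0.9 seconds, `clip_wav_from(audio, 0.9)` would yield the audio to send to the open-transcription system.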

The Groceries example is harder, though, because the first speech system has to detect both the start and end of the wildcard phrase. One way I know to do this is to add <UNKNOWN> “words” to the first speech system. During recognition, it might recognize “add apples and bananas to the list” as “add <UNKNOWN> to the list” (the <UNKNOWN> may be repeated). Kaldi and Pocketsphinx can give the time window of each recognized word/token, so I could clip out the audio from all the <UNKNOWN> words and send it to DeepSpeech.
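
To make the time-window idea concrete, here is a small sketch that merges consecutive <UNKNOWN> tokens into audio spans. The `(word, start, end)` tuple format and the function name `unknown_spans` are assumptions for illustration; the actual output format of Kaldi or Pocketsphinx word timings differs and would need to be adapted.

```python
UNKNOWN = "<UNKNOWN>"


def unknown_spans(tokens):
    """Merge runs of consecutive <UNKNOWN> tokens into (start, end) spans.

    tokens: list of (word, start_seconds, end_seconds) tuples in time order.
    """
    spans = []
    previous_was_unknown = False
    for word, start, end in tokens:
        if word == UNKNOWN:
            if previous_was_unknown:
                # Extend the current span to cover this repeated token.
                spans[-1] = (spans[-1][0], end)
            else:
                spans.append((start, end))
            previous_was_unknown = True
        else:
            previous_was_unknown = False
    return spans


tokens = [
    ("add", 0.0, 0.3),
    (UNKNOWN, 0.3, 0.7),
    (UNKNOWN, 0.7, 1.4),
    ("to", 1.4, 1.5),
    ("the", 1.5, 1.6),
    ("list", 1.6, 2.0),
]
print(unknown_spans(tokens))
```

Each resulting span marks a slice of audio that could be cut out and sent to the second, open-vocabulary system.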

A much easier approach to wildcards would be if the number of words were known in advance. If I knew that [*] was one generic English word, I could just add every possible word from the English dictionary as a candidate at that point in the grammar. No need for a second speech system!
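
As a rough illustration of the single-word case, the wildcard could be expanded into an alternation of every vocabulary word before training. This is a sketch only: `[*]` is the hypothetical wildcard notation from this thread, `one_word_wildcard` is an invented helper, and a real English dictionary would produce a very large alternation.

```python
def one_word_wildcard(template, slot_name, vocabulary):
    """Replace the [*]{slot_name} placeholder with an alternation of every word."""
    placeholder = "[*]{" + slot_name + "}"
    if placeholder not in template:
        raise ValueError("template has no wildcard for slot " + slot_name)
    # Deduplicate and sort so the generated sentence is stable between runs.
    words = sorted(set(word.lower() for word in vocabulary))
    alternatives = " | ".join(words)
    return template.replace(placeholder, "(" + alternatives + "){" + slot_name + "}")


sentence = one_word_wildcard(
    "add [*]{thing} to the list", "thing", ["bananas", "apples", "Apples"]
)
print(sentence)
```

The generated line uses the same `(a | b){slot}` form as ordinary training sentences, so the recognized word arrives in the slot like any other value.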

If I knew there would be between 1 and 3 words, for example, I could use knowledge from the generic English language model to do something similar. It would be harder than a single word, but easier than adding a second speech system.