Built a new Wyoming ASR for Google Cloud streaming speech-to-text, how do I test this thing?

I have built a new Wyoming ASR, building on the whisper version…
I am ready to test… it starts (using script/run), but the handler isn’t started…

the source for wyoming-faster-whisper doesn’t respond to TCP connections (or websocket), even though its readme says

Run a server anyone can connect to:

script/run --model tiny-int8 --language en --uri 'tcp://0.0.0.0:10300' --data-dir /data --download-dir /data

and as I modeled my asr on this, it also doesn’t respond…

I added debug right before the server.run()

INFO:__main__:Ready
INFO:__main__:Info(asr=[AsrProgram(name='faster-whisper', attribution=Attribution(name='Guillaume Klein', url='https://github.com/guillaumekln/faster-whisper/'), installed=True, description='Faster Whisper transcription with CTranslate2', models=[AsrModel(name='tiny-int8', attribution=Attribution(name='rhasspy', url='https://github.com/rhasspy/models/'), installed=True, description='tiny-int8', languages=['af', 'am', 'ar', 'as', 'az', 'ba', 'be', 'bg', 'bn', 'bo', 'br', 'bs', 'ca', 'cs', 'cy', 'da', 'de', 'el', 'en', 'es', 'et', 'eu', 'fa', 'fi', 'fo', 'fr', 'gl', 'gu', 'ha', 'haw', 'he', 'hi', 'hr', 'ht', 'hu', 'hy', 'id', 'is', 'it', 'ja', 'jw', 'ka', 'kk', 'km', 'kn', 'ko', 'la', 'lb', 'ln', 'lo', 'lt', 'lv', 'mg', 'mi', 'mk', 'ml', 'mn', 'mr', 'ms', 'mt', 'my', 'ne', 'nl', 'nn', 'no', 'oc', 'pa', 'pl', 'ps', 'pt', 'ro', 'ru', 'sa', 'sd', 'si', 'sk', 'sl', 'sn', 'so', 'sq', 'sr', 'su', 'sv', 'sw', 'ta', 'te', 'tg', 'th', 'tk', 'tl', 'tr', 'tt', 'uk', 'ur', 'uz', 'vi', 'yi', 'yo', 'zh'])])], tts=[], handle=[], wake=[])

no other source file change.

shouldn’t I be able to send a {"type": "describe"} and get a response?
(or any of the three non-audio ones… describe, transcribe and audioStop)

HTTP POST and PUT requests both time out
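
A likely reason POST and PUT time out: Wyoming is not HTTP. Each event is one line of JSON terminated by a newline, optionally followed by raw payload bytes, sent over a plain TCP socket. Below is a minimal sketch of that framing using only the Python standard library. `encode_event` and `read_event` are hypothetical helper names, and the `data_length` / `payload_length` handling is an assumption based on the Wyoming readme, not a tested client.

```python
import io
import json


def encode_event(event_type, data=None, payload=b""):
    """Frame one event: a JSON header line, then optional raw payload bytes."""
    header = {"type": event_type}
    if data:
        header["data"] = data
    if payload:
        header["payload_length"] = len(payload)
    return json.dumps(header).encode("utf-8") + b"\n" + payload


def read_event(stream):
    """Read one framed event from a binary file-like object.

    Returns (header, payload), or (None, b"") at end of stream.
    """
    line = stream.readline()
    if not line:
        return None, b""
    header = json.loads(line)
    # Assumption: newer servers send the data dict as a separate block of
    # data_length bytes right after the header line.
    extra = header.pop("data_length", 0)
    if extra:
        merged = dict(header.get("data") or {})
        merged.update(json.loads(stream.read(extra)))
        header["data"] = merged
    payload = stream.read(header.get("payload_length") or 0)
    return header, payload


# Against a running server (host and port assumed from the readme command):
# import socket
# with socket.create_connection(("127.0.0.1", 10300)) as sock:
#     sock.sendall(encode_event("describe"))
#     print(read_event(sock.makefile("rb")))
```

Keeping the socket open and sending further events on the same connection is what lets the handler see them as one session.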

if I have to test by using HA, how do I get my asr into the selection list for assistant config?

in my asr I added a debugging clause to the end of the
async def handle_event(self, event: Event) -> bool: function, so if called with an unknown event it will output… nothing…
handler.py is never started… I added debug there too


SO… I cleaned up some of the python and now get my asr handler to fire…

but… neither it nor faster-whisper respond to the transcribe event…

I added debugging and it’s not handled correctly

DEBUG:__main__:Info(asr=[AsrProgram(name='google-streaming', attribution=Attribution(name='Sam Detweiler', url='https://github.com/sdetweil/google-streaming'), installed=True, description='google cloud streaming asr', models=[AsrModel(name='google-streaming', attribution=Attribution(name='rhasspy', url='https://github.com/rhasspy/models/'), installed=True, description='google cloud streaming asr', languages=['af', 'am', 'ar', 'as', 'az', 'ba', 'be', 'bg', 'bn', 'bo', 'br', 'bs', 'ca', 'cs', 'cy', 'da', 'de', 'el', 'en', 'es', 'et', 'eu', 'fa', 'fi', 'fo', 'fr', 'gl', 'gu', 'ha', 'haw', 'he', 'hi', 'hr', 'ht', 'hu', 'hy', 'id', 'is', 'it', 'ja', 'jw', 'ka', 'kk', 'km', 'kn', 'ko', 'la', 'lb', 'ln', 'lo', 'lt', 'lv', 'mg', 'mi', 'mk', 'ml', 'mn', 'mr', 'ms', 'mt', 'my', 'ne', 'nl', 'nn', 'no', 'oc', 'pa', 'pl', 'ps', 'pt', 'ro', 'ru', 'sa', 'sd', 'si', 'sk', 'sl', 'sn', 'so', 'sq', 'sr', 'su', 'sv', 'sw', 'ta', 'te', 'tg', 'th', 'tk', 'tl', 'tr', 'tt', 'uk', 'ur', 'uz', 'vi', 'yi', 'yo', 'zh'])])], tts=[], handle=[], wake=[])

I sent a describe like this

{ "type": "describe"}

and got this on the handler
DEBUG:wyoming_google.handler:Sent info

then sent a transcribe

{"type":"transcribe","data":{"language":"en","name":"foo"}}

and got my extra debugging (from the end of the handler)
INFO:wyoming_google.handler:unknown event=transcribe name=foo language=en

        
        _LOGGER.info("unknown event="+event.type+" name="+event.data["name"]+" language="+event.data["language"])

        return True

but the code here https://github.com/rhasspy/wyoming/blob/master/wyoming/asr.py

says it’s ‘transcribe’

I changed the handler code from

 if Transcribe.is_type(event.type):

to

 if event.type == "transcribe":

and it works…

SO the Transcribe.is_type() is failing for some reason

the design of the handler appears to support persistent data across events…
the whisper handler does

        if AudioChunk.is_type(event.type):
            if not self.audio:
                _LOGGER.debug("Receiving audio")

            chunk = AudioChunk.from_event(event)
            chunk = self.audio_converter.convert(chunk)
            self.audio += chunk.audio   ##   adding on to buffer

but in my handler, the self.xxx variables are not persistent…
(dummy transcription)

on audioStop, I set self.text="something"
it gets printed out and returned (although the doc says you need a transcript event to get the results)
the wyoming-faster-whisper handler does not handle transcript events, but returns the data only on audioStop

anyhow, I added a transcript event to return the text… and it’s null… oops…
my init sets it that way, same as the whisper one sets self.audio=bytes().

so, what is wrong?

just like the whisper asr I am ‘counting’ on data persistence between events… as google streaming supports intermediate results… so it can be faster to produce results than the existing asrs

There was a misspelling of “transcribe” in an earlier version of Wyoming :man_facepalming:
It’s fixed now, but this is why Transcribe.is_type was failing for you.
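
To illustrate how a single misspelled constant makes a check like is_type fail silently, here is a small self-contained sketch. The classes are hypothetical stand-ins, not the real Wyoming classes, and the typo "transcibe" is invented for illustration (the thread does not say what the actual misspelling was).

```python
class Eventable:
    """Hypothetical stand-in for an event class keyed by a type string."""

    TYPE = ""

    @classmethod
    def is_type(cls, event_type: str) -> bool:
        # An exact string comparison: any typo in TYPE means no match, no error.
        return event_type == cls.TYPE


class BrokenTranscribe(Eventable):
    TYPE = "transcibe"  # invented typo, for illustration only


class FixedTranscribe(Eventable):
    TYPE = "transcribe"


incoming = "transcribe"  # what the client actually sends on the wire
print(BrokenTranscribe.is_type(incoming))  # False: falls through to "unknown event"
print(FixedTranscribe.is_type(incoming))   # True
```

This is why comparing event.type against the literal string worked while the library check did not.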

A new handler is spun up for each TCP connection. So if you’re breaking the connection between each message, the self variables will not persist.
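
A minimal simulation of that behaviour, with no networking involved. `SketchHandler` and `run_connection` are hypothetical names, and the handler only counts buffered bytes instead of transcribing anything.

```python
class SketchHandler:
    """Stand-in handler: buffers audio on self, like the whisper handler does."""

    def __init__(self):
        self.audio = bytes()

    def handle_event(self, event_type, payload=b""):
        if event_type == "audio-chunk":
            self.audio += payload  # state lives on this handler instance only
            return None
        if event_type == "audio-stop":
            size = len(self.audio)
            self.audio = bytes()  # reset for the next utterance
            return size
        return None


def run_connection(events):
    """Simulate one TCP connection: one fresh handler sees every event in order."""
    handler = SketchHandler()
    return [handler.handle_event(kind, data) for kind, data in events]


# Same connection: the buffer survives until audio-stop.
print(run_connection([("audio-chunk", b"ab"), ("audio-chunk", b"cd"), ("audio-stop", b"")]))

# Reconnecting between messages: the second handler starts with an empty buffer.
run_connection([("audio-chunk", b"ab")])
print(run_connection([("audio-stop", b"")]))
```

So a test client has to keep one connection open for the whole transcribe / audio / stop sequence.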

thanks… I suspected as much on the variables… and I had 1.4.2, 1.5.0 works better
BUT there was a breaking change on the audio event names… wtf… they now have embedded dashes, e.g.
audio-stop

AND the package doesn’t have an updated CHANGELOG, and shows 1.3 but master is 1.5

and the answer on testing is that nothing existed… I created a new test app to send two wav files to the asr to verify the recognition…
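
For anyone wanting to do the same, a rough sketch of the core of such a test app: turning a WAV file into the event sequence an ASR expects. This is not the actual test app from this thread. `wav_to_events` is a hypothetical helper, the dashed event names follow the 1.5.0 naming mentioned above, and the rate/width/channels fields are an assumption about what the audio events carry.

```python
import wave


def wav_to_events(path, samples_per_chunk=1024):
    """Yield (event_type, data, payload) tuples for one WAV file."""
    with wave.open(path, "rb") as wav:
        fmt = {
            "rate": wav.getframerate(),
            "width": wav.getsampwidth(),
            "channels": wav.getnchannels(),
        }
        yield "audio-start", fmt, b""
        while True:
            # readframes returns b"" once the file is exhausted
            frames = wav.readframes(samples_per_chunk)
            if not frames:
                break
            yield "audio-chunk", fmt, frames
        yield "audio-stop", {}, b""
```

Each tuple would then be framed and written to the same open TCP connection, after a transcribe event, and the reply read back once audio-stop has been sent.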