Audio input gst-launch-1.0 rtspsrc camera stream

Good day!
I am looking for help!
I want to talk to Rhasspy through the microphone of a Wi-Fi camera.
What I have done so far:

  1. Installed Rhasspy in Docker on a Raspberry Pi.
  2. Got access to the RTSP audio stream.
  3. Recorded the word "test" from the stream to a PCMA file via VLC on a desktop computer.
  4. Re-encoded the file from step 3 to WAV via VLC.
  5. Configured recognition of the word "test" (in Russian) from the uploaded WAV file.

What does not work:
Registering the record command in the Audio input settings.

Tried:

gst-launch-1.0
rtspsrc location=rtsp://login:password@192.168.135.49:554/stream2 latency=0 select-stream=stream_1 ! filesink location=/dev/stdout

Tried adding:
! rtppcmadepay ! alawdec ! filesink location=/dev/stdout

Messages appear in MQTT, but when I try recognition I get a "TimeoutError".

What am I doing wrong? I do not understand encodings, formats, and the other audio parameters.

Please post the log file where you see this timeout.

[ERROR:2021-07-01 07:29:41,989] rhasspyserver_hermes:
Traceback (most recent call last):
File "/usr/lib/rhasspy/.venv/lib/python3.7/site-packages/quart/app.py", line 1821, in full_dispatch_request
result = await self.dispatch_request(request_context)
File "/usr/lib/rhasspy/.venv/lib/python3.7/site-packages/quart/app.py", line 1869, in dispatch_request
return await handler(**request_.view_args)
File "/usr/lib/rhasspy/rhasspy-server-hermes/rhasspyserver_hermes/main.py", line 936, in api_listen_for_command
async for response in core.publish_wait(handle_intent(), [], message_types):
File "/usr/lib/rhasspy/rhasspy-server-hermes/rhasspyserver_hermes/__init__.py", line 994, in publish_wait
result_awaitable, timeout=timeout_seconds
File "/usr/lib/python3.7/asyncio/tasks.py", line 423, in wait_for
raise futures.TimeoutError()
concurrent.futures._base.TimeoutError
[WARNING:2021-07-01 07:29:11,993] rhasspyserver_hermes: Dialogue management is disabled. ASR will NOT be automatically enabled.
[DEBUG:2021-07-01 07:29:11,992] rhasspyserver_hermes: ← HotwordDetected(model_id='default', model_version='', model_type='personal', current_sensitivity=1.0, site_id='default', session_id=None, send_audio_captured=None, lang=None, custom_entities=None)
[DEBUG:2021-07-01 07:29:11,969] rhasspyserver_hermes: Subscribed to hermes/error/nlu
[DEBUG:2021-07-01 07:29:11,968] rhasspyserver_hermes: Waiting for intent (session_id=None)
[DEBUG:2021-07-01 07:29:11,967] rhasspyserver_hermes: Publishing 199 bytes(s) to hermes/hotword/default/detected
[DEBUG:2021-07-01 07:29:11,965] rhasspyserver_hermes: → HotwordDetected(model_id='default', model_version='', model_type='personal', current_sensitivity=1.0, site_id='default', session_id=None, send_audio_captured=None, lang=None, custom_entities=None)

The settings that produced the logs above:

{
  "intent": {
    "system": "fsticuffs"
  },
  "microphone": {
    "arecord": {
      "device": "pulse"
    },
    "command": {
      "channels": "1",
      "record_arguments": "rtspsrc location=rtsp://login:password@192.168.135.49:554/stream2 latency=0 select-stream=stream_1 ! filesink location=/dev/stdout",
      "record_program": "gst-launch-1.0",
      "sample_rate": "8000",
      "sample_width": "4",
      "test_arguments": "",
      "udp_audio_port": ""
    },
    "system": "command"
  },
  "mqtt": {
    "enabled": "true",
    "host": "192.168.135.19",
    "password": "password",
    "username": "login"
  },
  "sounds": {
    "remote": {
      "url": "http://192.168.135.10:8081/path/to/endpoint"
    }
  },
  "speech_to_text": {
    "pocketsphinx": {
      "open_transcription": true
    },
    "system": "kaldi"
  },
  "wake": {
    "pocketsphinx": {
      "keyphrase": "jarvis"
    },
    "porcupine": {
      "keyword_path": "jarvis_raspberry-pi.ppn",
      "sensitivity": "0.1"
    },
    "raven": {
      "keywords": {
        "Джарвис": {
          "enabled": true
        }
      }
    },
    "snowboy": {
      "apply_frontend": true,
      "model": "jarvis.umdl",
      "sensitivity": "0.8,0.80"
    }
  }
}

Enabling the DialogueManager should fix your problem, because the hotword is detected.

I tested via the button on the home page and expected recognition of the spoken word "test". That worked when I sent the WAV file. Should I not expect the same here?

It feels like the audio stream from MQTT is not being analyzed at all; it does not even try. I could not find any evidence of it in the logs, although messages do appear in MQTT.


Like I said: you should enable the DialogueManager

I enabled the dialogue manager.

I started recognition manually.
I did not say anything (I am not at home).
And I got a false recognition. But I am not sure about that.
Is that really what happened?

{
  "text": "тест",
  "likelihood": 1,
  "seconds": 28.348475833015982,
  "siteId": "default",
  "sessionId": "default-default-06864a95-de4d-4a66-bd8e-2de7fdcd8245",
  "wakewordId": null,
  "asrTokens": [
    [
      {
        "value": "тест",
        "confidence": 1,
        "rangeStart": 0,
        "rangeEnd": 5,
        "time": {
          "start": 0,
          "end": 0
        }
      }
    ]
  ],
  "lang": null
}

This might be a false trigger caused by the fact that there was only silence (assuming no audio input).

Best to do a real test when you're at home :slight_smile:


I'll check it out in the evening.
Thank you, kind person!! :love_you_gesture:

It does not hear me :sob:

[ERROR:2021-07-01 19:32:14,911] rhasspyserver_hermes:
Traceback (most recent call last):
File "/usr/lib/rhasspy/.venv/lib/python3.7/site-packages/quart/app.py", line 1821, in full_dispatch_request
result = await self.dispatch_request(request_context)
File "/usr/lib/rhasspy/.venv/lib/python3.7/site-packages/quart/app.py", line 1869, in dispatch_request
return await handler(**request_.view_args)
File "/usr/lib/rhasspy/rhasspy-server-hermes/rhasspyserver_hermes/main.py", line 936, in api_listen_for_command
async for response in core.publish_wait(handle_intent(), [], message_types):
File "/usr/lib/rhasspy/rhasspy-server-hermes/rhasspyserver_hermes/__init__.py", line 994, in publish_wait
result_awaitable, timeout=timeout_seconds
File "/usr/lib/python3.7/asyncio/tasks.py", line 423, in wait_for
raise futures.TimeoutError()
concurrent.futures._base.TimeoutError
[DEBUG:2021-07-01 19:31:44,886] rhasspyserver_hermes: ← HotwordDetected(model_id='default', model_version='', model_type='personal', current_sensitivity=1.0, site_id='default', session_id=None, send_audio_captured=None, lang=None, custom_entities=None)
[DEBUG:2021-07-01 19:31:44,879] rhasspyserver_hermes: Waiting for intent (session_id=None)
[DEBUG:2021-07-01 19:31:44,877] rhasspyserver_hermes: Publishing 199 bytes(s) to hermes/hotword/default/detected
[DEBUG:2021-07-01 19:31:44,876] rhasspyserver_hermes: → HotwordDetected(model_id='default', model_version='', model_type='personal', current_sensitivity=1.0, site_id='default', session_id=None, send_audio_captured=None, lang=None, custom_entities=None)

  1. How do you trigger the hotword? Have you already set up some sentences?
  2. What is the sample rate of the wi-fi camera? You have set it to 8000 with width 4 in Rhasspy. Is this correct with respect to the camera?
  3. Is there a reason you have no hotword detection enabled?
  4. Try to record the audio from the stream with this file:
    ESP32-Rhasspy-Satellite/record.py at voco · Romkabouter/ESP32-Rhasspy-Satellite · GitHub

Change the broker IP address, and at the bottom change "office" to "default".
Run it and it will record 4 seconds from the stream and save it as a WAV file.
Listen to it and hear how it sounds.
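For context: record.py works by subscribing to the hermes audio topic over MQTT, and in Rhasspy each hermes/audioServer/&lt;siteId&gt;/audioFrame payload is, as far as I understand it, a small self-describing WAV chunk. A broker-free sketch of just the decoding step (the frame below is synthesized for illustration, and describe_frame is a hypothetical helper, not part of record.py):

```python
import io
import wave

def describe_frame(wav_bytes: bytes):
    """Report (rate, sample width, channels, frame count) of one
    WAV-formatted audioFrame payload."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as w:
        return (w.getframerate(), w.getsampwidth(),
                w.getnchannels(), w.getnframes())

# Synthesize a frame shaped like what Rhasspy publishes:
# 16 kHz, 16-bit, mono, a short run of silence.
buf = io.BytesIO()
with wave.open(buf, "wb") as w:
    w.setnchannels(1)
    w.setsampwidth(2)
    w.setframerate(16000)
    w.writeframes(b"\x00\x00" * 160)   # 10 ms of silence
frame = buf.getvalue()

print(describe_frame(frame))   # (16000, 2, 1, 160)
```

If the payloads in your MQTT messages do not parse this way, that by itself is a useful clue about the audio format actually being published.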


Thanks for the debug opportunity!
I got a voice to appear with the following settings:

But the sound is sped up, like a cartoon voice.
What affects the speed?

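The "cartoon" voice is the classic symptom of a rate or width mismatch: the bytes are fine, but the consumer plays them back at the wrong bytes-per-second. A small sketch of the arithmetic (the function and parameter names are mine, not Rhasspy's):

```python
def apparent_speed(true_rate, true_width, assumed_rate, assumed_width):
    """How many times faster audio sounds when samples produced at
    true_rate Hz / true_width bytes per sample are played back as if
    they were assumed_rate Hz / assumed_width bytes per sample."""
    return (assumed_rate * assumed_width) / (true_rate * true_width)

# 8 kHz camera audio interpreted as 16 kHz: plays twice as fast.
print(apparent_speed(8000, 2, 16000, 2))   # 2.0

# The same 2x byte-rate error from sample_width alone (the profile had
# width 4 while the audio is 2; a width mismatch also garbles samples).
print(apparent_speed(8000, 2, 8000, 4))    # 2.0
```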

I changed the rate in record.py and the voice in the recorded file is now normal.

Rhasspy expects 16000 Hz, 16-bit audio; maybe you can try to output that from the wifi cam.


Thank you! The final version of the command parameters:

rtspsrc location=rtsp://login:password@192.168.135.49:554/stream2 latency=0 select-stream=stream_1 ! rtppcmadepay ! alawdec ! audioconvert ! audioresample ! audio/x-raw, rate=16000 ! filesink location=/dev/stdout
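For completeness, the microphone profile section matching this pipeline would presumably look like the following (adapted from the settings posted earlier in the thread; note that sample_width is in bytes, so 16-bit audio means 2, not 4):

```json
"command": {
  "channels": "1",
  "record_program": "gst-launch-1.0",
  "record_arguments": "rtspsrc location=rtsp://login:password@192.168.135.49:554/stream2 latency=0 select-stream=stream_1 ! rtppcmadepay ! alawdec ! audioconvert ! audioresample ! audio/x-raw, rate=16000 ! filesink location=/dev/stdout",
  "sample_rate": "16000",
  "sample_width": "2",
  "test_arguments": "",
  "udp_audio_port": ""
}
```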