Audio input gst-launch-1.0 rtspsrc camera stream

Good day!
I am looking for help!
I want to talk to Rhasspy through the microphone of a Wi-Fi camera.
What I have done so far:

  1. Installed Rhasspy in Docker on a Raspberry Pi.
  2. Got access to the RTSP audio stream.
  3. Recorded the word "test" from the stream to a PCMA file via VLC on a desktop computer.
  4. Re-encoded the file from step 3 to WAV via VLC.
  5. Configured recognition of the word "test" (in Russian) from the uploaded WAV file.

What does not work:
Registering the record command in the Audio input settings.

Tried:

gst-launch-1.0
rtspsrc location=rtsp://login:password@192.168.135.49:554/stream2 latency=0 select-stream=stream_1 ! filesink location=/dev/stdout

Tried adding:
! rtppcmadepay ! alawdec ! filesink location=/dev/stdout

Messages appear in MQTT, but when I try recognition I get a "TimeoutError".

What am I doing wrong? I do not understand encodings, formats, and the other audio parameters.

Please post the log file where you see this timeout.

[ERROR:2021-07-01 07:29:41,989] rhasspyserver_hermes:
Traceback (most recent call last):
File "/usr/lib/rhasspy/.venv/lib/python3.7/site-packages/quart/app.py", line 1821, in full_dispatch_request
result = await self.dispatch_request(request_context)
File "/usr/lib/rhasspy/.venv/lib/python3.7/site-packages/quart/app.py", line 1869, in dispatch_request
return await handler(**request_.view_args)
File "/usr/lib/rhasspy/rhasspy-server-hermes/rhasspyserver_hermes/main.py", line 936, in api_listen_for_command
async for response in core.publish_wait(handle_intent(), [], message_types):
File "/usr/lib/rhasspy/rhasspy-server-hermes/rhasspyserver_hermes/__init__.py", line 994, in publish_wait
result_awaitable, timeout=timeout_seconds
File "/usr/lib/python3.7/asyncio/tasks.py", line 423, in wait_for
raise futures.TimeoutError()
concurrent.futures._base.TimeoutError
[WARNING:2021-07-01 07:29:11,993] rhasspyserver_hermes: Dialogue management is disabled. ASR will NOT be automatically enabled.
[DEBUG:2021-07-01 07:29:11,992] rhasspyserver_hermes: ← HotwordDetected(model_id='default', model_version='', model_type='personal', current_sensitivity=1.0, site_id='default', session_id=None, send_audio_captured=None, lang=None, custom_entities=None)
[DEBUG:2021-07-01 07:29:11,969] rhasspyserver_hermes: Subscribed to hermes/error/nlu
[DEBUG:2021-07-01 07:29:11,968] rhasspyserver_hermes: Waiting for intent (session_id=None)
[DEBUG:2021-07-01 07:29:11,967] rhasspyserver_hermes: Publishing 199 bytes(s) to hermes/hotword/default/detected
[DEBUG:2021-07-01 07:29:11,965] rhasspyserver_hermes: → HotwordDetected(model_id='default', model_version='', model_type='personal', current_sensitivity=1.0, site_id='default', session_id=None, send_audio_captured=None, lang=None, custom_entities=None)

The settings that produced the logs above:

{
  "intent": {
    "system": "fsticuffs"
  },
  "microphone": {
    "arecord": {
      "device": "pulse"
    },
    "command": {
      "channels": "1",
      "record_arguments": "rtspsrc location=rtsp://login:password@192.168.135.49:554/stream2 latency=0 select-stream=stream_1 ! filesink location=/dev/stdout",
      "record_program": "gst-launch-1.0",
      "sample_rate": "8000",
      "sample_width": "4",
      "test_arguments": "",
      "udp_audio_port": ""
    },
    "system": "command"
  },
  "mqtt": {
    "enabled": "true",
    "host": "192.168.135.19",
    "password": "password",
    "username": "login"
  },
  "sounds": {
    "remote": {
      "url": "http://192.168.135.10:8081/path/to/endpoint"
    }
  },
  "speech_to_text": {
    "pocketsphinx": {
      "open_transcription": true
    },
    "system": "kaldi"
  },
  "wake": {
    "pocketsphinx": {
      "keyphrase": "jarvis"
    },
    "porcupine": {
      "keyword_path": "jarvis_raspberry-pi.ppn",
      "sensitivity": "0.1"
    },
    "raven": {
      "keywords": {
        "Джарвис": {
          "enabled": true
        }
      }
    },
    "snowboy": {
      "apply_frontend": true,
      "model": "jarvis.umdl",
      "sensitivity": "0.8,0.80"
    }
  }
}

Enabling the DialogueManager should fix your problem, because the hotword is detected.

I tested via the button on the home page and expected recognition of the spoken word "test". That worked when I sent the WAV file. Should I not expect the same here?

It feels like the audio stream from MQTT is not being analyzed at all; it does not even try. I could not find any evidence of it in the logs, although messages do appear in MQTT.


Like I said: you should enable the DialogueManager

I enabled the dialogue manager.

I started recognition manually.
I did not say anything (I am not at home).
And I got a false recognition. But I am not sure about that.
Is that really what happened?

{
  "text": "тест",
  "likelihood": 1,
  "seconds": 28.348475833015982,
  "siteId": "default",
  "sessionId": "default-default-06864a95-de4d-4a66-bd8e-2de7fdcd8245",
  "wakewordId": null,
  "asrTokens": [
    [
      {
        "value": "тест",
        "confidence": 1,
        "rangeStart": 0,
        "rangeEnd": 5,
        "time": {
          "start": 0,
          "end": 0
        }
      }
    ]
  ],
  "lang": null
}

This might be a false trigger caused by the fact that there was only silence (assuming no audio input).

Best to do a real test when you're at home :slight_smile:


I'll check it out in the evening.
Thank you, kind person!! :love_you_gesture:

It does not hear me :sob:

[ERROR:2021-07-01 19:32:14,911] rhasspyserver_hermes:
Traceback (most recent call last):
File "/usr/lib/rhasspy/.venv/lib/python3.7/site-packages/quart/app.py", line 1821, in full_dispatch_request
result = await self.dispatch_request(request_context)
File "/usr/lib/rhasspy/.venv/lib/python3.7/site-packages/quart/app.py", line 1869, in dispatch_request
return await handler(**request_.view_args)
File "/usr/lib/rhasspy/rhasspy-server-hermes/rhasspyserver_hermes/main.py", line 936, in api_listen_for_command
async for response in core.publish_wait(handle_intent(), [], message_types):
File "/usr/lib/rhasspy/rhasspy-server-hermes/rhasspyserver_hermes/__init__.py", line 994, in publish_wait
result_awaitable, timeout=timeout_seconds
File "/usr/lib/python3.7/asyncio/tasks.py", line 423, in wait_for
raise futures.TimeoutError()
concurrent.futures._base.TimeoutError
[DEBUG:2021-07-01 19:31:44,886] rhasspyserver_hermes: ← HotwordDetected(model_id='default', model_version='', model_type='personal', current_sensitivity=1.0, site_id='default', session_id=None, send_audio_captured=None, lang=None, custom_entities=None)
[DEBUG:2021-07-01 19:31:44,879] rhasspyserver_hermes: Waiting for intent (session_id=None)
[DEBUG:2021-07-01 19:31:44,877] rhasspyserver_hermes: Publishing 199 bytes(s) to hermes/hotword/default/detected
[DEBUG:2021-07-01 19:31:44,876] rhasspyserver_hermes: → HotwordDetected(model_id='default', model_version='', model_type='personal', current_sensitivity=1.0, site_id='default', session_id=None, send_audio_captured=None, lang=None, custom_entities=None)

  1. How do you trigger the hotword? Have you already set up some sentences?
  2. What is the sample rate of the wi-fi camera? You have set it to 8000 with width 4 in Rhasspy. Is this correct with respect to the camera?
  3. Is there a reason you have no hotword detection enabled?
  4. Try to record the audio from the stream with this file:
    ESP32-Rhasspy-Satellite/record.py at voco · Romkabouter/ESP32-Rhasspy-Satellite · GitHub

Change the broker IP address, and at the bottom change "office" to "default".
Run it and it will record 4 seconds from the stream and save it as a WAV file.
Listen to it and hear how it sounds.
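For context: record.py works by subscribing to the hermes audio topic over MQTT, and in Rhasspy each hermes/audioServer/&lt;siteId&gt;/audioFrame payload is, as far as I understand it, a small self-describing WAV chunk. A broker-free sketch of just the decoding step (the frame below is synthesized for illustration, and describe_frame is a hypothetical helper, not part of record.py):

```python
import io
import wave

def describe_frame(wav_bytes: bytes):
    """Report (rate, sample width, channels, frame count) of one
    WAV-formatted audioFrame payload."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as w:
        return (w.getframerate(), w.getsampwidth(),
                w.getnchannels(), w.getnframes())

# Synthesize a frame shaped like what Rhasspy publishes:
# 16 kHz, 16-bit, mono, a short run of silence.
buf = io.BytesIO()
with wave.open(buf, "wb") as w:
    w.setnchannels(1)
    w.setsampwidth(2)
    w.setframerate(16000)
    w.writeframes(b"\x00\x00" * 160)   # 10 ms of silence
frame = buf.getvalue()

print(describe_frame(frame))   # (16000, 2, 1, 160)
```

If the payloads in your MQTT messages do not parse this way, that by itself is a useful clue about the audio format actually being published.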


Thanks for the debug opportunity!
I got a voice to appear with the following settings:

But the sound is sped up, like a cartoon voice.
What affects the speed?

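The "cartoon" voice is the classic symptom of a rate or width mismatch: the bytes are fine, but the consumer plays them back at the wrong bytes-per-second. A small sketch of the arithmetic (the function and parameter names are mine, not Rhasspy's):

```python
def apparent_speed(true_rate, true_width, assumed_rate, assumed_width):
    """How many times faster audio sounds when samples produced at
    true_rate Hz / true_width bytes per sample are played back as if
    they were assumed_rate Hz / assumed_width bytes per sample."""
    return (assumed_rate * assumed_width) / (true_rate * true_width)

# 8 kHz camera audio interpreted as 16 kHz: plays twice as fast.
print(apparent_speed(8000, 2, 16000, 2))   # 2.0

# The same 2x byte-rate error from sample_width alone (the profile had
# width 4 while the audio is 2; a width mismatch also garbles samples).
print(apparent_speed(8000, 2, 8000, 4))    # 2.0
```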

I changed the rate in record.py and the voice in the recorded file is now normal.

Rhasspy expects 16000 Hz, 16-bit audio; maybe you can try to output that from the wifi cam.


Thank you! The final version of the command parameters:

rtspsrc location=rtsp://login:password@192.168.135.49:554/stream2 latency=0 select-stream=stream_1 ! rtppcmadepay ! alawdec ! audioconvert ! audioresample ! audio/x-raw, rate=16000 ! filesink location=/dev/stdout
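For completeness, the microphone profile section matching this pipeline would presumably look like the following (adapted from the settings posted earlier in the thread; note that sample_width is in bytes, so 16-bit audio means 2, not 4):

```json
"command": {
  "channels": "1",
  "record_program": "gst-launch-1.0",
  "record_arguments": "rtspsrc location=rtsp://login:password@192.168.135.49:554/stream2 latency=0 select-stream=stream_1 ! rtppcmadepay ! alawdec ! audioconvert ! audioresample ! audio/x-raw, rate=16000 ! filesink location=/dev/stdout",
  "sample_rate": "16000",
  "sample_width": "2",
  "test_arguments": "",
  "udp_audio_port": ""
}
```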