Can't get Rhasspy audio to work in a docker container, please help!

First of all, apologies if this has been asked before - we’ve tried searching but couldn’t find our problem here.

We’re trying to build our own voice assistant for switching lights on and off. The project is (perhaps aptly) named daedalus because we’re aiming a bit high, never having worked with Linux before.

So here’s where we’re at. The setup is a Raspberry Pi 4 with a USB mic/speaker attached.

lsusb on the raspberry itself gives this:

tim@daedalus:~ $ lsusb
Bus 002 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
Bus 001 Device 003: ID ff01:0006 BY BY Y02
Bus 001 Device 004: ID 046d:c52b Logitech, Inc. Unifying Receiver
Bus 001 Device 005: ID 04d9:0296 Holtek Semiconductor, Inc. USB-HID Keyboard
Bus 001 Device 002: ID 2109:3431 VIA Labs, Inc. Hub
Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub

arecord to list devices gives this:

tim@daedalus:~ $ arecord -l
**** List of CAPTURE Hardware Devices ****
card 1: Y02 [BY Y02], device 0: USB Audio [USB Audio]
Subdevices: 1/1
Subdevice #0: subdevice #0

and then when we test by recording some audio, we get this:

tim@daedalus:~ $ arecord -t wav -d 5 test.wav
Recording WAVE ‘test.wav’ : Unsigned 8 bit, Rate 8000 Hz, Mono

This gives a valid audio file. We have created a python script calling a speech-to-text engine which works! So far, so good.

Now we’re trying to use Rhasspy in a docker container with Raven as the wake word system. Our audio recording setting is PyAudio (Recommended) and wake word is set to Rhasspy Raven. Next, we enter a keyword and press record. A "Speak wake word" dialog appears, but whatever we say, nothing gets picked up, and this is followed by a timeout error.

log reads:

[ERROR:2022-07-23 10:46:47,405] rhasspyserver_hermes:
Traceback (most recent call last):
File “/usr/lib/rhasspy/.venv/lib/python3.7/site-packages/quart/app.py”, line 1821, in full_dispatch_request
result = await self.dispatch_request(request_context)
File “/usr/lib/rhasspy/.venv/lib/python3.7/site-packages/quart/app.py”, line 1869, in dispatch_request
return await handler(**request_.view_args)
File “/usr/lib/rhasspy/rhasspy-server-hermes/rhasspyserver_hermes/__main__.py”, line 2164, in api_record_wake_example
async for response in core.publish_wait(handle_recorded(), messages, message_types):
File “/usr/lib/rhasspy/rhasspy-server-hermes/rhasspyserver_hermes/__init__.py”, line 995, in publish_wait
result_awaitable, timeout=timeout_seconds
File “/usr/lib/python3.7/asyncio/tasks.py”, line 423, in wait_for
raise futures.TimeoutError()
concurrent.futures._base.TimeoutError
[ERROR:2022-07-23 10:46:28,130] rhasspyserver_hermes:
Traceback (most recent call last):
File “/usr/lib/rhasspy/.venv/lib/python3.7/site-packages/quart/app.py”, line 1821, in full_dispatch_request
result = await self.dispatch_request(request_context)
File “/usr/lib/rhasspy/.venv/lib/python3.7/site-packages/quart/app.py”, line 1869, in dispatch_request
return await handler(**request_.view_args)
File “/usr/lib/rhasspy/rhasspy-server-hermes/rhasspyserver_hermes/__main__.py”, line 2164, in api_record_wake_example
async for response in core.publish_wait(handle_recorded(), messages, message_types):
File “/usr/lib/rhasspy/rhasspy-server-hermes/rhasspyserver_hermes/__init__.py”, line 995, in publish_wait
result_awaitable, timeout=timeout_seconds
File “/usr/lib/python3.7/asyncio/tasks.py”, line 423, in wait_for
raise futures.TimeoutError()
concurrent.futures._base.TimeoutError
[DEBUG:2022-07-23 10:46:17,392] rhasspyserver_hermes: Publishing 67 bytes(s) to rhasspy/hotword/recordExample
[DEBUG:2022-07-23 10:46:17,391] rhasspyserver_hermes: → RecordHotwordExample(id=‘90ee2573-3a9c-47d9-99db-0c8a8ed89533’, site_id=‘default’)
[DEBUG:2022-07-23 10:46:17,387] rhasspyserver_hermes: Waiting for hotword example (id=90ee2573-3a9c-47d9-99db-0c8a8ed89533)
[DEBUG:2022-07-23 10:45:58,118] rhasspyserver_hermes: Publishing 67 bytes(s) to rhasspy/hotword/recordExample
[DEBUG:2022-07-23 10:45:58,117] rhasspyserver_hermes: → RecordHotwordExample(id=‘be45f1c6-d987-47b6-943d-992510dd65fa’, site_id=‘default’)
[DEBUG:2022-07-23 10:45:58,111] rhasspyserver_hermes: Subscribed to hermes/error/hotword
[DEBUG:2022-07-23 10:45:58,110] rhasspyserver_hermes: Subscribed to rhasspy/hotword/default/exampleRecorded/#
[DEBUG:2022-07-23 10:45:58,109] rhasspyserver_hermes: Waiting for hotword example (id=be45f1c6-d987-47b6-943d-992510dd65fa)
[DEBUG:2022-07-23 10:43:52,096] rhasspyprofile.download: (and text_to_speech.system text_to_speech.larynx.vocoder) (larynx vctk_small) [‘dummy’, ‘universal_large’] = False
[DEBUG:2022-07-23 10:43:52,096] rhasspyprofile.download: (and text_to_speech.system text_to_speech.larynx.vocoder) (larynx vctk_medium) [‘dummy’, ‘universal_large’] = False
[DEBUG:2022-07-23 10:43:52,096] rhasspyprofile.download: (and text_to_speech.system text_to_speech.larynx.vocoder) (larynx universal_large) [‘dummy’, ‘universal_large’] = False
[DEBUG:2022-07-23 10:43:52,095] rhasspyprofile.download: (and text_to_speech.system text_to_speech.larynx.default_voice) (larynx southern_english_female) [‘dummy’, ‘harvard’] = False
[DEBUG:2022-07-23 10:43:52,095] rhasspyprofile.download: (and text_to_speech.system text_to_speech.larynx.default_voice) (larynx southern_english_male) [‘dummy’, ‘harvard’] = False
[DEBUG:2022-07-23 10:43:52,095] rhasspyprofile.download: (and text_to_speech.system text_to_speech.larynx.default_voice) (larynx scottish_english_male) [‘dummy’, ‘harvard’] = False
[DEBUG:2022-07-23 10:43:52,095] rhasspyprofile.download: (and text_to_speech.system text_to_speech.larynx.default_voice) (larynx northern_english_male) [‘dummy’, ‘harvard’] = False
[DEBUG:2022-07-23 10:43:52,094] rhasspyprofile.download: (and text_to_speech.system text_to_speech.larynx.default_voice) (larynx judy_bieber) [‘dummy’, ‘harvard’] = False
[DEBUG:2022-07-23 10:43:52,094] rhasspyprofile.download: (and text_to_speech.system text_to_speech.larynx.default_voice) (larynx mary_ann) [‘dummy’, ‘harvard’] = False
[DEBUG:2022-07-23 10:43:52,094] rhasspyprofile.download: (and text_to_speech.system text_to_speech.larynx.default_voice) (larynx kathleen) [‘dummy’, ‘harvard’] = False
[DEBUG:2022-07-23 10:43:52,094] rhasspyprofile.download: (and text_to_speech.system text_to_speech.larynx.default_voice) (larynx ljspeech) [‘dummy’, ‘harvard’] = False
[DEBUG:2022-07-23 10:43:52,094] rhasspyprofile.download: (and text_to_speech.system text_to_speech.larynx.default_voice) (larynx ek) [‘dummy’, ‘harvard’] = False
[DEBUG:2022-07-23 10:43:52,093] rhasspyprofile.download: (and text_to_speech.system text_to_speech.larynx.default_voice) (larynx blizzard_lessac) [‘dummy’, ‘harvard’] = False
[DEBUG:2022-07-23 10:43:52,093] rhasspyprofile.download: (and text_to_speech.system text_to_speech.larynx.default_voice) (larynx blizzard_fls) [‘dummy’, ‘harvard’] = False
[DEBUG:2022-07-23 10:43:52,093] rhasspyprofile.download: (and text_to_speech.system text_to_speech.larynx.default_voice) (larynx cmu_slt) [‘dummy’, ‘harvard’] = False
[DEBUG:2022-07-23 10:43:52,093] rhasspyprofile.download: (and text_to_speech.system text_to_speech.larynx.default_voice) (larynx cmu_slp) [‘dummy’, ‘harvard’] = False
[DEBUG:2022-07-23 10:43:52,092] rhasspyprofile.download: (and text_to_speech.system text_to_speech.larynx.default_voice) (larynx cmu_rxr) [‘dummy’, ‘harvard’] = False
[DEBUG:2022-07-23 10:43:52,092] rhasspyprofile.download: (and text_to_speech.system text_to_speech.larynx.default_voice) (larynx cmu_rms) [‘dummy’, ‘harvard’] = False
[DEBUG:2022-07-23 10:43:52,092] rhasspyprofile.download: (and text_to_speech.system text_to_speech.larynx.default_voice) (larynx cmu_lnh) [‘dummy’, ‘harvard’] = False
[DEBUG:2022-07-23 10:43:52,092] rhasspyprofile.download: (and text_to_speech.system text_to_speech.larynx.default_voice) (larynx cmu_ljm) [‘dummy’, ‘harvard’] = False
[DEBUG:2022-07-23 10:43:52,091] rhasspyprofile.download: (and text_to_speech.system text_to_speech.larynx.default_voice) (larynx cmu_ksp) [‘dummy’, ‘harvard’] = False
[DEBUG:2022-07-23 10:43:52,091] rhasspyprofile.download: (and text_to_speech.system text_to_speech.larynx.default_voice) (larynx cmu_jmk) [‘dummy’, ‘harvard’] = False
[DEBUG:2022-07-23 10:43:52,091] rhasspyprofile.download: (and text_to_speech.system text_to_speech.larynx.default_voice) (larynx cmu_fem) [‘dummy’, ‘harvard’] = False
[DEBUG:2022-07-23 10:43:52,091] rhasspyprofile.download: (and text_to_speech.system text_to_speech.larynx.default_voice) (larynx cmu_eey) [‘dummy’, ‘harvard’] = False
[DEBUG:2022-07-23 10:43:52,090] rhasspyprofile.download: (and text_to_speech.system text_to_speech.larynx.default_voice) (larynx cmu_clb) [‘dummy’, ‘harvard’] = False
[DEBUG:2022-07-23 10:43:52,090] rhasspyprofile.download: (and text_to_speech.system text_to_speech.larynx.default_voice) (larynx cmu_bdl) [‘dummy’, ‘harvard’] = False
[DEBUG:2022-07-23 10:43:52,090] rhasspyprofile.download: (and text_to_speech.system text_to_speech.larynx.default_voice) (larynx cmu_aup) [‘dummy’, ‘harvard’] = False
[DEBUG:2022-07-23 10:43:52,089] rhasspyprofile.download: (and text_to_speech.system text_to_speech.larynx.default_voice) (larynx cmu_ahw) [‘dummy’, ‘harvard’] = False
[DEBUG:2022-07-23 10:43:52,089] rhasspyprofile.download: (and text_to_speech.system text_to_speech.larynx.default_voice) (larynx cmu_aew) [‘dummy’, ‘harvard’] = False
[DEBUG:2022-07-23 10:43:52,088] rhasspyprofile.download: (and text_to_speech.system text_to_speech.larynx.default_voice) (larynx harvard) [‘dummy’, ‘harvard’] = False
[DEBUG:2022-07-23 10:43:52,087] rhasspyprofile.download: speech_to_text.deepspeech.mix_weight >0 0 = False
[DEBUG:2022-07-23 10:43:52,087] rhasspyprofile.download: speech_to_text.kaldi.mix_weight >0 0 = False
[DEBUG:2022-07-23 10:43:52,086] rhasspyprofile.download: speech_to_text.pocketsphinx.mix_weight >0 0 = False
[DEBUG:2022-07-23 10:43:52,085] rhasspyprofile.download: speech_to_text.deepspeech.open_transcription True False = False
[DEBUG:2022-07-23 10:43:52,085] rhasspyprofile.download: speech_to_text.kaldi.open_transcription True False = False
[DEBUG:2022-07-23 10:43:52,068] rhasspyprofile.download: speech_to_text.pocketsphinx.open_transcription True False = False
[DEBUG:2022-07-23 10:43:52,067] rhasspyprofile.download: speech_to_text.system vosk dummy = False
[DEBUG:2022-07-23 10:43:52,066] rhasspyprofile.download: speech_to_text.system deepspeech dummy = False
[DEBUG:2022-07-23 10:43:52,065] rhasspyprofile.download: speech_to_text.system kaldi dummy = False
[DEBUG:2022-07-23 10:43:52,064] rhasspyprofile.download: speech_to_text.system pocketsphinx dummy = False
[INFO:2022-07-23 10:43:50,450] rhasspyserver_hermes: Started
[DEBUG:2022-07-23 10:43:50,447] rhasspyserver_hermes: Subscribed to hermes/asr/textCaptured
[DEBUG:2022-07-23 10:43:50,446] rhasspyserver_hermes: Subscribed to hermes/hotword/+/detected
[DEBUG:2022-07-23 10:43:50,445] rhasspyserver_hermes: Subscribed to hermes/intent/#
[DEBUG:2022-07-23 10:43:50,444] rhasspyserver_hermes: Subscribed to hermes/nlu/intentNotRecognized
[DEBUG:2022-07-23 10:43:50,443] rhasspyserver_hermes: Subscribed to hermes/audioServer/default/audioSummary
[DEBUG:2022-07-23 10:43:50,441] rhasspyserver_hermes: Subscribed to rhasspy/asr/default/+/audioCaptured
[DEBUG:2022-07-23 10:43:50,441] rhasspyserver_hermes: Subscribed to hermes/audioServer/default/audioSummary
[DEBUG:2022-07-23 10:43:50,439] rhasspyserver_hermes: Subscribed to rhasspy/asr/default/+/audioCaptured
[DEBUG:2022-07-23 10:43:50,438] rhasspyserver_hermes: Subscribed to hermes/nlu/intentNotRecognized
[DEBUG:2022-07-23 10:43:50,424] rhasspyserver_hermes: Subscribed to hermes/intent/#
[DEBUG:2022-07-23 10:43:50,422] rhasspyserver_hermes: Subscribed to hermes/asr/textCaptured
[DEBUG:2022-07-23 10:43:50,421] rhasspyserver_hermes: Subscribed to hermes/hotword/+/detected
[DEBUG:2022-07-23 10:43:50,419] rhasspyserver_hermes: Connected to MQTT broker
[DEBUG:2022-07-23 10:43:50,412] rhasspyserver_hermes: Connecting to localhost:12183 (retries: 2/10)
[ERROR:2022-07-23 10:43:49,409] rhasspyserver_hermes: mqtt connect
Traceback (most recent call last):
File “/usr/lib/rhasspy/rhasspy-server-hermes/rhasspyserver_hermes/__init__.py”, line 290, in start
self.client.connect(self.host, self.port)
File “/usr/lib/rhasspy/.venv/lib/python3.7/site-packages/paho/mqtt/client.py”, line 937, in connect
return self.reconnect()
File “/usr/lib/rhasspy/.venv/lib/python3.7/site-packages/paho/mqtt/client.py”, line 1071, in reconnect
sock = self._create_socket_connection()
File “/usr/lib/rhasspy/.venv/lib/python3.7/site-packages/paho/mqtt/client.py”, line 3522, in _create_socket_connection
return socket.create_connection(addr, source_address=source, timeout=self._keepalive)
File “/usr/lib/python3.7/socket.py”, line 727, in create_connection
raise err
File “/usr/lib/python3.7/socket.py”, line 716, in create_connection
sock.connect(sa)
OSError: [Errno 99] Cannot assign requested address
[DEBUG:2022-07-23 10:43:49,406] rhasspyserver_hermes: Connecting to localhost:12183 (retries: 1/10)
[ERROR:2022-07-23 10:43:48,403] rhasspyserver_hermes: mqtt connect
Traceback (most recent call last):
File “/usr/lib/rhasspy/rhasspy-server-hermes/rhasspyserver_hermes/__init__.py”, line 290, in start
self.client.connect(self.host, self.port)
File “/usr/lib/rhasspy/.venv/lib/python3.7/site-packages/paho/mqtt/client.py”, line 937, in connect
return self.reconnect()
File “/usr/lib/rhasspy/.venv/lib/python3.7/site-packages/paho/mqtt/client.py”, line 1071, in reconnect
sock = self._create_socket_connection()
File “/usr/lib/rhasspy/.venv/lib/python3.7/site-packages/paho/mqtt/client.py”, line 3522, in _create_socket_connection
return socket.create_connection(addr, source_address=source, timeout=self._keepalive)
File “/usr/lib/python3.7/socket.py”, line 727, in create_connection
raise err
File “/usr/lib/python3.7/socket.py”, line 716, in create_connection
sock.connect(sa)
OSError: [Errno 99] Cannot assign requested address
[DEBUG:2022-07-23 10:43:48,401] rhasspyserver_hermes: Connecting to localhost:12183 (retries: 0/10)
[DEBUG:2022-07-23 10:43:48,400] rhasspyserver_hermes: Starting core

So we went back to testing our audio devices in Rhasspy as per the manual, and when we test our device we get this instant response:

[DEBUG:2022-07-23 11:06:35,486] rhasspyserver_hermes: Handling AudioDevices (topic=rhasspy/audioServer/devices, id=2150372c-02d1-4b4c-b9a9-82c8c319276e)
[DEBUG:2022-07-23 11:06:35,445] rhasspyserver_hermes: Publishing 101 bytes(s) to rhasspy/audioServer/getDevices
[DEBUG:2022-07-23 11:06:35,445] rhasspyserver_hermes: → AudioGetDevices(modes=[<AudioDeviceMode.INPUT: ‘input’>], site_id=‘default’, id=‘7d8a031f-81e3-4771-8a7a-f266ade46af0’, test=True)

Obviously we’ve already checked that the root user is a member of the docker and audio groups, as are our users.
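For anyone wanting to repeat that group check, here is a minimal sketch. The helper name in_group is made up for illustration; it only relies on the standard `id -nG` command, which prints a user's group names separated by spaces.

```shell
# Hypothetical helper: succeed if user $1 is a member of group $2.
# `id -nG USER` prints all group names of USER on one line.
in_group() {
    id -nG "$1" | tr ' ' '\n' | grep -qx "$2"
}

# Usage (user and group names are examples):
#   in_group tim audio  && echo "tim is in audio"
#   in_group root docker || echo "root is NOT in docker"
```

The exit status is 0 when the user is in the group and non-zero otherwise, so it can be used directly in an `if` or with `&&` / `||`.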

So next, we try to open a bash session inside the Rhasspy docker container:

tim@daedalus:~ $ docker exec -it rhasspy bash

List the USB audio devices:

root@eda8a4b84efc:/# arecord -l
**** List of CAPTURE Hardware Devices ****
card 1: Y02 [BY Y02], device 0: USB Audio [USB Audio]
Subdevices: 1/1
Subdevice #0: subdevice #0
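The number after "card" in this listing is the ALSA card index of the device (1 here). As a rough sketch, it can be pulled out of the `arecord -l` output with sed; the function name card_index_for is hypothetical, and the pattern assumes the listing format shown above.

```shell
# Hypothetical helper: read `arecord -l` style output on stdin and print
# the card index of the first card whose short name equals $1.
card_index_for() {
    sed -n "s/^card \([0-9][0-9]*\): $1 .*/\1/p" | head -n 1
}

# Usage (inside the container or on the host):
#   arecord -l | card_index_for Y02
```

The short name (Y02 in our case) is interpolated into the sed pattern as-is, so this only suits simple names without regex special characters.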

And here comes the point where we’re out of our depth: recording inside the docker container also works, as long as we specify the device with the -D switch:

root@eda8a4b84efc:/# arecord -DCARD=Y02,DEV=0 -r 16000 -c 1 -f S16_LE test.wav

If we don’t specify the device but rely on the default, then we get this error:

arecord: main:828: audio open error: No such file or directory

This could explain why Rhasspy cannot record audio - but how do we set our default device within the docker container so that Rhasspy can start hearing us?

I’ve got it!

Turns out (I did not know this, nor could I find any reference to it) that the /usr/share/alsa/alsa.conf in the docker container had different default devices listed than the alsa.conf on my host machine:

defaults.ctl.card 0
defaults.pcm.card 0

should have been

defaults.ctl.card 1
defaults.pcm.card 1

in my case. Copying the alsa.conf with the right values from my host machine to the docker container, then setting audio recording to arecord and restarting Rhasspy, did the trick:

docker cp alsa.conf {container-id}:/usr/share/alsa/alsa.conf
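If the host's alsa.conf is not a good fit for the container, the two default lines can also be patched in a copy of the container's own file. This is only a sketch: set_default_card is a made-up helper name, and it assumes the file contains the `defaults.ctl.card` and `defaults.pcm.card` lines in the form quoted above.

```shell
# Hypothetical helper: set the default ALSA card to $1 in config file $2
# by rewriting the defaults.ctl.card and defaults.pcm.card lines.
set_default_card() {
    card="$1"
    conf="$2"
    tmp="$conf.tmp"
    sed -e "s/^defaults\.ctl\.card .*/defaults.ctl.card $card/" \
        -e "s/^defaults\.pcm\.card .*/defaults.pcm.card $card/" \
        "$conf" > "$tmp" && mv "$tmp" "$conf"
}

# Usage (container id is a placeholder, as above):
#   docker cp {container-id}:/usr/share/alsa/alsa.conf ./alsa.conf
#   set_default_card 1 ./alsa.conf
#   docker cp ./alsa.conf {container-id}:/usr/share/alsa/alsa.conf
```

Keep in mind that a file copied in with docker cp lives in that container only, so the step has to be repeated if the container is recreated from the image.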