Preview of 2.5.11

A preview of Rhasspy 2.5.11 is now available via Docker! Just run docker pull rhasspy/rhasspy:2.5.11 to update. I’ll mark it as latest once we work out the inevitable handful of bugs I introduce in each version :laughing:

Thanks to everyone who contributed, and to the Rhasspy community for keeping the party going :partying_face:

The full changelog is below, but here are some of the big highlights:

  • Vosk support for ASR! This works best in the “open transcription” mode where you can say anything. In restricted voice command mode, I’m only able to limit Vosk’s vocabulary; I can’t control the actual grammar yet.
  • “Unknown words” in Kaldi. Check the “Replace unknown words with …” option in the settings and re-train. Incorrect words should now be replaced with <unk> and (usually) result in a “not recognized”. You can adjust the probability of unknown words too :slight_smile:
  • Arabic added as a new language with a Kaldi model

Changelog

Added

  • Option to allow “unknown” words to be recognized with Kaldi with Text FST grammar enabled
  • Preliminary support for Arabic (ar)
  • Initial support for Vosk ASR (English, German, French, Spanish, Italian, Russian, Portuguese, Vietnamese, Arabic)
  • Multiple audio streams for Precise/Porcupine/Snowboy/Pocketsphinx wake word service (thanks Romkabouter)
  • Raven speed optimizations for dynamic time warping code (thanks maxbachmann)
  • Dialogue manager will convert audio files to WAV using soundfile and audioread
  • dialogue.sound_suffixes profile setting controls file extensions searched for when dialogue feedback sound path is a directory
  • rhasspy-speakers-cli-mqtt will convert audio bytes to WAV using soundfile and audioread
  • Wake word systems can receive raw UDP audio from multiple sites, and forward it to MQTT (see the wake.<system>.udp_site_info setting)

Fixed

  • Websocket queues are retained between restarts (thanks sabeechen)

Changed

  • The rhasspy-silence CLI tool can now split audio by silence and trim silence
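For intuition, splitting and trimming by silence can be reduced to an energy threshold over fixed-size frames. A minimal sketch of the trimming half (illustrative only, not the actual rhasspy-silence implementation, which is built on proper voice-activity detection):

```python
def trim_silence(samples, threshold=500, frame_size=160):
    """Drop leading and trailing frames whose peak amplitude is below threshold.

    samples: 16-bit PCM samples as plain ints; frame_size=160 is 10 ms at 16 kHz.
    Returns the samples between the first and last "loud" frame (inclusive).
    """
    frames = [samples[i:i + frame_size] for i in range(0, len(samples), frame_size)]
    is_loud = [max(abs(s) for s in f) >= threshold for f in frames]
    if not any(is_loud):
        return []  # nothing but silence
    first = is_loud.index(True)
    last = len(is_loud) - 1 - is_loud[::-1].index(True)
    return [s for f in frames[first:last + 1] for s in f]
```

Splitting works the same way, except runs of silent frames become cut points instead of being discarded only at the edges.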

First off, a big thanks for this release. Awesome!

What do multiple audio streams mean in particular? Will Rhasspy now be able to play a sound and still listen for the wake word?

I’m asking because of a radio stream or music stream feature I would like to use.


Great work, I will try it soon :slight_smile:

No, it means you can have multiple satellites pushing an audio stream to Rhasspy.
Previously, when connecting a second (ESP32) satellite, Rhasspy would stop working because the two audio streams were mixed.
Now each siteId/audioFrame topic has its own process.
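In Hermes terms, each satellite publishes to its own hermes/audioServer/<siteId>/audioFrame topic, so the fix amounts to routing frames by site ID instead of into one shared stream. A toy sketch of that dispatch (illustrative only, not Rhasspy's actual service code):

```python
def site_id_from_topic(topic):
    """Extract the site ID from a hermes/audioServer/<siteId>/audioFrame topic."""
    parts = topic.split("/")
    if len(parts) == 4 and parts[:2] == ["hermes", "audioServer"] and parts[3] == "audioFrame":
        return parts[2]
    return None

# One buffer/detector per site ID, so two satellites can never mix streams.
handlers = {}

def on_audio_frame(topic, payload):
    site_id = site_id_from_topic(topic)
    if site_id is None:
        return  # not an audio frame topic
    handlers.setdefault(site_id, []).append(payload)
```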

Hopefully fixes this issue: ESP32 Satellites · Issue #209 · rhasspy/rhasspy · GitHub


Has anyone had any luck with this? I’m using a Pi 4 as a base station with Text FST grammar selected. Training usually completes in roughly 14-17 seconds. I’m not able to complete training with unknown words enabled. I’m seeing a variety of behaviour whilst following the log during the process.

  1. Training often stumbles/halts quite early on without error, though I can’t identify any patterns.

  2. If training does progress further through the procedure, it throws timeouts:

File "/usr/lib/rhasspy/rhasspy-server-hermes/rhasspyserver_hermes/__main__.py", line 1313, in api_train
    result = await core.train()
File "/usr/lib/rhasspy/rhasspy-server-hermes/rhasspyserver_hermes/__init__.py", line 462, in train
    timeout_seconds=self.training_timeout_seconds,
File "/usr/lib/rhasspy/rhasspy-server-hermes/rhasspyserver_hermes/__init__.py", line 995, in publish_wait
    result_awaitable, timeout=timeout_seconds
File "/usr/lib/python3.7/asyncio/tasks.py", line 423, in wait_for
    raise futures.TimeoutError()
concurrent.futures._base.TimeoutError
  3. Regardless of 1 or 2, a ‘405 Not Allowed’ response is thrown after 60 seconds in place of a “training completed in X seconds”.

Rhasspy can be woken once training has “completed/failed”, but it does not capture any speech and ultimately times out.
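For reference, the TimeoutError in that traceback is raised by asyncio.wait_for when the training future does not finish within training_timeout_seconds; it can be reproduced in isolation:

```python
import asyncio

async def slow_train():
    # Stands in for a long-running Kaldi training job.
    await asyncio.sleep(10)
    return "trained"

async def main():
    try:
        # wait_for cancels the awaitable and raises TimeoutError on expiry.
        return await asyncio.wait_for(slow_train(), timeout=0.1)
    except asyncio.TimeoutError:
        return "timeout"

result = asyncio.run(main())
```

So the training itself may even still be running server-side when the HTTP layer has already given up.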

Curious to know if anyone else has had any success! :smiley:


So… is this just for wake word detection, or can it also be used to stream audio from satellites, to then be processed for intent recognition?

I’m thinking mainly about whether this is useful for this

Hmmmm…I may need to add some more options here. For a lot of sentences, my current implementation will increase training time a lot. At the very least, I need to make the training timeout length an option.

Try decreasing the number of frequent words it uses (default is 100), and see if that helps. If not, I may need to add an option to have a “possible unknown word” inserted for every 2nd or 3rd (etc.) actual word. Right now, every word in every sentence gets a “possible unknown word” path in the intent graph.
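Conceptually, giving every word a parallel “possible unknown word” branch makes the number of paths through the intent graph grow exponentially with sentence length, which is why training time blows up. A toy sketch of the every-2nd/3rd-word idea (hypothetical names, not the actual rhasspy-nlu graph code):

```python
from itertools import product

def with_unknown_paths(sentence, every_nth=1):
    """Enumerate word sequences where eligible words may be replaced by <unk>.

    every_nth=1 makes every word eligible (the current behaviour);
    every_nth=2 only every 2nd word, and so on.
    """
    words = sentence.split()
    choices = []
    for i, word in enumerate(words):
        if (i + 1) % every_nth == 0:
            choices.append((word, "<unk>"))  # branch: real word OR unknown
        else:
            choices.append((word,))          # no unknown branch here
    return [" ".join(path) for path in product(*choices)]

# With every word eligible, n words yield 2**n paths -- hence the slow training.
```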

Not just for wake word detection, no. Here’s an example I have with porcupine:

{
    "wake": {
        "porcupine": {
            "udp_audio": "0.0.0.0:12345:satellite1",
            "udp_site_info": {
                "satellite1": {
                    "forward_to_mqtt": true,
                    "raw_audio": true
                }
            }
        }
    }
}

This config says to accept audio on the server on UDP port 12345 for satellite1. Additionally, audio from satellite1 is raw PCM rather than WAV chunks ("raw_audio": true).

Once the wake word has been detected, this raw PCM audio will be automatically packaged up as WAV chunks and forwarded over MQTT to the rest of Rhasspy’s services ("forward_to_mqtt": true). So you will get ASR, NLU, dialogue, etc. for that satellite’s site ID :slight_smile:
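As a sketch, a satellite's send loop for the config above could look like the following (host, port, and frame size are illustrative assumptions, not a fixed protocol; "raw_audio": true just means bare PCM samples with no WAV header):

```python
import socket

UDP_HOST, UDP_PORT = "127.0.0.1", 12345   # matches "udp_audio" in the config
SAMPLES_PER_FRAME = 512                   # 32 ms at 16 kHz mono (assumption)
BYTES_PER_FRAME = SAMPLES_PER_FRAME * 2   # 16-bit little-endian samples

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

def send_frame(pcm: bytes) -> int:
    """Send one raw PCM frame as a single UDP datagram."""
    assert len(pcm) == BYTES_PER_FRAME
    return sock.sendto(pcm, (UDP_HOST, UDP_PORT))

# One frame of silence, as a stand-in for real microphone audio.
sent = send_frame(b"\x00\x00" * SAMPLES_PER_FRAME)
```

On the server side, those datagrams are what the wake word service consumes for satellite1 and, once forwarding kicks in, what gets wrapped into WAV chunks for MQTT.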

The trigger for MQTT forwarding is currently whether or not the wake word service has been disabled. This may or may not be a good idea, but Rhasspy’s dialogue manager automatically disables the wake word service after detection and re-enables it after ASR + feedback sounds have finished. So it seemed natural to have MQTT forwarding start when the wake word service is disabled, and then stop when it’s re-enabled. Thoughts?

I run that command, and then restart Rhasspy from the web UI and from Docker, but the web UI still shows version 2.5.10, not 2.5.11.

Restarting will restart the old container, not 2.5.11.
What command did you use?

The best way to do this is to first stop and remove the old container and then start the new one. Something like:

$ docker pull ...
$ docker stop rhasspy
$ docker rm rhasspy
$ docker run ...

I tried frequent words as low as 1, same result.

A configuration option for the training timeout would be great nonetheless. I tried setting app.config['BODY_TIMEOUT'] in Quart to extend the training timeout, but it didn’t seem to have any effect - not sure why; any ideas? I had hoped to see just how long training would take with frequent words set to 10.

Did you happen to see any other errors in the logs before the timeout? I did use some different Kaldi features to handle unknown words, so maybe something is also wrong there.

There’s a hard-coded training timeout of 600 seconds (10 minutes) currently. If it’s still taking that long with a value of 1, I don’t think it’s going to work well for most people on the Pi.

One option is the second/third-word idea I mentioned. Another might be to create a separate, low-probability “catch-all” intent made up of these frequent words. That should add almost no time to training.

Hi,
I just came across this older post:

Could you, if not already done, add Speex to the container?

Thanks, Kuumaur