Preview of 2.5.11

A preview of Rhasspy 2.5.11 is now available via Docker! Just run docker pull rhasspy/rhasspy:2.5.11 to update. I’ll mark it as latest once we work out the inevitable handful of bugs I introduce in each version :laughing:

Thanks to everyone who contributed, and to the Rhasspy community for keeping the party going :partying_face:

The full changelog is below, but here are some of the big highlights:

  • Vosk support for ASR! This works best in the “open transcription” mode where you can say anything. In restricted voice command mode, I’m only able to limit Vosk’s vocabulary; I can’t control the actual grammar yet.
  • “Unknown words” in Kaldi. Check the “Replace unknown words with …” option in the settings and re-train. Incorrect words should now be replaced with <unk> and (usually) result in a “not recognized”. You can adjust the probability of unknown words too :slight_smile:
  • Arabic added as a new language with a Kaldi model
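For those wanting to try Vosk, switching a profile over should just be a matter of changing the speech-to-text system. A minimal sketch is below; the "vosk" system name is my assumption based on how the other ASR systems are named, and the web UI's Speech to Text settings page is the authoritative way to switch:

```json
{
    "speech_to_text": {
        "system": "vosk"
    }
}
```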

Changelog

Added

  • Option to allow “unknown” words to be recognized with Kaldi with Text FST grammar enabled
  • Preliminary support for Arabic (ar)
  • Initial support for Vosk ASR (English, German, French, Spanish, Italian, Russian, Portuguese, Vietnamese, Arabic)
  • Multiple audio streams for Precise/Porcupine/Snowboy/Pocketsphinx wake word service (thanks Romkabouter)
  • Raven speed optimizations for dynamic time warping code (thanks maxbachmann)
  • Dialogue manager will convert audio files to WAV using soundfile and audioread
  • dialogue.sound_suffixes profile setting controls file extensions searched for when dialogue feedback sound path is a directory
  • rhasspy-speakers-cli-mqtt will convert audio bytes to WAV using soundfile and audioread
  • Wake word systems can receive raw UDP audio from multiple sites and forward it to MQTT (see wake.<system>.udp_site_info)
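As an illustration of the new dialogue.sound_suffixes setting from the list above, a profile fragment might look like the sketch below. The extension list here is only an example; the setting controls which file extensions are searched when the dialogue feedback sound path is a directory:

```json
{
    "dialogue": {
        "sound_suffixes": [".wav", ".mp3", ".ogg"]
    }
}
```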

Fixed

  • Websocket queues are retained between restarts (thanks sabeechen)

Changed

  • The rhasspy-silence CLI tool can now split audio by silence and trim silence

First off, a big thanks for this release. Awesome!

What does “multiple audio streams” mean in particular? Does it mean Rhasspy can play any sound and still listen for the wake word?

I’m asking because of a radio or music streaming feature I would like to use.


Great work, I will try it soon :slight_smile:

No, it means you can have multiple satellites pushing an audio stream to Rhasspy.
Previously, when connecting a second (esp32) satellite, Rhasspy would stop working because the two audio streams were mixed.
Now each siteId/audioFrame topic has its own process.

Hopefully fixes this issue: ESP32 Satellites · Issue #209 · rhasspy/rhasspy · GitHub


Has anyone had any luck with this? I’m using a Pi 4 as a base station with Text FST grammar selected. Training usually completes in roughly 14-17 seconds. I’m not able to complete training with unknown words enabled. I’m seeing a variety of behaviour whilst following the log during the process.

  1. Training often stumbles/halts quite early on without error, though I can’t identify any patterns.

  2. If training does progress further through the procedure, it throws timeouts:

File "/usr/lib/rhasspy/rhasspy-server-hermes/rhasspyserver_hermes/__main__.py", line 1313, in api_train
    result = await core.train()
File "/usr/lib/rhasspy/rhasspy-server-hermes/rhasspyserver_hermes/__init__.py", line 462, in train
    timeout_seconds=self.training_timeout_seconds,
File "/usr/lib/rhasspy/rhasspy-server-hermes/rhasspyserver_hermes/__init__.py", line 995, in publish_wait
    result_awaitable, timeout=timeout_seconds
File "/usr/lib/python3.7/asyncio/tasks.py", line 423, in wait_for
    raise futures.TimeoutError()
concurrent.futures._base.TimeoutError
  3. Regardless of 1 or 2, a ‘405 Not Allowed’ response is thrown after 60 seconds in place of a “training completed in X seconds” message.

Rhasspy can be woken once training has “completed/failed”, but it does not capture any speech and eventually times out.

Curious to know if anyone else has had any success! :smiley:


So… is this just for wake word detection, or can it also be used to stream audio from satellites, to then be processed for intent recognition?

I’m thinking mainly about whether this is useful for this

Hmmmm… I may need to add some more options here. For a lot of sentences, my current implementation will increase training time a lot. At the very least, I need to make the training timeout length an option.

Try decreasing the number of frequent words it uses (default is 100), and see if that helps. If not, I may need to add an option to have a “possible unknown word” inserted for every 2nd or 3rd (etc.) actual word. Right now, every word in every sentence gets a “possible unknown word” path in the intent graph.

Not just for wake word detection, no. Here’s an example I have with porcupine:

{
    "wake": {
        "porcupine": {
            "udp_audio": "0.0.0.0:12345:satellite1",
            "udp_site_info": {
                "satellite1": {
                    "forward_to_mqtt": true,
                    "raw_audio": true
                }
            }
        }
    }
}

This config says to accept audio on the server on UDP port 12345 for satellite1. Additionally, audio from satellite1 is raw PCM rather than WAV chunks ("raw_audio": true).

Once the wake word has been detected, this raw PCM audio will be automatically packaged up as WAV chunks and forwarded over MQTT to the rest of Rhasspy’s services ("forward_to_mqtt": true). So you will get ASR, NLU, dialogue, etc. for that satellite’s site ID :slight_smile:

The trigger for MQTT forwarding is currently whether or not the wake word service has been disabled. This may or may not be a good idea, but Rhasspy’s dialogue manager automatically disables the wake word service after detection and re-enables it after ASR + feedback sounds have finished. So it seemed natural to have MQTT forwarding start when the wake word service is disabled, and then stop when it’s re-enabled. Thoughts?

I ran that command and then restarted Rhasspy from the web UI and from Docker, but the web UI still shows version 2.5.10, not 2.5.11.

Restarting will restart the old container, not 2.5.11.
What command did you use?

The best way to do this is to first stop and remove the old container and then start the new one. Something like:

$ docker pull ...
$ docker stop rhasspy
$ docker rm rhasspy
$ docker run ...

I tried frequent words as low as 1; same result.

A configuration option for the training timeout length would be great nonetheless. I tried setting app.config['BODY_TIMEOUT'] in Quart to extend the training timeout, but it didn’t seem to take effect. Not sure why; any ideas? I had hoped to see quite how long training would take with frequent words set to 10.

Did you happen to see any other errors in the logs before the timeout? I did use some different Kaldi features to handle unknown words, so maybe something is also wrong there.

There’s a hard-coded training timeout of 600 seconds (10 minutes) currently. If it’s still taking that long with a value of 1, I don’t think it’s going to work well for most people on the Pi.

One option is to do the second, third, etc. word thing I mentioned. Another might be to just create a separate, low probability “catch all” intent that is made up of these frequent words. That should add almost no time to the training.

Hi,
I just came across this older post:

Could you add Speex to the container, if it isn’t already included?

Thanks, Kuumaur


Just wondering when the 64-bit debs might be available.
No rush, just curious :slight_smile:


Working on it this week :slight_smile:


Is there a way to get the best of both options? I really like the exact match on my sentences, but I also want a wildcard at the end of them, like: define (word). I was able to get this to work with the last image, but I don’t know what to do now. In one of these patches, I swear it was just perfect: it would match my sentences plus any extra words I’d throw in, and just send the raw text.

I actually do know how to do this now. I will need to extend the templating language with a “wildcard” symbol. Maybe just “*”?

If I use the same trick as the “unknown words”, then * would match any single word from the base dictionary. Unfortunately, multiple wildcards in a row would not be subject to the statistics of the base language model (so “the big dog” would be just as likely as “dog big the”).

Thoughts?

Multiple words I don’t need; that’s just too much (you’d need a huge cloud AI for that). I’m just looking for one word. Sometimes I’ll throw a couple of extra words at Google for its search engine too. I would just like the ability to choose at which point open transcription starts. I mean, it’s pretty good at recognizing my sentences, with fast speed too, but if we throw random words in for questions it’s all over the place. If I tweak the older version to use just open transcription with 0.3 confidence, it will get what I’m saying about half the time, I think. I don’t know how we could do it, but we are really close to getting there. I know bash, so maybe I could help you out; I’d have to look at the code base. I’m pretty surprised how well I got it working considering it’s completely offline. The new voices are a huge step forward, by the way. Good job, they sound a lot better, even if they take an extra 2-3 seconds on a Raspberry Pi 4 (4 GB).


Update

The Docker images have been updated, and the Debian packages for 2.5.11 are now available!

Changes are described below:

Unknown Words

Based on feedback, I’ve modified the “unknown words” feature for Kaldi. To reduce training time with it enabled, I now create a single “unknown sentence” path in the grammar. Speaking something outside your sentences.ini should produce something like <unk> <unk> ... up to a maximum number of words.

You can adjust the probability of the “unknown sentence” (default is 1e-5). If you end up with too many false positives for unknown words, try lowering the probability.

Cancel Word

The old version of “unknown words” didn’t work, but I realized it was perfect for a “cancel” word! So now in the Kaldi settings, you can set a special word that will immediately cancel the current voice command. Just make sure it’s not a word you’re using in your intents :slight_smile:

So if you were to set the “cancel” word to “terminate”, you could say something like:

turn on the terminate

and Kaldi will return:

turn on the <unk>

which will cause fsticuffs to report a failed recognition. You can set the cancel word to whatever you like (I wouldn’t use a word from your intents), and change the probability.


I just pulled the new Docker image.
Thanks for all the work.
I’m looking for the “cancel” word in the Kaldi settings, but couldn’t find it. Are there extra steps needed to enable the new features?

By the way: did you already add the German voice “kerstin” to 2.5.11?