The official release of 2.5.10 is here! Thanks to everyone for the testing and feedback.
I’ve added a few bug fixes since the preliminary release, but the main new features since 2.5.9 are still:
- ASR support for Swedish (sv)
- New Larynx that’s faster and has a ton of new voices
- Hint: switch to “Low Quality” on a Pi 4 or below for a big speed-up
- This version should also work with older x86_64 CPUs (no AVX)
- Kaldi ASR now has confidence values for words and sentences
- Dialogue manager now has a minimum ASR confidence threshold (speech_to_text.<system>.min_confidence where <system> is kaldi, deepspeech, etc.)
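To make the threshold setting concrete, here is a minimal Python sketch of how a profile value at speech_to_text.<system>.min_confidence could be looked up and applied. Only the key path comes from the notes above; the example profile values, the default of 0.0, the helper names, and the use of a "greater than or equal" comparison are assumptions for illustration, not Rhasspy's actual implementation.

```python
# Hypothetical profile fragment. Only the key path
# speech_to_text.<system>.min_confidence comes from the release notes;
# the values are made up for this example.
profile = {
    "speech_to_text": {
        "system": "kaldi",
        "kaldi": {"min_confidence": 0.5},
    }
}


def min_confidence(profile: dict) -> float:
    """Look up the threshold for the active ASR system (assumed default: 0.0)."""
    stt = profile.get("speech_to_text", {})
    system = stt.get("system", "")
    return float(stt.get(system, {}).get("min_confidence", 0.0))


def accept_transcription(confidence: float, profile: dict) -> bool:
    """Assumed rule: keep a transcription whose confidence meets the threshold."""
    return confidence >= min_confidence(profile)
```

With this sketch, a transcription with confidence 0.8 would be kept and one with 0.2 would be dropped for the example profile.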
Since the preliminary release, I’ve also added:
- ASR confidences show up now in asr/textCaptured and nlu/intent messages (see asrConfidence and asrTokens)
- Custom entities from /api/listen-for-command now show up in the recognized nlu/intent (see customEntities)
- TTS timeouts are based on the text length and dialogue.say_chars_per_second
- Sound paths in the dialogue manager may be directories, in which case a random WAV file is chosen each time
- Only one satellite within a group should start recording if more than one detects the wake word at the same time
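If you consume these messages yourself, the new fields can be read straight out of the JSON payload. The sketch below is illustrative only: the field names asrConfidence and customEntities come from the list above, while the sample payload, its values, the assumption that customEntities is a JSON object, and the helper name read_asr_fields are all made up for the example.

```python
import json

# Made-up payload for illustration; only the field names asrConfidence and
# customEntities are taken from the release notes.
payload = json.dumps(
    {
        "input": "turn on the light",
        "asrConfidence": 0.93,
        "customEntities": {"room": "kitchen"},
    }
)


def read_asr_fields(raw: str):
    """Return (asrConfidence, customEntities) from a JSON message payload.

    Missing fields yield None and an empty dict, so older messages without
    these fields can still be handled.
    """
    message = json.loads(raw)
    confidence = message.get("asrConfidence")
    entities = message.get("customEntities") or {}
    return confidence, entities
```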
Added
- New version of Larynx with improved performance and 35 voices (20 English, 1 German, 3 French, 2 Spanish, 3 Dutch, 2 Italian, 1 Swedish, 3 Russian)
- Kaldi ASR model for Swedish (sv)
- Confidence and word timings for Kaldi ASR
- Minimum ASR confidence threshold for dialogue manager
- Detect AVX support and warn for Larynx, DeepSpeech, and Precise in Web UI
- Handle spaces in converter arguments with word!(converter, …)
- rhasspy-tts-cli-hermes TTS commands may be Jinja2 templates (--use-jinja2)
- Support for MaryTTS effects (jasonhildebrand)
- customData added to hermes/nlu/query message
- customData is copied by NLU services from query to intent/intentNotRecognized
- lang property added for wake, speech_to_text, and intent profile sections
- Wake, ASR, NLU services all set lang properties if null
- Profile now has “parent” setting, allowing one profile to load settings from another
- Dialogue manager sound paths may be directories, from which a random WAV will be chosen each time (thanks plafue)
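The directory behaviour for sound paths can be pictured with a short Python sketch. This is not the dialogue manager's code: the function name pick_sound, the case-insensitive ".wav" match, and returning None for a directory without WAV files are assumptions made for the example.

```python
import random
from pathlib import Path
from typing import Optional


def pick_sound(path: Path) -> Optional[Path]:
    """Resolve a configured sound path to a single file.

    If the path is a directory, choose one of its WAV files at random;
    otherwise return the path unchanged.
    """
    if path.is_dir():
        wavs = sorted(
            p for p in path.iterdir()
            if p.is_file() and p.suffix.lower() == ".wav"
        )
        # Assumption: an empty directory means "no sound".
        return random.choice(wavs) if wavs else None
    return path
```

Calling pick_sound once per playback gives a different sound each time when the configured path is a directory holding several WAV files.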
Fixed
- Remote HTTP service sets site_id of satellite for ASR/NLU endpoints
- DeepSpeech token output (was letters, now words)
- Multiple values in custom converters are sent as a list on stdin
- Don’t show restart/shutdown button if “sudo” isn’t available (Docker, Hass.io)
- Added missing espeak phonemes for some profiles
- MaryTTS voice test in Web UI
- Remove dialogue session from site cache on end
- Don’t throw error about system not configured if message is intent for satellite (schnopsi)
- Custom entities from /api/listen-for-command are passed through to NLU intent
- Slots inside sub-directories will properly show up in the web interface
- Use locks in dialogue manager to prevent multiple group satellite sessions during audio playback
Changed
- /api/listen-for-command uses a proper wake workflow now (requires dialogue manager)
- Show absolute paths for custom models (precise, snowboy, porcupine) in Web UI
- TTS timeouts are computed using text length (dialogue.say_chars_per_second)
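As a rough illustration of a length-based timeout, the Python sketch below divides the number of characters by a characters-per-second rate. The exact formula Rhasspy uses is not given here, so the division, the optional extra_seconds padding, and the function name tts_timeout are assumptions; only the idea of scaling with text length and the setting name dialogue.say_chars_per_second come from the notes.

```python
def tts_timeout(text: str, chars_per_second: float, extra_seconds: float = 0.0) -> float:
    """Estimate how long to wait for TTS to finish speaking `text`.

    Assumed formula: len(text) / chars_per_second, plus optional padding.
    `chars_per_second` stands in for dialogue.say_chars_per_second.
    """
    if chars_per_second <= 0:
        raise ValueError("chars_per_second must be positive")
    return len(text) / chars_per_second + extra_seconds
```

Under this assumed formula, longer sentences get proportionally longer timeouts instead of sharing one fixed value.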