Two things about R3 with Porcupine that I'm testing in an Ubuntu VM:
First, it seems the base config is not being overridden by the user config.
Changing the wake word file to grasshopper_linux.ppn only took effect when I put it in the base config, not when I put it in the user config file. Anyone else see this?
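For context, the kind of override I'm attempting in config/configuration.yaml is roughly this; I'm inferring the exact keys from the PipelineProgramConfig lines in the debug output below, so treat it as a sketch rather than known-good config:

# User config override sketch (config/configuration.yaml); key names
# are guessed from the debug output, not copied from working docs.
pipelines:
  default:
    wake:
      name: porcupine1
      template_args:
        model: "grasshopper_linux.ppn"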
Second, I created a custom wake word file (linux *.ppn) with the Picovoice Console and can't get it to work with R3.
Is there something I need to do for compatibility, like an access key?
The error I got is shown below.
What can I use for a custom wake word if/until I can get Porcupine working?
$ script/run bin/wake_detect.py --debug
DEBUG:rhasspy3.core:Loading config from /home/sass/working/rhasspy3/rhasspy3/configuration.yaml
DEBUG:rhasspy3.core:Skipping /home/sass/working/rhasspy3/config/configuration.yaml
DEBUG:wake_detect:mic program: PipelineProgramConfig(name='arecord', template_args=None, after=None)
DEBUG:wake_detect:wake program: PipelineProgramConfig(name='porcupine1', template_args={'model': 'alice_en_linux_v2_1_0.ppn'}, after=None)
DEBUG:rhasspy3.program:mic_adapter_raw.py ['--samples-per-chunk', '1024', '--rate', '16000', '--width', '2', '--channels', '1', 'arecord -q -D "default" -r 16000 -c 1 -f S16_LE -t raw -']
DEBUG:wake_detect:Detecting wake word
DEBUG:rhasspy3.program:.venv/bin/python3 ['bin/porcupine_stream.py', '--model', 'alice_en_linux_v2_1_0.ppn']
Traceback (most recent call last):
  File "/home/sass/working/rhasspy3/config/programs/wake/porcupine1/bin/porcupine_stream.py", line 110, in <module>
    main()
  File "/home/sass/working/rhasspy3/config/programs/wake/porcupine1/bin/porcupine_stream.py", line 61, in main
    porcupine = pvporcupine.create(
  File "/home/sass/working/rhasspy3/config/programs/wake/porcupine1/.venv/lib/python3.10/site-packages/pvporcupine/__init__.py", line 64, in create
Traceback (most recent call last):
  File "/home/sass/working/rhasspy3/bin/wake_detect.py", line 80, in <module>
    asyncio.run(main())
  File "/usr/lib/python3.10/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "/usr/lib/python3.10/asyncio/base_events.py", line 646, in run_until_complete
    return future.result()
  File "/home/sass/working/rhasspy3/bin/wake_detect.py", line 69, in main
    detection = await detect(rhasspy, wake_program, mic_proc.stdout)
  File "/home/sass/working/rhasspy3/rhasspy3/wake.py", line 109, in detect
    wake_event = wake_task.result()
  File "/home/sass/working/rhasspy3/rhasspy3/event.py", line 48, in async_read_event
    event_dict = json.loads(json_line)
  File "/usr/lib/python3.10/json/__init__.py", line 346, in loads
    return _default_decoder.decode(s)
  File "/usr/lib/python3.10/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/lib/python3.10/json/decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
Regarding Porcupine, I think your issue is that Rhasspy3 currently uses Porcupine1, which I don't think supports custom wake words generated by the current Picovoice Console, since those now require an access key to use. That functionality is in Porcupine2. I know some users here (myself included) managed to patch Porcupine2 into Rhasspy2, but it doesn't appear to be in Rhasspy3 yet. Maybe a good enhancement request?
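If I remember right, the difference shows up directly in pvporcupine's create() call. Roughly (the access key string and the custom keyword filename below are placeholders, and you'd only ever have one pvporcupine version installed at a time):

import pvporcupine

# pvporcupine 1.x (what the porcupine1 program uses): no account needed,
# works with the older/bundled keyword files.
porcupine_v1 = pvporcupine.create(keyword_paths=["grasshopper_linux.ppn"])

# pvporcupine 2.x and later: keywords exported from the current Picovoice
# Console only load together with the AccessKey from your Picovoice account.
porcupine_v2 = pvporcupine.create(
    access_key="YOUR_PICOVOICE_ACCESS_KEY",       # placeholder
    keyword_paths=["your_custom_wake_word.ppn"],  # placeholder
)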
Regarding your first point, I haven't played with the wake word yet; I was just testing the satellite in the browser, but my overrides did appear to work (I replaced the larynx2 TTS with Mimic3, as there's a voice on there I like lol).
I think the docs mention other wake word engines besides Picovoice that support this.
I believe Precise allows you to train your own wake model. I haven't played around with it personally, but I believe it was the engine used by Mycroft. You could also try installing Porcupine2 and integrating it with Rhasspy3 yourself, since the dev stack and modular nature should allow for that.
This is an OVOS plugin for openWakeWord, an open-source wakeword or phrase detection system. It has competitive performance compared to Mycroft Precise or Picovoice Porcupine, can be trained on 100% synthetic data, and can run on a single Raspberry Pi 3 core.
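For what it's worth, the openWakeWord Python API itself is pretty small. This is my understanding of the basic loop, going by its README; the frame size, default models, and threshold are things to double-check:

# Rough sketch of using openWakeWord directly (outside the OVOS plugin).
# Assumes a 16 kHz, 16-bit mono mic stream; 1280-sample (80 ms) frames are
# what the openWakeWord examples use, as far as I know. Newer releases may
# also need openwakeword.utils.download_models() run once beforehand.
import numpy as np
from openwakeword.model import Model

oww = Model()  # loads the pre-trained wake word models

def on_audio_frame(frame_bytes: bytes) -> None:
    frame = np.frombuffer(frame_bytes, dtype=np.int16)
    scores = oww.predict(frame)       # dict: model name -> score in 0..1
    for name, score in scores.items():
        if score > 0.5:               # threshold is just an illustrative value
            print(f"detected: {name} ({score:.2f})")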
Well, the TensorFlow build for Mycroft Precise using Python 3.10 went mostly OK until near the end, but it never completed. I'll have to roll my Python back to 3.7 and try again.
This is on my TODO list to fix, even though I don’t work for Mycroft anymore. Precise is such a simple model that it would be a tiny amount of PyTorch code these days.
Also, I have snowboy-seasalt as a Docker image if you want to train your own snowboy wake word.
Lastly, I plan to add snowman to the Rhasspy 3 wake word engine list.
After a couple of years using my custom-made voice assistant, I've come to the conclusion that there is a missing component in the system: a DSP stage that removes unwanted noise from the user input (be it kitchen noises or other people's voices, a.k.a. the cocktail party effect).
Without this component, any voice assistant pipeline will be erratic at best.
I firmly believe it also has to be on the to-do list for any open source voice assistant to work well enough for wide user adoption.
There have been some impressive advancements in this area in the last two years, with Google's VoiceFilter-Lite and more recently:
Using a generic KWS that feeds a speaker recognition step based on the captured keyword audio (or a personalized KWS tailored to specific voices), which then feeds a centralized voice separation model before the ASR component, will improve ASR and NLU confidence far more effectively than any other noise reduction, beamforming, or AEC system.
The best solution is not only to remove or attenuate noise but to keep only the target speaker's voice. Recent ASR systems are pretty tolerant of some amount of noise, but they often fall short with overlapping voices.
It also removes the need for multiple mics, which is a huge plus for hardware complexity (as demonstrated by recent Google Nest hardware changes).
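To make the idea concrete, the chain I have in mind looks like this in pseudo-Python (every function here is a made-up placeholder; it only shows the shape of the pipeline, not a real API):

# Conceptual sketch: KWS-conditioned voice separation in front of ASR.
# kws_detect(), speaker_embedding(), separate_target_voice() and
# transcribe() are hypothetical placeholders.

def process(audio_stream):
    keyword_audio = kws_detect(audio_stream)               # generic or personalized KWS
    if keyword_audio is None:
        return None
    target = speaker_embedding(keyword_audio)              # who just spoke the keyword
    clean = separate_target_voice(audio_stream, target)    # keep only that voice
    return transcribe(clean)                               # ASR sees a single clean voice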
Snowboy is an iconic piece of KWS history as the first to employ a DNN, but like eSpeak it is now pretty terrible in actual use; the pace of technological change has left both of these early firsts far behind.
I guess because you can?
But for a while I have thought that any BSS with a personalised VAD or KWS on each stream it splits into can detect a target, be it a personalised VAD hit or a keyword. That is the drawback with BSS algorithms: generally they split into N signals dictated by the N mics, finding distinct sources from the TDOA they detect, with no concept of content.
That seems to be what Espressif are doing: a simple BSS splitting into 2/3 streams, where they simply put a KWS on each stream to select the target.
VoiceFilter-Lite is an ML-based BSS that steers a target into a single channel and can use the target to further filter the required voice, all in a single model.
For humans to interject requires 2 voices (noise sources), and when you get 3 or more it quickly becomes a cacophony; Google's point was that with just 2 mics and a clever lightweight model they can get much better results in all the scenarios that are usable anyway.
Espressif have a 3-mic version, I guess because they have no VoiceFilter-Lite and, as with beamforming, more mics means more resolution and separation, but it is also another channel to scan to see if it is the one you require.
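In pseudo-Python, the stream-selection trick is just the following (bss_separate(), kws_score() and run_asr() are made-up placeholders; only the shape matters):

# Conceptual sketch: let the KWS pick which BSS output stream to keep,
# since BSS itself has no concept of content.

def handle_audio(mic_frames):
    streams = bss_separate(mic_frames)           # N output streams for N mics
    scores = [kws_score(s) for s in streams]     # run the keyword spotter on each stream
    best = max(range(len(streams)), key=lambda i: scores[i])
    if scores[best] > 0.5:                       # arbitrary threshold
        return run_asr(streams[best])            # only the selected stream reaches ASR
    return None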
So my use case would be (in pseudo pipeline YAML): no audio on the server… no heavy STT/TTS processing on the client.
client/satellite:
  mic:
  vad:
  remote:  -> send VAD wav to server
  sound:   <- audio from server

server/base:
  asr-stt:
  handle:
  intent:
  tts:  -> audio to client
It doesn't seem like your current pipeline supports this? I could probably create a dummy wake word that always returns true… but it seems ASR is hard-coded to the mic processing? Am I wrong about this?
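For the dummy wake word, I imagine something as small as this could work; the JSON-line event format ({"type": "detection", ...}) is just my guess from skimming rhasspy3/event.py and wake.py, so the field names need checking:

#!/usr/bin/env python3
# Hypothetical "always wake" program: report a detection immediately and exit.
# The event schema below (type/data/name/timestamp) is guessed, not confirmed.
import json
import sys
import time

event = {
    "type": "detection",
    "data": {"name": "dummy", "timestamp": time.monotonic_ns()},
}
sys.stdout.write(json.dumps(event) + "\n")
sys.stdout.flush()

It ignores the audio arriving on stdin entirely, which should be fine for a one-shot "always detected" stand-in.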
Got the snowboy custom wake word working, thanks!
I trained it with 25 utterances of various TTS voices from different countries/accents.
It seems to work pretty well in my Ubuntu VM. @synesthesiam Will this and Rhasspy3 work on a Pi 4 also?
@synesthesiam have you given any thought to including speaker verification/voice authentication (i.e., biometrics) in Rhasspy3? Does the new modular nature of V3 make it easier to integrate this feature into a custom assistant? What open source projects have some speaker verification/authentication functionality at the moment? DeepSpeech? Coqui?
Your product is incredible. Are there any estimated dates for a finished product via Docker? And if I use Rhasspy 2.4 with remote HTTP, will it be hard to migrate?
Accumulating context within a pipeline is going to be needed for speaker identification/verification. I don’t have a specific program in mind for this yet, though. Some projects I’m looking at are personalVAD and Personalized PercepNet.
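The general shape of speaker verification is simple enough to sketch, though: embed a few enrollment utterances, then compare new audio by cosine similarity. For example, with resemblyzer (just one open source option, not a committed choice; the filenames and threshold below are only illustrative):

# Sketch of embedding-based speaker verification.
from pathlib import Path

import numpy as np
from resemblyzer import VoiceEncoder, preprocess_wav

encoder = VoiceEncoder()

def embed(path: str) -> np.ndarray:
    return encoder.embed_utterance(preprocess_wav(Path(path)))

# Enroll: average a few known utterances from the target speaker.
enrolled = np.mean([embed(p) for p in ["me_1.wav", "me_2.wav", "me_3.wav"]], axis=0)

def is_same_speaker(path: str, threshold: float = 0.75) -> bool:
    candidate = embed(path)
    similarity = float(np.dot(enrolled, candidate) /
                       (np.linalg.norm(enrolled) * np.linalg.norm(candidate)))
    return similarity >= threshold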
I don’t think it will, unfortunately. Seasalt only contained code for x86_64 systems, so I don’t think it will create wake words for arm64 systems. It might be possible to extend Seasalt, but I’m not sure how they created it in the first place.
Snowman, on the other hand, should work just fine on a Pi 4
Thanks! I plan to keep the same sentences format, and I will add some backwards compatible endpoints to the HTTP API. So using 2.4 to start shouldn’t be a big problem for migration. But I don’t have any estimated dates yet, sorry