OH wow man!! I can’t even explain how awesome that is!! So great!! i will be anxiously waiting to see things to come!! so COOL!!!
How about splitting this into two: one tool to record data on a satellite device, and a standalone program that can take that data and train the model on any PC? I run a standalone on a Pi, and I don’t see myself exchanging it for a dedicated PC with a GPU just so I can train a model there, when I have a perfectly good desktop PC to train on, and I am fine with the performance of my Pi so far. I’m pretty sure I am not the only one with a use case like that, so having a recording tool, a training tool, and the option to train off-device would be a great thing.
I was initially talking only about KWS, and if all-in-one then on-device. But say you have multiple ESP32-S3 or Pi Zero 2 W KWS satellites: you would do the training off-device on a Pi 4 that might be connected to several of them in several rooms.
If you think about how much of the time a voice system is idle, and keep to a secondary small model, then a Pi 4 is very capable of on-device training without needing a GPU.
KWS is a relatively small model compared to the others, and the secondary model is a small subset of that.
I feel the same about GPUs and large models such as ASR: yeah, you could, but the important one is KWS, as you don’t need a GPU for it and you gain a lot of accuracy from custom training.
It also matters less for ASR, as speech is generally broken up into phones and reconnected by CTC or some other mechanism.
With KWS it literally is a snapshot image of you saying the keyword that the model is trying to match, which is why custom on-device training can add so much accuracy.
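To make the "snapshot image" idea concrete: a keyword spotter's input is usually a 2-D spectrogram, where the waveform is sliced into overlapping frames and each frame's spectrum becomes one row. A minimal numpy sketch (the frame and hop sizes are typical 16 kHz values, not anything Rhasspy-specific):

```python
import numpy as np

def spectrogram(wave, frame_len=400, hop=160):
    """Slice a waveform into overlapping windowed frames and take the
    magnitude FFT of each, giving the 2-D 'image' a KWS model matches."""
    frames = []
    for start in range(0, len(wave) - frame_len + 1, hop):
        frame = wave[start:start + frame_len] * np.hanning(frame_len)
        frames.append(np.abs(np.fft.rfft(frame)))
    return np.log(np.stack(frames) + 1e-6)  # shape: (time, frequency)

# one second of fake 16 kHz audio
wave = np.random.randn(16000)
spec = spectrogram(wave)
print(spec.shape)  # (98, 201)
```

The resulting time-by-frequency array is what gets fed to a small convolutional classifier, just like an image.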
I know how long it takes my PC to train my KW model, and I run more than Rhasspy on that Pi, since it is idle so often. If the ReSpeaker 4 that I use as a mic would run on a current 64-bit Pi OS, I would even just run Rhasspy on my media Pi so I could remove one of the four running Pis.
You are not training a full KW model; you are training a small subset model that adds weight to the fully trained model.
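One way to read "add weight to the full trained model" is score fusion: the frozen base model's output is blended with the small custom-trained model's output. A toy numpy sketch; the blend factor `alpha` and the function names are invented for illustration, not anything from Rhasspy:

```python
import numpy as np

def fused_score(base_logits, adapter_logits, alpha=0.3):
    """Blend the frozen base KWS model's scores with a small
    custom-trained adapter's scores. alpha is an invented knob
    controlling how much the on-device model nudges the base."""
    base = np.asarray(base_logits, dtype=float)
    adapter = np.asarray(adapter_logits, dtype=float)
    return (1 - alpha) * base + alpha * adapter

# base model is unsure; the custom model trained on your voice is confident
print(fused_score([0.2, 0.9], [0.1, 0.99]))
```

Only the small adapter ever needs retraining, which is what keeps the compute budget within reach of a Pi.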
It can run for days or even weeks before it has to complete, and that is no problem at all. You have a simple routine that stops training under load and waits for idle to start again. You can also schedule ‘out of hours’ training if you want, along with how often it updates, how many times, and when to reset to the original.
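The "stop on load, resume when idle" routine can be sketched in a few lines. This is a hypothetical illustration, not Rhasspy code: `train_one_step` stands in for whatever the KWS trainer does, and the load threshold is an invented knob.

```python
import os
import time

LOAD_THRESHOLD = 0.5   # 1-minute load average above this means "busy"
CHECK_INTERVAL = 60    # seconds to wait before re-checking the load

def system_is_idle():
    """True when the 1-minute load average is below the threshold."""
    return os.getloadavg()[0] < LOAD_THRESHOLD

def train_when_idle(steps, train_one_step):
    """Run training steps only while the machine is idle, backing off
    whenever something else (e.g. the voice pipeline) needs the CPU."""
    done = 0
    while done < steps:
        if system_is_idle():
            train_one_step()
            done += 1
        else:
            time.sleep(CHECK_INTERVAL)
    return done
```

A cron entry restricted to night hours would give you the ‘out of hours’ scheduling on top of this.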
PS: the ReSpeaker 4-mic does run 64-bit, I presume, as I have one myself. It’s just a bit pointless that it has 4 mics; it’s part of an urban myth that multiple mics alone are somehow better. It also doesn’t really matter in the manner you are running it, but as with kernel updates, the channels start in a random order on each recording, depending on whatever data word was available when the driver started, and that is something ReSpeaker have never seemed to fix. Some people say they have a fork that fixes the problem with TDM sync, but I’ve never tried it.
Congrats! This is most excellent news!
I’m glad offline voice will get a first class place in Home Assistant with you at the helm! Ease of setup for users is definitely key. All the software additions to Rhasspy are entirely welcome, Mimic 3 and Coqui work wonders!
It even sounds like some of the ideas I was playing around with satellites in Home Intent will continue on and get implemented!
That’s right, though I hope to still collaborate with Mycroft. The open source voice community is so small that I think we need to work together.
I think this is a good approach, especially if a GPU is involved. As long as the training tooling works with CPU only on ARM64, it should be usable on a Pi too.
Random question: if you were starting from scratch with Home Intent, would you use YAML again or TOML?
Absolutely, I love the architecture of Rhasspy with all the services meeting at the MQTT broker. My only wish would be to be able to set up a prefix for each service. That would make it easier to intercept and change messages and also for a bridge setup.
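The per-service prefix idea amounts to topic rewriting: a bridge adds a prefix on one side and strips it on the other, so messages can be intercepted or routed between brokers. A minimal sketch of the string handling involved; the `upstairs` prefix is invented, though `hermes/asr/textCaptured` is a real Rhasspy topic:

```python
def add_prefix(topic, prefix):
    """Prepend a per-service prefix to an MQTT topic."""
    return f"{prefix}/{topic}"

def strip_prefix(topic, prefix):
    """Remove a known prefix, returning None if the topic lacks it."""
    head = prefix + "/"
    return topic[len(head):] if topic.startswith(head) else None

print(add_prefix("hermes/asr/textCaptured", "upstairs"))
# upstairs/hermes/asr/textCaptured
print(strip_prefix("upstairs/hermes/asr/textCaptured", "upstairs"))
# hermes/asr/textCaptured
```

A real bridge would subscribe to `prefix/#`, strip the prefix, optionally modify the payload, and republish, but the topic mapping is the whole trick.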
I went with YAML since that’s what most folks in HA are familiar with. I’m still not entirely sure how many people configure via YAML, as a lot more folks started using it after I developed a frontend. (The frontend just updates the YAML in the backend.)
If I were starting again, I would see whether user-centric config was heading towards TOML and consider it, seeing as the Python world really seems to be pushing for it.