It looks like Kaldi has merged a PR providing online wake word detection using TDNNs (based on the MobVoi and HeySnips datasets):
The false-alarm (FA) rate is pretty good on the Chinese dataset (9 out of 5899 negatives, and with negative words that are very similar to the wake word). I’m wondering if the performance on embedded devices is acceptable for an always-listening process. This is pretty interesting…
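Just to put those numbers in perspective, a quick back-of-the-envelope calculation of the FA rate (using only the 9/5899 figures quoted above):

```python
# False-alarm rate from the figures quoted above:
# 9 false accepts out of 5899 negative utterances.
false_accepts = 9
negatives = 5899
fa_rate = false_accepts / negatives
print(f"FA rate: {fa_rate:.4%}")  # roughly 0.15% per negative utterance
```

So under 2 false accepts per 1000 adversarial negatives, which seems workable for an always-listening process if real-world audio triggers it less often than these deliberately confusable words.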
For a custom model, the training data still needs loads of WAVs though (as usual).
PS: Has anyone successfully compiled Kaldi for the ARM architecture (Raspberry Pi)?