Still playing with KWS, or rather pulling my hair out over the length of time it takes to create datasets; that is now fixed by the huge, brilliant datasets from MLCommons.
42io has some really interesting KWS work, mainly aimed at microcontrollers, and he has done something really interesting with cascaded models: on a KW hit, the input frame is passed to the next model in the cascade and tested again.
42io is a bit of a genius, as his false positive/negative table shows, and the dataset he used is actually really horrid: it contains 10 KW, doesn't have nearly enough !KW, and is also light on sample quantity per label, so the result table is just a starting point to be fed with a better dataset.
3ECNN13     10 | 11191
2ECNN13     83 | 11191
2ECNN47     48 | 11191
EDCNN47   2042 | 11191
ECNN47    4494 | 11191
DCNN13    4787 | 11191
DCNN47    4517 | 11191
MLP       5091 | 10991
CNN       4958 | 10991
RNN       4527 | 10991
Not sure of the effects on latency, but the false positives/negatives on the 3-tier cascade are really impressive.
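The cascade idea above can be sketched in a few lines of Python. This is only my illustration of the principle, not 42io's actual code: each stage runs only if the previous stage fired, so a cheap first model gates the costlier later ones, and a hit is declared only when every tier agrees.

```python
# Hypothetical sketch of cascaded KWS inference (stage functions are dummies
# standing in for the tflite models; names and threshold are my assumptions).
from typing import Callable, Sequence

Stage = Callable[[Sequence[float]], float]  # returns a KW confidence 0..1

def cascade_hit(frame: Sequence[float], stages: Sequence[Stage],
                threshold: float = 0.5) -> bool:
    """True only if every stage in turn scores the frame above threshold."""
    for stage in stages:
        if stage(frame) < threshold:
            return False  # early exit: later (costlier) stages never run
    return True

# Dummy stages: a trigger-happy first tier and a stricter second tier.
loose = lambda f: 0.9
strict = lambda f: 0.2
print(cascade_hit([0.0], [loose, strict]))  # second tier rejects the hit
```

Because false positives have to fool every tier, each extra tier multiplies down the false-accept rate, which would fit the big drop the table shows between single models and the 2/3-tier cascades.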
Still haven't decided if these really lean, low-parameter cascades are a better option than single-tier, larger-parameter models, and only having modest computer hardware means these things take time.
MLCommons has reinvigorated my KWS interest, as I finally have a dataset for KWS; at least I have an English one that, for the first time, actually has the huge collection of words necessary.
It's now all about how you build your dataset and then picking a model that works well in regard to latency and load for your platform.
Just to show how tiny the models are:
~$ head /dev/zero -c32000 | valgrind bin/fe             # 1,136,764 bytes allocated
~$ seq 637 | valgrind bin/guess models/mlp.tflite       # 158,002 bytes allocated
~$ seq 637 | valgrind bin/guess models/cnn.tflite       # 902,258 bytes allocated
~$ seq 637 | valgrind bin/guess models/rnn.tflite       # 2,414,578 bytes allocated
~$ seq 637 | valgrind bin/guess models/dcnn.tflite      # 465,122 bytes allocated
~$ seq 611 | valgrind bin/guess models/dcnn47.tflite    # 981,583 bytes allocated
~$ seq 13 | valgrind bin/guess models/dcnn13.tflite     # 689,566 bytes allocated
~$ seq 611 | valgrind bin/guess models/edcnn47.tflite   # 1,670,261 bytes allocated
~$ seq 611 | valgrind bin/guess models/ecnn47.tflite    # 8,637,011 bytes allocated
~$ seq 611 | valgrind bin/guess models/2ecnn47.tflite   # 22,956,483 bytes allocated
~$ seq 13 | valgrind bin/guess models/2ecnn13.tflite    # 7,264,955 bytes allocated
~$ seq 13 | valgrind bin/guess models/3ecnn13.tflite    # 10,733,331 bytes allocated
The cascade method is something I never thought about, but it makes sense: a model is just a statistical graph, which is probably why a 3-tier works better than a 2-tier, and probably the accuracy scales with the number of tiers more like a log than linearly.
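A toy calculation shows why the improvement per tier would be multiplicative rather than linear. This is my assumption, not 42io's maths: if each tier independently lets through a fraction p of false positives, n tiers let through p**n, so the error falls exponentially with the number of tiers.

```python
# Toy model (my assumption): independent tiers, each with the same
# per-tier false-accept rate; the cascade rate is the product.
def cascade_false_accept(p_per_tier: float, tiers: int) -> float:
    """False-accept rate of a cascade of independent tiers."""
    return p_per_tier ** tiers

for n in (1, 2, 3):
    print(f"{n} tier(s): {cascade_false_accept(0.05, n):.6f}")
```

With a 5% per-tier rate this gives 0.05, 0.0025, and 0.000125, so each extra tier buys a fixed factor, not a fixed amount, consistent with the big jump in the table between 1, 2, and 3 tiers.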
You don't have to embed this in a single model, as two models can do it: the first is just a normal model and works as normal, while the second does a post-check over 2x runs and cancels a running KW hit operation. That way you have no additional cascade latency over a normal model; you just cancel, with only the very short delay of testing 2x KW frames in a non-streaming model.
I think a 3-model version would be better, as the individual training of each stage adds to the overall accuracy.