Google and Arm have been playing with state-of-the-art KWS and publishing it as open source, which was kick-started with GitHub - ARM-software/ML-KWS-for-MCU: Keyword spotting on Arm Cortex-M Microcontrollers.
Google did some comparisons of low-latency streaming KWS in google-research/kws_streaming at master · google-research/google-research · GitHub, which Arm now also seem to be using as a base in GitHub - ARM-software/keyword-transformer.
A KWS 20 ms streaming model will run with less than 20% load on a single core with state-of-the-art accuracy. The old adage of needing a dataset for a custom KW still applies, but I have a little utility to quickly create an augmented dataset, so you can create any custom KW for your voice, or anyone else's you wish to add to the mix.
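For illustration, augmenting a small KW set mostly comes down to random time-shifting and mixing in background noise at a range of SNRs. A minimal numpy sketch of the idea (the names here are my own, not the utility's actual code):

```python
import numpy as np

def augment(kw: np.ndarray, noise: np.ndarray, snr_db: float,
            shift_max: int = 1600) -> np.ndarray:
    """Time-shift a keyword clip and mix in background noise at a given SNR."""
    # Random circular shift of up to shift_max samples (0.1 s at 16 kHz)
    shifted = np.roll(kw, np.random.randint(-shift_max, shift_max + 1))
    # Trim or tile the noise to the clip length
    noise = np.resize(noise, shifted.shape)
    # Scale the noise so the mix hits the requested signal-to-noise ratio
    sig_pow = np.mean(shifted ** 2)
    noise_pow = np.mean(noise ** 2) + 1e-12
    gain = np.sqrt(sig_pow / (noise_pow * 10 ** (snr_db / 10)))
    return shifted + gain * noise
```

Running each recorded clip through this a few times at different SNRs multiplies a handful of recordings into a usable training set.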
Reader.py just shows a series of words on screen based on your KW, plus a few sentences containing as many phones & allophones as possible, and records each as an individual 2.5 sec wav so you have a bit of reading time to err.
You then just run split0.py or split1.py, which refer to the split_data methods of the above Google-research training routine.
split0.py uses my method for mixing and matching background noise and creates fixed training, testing & validation folders for dataset split type 0 of G-kws; split1.py doesn't mix in any noise, as that is handled by G-kws, and produces KW & !KW folders, again for training by G-kws.
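For reference, the Google-research split_data logic assigns each wav to train/validation/test by hashing its filename, so a file never migrates between sets as you add recordings. Roughly, it is the well-known Speech Commands which_set scheme (simplified here):

```python
import hashlib
import re

MAX_WAVS = 2 ** 27 - 1  # ~134M, as in the Speech Commands split code

def which_set(filename: str, validation_pct: float = 10.0,
              testing_pct: float = 10.0) -> str:
    """Deterministically bucket a wav into training/validation/testing."""
    # Strip the _nohash_ suffix so all takes of one utterance land together
    base = re.sub(r'_nohash_.*$', '', filename)
    h = int(hashlib.sha1(base.encode('utf-8')).hexdigest(), 16) % (MAX_WAVS + 1)
    pct = h * 100.0 / MAX_WAVS
    if pct < validation_pct:
        return 'validation'
    if pct < validation_pct + testing_pct:
        return 'testing'
    return 'training'
```

split0.py then just copies each file into the matching folder for split type 0.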
G-kws is kicked off with a script of settings, and then it's just a wait until it completes:
# Train a CRNN streaming model on the custom KW dataset (silence, notkw, kw0)
CMD_TRAIN="python -m kws_streaming.train.model_train_eval"
$CMD_TRAIN \
--data_url '' \
--data_dir $DATA_PATH/ \
--train_dir $MODELS_PATH/crnn_state/ \
--mel_upper_edge_hertz 7600 \
--how_many_training_steps 2000,2000,2000,2000 \
--learning_rate 0.001,0.0005,0.0001,0.00002 \
--window_size_ms 40.0 \
--window_stride_ms 20.0 \
--mel_num_bins 40 \
--dct_num_features 20 \
--resample 0.0 \
--wanted_words silence,notkw,kw0 \
--split_data 0 \
--train 1 \
--lr_schedule 'exp' \
--use_spec_augment 1 \
--time_masks_number 2 \
--time_mask_max_size 10 \
--frequency_masks_number 2 \
--frequency_mask_max_size 5 \
--feature_type 'mfcc_op' \
--fft_magnitude_squared 1 \
crnn \
--cnn_filters '16,16' \
--cnn_kernel_size '(3,3),(5,3)' \
--cnn_act "'relu','relu'" \
--cnn_dilation_rate '(1,1),(1,1)' \
--cnn_strides '(1,1),(1,1)' \
--gru_units 256 \
--return_sequences 0 \
--dropout1 0.5 \
--units1 '128,256' \
--act1 "'linear','relu'"
It took me a while to work out that upping the dropout helps cope with overfitting and false positives; the --dropout1 0.5 I use above seems to work well.
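The SpecAugment flags above (--time_masks_number etc.) are the other cheap defence against overfitting: they blank out random time and frequency stripes of the spectrogram during training. A hedged numpy sketch of the masking idea, not the framework's actual implementation:

```python
import numpy as np

def spec_augment(spec: np.ndarray, time_masks: int = 2, time_max: int = 10,
                 freq_masks: int = 2, freq_max: int = 5) -> np.ndarray:
    """Zero out random time and frequency stripes of a (frames, bins) spectrogram."""
    out = spec.copy()
    frames, bins = out.shape
    for _ in range(time_masks):
        w = np.random.randint(0, time_max + 1)
        t0 = np.random.randint(0, max(1, frames - w))
        out[t0:t0 + w, :] = 0.0   # blank a stripe of time frames
    for _ in range(freq_masks):
        w = np.random.randint(0, freq_max + 1)
        f0 = np.random.randint(0, max(1, bins - w))
        out[:, f0:f0 + w] = 0.0   # blank a stripe of mel bins
    return out
```

The mask counts and sizes match the flags in the script above, so the model never gets to lean on any single frame or frequency band.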
SaneBow has written a great wrapper: GitHub - SaneBow/tflite-kws: Keyword Spotting (KWS) API wrapper for TFLite streaming models.
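Under the hood, a wrapper like that feeds 20 ms frames to the streaming model and smooths the per-frame posteriors before triggering, so a single noisy frame can't fire the KW. A minimal sketch of that trigger logic (my own illustration, not SaneBow's actual API):

```python
from collections import deque

class KWTrigger:
    """Moving-average smoothing over per-frame KW posteriors with a threshold."""

    def __init__(self, window: int = 10, threshold: float = 0.8):
        self.scores = deque(maxlen=window)
        self.threshold = threshold

    def push(self, kw_prob: float) -> bool:
        """Feed one frame's KW probability; return True when the KW fires."""
        self.scores.append(kw_prob)
        avg = sum(self.scores) / len(self.scores)
        if len(self.scores) == self.scores.maxlen and avg >= self.threshold:
            self.scores.clear()  # simple refractory reset after a detection
            return True
        return False
```

With a 20 ms stride, a window of 10 means roughly 200 ms of consistently high posteriors before a detection, which is what keeps false positives down in practice.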
The Dataset-builder has quite an exhaustive list of datasets, from noise to keywords; as opposed to ASR sentence corpora, single-word datasets can be quite hard to find.
I should update g-kws, but it's really only an install guide for Arm, as it's the Google-research framework, which for some reason keeps everything in one repo, so:
git clone https://github.com/google-research/google-research.git
mv google-research/kws_streaming .
You can delete the rest of the google-research checkout after that and just keep the important kws_streaming dir.