The model is a bit more problematic, in that I think a GRU like Precise uses doesn't run on TensorFlow Lite.
But basically, to run it on TensorFlow you can use my example tfl-stream, though you will have to create an MFCC frontend to feed it, as g-kws embeds the MFCC in the model so there you can just forward chunked audio.
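As a rough illustration of what such an MFCC frontend involves, here is a NumPy-only sketch. This is not the linto or g-kws code; the sample rate, window, stride, mel-bin, and DCT parameters are assumptions borrowed from the training flags further down, and real TensorFlow MFCCs will differ numerically.

```python
# Minimal sketch of an MFCC frontend for chunked audio, NumPy only.
# Assumed parameters (not from any repo): 16 kHz audio, 40 ms window,
# 20 ms stride, 40 mel bins, 20 DCT features.
import numpy as np

SAMPLE_RATE = 16000
WIN = int(0.040 * SAMPLE_RATE)   # 640 samples (window_size_ms 40.0)
HOP = int(0.020 * SAMPLE_RATE)   # 320 samples (window_stride_ms 20.0)
N_MELS = 40                      # mel_num_bins
N_DCT = 20                       # dct_num_features
NFFT = 1024

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_mels, nfft, sr, fmax=7600.0):
    # Triangular filters up to mel_upper_edge_hertz (7600 in the flags)
    mels = np.linspace(hz_to_mel(0.0), hz_to_mel(fmax), n_mels + 2)
    bins = np.floor((nfft + 1) * mel_to_hz(mels) / sr).astype(int)
    fb = np.zeros((n_mels, nfft // 2 + 1))
    for i in range(n_mels):
        l, c, r = bins[i], bins[i + 1], bins[i + 2]
        fb[i, l:c] = (np.arange(l, c) - l) / max(c - l, 1)
        fb[i, c:r] = (r - np.arange(c, r)) / max(r - c, 1)
    return fb

def dct2(x, n_out):
    # Type-II DCT of each row, keeping the first n_out coefficients
    n = x.shape[-1]
    k = np.arange(n)
    basis = np.cos(np.pi / n * (k[:, None] + 0.5) * np.arange(n_out)[None, :])
    return x @ basis

def mfcc_chunk(chunk):
    """chunk: 1-D float array; returns an (n_frames, N_DCT) feature matrix."""
    n_frames = 1 + (len(chunk) - WIN) // HOP
    window = np.hanning(WIN)
    fb = mel_filterbank(N_MELS, NFFT, SAMPLE_RATE)
    feats = []
    for i in range(n_frames):
        frame = chunk[i * HOP:i * HOP + WIN] * window
        power = np.abs(np.fft.rfft(frame, NFFT)) ** 2
        logmel = np.log(fb @ power + 1e-6)
        feats.append(dct2(logmel[None, :], N_DCT)[0])
    return np.stack(feats)
```

One second of audio gives 49 frames of 20 coefficients with these settings; you would feed those frames to the GRU instead of raw samples.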
That linto HMG is great for getting a feel of how samples can affect the model, as it's extremely interesting and enlightening to see which ones fail.
It's through linto HMG that I discovered how many bad samples are in the Google command set, and then it clicked that the Google command set is a benchmark dataset and not necessarily a good dataset to create a model on, unless you trim out the bad ones.
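One cheap check when weeding a dataset is flagging clips that are much shorter than the 1 s the models expect. This is a hypothetical helper, not from any of the repos mentioned here, and short or truncated files are only one kind of bad sample:

```python
# Flag wav files shorter than min_seconds as trimming candidates.
import os
import wave
import tempfile

def short_clips(data_dir, min_seconds=0.9):
    """Return paths of wav files shorter than min_seconds."""
    bad = []
    for root, _, files in os.walk(data_dir):
        for name in files:
            if not name.endswith(".wav"):
                continue
            path = os.path.join(root, name)
            with wave.open(path, "rb") as w:
                if w.getnframes() / w.getframerate() < min_seconds:
                    bad.append(path)
    return bad

# Demo on two synthetic files: a half-second clip and a full-second clip
tmp = tempfile.mkdtemp()
for name, seconds in (("short.wav", 0.5), ("full.wav", 1.0)):
    with wave.open(os.path.join(tmp, name), "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(16000)
        w.writeframes(b"\x00\x00" * int(16000 * seconds))
print(short_clips(tmp))  # only short.wav is flagged
```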
The linto project has an MFCC routine, and there is info in their repos; here is mine with just a chunked audio stream.
I could probably knock you up an MFCC version to test that GRU, but after playing with an easy GUI I would suggest going a bit more hardcore with the g-kws models, as the TensorFlow Lite models run with much less load; the difference is pretty huge.
```
mkdir g-kws
cd g-kws
git clone https://github.com/google-research/google-research.git
mv google-research/kws_streaming .
```
You can delete the rest of the google-research dir if you wish, as I don't know why they put everything in one repo.
```
#!/bin/bash
# Train CRNN on Speech Commands v2 with 12 labels
KWS_PATH=$PWD
DATA_PATH=$KWS_PATH/data2
MODELS_PATH=$KWS_PATH/models_data_v2_12_labels
CMD_TRAIN="python -m kws_streaming.train.model_train_eval"

$CMD_TRAIN \
--data_url '' \
--data_dir $DATA_PATH/ \
--train_dir $MODELS_PATH/crnn_state/ \
--mel_upper_edge_hertz 7600 \
--how_many_training_steps 2000,2000,2000,2000 \
--learning_rate 0.001,0.0005,0.0001,0.00002 \
--window_size_ms 40.0 \
--window_stride_ms 20.0 \
--mel_num_bins 40 \
--dct_num_features 20 \
--resample 0.15 \
--alsologtostderr \
--train 1 \
--lr_schedule 'exp' \
--use_spec_augment 1 \
--time_masks_number 2 \
--time_mask_max_size 10 \
--frequency_masks_number 2 \
--frequency_mask_max_size 5 \
crnn \
--cnn_filters '16,16' \
--cnn_kernel_size '(3,3),(5,3)' \
--cnn_act "'relu','relu'" \
--cnn_dilation_rate '(1,1),(1,1)' \
--cnn_strides '(1,1),(1,1)' \
--gru_units 256 \
--return_sequences 0 \
--dropout1 0.5 \
--units1 '128,256' \
--act1 "'linear','relu'" \
--stateful 1
```
You just need to create a dataset and do something like I did in tfl-stream.py: g-kws/tfl-stream.py at main · StuartIanNaylor/g-kws · GitHub
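The loop in tfl-stream.py boils down to something like the following skeleton. Here a dummy classifier stands in for the TFLite interpreter so the sketch is self-contained; the labels, threshold, and smoothing window are made-up values, not what the script actually uses, and for real use you would swap in `tflite_runtime.interpreter.Interpreter` and carry the model's streaming state between invocations.

```python
# Sketch of a streaming KWS loop: feed 20 ms hops, smooth the output
# probabilities, report detections above a threshold.
import numpy as np
from collections import deque

SAMPLE_RATE = 16000
CHUNK = 320          # 20 ms hop, matching window_stride_ms above
LABELS = ["_silence_", "_unknown_", "keyword"]  # hypothetical label set

def classify(chunk, state):
    """Stand-in for interpreter.invoke(); a real stateful streaming model
    keeps its GRU/CNN state between calls. Here we just return a flat
    softmax plus the unchanged state."""
    probs = np.full(len(LABELS), 1.0 / len(LABELS))
    return probs, state

def stream(audio, threshold=0.9, smooth=5):
    state = None
    recent = deque(maxlen=smooth)   # smooth probabilities over ~100 ms
    hits = []
    for start in range(0, len(audio) - CHUNK + 1, CHUNK):
        probs, state = classify(audio[start:start + CHUNK], state)
        recent.append(probs)
        avg = np.mean(recent, axis=0)
        top = int(np.argmax(avg))
        if avg[top] > threshold and LABELS[top] not in ("_silence_", "_unknown_"):
            hits.append((start / SAMPLE_RATE, LABELS[top]))
    return hits

# With the dummy classifier no probability exceeds the threshold,
# so no hits are reported:
print(stream(np.zeros(SAMPLE_RATE)))
```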
You can use the Google command set or create your own with the dataset-builder: GitHub - StuartIanNaylor/Dataset-builder: KWS dataset builder for Google-streaming-kws or another
Have a read of google-research/kws_streaming at master · google-research/google-research · GitHub, as it's extremely well documented and contains just about every current state-of-the-art model for KWS.
The CRNN-state is probably a good start, and google-research/base_parser.py at master · google-research/google-research · GitHub contains all the parameters, which after a bit of head-scratching should get you going.
PS: if you have 2 mics, I have been playing with the PulseAudio beamformer, which I had ruled out because you could not steer it. I have had a rethink and can now steer it, so as well as a cutting-edge model you can probably also add beamforming if interested.
google-research/kws_experiments_paper_12_labels.md at master · google-research/google-research · GitHub gives a pretty good guide, and the command sets for testing can be downloaded:
```
# download data set V2 and set it up
wget https://storage.googleapis.com/download.tensorflow.org/data/speech_commands_v0.02.tar.gz
mkdir data2
mv ./speech_commands_v0.02.tar.gz ./data2
cd ./data2
tar -xf ./speech_commands_v0.02.tar.gz
cd ../
```
```
# download data set V1 and set it up
wget http://download.tensorflow.org/data/speech_commands_v0.01.tar.gz
mkdir data1
mv ./speech_commands_v0.01.tar.gz ./data1
cd ./data1
tar -xf ./speech_commands_v0.01.tar.gz
cd ../
```
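Whichever set you use, the layout is one directory per label with the wav files inside (plus `_background_noise_`). A quick way to sanity-check sample counts per label, a hypothetical helper rather than anything from kws_streaming:

```python
# Count wav files per label directory in a Speech Commands style dataset.
import os
import tempfile

def summarise(data_dir):
    counts = {}
    for label in sorted(os.listdir(data_dir)):
        path = os.path.join(data_dir, label)
        if os.path.isdir(path):
            counts[label] = sum(1 for f in os.listdir(path) if f.endswith(".wav"))
    return counts

# Demo on a tiny synthetic layout with two labels
tmp = tempfile.mkdtemp()
for label in ("yes", "no"):
    os.makedirs(os.path.join(tmp, label))
    open(os.path.join(tmp, label, "sample0.wav"), "w").close()
print(summarise(tmp))  # → {'no': 1, 'yes': 1}
```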
But if you have a look at my dataset-builder repo, I posted as many datasets & noise files as I could find, so even if the builder itself is of no use it might be a good source of datasets.
sanebow's code is far more polished than anything I have provided.