Have a play with this and give it a go; it's a simple CNN so it's fast to train. Let it early-stop with a patience of 5 or 10.
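For the early stopping with patience mentioned above, a minimal Keras sketch (the `model`, `train_ds` and `val_ds` names are hypothetical stand-ins for your own compiled model and datasets):

```python
import tensorflow as tf

# Stop training once val_loss has not improved for `patience` epochs,
# and roll back to the best weights seen so far.
early_stop = tf.keras.callbacks.EarlyStopping(
    monitor='val_loss',
    patience=10,                 # or 5, as suggested above
    restore_best_weights=True)

# model.fit(train_ds, validation_data=val_ds, epochs=100, callbacks=[early_stop])
```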
Test your hypothesis, but from what I have seen, what you're doing looks flawed. But hey.
Apologies about the code, but be aware the Mycroft Sonopy lib is likely broken, so I've been interested in other libs, and now the frameworks are including MFCC math themselves.
It was just to have a look-see at whether MFCC made much of a difference over a spectrogram, but try out some noise, do a cross-entropy check on the labels, and see how you go.
There is a CPU version of TensorFlow 2.4 here.
I will have a look at it. I will have to find some time to install everything though.
I don't have a Windows or Linux x86 machine here.
I'll have to install Python and TensorFlow on my Mac.
Do you know if it would run on a Pine RockPro64 or a Raspberry Pi 4? Those are the development machines I have set up.
Don't do training on an SBC, full stop, unless you have days to spare.
You may have to compile, as with 2.4 on an i5-3570 the default wheels fail because I don't have cutting-edge AVX-512.
Also the Python and GCC compiler versions can often differ.
The compile is actually really easy, and for me the only confusing part of the TensorFlow docs was `bazel build [--config=option] //tensorflow/tools/pip_package:build_pip_package`, as `bazel build --config=opt //tensorflow/tools/pip_package:build_pip_package` was all that seemed to be needed for a CPU build.
The compile is painful though, several hours; set it up as a job while you sleep, as if it fails at least it hasn't stolen your computer for that time.
GPU is a doddle, just the same: have the CUDA and cuDNN stuff preinstalled, and it's `bazel build --config=cuda --config=opt //tensorflow/tools/pip_package:build_pip_package` instead.
Bazel is relatively easy to install, or you can just download the latest Bazelisk and create a symlink named bazel.
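Putting the steps above together, a rough CPU-build recipe for that era of TensorFlow (paths and prompts are illustrative; follow the official build-from-source docs for your exact version):

```shell
# From a checkout of the tensorflow source tree
./configure                      # answer the prompts; defaults are mostly fine for a CPU build

# CPU-only optimised build -- the command that "was all that seemed to be needed"
bazel build --config=opt //tensorflow/tools/pip_package:build_pip_package

# Package the wheel and install it
./bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg
pip install /tmp/tensorflow_pkg/tensorflow-*.whl
```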
PyTorch has annoyed me though, as the strong coupling in torchaudio enforces either the non-GPU Intel MKL math libs or Nvidia's GPU math libs, and currently on ARM the standard libs, OpenBLAS, FFTW and the like, have been fired off into the ether in favour of hardware-specific libs!
I would really have a good look at the Nvidia NeMo framework, but I should have known better with Nvidia, even if it's supposedly stamped as "open source".
The scripts here are Pythonically atrocious, but it was just me having a look at some of the additions in TensorFlow 2.4.
I noticed that tf.signal now has internal audio math, so either use the Colab or my Python adaptation, which is this.
Also they have added MFCC to that math, so I just hacked that in.
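For reference, a minimal sketch of that tf.signal spectrogram → log-mel → MFCC chain (the 16 kHz rate, frame sizes and mel range are illustrative assumptions, not the exact values from the scripts):

```python
import tensorflow as tf

# Stand-in for a decoded 1-second clip at 16 kHz;
# in practice decode a real wav with tf.audio.decode_wav.
waveform = tf.random.normal([16000])

# STFT -> magnitude spectrogram
stft = tf.signal.stft(waveform, frame_length=255, frame_step=128)
spectrogram = tf.abs(stft)

# Warp the linear frequency bins onto the mel scale, then take the log
mel_matrix = tf.signal.linear_to_mel_weight_matrix(
    num_mel_bins=40,
    num_spectrogram_bins=stft.shape[-1],
    sample_rate=16000,
    lower_edge_hertz=80.0,
    upper_edge_hertz=7600.0)
mel = tf.tensordot(spectrogram, mel_matrix, 1)
log_mel = tf.math.log(mel + 1e-6)

# Keep the first 13 MFCCs, as discussed below
mfccs = tf.signal.mfccs_from_log_mel_spectrograms(log_mel)[..., :13]
```

The result is a timeline × 13 "image", which is what shrinks the model compared with feeding the full spectrogram.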
Really fast, simple and ultimately useless models, but extremely good for assessing things due to build speed; and if you can place all dataset samples into labels with their own IDs, with sufficient samples, you can then check how much cross entropy they have.
That is, run your dataset samples through the model you trained and delete the low dross: start with < 0.1, delete, retrain; then delete < 0.3 or 0.4 and retrain, and you will find an accuracy increase of 3-4%.
```python
sample_file = data_dir/'go/3d53244b_nohash_1.wav'
sample_ds = preprocess_dataset([str(sample_file)])

for spectrogram, label in sample_ds.batch(1):
    prediction = model(spectrogram)
    print(f'Predictions for "{commands[label[0]]}"')
    print(commands, tf.nn.softmax(prediction[0]))
```
This loads up a single training wav and runs inference on the model; the softmax for all labels is shown, and hence the cross entropy can be checked.
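The delete-and-retrain pruning described above can be sketched framework-free; here the scores stand in for the model's softmax confidence on each sample's own label (filenames and values are made up for illustration):

```python
def prune(samples, scores, threshold):
    """Keep only samples whose model confidence on their own label meets the threshold."""
    return [s for s, c in zip(samples, scores) if c >= threshold]

# Hypothetical confidences from a trained canary model
samples = ['go_001.wav', 'go_002.wav', 'go_003.wav', 'go_004.wav']
scores = [0.95, 0.05, 0.42, 0.28]

# Pass 1: drop the worst dross (< 0.1), then retrain and re-score
kept = prune(samples, scores, 0.1)

# Pass 2, after retraining: raise the bar to 0.3 or 0.4 with the new scores
# kept = prune(kept, new_scores, 0.3)
```

Each pass removes the samples the model itself finds ambiguous, which is where the 3-4% accuracy gain comes from.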
You're just using a simple, quick-to-train model to test a dataset before submitting it to the format and training of the model you actually want.
I often use "go", as if things are going wrong it's often a quick canary due to its similarity to "no", and it shows.
MFCC just compresses the non-timeline axis of the model to 13 coefficients and greatly reduces the resultant model parameters.
The image resolution is timeline × 13, so the remainder is focused on; it gives an accuracy boost over a spectrogram of a couple of percent, but its biggest advantage is that it does this while compressing the model.
The high frequencies are just thrown away, as to the ear they are extremely low energy and of no use in general for recognition.
The reality is, though, that we shouldn't be expecting anyone to do any of this S*** or even have a care in the world, as from dataset checks to noise addition this should all be part of automated tools, of which we have a complete lack.
Once more, the Linto HMG is the best click-and-view model generator I know, but from datasets to tools to KWS, Rhasspy either omits them or what is available is lacklustre, and it's not for lack of mention.
@JGKK PS, out of interest I thought I would try the full TensorFlow on the Pi4 2GB I have, and yeah, it's 300% slower than my i5-3570, but actually it's a lot faster than I thought.