Have a play with this and give it a go; it's a simple CNN so it's fast to train. Let it early-stop with a patience of 5 or 10.
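For the early stopping with patience mentioned above, a minimal Keras sketch (the `model`, `train_ds` and `val_ds` names are hypothetical stand-ins for your own compiled model and datasets):

```python
import tensorflow as tf

# Stop training once val_loss has not improved for `patience` epochs,
# and roll back to the best weights seen so far.
early_stop = tf.keras.callbacks.EarlyStopping(
    monitor='val_loss',
    patience=10,                 # or 5, as suggested above
    restore_best_weights=True)

# model.fit(train_ds, validation_data=val_ds, epochs=100, callbacks=[early_stop])
```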
Test your hypothesis, but from what I have seen, what you're doing looks flawed. But hey.
Apologies about the code, but be aware the Mycroft Sonopy lib is likely broken, so I've been interested in other libs, and now the frameworks are including MFCC math themselves.
It was just to have a look-see at whether MFCC made much of a difference over a spectrogram, but try out some noise, do a cross-entropy check on the labels, and see how you go.
There is a CPU version of TensorFlow 2.4 here.
I will have a look at it. I will have to find some time to install everything though.
I don't have a Windows or Linux x86 machine here.
I'll have to install Python and TensorFlow on my Mac.
Do you know if it would run on a Pine RockPro64 or a Raspberry Pi 4? Those are the development machines I have set up.
Don't do training on an SBC, full stop, unless you have days to spare.
You may have to compile, as with 2.4 on an i5-3570 the default wheels fail because I don't have cutting-edge AVX-512.
Also the Python and GCC compiler versions can often differ.
The compile is actually really easy, and for me the only confusing part of the TensorFlow docs was `bazel build [--config=option] //tensorflow/tools/pip_package:build_pip_package`, as `bazel build --config=opt //tensorflow/tools/pip_package:build_pip_package` was all that seemed to be needed for a CPU build.
The compile is painful though, several hours; set it up as a job while you sleep, as if it fails at least it hasn't stolen your computer for that time.
GPU is a doddle, just the same: have the CUDA and cuDNN stuff preinstalled, and it's `bazel build --config=cuda --config=opt //tensorflow/tools/pip_package:build_pip_package` instead.
Bazel is relatively easy to install, or you can just download the latest Bazelisk and create a symlink named bazel.
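Putting the steps above together, a rough CPU-build recipe for that era of TensorFlow (paths and prompts are illustrative; follow the official build-from-source docs for your exact version):

```shell
# From a checkout of the tensorflow source tree
./configure                      # answer the prompts; defaults are mostly fine for a CPU build

# CPU-only optimised build -- the command that "was all that seemed to be needed"
bazel build --config=opt //tensorflow/tools/pip_package:build_pip_package

# Package the wheel and install it
./bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg
pip install /tmp/tensorflow_pkg/tensorflow-*.whl
```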
PyTorch has annoyed me though, as the strong coupling in torchaudio enforces either the non-GPU Intel MKL math libs or Nvidia's GPU math libs, and currently on ARM the standard libs, OpenBLAS, FFTW and the like, have been fired off into the ether in favour of hardware-specific libs!
I would really have a good look at the Nvidia NeMo framework, but I should have known better with Nvidia, even if it's supposedly stamped as "open source".
The scripts here are Pythonically atrocious, but it was just me having a look at some of the additions in TensorFlow 2.4.
I noticed that tf.signal now has internal audio math, so either use the Colab or my Python adaptation, which is this.
Also they have added MFCC to that math, so I just hacked that in.
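For reference, a minimal sketch of that tf.signal spectrogram → log-mel → MFCC chain (the 16 kHz rate, frame sizes and mel range are illustrative assumptions, not the exact values from the scripts):

```python
import tensorflow as tf

# Stand-in for a decoded 1-second clip at 16 kHz;
# in practice decode a real wav with tf.audio.decode_wav.
waveform = tf.random.normal([16000])

# STFT -> magnitude spectrogram
stft = tf.signal.stft(waveform, frame_length=255, frame_step=128)
spectrogram = tf.abs(stft)

# Warp the linear frequency bins onto the mel scale, then take the log
mel_matrix = tf.signal.linear_to_mel_weight_matrix(
    num_mel_bins=40,
    num_spectrogram_bins=stft.shape[-1],
    sample_rate=16000,
    lower_edge_hertz=80.0,
    upper_edge_hertz=7600.0)
mel = tf.tensordot(spectrogram, mel_matrix, 1)
log_mel = tf.math.log(mel + 1e-6)

# Keep the first 13 MFCCs, as discussed below
mfccs = tf.signal.mfccs_from_log_mel_spectrograms(log_mel)[..., :13]
```

The result is a timeline × 13 "image", which is what shrinks the model compared with feeding the full spectrogram.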
Really fast, simple and ultimately useless models, but extremely good for assessing things due to build speed; and if you can place all dataset samples into labels with their own IDs, with sufficient samples, you can then check how much cross entropy they have.
That is, run your dataset samples through the model you trained and delete the low dross: start with < 0.1, delete, retrain; then delete < 0.3 or 0.4 and retrain, and you will find an accuracy increase of 3-4%.
```python
sample_file = data_dir/'go/3d53244b_nohash_1.wav'
sample_ds = preprocess_dataset([str(sample_file)])

for spectrogram, label in sample_ds.batch(1):
    prediction = model(spectrogram)
    print(f'Predictions for "{commands[label[0]]}"')
    print(commands, tf.nn.softmax(prediction[0]))
```
This loads up a single training wav and runs inference on the model; the softmax for all labels is shown, and hence the cross entropy can be checked.
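The delete-and-retrain pruning described above can be sketched framework-free; here the scores stand in for the model's softmax confidence on each sample's own label (filenames and values are made up for illustration):

```python
def prune(samples, scores, threshold):
    """Keep only samples whose model confidence on their own label meets the threshold."""
    return [s for s, c in zip(samples, scores) if c >= threshold]

# Hypothetical confidences from a trained canary model
samples = ['go_001.wav', 'go_002.wav', 'go_003.wav', 'go_004.wav']
scores = [0.95, 0.05, 0.42, 0.28]

# Pass 1: drop the worst dross (< 0.1), then retrain and re-score
kept = prune(samples, scores, 0.1)

# Pass 2, after retraining: raise the bar to 0.3 or 0.4 with the new scores
# kept = prune(kept, new_scores, 0.3)
```

Each pass removes the samples the model itself finds ambiguous, which is where the 3-4% accuracy gain comes from.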
You're just using a simple, quick-to-train model to test a dataset before submitting it to the format and training of the model you actually want.
I often use "go", as if things are going wrong it's often a quick canary due to its similarity to "no", and it shows.
MFCC just compresses the non-timeline axis of the model to 13 coefficients and greatly reduces the resultant model parameters.
The image resolution is timeline × 13, so the remainder is focused on; it gives an accuracy boost over a spectrogram of a couple of percent, but its biggest advantage is that it does this while compressing the model.
The high frequencies are just thrown away, as to the ear they are extremely low energy and of no use in general for recognition.
The reality is, though, that we shouldn't be expecting anyone to do any of this S*** or even have a care in the world, as from dataset checks to noise addition this should all be part of automated tools, of which we have a complete lack.
Once more, the Linto HMG is the best click-and-view model generator I know, but from datasets to tools to KWS, Rhasspy either omits them or what is available is lacklustre, and it's not for lack of mention.
@JGKK PS, out of interest I thought I would try the full TensorFlow on the Pi4 2GB I have, and yeah, it's 300% slower than my i5-3570, but actually it's a lot faster than I thought.