I am not at all sure that using TTS is a good way to train a model. Taking the human dataset that created the TTS, turning it into synthetic output, and then training a model on that to pick up human voices: it seems self-evident where that might go wrong.
I don't understand why they do it either, as it's not for a lack of datasets.
The accuracy of a model is all about the accuracy of the dataset: your own voice is the most accurate, then your region and dialect, then gender, then your overall language.
I have been manipulating purely the male northern English dataset.
I used DeepSpeech to get the start and end points of words in sentences, and I am going to use SoX to strip the words out, pad them into 1 sec clips and normalise them.
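As a rough sketch of the strip, pad and normalise step, assuming the word timings are already available as start and end times in seconds: the helper below only builds a SoX command line using the `trim`, `pad` and `norm` effects. The function name, file names, the centred padding and the -1 dB normalisation level are all my own illustrative assumptions, not anything fixed by the tools.

```python
import subprocess


def sox_clip_command(src, dst, start, end, clip_len=1.0, norm_db=-1.0):
    """Build a SoX command that cuts one word out of `src` and pads it to `clip_len` seconds.

    Hypothetical helper: `start` and `end` are the word boundaries in seconds,
    assumed to come from a forced-alignment or speech-to-text pass.
    """
    duration = end - start
    if duration <= 0:
        raise ValueError("end must be after start")
    if duration > clip_len:
        raise ValueError("word is longer than the target clip length")

    # Split the missing time evenly so the word sits in the middle of the clip.
    pad_total = clip_len - duration
    pad_before = pad_total / 2
    pad_after = pad_total - pad_before

    return [
        "sox", src, dst,
        # "=" marks an absolute end position rather than a length.
        "trim", f"{start:.3f}", f"={end:.3f}",
        # Silence added before and after the word (rounded to milliseconds).
        "pad", f"{pad_before:.3f}", f"{pad_after:.3f}",
        # Peak-normalise to the given dB level (assumed value).
        "norm", f"{norm_db:g}",
    ]


if __name__ == "__main__":
    cmd = sox_clip_command("sentence.wav", "word_0001.wav", 1.2, 1.7)
    print(" ".join(cmd))
    # Uncomment to actually run it (needs SoX installed and the input file present):
    # subprocess.run(cmd, check=True)
```

Centring the word is just one choice; padding only at the end, or randomising the offset per clip, would work with the same command shape.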
Why they are going the way they are going is a total mystery, and why they are training with TTS is an even bigger one, given what is needed for accuracy.
But it will be the most accurate TTS KWS available!? The accuracy for TTS hotwords of various pitches will be extremely high.
It is probably good for giving you a starter set for a custom keyword, but that is really all.
Precise is pretty clunky and not all that accurate, as I think it's phonetic based, but I'm interested in the Linto TensorFlow Lite version.
Also, they might have fixed some of what I hate about the seeed-voicecard drivers, as they have an image with the drivers installed.
I really like that card but the drivers kill it for me, so I'm going to check that image, as for KWS it's actually really interesting.