Just saw this:
Not much time to read it all, but I guess it will interest a lot of people here.
One more wake word engine to forget:
All models generated with Picovoice Console expire after 30 days. To generate models with longer expiration dates, a distribution license is required.
Porcupine is the benchmark to head for and the dev target to aim at. It’s considerably lighter whilst being more accurate than anything else.
Porcupine offers several predefined keywords; otherwise custom training with a distribution license is required, as no one is going to want to retrain every 30 days.
Having said that, the need for custom wake words is a gimmick: do you call a dog a cat, or call Paris ‘Quarkybeddingspot’ because Paris is that Biatch who broke your heart?
Nope, a dog is a dog and Paris is Paris, and both Google and Amazon have been extremely successful running without custom wake words.
I am sure many will throw a tantrum because they wish to call their AI ‘Pretty Petal’ or whatever, but when you have a clearly defined project such as Rhasspy it’s pretty obvious what the KW (keyword) should contain, and, like the commercial guys, a couple of alternative models can be offered.
Google and Amazon make use of that limited choice because it concentrates usage on a few keywords, and through their analytics those numbers feed back into and improve the model.
For small projects, custom keywords can dilute the numbers so much that the available data becomes too sparse for effective use.
Custom keywords are not necessary and can actually hinder a project, but hey, let’s pander to those who want to give the puppy a name.
I have heard what @synesthesiam is doing with TTS, which looks like something Picovoice also do: it makes great models for “News Reader”, middle-of-the-road English as rendered by good TTS engines.
The simplest way, with the smallest achievable model and very high accuracy, is to train with your own voice.
A broader option is to train with the dialect you use, which will give much higher accuracy than general models.
If it’s single-gender use, then gender-specific voice training also gives big increases in accuracy.
The required wealth of data is likely never going to be harvested when a small project is spread over many keywords.
For the ‘Pretty Petals’ out there, custom keywords may be seen as a valued feature, while others might not care and be more concerned about accuracy.
What is interesting with Picovoice is their validation dataset: if they used TTS, then their accuracy results don’t count for jack with real voices, especially with strong dialect representation.
It grates on me that regional dialect and gender differences are excluded from datasets whilst a middle-of-the-road elite are happy that it works great for them.
Dialect, gender, even age and especially ‘own voice’ are excluded from generalised datasets, or datasets are only available in generalised form, when local harvesting could feed periodic model retraining.
We could even have opt-ins that provide not just the data to open-source datasets but also the metadata that is crucial for tailored training accuracy.
If Picovoice is offered with a choice of non-expiring keywords, then custom keywords really don’t matter unless you’re trying to sell for commercial use, for which they provide a commercial license.
Custom training rather than black-box models is very important if we are going to be inclusive and provide dialect-, gender- and age-specific or weighted datasets, so all can enjoy maximum accuracy and small, lightweight models.
Picovoice isn’t open source; it’s shipped as binaries, and that is the only reason why, for many, it’s one to forget.
I for one will be really happy when some realize the voice AI gold-rush bubble has already burst and the focus turns purely to some really good open source.
This was already the case, but it applies only to custom wakewords.
Ah yes, I was mixing it up with Precise lol!!
Anyway, custom wakewords are everything. Google and Alexa don’t offer them for purely marketing reasons. Having millions of people saying ‘OK Google’ makes it a standard and sells it to everyone. With custom wakewords no one could know which assistant you use. Purely commercial.
Really a shame Snowboy is ending; I get better results with it than with the Snips wakeword, which was already very good. I used a custom wakeword with each, for comparison.
There is a new open-source KWS, which interests me as I also find Precise a bit heavy and not that precise.
It has model generation utils, and if @synesthesiam does his TTS KW dataset populator, the model generation tools are already there.
There are also utils to capture your own word and bolster model creation.
I think it was all a rapid dev project approx. 6 months ago and it still has some rough edges, but it looks really promising.
Who would know which assistant you use, and how, is totally inconsequential compared to accuracy; who or what would even know?
It wasn’t just for commercial reasons: they were picked for uniqueness of voice capture and syllable count, and their data capture fed back into their models.
If Alexa can respond to ‘computer’, where is the commercial identity you mention?
Strangely, Google have the non-commercial ‘Hey Boo Boo’, wherever that came from!? ‘Yabba dabba doo’!
I am waiting for @synesthesiam’s great new TTS dataset populator, and I am going to augment it with ‘own voice’ recordings and dialect/gender-specific words extracted from ASR datasets, then run it all through Linto-HMG, as I’ve been playing with it and the results seem extremely good.
I am also waiting for some rough edges to be smoothed, but herd datasets from small communities are a much better way to go, as we can share common open-source voice data for accuracy.
But if you wish to go the route of custom KW then you can.
Actually, the wakeword is what I miss in Rhasspy. I still use Snips in production because of this. I thought Snowboy was the answer, but investing in an EOL solution is not something I like.
I need a Pi 3/Zero wakeword service for three different custom wakewords (one per person here). Snips works nicely, Snowboy is near perfect, so what’s next…
The only model available is French, and me being typically monolingual means I am waiting to build a model.
You can try it out with HMG now; not all the tools are really available yet, but it sounds like they could arrive pretty soon.
I have given HMG a whirl with the Google Speech Commands dataset, using ‘visual’ as a KW, and it seems quite promising. It was also an eye-opener about supposedly validated datasets, as there are some poor entries in the Google dataset that have an effect.
Many of us have different perspectives and needs, and I will be looking at very simple Linto-based satellites, probably feeding a Rhasspy or voice2json server.
It’s been the KWS, and the models for the KWS, that have been my hold-up, but hopefully that is soon to change.
TensorFlow Lite will run on the Zero (https://www.tensorflow.org/lite/guide/build_rpi), but to be honest I think the Pi 3A+ is a better option, as then I can also run AEC (acoustic echo cancellation) on playing media and allow barge-in.
It’s £10 more than a Pi Zero WH, but the performance jump is approx. 10x+ that of a Zero, so 10x for £10 is pretty damn cheap.
I’ve tested Picovoice Porcupine and the results are not that good.
The CPU usage is awesome and the accuracy is as good as advertised.
The search for an easy to create and use wakeword continues…
I still use Snowboy; it works best for me, even with a custom wakeword.
Well, I use Snowboy too, since the other hotwords are not as cool as “jarvis”.
I’d like to have a “vikky” hotword (like the AI in the movie ‘I, Robot’).
You can still create a custom hotword for snowboy here: https://snowboy.kitt.ai/dashboard
I know, but in 6 months this will be shut down.
What will happen then?
Custom wakewords for children will expire as their voices change…
What if I want to add a new wake word later?
I do not think Snowboy is a viable solution as of today (maybe they will release their training stuff… one can hope…).
I do not know.
But the custom wakeword I created (at 47 years old) also works for my kids (6 and 9).
It might not be a good solution in the long term, but it still works today, and for me better than Porcupine.
I thought the Linto KWS would be right up your street; it being French-based is why I am waiting for a dataset populator with my Brit twang.
Not sure if the KW is just ‘Linto’ on the only tflite model they supply, and whether you say ‘Hey’ or ‘Coucou’.
It’s been a strange couple of months since my curiosity was piqued by Mycroft, as these projects have been going for such an extended time that it seems odd there are such obvious holes in them.
It seems that here, if it has Hermes in the title, it will become a community project; if not, it’s something for @synesthesiam to provide.
Irrespective of any KWS, we are light on dataset creation and collection tools, and quite a few easy solutions exist but seem to garner little interest, probably due to lacking the Hermes tag.
There is a huge wealth of data out there: word extraction from ASR data can provide such large KWS datasets that it’s possible to cherry-pick by metadata and create extremely accurate weighted language datasets with at least region and gender. Age metadata does exist, to a much lesser extent.
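To make the ‘cherry-pick by metadata’ idea concrete, here is a minimal sketch, assuming each extracted word clip is a dict carrying hypothetical `region` and `gender` string fields. It draws the same number of clips from every (region, gender) group, so one dominant accent or gender cannot swamp the rest. The field names and the per-group quota are illustrative assumptions, not the schema of any existing dataset.

```python
import random
from collections import defaultdict


def balanced_sample(clips, keys=("region", "gender"), per_group=2, seed=0):
    """Return up to `per_group` clips from every metadata group.

    `clips` is a list of dicts with hypothetical metadata fields (strings).
    Groups with fewer clips than `per_group` contribute all they have.
    """
    groups = defaultdict(list)
    for clip in clips:
        # Group clips by the tuple of selected metadata values.
        groups[tuple(clip[k] for k in keys)].append(clip)

    rng = random.Random(seed)  # fixed seed so the selection is reproducible
    picked = []
    for group_key in sorted(groups):
        members = groups[group_key]
        picked.extend(rng.sample(members, min(per_group, len(members))))
    return picked
```

In practice `per_group` would be set from the smallest group you are willing to accept, or replaced by per-group weights if some imbalance is wanted on purpose.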
That a huge amount of effort can be given to the completion of skills, whilst the very fundamentals of KWS have obvious needs, is extremely indicative and a sad reflection of ‘community’ priority vs. self.
I have been posting examples for a while now showing that a TensorFlow KWS built with common tools such as Keras is actually quite easy, even for the likes of me.
I’ve been posting consistently that the problem is not really the lack of a KWS; it’s the lack of KWS datasets.
Once more it seems to be left to @synesthesiam to hand out code like Jesus does bread.
I don’t code. I was a pretty good ex-hacker, or in fact not even that level, more a molester of code; I was pretty reasonable, but MS brain damage just makes it too frustrating to learn again.
There are some extremely competent coders here and I am sat bemused looking at some obvious needs and a lack of any input.
I have actually found that Linto more or less have an open-source package for almost every need I can think of; it’s only @synesthesiam’s TTS2Dataset that they miss.
It’s all rather fresh and somewhat rough around the edges, but a large wealth of code is already available.
For simple voice data collection they have https://github.com/linto-ai/linto-desktoptools-voxharvest, which probably needs some custom metadata fields where things like region, gender and age can be added.
It probably needs two modes of operation, as word-based KWS and sentence-based ASR prefer slightly different datasets. Both need to be split into folders along the lines of the metadata, and for KWS it’s also handy to create a folder for each word.
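A minimal sketch of what that folder split could look like, assuming a hypothetical layout of `<language>/<region>/<gender>/<age>/` with an extra `<word>/` level only in the word-based KWS mode. The mode names `kws` and `asr` and the helper `dataset_path` are made up for illustration; voxharvest does not necessarily use this layout.

```python
from pathlib import PurePosixPath


def _slug(value):
    """Normalise a metadata value into a folder-safe name."""
    return str(value).strip().lower().replace(" ", "-")


def dataset_path(mode, language, region, gender, age, word=None, filename="clip.wav"):
    """Build a relative path for one recording (hypothetical layout).

    kws: <language>/<region>/<gender>/<age>/<word>/<filename>
    asr: <language>/<region>/<gender>/<age>/<filename>
    """
    if mode not in ("kws", "asr"):
        raise ValueError("mode must be 'kws' or 'asr'")
    parts = [_slug(p) for p in (language, region, gender, age)]
    if mode == "kws":
        if not word:
            raise ValueError("word-based KWS clips need a word folder")
        parts.append(_slug(word))
    return str(PurePosixPath(*parts, filename))
```

For example, `dataset_path("kws", "en", "Yorkshire", "male", "40-49", "visual", "001.wav")` gives `en/yorkshire/male/40-49/visual/001.wav`, while the same speaker’s sentence recording in `asr` mode simply drops the word level.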
Own voice can greatly increase the accuracy of KWS or ASR, and we even lack those simple tools.
That recording workload can also be greatly reduced by pitch shifting and noise addition, so a single word recording can become many.
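As an illustration of turning one recording into many, here is a minimal pure-Python sketch, assuming mono audio already loaded as a list of float samples. It adds white Gaussian noise at a chosen SNR and does a naive pitch shift by linear-interpolation resampling. Note that this naive resampling changes tempo along with pitch; a proper pitch shifter (e.g. a phase vocoder) would keep duration fixed, so treat this as a rough stand-in. The factor and SNR grids are arbitrary assumptions.

```python
import math
import random


def add_noise(samples, snr_db, rng):
    """Add white Gaussian noise so the result has roughly the requested SNR (dB)."""
    if not samples:
        return []
    signal_power = sum(s * s for s in samples) / len(samples)
    noise_power = signal_power / (10 ** (snr_db / 10))
    sigma = math.sqrt(noise_power)
    return [s + rng.gauss(0.0, sigma) for s in samples]


def resample_shift(samples, factor):
    """Naive pitch shift by resampling with linear interpolation.

    factor > 1 raises pitch and shortens the clip; factor < 1 does the opposite.
    """
    n_out = int(len(samples) / factor)
    out = []
    for i in range(n_out):
        pos = i * factor
        j = int(pos)
        frac = pos - j
        nxt = samples[j + 1] if j + 1 < len(samples) else samples[j]
        out.append(samples[j] * (1 - frac) + nxt * frac)
    return out


def augment(samples, factors=(0.9, 1.0, 1.1), snrs_db=(10, 20, 30), seed=0):
    """Return one variant per (pitch factor, SNR) combination."""
    rng = random.Random(seed)
    variants = []
    for factor in factors:
        shifted = resample_shift(samples, factor)
        for snr in snrs_db:
            variants.append(add_noise(shifted, snr, rng))
    return variants
```

With the default grid, 3 pitch factors × 3 noise levels turn each recording into 9 variants, which is the multiplication effect described here; real background-noise recordings would usually be more useful than white noise.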
The above is just one example of a pressing need where a herd-shared piece of open source would be extremely beneficial to a number of projects, without merely appropriating code, adding a smattering of custom code and branding it as your own.
For word-based KWS there are absolutely massive datasets available, and it would be extremely beneficial to strip words from ASR datasets and to organise and collate their metadata.
It’s also extremely likely that KWs can be created by concatenating extracted words, and massive volumes of non-KW data could be collated.
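A minimal sketch of the strip-and-concatenate idea. ASR datasets typically ship sentence-level transcripts, so word timings would first have to come from a forced aligner; the `(word, start_s, end_s)` tuple format used here is an assumption, and `extract_words`/`concatenate_keyword` are hypothetical helper names. Audio is again a plain list of samples.

```python
from collections import defaultdict


def extract_words(samples, sample_rate, alignment):
    """Cut word clips out of one utterance.

    alignment: list of (word, start_s, end_s) tuples, assumed to come from
    a forced aligner. Returns {word: [clip, ...]}.
    """
    clips = defaultdict(list)
    for word, start, end in alignment:
        a = max(0, int(round(start * sample_rate)))
        b = min(len(samples), int(round(end * sample_rate)))
        if b > a:  # skip empty or inverted spans
            clips[word.lower()].append(samples[a:b])
    return dict(clips)


def concatenate_keyword(parts, sample_rate, gap_s=0.05):
    """Join word clips (e.g. 'hey' + 'rhasspy') with a short silent gap."""
    gap = [0.0] * int(round(gap_s * sample_rate))
    out = []
    for i, part in enumerate(parts):
        if i:
            out.extend(gap)
        out.extend(part)
    return out
```

A synthetic KW built this way will sound less natural than a real utterance, because coarticulation between the words is lost; it is a way to bulk up data, not a replacement for real recordings.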
Then, once more, pitch shifting and noise addition can greatly multiply each individual KW/non-KW collection.
I have already posted that Linto have a great KW model creation tool (https://github.com/linto-ai/linto-desktoptools-hmg), and once again it could probably do with a few additions; for example, once you have created a folder-based dataset, you could export that dataset as JSON.
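A minimal sketch of such an export, assuming a hypothetical folder layout of `<root>/<language>/<region>/<gender>/<age>/<word>/<file>.wav` where the folder names are the metadata. This is not HMG’s own dataset format; it just shows how little is needed to turn a folder tree into a shareable JSON manifest.

```python
import json
from pathlib import Path

# Folder levels of the assumed layout, outermost first.
FIELDS = ("language", "region", "gender", "age", "word")


def folder_dataset_to_json(root):
    """Walk `root` and emit one JSON record per .wav file.

    Folder names become metadata fields; files that do not sit exactly
    len(FIELDS) folders deep are skipped as not matching the layout.
    """
    root = Path(root)
    records = []
    for wav in sorted(root.rglob("*.wav")):
        rel = wav.relative_to(root)
        folders = rel.parts[:-1]
        if len(folders) != len(FIELDS):
            continue
        record = dict(zip(FIELDS, folders))
        record["file"] = rel.as_posix()
        records.append(record)
    return json.dumps(records, indent=2)
```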
I couldn’t give a damn if it’s Linto or not; I was searching, realizing much was needed, and then found Linto already had a head start. I am having to reuse, as it’s not just creation, it’s maintenance, so existing projects have even more value.
Is it so painful that a repo doesn’t contain the word Hermes, or that another voice project might have much to share?
Why the search continues, why some solutions are not adopted now, and why they are not collaborated on now is, for some of us, not such a mystery, but it is extremely frustrating and it generally detracts from the project.
If you have @synesthesiam’s tts2dataset populator, then that fills the last gap, and once more I have tried to highlight that the rest of the need already has solutions and just needs a little additional collaborative work.
By crikey, it’s Hermus!
Your tts2dataset utility should be great, but it is limited by the technology used and creates a narrow dataset based on that technology.
We should be able to share datasets, and all we need is a central database that can store URLs of the datasets we supply.
It should have queryable metadata fields of Language, Region, Gender and Age, against which archive URLs can be submitted.
I can get 15 GB free on my Google Drive and could go to work stripping the ASR sentences of the likes of https://openslr.org/83/ and Common Voice into words, organised via the metadata.
I could create another account and sign up to a different service… and add more.
Each unique subset I create is an archive with a root language folder, containing the corresponding metadata subfolders, which in turn contain the word folders.
Plus some fields about the source dataset, a readme.info sort of thing.
An extremely small database and web app could link huge numbers of community-supplied dataset archives, which can be quickly returned by a metadata query on the key fields you select.
I can just submit my archives, which then become part of a large distributed dataset.
A login and feedback rating system should also be able to create a sort order.
The output could even be a simple CLI text file of wget URLs, and that is it.
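A minimal sketch of that ‘extremely small database’, using Python’s built-in `sqlite3` module. The table layout, the single `rating` column (standing in for a real login and feedback system, which is out of scope here) and the helper names are all assumptions for illustration; the output is just lines of `wget <url>` sorted by rating.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS archives (
    url      TEXT PRIMARY KEY,
    language TEXT NOT NULL,
    region   TEXT,
    gender   TEXT,
    age      TEXT,
    rating   REAL DEFAULT 0
)
"""

# Only these metadata fields may be queried (also keeps the SQL safe).
QUERY_FIELDS = {"language", "region", "gender", "age"}


def open_index(path=":memory:"):
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def submit(conn, url, language, region=None, gender=None, age=None):
    """Register one community-hosted dataset archive by URL."""
    conn.execute(
        "INSERT INTO archives (url, language, region, gender, age) VALUES (?, ?, ?, ?, ?)",
        (url, language, region, gender, age),
    )


def rate(conn, url, rating):
    """Set an archive's rating (a real system would aggregate user feedback)."""
    conn.execute("UPDATE archives SET rating = ? WHERE url = ?", (rating, url))


def query_wget(conn, **filters):
    """Return matching archives as 'wget <url>' lines, best rated first."""
    unknown = set(filters) - QUERY_FIELDS
    if unknown:
        raise ValueError(f"unknown fields: {sorted(unknown)}")
    keys = sorted(filters)
    where = " AND ".join(f"{k} = ?" for k in keys) or "1 = 1"
    rows = conn.execute(
        f"SELECT url FROM archives WHERE {where} ORDER BY rating DESC, url",
        [filters[k] for k in keys],
    ).fetchall()
    return "\n".join(f"wget {url}" for (url,) in rows)
```

Usage, with placeholder URLs: after `submit(conn, "https://example.org/en-yorks-f.tar.gz", "en", "yorkshire", "female")`, a call to `query_wget(conn, language="en", region="yorkshire")` returns a text block that can be saved and run as a shell script.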
Then we can all start to submit KW datasets, or do we just pray to Hermus?
Someone might even grab Julius and convert into phonemes (https://github.com/julius-speech/segmentation-kit), with the same metadata and folder structure containing phonemes rather than words, and maybe that might even prompt an extra query field.
Who knows, but it’s actually that easy.
I guess even
Good news: upgrade Porcupine to version 1.8.
master. It will be in the next version.