Mycroft Precise - Installation and Use

Thanks for the hint! Using pip3 did not solve the problem.

sudo apt install python3-pip
pip3 install mycroft-precise

Sorry, the only help I can offer now is to support you in trying the virtual env instead of Docker.

However, I still don’t believe that runner.py is missing from the Docker image entirely.

Did you try sudo docker run -ti synesthesiam/rhasspy-server:latest bash and then searched there? The pip3 steps also need to run in there if you don’t find Precise’s runner.py.
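That is, something along these lines (the find invocation is just one way to search; adjust as needed):

sudo docker run -ti synesthesiam/rhasspy-server:latest bash
# inside the container: look for Precise's runner script anywhere on the filesystem
find / -name 'runner.py' 2>/dev/null
# if it really is missing, try installing Precise inside the container
pip3 install mycroft-precise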

No preference here. However, I might try reaching out to Mycroft so we can at least share training data.

I kind of liked what existed on Snowboy, especially the option to send a link to your friends to ask for help. Setting up something like this would be great. I hope the hosting doesn’t get too overwhelming here. I have another unused 2 TB instance running in Canada if we need more space (and if you have a very simple way for me to set something up).

That sounds really nice - I really enjoyed switching my own system to MaryTTS, it sounds so much better! So, if this doesn’t demand too much admin time, or if others here in the community volunteer to help, it would be an amazing and very relevant resource.

Yes, I saw your references - however, I found Rhasspy via an entry in some Node-RED forum, even though I had studied Mycroft’s progress on a standalone server quite intensely. They seem to use DeepSpeech, and I couldn’t get decent performance out of it here, so I was looking at Julius, and then at Kaldi in Rhasspy.

I will carefully ask in the forum and gauge the interest.

Dear @ulno, yes, I have done everything inside the Docker container (searched, attempted to install via pip3). Now I will practice my patience and wait until Rhasspy 2.5 runs well enough to test Precise.

@j3mu5 1) How long should the wav files be?
2) Can I let it record 24 hours of ambient “noise” from the mics and then use that as data/random?
3) The Mycroft guide says “not wakewords”. Can a gigantic wav file be used for this as well? Or maybe an automated wave file chopper?
4) Is it possible to combine more than one wake word at the same time? For instance: “hey computer” and “hey silly”, with both of them active at the same time?

@voice

  1. The length is not important. It can be anything from short clips of white noise to hours of audio from TV shows or your own recordings. During processing, the complete file is analyzed, and only the parts to which Precise reacts with a false positive are used for training.
  2. As long as you do not speak any wake words during that time: absolutely! I think normal everyday sounds are just right.
  3. A gigantic file works without problems (though if you do want to chop one up, see the ffmpeg sketch below). Diversity (different files from different sources) is certainly helpful. After a longer training run I added white, pink, and other noise and got some false positives.
  4. I did not try that. Maybe it works, maybe not.
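To answer the chopper question: ffmpeg’s segment muxer can split a long recording into fixed-length pieces. A minimal sketch, assuming 10-second chunks are wanted (the file names are made up):

# split long_recording.wav into 10-second chunks without re-encoding
ffmpeg -i long_recording.wav -f segment -segment_time 10 -c copy chunk_%04d.wav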

Thanks. But according to this, it seems that data/random is the same as not-wake-words? Or are they different?

I believe data/random is intended to be a shared directory of background noise, whereas the “not wake words” are supposed to be things close to the wake word (like “hey microsoft” instead of “hey mycroft”). I’m not sure if the training infrastructure treats them any differently, though.

As @synesthesiam says, one can put falsely recognized (false positive) words into not-wake-word.

When you train Precise, a subfolder called generated is created in not-wake-word. This folder contains the audio snippets from the data/random background-noise files that triggered false positives. That is how I followed the progress of the training.

Now I only have to get Precise running on my Raspberry Pi. :see_no_evil:

It runs here on 2.5 on a Pi 3 with hey mycroft as the wake word.
In the venv, everything worked out of the box except the fix for the runner and Phonetisaurus (I had to install that manually from the downloaded package).
It’s not perfect yet, as I haven’t managed to run Rhasspy 2.5 with systemd - somehow supervisord and systemd don’t seem to like each other, but I might open another thread for that issue.

rhasspy-supervisor currently outputs configurations for supervisord and docker-compose. Maybe we should add a systemd output as well?
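In the meantime, a hand-written unit might serve as a stopgap. A rough sketch, assuming Rhasspy 2.5 lives in a venv at /usr/lib/rhasspy, runs as user pi, and uses an English profile - all of which need adjusting to the actual install:

sudo tee /etc/systemd/system/rhasspy.service <<'EOF'
[Unit]
Description=Rhasspy 2.5
After=network-online.target

[Service]
# hypothetical path, user, and profile - point these at your own setup
User=pi
ExecStart=/usr/lib/rhasspy/bin/rhasspy --profile en
Restart=on-failure

[Install]
WantedBy=multi-user.target
EOF
sudo systemctl daemon-reload
sudo systemctl enable --now rhasspy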

I’m giving Precise a try with 2.5.0, but it does not work so far due to the following error when trying to load the engine:

2020-06-10 16:35:31,468 INFO success: wake_word entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
[DEBUG:2020-06-10 16:35:43,510] rhasspywake_precise_hermes: Namespace(debug=True, engine=None, host='192.168.2.165', log_format='[%(levelname)s:%(asctime)s] %(name)s: %(message)s', log_predictions=False, model='okj.pb', model_dir=['/profiles/fr/precise'], password=None, port=12183, sensitivity=0.5, site_id=['entree', 'entree'], tls=False, tls_ca_certs=None, tls_cert_reqs='CERT_REQUIRED', tls_certfile=None, tls_ciphers=None, tls_keyfile=None, tls_version=None, trigger_level=3, udp_audio=[['localhost', '12345', 'entree']], username=None, wakeword_id='')
[DEBUG:2020-06-10 16:35:43,556] rhasspywake_precise_hermes: Using engine at /usr/lib/rhasspy/lib/python3.7/site-packages/rhasspywake_precise_hermes/precise-engine/precise-engine
[DEBUG:2020-06-10 16:35:43,586] asyncio: Using selector: EpollSelector
[DEBUG:2020-06-10 16:35:43,632] rhasspywake_precise_hermes: ['/usr/lib/rhasspy/lib/python3.7/site-packages/rhasspywake_precise_hermes/precise-engine/precise-engine', '/profiles/fr/precise/okj.pb', '2048']
[DEBUG:2020-06-10 16:35:43,676] rhasspywake_precise_hermes: Listening for audio on UDP localhost:12345
Traceback (most recent call last):
  File "/usr/lib/rhasspy/bin/rhasspy-wake-precise-hermes", line 8, in <module>
    sys.exit(main())
  File "/usr/lib/rhasspy/lib/python3.7/site-packages/rhasspywake_precise_hermes/__main__.py", line 125, in main
    hermes.load_engine()
  File "/usr/lib/rhasspy/lib/python3.7/site-packages/rhasspywake_precise_hermes/__init__.py", line 147, in load_engine
    engine_cmd, stdin=subprocess.PIPE, stdout=subprocess.PIPE
  File "/usr/local/lib/python3.7/subprocess.py", line 800, in __init__
    restore_signals, start_new_session)
  File "/usr/local/lib/python3.7/subprocess.py", line 1551, in _execute_child
    raise child_exception_type(errno_num, err_msg, err_filename)
FileNotFoundError: [Errno 2] No such file or directory: '/usr/lib/rhasspy/lib/python3.7/site-packages/rhasspywake_precise_hermes/precise-engine/precise-engine': '/usr/lib/rhasspy/lib/python3.7/site-packages/rhasspywake_precise_hermes/precise-engine/precise-engine'
2020-06-10 16:35:44,470 INFO exited: wake_word (exit status 1; not expected)

Is there something missing in the Docker image?

Maybe this is because the Pi Zero is not supported by Precise? I’ll try later on a Pi 3B+.

Precise now works (Rhasspy v2.5) in the Docker install on a Pi 4, which is great, as it is the best option for custom wake words.

Finally I tried Precise (with 2.5.0) on my Pi 3B+… but the CPU load goes to 95+% :frowning:

@j3mu5 did you move to 2.5.0 + Precise? Could you please share your experience? For the time being, I’ll stick to Snowboy…

I have recently made use of this guide and I noticed a few problems with it.

First of all, mycroft-precise needs python<3.8. Trying to use it with Python 3.8 is futile at the moment. The official version does not run because the dependencies can’t be resolved, and in the pull request for a newer version of TensorFlow only half of the scripts even run, and those that do run don’t work properly. I wasted 12 hours trying to get a working model, and even then the precise-test script got less than 50% of my wake words correct.

I did use WSL to begin with, but since I can’t get mic input there, I decided to go the virtual machine route. I tried Hyper-V first, which works pretty well with either Debian or Ubuntu 18.04, but getting sound in there is torture. I eventually got PulseAudio to work, but I could not get ALSA to work, and after a day of trying I decided to switch. Oracle’s VirtualBox was next; getting audio to work was easy, but I didn’t get TensorFlow to run at all. TensorFlow is compiled to use AVX instructions, and I did not manage to get those passed through. Documentation is lacking for VirtualBox; the only thing I could find was reported to work in beta 3 of version 5, but those instructions did not work for me. In a last-ditch effort I used VMware Player, and I actually got it working there.

Next was the choice of operating system. I personally prefer Debian, but after I got it running on Ubuntu with minimal effort in quite a few experiments, I decided to stick with that for the moment. It has to be Ubuntu 18.04 for ease of use, since that comes with Python 3.7; Ubuntu 20.04 comes with Python 3.8, and I was not motivated to try a downgrade there.

The following is an updated/corrected version of this guide using Ubuntu 18.04. It can be used for other distributions as well, but I did not check whether all the dependencies are there. I also removed quite a few unneeded sudos, as well as most, if not all, mentions of copying files between WSL and the Linux system, since I used a VM with the guest additions installed, so I can just drag and drop files and use a shared folder to copy them.

The only things I needed to install on Ubuntu to get everything working as described in this post were git and ffmpeg.

sudo apt-get install git ffmpeg

After that, I could jump to the next part of the guide.

Originally, this is where ./setup.sh was called, but doing that results in quite a few errors, because one of the dependencies was updated. All the errors it produces are “'str' object has no attribute 'decode'” errors, and they can be fixed by going through all the places it complains about and removing the .decode("utf-8") part. But this can be prevented by editing setup.py before installing Mycroft Precise.

In setup.py, look for the requirements. Change the line containing h5py to h5py<3.0.0 and everything is fine again.
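If you prefer a one-liner, something like this should do it (a sketch, assuming the requirement is listed literally as 'h5py' in setup.py - check the file first):

# pin h5py below 3.0.0 in mycroft-precise's setup.py (run from the repo root)
sed -i "s/'h5py'/'h5py<3.0.0'/" setup.py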

On with the guide:

In the next part I changed the path of the convert scripts, since the original path does not work here. I personally created the two files in the file manager and pasted the content in with the GUI text editor Ubuntu came with, but nano works fine too.

I also made the files executable so I could call them with ./convert_mp3.sh and ./convert_wav.sh, but that is optional, since they work without it.
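For reference, the wav variant of such a convert script could look like this (a sketch, assuming the folder layout used below and the 16 kHz, 16-bit, mono format Precise expects; the mp3 variant would match *.mp3 and swap the extension to .wav):

#!/bin/bash
# convert_wav.sh - convert everything in convert_me/ to 16 kHz 16-bit mono wav
mkdir -p to_be_converted/converted
for f in to_be_converted/convert_me/*.wav; do
    ffmpeg -i "$f" -acodec pcm_s16le -ar 16000 -ac 1 \
        "to_be_converted/converted/$(basename "$f")"
done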

Next, move the files to be converted into the created folders by whatever means you like. I used a shared folder with my VM to copy stuff over.

In the next part I differentiated between not-wakewords and material for data/random. Everything that was random background noise, long audio files, and so on went into data/random, while small utterances that were not the wake word landed in WAKEWORD_NAME/not-wake-word. I specifically recorded words that sounded similar to my wake word, as well as random words that just came to mind.

If not-wakewords were converted:

mv to_be_converted/converted/* WAKEWORD_NAME/not-wake-word/
rm to_be_converted/convert_me/*.*

And for the files that should go into the test set:

mv to_be_converted/converted/* WAKEWORD_NAME/test/not-wake-word/
rm to_be_converted/convert_me/*.*

Of course, having access to a full desktop environment means that all this copying around can also be done graphically. I also found it helpful to name my files properly. I include which microphone a file was recorded with and what it contains: wakewords have my wake word’s name in the file name, not-wakewords have not-wakeword in the file name (and I plan on adding exactly what was said as well), things I recorded for data/random have an approximation of the recorded situation in the name, and so on. This is important for figuring out exactly what did not work during testing, to know in which direction the model needs to be trained further. When training with data/random, the automatically generated files are also named after the original audio file they came from, so having those named properly helps a lot.

With a decent PC, precise-train runs very fast, so I normally just start it with -e 1000 instead of 100. The script can also be terminated at any point, so using a high epoch count and stopping once the accuracy value is close to 1 also works.
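Following the command shape used elsewhere in this guide, that is:

precise-train -e 1000 WAKEWORD_NAME.net WAKEWORD_NAME/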

This command can also be used as
precise-train-incremental WAKEWORD_NAME.net WAKEWORD_NAME/ -r data/random/SUBFOLDER
which I have been using to train against different data sources one at a time. I am not sure whether the plain command (without -r) still trains against everything when there are subfolders, but after my first try with a few hundred files in data/random, I decided to sort them into folders depending on the data.

In my tests I did not see the need to run precise-train immediately after precise-train-incremental, but it can’t hurt to do it, because the incremental run copies parts from data/random into the wake-word and not-wake-word directories, and I am not sure if it also retrains the model on what it finds right then. It does not hurt to run a test in between either; I personally run a test after the first training, after the incremental training, and so on.

Having access to a microphone in a VM, I added another step to the testing:
precise-listen WAKEWORD_NAME.net

I use this to see if my model still goes off on random noises, words, and so on. I keep a list of words that still set it off and then record them with the mic my Rhasspy uses (and eventually also with my headset, just so I have different mic qualities, in hopes of making the model more universal for me) and put some of them into not-wake-word and test/not-wake-word after conversion. I record those words like I record wake words: one word per file, with a bit of silence (or, in the case of my mic, random noise) before and after the word.

If the model is set off by random noises I try to make a long recording of the situation and put it into data/random for incremental training.

There is a tool named precise-collect that helps with recording, but only with the mic you have access to on your PC. For the recordings with the mic my Rhasspy uses, I just record everything with arecord and manually create the small files with Audacity.
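For the long recordings, something like this works (the file name is made up; 16 kHz, 16-bit mono matches the format the training data should be in):

# record from the default mic until Ctrl+C is pressed
arecord -f S16_LE -r 16000 -c 1 ambient_livingroom.wav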

Once the model performs decently, it is time to export it:
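A sketch of that step, assuming the standard Precise tooling: precise-convert turns the Keras .net model into the .pb file that Rhasspy loads.

precise-convert WAKEWORD_NAME.net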

This :ok_hand:t2::ok_hand:t2::ok_hand:t2: There is an issue open for this. I was not updating my Precise source install because of it.

I mostly save the result achieved after the incremental training, then copy the files saved to test/not-wake-word/generated over to not-wake-word/generated, and then do another training round. This mostly improves the model.
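Expressed as commands, following the folder layout used earlier in the thread (the epoch count is just an example):

cp WAKEWORD_NAME/test/not-wake-word/generated/*.wav WAKEWORD_NAME/not-wake-word/generated/
precise-train -e 100 WAKEWORD_NAME.net WAKEWORD_NAME/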

If you do that, then sure, the training helps, but without that bit of information being mentioned, I was not sure the training was needed. Also, it might improve the model, but copying those files over makes the testing worse. For testing, one should use samples that aren’t used in training, to verify that the learning process worked. If the model was trained on the samples used for testing, the test result will look better than the model actually is. Tests should be similar to what the model is trained on, but not 100% the same.

Yes, I fully agree it makes testing a bit worthless, but in my real-life experience it still gives the better model. I prefer to test by using it :see_no_evil:
After training Precise models for over a year now, it has turned out for me that more data to train on equals better in 90% of the cases. More epochs doesn’t equal better in all cases, though, as I’ve overtrained a few times.

Edit - one more thing:
I really recommend playing with the -th threshold parameter of both the train and train-incremental commands, as it has a big impact on the sensitivity in noisy environments versus the false positives the final model will have, and with the -s sensitivity parameter for incremental training, as that has a big impact on how retrainings are triggered.
I recommend using shorter random audio files if you turn up the sensitivity parameter in incremental training.
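Plugged into the incremental command from earlier in the thread, that looks something like this (the values are placeholders to experiment with, not recommendations):

precise-train-incremental -th 0.8 -s 0.2 WAKEWORD_NAME.net WAKEWORD_NAME/ -r data/random/SUBFOLDER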

More training data is of course better, but if you are starting out and trying to get a working model together, I think it is better to ensure your tests accurately represent the model’s quality. If I notice something in the generated folder falling through a test, I try to find similar things for my data/random and not-wake-words, and keep what fell through in the testing folder only. That way my model gets better over time while still being represented by somewhat accurate testing.

Once the data the model is trained on is at a point where it rarely triggers falsely, training on what it puts into the test folder sounds more helpful than detrimental to me.

The reason behind that is pretty simple: I train on a system with a different mic than I use for Rhasspy, so to test it perfectly, I would have to convert the model, put it into Rhasspy, and see what still triggers it. So I depend on the test function, with audio recorded with the Rhasspy mic, to figure out whether it will work. The first model I trained was trained blind: I put most of the roughly 30 samples I got from Raven into the wakeword folder, the rest went to test, I put everything from Raven that falsely triggered it into the not-wake-word folder, and I trained on tons of random data. The resulting model got all my tests correct, but I only had 5 or so wakewords for testing, and they were medium quality at best; once I deployed it, it did not react to anything at all.