Wakeword engine

Of course, use everything you need :+1:t2:


OK, I’ve removed dtw-python as a dependency, so all that’s left is python-speech-features, which is MIT licensed. I used @fastjack’s JavaScript code as a base for writing my own dynamic time warping calculation and template averaging.

I’ve added Raven to the Rhasspy web UI as an option in master, but no recording just yet. If you put WAV templates in {profile}/raven/ it should just pick them up. I created a “sensitivity” parameter in the web UI that expands/contracts the probability detection range around 0.5 (now that I’m using cosine distance).

Once I get the Docker images built and pushed, we should have a beta version of Raven out for community testing! For now, anyone building from source can give it a try.


@fastjack, I found accuracy in Raven to be significantly worse with my hand-rolled DTW (based on your code).

I thought dtw-python might be doing something magical under the hood, but it turns out (from their paper) that they just use a “symmetric2” step pattern by default. This doubles the distance added to the match/replacement cost computed during optimization:

distanceCostMatrix[rowIndex][columnIndex] =
  Math.min(
    cost + distanceCostMatrix[rowIndex - 1][columnIndex],           // Insertion
    cost + distanceCostMatrix[rowIndex][columnIndex - 1],           // Deletion
    (2 * cost) + distanceCostMatrix[rowIndex - 1][columnIndex - 1]) // Match

Maybe you’re already aware of this, but if not it gave me much better accuracy :slight_smile:
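For reference, here is a minimal Python sketch of the same recurrence with the symmetric2 weighting (2 * cost on the diagonal step) and the corner initialized to the local distance, as discussed in this thread. The function and parameter names are mine, and the absolute-difference distance is just a placeholder for whatever feature distance is actually used:

```python
import math

def dtw_symmetric2(s, t, distance=lambda a, b: abs(a - b)):
    """DTW cost using the "symmetric2" step pattern: diagonal (match)
    steps add 2 * cost, insertion/deletion steps add cost once."""
    n, m = len(s), len(t)
    cost = [[math.inf] * m for _ in range(n)]
    # Initialize the corner to the local distance (not 0)
    cost[0][0] = distance(s[0], t[0])
    for i in range(n):
        for j in range(m):
            if i == 0 and j == 0:
                continue
            d = distance(s[i], t[j])
            best = math.inf
            if i > 0:
                best = min(best, d + cost[i - 1][j])          # insertion
            if j > 0:
                best = min(best, d + cost[i][j - 1])          # deletion
            if i > 0 and j > 0:
                best = min(best, 2 * d + cost[i - 1][j - 1])  # match (2x weight)
            cost[i][j] = best
    # symmetric2 costs are typically normalized by n + m
    return cost[n - 1][m - 1] / (n + m)
```

With this step pattern, comparing a sequence to itself yields a normalized cost of 0.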


Interesting! I’ll test that soon :+1:


Just tested it and the accuracy significantly dropped with the 2 * cost for diagonal moves.

Weird that you’re seeing accuracy improving… Did you change any other parameters?

Hmmm… the only other difference I can see is that I initialize (0, 0) in the cost matrix to distance(s[0], t[0]), whereas you set it to 0.

Maybe I’m accidentally doing something different elsewhere too. I’m adding Raven into the next Rhasspy release, so hopefully there will be some more eyes on the code :slight_smile:


I do not get this part of your code:

            if (
                self.probability_threshold[0]
                < probability
                < self.probability_threshold[1]
            ):
                # Detection occurred

Why is there a maximum probability? According to the probability formula I used, the probability of a template compared to itself (perfect DTW cost) is around 0.73.

The detection should not occur when the probability is between 0.45 and 0.55, but when it is above 0.5.
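Under this interpretation, the range check would collapse to a single comparison. A hypothetical corrected version (names and the 0.5 default are taken from the discussion, not from the actual Raven code):

```python
PROBABILITY_THRESHOLD = 0.5  # default threshold discussed in this thread

def is_detection(probability):
    """Detection fires whenever the probability exceeds the threshold,
    rather than falling inside a (min, max) range."""
    return probability > PROBABILITY_THRESHOLD
```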

Maybe I’m missing something, though…

I must have misinterpreted you above in the thread. You mentioned 0.45 and 0.55, and I thought you were saying that this was a range of good probabilities.

Maybe this is why I’ve been getting more false negatives…

Oh… I meant that if you set the threshold to, say, 0.45 you’ll get more false positives, and if you increase the threshold to, say, 0.55 you’ll get more false negatives. :wink:

The probability value goes from 0.1 (definitely not the template) to 0.73 (definitely the template).
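For context, a logistic squash of the normalized DTW cost is consistent with the endpoints quoted here: a perfect match (cost 0) mapping to about 0.73 is exactly what 1 / (1 + e^(0 − 1)) gives. I can’t confirm this is the exact formula used, so treat the constant as an assumption reverse-engineered from the 0.73 figure:

```python
import math

def cost_to_probability(normalized_cost):
    """Hypothetical logistic mapping from normalized DTW cost to a
    probability. At cost 0 this yields 1 / (1 + e^-1) ~= 0.73, matching
    the "perfect DTW cost" value quoted above; large costs approach 0."""
    return 1.0 / (1.0 + math.exp(normalized_cost - 1.0))
```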

:man_facepalming:

Thank you for the clarification. I’ll get that corrected before pushing the release.

Do you have an idea of how Snips mapped its “sensitivity” value onto the probability range? I’d prefer to expose a [0, 1] value like the other wake word services in the web UI.

No idea… I guess we could take the maximum possible value (~0.73) and the default threshold of 0.5, calculate the minimum value to be 0.5 - (0.73 - 0.5) = 0.27, and stretch that range (0.27 to 0.73) onto 0 to 1… That should work… Maybe invert it so higher sensitivity means a lower threshold.
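That mapping can be written down directly. The 0.27/0.73 endpoints come from the discussion above; the inversion means a sensitivity of 1.0 gives the lowest (most permissive) threshold:

```python
MAX_PROBABILITY = 0.73  # "definitely the template"
MIN_PROBABILITY = 0.27  # mirrored around the 0.5 default threshold

def sensitivity_to_threshold(sensitivity):
    """Map a [0, 1] sensitivity onto a probability threshold in
    [MIN_PROBABILITY, MAX_PROBABILITY], inverted so that higher
    sensitivity means a lower threshold (more detections)."""
    return MAX_PROBABILITY - sensitivity * (MAX_PROBABILITY - MIN_PROBABILITY)
```

With this mapping, the default sensitivity of 0.5 lands exactly on the 0.5 threshold.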

This also seems weird:

shift_sec: float = 0.05
    Seconds to shift overlapping window by

If this is indeed in seconds, then the shift is 50 ms, which is bigger than the window size. 10 ms (0.01) would be a more appropriate value.
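To illustrate why the shift matters: with 10 ms frames, a 0.01 s shift tests every frame position, while a 0.05 s shift skips four out of every five positions and can jump straight past the best alignment. A small sketch (frame counts here are illustrative, not Raven's actual parameters):

```python
def window_starts(num_frames, window_frames, shift_frames):
    """Start indices of overlapping analysis windows over a frame
    sequence, advancing by shift_frames each step."""
    return list(range(0, num_frames - window_frames + 1, shift_frames))

# With 10 ms frames over 1 s of audio (100 frames) and a 30-frame window:
#   shift of 1 frame (0.01 s) tests 71 positions,
#   shift of 5 frames (0.05 s) tests only 15.
```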


Found this as a possible C++ feature-extraction library for ARM architectures:


The library has no license, so all rights are reserved.
If we want to use it, we would first have to ask them for permission.


Thank you for taking the time to look over my code :slight_smile: Raven is getting better, bit by bit!


Has the accuracy improved with the last changes (threshold and window shift)?
What does the CPU usage look like on a Raspberry Pi, for instance?

I didn’t see any changes in performance on my desktop. I’m setting up a fresh Pi Zero to test out next, though.

Thanks, @maxbachmann. Glad to have someone on the team who’s helping ensure Rhasspy doesn’t get caught up in licensing issues :slight_smile:

Well, the CPU usage on a Pi Zero spikes pretty high when it detects speech. And there’s about an 8 second delay between saying the wake word and it reporting the detection :frowning:

So for the Zero, we’ll definitely need a native code solution. I’ll test on a Pi 3 tomorrow, but I expect it to run well there.

Can you profile the code on a Pi Zero? That could help in deciding which parts to reimplement in native code.

I’m already using Raven on a Pi 3 here; it works well.

I did not profile it, but I would expect this: https://github.com/rhasspy/rhasspy-wake-raven/blob/master/rhasspywake_raven/dtw.py to be quite slow (at least that’s my experience with implementing similar algorithms in Python).
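For profiling on the Pi Zero without extra dependencies, Python’s built-in cProfile works fine. A small helper like this (names are mine, not part of Raven) prints the most expensive functions by cumulative time, which should show quickly whether the DTW inner loop dominates:

```python
import cProfile
import io
import pstats

def profile_call(func, *args, **kwargs):
    """Run func under cProfile and print the 10 most expensive
    functions by cumulative time, then return func's result."""
    profiler = cProfile.Profile()
    profiler.enable()
    result = func(*args, **kwargs)
    profiler.disable()
    stream = io.StringIO()
    pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(10)
    print(stream.getvalue())
    return result
```

For a whole script, `python -m cProfile -s cumulative script.py` gives the same breakdown without any code changes.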
