Wakeword engine

Of course, use everything you need :+1:t2:


OK, I’ve removed dtw-python as a dependency, so all that’s left is python-speech-features, which is MIT licensed. I used @fastjack’s JavaScript code as a base for writing my own dynamic time warping calculation and template averaging.

I’ve added Raven to the Rhasspy web UI as an option in master, but no recording just yet. If you put WAV templates in {profile}/raven/ it should just pick them up. I created a “sensitivity” parameter in the web UI that expands/contracts the probability detection range around 0.5 (now that I’m using cosine distance).

Once I get the Docker images built and pushed, we should have a beta version of Raven out for community testing! For now, anyone building from source can give it a try.


@fastjack, I found accuracy in Raven to be significantly worse with my hand-rolled DTW (based on your code).

I thought dtw-python might be doing something magical under the hood, but it turns out (from their paper) that they just use a “symmetric2” step pattern by default. This doubles the distance added to the match/replacement cost computed during optimization:

distanceCostMatrix[rowIndex][columnIndex] =
  Math.min(
    cost + distanceCostMatrix[rowIndex - 1][columnIndex],           // Insertion
    cost + distanceCostMatrix[rowIndex][columnIndex - 1],           // Deletion
    (2 * cost) + distanceCostMatrix[rowIndex - 1][columnIndex - 1]) // Match

Maybe you’re already aware of this, but if not it gave me much better accuracy :slight_smile:
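For reference, here is a minimal Python sketch of the same recurrence with the symmetric2 weighting (2 * cost on the diagonal step) and the corner initialized to the local distance, as discussed in this thread. The function and parameter names are mine, and the absolute-difference distance is just a placeholder for whatever feature distance is actually used:

```python
import math

def dtw_symmetric2(s, t, distance=lambda a, b: abs(a - b)):
    """DTW cost using the "symmetric2" step pattern: diagonal (match)
    steps add 2 * cost, insertion/deletion steps add cost once."""
    n, m = len(s), len(t)
    cost = [[math.inf] * m for _ in range(n)]
    # Initialize the corner to the local distance (not 0)
    cost[0][0] = distance(s[0], t[0])
    for i in range(n):
        for j in range(m):
            if i == 0 and j == 0:
                continue
            d = distance(s[i], t[j])
            best = math.inf
            if i > 0:
                best = min(best, d + cost[i - 1][j])          # insertion
            if j > 0:
                best = min(best, d + cost[i][j - 1])          # deletion
            if i > 0 and j > 0:
                best = min(best, 2 * d + cost[i - 1][j - 1])  # match (2x weight)
            cost[i][j] = best
    # symmetric2 costs are typically normalized by n + m
    return cost[n - 1][m - 1] / (n + m)
```

With this step pattern, comparing a sequence to itself yields a normalized cost of 0.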


Interesting! I’ll test that soon :+1:


Just tested it and the accuracy significantly dropped with the 2 * cost for diagonal moves.

Weird that you’re seeing accuracy improving… Did you change any other parameters?

Hmmm… the only other difference I can see is that I initialize (0, 0) in the cost matrix to distance(s[0], t[0]), whereas you set it to 0.

Maybe I’m accidentally doing something different elsewhere too. I’m adding Raven into the next Rhasspy release, so hopefully there will be some more eyes on the code :slight_smile:


I do not get this part of your code:

            if (
                self.probability_threshold[0]
                < probability
                < self.probability_threshold[1]
            ):
                # Detection occurred

Why is there a maximum probability? According to the probability formula I used, the probability of a template compared to itself (perfect DTW cost) is around 0.73.

The detection should not occur when the probability is between 0.45 and 0.55, but when it is above 0.5.
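Under this interpretation, the range check would collapse to a single comparison. A hypothetical corrected version (names and the 0.5 default are taken from the discussion, not from the actual Raven code):

```python
PROBABILITY_THRESHOLD = 0.5  # default threshold discussed in this thread

def is_detection(probability):
    """Detection fires whenever the probability exceeds the threshold,
    rather than falling inside a (min, max) range."""
    return probability > PROBABILITY_THRESHOLD
```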

Maybe I’m missing something, though…

I must have misinterpreted you above in the thread. You mentioned 0.45 and 0.55, and I thought you were saying that this was a range of good probabilities.

Maybe this is why I’ve been getting more false negatives…

Oh… I meant that if you set the threshold to, say, 0.45 you’ll get more false positives, and if you increase the threshold to, say, 0.55 you’ll get more false negatives. :wink:

The probability value goes from 0.1 (definitely not the template) to 0.73 (definitely the template).
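For context, a logistic squash of the normalized DTW cost is consistent with the endpoints quoted here: a perfect match (cost 0) mapping to about 0.73 is exactly what 1 / (1 + e^(0 − 1)) gives. I can’t confirm this is the exact formula used, so treat the constant as an assumption reverse-engineered from the 0.73 figure:

```python
import math

def cost_to_probability(normalized_cost):
    """Hypothetical logistic mapping from normalized DTW cost to a
    probability. At cost 0 this yields 1 / (1 + e^-1) ~= 0.73, matching
    the "perfect DTW cost" value quoted above; large costs approach 0."""
    return 1.0 / (1.0 + math.exp(normalized_cost - 1.0))
```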

:man_facepalming:

Thank you for the clarification. I’ll get that corrected before pushing the release.

Do you have an idea of how Snips mapped its “sensitivity” value onto the probability range? I’d prefer to expose a [0, 1] value like the other wake word services in the web UI.

No idea… I guess we could take the maximum possible value (~0.73) and the default threshold of 0.5, calculate the minimum value to be 0.5 - (0.73 - 0.5) = 0.27, and stretch that range (0.27 to 0.73) onto 0 to 1… That should work… Maybe invert it so higher sensitivity means a lower threshold.
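That mapping can be written down directly. The 0.27/0.73 endpoints come from the discussion above; the inversion means a sensitivity of 1.0 gives the lowest (most permissive) threshold:

```python
MAX_PROBABILITY = 0.73  # "definitely the template"
MIN_PROBABILITY = 0.27  # mirrored around the 0.5 default threshold

def sensitivity_to_threshold(sensitivity):
    """Map a [0, 1] sensitivity onto a probability threshold in
    [MIN_PROBABILITY, MAX_PROBABILITY], inverted so that higher
    sensitivity means a lower threshold (more detections)."""
    return MAX_PROBABILITY - sensitivity * (MAX_PROBABILITY - MIN_PROBABILITY)
```

With this mapping, the default sensitivity of 0.5 lands exactly on the 0.5 threshold.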

This also seems weird:

shift_sec: float = 0.05
    Seconds to shift overlapping window by

If this is indeed in seconds, then the shift is 50 ms, which is bigger than the window size. 10 ms (0.01) would be a more appropriate value.
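To illustrate why the shift matters: with 10 ms frames, a 0.01 s shift tests every frame position, while a 0.05 s shift skips four out of every five positions and can jump straight past the best alignment. A small sketch (frame counts here are illustrative, not Raven's actual parameters):

```python
def window_starts(num_frames, window_frames, shift_frames):
    """Start indices of overlapping analysis windows over a frame
    sequence, advancing by shift_frames each step."""
    return list(range(0, num_frames - window_frames + 1, shift_frames))

# With 10 ms frames over 1 s of audio (100 frames) and a 30-frame window:
#   shift of 1 frame (0.01 s) tests 71 positions,
#   shift of 5 frames (0.05 s) tests only 15.
```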


Found this as a possible C++ feature-extraction library for ARM architectures:


The library has no license, so all rights are reserved.
If we want to use it, we would first have to ask them for permission.


Thank you for taking the time to look over my code :slight_smile: Raven is getting better, bit by bit!


Has the accuracy improved with the last changes (threshold and window shift)?
What does the CPU usage look like on a Raspberry Pi, for instance?

I didn’t see any changes in performance on my desktop. I’m setting up a fresh Pi Zero to test out next, though.

Thanks, @maxbachmann. Glad to have someone on the team who’s helping ensure Rhasspy doesn’t get caught up in licensing issues :slight_smile:

Well, the CPU usage on a Pi Zero spikes pretty high when it detects speech. And there’s about an 8 second delay between saying the wake word and it reporting the detection :frowning:

So for the Zero, we’ll definitely need a native code solution. I’ll test on a Pi 3 tomorrow, but I expect it to run well there.

Can you profile the code on a Pi Zero? That could help in deciding which parts to reimplement in native code.

I’m already using Raven on a Pi 3 here; it works well.

I did not profile it, but I would expect this: https://github.com/rhasspy/rhasspy-wake-raven/blob/master/rhasspywake_raven/dtw.py to be quite slow (at least that’s my experience with implementing similar algorithms in Python).
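For profiling on the Pi Zero without extra dependencies, Python’s built-in cProfile works fine. A small helper like this (names are mine, not part of Raven) prints the most expensive functions by cumulative time, which should show quickly whether the DTW inner loop dominates:

```python
import cProfile
import io
import pstats

def profile_call(func, *args, **kwargs):
    """Run func under cProfile and print the 10 most expensive
    functions by cumulative time, then return func's result."""
    profiler = cProfile.Profile()
    profiler.enable()
    result = func(*args, **kwargs)
    profiler.disable()
    stream = io.StringIO()
    pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(10)
    print(stream.getvalue())
    return result
```

For a whole script, `python -m cProfile -s cumulative script.py` gives the same breakdown without any code changes.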
