Collecting positive wakewords on non-raven systems

Yeah, 1 channel is fine. It's the AGC (ALC) setting I can't remember, and that could help a lot, as you may have little to no signal at a distance.
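The AGC/ALC here is a setting on the card's codec, and I can't confirm its exact mixer control name. As a rough software stand-in, here is a minimal sketch of block-based digital gain control on signed 16-bit mono samples. The target level, gain cap and smoothing factor are hypothetical values, and this is a generic AGC, not the codec's ALC.

```python
import math
from array import array


def simple_agc(samples, target_rms=3000.0, max_gain=20.0, gain=1.0, smoothing=0.1):
    """Very simple block-based software AGC for signed 16-bit PCM.

    samples: array('h') holding one block of mono samples.
    Returns (processed block, updated gain) so the gain carries over between blocks.
    """
    if len(samples) == 0:
        return array("h"), gain
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    if rms > 0:
        desired = min(target_rms / rms, max_gain)
        # Move gradually toward the desired gain to avoid audible "pumping".
        gain += smoothing * (desired - gain)
    # Apply the gain and clamp to the 16-bit range to avoid wrap-around.
    out = array("h", (max(-32768, min(32767, int(s * gain))) for s in samples))
    return out, gain
```

Feed the returned gain back in with the next block; the cap keeps it from blowing up pure background noise when nobody is talking.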

That complicates things: when you use the card directly, it is blocked for all other applications. I will set up an asound.conf with dsnoop and make the dsnoop device the default.
You can give it a try.
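A minimal sketch of what such an asound.conf could look like, written as a small Python helper so the card numbers are easy to swap. It assumes the 4-mic is card 1 (hw:1,0), capturing 4 channels at 16 kHz, and that the USB sound card used for playback is card 2; check `arecord -l` and `aplay -l` on your system. The PCM names mic_dsnoop and mic_plug are made up for this example.

```python
def build_asound_conf(capture_hw="hw:1,0", channels=4, rate=16000,
                      playback_hw="plughw:2,0", ipc_key=5978293):
    """Return asound.conf text that shares the capture card via dsnoop.

    Card/device numbers are assumptions; check `arecord -l` and `aplay -l`.
    """
    return f"""pcm.mic_dsnoop {{
    type dsnoop
    ipc_key {ipc_key}
    ipc_perm 0666
    slave {{
        pcm "{capture_hw}"
        channels {channels}
        rate {rate}
    }}
}}

pcm.mic_plug {{
    type plug
    slave.pcm "mic_dsnoop"
}}

pcm.!default {{
    type asym
    capture.pcm "mic_plug"
    playback.pcm "{playback_hw}"
}}
"""


if __name__ == "__main__":
    # Print the config; once it looks right, save it as /etc/asound.conf (as root).
    print(build_asound_conf())
```

The asym default means applications record through the shared dsnoop device while playback still goes straight to the other card.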
I'm not a big fan of Docker and have no understanding of how it interacts with the host's audio. I only know that I've met a lot of people struggling with it :see_no_evil:

Docker isn't that bad; it's the Docker syndrome that gets us all: host and container are completely separate instances, so the audio device either has to be shared via the docker run command or set up inside the container.
Again, if KWS were separate, it would just work via a network connection :slight_smile:
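For the "shared via docker run" route, a minimal sketch that builds such a command line. `--device /dev/snd`, `-v` and `--network host` are standard docker run options; `--ipc host` is included on the assumption that dsnoop's shared-memory segment (its ipc_key) has to be visible to both host and container for them to share the mic, which is worth verifying on your setup. The image name is a hypothetical placeholder.

```python
import shlex


def docker_audio_cmd(image, name="kws", asound_conf="/etc/asound.conf", host_network=True):
    """Build a `docker run` command that exposes the host's ALSA setup to a container."""
    cmd = ["docker", "run", "-d", "--name", name,
           "--device", "/dev/snd",                        # ALSA device nodes
           "-v", f"{asound_conf}:/etc/asound.conf:ro",    # same dsnoop/default config as the host
           "--ipc", "host"]                               # assumed needed so dsnoop IPC is shared
    if host_network:
        cmd += ["--network", "host"]                      # e.g. to reach an MQTT broker on the host
    cmd.append(image)
    return cmd


if __name__ == "__main__":
    # "my-kws-image" is a hypothetical image name.
    print(shlex.join(docker_audio_cmd("my-kws-image")))
```

Printing the command rather than running it lets you check it first; pass the list to subprocess.run once it looks right.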

I have no understanding of how it interacts either, but I tend to break Linux systems regularly, like every few hours, when I experiment with stuff; I'm just not compatible with it. I found that using Docker to encapsulate my applications helps me keep the host running: I break the Docker container instead and can fix that by starting from scratch.

My containers all run with network: host, and quite a few of them run as privileged. I don't care much about encapsulating for its own sake; I just like Docker containers better than venvs for separating my stuff.

I'll look at that too tomorrow. I agree that, unfortunately, the 4-mic is probably the worst choice from ReSpeaker. It performs no better, if not worse, than the 2-mic and doesn't have an audio output. It only has that nice LED ring, and that's really it.

Yep, and I'm not having a go, but so that others read this before getting one: apart from the pixel ring, the 4-mic is largely pointless.

My strategy here is to just buy more SD cards and make clones/backups of stable starting points, so that if things break I can pull the card and reflash a previous image. That's what I like about Raspberry Pis and co.
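A minimal sketch of that clone/restore cycle: it copies a whole block device into a gzip-compressed image and back, in chunks. The device path is a placeholder (something like /dev/sdX on the machine the card is plugged into). Run it as root with the card unmounted, and double-check the target before restoring, since that overwrites the whole device.

```python
import gzip
import shutil

CHUNK = 4 * 1024 * 1024  # copy 4 MiB at a time


def image_sd_card(device, image_path, chunk_size=CHUNK):
    """Copy a whole block device (or any file) into a gzip-compressed image."""
    with open(device, "rb") as src, gzip.open(image_path, "wb") as dst:
        shutil.copyfileobj(src, dst, chunk_size)


def restore_sd_card(image_path, device, chunk_size=CHUNK):
    """Write a gzip-compressed image back onto a device (or file). Overwrites it!"""
    with gzip.open(image_path, "rb") as src, open(device, "wb") as dst:
        shutil.copyfileobj(src, dst, chunk_size)
```

For example, `image_sd_card("/dev/sdX", "pi-stable.img.gz")` after a known-good setup, then `restore_sd_card` onto a spare card when things break.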

Well, I didn't know any of that when I bought it. I use a USB sound card with a small speaker my father built for me (I suck at soldering and stuff) and it works. The speaker quality is somewhat bad, but since it only has to output TTS, I don't mind.

If I had to get a mic today, I would ask my father to solder a small mic to something that can use it and plug that in somewhere. Another reason I like the ReSpeaker 4-mic is its 2 I2C ports, which let me connect my 433 MHz chips even though the GPIO pins are all blocked by the mic. Not an argument to buy one, but a nice-to-have since I already use that feature.

That would work, but once I have to set the same thing up again, I have a problem. Quite a few of the containers I was running on my Pi are now on a family Pi because my father wants to use them too. So being able to copy working Dockerfiles along with established data and configurations helps.

@Daenara Got one myself, and same here; I'm still confused why Mycroft and others recommended them.
Same with the PS3 Eye. Really, the opposite is true: unless you have an onboard DSP, just get a single mic that also has sound out on the same card.
All of them (from a 2-mic to a USB card) perform pretty similarly.

The new Raspberry Pi 2-mic looks pretty nice, if a tad expensive, and it's still not in stock anywhere.
But you could just stick an I2S MEMS mic on the GPIO. There are a lot of options, depending on whether you want to use AEC or not, and they all produce relatively similar results; "more mics = better" is a fallacy unless you have the DSP to use them.

PS: I struggled to find good unidirectional mics and ended up buying 25, so I will flog the spares on eBay for just the cost of postage, as not everyone might want 25 :slight_smile:

The Raspberry Pi 2-mic also has line in; otherwise, the best cheap sound card by far was this one:

https://www.scan.co.uk/products/enermax-ap001e-dreambass-usb-soundcard-plus-earphones-genie-with-integrated-80-hz-plus6-db-bass-boos

Despite its odd shape, it's definitely the only card I know of at that price that can record 24-bit S24_3LE, and electret mics can be wired to it directly, with stereo input as well.
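S24_3LE means signed 24-bit samples packed into 3 bytes each, little-endian. A minimal sketch of unpacking such a buffer into Python ints and splitting interleaved stereo into two channels, assuming the raw bytes came from something like `arecord -f S24_3LE -c 2 -t raw`:

```python
def decode_s24_3le(raw: bytes):
    """Convert packed signed 24-bit little-endian PCM (S24_3LE) to a list of ints."""
    if len(raw) % 3:
        raise ValueError("S24_3LE data must be a multiple of 3 bytes")
    return [int.from_bytes(raw[i:i + 3], "little", signed=True)
            for i in range(0, len(raw), 3)]


def split_stereo(samples):
    """Split interleaved L/R samples into two lists (left, right)."""
    return samples[0::2], samples[1::2]
```

Values range from -8388608 to 8388607, i.e. 256 times the range of 16-bit samples.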

I still mention 2 mics because, if they are unidirectional, you can point them at different angles and run one KWS instance on each to get a form of budget beamforming.
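Running two KWS instances and acting on whichever fires with the higher score is the approach described here. An even cheaper variant, swapped in purely for illustration and not true beamforming, is picking per block whichever of the two unidirectional mics carries more energy. A minimal sketch for interleaved signed 16-bit little-endian stereo:

```python
import math
import struct


def _rms(xs):
    """RMS of a sequence of ints (0.0 for an empty sequence)."""
    return math.sqrt(sum(x * x for x in xs) / len(xs)) if xs else 0.0


def pick_louder_channel(frame: bytes):
    """Return (channel_index, mono_samples) for the louder channel of an
    interleaved signed 16-bit little-endian stereo block."""
    count = len(frame) // 2
    samples = struct.unpack(f"<{count}h", frame[:count * 2])
    left, right = samples[0::2], samples[1::2]
    if _rms(left) >= _rms(right):
        return 0, list(left)
    return 1, list(right)
```

Switching channels on every block can cause clicks, so in practice you would probably only switch after the other mic has been louder for a while.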

I kind of solved my own problem here. It's not perfect in any way, and I sometimes have to adjust the timings because it sometimes takes longer before I get the hotword-detected message, but here is a Python script that buffers up to 10 seconds of audio streamed over MQTT and saves roughly the last second before detection as a WAV file.

For this to work, all the audio has to be streamed over MQTT; as soon as UDP streaming is set up, it won't work. It should work with multiple satellites, but since I only have my all-in-one system, I haven't tested that.

import datetime
import json
import os
from io import BytesIO

from paho.mqtt.client import Client
from pydub import AudioSegment

audio_file = {}  # site_id -> AudioSegment holding the most recent audio
frame_rate = 16000


def audio_callback(client, userdata, message):
    topic_parts = message.topic.split("/")
    # hermes/hotword/<wakewordId>/detected: save the audio just before the detection
    if topic_parts[1] == "hotword" and topic_parts[3] == "detected":
        wake_word = topic_parts[2]
        site_id = json.loads(message.payload)["siteId"]
        if site_id not in audio_file:
            return  # no audio buffered for this site yet
        filename = wake_word + "_" + str(datetime.datetime.now().timestamp()) + ".wav"
        folder = os.path.join(wake_word, site_id)
        os.makedirs(folder, exist_ok=True)
        file_path = os.path.join(folder, filename)
        frames = int(audio_file[site_id].frame_count())
        # From (1 s + 3000 frames) before the end up to 1000 frames before the end.
        # Widen these offsets if the wake word gets chopped off.
        clip = audio_file[site_id].get_sample_slice(max(frames - frame_rate - 3000, 0), max(frames - 1000, 0))
        clip.export(file_path, format="wav")
        print(f"Recorded audio: {file_path}")
        audio_file[site_id] = AudioSegment.empty()
    # hermes/audioServer/<siteId>/audioFrame: append the WAV chunk to the per-site buffer
    elif topic_parts[1] == "audioServer" and topic_parts[3] == "audioFrame":
        site_id = topic_parts[2]
        if site_id not in audio_file:
            audio_file[site_id] = AudioSegment.empty()
        audio_file[site_id] += AudioSegment.from_wav(BytesIO(message.payload))
        # Once more than 10 s is buffered, trim back to the last 2 s.
        if audio_file[site_id].frame_count() > frame_rate * 10:
            frames = int(audio_file[site_id].frame_count())
            audio_file[site_id] = audio_file[site_id].get_sample_slice(max(frames - frame_rate * 2, 0), frames)


# paho-mqtt 1.x constructor; paho-mqtt 2.x also requires a CallbackAPIVersion
# as the first argument (VERSION1 keeps the callback signature above).
mqtt_client = Client(client_id="audio_capture")
mqtt_client.username_pw_set("mosquitto", "mosquitto")
mqtt_client.on_message = audio_callback
mqtt_client.connect("192.168.0.4", 1883)
mqtt_client.subscribe("hermes/audioServer/+/audioFrame")
mqtt_client.subscribe("hermes/hotword/+/detected")
mqtt_client.loop_forever()
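The pydub version re-concatenates and re-slices an AudioSegment on every frame. An alternative sketch uses the stdlib only: a per-site deque of raw PCM chunks trimmed to a fixed length, with the clip written by the wave module. It assumes each audioFrame payload is a complete WAV chunk (as the script above already does). The class name and default durations are hypothetical; the script would keep one instance per siteId in place of the AudioSegment dict.

```python
import io
import wave
from collections import deque


class RollingAudioBuffer:
    """Keeps roughly the last `max_seconds` of PCM audio from WAV chunks."""

    def __init__(self, max_seconds=10.0):
        self.max_seconds = max_seconds
        self.chunks = deque()   # raw PCM byte strings, oldest first
        self.params = None      # (channels, sampwidth, framerate) of the stream
        self.total_frames = 0

    def _frame_bytes(self):
        channels, sampwidth, _ = self.params
        return channels * sampwidth

    def append_wav(self, payload: bytes):
        with wave.open(io.BytesIO(payload), "rb") as w:
            self.params = (w.getnchannels(), w.getsampwidth(), w.getframerate())
            pcm = w.readframes(w.getnframes())
        fb = self._frame_bytes()
        self.chunks.append(pcm)
        self.total_frames += len(pcm) // fb
        # Drop whole chunks from the front while at least max_seconds would remain.
        limit = int(self.max_seconds * self.params[2])
        while self.chunks and self.total_frames - len(self.chunks[0]) // fb >= limit:
            self.total_frames -= len(self.chunks.popleft()) // fb

    def save_last(self, path, seconds=2.0, skip_seconds=0.0):
        """Write `seconds` of audio ending `skip_seconds` before the newest frame."""
        if self.params is None:
            return False
        channels, sampwidth, rate = self.params
        fb = self._frame_bytes()
        pcm = b"".join(self.chunks)
        end = max(len(pcm) - int(skip_seconds * rate) * fb, 0)
        start = max(end - int(seconds * rate) * fb, 0)
        with wave.open(path, "wb") as out:
            out.setnchannels(channels)
            out.setsampwidth(sampwidth)
            out.setframerate(rate)
            out.writeframes(pcm[start:end])
        return True
```

Because the clip length and offset are parameters, the "nearly 2 second" adjustment becomes `save_last(path, seconds=2.0)` rather than frame arithmetic.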
Sorry, I forgot all about it :see_no_evil: I've been busy doing a lot of other things.

Not a problem, I've got a solution. Right now I just run the script on my PC when I'm doing something there, but I can record what triggered my Pi. Sadly, the timings vary way too much; I had to adjust to nearly 2-second recordings so the wake word isn't chopped in half.

I would also try to find out why you are getting false positives, as a GRU is generally pretty accurate if trained correctly.
I would check your dataset by running inference of your KW samples against the KW label and of your !KW samples against the !KW label.

You might have some dross in there that needs pruning, but I'm not sure why Precise seems to return so many false positives, or why there isn't a weight to favour false negatives over false positives.
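A minimal sketch of that sanity check, assuming you already have a per-file score in [0, 1] from your trained model. How you obtain the scores depends on your tooling, so the `scores` mapping and the 0.5 thresholds here are hypothetical.

```python
def find_suspect_samples(scores, labels, kw_min=0.5, not_kw_max=0.5):
    """Flag samples whose model score disagrees with their label.

    scores: {path: model probability that the clip contains the wake word}
    labels: {path: True for KW, False for !KW}
    Returns (weak_kw, noisy_not_kw), each sorted from worst to least bad.
    """
    weak_kw = [p for p, is_kw in labels.items() if is_kw and scores[p] < kw_min]
    noisy_not_kw = [p for p, is_kw in labels.items() if not is_kw and scores[p] > not_kw_max]
    weak_kw.sort(key=lambda p: scores[p])
    noisy_not_kw.sort(key=lambda p: scores[p], reverse=True)
    return weak_kw, noisy_not_kw
```

The worst entries at the top of each list are the first candidates to listen to and prune before retraining.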

I have a different take on how you should handle noise, and I think the labels should be balanced to get optimal results.
Just adding anything and everything to !KW will make the model more Gaussian and likely have diminishing returns, as more false positives could be created.
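One simple way to keep the two labels similar in quantity, shown here only as an illustration, is to randomly downsample whichever label has more files. A minimal sketch over lists of file paths (a fixed seed keeps the selection reproducible between training runs):

```python
import random


def balance_labels(kw_files, not_kw_files, seed=0):
    """Randomly downsample the larger label so both labels have the same count."""
    rng = random.Random(seed)
    n = min(len(kw_files), len(not_kw_files))
    kw = rng.sample(list(kw_files), n) if len(kw_files) > n else list(kw_files)
    not_kw = rng.sample(list(not_kw_files), n) if len(not_kw_files) > n else list(not_kw_files)
    return kw, not_kw
```

Downsampling throws data away. The alternative is to grow the smaller label, e.g. with the noise-mixed KW copies described below.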

Blindly mixing noise into the dataset is a bad idea, as you have no idea of, or control over, the levels.
Adding anything and everything to !KW creates an imbalance in sample counts between the 2 labels, KW and !KW, which should be kept similar in quantity.
I suggest normalising your dataset and creating noise files at 5, 10, 15 and 20 dB below your dataset level.
Duplicate the dataset and split it into quarters, mixing 25% of it at each of the 4 dB levels.
Based on inference with your trained model, try to mix the good KW samples with the higher noise levels and the weaker KW samples with the lower noise levels.
Make a final inference run, remove from the dataset any samples that score really low for their label, and retrain.
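The mixing step boils down to scaling each noise clip so its RMS sits a fixed number of dB below the keyword clip. A minimal sketch using plain float samples in [-1, 1], assuming equal-length, non-empty clips. Using peak normalisation, and ranking by model score to decide which quarter gets which level, are illustrative choices, and the function names are hypothetical.

```python
import math

SNR_LEVELS_DB = (5, 10, 15, 20)  # noise 5, 10, 15 and 20 dB below the keyword level


def rms(x):
    """Root-mean-square level of a non-empty list of float samples."""
    return math.sqrt(sum(s * s for s in x) / len(x))


def normalise(x, target_peak=0.9):
    """Peak-normalise a clip so every keyword sample starts at the same level."""
    peak = max(abs(s) for s in x)
    return [s * target_peak / peak for s in x] if peak else list(x)


def mix_at_snr(speech, noise, snr_db):
    """Add noise scaled so its RMS sits `snr_db` below the speech RMS."""
    gain = rms(speech) / (rms(noise) * 10 ** (snr_db / 20))
    mixed = [s + gain * n for s, n in zip(speech, noise)]
    peak = max(abs(m) for m in mixed)
    # If the sum would clip, scale everything down; speech and noise shrink
    # together, so the SNR is unchanged.
    return [m / peak for m in mixed] if peak > 1.0 else mixed


def assign_snr(scores):
    """Rank clips by model score and split them into four groups:
    the strongest keywords get the most noise (5 dB), the weakest the least (20 dB)."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    group = math.ceil(len(ranked) / len(SNR_LEVELS_DB))
    return {path: SNR_LEVELS_DB[i // group] for i, path in enumerate(ranked)}
```

The gain follows from SNR_dB = 20·log10(rms_speech / rms_noise): solving for the noise scale gives rms_speech / (rms_noise · 10^(SNR/20)).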

I don't mix noise in, and I have a pretty good idea why I get false positives. I recorded my data with my ReSpeaker 4-mic while it had a lot of background noise (the mic itself, not the room; it is annoying that way), and a few days later the random noise stopped and the audio became clearer. Since it switches between those two modes basically at random, I need the model to respond to both.

Also, most of the data I used to train it was me talking with one person or me watching streams on my PC, so in other situations, like me on my phone on speaker or me talking louder to be heard over people, my model doesn't know what to do. I can't record those in one long recording, because the people I tend to talk to don't want to be recorded on principle, so I have to save small snippets when it responds and hope no one who objects is on the recording.

I think we need to find out why your 4-mic is adding random noise.
Weren't you going to provide your /etc/asound.conf so we could have a look?

I don't rate the ReSpeaker 4-mic anyway, but if I had one like yours that, unless it is misconfigured, is actually defective, it would be in the bin, as it's essentially useless.

I still use the default config from ReSpeaker, which I posted. As far as I know, it is a driver issue. I found one problem that has since been fixed, which greatly reduced the issue, but it still switches between no noise at all and quiet background noise. The overwhelming noise only happens rarely and never on channel 1. It's not only my mic; many people have had this problem. Those drivers just don't work well.
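To check which mode the card is in, and whether the noise really never shows up on one particular channel, one option is to record a few seconds of silence and compare the noise floor per channel. A minimal sketch for interleaved signed 16-bit little-endian data, assuming a raw 4-channel capture such as `arecord -D hw:1,0 -c 4 -r 16000 -f S16_LE -t raw -d 5 silence.raw` (the card number is an assumption). Channels are indexed from 0 here, so map that to however you count "channel 1".

```python
import math
import struct


def channel_noise_floor_dbfs(raw: bytes, channels=4):
    """Return the RMS level of each channel in dBFS (0 dBFS = full scale)."""
    frame_bytes = 2 * channels
    n_frames = len(raw) // frame_bytes
    samples = struct.unpack(f"<{n_frames * channels}h", raw[:n_frames * frame_bytes])
    levels = []
    for ch in range(channels):
        chan = samples[ch::channels]
        rms = math.sqrt(sum(s * s for s in chan) / len(chan)) if chan else 0.0
        levels.append(20 * math.log10(rms / 32768) if rms > 0 else float("-inf"))
    return levels
```

Running it on captures taken in the "quiet" and "noisy" modes should show whether one channel's floor jumps while the others stay put.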

Yep, keep your pixel ring but don't use its audio, and get a USB sound card; or ditch the pixel ring and get a 2-mic.

I have a 4-mic and a 4-mic linear array and never noticed the same thing; I just wasn't very impressed, and without audio out on the same clock AEC wasn't possible, so the card was useless to me.