I have already tried tapping the audio stream from MQTT so that I can build my own central wake word system. But somehow I seem to be a bit stupid.
I use the topic "hermes/audioServer/satelliteID/audioFrame", but all I get is an annoying crackling noise. I have no idea how to convert this properly. Does anyone have a small code snippet for me? To begin with, it would be enough for me to write the whole thing to an audio file.
Each of those frames contains a WAV header. So if you want to stitch them together, you need to remove the header from each frame and then write a single new one.
The audio should be 16-bit PCM, mono, at 16 kHz.
This can get you started. The script records 4 seconds, so you need to adjust it to your own situation.
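For anyone reading along without that script, here is a minimal sketch of the header handling described above. It assumes each `audioFrame` payload is a complete small WAV file that Python's `wave` module can parse. The function names `extract_pcm` and `write_wav` are made up for illustration, and the MQTT wiring is left out on purpose.

```python
import io
import wave


def extract_pcm(payload):
    """Return the raw PCM samples of one audioFrame payload.

    Assumption: the payload is a small but complete WAV file, so the
    wave module can skip the header for us.
    """
    with wave.open(io.BytesIO(payload), "rb") as wf:
        return wf.readframes(wf.getnframes())


def write_wav(target, pcm_chunks, rate=16000, channels=1, sample_width=2):
    """Write all PCM chunks into one WAV file with a single new header.

    target can be a file name or a writable binary file object.
    Defaults match the format mentioned above: 16-bit mono at 16 kHz.
    """
    with wave.open(target, "wb") as wf:
        wf.setnchannels(channels)
        wf.setsampwidth(sample_width)
        wf.setframerate(rate)
        for chunk in pcm_chunks:
            wf.writeframes(chunk)
```

In an `on_message` callback you would append `extract_pcm(msg.payload)` to a list for as long as you want to record and then call `write_wav("recording.wav", collected_chunks)` once. Writing each frame to the file with its header still attached is a likely source of the crackling.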
Thank you, that was the hint I needed! I didn’t know the details of the conversion either.
To save other people some trouble (in case I forget to publish it on GitHub when I’m done):
Here is the code:
    import struct
    import wave

    import paho.mqtt.client as mqtt
    import pvporcupine

    # TODO:
    # - handle multiple site ids
    # - add a callback function for each site id
    # - do some funny stuff and play a little
    # - clean up the code

    file_counter = 0

    # Porcupine configuration (adjust to your model paths accordingly)
    ACCESS_KEY = "Your Key"
    porcupine = pvporcupine.create(
        access_key=ACCESS_KEY,
        keywords=['picovoice', 'bumblebee'],
        model_path='./porcupine_params_de.pv',
        sensitivities=[0.45, 0.45]
    )

    # required length of one audio frame, in samples
    frame_length = porcupine.frame_length


    # callback that is called when a new message is received
    def on_message(client, userdata, msg):
        # skip the 44-byte WAV header of each frame
        process_audio(msg.payload[44:])


    def save_audio_block(audio_block, filename="new_audio_block"):
        # write the audio block to a WAV file,
        # using a running counter so every file gets a new name
        global file_counter
        file_counter += 1
        filename = filename + str(file_counter) + ".wav"
        with wave.open(filename, "wb") as wf:
            wf.setnchannels(1)      # mono (1 channel)
            wf.setsampwidth(2)      # 2 bytes per sample (16-bit)
            wf.setframerate(16000)  # sample rate
            wf.writeframes(audio_block)


    def process_audio(audio_frame):
        chunk_bytes = frame_length * 2  # 2 bytes per 16-bit sample
        # loop through audio_frame in chunks of frame_length samples;
        # an incomplete chunk at the end is skipped
        for i in range(0, len(audio_frame) - chunk_bytes + 1, chunk_bytes):
            # get the current chunk
            chunk = audio_frame[i:i + chunk_bytes]
            # convert the bytes to 16-bit PCM samples
            pcm = struct.unpack_from("h" * frame_length, chunk)
            # save_audio_block(chunk)
            # process the audio frame with Porcupine
            keyword_index = porcupine.process(pcm)
            if keyword_index == 0:
                # detected `picovoice`
                print('picovoice')
            elif keyword_index == 1:
                # detected `bumblebee`
                print('Bumblebee')


    # MQTT broker settings
    mqtt_broker_host = "192.168.178.58"
    mqtt_broker_port = 1883
    # MQTT username and password
    mqtt_username = "user"
    mqtt_password = "password"
    mqtt_topic = "hermes/audioServer/YOURSIDEID/audioFrame"

    # create client (written for paho-mqtt 1.x; with paho-mqtt 2.x you may
    # need to pass mqtt.CallbackAPIVersion.VERSION2 as the first argument)
    client = mqtt.Client()
    client.username_pw_set(mqtt_username, mqtt_password)
    client.on_message = on_message
    client.connect(mqtt_broker_host, mqtt_broker_port)

    # subscribe to the MQTT topic
    client.subscribe(mqtt_topic)

    # wait for messages
    client.loop_forever()
Thanks for your reply, it came in while I was answering.
Since I see you are sending the data to Porcupine, there is another thing I want to highlight:
Porcupine has some restrictions, at least on Android, and quite possibly on other platforms as well.
On Android it only works with 16-bit mono 16 kHz audio in chunks of 512 bytes.
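A small sketch of how fixed-size chunks could be cut from incoming audio without losing the leftover bytes between two MQTT messages. It assumes 16-bit little-endian PCM and a frame size of 512, counted in samples the way the script above treats `porcupine.frame_length` (it multiplies by 2 to get bytes). `FrameChunker` is a made-up helper name, not part of Porcupine.

```python
import struct

FRAME_SAMPLES = 512    # assumption: value of porcupine.frame_length
BYTES_PER_SAMPLE = 2   # 16-bit PCM


class FrameChunker:
    """Collects raw PCM bytes and hands out complete fixed-size frames."""

    def __init__(self, frame_samples=FRAME_SAMPLES):
        self.frame_samples = frame_samples
        self.frame_bytes = frame_samples * BYTES_PER_SAMPLE
        self._pending = bytearray()

    def feed(self, pcm_bytes):
        """Add bytes and return a list of complete frames (tuples of ints).

        Bytes that do not fill a whole frame stay buffered until the
        next call instead of being dropped.
        """
        self._pending.extend(pcm_bytes)
        frames = []
        while len(self._pending) >= self.frame_bytes:
            raw = bytes(self._pending[:self.frame_bytes])
            del self._pending[:self.frame_bytes]
            frames.append(struct.unpack("<%dh" % self.frame_samples, raw))
        return frames
```

Each returned tuple has exactly `frame_samples` values, so it could be passed on to a detector's `process` call one frame at a time.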
It seems that’s the case in Python, too. At least the 512 bytes. I’m not sure how to handle different streams from the satellites, or whether multiple instances cause trouble with the “3 free users”. So maybe I will measure the minimum buffer needed to detect a wake word and pass that buffered audio_frame to my loop, so I can handle multiple audio streams without using multiple instances. Just a thought, what do you think about that idea?
I am pretty sure that won’t work, because I think Porcupine also uses older buffers to detect the wake word, since the wake word could be split across multiple buffers.
I’d run your Python script in multiple instances with different API keys.
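If separate processes are not wanted, a variation would be one engine object per site id inside a single script, so that each stream keeps its own history. This is only a sketch: `make_engine` is a hypothetical factory (for example a function that calls `pvporcupine.create(...)`), the class and function names are made up, and how Picovoice counts several engine objects against its limits is not something this sketch answers. It assumes a subscription to the wildcard topic `hermes/audioServer/+/audioFrame`.

```python
def site_id_from_topic(topic):
    """Extract the site id from 'hermes/audioServer/<siteId>/audioFrame'."""
    parts = topic.split("/")
    if (len(parts) == 4 and parts[0] == "hermes"
            and parts[1] == "audioServer" and parts[3] == "audioFrame"):
        return parts[2]
    return None


class PerSiteDetectors:
    """Keeps one wake word engine per site id."""

    def __init__(self, make_engine):
        # make_engine: hypothetical factory returning an object with a
        # process(pcm) method that returns -1 when nothing was detected
        self._make_engine = make_engine
        self._engines = {}

    def process(self, topic, pcm_frame):
        """Route one frame to its site's engine.

        Returns (site_id, keyword_index) on a detection, otherwise None.
        """
        site_id = site_id_from_topic(topic)
        if site_id is None:
            return None
        engine = self._engines.get(site_id)
        if engine is None:
            # create the engine lazily the first time a site shows up
            engine = self._engines[site_id] = self._make_engine()
        keyword_index = engine.process(pcm_frame)
        if keyword_index >= 0:
            return (site_id, keyword_index)
        return None
```

In `on_message`, `msg.topic` would be passed in as `topic`, with each frame already cut to the engine's frame length.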
I’m not sure I understand you correctly. My idea was to buffer 1-2 seconds (depending on how much is needed at minimum) per siteId and then process these buffers in a loop in 512-byte chunks, sort of shifting over the siteId buffers. But I’ll try all this when I get back from vacation. I’m sure there is a simple solution. But this way is a bit of an experiment.
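To make the buffering idea a bit more concrete, here is a rough sketch of per-site buffers that release audio in fixed blocks. It assumes 16-bit mono audio at 16 kHz, so 16000 samples are one second. `SiteBuffers` and its method names are made up, and whether a single Porcupine instance copes with blocks from alternating sites is exactly the open question from this thread, so the sketch only covers the bookkeeping.

```python
from collections import defaultdict

BYTES_PER_SAMPLE = 2  # 16-bit PCM


class SiteBuffers:
    """Collects raw PCM per site id and releases it in fixed-size blocks."""

    def __init__(self, block_samples=16000):
        # 16000 samples are 1 second at the assumed 16 kHz sample rate
        self.block_bytes = block_samples * BYTES_PER_SAMPLE
        self._buffers = defaultdict(bytearray)

    def add(self, site_id, pcm_bytes):
        """Append header-free PCM bytes to the buffer of one site."""
        self._buffers[site_id].extend(pcm_bytes)

    def ready_blocks(self):
        """Return (site_id, block) pairs for every full block buffered.

        Sites are visited in the order they first appeared; leftover
        bytes stay in the buffer for the next round.
        """
        blocks = []
        for site_id, buf in self._buffers.items():
            while len(buf) >= self.block_bytes:
                blocks.append((site_id, bytes(buf[:self.block_bytes])))
                del buf[:self.block_bytes]
        return blocks
```

A main loop could call `ready_blocks()` regularly and then walk through each block in detector-sized chunks.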