How to save the last spoken command to an audio file and access it?

fenyce_m · June 22, 2020, 10:20pm

Hello,

I would like to save a recording of a spoken command to a file and access it afterwards. I noticed I can play the last spoken command by clicking the play button in the web interface.

Is there a way to save that audio to a file and access it? Does Rhasspy already save it anywhere? I don’t mind changing a script somewhere, but I cannot find where the audio is acquired in the first place.

I am using Rhasspy 2.5.0 on a Raspberry Pi 4 with the latest Raspberry Pi OS Lite, and I am triggering the recognition remotely using the HTTP API endpoints api/start-recording and api/stop-recording.

Any help would be greatly appreciated!

DanielW · June 23, 2020, 6:08pm

When you look at https://github.com/rhasspy/rhasspy-server-hermes/blob/master/rhasspyserver_hermes/main.py

You can see in line 1527 that it plays the last captured audio from memory. You could change that script and add a new endpoint to download the audio file.

But I think you can also get the audio through MQTT (not sure).

synesthesiam · June 23, 2020, 8:58pm

I’ll be extending the /api/play-recording endpoint to handle this soon. Right now, you POST to it to get the last recorded command to play. With the next version, a GET will pull the WAV data.
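Once that GET is available, pulling the recording from a script could look roughly like the sketch below. It assumes the endpoint returns raw WAV bytes as described above; the helper names, the base URL and the output path are hypothetical and not part of Rhasspy.

```python
import urllib.request


def looks_like_wav(data: bytes) -> bool:
    """Return True if the bytes start with a RIFF/WAVE header."""
    return len(data) >= 12 and data[:4] == b"RIFF" and data[8:12] == b"WAVE"


def fetch_last_recording(base_url: str, out_path: str) -> int:
    """GET the last recorded command and write it to out_path.

    Assumes GET /api/play-recording answers with WAV data (planned
    behaviour per the post above). Returns the number of bytes written.
    """
    url = base_url.rstrip("/") + "/api/play-recording"
    with urllib.request.urlopen(url, timeout=10) as response:
        wav_bytes = response.read()

    # Refuse to save anything that is not WAV (e.g. an error page)
    if not looks_like_wav(wav_bytes):
        raise ValueError("response does not look like WAV data")

    with open(out_path, "wb") as wav_file:
        wav_file.write(wav_bytes)
    return len(wav_bytes)
```

A call such as fetch_last_recording("http://localhost:12101", "last_command.wav") would then save the file locally, assuming the web server listens on port 12101 on the same machine.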

Another option, as @DanielW suggested, would be to listen for the rhasspy/asr/<siteId>/<sessionId>/audioCaptured MQTT message.
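A rough sketch of the MQTT route follows. It assumes the message payload is the captured audio as WAV bytes; parse_topic, handle_message and the file naming scheme are hypothetical helpers for illustration, not Rhasspy functions.

```python
import os
import re
from typing import Optional, Tuple

# Topic layout from the post: rhasspy/asr/<siteId>/<sessionId>/audioCaptured
_TOPIC_RE = re.compile(r"^rhasspy/asr/([^/]+)/([^/]+)/audioCaptured$")

# MQTT wildcard subscription that matches every site and session
SUBSCRIBE_TOPIC = "rhasspy/asr/+/+/audioCaptured"


def parse_topic(topic: str) -> Optional[Tuple[str, str]]:
    """Return (site_id, session_id), or None for unrelated topics."""
    match = _TOPIC_RE.match(topic)
    return (match.group(1), match.group(2)) if match else None


def handle_message(topic: str, payload: bytes, out_dir: str) -> Optional[str]:
    """Write an audioCaptured payload to <out_dir>/<siteId>_<sessionId>.wav.

    Assumes the payload is already a complete WAV file, so it is
    written unchanged. Returns the path, or None if the topic differs.
    """
    ids = parse_topic(topic)
    if ids is None:
        return None
    site_id, session_id = ids
    os.makedirs(out_dir, exist_ok=True)
    path = os.path.join(out_dir, f"{site_id}_{session_id}.wav")
    with open(path, "wb") as wav_file:
        wav_file.write(payload)
    return path
```

With an MQTT client library such as paho-mqtt, handle_message could be called from the on_message callback with msg.topic and msg.payload after subscribing to SUBSCRIBE_TOPIC.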

fenyce_m · June 23, 2020, 10:46pm

@DanielW thank you so much for the pointer, that’s the section of the code I was looking for.

@synesthesiam that sounds great! It would be exactly what I need! Right before you posted, I was thinking of doing something along these lines (please forgive my terrible coding skills):

@app.route("/api/save-recording", methods=["GET"])
# here we want to get the value of filename (i.e. ?filename=_MyFileName.wav)
async def api_save_recording():
    assert core is not None
    _filename = str(request.args.get("filename"))

    if core.last_audio_captured:
        wav_bytes = core.last_audio_captured.wav_bytes
        _LOGGER.debug("Saving audio file, size: %s byte(s)", len(wav_bytes))
        # buffer_to_wav_file is a regular (non-async) function, so no await;
        # the target directory must be a quoted string
        buffer_to_wav_file(os.path.join("/home/pi", _filename), wav_bytes)

    return "OK"

and adding something like this to utils.py:

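def buffer_to_wav_file(filename, wav_buffer):
    # Assumption: wav_buffer already holds a complete WAV file (RIFF header
    # included), so it is written unchanged. Passing it to
    # wave.writeframes() would store that header as audio samples.
    with open(filename, mode="wb") as wav_file:
        wav_file.write(wav_buffer)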

Possibly not the most elegant solution… but do you think it could work?