Hi,
This looks exciting to me, because I am not a great programmer.
Is it now possible to easily combine any ESP32 development board like this… and an I2S mic array board like this?
The software will run on any ESP32; it is up to you to create a new device that implements readAudio() at a bare minimum.
This is the method that records audio.
Since the device you mention also has LEDs, you can further implement updateColors() and updateBrightness().
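A minimal sketch of what such a device could look like, written as plain C++ so it compiles on a desktop. The base class name Device, the method signatures, and the FakeMicBoard example board are all assumptions for illustration, not the firmware's real classes; check the actual device classes before copying anything. A real board would read from its I2S microphone inside readAudio() instead of generating a ramp of numbers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical interface: the method names come from this thread, but the
// signatures are assumptions and may differ from the real firmware.
class Device {
public:
    virtual ~Device() = default;
    // Required: fill `buffer` with up to `count` 16-bit PCM samples and
    // return how many samples were actually written.
    virtual std::size_t readAudio(int16_t* buffer, std::size_t count) = 0;
    // Optional hooks for boards with LEDs; the default does nothing.
    virtual void updateColors(int /*colors*/) {}
    virtual void updateBrightness(int /*brightness*/) {}
};

// Hypothetical board: instead of reading an I2S microphone it produces
// an increasing ramp of samples so the sketch can run without hardware.
class FakeMicBoard : public Device {
public:
    std::size_t readAudio(int16_t* buffer, std::size_t count) override {
        assert(buffer != nullptr || count == 0);
        for (std::size_t i = 0; i < count; ++i) {
            buffer[i] = static_cast<int16_t>(next_++);
        }
        return count;
    }
    // This board "has LEDs", so it remembers the requested brightness.
    void updateBrightness(int brightness) override { brightness_ = brightness; }
    int brightness() const { return brightness_; }

private:
    int next_ = 0;
    int brightness_ = 0;
};
```

Only readAudio() is pure virtual here, so a board without LEDs could leave the two LED hooks as no-ops and still work.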
OK, I have uploaded the firmware to my M5Stack Atom Echo, and it’s really easy to use! For now I’m using the button to start a session instead of using a wake word. Granted, the audio quality isn’t good, but this works nicely as a low-end satellite for testing Rhasspy Hermes apps.
One thing I noticed is that sometimes, right after pushing the button, the device immediately stops recording and thinks it has captured an intent for a short word, “yes”. Is this a known issue?
@romkabouter Wow that sounds great!
Do you have any experience with which hardware gives the best sound quality?
Is the Matrix Voice worth the money, or is the Ai Thinker AudioKit as good as the Matrix Voice?
Any suggestions on which hardware I should start with?
Thank you!
Thanks. Indeed the sound is not great, but I think that is more an issue with the M5 itself. Sound with the factory software as a Bluetooth sink was also not great. It is a pretty small device with a small speaker, so that is to be expected.
I have noticed that as well, but I have not put any effort into it yet. The hardware button just publishes a message to startSession, which in turn triggers the HotwordDetected state. Maybe Rhasspy detects silence straight after that. I need to take a look at that.
You can set the wakeword to local; then it should NOT send audio when idle, but only when you press the button and the HotwordDetected state is triggered.
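A rough model of the button flow described above. The names State, WakewordMode and Satellite are hypothetical and not taken from the firmware; the sketch only captures the rule that with a local wakeword, audio is streamed once the HotwordDetected state is reached, while a remote wakeword presumably needs audio all the time so the server can listen for it.

```cpp
#include <cassert>

// Hypothetical names; the real firmware's states and settings may differ.
enum class State { Idle, HotwordDetected };
enum class WakewordMode { Local, Remote };

struct Satellite {
    WakewordMode mode = WakewordMode::Local;
    State state = State::Idle;

    // Pressing the hardware button starts a session, which in turn
    // moves the satellite into the HotwordDetected state.
    void onButtonPressed() { state = State::HotwordDetected; }

    // When the session ends, the satellite goes back to Idle.
    void onSessionEnded() { state = State::Idle; }

    // Remote wakeword: the server must hear everything, so stream even
    // when Idle. Local wakeword: only stream after the hotword state.
    bool shouldStreamAudio() const {
        if (mode == WakewordMode::Remote) return true;
        return state == State::HotwordDetected;
    }
};
```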
Well, the Matrix Voice can only play audio at a 44100 Hz sample rate.
Receiving audio at that rate does not work well, and you will often hear hissing sounds.
I therefore recommend a sample rate no higher than 22050 Hz; the software resamples the audio to play it on the Matrix Voice. It does not do a very good job at that, however.
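Since 44100 is exactly 2 × 22050, going from 22050 Hz to 44100 Hz is an integer upsampling factor. The sketch below is a simple linear-interpolation upsampler for 16-bit PCM; it is not necessarily what the firmware does, and the function name upsample2x is hypothetical. Linear interpolation is cheap but does not fully suppress imaging artifacts; a better resampler would add a proper low-pass filter.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Upsample 16-bit PCM by exactly 2x (e.g. 22050 Hz -> 44100 Hz) using
// linear interpolation: every input sample is kept, and a new sample is
// inserted halfway to the next one. The last sample is simply repeated.
std::vector<int16_t> upsample2x(const std::vector<int16_t>& in) {
    std::vector<int16_t> out;
    out.reserve(in.size() * 2);
    for (std::size_t i = 0; i < in.size(); ++i) {
        int32_t current = in[i];
        int32_t next = (i + 1 < in.size()) ? in[i + 1] : current;
        out.push_back(static_cast<int16_t>(current));
        // Midpoint computed in 32 bits so the sum cannot overflow int16_t.
        out.push_back(static_cast<int16_t>((current + next) / 2));
    }
    return out;
}
```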
The Matrix Voice is a nice device, but its limited support for audio playback makes me say that it might be better to get an AudioKit or an M5 Atom Echo. Both of them are much cheaper. I do not own an AudioKit, but it has the same I2S support as the M5 Atom Echo.
If you have no need to play audio, then I think the Matrix Voice might be better. Although it is much bigger, it has shiny LEDs.
The M5 Atom Echo, on the other hand, is much more of a finished device, coming in a nice little case and all.
Neither the AudioKit nor the Matrix Voice has a case.
Basically it boils down to, as always, “it depends”.
If you want great sound quality, you can build a device yourself with a good speaker, powered by an ESP32 running this software. I will accept pull requests for new devices.
The AudioKit has bugged me for a while: for me, the ESP32 and the audio are great, but the rest of the dev board is redundant.
I cannot find a simple, small dev kit anywhere, so I will let you know how soldering those goes. To be honest, I could just solder the audio inputs and 3.3 V directly, but those adapter boards are so cheap that I thought I would give it a go.
I got two boards with two A1S modules for £10, so I will let you know how they get on after the slow boat from China.
I also have what is hopefully a killer KWS (keyword spotter), but so that you don’t get trapped by the obsolescence of any one system, it is kept minimal: after the keyword, the devices broadcast audio until they get an MQTT message to stop, and that is it. There are no Rhasspy specifics, so a simple server-side app will have to act as a bridge/relay.
I haven’t found a VAD (voice activity detection) apart from the one in the ADF (Espressif’s Audio Development Framework), and I haven’t checked how well that works; if it is not good enough, a server can still run VAD on the incoming chunks.
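If the VAD ends up on the server, even a crude energy-based detector could decide when to send the stop message. The sketch below is a swapped-in technique, not the ADF's VAD or anything Rhasspy ships: it computes the RMS level of each incoming chunk and reports the end of an utterance after a number of consecutive quiet chunks. The names rms and EnergyVad, the threshold, and the hangover count are hypothetical and would need tuning per microphone.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Root-mean-square level of one chunk of 16-bit PCM audio.
double rms(const std::vector<int16_t>& chunk) {
    if (chunk.empty()) return 0.0;
    double sumSquares = 0.0;
    for (int16_t s : chunk) sumSquares += static_cast<double>(s) * s;
    return std::sqrt(sumSquares / static_cast<double>(chunk.size()));
}

// Very simple energy-based VAD: a chunk counts as speech if its RMS is
// above `threshold`. The utterance is considered finished once
// `hangoverChunks` quiet chunks in a row follow some speech.
class EnergyVad {
public:
    EnergyVad(double threshold, int hangoverChunks)
        : threshold_(threshold), hangover_(hangoverChunks) {
        assert(hangoverChunks > 0);
    }

    // Feed one chunk; returns true when the utterance has ended.
    bool process(const std::vector<int16_t>& chunk) {
        if (rms(chunk) > threshold_) {
            heardSpeech_ = true;
            silentRun_ = 0;
            return false;
        }
        if (!heardSpeech_) return false;  // still waiting for speech to start
        return ++silentRun_ >= hangover_;
    }

private:
    double threshold_;
    int hangover_;
    bool heardSpeech_ = false;
    int silentRun_ = 0;
};
```

For example, with 20 ms chunks at 16 kHz (320 samples each), a hangover of 25 chunks would end the utterance after about half a second of silence, at which point the relay could publish its stop message.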
@romkabouter Thank you for your fast reply. By audio quality I meant the quality of the recorded audio that is sent to Rhasspy. So how is the speech recognition performance with Rhasspy? Is the quality good enough to cover one room? Which one do you think is best?
Thank you!
I think that is fine; I had no problems with Rhasspy using it. I was in a room of about 30 m², but your mileage may vary. It also depends on your surroundings.
Small update: I had the cores switched. The default core for tasks is 1.
The audio task should therefore run on core 0, not on core 1, for better performance.
I was getting messages dropping out.
@romkabouter I now noticed the same behaviour when using my laptop as a satellite, but just once. So I don’t think it’s an issue in your code: it’s just that it’s triggered much more frequently with the Atom Echo’s lower-quality microphone and/or speaker.
The Wi-Fi code runs on core 0, and I think it consumes a lot of the core's capacity depending on what it is doing.
It should be OK, but apparently you need to be careful, as it's quite easy to set off a core 0 panic.
The task priority is set to 3, so the wifi task should be able to handle it.
Setting the audio stream task to core 1 put too much pressure on core 1 (since that is the default core for Arduino code, if I am not mistaken).
In any case, with the stream task pinned to core 1, the audio flow was flaky.
With the task running on core 0, it works well.
Yeah, Arduino code runs on core 1, as core 0 is running the FreeRTOS and networking stack.
PS: I got the two Ai Thinker A1S modules for £4 each, with an AC101 audio codec onboard; they make the new Raspberry Pi Pico look a poor choice.
The breakout boards were for a standard ESP32, so I may just solder directly to the back. With my MS eyes and hands that might be optimistic; I will just have to be patient.