Rebranded the Matrix Voice to esp32-rhasspy-satellite

Hi all!

I have rewritten the entire code of the Matrix Voice streamer.
Since it supports multiple devices now, I have rebranded the repo as well.

The code now uses a state machine, and it is easy to add devices.
Each device can implement the methods it needs to work as a satellite.

Supported:
M5 Atom Echo:

AI tinker AudioKit:

Matrix Voice:

The repo can be found here:

Impressive rewrite! I have been following this development for a while now, and I have updated your repo information in awesome-rhasspy too.

I’ll try it soon on my M5Stack Atom Echo.

Thanks for that :slight_smile:

I’d really like local hotword detection again, so I’m waiting for Porcupine.

Hi,
This looks exciting to me, because I am not a great programmer.
Is it now possible to easily combine any ESP32 development board like this… and an I2S mic array board like this?

Yes.

The software will run on every ESP32; it is up to you to create a new device which implements readAudio() at a bare minimum.
This is the method that records audio.
Since the device you mention also has LEDs, you can further implement updateColors() and updateBrightness().

You can find the current possible methods here:
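
A minimal sketch of what such a device could look like. Only the method names readAudio(), updateColors() and updateBrightness() come from this thread; the `Device` base class, the signatures, the `MyMicBoard` name and the 0-100 brightness range are assumptions for illustration, so check the repo for the real interface.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical base class: method names are from this thread,
// the signatures are assumed.
class Device {
public:
    virtual ~Device() = default;
    // Fill `data` with `size` bytes of recorded audio; return true on success.
    virtual bool readAudio(uint8_t* data, size_t size) = 0;
    // Optional hooks for boards with LEDs; the defaults do nothing.
    virtual void updateColors(int colors) { (void)colors; }
    virtual void updateBrightness(int brightness) { (void)brightness; }
};

// Hypothetical board with an I2S microphone and LEDs.
class MyMicBoard : public Device {
public:
    bool readAudio(uint8_t* data, size_t size) override {
        if (data == nullptr) return false;
        // Placeholder: a real device would read from its I2S driver here.
        std::memset(data, 0, size);
        return true;
    }
    void updateBrightness(int brightness) override {
        // Clamp to an assumed 0-100 percent range before storing.
        if (brightness < 0) brightness = 0;
        if (brightness > 100) brightness = 100;
        brightness_ = brightness;
    }
    int brightness() const { return brightness_; }

private:
    int brightness_ = 0;
};
```

The point of the design is that a board without LEDs only has to override readAudio(), while the optional hooks fall back to empty defaults.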

Ok, I have uploaded the firmware to my M5Stack Atom Echo, and it’s really easy to use! For now I’m using the button to start a session instead of using a wake word. Granted, the audio quality isn’t good, but this works nicely as a low-end satellite for testing Rhasspy Hermes apps.

One thing I noticed is that sometimes, after pushing the button, the device immediately stops recording and thinks it has captured an intent for a short word, “yes”. Is this a known issue?

@romkabouter Wow that sounds great!
Do you have any experience with which hardware gives the best sound quality?
Is the Matrix Voice worth the money, or is the AI tinker AudioKit as good as the Matrix Voice?
Any suggestions on what hardware I should start with?
Thank you!

Thanks, indeed the sound is not great, but I think that is more an issue of the M5 itself. Sound with the factory software as a BT sink was also not great. It is a pretty small device and speaker, so that is to be expected.

I have noticed that as well, but I have not put any effort into it yet. The hardware button just publishes a message to startSession, which in turn triggers the HotwordDetected state. Maybe Rhasspy detects silence straight after that. I need to take a look at that.

You can set the wakeword to local, then it should NOT send audio when Idle but only when you press the button and the HotwordDetected state is triggered.
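
A minimal sketch of that behaviour, assuming only the two states named in this thread (Idle and HotwordDetected). The event names, the `localWakeword` flag and both functions are hypothetical, and the assumption that audio is streamed while Idle when the wake word is not local is mine; the real firmware may well define more states.

```cpp
// States named in this thread; the real firmware may define more.
enum class State { Idle, HotwordDetected };

// Hypothetical events, named for illustration only.
enum class Event { StartSession, SessionEnded };

// Return the next state for a given state/event pair.
State nextState(State current, Event event) {
    switch (event) {
        case Event::StartSession: return State::HotwordDetected;
        case Event::SessionEnded: return State::Idle;
    }
    return current;  // unknown event: stay where we are
}

// Decide whether audio frames should be sent in the given state.
// `localWakeword` is an assumed flag mirroring the "local" wake word setting.
bool shouldSendAudio(State state, bool localWakeword) {
    if (state == State::HotwordDetected) return true;
    // Idle: stream only when the wake word is detected elsewhere (assumption).
    return !localWakeword;
}
```

With the wake word set to local, a button press (StartSession) is then the only way to get from Idle to a state in which audio is sent.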

Well, the Matrix Voice can only play audio at a 44100 Hz sample rate.
Receiving audio at that rate does not work well, and you will hear hissing sounds very often.
I therefore recommend a sample rate no higher than 22050 Hz; the software resamples it for playback on the Matrix Voice. It does not do a very good job at that, however.
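
To illustrate what going from 22050 Hz to 44100 Hz involves, here is a minimal sketch of 2x upsampling by linear interpolation for 16-bit mono PCM. This is not necessarily how the firmware resamples; the function name `upsample2x` and the choice of linear interpolation are assumptions for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Upsample 16-bit mono PCM by a factor of two (e.g. 22050 Hz -> 44100 Hz)
// by inserting the average of each pair of neighbouring samples.
std::vector<int16_t> upsample2x(const std::vector<int16_t>& in) {
    std::vector<int16_t> out;
    out.reserve(in.size() * 2);
    for (size_t i = 0; i < in.size(); ++i) {
        const int32_t a = in[i];
        // At the end there is no neighbour, so repeat the last sample.
        const int32_t b = (i + 1 < in.size()) ? in[i + 1] : a;
        out.push_back(static_cast<int16_t>(a));
        out.push_back(static_cast<int16_t>((a + b) / 2));
    }
    return out;
}
```

For example, the input {0, 100, -100} becomes {0, 50, 100, 0, -100, -100}. Simple interpolation like this has no proper low-pass filter, which is one reason cheap resampling can sound rough.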

The Matrix Voice is a nice device, but its limited audio playback support makes me say that it might be better to get an AudioKit or an M5 Atom Echo. Both of them are much cheaper. I do not own an AudioKit, but it has the same I2S support as the M5 Atom Echo.

If you have no need to play audio, then I think the Matrix Voice might be better. Although much bigger, it has shiny LEDs :smiley:
The M5 Atom Echo, on the other hand, is much more of a finished device, coming in a nice little case and all.
The AudioKit does not have a case, and neither does the Matrix Voice.

Basically it boils down to, as always, “it depends”.
If you want great sound quality, you can build a device yourself with a good speaker powered by an esp32 running this software. I will accept pull requests for new devices :slight_smile:

Still good @romkabouter, as https://uk.banggood.com/ESP32-Aduio-Kit-WiFi-bluetooth-Module-ESP32-Serial-to-WiFi-Audio-Development-Board-with-ESP32-A1S-p-1449256.html is £10 and has an AC101 codec, which should give pretty good quality audio.

The AudioKit has bugged me for a while: for me, the ESP32 and audio part is great, but the rest of the dev board is redundant.

I cannot find a simple, small dev kit anywhere, so I will let you know how soldering those goes. To be honest, I could just solder the audio inputs and 3.3 V directly, but those adapter boards are so cheap that I thought I would give it a go.

I got x2 with 2xa1s for £10, so I will let you know how they get on after the slow boat from China.

I also have what is hopefully a killer KWS. So that you don’t get trapped by the obsolescence of a system, from KW they will broadcast until they get an MQTT message to stop, and that is it: no Rhasspy specifics, as a simple server-side app will have to act as a bridge/relay.
I haven’t found VAD apart from the one in the ADF, and I haven’t checked how well that works; if it doesn’t, a server can still run VAD on the incoming chunks.

@romkabouter Thank you for your fast reply. By audio quality I meant the quality of the recorded audio that is sent to Rhasspy. So how is the speech recognition performance with Rhasspy? Is the quality good enough to cover one room? What do you think is the best one?
Thank you!

Nice, I’d like to check it out when you’re done :slight_smile:

I think that is fine; I had no problems using it with Rhasspy. I was in a room of about 30 m², but your mileage may vary. It also depends on your surroundings.

Small update: I have switched the cores. The default core for tasks is 1.
The audio task should therefore run on core 0 rather than core 1 for better performance.
I was getting messages dropping out.

Please check release 7.1