Rebranded the Matrix Voice to esp32-rhasspy-satellite

romkabouter · January 13, 2021, 8:28pm

Hi all!

I have rewritten the entire code of the Matrix Voice streamer.
Since it supports multiple devices now, I have rebranded the repo as well.

The code now uses a state machine, and it is easy to add devices.
Each device can implement the needed methods to make it work as a satellite.

Supported:
M5 Atom Echo:

AI Thinker AudioKit:

Matrix Voice:

The repo can be found here:

koan · January 13, 2021, 8:53pm

Impressive rewrite! I have been following this development for a while now, and I have updated your repo information in awesome-rhasspy too.

I’ll try it soon on my M5Stack Atom Echo.

romkabouter · January 15, 2021, 2:00pm

Thanks for that

I’d really like local hotword again, so waiting for porcupine

Sebastian_Seitz · January 15, 2021, 8:38pm

Hi,
This looks exciting to me, because I am not a great programmer.
Is it now possible to easily combine any ESP32 development board like this… and an I2S mic array board like this?

romkabouter · January 16, 2021, 7:57am

Yes.

The software will run on every ESP32; it is up to you to create a new device which implements readAudio() at a bare minimum.
This is the method that records audio.
Since the device you mention also has LEDs, you can further implement updateColors() and updateBrightness().

You can find the current possible methods here:

github.com/Romkabouter/ESP32-Rhasspy-Satellite/blob/master/PlatformIO/src/device.h

int hotword_colors[4] = {0, 255, 0, 0};
int idle_colors[4] = {0, 0, 255, 0};
int wifi_conn_colors[4] = {0, 0, 255, 0};
int wifi_disc_colors[4] = {255, 0, 0, 0};
int ota_colors[4] = {0, 0, 0, 255};
enum {
  COLORS_HOTWORD = 0,
  COLORS_WIFI_CONNECTED = 1,
  COLORS_WIFI_DISCONNECTED = 2,
  COLORS_IDLE = 3,
  COLORS_OTA = 4
};
enum {
  MODE_MIC = 0,
  MODE_SPK = 1
};

//Devices can have several outputs. 
//The Matrix Voice has an jack and a speaker
enum {

This file has been truncated.
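To make the "new device" idea concrete, here is a minimal sketch of what such a device class could look like. It is not taken from the repo: the `Device` base class below is a simplified stand-in for the one in device.h, the method signatures are assumptions, and `MyMicArray`, `currentColors` and `currentBrightness` are hypothetical names. It compiles on a PC, so the I2S read is replaced by a placeholder.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Simplified stand-in for the base class in device.h (assumed shape, not a copy).
class Device {
public:
    virtual ~Device() = default;
    // LED hooks are optional: the defaults do nothing.
    virtual void updateColors(int colors) { (void)colors; }
    virtual void updateBrightness(int brightness) { (void)brightness; }
    // Audio capture is the one method every device has to provide.
    virtual bool readAudio(uint8_t *data, size_t size) = 0;
};

// Hypothetical board: an I2S microphone array with LEDs.
class MyMicArray : public Device {
public:
    void updateColors(int colors) override { currentColors = colors; }
    void updateBrightness(int brightness) override { currentBrightness = brightness; }

    bool readAudio(uint8_t *data, size_t size) override {
        if (data == nullptr || size == 0) return false;
        // Placeholder: a real device would read from its I2S driver here.
        // This sketch just fills the buffer with silence.
        std::memset(data, 0, size);
        return true;
    }

    int currentColors = -1;    // last color set requested, -1 = none yet
    int currentBrightness = 0; // last brightness requested
};
```

Only readAudio() is mandatory in this sketch; a board without LEDs would simply leave the two LED methods alone.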

koan · January 20, 2021, 8:09pm

Ok, I have uploaded the firmware to my M5Stack Atom Echo, and it’s really easy to use! For now I’m using the button to start a session instead of using a wake word. Granted, the audio quality isn’t good, but this works nicely as a low-end satellite for testing Rhasspy Hermes apps.

One thing I noticed is that sometimes after pushing the button the device immediately stops recording and thinks it captures an intent for a short word, “yes”. Is this a known issue?

alex4444 · January 20, 2021, 9:19pm

@romkabouter Wow that sounds great!
Do you have any experience with which hardware gives the best sound quality?
Is the Matrix Voice worth the money, or is the AI Thinker AudioKit as good as the Matrix Voice?
Any suggestions what hardware I should start with?
Thank you!

romkabouter · January 20, 2021, 10:31pm

Thanks, indeed the sound is not great but I think that is more the issue of the M5. Sound with the factory software as a BT sink was also not great. It is a pretty small device and speaker so that is to be expected.

I have noticed that as well, I have not put effort in that yet. The hardware button just publishes a message to startSession, which in turn triggers the HotwordDetected state. Maybe a silence is detected straight after that by Rhasspy. Need to take a look at that.

You can set the wakeword to local, then it should NOT send audio when Idle but only when you press the button and the HotwordDetected state is triggered.
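A rough sketch of that behaviour, for anyone who wants to picture it. This is not the firmware's actual state machine: it only has the two states mentioned here, it skips the MQTT startSession round trip and goes straight to HotwordDetected on a button press, and the names `Satellite`, `Event` and `shouldSendAudio` are made up for the example.

```cpp
#include <cstddef>

enum class State { Idle, HotwordDetected };
enum class Event { ButtonPressed, SessionEnded };

class Satellite {
public:
    // localWakeword = true models the "wake word set to local" setting.
    explicit Satellite(bool localWakeword) : localWakeword_(localWakeword) {}

    // Apply one event; events that make no sense in the current state are ignored.
    void handle(Event e) {
        if (state_ == State::Idle && e == Event::ButtonPressed) {
            state_ = State::HotwordDetected;
        } else if (state_ == State::HotwordDetected && e == Event::SessionEnded) {
            state_ = State::Idle;
        }
    }

    // During a session audio is always streamed. While idle it is only
    // streamed when the wake word is detected remotely.
    bool shouldSendAudio() const {
        if (state_ == State::HotwordDetected) return true;
        return !localWakeword_;
    }

    State state() const { return state_; }

private:
    bool localWakeword_;
    State state_ = State::Idle;
};
```

With the wake word set to local, shouldSendAudio() is false until the button is pressed and false again once the session has ended.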

romkabouter · January 20, 2021, 10:45pm

Well, the Matrix Voice can only play audio at a 44100 Hz sample rate.
Receiving audio at that rate does not work well, and you will hear hissing sounds very often.
I therefore recommend a sample rate no higher than 22050 Hz; the software resamples it for playback on the Matrix Voice. It does not do a very good job at that, however.
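To illustrate what going from 22050 to 44100 involves, here is a minimal sketch of doubling the sample rate with linear interpolation. It is not the resampler used in the firmware; it assumes mono 16-bit samples, and `upsample2x` is a hypothetical helper name.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Double the sample rate (e.g. 22050 -> 44100) by inserting, after each
// sample, the average of that sample and the next one.
std::vector<int16_t> upsample2x(const std::vector<int16_t> &in) {
    std::vector<int16_t> out;
    out.reserve(in.size() * 2);
    for (size_t i = 0; i < in.size(); ++i) {
        const int32_t a = in[i];
        // The last sample has no successor, so it is simply repeated.
        const int32_t b = (i + 1 < in.size()) ? in[i + 1] : a;
        out.push_back(static_cast<int16_t>(a));
        out.push_back(static_cast<int16_t>((a + b) / 2));
    }
    return out;
}
```

Linear interpolation is cheap enough for a microcontroller but does no filtering, so it is a simple approach rather than a high-quality one.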

The Matrix Voice is a nice device, but its lack of support for audio playback makes me say that it might be better to get an AudioKit or an M5 Atom Echo. Both of them are much cheaper. I do not own an AudioKit, but it has the same I2S support as the M5 Atom Echo.

If you have no need to play audio, then I think the Matrix Voice might be better. Although much bigger, it has shiny LEDs.
The M5 Atom Echo, on the other hand, is much more of a finished device, coming in a nice little case and all.
The AudioKit does not have a case, and neither does the Matrix Voice.

Basically it boils down to, as always, “it depends”.
If you want great sound quality, you can build a device yourself with a good speaker, powered by an ESP32 running this software. I will accept pull requests for new devices.

rolyan_trauts · January 21, 2021, 12:27am

Still good @romkabouter, as https://uk.banggood.com/ESP32-Aduio-Kit-WiFi-bluetooth-Module-ESP32-Serial-to-WiFi-Audio-Development-Board-with-ESP32-A1S-p-1449256.html is £10 and has an AC101 codec, which should give pretty good audio quality.

The audiokit has bugged me for a while as for me the esp32 and audio is great but the rest of the dev board is redundant.

I cannot find a simple, small dev kit anywhere, so I will let you know how soldering those goes. To be honest, I could just solder the audio inputs and 3.3 V directly, but those adapter boards are so cheap that I thought I would give it a go.

I got 2x with 2x A1S for £10, so I will let you know how they go after the slow boat from China.

I also have what is hopefully a killer KWS. So that you don’t get trapped by the obsolescence of a system, it stays simple: from the keyword on, the satellites broadcast until they get an MQTT message to stop, and that is it. There are no Rhasspy specifics, as a simple server-side app will have to act as a bridge/relay.
I haven’t found a VAD apart from the one in the ADF, and I haven’t checked how well that works. If it doesn’t, a server can still run VAD on the incoming chunks.
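As a very rough idea of what "run VAD on the incoming chunks" could mean at its simplest, here is a sketch of an energy threshold check on one chunk of 16-bit samples. It is not the ADF VAD or any real VAD library, just the most basic approach; `chunkHasSpeech` and the threshold value are assumptions and would need tuning per microphone.

```cpp
#include <cstddef>
#include <cstdint>

// Returns true when the mean absolute amplitude of the chunk is above
// the threshold. Empty or missing chunks count as silence.
bool chunkHasSpeech(const int16_t *samples, size_t count, double threshold) {
    if (samples == nullptr || count == 0) return false;
    double sum = 0.0;
    for (size_t i = 0; i < count; ++i) {
        const double s = samples[i];
        sum += (s < 0.0) ? -s : s;
    }
    return (sum / static_cast<double>(count)) > threshold;
}
```

A server-side relay could call this per chunk and send the stop message after a run of chunks that come back false.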

alex4444 · January 21, 2021, 7:33am

@romkabouter Thank you for your fast reply. By audio quality I meant the quality of the recorded audio that is sent to Rhasspy. So how is the speech recognition performance with Rhasspy? Is the quality good enough to cover one room? Which one do you think is best?
Thank you!

romkabouter · January 21, 2021, 8:25am

Nice, I’d like to check it when you’re done.

I think that is fine; I had no problems with Rhasspy with it. I was in a room of about 30 m², but your mileage may vary. It also depends on your surroundings.

romkabouter · January 21, 2021, 7:43pm

Small update: I have switched the cores. The default core for tasks is 1.
The audio task should therefore run on core 0 rather than core 1 for better performance.
I was getting dropouts in the messages.

Please check release 7.1

koan · January 26, 2021, 10:29am

@romkabouter I now noticed the same behaviour when using my laptop as a satellite, but just once. So I don’t think it’s an issue in your code: it’s just that it’s triggered much more frequently with the Atom Echo’s lower-quality microphone and/or speaker.

romkabouter · January 26, 2021, 11:26am

ok great, thank you for the feedback

rolyan_trauts · January 26, 2021, 12:17pm

The WiFi code runs on core 0, and I think it consumes a lot of that core’s capacity, depending on what it is doing.
It should be OK, but apparently you need to be careful, as it’s quite easy to set off a core 0 panic.

romkabouter · January 26, 2021, 7:08pm

The task priority is set to 3, so the wifi task should be able to handle it.
Setting the audiostream task to core 1 put too much pressure on core 1 (since that is the default core for Arduino code, if I am not mistaken).

In any case, with the stream task pinned to core 1 the audio flow was flaky.
With the task running on core 0, it works well.
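For reference, pinning a task to a core on the ESP32 is done with FreeRTOS's `xTaskCreatePinnedToCore`. The sketch below shows the idea with the priority (3) and core (0) mentioned above. It is not the firmware's code: the task name, the stack size, `TaskConfig`, `kAudioTask`, `audioStreamTask` and `startAudioTask` are placeholders. Off the ESP32 it compiles to a stub so the settings can still be checked on a PC.

```cpp
#include <cstdint>

#if defined(ESP_PLATFORM)
#include "freertos/FreeRTOS.h"
#include "freertos/task.h"
#endif

struct TaskConfig {
    const char *name;
    uint32_t stackSize; // stack size for the task (placeholder value below)
    unsigned priority;  // FreeRTOS priority
    int core;           // 0 or 1 on a dual-core ESP32
};

// Priority 3 on core 0, as described in this thread; the rest is made up.
constexpr TaskConfig kAudioTask{"audioStream", 4096, 3, 0};

// Task body: a real one would read audio and publish it in this loop.
void audioStreamTask(void *) {
#if defined(ESP_PLATFORM)
    for (;;) {
        vTaskDelay(1); // yield so other tasks on the same core can run
    }
#endif
}

// Returns true when the task was created. On a non-ESP32 build there is
// nothing to pin, so this returns false.
bool startAudioTask(const TaskConfig &cfg) {
#if defined(ESP_PLATFORM)
    return xTaskCreatePinnedToCore(audioStreamTask, cfg.name, cfg.stackSize,
                                   nullptr, cfg.priority, nullptr,
                                   cfg.core) == pdPASS;
#else
    (void)cfg;
    return false;
#endif
}
```

Changing `core` to 1 in `kAudioTask` would reproduce the earlier setup, where the stream task competed with the Arduino code on core 1.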

rolyan_trauts · January 27, 2021, 12:00am

Yeah arduino code runs on Core 1 as Core 0 is running freertos & networking stack

PS: I got the 2x Ai Thinker A1S modules for £4 each, with an AC101 audio codec onboard, which makes the new Raspberry Pi Pico look like a poor choice.
The breakout boards were for the standard ESP32, so I may just solder directly to the back, though with my MS eyes and hands that might be optimism. I will just have to be patient.

romkabouter · January 27, 2021, 11:41am

Yeah, the pico does not cut it I guess. Good luck soldering!

rolyan_trauts · January 27, 2021, 2:06pm

Not for audio or WiFi/BT, but the Pico has USB, which the ESP32 doesn’t. Then again, the ESP32 also runs at 240 MHz.

I have an A1S AudioDevKit to test on, as I am not sure whether I will get some small boards built or solder directly.