I saw the DSP and VHDL code used for the Matrix Voice. I don’t know if I really need to reach that kind of complexity, but it’s something to keep in mind.
Thanks! I also found this, very similar (same mechanism on the kws part):
THIS is definitely something to look at, thanks!
Yep! once tested
I made a new component that merges the INMP/MAX with the blink/status part of the M5 Echo LED, and I’ll open a pull request once it’s ready!
The Satellite will ask the Base “what time is it?”; the Base sends the TTS to a topic similar to hermes/audioServer/baseID/audioFrame (sorry, I don’t remember the correct topic and I’m at work now), but the Satellite is subscribed to hermes/audioServer/satelliteID/audioFrame.
The only way to make it work is to change the baseID and the satelliteID so that they are equal.
I need the Base to understand that the request came from satellite1 and publish the audio output with the topic to which the satellite1 is subscribed.
Is it something related to this? I think that the small form factor can influence the overall performance… also the sound output is very poor.
I think you have some incorrect settings.
The base listens to all hermes/audioServer/<siteID>/audioFrame topics for every siteId you enter (comma-separated) in the “Satellite siteIds:” field in the various setting sections.
Make sure your base and sat have different siteIds.
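To make that setting concrete, here is a small Python sketch of what the comma-separated field amounts to. The function names and the example IDs ("base", "satellite1", "satellite2") are made up for illustration and this is not Rhasspy's actual code; only the topic pattern comes from the post above.

```python
def parse_site_ids(field: str) -> list[str]:
    """Split a comma-separated 'Satellite siteIds' value into clean IDs."""
    return [part.strip() for part in field.split(",") if part.strip()]


def audio_frame_topics(site_ids: list[str]) -> list[str]:
    """Topics the base would listen on, one per configured siteId."""
    return [f"hermes/audioServer/{site_id}/audioFrame" for site_id in site_ids]


def check_unique(base_id: str, satellite_ids: list[str]) -> None:
    """Raise if the base shares a siteId with a satellite, or if IDs repeat."""
    all_ids = [base_id, *satellite_ids]
    if len(set(all_ids)) != len(all_ids):
        raise ValueError(f"siteIds must be unique, got {all_ids}")


# Hypothetical values, as they might be typed into the settings field.
satellites = parse_site_ids("satellite1, satellite2")
check_unique("base", satellites)
topics = audio_frame_topics(satellites)
print(topics)
```

If the base and a satellite were given the same siteId, `check_unique` would raise, which is the situation the advice above is meant to avoid.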
It’s the opposite: I want the Satellite to play the TTS produced by the Base.
In the example, I ask the Satellite “What time is it?”. It sends the audio sentence to the Base, which triggers the automation (Home Assistant) that produces the answer. Home Assistant sends the time text to the Base for the TTS.
The Base publishes the produced TTS on the topic hermes/audioServer/Base/audioFrame.
But the Satellite is subscribed only to the topic hermes/audioServer/Satellite/audioFrame.
So no sound is produced!
It works only if the Satellite is subscribed to the hermes/audioServer/Base/audioFrame topic, hence the need to have the two identical IDs.
I’d like to find a way for the Base to publish the audio on the topic of the Satellite that triggered the request.
Thanks!
So maybe a Matrix Voice can be a faster solution? Do u have any experience with that?
I don’t know how Home Assistant handles that, but in general any automation system should be able to send a response back via Rhasspy to the satellite that was used for the (voice) input.
Using the “base’s” resources to generate audio via its TTS system is possible; see once more Tutorials - Rhasspy, especially the graphic in “Shared MQTT Broker”.
As @romkabouter has already mentioned: you will have to put each satellite in the base’s list of satellites to serve with this specific service. Make sure each satellite has a unique name.
I also had some trouble at first because I was thinking of “publishing” as sending messages from one machine to another. But it’s actually easier than that.
Make sure your base and satellite machines are all given unique SiteIDs, like:
On your Base Rhasspy, under settings for the services you want to call from your satellites, simply enter the SiteIDs into the Satellite IDs field, like:
This tells the base to listen for (and respond to) messages from any of these satellites.
That is it !!
Behind the scenes
The satellite does NOT send any message to baseID. Instead:
The satellite publishes a message using the appropriate topic which includes its own satelliteID.
The base Rhasspy is subscribed to (listening to) messages for the satelliteID (and any other satellite you specified).
The base then publishes its response with a topic which includes the satelliteID.
Both satellite and base are publishing messages with the same siteID in the topic.
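A minimal Python sketch of the flow just described, with no real MQTT involved. The class and function names are hypothetical, and the reply topic is an assumption: as far as I know, Rhasspy's Hermes protocol plays audio on hermes/audioServer/<siteId>/playBytes/<requestId>, while audioFrame carries microphone audio. The point is only that the siteId in the incoming topic is reused in the reply.

```python
import re

# Assumed topic layout: hermes/audioServer/<siteId>/<kind>[/...]
TOPIC_RE = re.compile(r"^hermes/audioServer/(?P<site_id>[^/]+)/(?P<kind>[^/]+)")


def site_id_from_topic(topic: str) -> str:
    """Extract the siteId segment from an audioServer topic."""
    match = TOPIC_RE.match(topic)
    if match is None:
        raise ValueError(f"not an audioServer topic: {topic}")
    return match.group("site_id")


class Base:
    """Toy stand-in for the base: it only decides which topic to reply on."""

    def __init__(self, satellite_ids):
        self.satellite_ids = set(satellite_ids)

    def handle(self, incoming_topic: str, request_id: str):
        """Return the reply topic, or None if the siteId is not served."""
        site_id = site_id_from_topic(incoming_topic)
        if site_id not in self.satellite_ids:
            return None  # the base is not listening for this satellite
        # The reply carries the satellite's own siteId, not the base's.
        return f"hermes/audioServer/{site_id}/playBytes/{request_id}"


base = Base(["satellite1", "satellite2"])
reply = base.handle("hermes/audioServer/satellite1/audioFrame", "req-1")
print(reply)
```

A satellite that was not entered in the base's list would simply get no reply, which matches the "no sound" symptom earlier in the thread.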
Reading the wiki, I made some mistakes… I’ll try a different configuration and retest… thanks!
On the hardware side, I’m kinda stuck for the satellite. I’m tempted to try the ESP32-LyraTD with the built-in FPGA but, even though it’s a two-year-old piece of hardware, I didn’t find any kind of test, experiment or blog post in the usual communities (like Hackaday). Do you think it’s worth a try? It’s cheaper than a raspi lol.
I knew it was the wrong place to ask such a question lol
Even a Korvo1 or Korvo2; all of them cost from 50 to 70€.
I first need to work out which one is best for my purposes.
If it weren’t for stock shortages and scalpers, the Raspberry Pi Zero 2 is only $15 and very hard to beat.
The standard esp32 with no PSRAM is very short of resources, and still is even with it.
The newer esp32-s3 has much more scope, but prices on those are much higher, and the esp32-box they did is, like the Zero, currently out of stock; I’m not sure what happened to the supposed esp32-box-lite release.
It’s great as a showcase of what you can do when pushing the newer esp32-s3 chips to the max, but it contains quite a lot of Espressif closed-source blobs.
Personally I just want the ADC & ESP32-S3 on a low-cost dev board, as the rest is superfluous to my needs, but unfortunately that doesn’t exist.
I would check whether Mouser is up to date, as I don’t think Espressif sees itself as a product supplier; it merely did some limited runs as a product showcase.
It was showcased, and we have the documented design along with software libs to provide various low-cost wireless designs, but they would likely need to be fabbed. Try Mouser though, just in case.
I have rhasspy set to aplay and the sound is coming out of the speaker plugged into the matrix voice. Broker is set to external. Here is an HA automation which, with help, I used to format the time and reply to the matrix being asked “What is the time”.
service: mqtt.publish
data:
  topic: hermes/dialogueManager/endSession
  payload_template: |-
    {% if now().strftime("%M") | int == 0 %}
      {"sessionId": "{{ trigger.event.data._intent.sessionId }}", "text": "The time is {{ now().hour }} hundred "}
    {% elif now().strftime("%M") | int < 10 %}
      {"sessionId": "{{ trigger.event.data._intent.sessionId }}", "text": "The time is {{ now().hour }} oh {{ now().minute }} "}
    {% else %}
      {"sessionId": "{{ trigger.event.data._intent.sessionId }}", "text": "The time is {{ now().hour }} {{ now().minute }} "}
    {% endif %}
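For anyone who wants to check the wording outside Home Assistant, the same three-way branch can be sketched in Python. This mirrors the template logic only; the function names and the session ID are made up, and unlike the template it does not leave a trailing space in the text.

```python
import json


def spoken_time(hour: int, minute: int) -> str:
    """Build the sentence using the same three cases as the template."""
    if minute == 0:
        return f"The time is {hour} hundred"
    if minute < 10:
        return f"The time is {hour} oh {minute}"
    return f"The time is {hour} {minute}"


def end_session_payload(session_id: str, hour: int, minute: int) -> str:
    """JSON body for hermes/dialogueManager/endSession, as in the automation."""
    return json.dumps({"sessionId": session_id, "text": spoken_time(hour, minute)})


# "abc-123" is a placeholder; the real ID comes from the intent event.
payload = end_session_payload("abc-123", 9, 5)
print(payload)
```

Building the payload with `json.dumps` also takes care of quoting, which is easy to get wrong when the JSON is written by hand inside a template.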
Cool, I thought everywhere was out of stock, so I had stopped checking, as I got myself an esp32-s3-box-lite.
I will have both when the lite arrives, as I already have the esp32-s3-box. The 4 channel ADC on the esp32-s3-box is hard to source; the lite version doesn’t have the dock and uses a 2 channel ADC, which is easy to source, so I’ve been wondering if the firmware can be mangled to run on a standard esp32-s3 dev kit.
I can hack a bit of C and I am groaning at the idea, but I’m interested in whether there is any noticeable difference in recognition. So I can test that at least.
New items here!
I’m taking my first steps with IDF and ADF… so far so good. I miss PlatformIO and some of its useful features a bit, but it’s fine.
I started with the LyraTD-MSC since I think it will be the easiest for my main objective (it’s very similar to the Matrix Voice: a normal ESP32 with a DSP, mic array and so on).
Once I have some experience and understand the wiring and components, I’ll try to tinker a bit with the ESP32-Rhasspy-Satellite and use it. @romkabouter: I’ll keep you updated!
I got the esp32-s3-box-lite and it seems to work just as well as the non-lite version, so maybe that firmware can be hacked to use 2x I2S mics instead.
Haven’t really done anything dev-wise.
I think the KWS might be non-streaming, with quite a low-speed rolling window, as it seems to show the effects of that sort of method: if the window and timing are off, you can get false positives.
I’m not exactly sure of that, as it seems to work much better in certain positions, so it could have been more about position.
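To show what I mean by window and timing being off, here is a toy Python sketch of a rolling window. It says nothing about how Espressif's KWS actually works; the frame counts are invented and it only illustrates that with a coarse hop a keyword may never sit fully inside any single window.

```python
def window_starts(total_frames: int, window: int, hop: int) -> list[int]:
    """Start indices of every full window when sliding by `hop` frames."""
    return list(range(0, max(total_frames - window, 0) + 1, hop))


def windows_containing(kw_start, kw_end, total_frames, window, hop):
    """Starts of the windows that contain the whole keyword span."""
    return [
        start
        for start in window_starts(total_frames, window, hop)
        if start <= kw_start and kw_end <= start + window
    ]


# Invented numbers: 200 frames of audio, a 100-frame window,
# and a keyword spoken between frames 60 and 140.
for hop in (10, 50, 100):
    hits = windows_containing(60, 140, total_frames=200, window=100, hop=hop)
    print(f"hop={hop}: {len(hits)} window(s) see the whole keyword")
```

With a small hop several windows see the complete keyword; with a hop as large as the window, every window only sees a fragment, and a model scoring fragments is where I would expect odd detections to come from.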
It does work quite well and can operate under a bit of third-party noise, but at full blast its AEC doesn’t seem that great.
The internal amp and speaker are tiny and it does sound like a Barbie toy; I would only use it for bleeps and announcement sounds rather than any form of media or voice output.