Slightly confused about rhasspy-satellite

rolyan_trauts · May 4, 2020, 3:43am

I am trying to get my head round rhasspy-satellite, and whether it is wrongly named or I need to think of another name for my infrastructure.
I probably had preconceived ideas of what a satellite should be, and maybe I am being too specific, but what I want is a mode where voice clients have a really simple layout and are basically mic/speaker Wi-Fi satellites to a single, more powerful SoC.

I have been playing with Snapcast for this and it looks like it should very much fit the bill. I still need to run actual tests with multiple clients on one host, but I think you can hardcode the IP and port if needed.

Take a simple two-satellite system of left/right speaker/mic: broadcasting is just standard Snapcast with the channels of the stream set.
Each satellite can also contain a server. The thing to test is multiple streams/instances on the client side, as I am not sure whether this can be a single server or instance, but there is always Docker to the rescue.
Each stream can go to a loopback, and on the other side it would just look like another ALSA source.

So essentially that is my satellite mode, and it has some advantages, as the audio processing load is pushed out to the left/right satellites, or however many you include in a room.
I am looking at the collection of libraries for satellites and wondering why you would need so much if these are satellites only, as it would seem to be practically a complete Rhasspy install.

Even KWS is not needed on a satellite, although an initial KWS in front of an authoritative KWS brings some load advantages to the server if needed. Everything else should live on the server and not the satellite, as it brings no advantage to satellite load.
By using Snapcast you can vastly reduce latency, as audio streams as it happens without the need for ‘VAD silence’ processing into WAV chunks. The streams of multiple satellites are also latency-adjusted by the shared network time of the snapserver.
The multiple loopback devices can be aggregated via type multi, then routed and summed as a quick start. It would be great to add some finesse to the audio routing of satellites, but as a base it is ready to go.

Looking at the modules included in rhasspy-satellite, is rhasspy-mesh not a more appropriate name?
From the modules included, it would seem a distributed mesh of modules might be the idea?
Naming isn’t really a problem, apart from maybe causing confusion. But I have a very specific, and what would seem a logical, model for a Rhasspy satellite that requires nothing from the current rhasspy-satellite repo, so I am curious what rhasspy-satellite is supposed to be.

I would like to look at adding Rhasspy-specific JSON-RPC calls to Snapcast, to include per-channel VAD for channel selection and to return state signals that could be used for LED indication. Beyond that, given how diverse the uses are, and as far as I can think, that is a complete satellite system.

romkabouter · May 4, 2020, 9:15am

That is the easiest way, and then control what’s active via the settings.
You can also run only the Docker services needed, but for non-power users that is too complicated.

Not in my opinion, mesh implies that satellites also connect to each other and relay messages. Which is not the case.

The Hermes protocol was chosen, or better, evolved, because Snips was shutting down.

rolyan_trauts · May 4, 2020, 11:37am

That is the easiest way, and then control what’s active via the settings.
You can also only run the docker services needed, but for non power users that is too complicated

I sort of disagree, as a satellite only needs to pipe audio to and from a server. You only need to look at the ton of Python libs and Docker instances you have to see that, I have to say, that is just not true.

Not in my opinion, mesh implies that satellites also connect to each other and relay messages. Which is not the case.

I am not so bothered about the name, but I was trying to understand why so much of what is a complete Rhasspy server is in the satellite package. I thought the rationale for that huge amount was so that a satellite could also be a distributed service of Rhasspy, as I couldn’t understand why so much was included in a satellite.
You do seem to have the makings of a mesh, but usually a satellite is much more diminutive than what it orbits, and there seems to be little difference in what’s in the repo.

ced_cox · May 4, 2020, 12:27pm

Not for me. I don’t want X satellites continuously broadcasting audio.

So Hermes is a good choice for me.

Ced

rolyan_trauts · May 4, 2020, 12:30pm

You don’t have to have X satellites broadcasting audio continuously, but you can do so with a simple pipe.
Add VAD or KWS and it only broadcasts from mics that received a good signal, which provides huge advantages in a wide-array microphone system.
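
The gating idea can be sketched as follows. This is only an illustration and not a real VAD: it applies a plain RMS energy threshold to S16_LE frames, and the function names and the threshold of 500 are assumptions made for the example. A frame is forwarded only when its level reaches the threshold.

```python
import math
import struct


def rms(frame: bytes) -> float:
    """Root-mean-square level of a frame of 16-bit little-endian samples."""
    samples = [s for (s,) in struct.iter_unpack("<h", frame)]
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def should_send(frame: bytes, threshold: float = 500.0) -> bool:
    """Hypothetical gate: forward the frame only if it is loud enough."""
    return rms(frame) >= threshold


# Two tiny synthetic frames: near silence and a loud square wave.
quiet = struct.pack("<4h", 0, 3, -2, 1)
loud = struct.pack("<4h", 8000, -8000, 8000, -8000)
print(should_send(quiet), should_send(loud))
```

An energy gate like this stays open in a constantly noisy room, which is where a proper VAD or a keyword spotter would be needed instead.
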
But that isn’t my point. My point is: why is there so much in Rhasspy-satellite? That repo is enormous.
Confused.com !?

ced_cox · May 4, 2020, 12:42pm

Without a hotword? Just VAD? No. In a noisy room, the satellite will send continuously, via a pipe or anything else.
For me, a satellite needs audio recording (via PyAudio), hotword detection, TTS (my choice) and audio playback.
TTS on the satellite is my choice, but you can also use only the server’s.
And you can do these 3-4 things with rhasspy-satellite, with good performance.

Yes, rhasspy-satellite is big, but it’s a project where a lot of modules need to talk to each other. When you download rhasspy-satellite, you may download modules that you will not use but that someone else will. That is why they are in the default rhasspy-satellite package.

But, as someone said before, if you want something light, you can use small Docker images of only the strictly necessary modules.

Ced

rolyan_trauts · May 4, 2020, 12:51pm

I am afraid what you are saying is paradoxical, as many of the Docker services there rely on each other, and going by a guesstimate of what would be required for a functional Rhasspy satellite, that is quite a lot…
My question is why the satellite is so big, and no one seems able to give a reasonable reply apart from it being a choice, as none of that is needed for a satellite.
You don’t even need PyAudio, as you can simply arecord into a pipe with a single CLI command if you wanted to.
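
As a rough sketch of that single command, the snippet below only builds the arecord argument list for raw 16-bit, 16 kHz mono capture to stdout and prints it; it does not start a recording. The helper name and the default device are assumptions for illustration.

```python
import shlex


def arecord_command(device="default", rate=16000, channels=1):
    """Build an arecord command line that writes raw PCM to stdout."""
    return [
        "arecord",
        "-D", device,         # ALSA capture device (assumed: "default")
        "-f", "S16_LE",       # 16-bit signed little-endian samples
        "-r", str(rate),      # sample rate in Hz
        "-c", str(channels),  # number of channels
        "-t", "raw",          # headerless PCM, suitable for piping
        "-q",                 # suppress status messages
    ]


cmd = arecord_command()
print(shlex.join(cmd))
```

To actually stream, the list could be handed to `subprocess.Popen(cmd, stdout=subprocess.PIPE)` and read in fixed-size chunks, or the printed line could be piped into whatever transport is used on the command line.
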

Seriously, it’s paradoxical and nonsensical to have a paragraph that starts with the easiest way and ends with it being too complicated for non-power users?!

That is the easiest way, and then control what’s active via the settings.
You can also only run the docker services needed, but for non power users that is too complicated

I thought I would ask. I don’t get it, obviously, but you actually need very little, in fact really none of it, for a Rhasspy satellite, and there is no paradox in that.

romkabouter · May 4, 2020, 3:02pm

No, the satellite also needs hotword detection. Otherwise your network will be cluttered with audio streams.

rolyan_trauts · May 4, 2020, 3:19pm

There is far more than just KWS in the repo, and no, you don’t have to have KWS on a satellite. Whether or not your network is cluttered by a few compressed audio streams is a choice, and it also depends first of all on VAD.

IMHO it would seem to be bloat. I am not the one developing it, but the arguments for it are generally not true.
Compressed streams of mono 16 kHz audio via FLAC or Vorbis cluttering the network isn’t a valid argument for the rationale of the design.
I am not sure what the bitrate of S16_LE 16000 Hz mono is via FLAC or Vorbis, but it’s not that huge, depending on the compression ratio.
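
The uncompressed figure is simple arithmetic, sketched below. The raw bitrate follows directly from the format; the compression ratio of 0.5 is purely an assumed placeholder, since the real ratio for FLAC or Vorbis depends on the codec settings and the audio itself.

```python
def pcm_bitrate(sample_rate_hz: int, bits_per_sample: int, channels: int) -> int:
    """Bitrate of uncompressed PCM audio in bits per second."""
    return sample_rate_hz * bits_per_sample * channels


# S16_LE, 16000 Hz, mono
raw_bps = pcm_bitrate(16000, 16, 1)

# Assumption for illustration only: the codec halves the stream.
ASSUMED_COMPRESSION_RATIO = 0.5
compressed_bps = raw_bps * ASSUMED_COMPRESSION_RATIO

print(f"raw: {raw_bps / 1000:.0f} kbit/s")
print(f"compressed (assumed ratio): {compressed_bps / 1000:.0f} kbit/s")
```
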

ced_cox · May 4, 2020, 3:31pm

Having a Wi-Fi board that transmits continuously is not an option for me. I prefer that the board transmits only when necessary.

Ced

rolyan_trauts · May 4, 2020, 3:34pm

It might not be an option for you, but the repo you are creating for Rhasspy-satellite is taking options away.
If you wish, you can install KWS or all manner of things, and that is a choice, but much of what is there still isn’t needed for this.

ced_cox · May 4, 2020, 3:47pm

Maybe, but maybe all of this would be used by 90% of people? People want simplicity of use, installation and configuration.

Sure, things could be done differently. That is not the choice that was made for Rhasspy. Maybe you should look at voice2json.

Ced

rolyan_trauts · May 4, 2020, 3:49pm

The thing I keep repeating is that you are not making things simple but doing the opposite: you are adding complexity to Rhasspy-satellite with what are paradoxical rationales.

I get that you guys want the whole Hermes framework. I don’t know why, but it actually isn’t needed, nor does it make things any simpler to use or deploy.

fastjack · May 4, 2020, 4:13pm

If you are talking about Snapcast as an alternative for broadcasting audio frames between satellites and master, I’d say that I’m not a fan of basing such an essential piece of the Rhasspy ecosystem on someone else’s software when we already have a perfectly working and simple solution using MQTT.

MQTT is now the “de facto” standard for IoT devices, and the Hermes protocol is pretty good and also completely open (it has some shortcomings, but they will probably be addressed in the future).

I really like the “pluggable” nature of MQTT.

Also, I’ve used Snapcast extensively and although it is an incredible piece of software (perfect for synchronized audio playback), its group/client/stream management has given me headaches.

I agree with @ced_cox that broadcasting Wi-Fi packets all the time is usually not wanted / required / advisable.

For an optimal setup, I think the satellite should only do audio IN, audio OUT and wakeword detection (pretty much like an Echo Dot or a Google Home). I agree that rhasspy-satellite is pretty bloated with stuff that should probably not be embedded, like NLU and dialogue management. This will surely evolve in the coming months as the official release of Rhasspy 2.5 approaches.

The audio IN service emits audio frames to a MQTT broker by default. You can change the configuration of the audio IN service to broadcast all audio frames to a centralized MQTT broker if you wish:

That’s the beauty of Rhasspy 2.5’s modular approach.
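
For readers unfamiliar with how audio frames travel over MQTT, here is a minimal sketch. It assumes the Hermes topic layout `hermes/audioServer/<siteId>/audioFrame` with a small WAV chunk as the binary payload; the site id `livingroom` and the helper names are hypothetical. Publishing is left to whichever MQTT client is in use and is not shown.

```python
import io
import wave


def audio_frame_topic(site_id: str) -> str:
    """MQTT topic for audio frames of one site (assumed Hermes layout)."""
    return f"hermes/audioServer/{site_id}/audioFrame"


def wav_chunk(pcm: bytes, rate: int = 16000, width: int = 2, channels: int = 1) -> bytes:
    """Wrap raw PCM samples in a WAV header so the chunk is self-describing."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(channels)
        wav_file.setsampwidth(width)
        wav_file.setframerate(rate)
        wav_file.writeframes(pcm)
    return buffer.getvalue()


# 480 frames of 16-bit mono silence, i.e. 960 bytes of PCM.
silence = bytes(960)
topic = audio_frame_topic("livingroom")  # hypothetical site id
payload = wav_chunk(silence)
print(topic, len(payload))
```

Because every chunk carries its own header, a receiver can decode each message independently, at the cost of a few dozen bytes of overhead per frame.
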

rolyan_trauts · May 4, 2020, 4:31pm

That is exactly what I was saying, and rhasspy-satellite already seems to be losing the beauty of that modularity due to much unnecessary bloat.
It doesn’t have to be Snapcast; it could be PulseAudio. But for me Snapcast is a latency-compensated, compressed audio streaming system that seems to work beautifully and doesn’t require programming input or support, as that belongs to someone else.

I am bemused: is the only rationale for Hermes audio on rhasspy-satellite that you want to program and support it, maybe?
A Rhasspy satellite doesn’t need MQTT because it’s purely a voice capture and delivery mechanism for the server it is a satellite of. The server needs the “de facto” standard MQTT but, I repeat, this does not mean the Rhasspy satellite does, and once more I ask: why does this seem necessary?

I am not saying it will be broadcast all the time, but even if it is, streams of mic audio at approx. 128 kb/s are no problem even for low-grade 56 Mb/s Wi-Fi.
Security on the network is no different from that of the audio chunks transmitted in MQTT streams.
But losing the advantages and benefits of a wide-array, ad hoc, latency-adjusted microphone system seems a high price to pay.
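
A back-of-envelope check of that claim, using the figures quoted above. The usable fraction of the nominal link rate is an assumption (real Wi-Fi throughput is well below the nominal rate), and the helper names are made up for the example.

```python
STREAM_BITS_PER_SECOND = 128_000     # approx. per-microphone stream, as quoted
LINK_BITS_PER_SECOND = 56_000_000    # nominal Wi-Fi rate, as quoted
ASSUMED_USABLE_FRACTION = 0.5        # assumption: half the nominal rate is usable


def link_share(stream_bps: float, link_bps: float) -> float:
    """Fraction of the nominal link rate taken by one stream."""
    return stream_bps / link_bps


def max_streams(stream_bps: float, link_bps: float, usable_fraction: float) -> int:
    """How many such streams fit into the usable part of the link."""
    return int(link_bps * usable_fraction // stream_bps)


share = link_share(STREAM_BITS_PER_SECOND, LINK_BITS_PER_SECOND)
streams = max_streams(STREAM_BITS_PER_SECOND, LINK_BITS_PER_SECOND, ASSUMED_USABLE_FRACTION)
print(f"one stream uses {share:.2%} of the nominal link")
print(f"about {streams} streams fit under the assumed usable fraction")
```

This ignores per-packet protocol overhead and contention between stations, so it is an upper bound rather than a prediction.
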

I am looking at hardware now and need to get as much performance as possible out of roughly Pi 3 level hardware, due to high audio processing needs. I can do this by offloading much, keeping things simple and, because of the brilliant modular nature of Rhasspy, having an authoritative server do this for me.

WAV chunks by nature add latency, which I can improve on by having direct streams.

It’s no trouble, but it sounds like I will not be using the Rhasspy-satellite part of the offering and will have to hack together something of my own, likely in conjunction with Snapcast. In this instance I believe it to be superior to rhasspy-microphone-cli-hermes, as there is a need for an instantaneous stream and I don’t need the packaging of MQTT for satellite-to-server audio.
The server will process that, as that is what it’s there for, just as a satellite’s only function is to be a Rhasspy satellite.

romkabouter · May 4, 2020, 10:20pm

There are more Audio Inputs and ways to implement satellites:

https://rhasspy.readthedocs.io/en/latest/audio-input/

At this point, MQTT is already here and it can be used for a satellite setup.
It is certainly not the only way to set up satellites.
But for other setups, there is no software yet.
Since Rhasspy is open source, I think it would be a great idea to have different types of satellites.

rolyan_trauts · May 5, 2020, 1:01am

MQTT is not already here; you are just trying to insist that it’s here in a Rhasspy-satellite that isn’t even here yet itself.

I don’t understand it: disingenuous comment on this topic is just rife, as you post documentation that provides evidence contrary to statements you make 3 lines later.

PyAudio

Streams microphone data from a PyAudio device. This is the default audio input system, and should work with both ALSA and PulseAudio.

Add to your profile:
"microphone": {
  "system": "pyaudio",
  "pyaudio": {
    "device": "",
    "frames_per_buffer": 480
  }
}
Set microphone.pyaudio.device to a PyAudio device number or leave blank for the default device. Streams 30ms chunks of 16-bit, 16 kHz mono audio by default (480 frames).

See rhasspy.audio_recorder.PyAudioRecorder for details.
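
The chunk figures in that excerpt can be checked with a few lines of arithmetic. The constants are taken from the quoted documentation (480 frames, 16-bit, 16 kHz, mono); the variable names are only for illustration.

```python
SAMPLE_RATE_HZ = 16000      # 16 kHz
BYTES_PER_SAMPLE = 2        # 16-bit samples
CHANNELS = 1                # mono
FRAMES_PER_BUFFER = 480     # value from the quoted profile

# Duration of one chunk and its size on the wire (before any container header).
chunk_seconds = FRAMES_PER_BUFFER / SAMPLE_RATE_HZ
chunk_bytes = FRAMES_PER_BUFFER * BYTES_PER_SAMPLE * CHANNELS
chunks_per_second = SAMPLE_RATE_HZ / FRAMES_PER_BUFFER

print(f"{chunk_seconds * 1000:.0f} ms per chunk")
print(f"{chunk_bytes} bytes of PCM per chunk")
print(f"{chunks_per_second:.1f} chunks per second")
```
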

According to the documentation, the software that doesn’t exist yet is actually the default.
The software exists, and it requires zero support or creation, as it’s pure ALSA to set up the default playback device as a file or FIFO sink.

I am not going to use Rhasspy-satellite, as I don’t want to touch that repo. It is massive bloat for the needs of a satellite, for which there are lightweight, extremely effective, off-the-shelf open-source solutions, many built deep into Linux already.

I have a hunch why there is such strong opposition to using a strong piece of open-source software like Snapcast, which is extremely lightweight and tackles the huge problem of audio latency very effectively.
From reading up and looking at Snapcast’s choice of GNU licence, I know I will be using a different type of Rhasspy satellite, and it will not be from Rhasspy-satellite, as that repo is absolutely packed with unnecessary bloat that has no need or rationale to be there.

That is an example of honesty without paradoxical statements, and I am going to stop in this conversation as it’s been inflamed by some blatant untruths.
I have no idea why Hermes for the satellite is so important to you; you haven’t given a single clear rationale, apart from it being something you chose without justification.
Even within a single paragraph, what you say, under scrutiny, is a paradox?!

I am confused and bemused as always and have no wish for pointless argument, but the replies have just exacerbated things and made it more evident that the current choice is rather dubious.
I guess it’s all about sphere of interest and specific knowledge, but there seems to be very little correlation to actual benefit.

So I will leave that one as is, and apologies if my opinion differs.

JGKK · May 5, 2020, 6:13am

I really second the recommendation that you take a look at voice2json.org, the little sister of Rhasspy, if you feel that Rhasspy itself is too bloated and has too much overhead for your use case. Then you can still use the features you want but build the infrastructure around it to your liking.

rolyan_trauts · May 5, 2020, 6:30am

I may do, but no one is saying Rhasspy is bloated; the point in question is rhasspy-satellite, which definitely is.
To be honest I will probably use Rhasspy, but the satellites will be very much my own implementation.
Thankfully that needs little to no coding, which is why I am wondering what is going on.

Having a fat central Rhasspy server and distributed light satellites has numerous advantages. Rhasspy as a server can be as fat as it wishes and it is no bother, but strangely, compared to the proposed satellite, Rhasspy is extremely lean, as there is very little difference between the two.

romkabouter · May 5, 2020, 8:45am

@rolyan_trauts, I think you misunderstand me.
The software that is not there is a satellite which uses GStreamer or an HTTP stream to push audio to the server.

Also, you do not have to touch any repo, but if you want a satellite that does not use MQTT, there are options to build one.
This has not been done because of the simple fact that Rhasspy evolved from being only a server. Then came the idea of a remote microphone, and we looked at Snips.

When Snips was being shut down, the idea was to give the people a replacement. Thus using the Hermes Protocol.

Currently there is no support for Snapcast as audio input/output, but I think it’s a good idea to see if it can be fitted in.
The Hermes protocol is not important to me as such, but the communication between the different modules uses it, so it is needed.
Rhasspy is built to be modular, so other ways of getting audio in and out would be great.
Other software for satellites would also be very welcome, I think, because everyone has different needs.

There is no holy grail in Rhasspy.