2023 - Year of Voice

AndreKR · December 21, 2022, 3:28am

Configuration via config files would be ok for me, the web UI isn’t strictly necessary and sometimes even makes things more difficult.
However, the dropdown boxes showing all available options are really the killer feature of Rhasspy.
Take for example Mycroft, if you really want local STT you have to first find the correct documentation and then start fiddling around, to the point where the claim of being “privacy-focused” is borderline fraudulent - if you really want privacy then prepare for a night of reading external documentation and fiddling with the command line.

Sure, isn’t it like that already? But installation, model download, etc. of those programs shouldn’t be less automatic than it is now. I’m willing to start a Docker container each for Porcupine, Kaldi, etc. but I’d prefer not to fiddle around with pip and Python dependency version conflicts and finding some models in the correct format.

Do you mean the “training” that happens after you change sentences.ini or training a whole new wakeword/language/voice?
Calling a command line tool after changing sentences.ini isn’t too hard. I’ll probably set up a watcher of some kind that runs it whenever sentences.ini changes. Make sure it doesn’t corrupt everything if it is run twice in parallel.
Training new languages/voices is already an external process, and one that we mere mortals aren’t supposed to do anyway, isn’t it?
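
A watcher like the one described above could be sketched roughly as follows. This is only an illustration: it polls the file's modification time instead of using a real file-change notification, the lock file is a made-up convention, and the training command is a placeholder for whatever command-line tool ends up doing the training. The lock file is what keeps two runs from overlapping.

```python
import os
import subprocess
import time
from pathlib import Path


def run_exclusively(lock_path: Path, command: list) -> bool:
    """Run `command` unless another run already holds the lock file.

    Returns True if the command was started, False if it was skipped.
    """
    try:
        # O_CREAT | O_EXCL is atomic: only one process can create the file.
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False
    os.close(fd)
    try:
        # A failed training run is not retried in this sketch.
        subprocess.run(command)
        return True
    finally:
        # Always release the lock, even if the command could not be run.
        os.unlink(lock_path)


def watch(sentences: Path, lock_path: Path, command: list, interval: float = 1.0) -> None:
    """Poll `sentences` forever and rerun `command` whenever its mtime changes."""
    last_mtime = sentences.stat().st_mtime
    while True:
        time.sleep(interval)
        mtime = sentences.stat().st_mtime
        if mtime != last_mtime:
            last_mtime = mtime
            run_exclusively(lock_path, command)
```

Calling `watch(Path("sentences.ini"), Path("train.lock"), [...])` with the real training command would block and retrain on each change. A crash that leaves the lock file behind would need manual cleanup in this simple version.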

synesthesiam · December 21, 2022, 3:54am

I agree, but I think (to start) this could also be a command-line “wizard” that asks you a few questions and then configures everything. It could even be aware of the hardware and suggest changes for better performance.

As @cwagner said, I may want to split the web UI off into its own project. Historically, it’s been one of the hardest parts for me to design and maintain since I’m not really a web developer. I’d much rather expose everything over a nice web API and have someone with more skills make it nice.

Most things in Rhasspy right now are wrapped with a Python library/service that talks over MQTT to the core. My idea is the “external command” options for different services would become the model for all services.

Installation and model downloading definitely still need to be automatic.

Mostly the changes to sentences.ini, but I think it should eventually include wake word training too. One idea is to have a training server that Rhasspy base stations and satellites can download updated STT models, etc. from.

Jarvy · December 21, 2022, 5:13am

What if Rhasspy didn’t come with a web UI, just HTTP/Websocket/etc. APIs?

I think this really depends on who you want to use Rhasspy. Getting rid of the UI sort of makes Rhasspy more for the developer than the hobbyist. It also raises the question of whether you want it to be used for integrating with other software or as something that can run more standalone.

What if Rhasspy had no “plugins”, but only ever called external programs?

This would really decrease the barrier to adding new plugins! Just call out to the host system with the right commandline options. Rhasspy just needs to keep track of how to call it and you’re good to go.

I think to aid in your frontend development, you could make a “config” file (or even a pydantic model) that defines the options and types for a plugin and then just generate the fields in the associated dropdown. That’s how some of the UI in Home Intent is rendered. (You can see the Home Assistant options here that get auto-rendered into the full UI component - pic is slightly out of date). It was intended for people to be able to define their own plugin settings and just have the UI get built automatically.

I’ve done it a few times, and as long as you don’t need super integrated forms, it can work well, and really aids in speed of development. Also, I set it up to be overridable - so for example the HA script control is a custom UI component, while the rest of HA settings are auto generated - but it all lives on the same page.
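
As a rough illustration of generating form fields from a typed settings model, here is a minimal sketch. It uses standard-library dataclasses rather than pydantic, and it is not how Home Intent actually does it; the `WakeWordSettings` class, its fields, and the widget names are all invented for the example.

```python
from dataclasses import MISSING, dataclass, fields
from typing import Literal, get_args, get_origin, get_type_hints


@dataclass
class WakeWordSettings:
    """Hypothetical plugin settings, invented for illustration only."""

    model: Literal["default", "small", "large"] = "default"
    sensitivity: float = 0.5
    enabled: bool = True


def describe_fields(settings_cls) -> list:
    """Turn a settings dataclass into a list of UI field descriptions."""
    hints = get_type_hints(settings_cls)
    described = []
    for field in fields(settings_cls):
        hint = hints[field.name]
        if get_origin(hint) is Literal:
            # A fixed set of choices becomes a dropdown with those options.
            widget, options = "dropdown", list(get_args(hint))
        elif hint is bool:
            widget, options = "checkbox", None
        elif hint in (int, float):
            widget, options = "number", None
        else:
            widget, options = "text", None
        default = None if field.default is MISSING else field.default
        described.append(
            {"name": field.name, "widget": widget, "options": options, "default": default}
        )
    return described
```

A frontend could render `describe_fields(WakeWordSettings)` generically, and a plugin author would only have to write the settings class. Custom components could still override individual fields by name.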

What if training in Rhasspy was separated into its own standalone application?

This could prove to be really powerful. Folks might have a more powerful computer at the ready to do training on and then run HA or Rhasspy on a lower-powered Pi. If training was separate, you could offload it to more powerful machines (Rhasspy training as a service?) and share the models back. Right now Rhasspy model generation (with the “recommended” plugins) can run fairly quickly on a Pi, but I’ve never really pushed it to see when it tips over. In theory, better and more computationally complex models could be used as long as the output runs well on a Pi (or whatever we’re using).

donburch · December 21, 2022, 8:45am

Michael, my quick look at Paulus’ blog post and your post here leaves me confused and disappointed.

I had assumed Rhasspy Junior was intended as a user-friendly interface integrated into HA, which uses Rhasspy 2.5 “under the hood”. I gave Rhasspy Junior a lot of thought a month ago, and came to the conclusion that Junior does not need to be a big deal – basically a HA integration which is a web interface calling current Rhasspy APIs, with only 2 areas that require much development effort. But Junior doesn’t work for me, and seems to have been abandoned.

You have already spent 6 weeks on Rhasspy v3, so you obviously have your own ideas and plans which we can only guess at. Your comments and questions above sound as though you are throwing Rhasspy 2.5 out the window to make a new framework for system integrators. Certainly there are improvements which can be made to 2.5; but it seems to me already an excellent extensible modular platform which achieves your stated developer-oriented objectives. Is there anything actually WRONG with rhasspy 2.5 ?

From where I stand Home Assistant/Rhasspy only needs:
• user-friendly perspective (full Rhasspy 2.5 is under the hood if/when users want more control)
• HA Rhasspy integration to have a user interface
• HA Rhasspy integration to scan hardware and create default sentences.ini
• cheap satellites with decent hardware (ESP32 based ?) and easy to install.

I get that you’re a developer; I am (or was) too; focussed on the technicalities and making a technically superior system. I get that Paulus, Frenck etc at Nabu Casa also have the same mindset. I know Home Assistant is a project by developers for other developers, and has a tradition of being user-UNfriendly to new users without the skills and expertise … but I believe that implementing and using Rhasspy should be the easiest thing to do because that’s what more and more people will be trying to do.

What level of technical expertise is required to use (drive) a car ? Does one have to be a Mechanic ? A Mechanical and Chemical Engineer ?

Can a modular privacy-focussed local Home Automation system be useful to non-engineers ? YOU BET !
I get that it is a waste for developers to get bogged down writing user-friendly documentation and answering the same basic questions on the support forum, but there are others with appropriate skills who would love to be allowed to help.

As for your questions …

It doesn’t make any difference to me how it works behind the scenes. I want it to work and be easy to learn and use.

Sounds like a plan. My coding and web development skills are about 15 years rusty though. If you want my help.

totesz · December 21, 2022, 10:09am

Just please take other languages really into consideration. I’ve been trying to set up mycroft with Hungarian before, and the sentence/template syntax was not just PITA but eventually I just gave up my translation efforts. Hungarian is a type of language which uses endings where the vowels depend on the word, so it causes problems both ways (commands and responses).

I’m really looking forward to this (as the wife-approval-factor for non-English speakers is heavily depending on such a feature), but until I see how the templating would work I’ll have to remain sceptical.

rolyan_trauts · December 21, 2022, 12:39pm

110% behind this direction, and for me it has been a no-brainer for some time.

This will make things far more manageable and the simpler modules are just building blocks for what is essentially the serial chain of voice modules.
This should have always been decoupled from skill servers; all that is needed is a skill router that allows for the simple or the highly complex, as you merely add more skill servers without needing to maintain or understand the controls and methods of a skill, and just pass inference.

A voice system is merely a set of applications / containers / instances that queue and pass to the next module in what is essentially a serial queue.
The less that is embedded into Rhasspy, the bigger the choice of implementation, which is also more scalable.
The metadata needs for a voice system are extremely simple and that simplicity creates a building block system where complexity is choice.

It will be more manageable, offer more modules, be more scalable, and if it is done right we could start to see plug & play Linux inference-based skill servers that can gather bigger herds because they are interoperable and not limited to a single system.

It’s as simple as queue → routes that connect to the next stage, which just advertises whether it is busy or free.

What you have posted is Intents for Home Assistant, and there is absolutely no need for that in a voice system, as it should happen in an HA skill server that is routed to and passed an inference.

ThisisDennis · December 22, 2022, 12:26pm

What if Rhasspy didn’t come with a web UI, just HTTP/Websocket/etc. APIs?
I’m a little confused, too. I think it would be very important to keep an easy GUI. I can understand that it’s not the point you want to focus on, but it’s an important entry point for all new users.
It’s also useful if you just want to change a small thing, without a remote connection or on a computer without the right setup.
What if Rhasspy had no “plugins”, but only ever called external programs?
I’m not sure if I have an opinion on that point.
What if training in Rhasspy was separated into its own standalone application?
That would be nice, if we can train our models on more powerful processors with the ability of sharing.
It would also be cool to train only sentence files for a specific skill, but I think that’s not the way it works. Or a server to expand the ability of understanding.

rolyan_trauts · December 22, 2022, 11:00pm

There isn’t really enough info to go on, but it’s such a radical change; it doesn’t necessarily mean there will be no web UI, even if the way it is currently implemented is massively insecure.

There is a problem with the current all-in-one infrastructure in what is not just the fastest evolving tech scene; it’s one that is evolving at unprecedented speed.
Already much of what is contained in Rhasspy is obsolete, as better open source SOTA-boasting models exist freely that are aimed at various platforms from mobile to GPU.
Then we have hardware that in this scene is seeing almost as rapid an evolution, from Apple < 7watt idle RTX2080ti + ML perf to RK3588, NPU accelerators and problems with the Pi supply chain.

The current choice of all-in-one means it gives a few choices for certain modules, and elsewhere specific modules and a protocol that is specific to Rhasspy.
This means the current system is relatively locked in to a very narrow spectrum that also needs 100% of its support provided by a small (singular) dev team.

The OP questioning the current infrastructure has been well overdue for some time; in 2023 technology in general is adopting voice methods at a fast pace, and the current all-in-one is just a huge constriction on choice, scalability and security.
The current voice scene is so fast moving that current modules are already relegated to a toy base.

Likely there will be a HaSkillServer, but the current system and protocol is applied to all modules and is massively over-complex, as a voice system is not a control system, and currently there are huge swathes of control protocol on modules without need, purely because they are part of an all-in-one.

The training of Rhasspy currently only works due to low command volume and relatively unique phonetic collections, as the ASR and NLU methods are quite old, and even with fairly modest additions of ‘subject’ and ‘predicate’ accuracy will plummet.
It works on low volume predicates like ‘turn’ and subjects such as ‘light’, but a common skill such as an audio server with a modest library could flood an all-in-one with subjects and decimate how the current system garners accuracy.

If Rhasspy and Hass have any ambitions to be more than a toy system, it needs a complete rethink in terms of voice control, and like CISC vs RISC, complexity can be built by reusing simple building block modules that scale.
A model doesn’t get better when you train it on a more powerful processor; a model is what it is, and you can just train it faster. Currently we are not really training a model, just reorganising phonetic catchment.
The models we use are part of an all-in-one that has a hardware specific of Raspberry Pi, and that is why we have the models we have, which is also hugely restrictive.
It doesn’t get better with better hardware because we have specific models aimed at specific hardware.
Accuracy can be maintained or even increased by partitioning into predicate and subject domains, whilst an all-in-one at any level will do the opposite, which is why the current infrastructure was and is deeply flawed.

But your worries are also misplaced, because we never needed the front-end voice complexity that we currently have.
A very simple zonal, channel-based system of KWS->KWS/Audio processor->ASR->Skill router->TTS is all we need, and it’s a very simple serial chain.
The complexity under the hood to create a working voice system was never needed, as it confused control with voice. Partitioning this should give choice of hardware, model, scale and complexity, and reusing software from larger herds will reduce maintenance and increase support availability, rather than pointlessly refactoring code for a smaller pool and embedding system specifics.
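
To make the "serial chain" idea concrete, here is a minimal sketch in which each stage is just a function that takes a message and returns a message, with the zone and channel carried along unchanged. The `Message` type and the stand-in stages are invented for illustration; real stages would call actual ASR, routing and TTS programs.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Message:
    zone: str        # e.g. a room name
    channel: int     # which mic/speaker in that zone
    payload: object  # audio bytes early in the chain, text later on


Stage = Callable[[Message], Message]


def run_chain(stages: List[Stage], message: Message) -> Message:
    """Pass the message through each stage in order."""
    for stage in stages:
        message = stage(message)
    return message


# Stand-in stages: each one only transforms the payload.
def fake_asr(msg: Message) -> Message:
    return Message(msg.zone, msg.channel, "turn on the light")


def fake_skill_router(msg: Message) -> Message:
    return Message(msg.zone, msg.channel, f"ok: {msg.payload}")


def fake_tts(msg: Message) -> Message:
    return Message(msg.zone, msg.channel, f"<audio for '{msg.payload}'>".encode())
```

Because the zone and channel travel with the message, the final audio can be sent back to the place the request came from without any stage knowing about the others.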

synesthesiam · December 23, 2022, 4:28am

@donburch Hopefully I can clear up some confusion

I’m not saying Rhasspy will be dropping the web UI, just that it should be optional. Like Hermes/MQTT, having so much baked into Rhasspy’s core has made it difficult to keep up with the pace of change in the voice space (as @rolyan_trauts mentioned).

Regarding Rhasspy 2.5 vs 3.0, I believe for many users that the internal workings are less important than their sentences, slots, and profile settings. I will do my best not to break things unnecessarily, but it may take me a while.

I agree! As we’ve talked about with Rhasspy Junior, I think it’s possible to layer a user-friendly interface on top of something that more advanced users also enjoy. My plan is (loosely):

• Voice services as regular programs that can still be used independently of Rhasspy
• Small HTTP/websocket servers that wrap the voice services for satellites
• Rhasspy’s core, which configures and coordinates the voice services into voice loops (wake → speech to text → etc.)
• Web UI and other protocols like Hermes on top of the core

One feature the new parser has is that you can embed template pieces into words. In English, for example, you can have turn on the light[s] for both “light” and “lights”.
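
To show what an optional piece inside a word means in practice, here is a small sketch that expands `[...]` groups into every sentence they can match. It is not the actual Rhasspy parser, only an illustration of the idea; the function name is made up.

```python
import re

# Matches an innermost [...] group (one with no brackets inside it).
_OPTIONAL = re.compile(r"\[([^\[\]]*)\]")


def expand_optionals(template: str) -> list:
    """Return every sentence a template with optional [..] parts can produce."""
    match = _OPTIONAL.search(template)
    if match is None:
        # No optional parts left: normalise whitespace and stop.
        return [" ".join(template.split())]
    before, after = template[: match.start()], template[match.end():]
    results = []
    # Try the template once without and once with the optional text.
    for choice in ("", match.group(1)):
        for sentence in expand_optionals(before + choice + after):
            if sentence not in results:
                results.append(sentence)
    return results


print(expand_optionals("turn on the light[s]"))
# ['turn on the light', 'turn on the lights']
```

The same function handles whole optional words, so `turn on [the] light[s]` would expand to four sentences.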

This helps with matching, but responses are usually more difficult. I’d be very interested to hear about what sorts of information needs to be tracked for Hungarian (gender, case, etc.). Please PM me or reply here

In the year I was at Mycroft, things changed so much! The best idea I’ve had is to lower the barrier to entry for adding a service to Rhasspy. Something as simple as: if your program takes a WAV file and returns text, you can be a speech to text service. No Python, no MQTT, just a program with arguments, standard in, and standard out.
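
A minimal sketch of what "a program with arguments, standard in, and standard out" could look like from the calling side is below. It assumes the external program reads WAV data on standard input and prints the transcription on standard output; that convention, and the function name, are assumptions for the example, not a defined Rhasspy interface.

```python
import subprocess


def speech_to_text(command: list, wav_bytes: bytes, timeout: float = 30.0) -> str:
    """Feed WAV data to an external program and return the text it prints.

    `command` is the program and its arguments, e.g. ["my-stt", "--lang", "en"]
    (a made-up example; any program following the stdin/stdout convention works).
    """
    result = subprocess.run(
        command,
        input=wav_bytes,         # WAV data goes to the program's standard input
        stdout=subprocess.PIPE,  # transcription comes back on standard output
        check=True,              # raise if the program exits with an error
        timeout=timeout,
    )
    return result.stdout.decode("utf-8").strip()
```

Swapping in a different speech to text engine then only means changing the command list; nothing else in the caller needs to know which engine is behind it.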

But I do want there to be an “easy button” for users, which selects the programs based on some constraints (Pi 4 vs. GPU) and installs them.

And a logo that doesn’t look like it was drawn by a programmer would be nice

rolyan_trauts · December 23, 2022, 8:05am

If you look at some of the models that BigAI are doing, such as GPT3/ChatGPT or Whisper, things are moving at unprecedented speed.
You could do something extremely simple by sharing a host folder that is the output of one container and the input of another, with a simple inotify folder watcher to run a command.
The reciprocal could happen to signal that the folder has been cleared, as an ‘I am free’.
But basically inotify-simple (on PyPI) and whatever the run command is.
My preference would be Unix sockets, as they can be both file and net based, and the same inter-process queue-bridge could be used at each step in the chain. A file socket would act the same as above, but with web based you can have multiple instances to scale to needs.
The only conf would be a filename or host:port for the chain to connect to next.
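
A sketch of the "filename or host:port" configuration could look like this. It assumes that "file based" means a Unix domain socket path and "net based" means a TCP host:port, and that a stage simply connects to the next stage and sends its output; the function names are invented for the example.

```python
import socket


def parse_target(target: str):
    """Interpret the single config value for the next stage in the chain.

    "host:port" is treated as a TCP address; anything else is treated as
    the path of a Unix domain socket.
    """
    host, separator, port = target.rpartition(":")
    if separator and host and port.isdigit():
        return socket.AF_INET, (host, int(port))
    return socket.AF_UNIX, target


def send_to_next(target: str, payload: bytes) -> None:
    """Connect to the next stage and hand over the payload."""
    family, address = parse_target(target)
    with socket.socket(family, socket.SOCK_STREAM) as connection:
        connection.connect(address)
        connection.sendall(payload)
```

With this shape the sending code is identical for both cases, so moving a stage from the same machine to another host would only change the one config string.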

There are SOTA models now, so if you wish and intend to buy the hardware, large models such as Whisper, high-fidelity TTS and GPT-style NLU are as valid an option as selecting much lesser models to run on a Pi.
So I don’t think, especially with HA, that you can provide specifics, just the queue/bridging models to link them, as if you provide for one you exclude another or have to provide all.

If you take Whisper, the install is:
pip install git+https://github.com/openai/whisper.git
It uses ffmpeg:
sudo apt install ffmpeg
It runs via:
whisper audio.flac --model medium
It really doesn’t need a web page to be set up… Its support is on its webpage, and its herd is much larger than Rhasspy’s, with multiple how-to’s and alternative refactored code.

donburch · December 23, 2022, 10:43am

Michael, you have indicated that web development isn’t your thing … understood, and I agree that your effort is much better spent on the technicalities (what I think of as “the back end”). So I am seriously considering giving the Junior UI a go myself.

I am particularly suspicious of things like wi-fi which are sold as “it just works” like magic - because invariably they don’t.

So separate, but definitely not optional - especially for new users. People need to check that their audio devices are working, setup friendly names for HA devices, and check/edit the values for the arguments in intents, and see error messages.

EDIT: body of post moved to a new topic: Home Assistant Rhasspy Integration GUI

Well, not by me then either

tjiho · December 23, 2022, 1:57pm

2023 sounds amazing !

About web UI, if there is a good web api, UI will follow. There is a big community, someone (me ?) will develop a ui if there is a web api.

Managing it from the command line would be a really great plus.

donburch · December 23, 2022, 10:36pm

Another thought …

I appreciate that Rhasspy Satellite is a fairly recent concept which required a significant refactoring not so many versions ago … and so at the time it was considered an advanced option … but has it now proved itself as the best logical approach for Rhasspy going forward ?

Experience has shown that Rhasspy satellites use only audio input, wake word detection, MQTT and audio output modules. By packaging just these modules (yes, I strongly believe it should still be modular) the overhead is reduced.
Would this be a reasonable subset to implement on a cheap ESPHome platform ? Have you had discussions with Nabu Casa’s ESPhome team about audio options ?

How many users have only an all-on-one Rhasspy ? And in these cases would it be reasonable to run separate instances of Rhasspy Base and Rhasspy Satellite on the same machine ? To run Base it must have reasonable CPU, so would the extra overhead be significant ?

So… am I really suggesting splitting Rhasspy core into 3 or 4 separate but closely tied projects - Rhasspy Satellite, Rhasspy Base, Rhasspy GUI, and Rhasspy training ?

rolyan_trauts · December 24, 2022, 1:54am

Rhasspy satellite is absolutely terrible bloat and a ridiculous idea, akin to making a module for a Rhasspy Keyboard for input that uses a net-based MQTT network to receive its key strokes.

We have always been missing a module: a KWS server / audio processor that sits and queues KWS to an ASR and also contains further filters, VAD or AEC, if that is how you wish to set up your initial audio stream.
Rhasspy talks to the KWS server / audio processor, and a KWS server contains modules where likely the only preferential constraint is that a single zone (room) uses the same model of KWS so that argmax is comparable, but even that is not essential.
KWS are just ears: extremely simple input devices that are set up as channels in a zone for input audio, which simply mirrors the system of many of the current wireless audio systems available.
It’s a very simple premise: audio in on a zone provides audio out on that zone…
It has a minimal number of commands, not much more than start and stop, and it doesn’t even have a pixel ring, as a pixel ring is a standalone HA device; a zone may only have a single shared pixel ring, and whilst the KWS might even be hidden, a pixel ring could be prominent and central.

There is no such thing as a Rhasspy satellite, as all that was ever needed was wireless audio and wireless KWS in a simple zonal system.
We do need a Rhasspy KWS server to coordinate, just as RaspiAudio SqueezeLite has an LMS server, or like Snapcast, AirPlay or even Sonos (not that I know much of that system).

Or MQTT Rhasspy keyboards it is…

A KWS will stream to a KWS server that may filter and apply Rhasspy metadata of the zone and channel of origin, so that TTS output is a simple mapping to the same; it is purely a bridge, so any KWS device can work with Rhasspy.
Streaming from that point is a strange one, as all the latest and best ASR uses quite long CTC and a mixture of phonetics and sentence context to make highly accurate results, as does, say, OpenAI’s Whisper. Actually trying to stream to such models causes a hike in load and lowers accuracy, as often the context width is reduced, and from playing with it the latency is not all that much different.

If you are going to copy consumer ewaste from the likes of Google & Amazon, where each unit is an all-in-one running a streaming mode of older, smaller models because that is all that will fit and run, then maybe streaming mode is a thing.

If you are going to have a modern multi-zone SOTA voice system, you would have a single brain fed by distributed KWS, and models would not be streaming, in order to garner context, but would run far faster than realtime so that the latency of return is not noticeable and they don’t lag on multiple requests.
You only have to get to 2/3 zones and the investment cost of a central, well-powered single brain starts to become more cost effective, as the only addition is KWS ears; audio out cost can be discounted because it is already covered by that room’s wireless audio system. If you went for a Pi 4 with constrained models, the 2nd only needs a Pi02W for audio in & out whilst processing happens on the 1st, and here is where argmax comes in, as the kws-server could pick the best stream or a preferential default.
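
The "pick the best stream or a preferential default" step could be sketched like this. It assumes every KWS device in a zone reports a comparable confidence score for the same wake word detection; the `Detection` type, the score range and the `margin` parameter are all invented for the example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Detection:
    channel: str   # which KWS device heard the wake word
    score: float   # wake word confidence reported by that device (assumed 0..1)


def pick_best(
    detections: List[Detection],
    default_channel: Optional[str] = None,
    margin: float = 0.05,
) -> Detection:
    """Choose which stream to forward to ASR.

    Takes the argmax over scores, but keeps the preferred default channel
    if its score is within `margin` of the best one.
    """
    best = max(detections, key=lambda detection: detection.score)
    if default_channel is not None:
        for detection in detections:
            if detection.channel == default_channel and best.score - detection.score <= margin:
                return detection
    return best
```

Scores are only comparable if the devices run the same KWS model, which matches the constraint mentioned earlier about keeping one model per zone.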

You can still put a centralised system, KWS & audio in a box and use it as an all-in-one, but the all-in-one peer-2-peer type control network of Rhasspy satellite is an absolute thunderclart of unnecessary complexity. Why are we copying Google & Amazon when there are clearly better, less-ewaste infrastructures that can be easily accomplished, where open source can excel?
It’s even a copy of a single enclosure, but even Google & Amazon worked out that client-server is the most efficient way, and that should have been copied as a home server, not a single box.

KWS are generic devices that just need a ‘driver module’ installed in the KWS server. Make a brand of one yourself by all means, but they are just an auto-broadcast-on-KW mic with start and stop commands, and literally that is all that is needed.
Voice commands are highly sporadic and voice systems spend much of their time idle; it’s an absolutely textbook case for a centralised server, and for some reason we have gone peer2-peer and lost all the advantages of cost and load that a single home server can provide, where the only clients needed are audio in and out, and which could absolutely kick the ass of Google & Amazon.

synesthesiam · December 24, 2022, 6:10pm

From this comment, it honestly doesn’t seem like you really used Rhasspy satellites much. We all want things to be improved, but there’s no need to be so negative about something a lot of people here worked hard on. Especially when many of the complaints were addressed ages ago.

Most satellites used an internal MQTT broker with local KWS and VAD, and just did HTTP calls out to the base station for speech to text, etc. with their siteId (which could contain a “zone”). So no, “keystrokes” were not going over the network. And the satellite didn’t even have to be running anything related to Rhasspy, as long as it could HTTP POST some WAV data.
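
For readers who have not seen it, "HTTP POST some WAV data" can be as small as the sketch below, using only the Python standard library. The URL is passed in rather than hard-coded; the address shown in the note after the code is an assumption and should be checked against the Rhasspy HTTP API documentation for the version in use.

```python
import io
import urllib.request
import wave


def make_silent_wav(seconds: float = 0.1, rate: int = 16000) -> bytes:
    """Build a small mono 16-bit WAV in memory, just to have something to send."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(1)
        wav_file.setsampwidth(2)
        wav_file.setframerate(rate)
        wav_file.writeframes(b"\x00\x00" * int(seconds * rate))
    return buffer.getvalue()


def build_stt_request(url: str, wav_bytes: bytes) -> urllib.request.Request:
    """Prepare a POST request carrying the WAV data as its body."""
    return urllib.request.Request(
        url,
        data=wav_bytes,
        headers={"Content-Type": "audio/wav"},
        method="POST",
    )


def transcribe(url: str, wav_bytes: bytes, timeout: float = 30.0) -> str:
    """Send the WAV data to the base station and return the response body as text."""
    request = build_stt_request(url, wav_bytes)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8").strip()
```

A call might look like `transcribe("http://base-station:12101/api/speech-to-text", wav_bytes)`, where the host name, port and path are assumed values for illustration. Nothing on the sending side has to be Rhasspy-specific, which is the point being made above.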

Again, there is a lot of room for improvement here. Streaming raw audio from satellites over MQTT/UDP is obviously not going to scale with many satellites. And setting up satellites in Rhasspy is unnecessarily complex since it was bolted on later, rather than part of the original design.

As @donburch said, a lot of Rhasspy users are probably using it in base station/satellite mode, so this needs to be at the forefront for designing v3. And I absolutely agree that “design” here should not entail Rhasspy-flavored versions of already existing standards!

This is what I’m thinking, though “Rhasspy Satellite” could just be a configuration in the base station relating an existing streaming audio service to a zone (as @rolyan_trauts has alluded to).

Yes, I’ve talked to Jesse (the ESPHome maintainer) about this some. Paulus has a contact over at Espressif, so I think the plan would be to get their audio framework involved. Espressif has a number of two mic boards based on the ESP32 that could form the basis of a fairly cheap satellite (that dev board is $20 on mouser). I don’t know what would be involved with getting ESPHome onto it, and if it would be possible to still do local KWS and AEC.

rolyan_trauts · December 24, 2022, 9:08pm

You know very well from multiple comments and from the very start of dev on the ‘satellite’ I was totally opposed to the bloat and complete lack of need for it.
I cannot help the fact that people spent a lot of wasted time developing something without functional need, and my objection has always been that the unnecessary was developed, and still is unnecessary, whilst a crucial part of audio processing has always been missing.
The satellite mechanism is completely pointless and is just wasted load on top of what a satellite needs, as it’s purely audio-in & out, and it’s not my fault the dev continued whilst I was ignored.

None of the complaints were ever addressed, and there is a constant stream of confusion in the forum history on how to handle very simple multiple-KWS zonal systems.

Yes and it has never been fixed, and I have repeatedly posted for a long time how simple the fix is, and you just contradicted yourself in the next sentence; it’s not a fix, it’s a badly fitted bandage.
There is no value or IP to what has been developed; the ‘satellite’ dev veered off in an acute and complex direction, to the detriment of the simple addition of a KWS server where a hugely important large load of audio processing could be shared, which could allow even a simple micro-controller to be a satellite, and I have been constantly bemused as to why.
VAD can be central; all you need is to be able to tell a KWS mic to start and stop, and once more the fix is that simple. VAD should be able to reside on the satellite or centrally, but currently it cannot, and the supposed fix forces so much that is unnecessary, as a peer2peer client style architecture, when a simple client-server would have sufficed.

Is it not time to get it right and fix it?

rolyan_trauts · December 24, 2022, 10:18pm

I will write it here in a relatively brief explanation, but if you partition elements into basic lowest-common-denominator building blocks, you can just collect those together to create any form, however complex, and there is choice in all of it.

If you embed function without need, you will always be shackled to providing for ill-placed function, and will create a confusing and complex infrastructure and exclude certain choices.

There are only 2 types of interaction in a voice system: Instructions and Responses.
An instruction is the OP and a response is prompted by a TTS question.
An instruction just needs the zone/channel and audio, whilst a response, which turns on a mic, comes from a skill server that got the original instruction and merely returns that zone/channel metadata, but includes which skill server it is so the response audio can be returned.
A KWS server receives that and turns on the corresponding mic, and the next response audio is shipped and returned to where it is needed because the skill server data is there.

That’s it; that is also how simple the protocol could be, because a voice system does not need to know about control.
It merely ships and routes, which is what a voice server should do, whilst skill servers do control.

There have always been 2 elements missing in the chain: firstly a KWS server and secondly a skill router.

The skill router is an intermediary fed by ASR that uses the attached metadata to do some very simple routing.
It forwards on predicate to the matching predicate skill server and if that skill server requires a response it returns to the skill server as there is only need for a single 1to1 simple low latency connection.
The Skill router sends TTS text to TTS and awaits a completion and then tells the KWS to turn the mic on.
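
A toy version of the skill router described above might look like the following. It assumes the predicate is simply the first word of the recognised text and that a skill is a function returning a reply plus a flag saying whether it expects a follow-up; both are simplifications invented for this sketch, and the TTS and mic control steps are left out.

```python
from typing import Callable, Dict, Optional, Tuple

# A skill takes the recognised text and returns (reply_text, wants_response).
Skill = Callable[[str], Tuple[str, bool]]


class SkillRouter:
    def __init__(self) -> None:
        self._skills: Dict[str, Skill] = {}
        # (zone, channel) -> skill that asked a question and awaits the answer
        self._pending: Dict[Tuple[str, int], Skill] = {}

    def register(self, predicate: str, skill: Skill) -> None:
        """Associate a predicate such as 'turn' or 'play' with a skill."""
        self._skills[predicate.lower()] = skill

    def route(self, zone: str, channel: int, text: str) -> Optional[str]:
        """Forward text to the right skill and return the text to speak, if any."""
        origin = (zone, channel)
        # A response goes straight back to the skill that asked the question.
        skill = self._pending.pop(origin, None)
        if skill is None:
            words = text.split()
            if not words:
                return None
            skill = self._skills.get(words[0].lower())
            if skill is None:
                return None
        reply, wants_response = skill(text)
        if wants_response:
            # Remember who asked, so the next utterance from this mic is routed back.
            self._pending[origin] = skill
        return reply
```

The router never needs to understand what a skill does; it only matches predicates and remembers which skill is waiting on which zone and channel.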

It’s the same for any type of voice interaction; everything is just a repetition of the above. It’s really simple, and it partitions the modules into basic functions, and those simple methods can be reused to create whatever is needed, including the complex, but that is a choice.

Keep Rhasspy as it is and start anew with V3 (as new and separate), with a simple, uniform API to local open source voice tools, as what is needed is exceptionally simple, and huge swathes of the current system really have no functional necessity apart from the fact that they exist.
Then if people want to use what exists because they developed it, they can, but don’t shackle it once more to what is, in the majority, functionally unnecessary for a voice system, and, even worse, still fail to implement crucial elements such as audio processing.

romkabouter · December 25, 2022, 11:03pm

No it is not. That is just your, as always, totally unnecessary negative opinion.
If you would have put as much positive energy in helping Rhasspy to become what your vision is, Rhasspy would now be very close to that.
Instead you have chosen to only put a huge amount of negative energy into complaining and whining again and again into what you think is all so terrible.

Why is that? Why do you only choose the negative path on this instead of putting that same effort into actually changing things to the way you would like to see them? I have asked that a couple of times, but still no answer.
I really do not understand this, and when Rhasspy as a whole is so terrible, you still keep posting your negative comments instead of just finding other systems you dó like.
Most of the time I skip your lengthy and incoherent posts, but that question always pops up when I scroll past them.

donburch · December 25, 2022, 11:19pm

I picked up on rolyan’s comment a while back that he abandoned Rhasspy several years ago and has no experience with rhasspy satellite. Yet based on this total lack of actual experience he is stuck vehemently repeating allegations that only he seems to believe (like that Rhasspy is inextricably linked to Raspberry Pi), about software that is long since history.

Personally I don’t see much conceptual difference between cheap devices with mic and speaker spread around the house which listen for a keyword that are called “ears”, and the same device with same purpose called a “satellite”. Sure a Rhasspy satellite has the same user interface, but I don’t consider calling modules on a server to do all the cpu intensive processing as “bloat”. Nor do I see how Rhasspy’s modular client-server architecture somehow does not allow KWS to be done on a separate shared server if one so wishes, or on the client so the audio doesn’t have to go through the LAN. He seems so fixated on using his own terminology that he can’t see that Rhasspy is conceptually pretty much what he is promoting.

I freely admit that, while Rhasspy’s documentation does contain all the required information, it is not arranged in a way that makes base+satellite configuration clear. I guess there must have been quite a bit of confusion at the time of the transition. And the confusion continues, resulting in new users needing to ask for help on the forum; often having struggled to piece together the necessary pieces of information spread through the documentation. Please Michael don’t take this as an attack - I don’t like writing documentation either, and at the time you were adding satellite to existing documentation.

I suggest that rearranging the current documentation to make base+satellite the default configuration (and all-on-one as the advanced option) would help. And a comprehensive tutorial for new users … which I started and got to 30 pages before deciding I needed to re-think my approach. Now I’m not sure whether v3 will make it a waste of effort.

Bottom line, I really am puzzled that rolyan spends so much time on the Rhasspy forum, given his extreme prejudice against it. I suspect rolyan could have developed his own system with half the time and effort he has spent trolling Rhasspy.

rolyan I don’t understand why you could feel responsible for other people’s effort; more so because you are the only one that considers it a waste. If you really believe it to be a waste, why not just move on ?

vajdum · December 26, 2022, 1:00pm

Besides the logo, I don’t like the name Rhasspy either. In my language it sounds the same as raspi, which leads to so much confusion that I am nearly never able to use the word Rhasspy. I need to use something like “speech thing” or such.