I'm 110% behind this direction; for me it has been a no-brainer for some time.
This will make things far more manageable, and the simpler modules become building blocks for what is essentially a serial chain of voice modules.
This should always have been decoupled from skill servers. All that is needed is a skill router that allows for the simple or the highly complex: you merely add more skill servers, without needing to maintain or understand a skill's controls and methods, because you just pass it the inference.
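A minimal sketch of that idea, using a hypothetical in-process registry standing in for real skill server endpoints (the names `register` and `route_inference` are illustrative assumptions, not an existing API). The point is that the router never touches a skill's internals; it only forwards the inference and hands back the reply, so adding skills never changes the router.

```python
# Hypothetical skill registry: name -> callable standing in for a skill server.
skill_servers = {}

def register(name, handler):
    """Plug in another skill server; the router itself is unchanged."""
    skill_servers[name] = handler

def route_inference(skill, inference):
    """Forward the inference text to the named skill server and return its reply."""
    return skill_servers[skill](inference)

# Two toy skill servers; real ones could be containers behind HTTP, MQTT, etc.
register("lights", lambda text: f"lights handled: {text}")
register("weather", lambda text: f"weather handled: {text}")

print(route_inference("lights", "turn on the kitchen light"))
# lights handled: turn on the kitchen light
```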
A voice system is merely a set of applications / containers / instances that queue and pass to the next module, in what is essentially a serial queue.
The less that is embedded into Rhasspy, the bigger the choice of implementations, and the more scalable the result.
The metadata needs of a voice system are extremely simple, and that simplicity creates a building-block system where complexity is a choice.
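To illustrate how simple that metadata can be, here is a sketch of the kind of message a stage might pass along. The field names are my own assumptions for illustration, not an existing Rhasspy schema; each stage only reads the payload and forwards an updated copy.

```python
# Hypothetical minimal inter-stage message; fields are illustrative only.
stage_message = {
    "session_id": "abc123",   # ties the whole serial chain together
    "stage": "asr",           # which module produced this payload
    "payload": "turn on the kitchen light",  # text after speech-to-text
    "meta": {"lang": "en", "sample_rate": 16000},
}

def forward(message, next_stage, new_payload):
    """Copy the message on to the next stage with an updated payload."""
    out = dict(message)
    out["stage"] = next_stage
    out["payload"] = new_payload
    return out

routed = forward(stage_message, "skill_router", stage_message["payload"])
print(routed["stage"])  # skill_router
```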
It will be more manageable, offer more modules, and be more scalable, and if it is done right we could start to see plug-and-play Linux inference-based skill servers that gather bigger herds because they are interoperable and not limited to a single system.
It's as simple as a queue that routes to the next stage, which just advertises whether it is busy or free.
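The queue → route idea above can be sketched in a few lines. This is a toy, single-threaded model under my own assumptions (class and function names are made up, not any Rhasspy interface): each stage advertises a busy/free flag, and routing just picks a free instance at every hop of the serial chain.

```python
class Stage:
    """A pipeline stage that advertises whether it is busy or free."""
    def __init__(self, name, fn):
        self.name = name
        self.fn = fn
        self.busy = False

    def process(self, item):
        self.busy = True
        try:
            return self.fn(item)
        finally:
            self.busy = False

def route(item, hops):
    """Pass an item through a serial chain; each hop is a pool of
    interchangeable instances, and we pick the first free one."""
    for pool in hops:
        stage = next((s for s in pool if not s.busy), None)
        if stage is None:
            raise RuntimeError("all instances busy at this hop")
        item = stage.process(item)
    return item

# A toy chain: ASR, then the skill router, one instance per hop here.
chain = [
    [Stage("asr", lambda audio: "turn on the light")],
    [Stage("router", lambda text: {"skill": "ha", "inference": text})],
]
result = route(b"...audio...", chain)
print(result)  # {'skill': 'ha', 'inference': 'turn on the light'}
```

Scaling a hop is then just appending another instance to its pool; the routing logic never changes.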
What you have posted is intents for Home Assistant, and there is absolutely no need for that in a voice system, as it should happen in an HA skill server that is routed to and passed the inference?