With the separation of Rhasspy into multiple services, I was wondering how to handle the training process when the ASR and NLU services are not bundled together.
They both need to generate very specific files depending on which ASR and NLU system each provides, so I think they should be responsible for their own training. But they both require the same intents and slots files…
- Should the intents and slots files be bundled and sent to each service?
- Should the two services share a common volume that contains the profile files (they would be fairly tightly coupled then)?
How do you think it can be managed?
This is something I’ve been thinking about too lately, and I don’t really have a good answer. I agree that the services should be responsible for their own training, though we may extend Hermes to include some training messages.
I’m tempted to code the services so that they simply watch the relevant files (intents, slots, etc.), and re-train/re-load whenever they change on disk. Then, we can add layers on top of that (over MQTT, HTTP, whatever) that pull and overwrite files. In the case where Rhasspy + profile exists on a single machine, nothing extra is needed.
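To make the file-watching idea concrete, here is a minimal sketch that polls modification times and calls a retrain callback when something changes. It is only an illustration: `ProfileWatcher`, `snapshot`, and the `retrain` callback are hypothetical names, not part of Rhasspy, and a real service might prefer OS-level file notifications over polling.

```python
import os
import time
from pathlib import Path
from typing import Callable, Dict, Iterable, List, Optional


def snapshot(paths: Iterable[Path]) -> Dict[Path, Optional[int]]:
    """Map each path to its modification time in nanoseconds (None if missing)."""
    result: Dict[Path, Optional[int]] = {}
    for path in paths:
        try:
            result[path] = path.stat().st_mtime_ns
        except FileNotFoundError:
            result[path] = None
    return result


class ProfileWatcher:
    """Hypothetical helper: calls `retrain` when any watched file changes."""

    def __init__(self, paths: Iterable[os.PathLike],
                 retrain: Callable[[List[Path]], None]) -> None:
        self.paths = [Path(p) for p in paths]
        self.retrain = retrain
        self.last = snapshot(self.paths)

    def poll(self) -> bool:
        """Check once; call `retrain` with the changed paths if any differ."""
        current = snapshot(self.paths)
        if current == self.last:
            return False
        changed = [p for p in self.paths if current[p] != self.last[p]]
        self.last = current
        self.retrain(changed)
        return True

    def run_forever(self, interval: float = 1.0) -> None:
        """Poll in a loop; the interval is an arbitrary choice."""
        while True:
            self.poll()
            time.sleep(interval)
```

With this shape, whatever layer sits on top (MQTT, HTTP, and so on) only has to overwrite the files; the service picks up the change on its next poll.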
Thoughts?
As the ASR and NLU services do not really need to keep the intents and slots files after training, maybe an MQTT rhasspy/profile/train message could indeed push these files from the profile as a bundle directly to these services… They are not that large, so gzipping them should be pretty quick… But I’m wondering which service would send this message?
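As a rough illustration of the bundling step, the sketch below packs the files into one gzip-compressed JSON payload and unpacks it on the receiving side. The function names and the bundle layout (file name mapped to file text) are assumptions made for the example, not an existing Rhasspy format.

```python
import gzip
import json
from pathlib import Path
from typing import Dict, Iterable


def pack_profile(intent_files: Iterable[Path],
                 slot_files: Iterable[Path]) -> bytes:
    """Bundle intents and slots files into one gzip-compressed JSON payload."""
    bundle = {
        # Assumed layout: file name -> raw file contents
        "intents": {p.name: p.read_text(encoding="utf-8") for p in intent_files},
        "slots": {p.name: p.read_text(encoding="utf-8") for p in slot_files},
    }
    return gzip.compress(json.dumps(bundle).encode("utf-8"))


def unpack_profile(payload: bytes) -> Dict[str, Dict[str, str]]:
    """Reverse of pack_profile: decompress and decode the bundle."""
    return json.loads(gzip.decompress(payload).decode("utf-8"))
```

The resulting bytes could be used as the payload of a message on a topic such as rhasspy/profile/train with any MQTT client; which service publishes it is still the open question.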
If these services share access to the intents and slots files, the web UI or a command-line tool could trigger the training. But they would need to be on the same host to access the profile…
In a distributed/decentralized service architecture, what happens to the central, non-distributed profile files?
Separating the ASR and the NLU services might not be easy…
I think this needs more thought, but I’m sure we can find a smart way of handling this.
For NLU, the intents and slots could be shipped over MQTT as JSON, just like they are now to and from the web UI. They’re both stored as flat files, but bundled up into JSON objects during transit. Sending a single message like:
{
"intents": { ... },
"slots": { ... }
}
could work. What about slot programs and converters, though? We may have to require those to live wherever the NLU service is running.
ASR ends up needing one output of the NLU service: the intent JSON graph. I think this could be the second training message, emitted by NLU and consumed by ASR.
I also have a G2P service for pronunciations that ASR may need to consult. I could have G2P consume the intent JSON graph, extract all spoken words, and emit a message with all needed pronunciations. At that point, all that’s left is the language model, which each ASR service will have to be responsible for.
So overall, it might be: { intents, slots } -> NLU -> { intent graph } -> ASR/G2P and then G2P -> { pronunciations } -> ASR.
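The message flow above can be sketched with a tiny in-memory bus standing in for MQTT. Everything here is a toy under stated assumptions: the topic names, the `Bus`, `Nlu`, `G2p`, and `Asr` classes are hypothetical, the "intent graph" is just a dict of expanded sentences, and the pronunciations are placeholder letter lists rather than real phonemes.

```python
from collections import defaultdict, deque
from itertools import product

# Hypothetical topic names, chosen for this example only
PROFILE = "train/profile"
GRAPH = "train/graph"
PRONUNCIATIONS = "train/pronunciations"


class Bus:
    """Minimal in-memory publish/subscribe queue standing in for MQTT."""

    def __init__(self):
        self.handlers = defaultdict(list)
        self.queue = deque()
        self.log = []  # topics in delivery order

    def subscribe(self, topic, handler):
        self.handlers[topic].append(handler)

    def publish(self, topic, payload):
        self.queue.append((topic, payload))

    def run(self):
        while self.queue:
            topic, payload = self.queue.popleft()
            self.log.append(topic)
            for handler in self.handlers[topic]:
                handler(payload)


def expand(sentence, slots):
    """Replace each `$name` token with every value of that slot."""
    options = [slots[t[1:]] if t.startswith("$") else [t]
               for t in sentence.split()]
    return [" ".join(words) for words in product(*options)]


class Nlu:
    """Consumes { intents, slots } and emits a toy intent graph."""

    def __init__(self, bus):
        self.bus = bus
        bus.subscribe(PROFILE, self.train)

    def train(self, msg):
        graph = {name: [s for sent in sentences
                        for s in expand(sent, msg["slots"])]
                 for name, sentences in msg["intents"].items()}
        self.bus.publish(GRAPH, graph)


class G2p:
    """Extracts all spoken words and emits placeholder pronunciations."""

    def __init__(self, bus):
        self.bus = bus
        bus.subscribe(GRAPH, self.train)

    def train(self, graph):
        words = sorted({w for sents in graph.values()
                        for s in sents for w in s.split()})
        # Placeholder: spell each word out instead of real phonemes
        self.bus.publish(PRONUNCIATIONS, {w: list(w) for w in words})


class Asr:
    """Waits for both the graph and the pronunciations before training."""

    def __init__(self, bus):
        self.graph = None
        self.pronunciations = None
        bus.subscribe(GRAPH, self.on_graph)
        bus.subscribe(PRONUNCIATIONS, self.on_pronunciations)

    def on_graph(self, graph):
        self.graph = graph

    def on_pronunciations(self, pronunciations):
        self.pronunciations = pronunciations

    @property
    def ready(self):
        return self.graph is not None and self.pronunciations is not None
```

Publishing a single `{ "intents": ..., "slots": ... }` message on the profile topic and calling `bus.run()` then drives the whole chain; the language model step would still be up to each ASR service.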