The more I think about it, the more it seems to me that using one (external) MQTT broker and letting the dialogue manager copy a `lang` attribute from the `hermes/hotword/<wakeword_id>/detected` message all the way to `hermes/intent/<intent_name>` is the most flexible approach.
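To make the idea concrete, here is a minimal sketch of what the dialogue manager would do. Note the assumptions: the official Hermes hotword payload only defines fields like `siteId` and `modelId`; the `lang` field is the extension proposed here, and `propagate_lang` is a hypothetical helper, not an existing Rhasspy function.

```python
def propagate_lang(hotword_payload: dict, downstream_payload: dict) -> dict:
    """Copy the (proposed) lang attribute from a hotword 'detected'
    payload into a downstream Hermes payload, e.g. an intent message.
    Payloads without a lang field are passed through unchanged."""
    merged = dict(downstream_payload)
    if "lang" in hotword_payload:
        merged["lang"] = hotword_payload["lang"]
    return merged

# A French wake word was detected, so the resulting intent is tagged "fr":
detected = {"siteId": "kitchen", "modelId": "wakeword-fr", "lang": "fr"}
intent = {"input": "allume la lumière", "intent": {"intentName": "LightOn"}}
print(propagate_lang(detected, intent)["lang"])  # prints: fr
```

The same copy step would apply at every hop of the dialogue (ASR request, NLU query, intent), so the language tag survives from wake word to app.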
This gives users and developers maximum architectural freedom. The two extremes are:
- You could run one Rhasspy instance with all components for the `fr` language and another one for `en`. Each component handles Hermes messages with the same language as its profile and ignores all other messages.
- Or you could run all Rhasspy components independently as Docker containers. Some of the components, such as the NLU and ASR, would be duplicated: one with a `fr` profile, the other with an `en` profile. These components ignore Hermes messages in another language.
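In both scenarios the filtering logic on the component side is the same and almost trivial. A sketch, again assuming the proposed `lang` field (the fallback for messages without one is my assumption, chosen for backward compatibility with components that don't know about the field):

```python
def should_handle(payload: dict, profile_lang: str) -> bool:
    """Decide whether a component with a fixed language profile should
    act on a Hermes payload. Messages without the (proposed) lang field
    are assumed to match, so untagged traffic keeps working."""
    return payload.get("lang", profile_lang) == profile_lang

# An fr-profile NLU acts on the first message and skips the second:
should_handle({"input": "quelle heure est-il", "lang": "fr"}, "fr")  # True
should_handle({"input": "what time is it", "lang": "en"}, "fr")      # False
```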
In practice, you would run a mix of these two extreme scenarios. For instance, you could run separate NLU and ASR instances for each language, while the same TTS component handles `hermes/tts/say` messages for both `fr` and `en` and just switches internally to the matching voice. With the right approach, even the Rhasspy apps handling the intents can switch to another language on the fly.
This approach is also general enough to pave the way for later additions such as automatic language identification (you can find examples with TensorFlow, and I found an interesting paper too). Then you wouldn't need a separate wake word for each language (which is, after all, a clever but ugly hack): Rhasspy would recognize the language of your command and forward the audio to the correct ASR component (e.g. French or English). The `lang` attribute would be copied over in that case too, and the rest of the flow stays the same.
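To show how that routing could slot in, here is a speculative sketch. Everything in it is an assumption: `identify_lang` stands in for a real classifier (e.g. a TensorFlow model), and the per-language topic convention is invented for illustration; it is not part of the official Hermes topic layout.

```python
def identify_lang(audio_frames: bytes) -> str:
    """Placeholder for a real language-identification model.
    A real implementation would classify the audio; this stub
    just returns a fixed code so the routing can be demonstrated."""
    return "fr"

def asr_topic_for(lang: str, site_id: str) -> str:
    """Build a (hypothetical) per-language audio topic, so each
    language-specific ASR instance subscribes only to its own stream."""
    return f"hermes/audioServer/{site_id}/{lang}/audioFrame"

print(asr_topic_for(identify_lang(b""), "kitchen"))
# prints: hermes/audioServer/kitchen/fr/audioFrame
```

Once the language is identified, the same `lang` attribute would be attached to the downstream messages, and the rest of the pipeline works exactly as described above.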