Hogwarts Assistant

My partner and I are big Harry Potter fans, so I decided to build a Harry Potter-inspired digital assistant for our house. I call it the Hogwarts Assistant. I’m using Rhasspy for the speech recognition and intent detection. I wrote a Python service, which I refer to as the Conductor, that is responsible for handling the detected intents and doing something meaningful with them. This all runs on a Raspberry Pi 4 and uses a Jabra Speak 510 for the microphone and speaker.

The following is a breakdown of some, but not all, of the functionality. It seems like I’m adding intents every couple of days as we think of additional interesting things to do with our setup. :wink:

=============

Wake Word

  • hogwarts: custom trained wake word. A ‘magical’ sound effect is used for the Wake and Recorded audio experience. Also, if an intent cannot be detected, quotes from the films are used to notify the user of the error.

Light Intents (using LIFX lights)

  • nox: turns off the lights. A ‘magical’ sound effect is played during the transition.
  • lumos: turns on the lights. A ‘magical’ sound effect is played during the transition.
  • dim the lights to (1…10): adjusts the brightness from 1-10. A ‘magical’ sound effect is played during the transition.
  • its movie time: sets the lights to an optimal state for watching films.
  • set the mood to (slytherin | hufflepuff | gryffindor | ravenclaw): sets the color of our lights to the corresponding house’s primary color (e.g. slytherin == green, gryffindor == red, etc…).
  • mischief managed: resets the lights to their normal state.

Weather Intents (background sound effects are used for the forecast (e.g. if the forecast calls for rain, you hear rain in the background while the forecast is being read))

  • what’s the (1…7) day forecast: reports the N day forecast.
  • what’s the (monday | tuesday | wednesday | thursday | friday | saturday | sunday) (weather | forecast): reports forecast for a specific day of week.
  • what’s it like outside: reports current conditions.

Chromecast Intents

  • stop video: stops current media.
  • (play | pause) video: pause/play current media.

Misc Spell Intents

  • expecto patronum: temporarily pulses the house lights to mimic a patronus charm
  • (avada kedavra | crucio | imperio): since these are unforgivable curses, the user is warned to not use them. The warning pulls from a random set of humorous responses.

Misc Intents

  • (what did you say | say that again | can you repeat that | repeat that): Conductor re-runs the previous response.

News Intents

  • what’s the [muggle] news: Plays the latest hourly NPR audio stream. An intro is played first, indicating that NPR is the Daily Prophet’s muggle affiliate.
  • what’s the magical news: Reads a random article from the Daily Prophet fan fiction website (https://thedailyprophet.net/).

Calendar/Time Intents (clock ticking background effects are played when handling calendar/time intents)

  • when is it: get the current date and time.
  • what time is it: gets the current time.
  • what day is it: gets the current day.

Joke Intents

  • tell me a joke: tells a random Harry Potter related joke.

Routine Intents (the Conductor can define more complex ‘routines’ which may perform a series of operations)

  • good morning: sets the lights to their normal state, tells you good morning (from a random set of phrases), optionally inserts some Harry Potter inspired humor into the good morning (from a random set of phrases), tells you the current weather forecast and wishes you a good day (from a random set of phrases).
  • good night: dims the house lights and wishes you good night and healthy sleep (from a random set of phrases and optionally inserting some Harry Potter inspired humor)

=============

I don’t have any videos to share that demonstrate our setup, but I can create some samples if there is interest. Stay creative everyone!

Thanks for the description! I’d love to see a short video with the “magical” sound effects :slight_smile:

Here’s a quick video demonstrating some of the functionality. We don’t demo everything but it should give you the general idea. Hopefully the audio is good enough for you to catch all of the effects. For instance, sound effects are used during various light operations, for the weather, telling the time and playing the news. Also, while the video is zoomed in on just one of the lights in our house, the transitions occur with all of our lights. Sorry if the video quality is poor, I’m no videographer. I hope you enjoy. :wink:

I’d love to see all your configuration and sentences, really nicely done!

Very impressive! Would you be OK with me posting this video to the Rhasspy Twitter feed?

Thanks for the kind words. I’d rather it not be posted to Twitter. I was even a bit hesitant to share on this forum, as we are pretty private people. Thanks for your understanding.

Sure, no problem :slight_smile: Thank you again for sharing it!

Is it possible to have some code examples?

Thanks for the interest. I don’t have plans to publicly stage my Conductor code but I’ll spend some time putting together a technical writeup that describes the architecture in greater detail.

I told a friend about this today, and he suggested you try and sell it to Universal Studios to go in the “magic” wands they sell to folks in Florida :smiley:

If this is about a commercial matter, OK, but why post in an open source group that made everything available to you for free?

Here are some additional details on my setup.

Hardware

  • Raspberry Pi 4 8GB
  • Jabra Speak 510

Rhasspy Install

  • Pre-compiled Debian package
  • Running as a systemd service
[Unit]
Description=Rhasspy Service
After=syslog.target network.target

[Service]
Type=simple
ExecStart=/bin/bash -c '/usr/bin/rhasspy --profile en 2>&1 | cat'
RestartSec=1
Restart=on-failure
StandardOutput=syslog
StandardError=syslog
SyslogIdentifier=rhasspy
User=pi

[Install]
WantedBy=multi-user.target

Rhasspy Settings

  • Audio Recording: PyAudio
  • Wake Word: Rhasspy Raven
    NOTE: Each member of our home has a ‘hogwarts’ template trained
    Probability Threshold: 0.52
    Average Templates: Enabled
    Minimum Matches: 1
    VAD Sensitivity: 1
  • Speech to Text: Kaldi
    Language Model Type: ARPA
    Minimum Confidence: 0
    Mixed Language Model Weight: 0
    Silence Method: VAD
    VAD Sensitivity: 1
    Skip Before: 0
    Minimum Duration: 1
    Maximum Duration: 20
    Speech Before: 0.3
    Silence After: 0.5
    Record Before: 0.5
  • Intent Recognition: Fsticuffs
    Fuzzy text matching: Disabled
  • Text to Speech: NanoTTS
    NOTE: The Rhasspy TTS is used as a fallback if the TTS service on the Conductor fails.
    Language: en-GB
  • Audio Playing: aplay
  • Dialogue Management: Rhasspy
  • Intent Handling: Remote HTTP
    Remote URL: http://127.0.0.1:8080/intent
    This URL points to an endpoint on the Conductor.
    Since the Conductor is also running on the Raspberry Pi, I’m just using the loopback address.
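
For readers wiring up their own Remote HTTP handler, here is a minimal sketch of an endpoint that could sit behind that URL. It is not the Conductor's code. The `intent.name` and `slots` fields match the excerpts shown in this thread; the `{"speech": {"text": ...}}` reply shape, the function names, and the reply text are assumptions to check against your Rhasspy version.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def handle_intent(intent: dict) -> dict:
    """Return the intent, optionally extended with text for Rhasspy to speak."""
    name = intent.get("intent", {}).get("name", "")
    slots = intent.get("slots", {})
    if name == "ChangeLightState":
        # Placeholder reply; a real handler would also change the lights here.
        intent["speech"] = {"text": f"Turning the lights {slots.get('power', 'on')}"}
    return intent


class IntentEndpoint(BaseHTTPRequestHandler):
    """Accepts POSTed intent JSON on /intent and answers with JSON."""

    def do_POST(self):
        if self.path != "/intent":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        intent = json.loads(self.rfile.read(length) or b"{}")
        body = json.dumps(handle_intent(intent)).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(host: str = "127.0.0.1", port: int = 8080) -> None:
    """Block forever, serving intents on the loopback address."""
    HTTPServer((host, port), IntentEndpoint).serve_forever()
```

Calling `serve()` would listen on the same loopback address and port as the Remote URL above.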

Rhasspy Sounds

  • Error WAV: Since Rhasspy supports defining only a single WAV file, I set up a cron job to rotate the active Error WAV file every hour from a collection of WAV files.
0 * * * * shuf -n 1 -e /home/pi/apps/conductor/resources/effects/audio/responses/misrecognition/*.wav | xargs -I {} cp {} /usr/lib/rhasspy/etc/wav/misrecognition.wav
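
The same rotation could also be done from Python, for example if you would rather trigger it from a service than from cron. This is only a sketch of the idea; `rotate_error_wav` is a hypothetical helper name, and the paths are passed in rather than hard-coded.

```python
import random
import shutil
from pathlib import Path


def rotate_error_wav(source_dir, target):
    """Copy one randomly chosen .wav from source_dir over the target file."""
    candidates = sorted(Path(source_dir).glob("*.wav"))
    if not candidates:
        raise FileNotFoundError(f"No .wav files found in {source_dir}")
    chosen = random.choice(candidates)
    shutil.copyfile(chosen, target)
    return chosen
```

It would be called with the misrecognition directory and the Error WAV path from the cron line above.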

Rhasspy Sentences

[ChangeLightState]
turn (on | off){power} the lights
turn the lights (on | off){power}
nox{power:off} [the] [lights]
lumos{power:on} [the] [lights]

[ChangeLightBrightness]
dim the lights to (1..10){brightness}
its movie time{brightness:4!int}
its time for a movie{brightness:4!int}

[ChangeLightColor]
set the mood to (slytherin | hufflepuff | gryffindor | ravenclaw){house}

[LightReset]
mischief managed

[PlayVideo]
take me to [the] $videos{video}

[StopVideo]
stop video [playback]

[VideoAction]
(play | pause){action} video [playback]

[WeatherForecast]
what's the (1..7){days} day forecast

[WeatherCurrent]
what's the weather
what's it like outside
how's the weather

[WeatherForecastDayOfWeek]
what's the (monday | tuesday | wednesday | thursday | friday | saturday | sunday){day_of_week} (weather | forecast)
what's the (weather | forecast) [on] (today | monday | tuesday | wednesday | thursday | friday | saturday | sunday){day_of_week}

[SayAgain]
what did you say
say that again
can you repeat that
repeat that [please]

[NewsLatest]
what's the [latest] (muggle | magical){source} news
what's the [latest] news{source:muggle}
tell me the (muggle | magical){source} news [please]
tell me the news{source:muggle} [please]

[Spells]
(expecto patronum | avada kedavra | crucio | imperio){name}

[ProphetStory]
tell me a (magical news | bad magical news | black magic | hogwarts | entertainment | matters of magic | muggle | magical places){source} story
tell me a story about (magical news | bad magical news | black magic | hogwarts | entertainment | matters of magic | muggle | magical places){source}

[CalendarCurrent]
when is it

[CalendarCurrentTime]
what time is it
tell me the time

[CalendarCurrentDay]
what day is it
tell me the day

[Jokes]
tell me a joke{source:magical}

[RoutineInvoke]
good morning{name:good morning}
goodmorning{name:good morning}
wakey wakey{name:good morning}
good night{name:good night}
goodnight{name:good night}
nighty night{name:good night}
night night{name:good night}
its nap time{name:good night}
its time for a nap{name:good night}
its time for bed{name:good night}
its bed time{name:good night}
its bedtime{name:good night}
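
The RoutineInvoke sentences collapse many phrasings into one `name` slot value, so a routine can be looked up by that name and run as an ordered list of steps. The sketch below shows one way that might look. All function names and the phrase list are hypothetical, since the real routine code is not shown in this thread.

```python
import random

# Hypothetical phrase pool; the real phrases are not shown in the post.
MORNING_PHRASES = ["Good morning!", "Rise and shine!"]


def make_good_morning(reset_lights, read_forecast, say, rng=random):
    """Build the 'good morning' routine as an ordered list of steps."""
    return [
        reset_lights,
        lambda: say(rng.choice(MORNING_PHRASES)),
        read_forecast,
    ]


def run_routine(name, routines):
    """Run every step of the named routine in order and collect the results."""
    if name not in routines:
        raise KeyError(f"Unknown routine: {name!r}")
    return [step() for step in routines[name]]


# Usage with stand-in callables that only record what would happen.
log = []
routines = {
    "good morning": make_good_morning(
        reset_lights=lambda: log.append("lights reset"),
        read_forecast=lambda: log.append("forecast read"),
        say=lambda text: log.append(f"said: {text}"),
    )
}
run_routine("good morning", routines)
```

Keeping each step as a plain callable means a routine can reuse the same services that single intents use.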

Conductor
The Conductor service is a Python HTTP-based service that exposes an endpoint to which Rhasspy sends detected intents. I used a layered approach when building the Conductor framework, with a Handler layer and a Service layer. Intents defined in Rhasspy get mapped to Handlers, which are responsible for unpacking the Intent’s details and invoking a Service that performs some action (e.g. turning the lights off). An example of the intent handling flow is listed below.

Unfortunately, what you won’t really see from this example is how Handlers and Services get registered and processed by the Conductor framework. When I have the time, I’ll post the generic Conductor framework, likely to GitHub, for anyone to use. I just need to spend some time cleaning things up, pulling out our specific Hogwarts Assistant application Handlers and Services (most wouldn’t be relevant for other use-cases) and documenting how to use it. It didn’t dawn on me that others might find the framework useful. So, I never planned to post it anywhere. But, I’m not opposed to it if the community will get use from it. :slight_smile:

Running as a systemd service

[Unit]
Description=Conductor service.

[Service]
Type=simple
WorkingDirectory=/home/pi/apps/conductor
ExecStart=pipenv run python /home/pi/apps/conductor/conductor.py 0.0.0.0 8080
ExecStop=/usr/bin/curl -X POST -G http://127.0.0.1:8080/admin/shutdown
User=pi

[Install]
WantedBy=default.target

Example Intent Handling Flow

  1. A user says ‘hogwarts nox’.
  2. Rhasspy detects this as the ChangeLightState Intent and sends the Intent details to the Conductor’s intent endpoint.
  3. The Conductor receives the intent and determines that the LightsHandler is configured to process ChangeLightState intents. So, the Conductor passes the intent to the LightsHandler for further processing.
  4. The LightsHandler inspects the intent and determines that ChangeLightState intents map to its ‘power’ method and proceeds to unpack the intent slot content and invoke that method.
  5. The LightsHandler power method simply delegates to the LightsService that is set up in the Conductor. For our Hogwarts Assistant setup, I coded the LightsService to make specific API calls for our LIFX lights.
  6. The LightsService’s power method performs two tasks. It engages the AudioService to play a sound effect and then makes a call to the LIFX API to change the state of our lights.
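
The steps above can be sketched roughly as follows. The registration mechanics are not shown in this thread, so `register`, `dispatch`, `handle`, and `FakeLightsService` are hypothetical stand-ins, not the Conductor framework's actual API.

```python
class BaseHandler:
    """Hypothetical base class: remembers which intents a handler accepts."""

    def __init__(self, conductor, name, intents):
        self.conductor = conductor
        self.name = name
        self.intents = set(intents)

    def handle(self, intent):
        return self._handle_intent(intent)

    def _handle_intent(self, intent):
        raise NotImplementedError


class Conductor:
    """Hypothetical dispatcher: maps intent names to registered handlers."""

    def __init__(self):
        self._handlers = {}

    def register(self, handler):
        for intent_name in handler.intents:
            self._handlers[intent_name] = handler

    def dispatch(self, intent):
        name = intent["intent"]["name"]
        handler = self._handlers.get(name)
        if handler is None:
            raise LookupError(f"No handler registered for intent {name!r}")
        return handler.handle(intent)


class FakeLightsService:
    """Stand-in service that records calls instead of contacting LIFX."""

    def __init__(self):
        self.calls = []

    def power(self, lights, duration, power):
        self.calls.append((lights, duration, power))
        return "ok"


class LightsHandler(BaseHandler):
    def __init__(self, conductor):
        super().__init__(conductor, "lights", {"ChangeLightState"})

    def _handle_intent(self, intent):
        if intent["intent"]["name"] == "ChangeLightState":
            return self.conductor.lights.power("house", 2.0, intent["slots"]["power"])


conductor = Conductor()
conductor.lights = FakeLightsService()
conductor.register(LightsHandler(conductor))
result = conductor.dispatch({"intent": {"name": "ChangeLightState"}, "slots": {"power": "off"}})
```

The handler never talks to the lights directly, so swapping LIFX for another vendor would only mean replacing the service.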

Excerpts from LightsHandler and LightsService classes

import requests

# BaseHandler, BaseService, LIFX, LIGHTS, PlayRequest, Response, LightFailure,
# get_deluminator and get_general_failure are defined elsewhere in the
# Conductor code and are not part of these excerpts.


class LightsHandler(BaseHandler):

    def __init__(self, conductor):
        # Register this handler as 'lights' for the light-related intents.
        super().__init__(conductor, 'lights', {'ChangeLightState', 'ChangeLightBrightness', 'ChangeLightColor', 'LightReset'})

    def power(self, lights, duration, power):
        # Delegate to the LightsService that is set up in the Conductor.
        return self.conductor.lights.power(lights, duration, power)

    def _handle_intent(self, intent):
        if intent['intent']['name'] == 'ChangeLightState':
            return self.power('house', 2.0, intent['slots']['power'])


class LightsService(BaseService):

    def __init__(self, conductor):
        super().__init__(conductor)
        self.config = {LIFX: {'token': '<TOKEN_HIDDEN>'}}
        self._base_url = 'https://api.lifx.com/v1/lights/'
        self._timeout = 3.00

    def _lifx_headers(self):
        return {'Authorization': f'Bearer {self.config[LIFX]["token"]}'}

    def power(self, lights, duration, power):
        try:
            # Play the sound effect, then ask LIFX to change the light state.
            self.conductor.audio.play(PlayRequest(get_deluminator()))
            lifx_url = f'{self._base_url}{LIGHTS[LIFX][lights]}/state'
            requests.put(lifx_url,
                         params={'duration': duration,
                                 'power': power},
                         headers=self._lifx_headers(),
                         timeout=self._timeout)
        except Exception:
            raise LightFailure(get_general_failure())
        return Response()
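
The excerpts only cover ChangeLightState. For the brightness and colour intents, the slot values would need translating into something the lights API understands. A minimal sketch follows, assuming the API takes brightness as a 0.0 to 1.0 fraction and accepts simple colour names. Only green for slytherin and red for gryffindor come from the original post; the other two colours and both helper names are assumptions.

```python
# Hypothetical house-to-colour table; yellow and blue are assumed values.
HOUSE_COLORS = {
    "slytherin": "green",
    "gryffindor": "red",
    "hufflepuff": "yellow",
    "ravenclaw": "blue",
}


def brightness_from_level(level: int) -> float:
    """Convert a spoken level (1..10) into a 0.0-1.0 brightness fraction."""
    if not 1 <= level <= 10:
        raise ValueError(f"Brightness level must be between 1 and 10, got {level}")
    return level / 10


def color_for_house(house: str) -> str:
    """Look up the colour name for a house, ignoring letter case."""
    return HOUSE_COLORS[house.lower()]
```

With this mapping, the `its movie time{brightness:4!int}` sentence would end up as a brightness of 0.4.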

For me, this is the most interesting part :slight_smile:

Ah, OK. Sorry, hindsight being 20/20, I guess it should have dawned on me that others could use the basic Conductor framework. :man_shrugging:

I’ll likely have some time over the next week or so to get it cleaned up and posted. Thanks for the feedback! :+1:

Hi All,

I’ve staged the base Conductor framework at the below location. I hope you find it useful. :slight_smile:

@x2012x your assistant is absolutely awesome.

Great work and nice to see such a project!

I was checking the code and was wondering why you do not use the TTS already in Rhasspy?
That way, a Rhasspy user can use the TTS of choice set in Rhasspy and not be dependent on Google at all if he or she wishes.

If the response contains a speech key, Rhasspy will speak the text with its TTS engine of choice.
It might kill the background audio feature however, so I was just wondering if this was the reason to not use Rhasspy for TTS

I like the background audio feature by the way :slight_smile:

Thanks for the kind words.

@romkabouter, the main reason for building a TTS service in the Conductor was that I wanted a bit more granular control over the TTS experience. The ability to do a background track being a big factor. The obvious downside of course is that the current implementation relies entirely on Google’s TTS. Good enough for my current use-case but not flexible enough for the masses I’d say.

One thing worth mentioning, if you didn’t notice: the Conductor will automatically fall back on the Rhasspy TTS (returning the contents to speak in the speech key) if there is a problem using the Conductor’s TTS service. So, it’s conceivable that a DEV using the Conductor could tweak the code to not use the Conductor’s TTS and always return the speech content to Rhasspy instead of only using it as a fallback.

I’ll make a note to consider adding a configurable attribute that tells the Conductor to use its built-in TTS or Rhasspy’s. That way a DEV wanting to exclusively use Rhasspy’s could do it by way of configuration instead of having to do a code change.
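
A rough sketch of how such a switch might behave is shown here. It is not taken from the Conductor code; `build_response`, `use_rhasspy_tts`, and the `speak` callable are hypothetical names, and the `{"speech": {"text": ...}}` shape is an assumption about what Rhasspy expects in the reply.

```python
def build_response(text, speak, use_rhasspy_tts=False):
    """Speak with the built-in TTS, or hand the text back to Rhasspy.

    `speak` stands in for the Conductor's own TTS call. When the flag is set,
    or when `speak` fails, the text is returned under the speech key so that
    Rhasspy can speak it with whichever TTS engine it is configured to use.
    """
    response = {}
    if use_rhasspy_tts:
        response["speech"] = {"text": text}
        return response
    try:
        speak(text)
    except Exception:
        # Fall back to Rhasspy's TTS if the built-in one is unavailable.
        response["speech"] = {"text": text}
    return response
```

With the flag off this keeps the current behaviour (built-in TTS first, Rhasspy as fallback), and with it on the built-in TTS is never called.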

Thanks for the input!

That is a good idea indeed! All in all a good setup :slight_smile:
