What's the craziest thing you can or cannot do with Rhasspy?

The privacy sell is a crazy one: ASR has gone offline because big data recognise they can still track you through services, and with Google the offline modes remain tightly coupled to services in Android.
It's also a crazy sell because many privacy advocates strangely still employ online services, so big data may no longer have the voice of the intent but still has the intent action data.

Having a voice assistant that lacks integrated audio processing, where the initial audio pipeline of beamforming/AEC/filtering/separation is mostly missing and not integrated with the other modules, has been a huge hole in dealing with noisy environments. The lack of attention this has received is pretty crazy, as it is the input to the whole system and so obviously has drastic effects downstream.
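To make that hole concrete, the front end being talked about is a chain of beamforming, then AEC, then filtering, then separation, sitting before the wake word and ASR. As a purely illustrative sketch (nothing Rhasspy ships), here is about the simplest beamformer there is, a delay-and-sum over aligned mic channels, assuming integer sample delays:

import numpy as np

def delay_and_sum(channels, delays):
    """Align each mic channel by an integer sample delay and average.

    channels: (n_mics, n_samples) array of mic signals.
    delays:   (n_mics,) integer sample delays toward the target direction.
    """
    n_mics, n_samples = channels.shape
    out = np.zeros(n_samples)
    for ch, d in zip(channels, delays):
        out += np.roll(ch, -int(d))  # crude alignment; real code would pad
    return out / n_mics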

One of the craziest things most users do, because it is simply how Rhasspy works, is broadcast raw audio over a lightweight control protocol such as MQTT, while software for existing standards with hugely better-supported pools of open source is shunned, purely to stamp ownership on what is supposed to be open source. That is absolutely, totally crazy.
The same goes for audio delivery: if you are going to integrate into a wireless audio system, then surely it is better to adopt a wireless audio standard that already exists than yet another proprietary branding positioned purely because of ownership rather than offering a choice of what already exists and is supported. Once more, piggybacking audio onto MQTT effectively destroys MQTT as a lightweight control protocol, which borders on absolutely batshit crazy.
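For anyone who has not looked at what audio over MQTT actually means on the wire, the shape of it is roughly this. A sketch only: the site id and chunk size are made up, the client is paho-mqtt 1.x style, and real Hermes wraps each frame in its own small WAV rather than raw chunks:

import paho.mqtt.client as mqtt

SITE_ID = "livingroom"   # assumed site id
CHUNK = 2048             # bytes per message, arbitrary

client = mqtt.Client()
client.connect("localhost", 1883)

with open("utterance.wav", "rb") as f:
    while chunk := f.read(CHUNK):
        # Every frame is a full MQTT publish (topic + packet overhead),
        # which is exactly the weight being criticised above.
        client.publish(f"hermes/audioServer/{SITE_ID}/audioFrame", chunk)

client.disconnect()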

I think the route @synesthesiam took to accommodate others from the effectively failed Snips attempt was crazy, even though commendable, as this great lightweight embedded local control platform became lost in the terrible peer-to-peer Hermes control protocol.
The lite nature of the initial Rhasspy, which could quickly and easily train a localised, GPIO-controlled voice interface with zero reliance on an internet connection beyond software distribution and upgrades, made it a pretty unique and useful piece of software; in its current state I am confused about what, if anything, it is effective at.
I would love Rhasspy to return to the wonderful lite simplicity that @synesthesiam originally created. Much of the higher-load additions are likely a much better fit for the Mycroft system, where the work could be partitioned to mutual benefit and any conflicting interests kept separate.

There is another thing that I generally think is crazy: distributed network microphones are branded and referred to as satellites, when they should be treated like any other form of HMI. Or maybe we should have Rhasspy keyboards and Mycroft mice!?
This branding craziness across a plethora of disparate systems has seriously hampered open-source smart assistants. There are clear partitions that should each be a system in their own right, promoting interoperability rather than, for the most part, the opposite. Know what you are good at, stick to it, and provide user choice, as is supposedly the nature of open source.

I’m noticing a trend of people in AI and NLP being on this forum. That’s a good thing, because it means there’s a bunch of critical thinkers on here. The fact that those same people use the software means that it lives somewhere on the spectrum between state-of-the-art and usability. It is clear however that you have some frustration that research doesn’t trickle fast enough into these kinds of projects. I have the same feeling coming from cryptography :slight_smile:
Any specifics that you’d like to see trickle down, especially pertaining to my point “While that’s truly cool, I’ve always been of the opinion that free software can do better than them”?

Exactly what I’m after, cool stuff!

@rolyan_trauts, can I blatantly summarize your post as “Rhasspy is doing a lot of hammering screws (voice over MQTT)” and is a typical mess of many open-source components that fit together but aren’t really modular yet?

The simplicity is still there; what is it you feel has increased the complexity of Rhasspy?
It is still a tool to quickly and easily train a localised voice interface with zero reliance on the net if you choose to.

@rubdos I guess you could look at it that way, but it's naturally modular, as audio in becomes text out, which is then fed on…
It was made proprietary and non-modular with the introduction of Hermes, which does little else than confuse; being proprietary, it has limited support from a few people on here, and for no good reason, as better, more standard, more modular, more supported protocols exist…

@romkabouter you know I hate Hermes, always did, and especially the satellite versions, but let's not go over opinion worn thin.

The Pi0 is dead, long live the Pi02! For a long while @synesthesiam did manage to get Rhasspy to just about work on a Pi0, and I expect he could quite easily optimise for an embedded Pi02, as they are so cost effective, and it likely isn't a conflict of interest with his Mycroft work.

These are, for example?

You need examples of RTP audio and wireless audio? Do a Google and save me wasting some time.

Yeah I know, but Rhasspy on a single setup does not do anything with that. That’s why I ask.
There is no network broadcast or anything; it is just that single simple device, so I was wondering what made Rhasspy more complex for you to use than in the early days.

I agree with you, there are better ways for real-time audio.
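For reference, pushing PCM over RTP (RFC 3550) is just a fixed 12-byte header on a UDP datagram. A minimal sketch, assuming 16 kHz 16-bit mono, dynamic payload type 96, and a made-up receiver address:

import socket
import struct

DEST = ("192.168.1.50", 5004)   # assumed receiver
SAMPLES_PER_PACKET = 320        # 20 ms at 16 kHz

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
seq, timestamp, ssrc = 0, 0, 0x12345678

def send_rtp(pcm):
    """Prefix a 12-byte RTP header and send one UDP datagram."""
    global seq, timestamp
    header = struct.pack(
        "!BBHII",
        0x80,          # version 2, no padding/extension/CSRC
        96,            # marker 0, payload type 96
        seq, timestamp, ssrc,
    )
    sock.sendto(header + pcm, DEST)
    seq = (seq + 1) & 0xFFFF
    timestamp += SAMPLES_PER_PACKET

# Example: stream a raw 16-bit PCM capture in 20 ms packets.
with open("utterance.pcm", "rb") as f:
    while pcm := f.read(SAMPLES_PER_PACKET * 2):   # 2 bytes per sample
        send_rtp(pcm)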

Not much; what I think it should do is concentrate on that single setup and ditch much of the rest, as that is the bit I don't like. A few later modules like Larynx are probably a tad heavy for that type of 'embedded' role and probably conflict slightly with what Mycroft might want to employ. Also, maybe trim out some of the modules that duplicate others without any advantage.
It is really the network-broadcast, more complex satellite side that I didn't like, plus the hole in the initial audio processing needed to cope better with noise.
I am chipping away at the noise thing and very slowly picking up some C++ skills to try to minimise load, and if I do get some solutions I will share them.

Yeah, I also agree the single-setup 'embedded' device mode works quite well. I was just thinking the focus should be on making that side lean and giving the Pi02 some attention, as it is just an incredibly cost-effective platform.

Yes, that is indeed an issue. It would be really nice if some improvements are made in that area.

Even big data get this wrong. I swapped my Google Nest Audio for 2x Amazon Echo Gen 4 because they have an aux in, so they can also be used as wired active speakers.
The Gen 4 is noticeably less accurate in the presence of noise than the Echo Dot Gen 3 that I have also tested (supposedly the same is true of the full Gen 3), and it drives me crazy at times, as it is much worse than the Google Nest.
I am not sure Rhasspy will ever make improvements in that area, as some of the biggest improvements big data make come from dictating, controlling, and integrating the hardware, while Rhasspy operates in a bring-your-own-mic-to-the-party style that is near impossible to provide for, given how wide-ranging the hardware can be.

Even though the Echo Gen 4 wipes the floor with Rhasspy in terms of noise, it is still not great, and at a guesstimate the mics might be getting too much feedback due to some sort of isolation problem.
There is a huge amount of engineering that goes into them, from the choice of algorithms and mic types down to how you even assemble the thing; if you get these wrong, or are merely unaware of them, you will get worse results.
Having a complete absence of the all-important audio processing setup in a project leads to the results it gets.
In that regard the project seems rudderless, even though this has been an issue for years.

Have you shared, or are you willing to share, your slot programs? I'm especially interested in your music slot generation, as I'm working on a voice interface to my homebuilt jukebox. TIA

I used Rhasspy to replace what I initially did with Snips.

Currently I use nearly all cloud-free software, with the only exceptions being a Xiaomi/Roborock cleaning robot and a Xiaomi air cleaner I bought from China.

I use Home Assistant as a “hardware abstraction” and Node-RED to build more complex automations. I couple Rhasspy with Home Assistant through Node-RED to also handle somewhat more complex intents. There are probably better ways to do that, but it works for me.
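For anyone curious, the shape of that glue looks roughly like this when expressed in Python rather than my actual Node-RED flow (the URL, token, intent name, and entity are placeholders):

import json
import requests
import paho.mqtt.client as mqtt

HA_URL = "http://homeassistant.local:8123"
HA_TOKEN = "YOUR_LONG_LIVED_TOKEN"

def on_message(client, userdata, msg):
    intent = json.loads(msg.payload)
    name = intent["intent"]["intentName"]
    if name == "TurnOnLight":  # hypothetical intent
        # Fire the matching Home Assistant service over its REST API.
        requests.post(
            f"{HA_URL}/api/services/light/turn_on",
            headers={"Authorization": f"Bearer {HA_TOKEN}"},
            json={"entity_id": "light.living_room"},
        )

client = mqtt.Client()
client.on_message = on_message
client.connect("localhost", 1883)
client.subscribe("hermes/intent/#")   # Rhasspy publishes recognised intents here
client.loop_forever()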

For the craziest things:

  • I coupled Signal messenger through Node-RED with Rhasspy, so I can text or voice-message my flat with commands and get the response back as a message from anywhere.
  • I kinda hacked the intercom of the house I live in (there was a YouTube video where somebody reversed it with some Arduino code) and can tell my flat via Signal to press the buzzer to let me in, or turn my Nuki smart lock the same way, so I don't need to carry any keys.
  • I managed to build selective room cleaning with some Node-RED sorcery, so I can tell the Roborock robot to clean specific rooms.
  • I have a robot hand and a robot arm that just do some stupid tricks when I say something to Rhasspy.
  • I put Zigbee RGB lights everywhere in the flat and can just control them (probably what everybody else does).
  • the craziest thing for me is that this all works offline (except sending Signal messages) and with open-source software. I created a flow in Node-RED that updates the slots with devices that are available in Home Assistant and retrains Rhasspy, so I don't need to do much when I add new devices in Home Assistant; a rough sketch of the idea follows below.
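A rough Python sketch of that slot-update idea, done against Rhasspy's HTTP API instead of Node-RED (the URLs, token, and slot name are placeholders):

import requests

HA_URL = "http://homeassistant.local:8123"
HA_TOKEN = "YOUR_LONG_LIVED_TOKEN"
RHASSPY_URL = "http://rhasspy.local:12101"

# 1. Fetch all entities from Home Assistant and keep the lights.
states = requests.get(
    f"{HA_URL}/api/states",
    headers={"Authorization": f"Bearer {HA_TOKEN}"},
).json()
lights = [
    s["attributes"].get("friendly_name", s["entity_id"])
    for s in states
    if s["entity_id"].startswith("light.")
]

# 2. Overwrite the 'lights' slot with the current device names.
requests.post(f"{RHASSPY_URL}/api/slots?overwrite_all=true",
              json={"lights": lights})

# 3. Retrain so the new slot values are picked up.
requests.post(f"{RHASSPY_URL}/api/train")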

So I have done a few things and am working on others. This post is a combo of both. I currently control all of the HA functions in my house using Alexa and node-red. I am in the middle of converting from Alexa to Rhasspy. But I still want to ask Alexa questions and have her answer them if they are not HA related, and I do not want any Echos in the house. So I am working on forwarding Rhasspy commands that are not recognized to Alexa and having Rhasspy recite the answer back. I have this working. So if I ask Rhasspy “what is the population of Athens”, the command will not be recognized and node-red will forward it to Alexa. Then node-red captures the response from Alexa and sends it to the Rhasspy satellite.
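The rough shape of the fallback, sketched in Python rather than my actual node-red flow; ask_alexa below is a stand-in for whatever bridges to Alexa, while the two Hermes topics are the documented ones:

import json
import paho.mqtt.client as mqtt

def ask_alexa(text):
    """Hypothetical bridge that forwards the question to Alexa."""
    raise NotImplementedError

def on_message(client, userdata, msg):
    # Rhasspy publishes here when no intent matched the utterance.
    payload = json.loads(msg.payload)
    answer = ask_alexa(payload.get("input", ""))
    # Speak the answer back on the satellite that asked.
    client.publish("hermes/tts/say", json.dumps({
        "text": answer,
        "siteId": payload.get("siteId", "default"),
    }))

client = mqtt.Client()
client.on_message = on_message
client.connect("localhost", 1883)
client.subscribe("hermes/nlu/intentNotRecognized")
client.loop_forever()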

Some of my requests are HA informational requests and not action requests. For example, I maintain state of all of my lights in the house. I can ask Rhasspy “what lights are on?”, and node-red runs the code and Rhasspy responds back with audio of the list of lights that are currently on.
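The state query behind that boils down to something like this (again a sketch, not the node-red version; URL and token are placeholders):

import requests

HA_URL = "http://homeassistant.local:8123"
HA_TOKEN = "YOUR_LONG_LIVED_TOKEN"

# Pull every entity state and keep the lights that are currently on.
states = requests.get(
    f"{HA_URL}/api/states",
    headers={"Authorization": f"Bearer {HA_TOKEN}"},
).json()
on_lights = [
    s["attributes"].get("friendly_name", s["entity_id"])
    for s in states
    if s["entity_id"].startswith("light.") and s["state"] == "on"
]
print("The following lights are on: " + ", ".join(on_lights))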

I am also building some voice recognition models using edgeimpulse.com. So far I have had some fairly good success.


Can you please give us some more (detailed) instructions on how you forward requests to Alexa and receive the responses?

slot_programs/albums:

#!/usr/bin/env python3
# Query Kodi's JSON-RPC API for all albums and print each as a Rhasspy
# slot entry in "(label):albumid" form.

import requests

SERVER = "http://YOUR_KODI_URL:8080"
USERNAME = "XXX"
PASSWORD = "YYY"

client = requests.Session()
client.auth = (USERNAME, PASSWORD)
client.headers.update({"Content-Type": "application/json"})

url = SERVER + '/jsonrpc'

albums = client.post(url, data='{"jsonrpc": "2.0", "method": "AudioLibrary.GetAlbums", "params": {"allroles": true}, "id": 1}').json()
for album in albums['result']['albums']:
    # Skip labels with characters that would break the slot syntax.
    if any(el in album['label'] for el in ['[', ',', '$']):
        continue
    print("(" + album['label'] + "):" + str(album['albumid']))

slot_programs/artists:

#!/usr/bin/env python3
# Query Kodi's JSON-RPC API for all artists and print each as a Rhasspy
# slot entry in "(artist):artistid" form.

import requests

SERVER = "http://YOUR_KODI_URL:8080"
USERNAME = "XXX"
PASSWORD = "YYY"

client = requests.Session()
client.auth = (USERNAME, PASSWORD)
client.headers.update({"Content-Type": "application/json"})

url = SERVER + '/jsonrpc'

artists = client.post(url, data='{"jsonrpc": "2.0", "method": "AudioLibrary.GetArtists", "params": {}, "id": 1}').json()
for artist in artists['result']['artists']:
    # Skip artist names containing commas, which would break the slot syntax.
    if ',' in artist['artist']:
        continue
    print("(" + artist['artist'] + "):" + str(artist["artistid"]))

Intent scripts in HA:

PlayArtist:
  action:
  - service: script.play_artist
    data:
      artist_id: "{{ artistid }}"
      volume: "{{ volume if volume is defined else '0.3' }}"
PlayAlbum:
  action:
  - service: script.play_album
    data:
      album_id: "{{ albumid }}"
      volume: "{{ volume if volume is defined else '0.3' }}"

Home Assistant scripts play_artist and play_album:

play_artist:
  alias: Play shuffle artist
  variables:
    artist_id: ''
  sequence:
  - alias: Play Artist
    service: media_player.play_media
    data:
      entity_id: media_player.kodi
      media_content_type: artist
      media_content_id: '{{ artist_id }}'
  - service: media_player.shuffle_set
    data:
      entity_id: media_player.kodi
      shuffle: true
  - service: media_player.volume_set
    data:
      entity_id: media_player.kodi
      volume_level: 0.3
  - service: switch.turn_on
    entity_id: switch.speakercontroloutlet
play_album:
  alias: Play an album
  variables:
    album_id: ''
  sequence:
  - alias: Play album
    service: media_player.play_media
    data:
      entity_id: media_player.kodi
      media_content_type: album
      media_content_id: '{{ album_id }}'
  - service: media_player.shuffle_set
    data:
      entity_id: media_player.kodi
      shuffle: false
  - service: media_player.volume_set
    data:
      entity_id: media_player.kodi
      volume_level: 0.3
  - service: switch.turn_on
    entity_id: switch.speakercontroloutlet

Sorry for the code dump on a forum. I should clean these things up and put them on Github. The ones powered by Jellyfin are even more ugly :slight_smile:


I was planning on trying something like that too, would be very interested in how you accomplished this.

I use node-red as an intermediary to communicate between Rhasspy and Alexa. Let me know if you are still interested and I will start a new thread. @romkabouter @schnopsi


Still interested ;). Thx!

Yes please, I have NR running as well :slight_smile:

Thanks for sharing! Should give me a headstart on my jukebox.
