The Future of Rhasspy

I’m happy to hear that you have a positive attitude toward both Mycroft and Rhasspy.

There is something of a history of voice computing projects losing steam after developers are hired by commercial projects. The Jasper project died after the developers were hired by Microsoft to work on Cortana. KSimon has been limping along since the primary developer was hired by Apple to work on Siri. So it’s not without reason that people were concerned.

I have also avoided Mycroft for years because I don’t buy their marketing around the cloud component (“we are open source, so you can totally trust us”, “using our servers helps anonymize your activities”). I’d love to feel more comfortable inviting that project into my personal space.

I’m happy to see that Mycroft does make the Selene back end available. I will try to set that up again (the last time I tried was several years ago using an unofficial and unsupported repository, and I gave up after a couple of days). One place where I do see the benefit of a centralized back end is for organizations like schools and health care providers that need to keep PII protected while also providing a consistent and custom set of services.

Congratulations and best of luck with your new position. Thank you for the efforts you have devoted to open source voice interfaces.

1 Like

Thank you again to everyone for the warm replies!

Very happy to hear this :slight_smile: As @romkabouter mentioned, language support is very important both with (offline) STT and TTS. Every language we can get enough data for is another group of people that don’t have to rely on Google, etc.

It will! I’ve already started on a follow-on that I’m calling Mimic 3 (to fit Mycroft’s naming scheme). It’s close in architecture to Larynx, but uses a new TTS model that is almost 2x faster on the Pi 4. I’m still figuring out the best hyperparameters to use, but once I do I will retrain all of the voices and upgrade Larynx :+1:

I understand, and I won’t pretend like I can 100% promise what the future will hold. However, at least with Mycroft what I work on will be completely open source. Snips’ tech was amazing, but it’s effectively worthless now because of the Sonos acquisition.

I will be working to get Rhasspy’s tech into Mycroft over the coming year. The Mark II will hopefully be shipping in September (if they get enough orders), and my goal is to have offline STT and TTS for all of Rhasspy’s supported languages on board.

Definitely, and unfortunately. I feel like having some shared standards could help ease the burden of shifting between projects. For all its warts, Hermes enabled a lot of Snips users to switch over to Rhasspy without redesigning everything. I hope to finally have some time to work with @sepia-assistant and @DANBER to establish those standards.

No company is immune from acquisition, so you shouldn’t trust them! Mycroft seems very unlikely, however, to go down the same road as Snips.

You’re welcome, and I see myself as a long ways from finished :smiley:

1 Like

Congrats on the new job!!!

Seeing as everything you contribute will be open source, what works for one project will most likely work for the other, so it’s a win-win.

1 Like

For me, Mycroft is lacking three important things:

  1. TTS support is very limited. For German I only have the unusably slow Mozilla DeepSpeech model or low-quality options. Hearing that you want to create an even better Larynx called Mimic 3, which can be used with Mycroft, sounds really great.
  2. I’m not using Home Assistant for controlling my smart home, so I need an option to process the intent in Node-RED, or at least to configure GET & POST HTTP API requests (see the sketch after this list). And it’s just great if you want your own automation flows. For example, the alarm flow I’m currently building in Rhasspy doesn’t just ring an alarm at a set time; it also powers up my room heating and slowly increases the brightness of my lights beforehand. (While writing this comment I found this: https://github.com/JarbasHiveMind/HiveMind-NodeRed . So there seems to be some kind of community skill which is able to do this.)
  3. It can’t currently be used completely offline. Sounds like this is going to change too.
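To make that second point concrete, here is a rough Python sketch of the kind of handoff I mean: a tiny web hook that accepts the intent JSON Rhasspy can POST to a remote intent handler and relays it to a Node-RED `http in` endpoint. The URL, port, and JSON field names are placeholders I’m assuming, not anything official, so adjust them to your own flow:

```python
# Hypothetical sketch only: a tiny intent web hook between Rhasspy and Node-RED.
# The URL, port, and JSON field names are placeholders from memory, not official.
import json
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

NODE_RED_URL = "http://localhost:1880/rhasspy-intent"  # made-up Node-RED "http in" endpoint

class IntentHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Rhasspy (remote-HTTP intent handling) POSTs the recognized intent as JSON.
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length) or b"{}")
        intent = payload.get("intent", {}).get("name", "")
        slots = payload.get("slots", {})

        # Relay to Node-RED, which can ring the alarm, start the heating,
        # ramp the lights, or anything else in your flow.
        req = urllib.request.Request(
            NODE_RED_URL,
            data=json.dumps({"intent": intent, "slots": slots}).encode("utf-8"),
            headers={"Content-Type": "application/json"},
        )
        try:
            urllib.request.urlopen(req, timeout=5)
        except OSError as err:
            print(f"Could not reach Node-RED: {err}")

        # Echo an empty JSON object back so Rhasspy is satisfied.
        body = b"{}"
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    HTTPServer(("0.0.0.0", 8000), IntentHandler).serve_forever()
```

Point Rhasspy’s remote intent handling (or a plain HTTP request) at port 8000 and Node-RED takes it from there.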

I’m actually really looking forward to this.

5 Likes

One trick I use to improve performance is to use TTS engines that support caching.
What I do is keep most responses as simple as possible (“the light is on” rather than “the bedroom light is on”). For things that are more dynamic, like weather, temperature, humidity, or the time and date, I have Node-RED send the sentence to TTS whenever the value changes but only play it when there is a request, so it is already cached and the response is close to instant.
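A rough Python sketch of that pre-caching trick (the endpoint and the `play=false` parameter are assumptions from memory about Rhasspy’s HTTP API, so check your install; the same request works from a Node-RED http request node):

```python
# Sketch of the pre-caching idea, assuming Rhasspy's /api/text-to-speech endpoint
# and that a play=false query parameter synthesizes without playing (check your
# version's HTTP API docs; the same request can be sent from a Node-RED node).
import urllib.parse
import urllib.request

RHASSPY_URL = "http://localhost:12101"  # default Rhasspy web port; adjust as needed

def prewarm_tts(sentence: str) -> None:
    """Ask Rhasspy to synthesize (and cache) a sentence without speaking it aloud."""
    url = f"{RHASSPY_URL}/api/text-to-speech?" + urllib.parse.urlencode({"play": "false"})
    req = urllib.request.Request(
        url, data=sentence.encode("utf-8"), headers={"Content-Type": "text/plain"}
    )
    urllib.request.urlopen(req, timeout=30).read()  # returned WAV is discarded

if __name__ == "__main__":
    # Re-run these whenever the underlying values change, so the answer is
    # already in the TTS cache when someone actually asks.
    prewarm_tts("It is 21 degrees and sunny outside")
    prewarm_tts("The humidity is 45 percent")
```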

Jarbas is an excellent buddy with a one-stop shop for everything AI, voices and more… :smile:

Happy Adventures with the new job.

For me, Rhasspy’s ability to do custom intents is my favorite tool.
My Rhasspy installs (in a silly bot, in two solar monitoring stations for the RV and the house, in an offline house assistant, and in an offline RV assistant) all simply hand info to my custom Python programs, which do all the handling: send responses back, go online and post to this or that API, do GPIO stuff, turn the camera on and off, whatever.
Rhasspy has the Power of Whatever: that’s my favorite feature.
I did play with its Home Assistant integration as well, but found my custom Python stuff much more usable for my unique projects.
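As a bare-bones illustration of the pattern (not my actual code), a command-style handler could look something like this; the intent name and the speech-reply convention are just assumptions, so check the intent handling docs for your Rhasspy version:

```python
#!/usr/bin/env python3
# Bare-bones command-style intent handler: Rhasspy passes the recognized intent
# as JSON on stdin, and whatever JSON this script prints goes back to Rhasspy.
# The intent name "GetSolarStatus" and the "speech"/"text" reply convention are
# illustrative assumptions; check the intent handling docs for your version.
import json
import sys

def handle(event: dict) -> str:
    name = event.get("intent", {}).get("name", "")
    if name == "GetSolarStatus":
        # Real handling would read sensors, toggle GPIO, call an API, etc.
        return "The batteries are at eighty percent."
    return ""

if __name__ == "__main__":
    event = json.loads(sys.stdin.read() or "{}")
    reply = handle(event)
    if reply:
        event["speech"] = {"text": reply}  # Rhasspy can speak this back
    print(json.dumps(event))
```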
For me to move to Mycroft:
* No need to be online, unless updating something, and no need to have anything posted to someone else’s cloud server to do anything. Sure, if I need the weather I can bring the LAN or Wi-Fi up (I have an intent for that), Python does its thing like a request to a weather API, then LAN and Wi-Fi go back down (intents for those as well). It all works with Rhasspy.
* I do use the Mycroft custom wake word (I log into an older install of Ubuntu Studio I didn’t erase, just because I set up the Mycroft custom wake word tools on it a while back). I’ve not needed to make any new ones in a while though.

2 Likes

I deeply respect you @synesthesiam. While it saddens me that you’ll surely have less and less time for Rhasspy, I understand the need to eat and pay the bills. What devastates me far more than losing Rhasspy, though, is that your skills are going to somewhere as deceptive as Mycroft.

Yes, they tout loudly that their code is open, but this is not a privacy-respecting project, nor is the company behind it. I came to Rhasspy precisely to get away from the likes of Mycroft. There are reasons their project is specifically structured to default to using their cloud services (sound familiar?). There are reasons they demand user logins (also familiar…?). Their project has been specifically structured in a way that makes self-hosting, and I quote, “not easy and is unlikely to provide an equivalent user experience [as sending your personal info to their computers].”

I mean, is Mycroft different from GAFAM? Maybe, but if so, it doesn’t seem to be for lack of trying to emulate them. Let’s take a quick look, not at their glossy marketing homepage, but at their privacy policy (which covers both their website and their services). A quick skim through, copying and pasting the privacy-relevant bits, comes up with:

When you use our Services including the Mycroft Voice Assistant, your voice and audio commands are transmitted to our Servers for processing.

Because they made it, you know, “not easy” to self-host, unlike, say, Rhasspy, which currently is.

We collect information about you directly from you and from third parties, as well as automatically through your use of our Site or Services

Just to clarify that it’s not just “opt in” stuff.

When you use our Services, your audio commands are transmitted to Mycroft for processing, as part of the Services. We may also collect other metadata about your audio commands, such as the time and location.

Note that it’s not limited to information needed to fulfill the request.

we collect information about your device, including platform type and location

Once again, specifically not limited to fulfilling the service.

If you comment or post content to the Services, we may gather data about the content you post.

Just in case you were wondering if they would also use the content.

We automatically collect information about you through your use of our Site and Services, log files, IP address, app identifier, advertising ID, location info, browser type, device type, domain name, the website that led you to our Services, the website to which you go after leaving our Services, the dates and times you access our Services, and the links you click and your other activities within the Services.

That’s a list that would make Google proud, especially the advertising ID. What could that be for?

To send you news and newsletters, special offers, and promotions, or to otherwise contact you about products or information we think may interest you.

Because privacy focused companies are all about promotions and special offers.

for other research and analytical purposes

(how much vaguer and more all encompassing can you get than “other research” and “analytical purposes”?!)

To protect our own rights and interests, such as to resolve any disputes

So you control the assistant in my home and can use its info against me in a legal dispute?

We may share your information, including personal information

Just to be clear…

We may disclose the information we collect from you to third party vendors, service providers, contractors or agents who perform functions on our behalf.

Once again, pretty much to anyone.

If we are acquired by or merged with another company, if any part of our assets are transferred to another company, or as part of a bankruptcy proceeding, we may transfer the information we have collected from you to the other company.

As with pretty much any company… which is one of the problems with company run projects.

We may share aggregate or de-identified information about users with third parties for marketing, advertising, research or similar purposes.

Just in case you thought they were only collecting to improve the project.

We and our third party service providers use cookies and other tracking mechanisms to track information about your use of our Site or Services. We may combine this information with other personal information we collect from you (and our third party service providers may do so on our behalf).

Oh? Like who?

We use automated devices and applications, such as Google Analytics

Ah, that’s who (or at least one of them…).

our Site does not recognize browser “do-not-track” requests

Real “privacy focused”, right?

If you’d like to update your profile information with us, you may do so through your account. […] we may maintain a copy as part of our business records.

So you can ask us to delete it, buuuut… “business records” y’know?

Our Services are not designed for children under 13; and children under 13 are not permitted to have an account with us. If we discover that a child under 13 has provided us with personal information, we will delete such information from our systems.

Why no users under 13? Much like Audacity moving to ban users under 13 when they added tracking and Google Analytics. In many areas it’s illegal to track personal data about children for commercial purposes. If you want an easy surveillance advertising based income, you gotta get rid of users under 13.

A couple of these by themselves could be understood, but what’s the overall pattern here? Is this the pattern of a privacy-centric company: recording advertising IDs, location, IP addresses, and content, and giving it to third parties and marketing?

I’m sorry if this doesn’t come across as supportive as the other comments here. Most likely this post will lead to personal attacks against me by people who feel that they’re defending you, all while encouraging you toward being controlled by “© Mycroft AI, Inc.” Yes, the code may still be open, but make no mistake, this is open code used against privacy. If I didn’t care about you I’d just be silent. I’m posting this precisely because to me this looks like a good, ethical, skilled person being bought by a company that has long fed off misleading FOSS and privacy advocates. This is a clear attempt at swallowing the more private competition, i.e. you and Rhasspy.

I wish you well, I really do, but this is a very sad day, indeed.

Hopefully a more honest company makes you a better offer soon. Hopefully I’m wildly wrong. Reading their privacy policy though, I doubt it, so I do hope you’ll be careful to keep an escape plan handy in case you need it later. I would have happily thrown hundreds of dollars at a crowdfunding campaign for Rhasspy, but Mycroft will never get a single cent from me.

Anyway, thank you for all the good that you did for Rhasspy and the community. It was good while it lasted.

5 Likes

I hope you are wrong as well. I was not using Mycroft, but reading your post makes me think twice (and some more).

I will be sticking with Rhasspy anyway, and I hope it will still evolve, maybe slower, or maybe other developers will pitch in.

Congratulations on your job! This should help you toward a better life…

Thanks for all your work on Rhasspy, very useful and really private by design.
Things I really appreciate: fully offline TTS and STT, Docker installation, custom wake words (I’m not fond of “OK Google/Alexa/Jarvis/my lad…”), and the ability to change the speech recognition engine (DeepSpeech is the future, well, I hope so).

Best regards,
Damien

2 Likes

Thank you for the reply, @VoxAbsurdis. I’ve only worked at Mycroft now for a month, but I may have a little more insight than I did previously.

A lot of the problems you mention seem to stem from the same issue: Mycroft’s (current) dependence on Google for their speech to text. If they send audio to Google (aggregated at least, but still), then their privacy policy must be at least as bad as Google’s. My hope is to get them off Google, and they are all for that :slight_smile:

As far as users under 13, I actually totally get that. “Child” is legally defined differently around the globe, so some cutoff is surely needed (not sure who came up with 13). I learned about this problem when I found out how bad open source face recognition software was for photos with my kids: there’s almost no training data. Turns out, anyone who collects a bunch of data on children (pictures, audio, etc.) is usually not doing it for the good of humanity :frowning:

I had considered this, and I appreciate the offer; do you have any idea how I would sustain such a thing? I don’t really have anything to offer as a subscription (unlike, say, the Home Assistant folks at Nabu Casa). What would I have asked people to crowdfund?

I came across the Libre Endowment Fund, and thought something like that might have been a good fit. If anyone knows about something similar – a public fund for open source software – or a way to fund the development of open voice software for people with disabilities, I’d love to hear about it.

There are potentially a few options for open source crowdfunding for something like Rhasspy, ranging from the simple GitHub Sponsors or Patreon approach (where you could have a devblog for folks who contribute) to a Kickstarter for a “year’s worth of development”, which I’ve also heard of some folks doing. You don’t necessarily need something additional to offer, as the folks who would contribute likely just want Rhasspy, since it’s established and you’re always keeping it up to date (not to mention writing software that also has research benefits). Some folks have also been successful at getting grants for their open source work, but you’d have to find them first.

The downside would be increasing overhead (GitHub the least and grant writing likely the most), and getting up to a full developer salary would be tough. It’d likely require a bit of self-promotion. I’m not sure if we’re ready to value open source software properly as a culture, but I hope we get there.

2 Likes

Yep, that was my read from the start. And sadly it’s been the norm in the speech recognition industry for decades now. Every ASR/TTS platform that’s any good gets bought by one of the GAFAM gang members so they retain control over the industry. The effect is higher prices and more difficulty for independent developers building voice apps, which is probably their main goal: to make sure only the big gorilla companies own the voice apps. My guess is Mycroft’s goal/exit strategy is to get acquired by a gorilla. And because of this, I don’t think any dev should waste time building apps on or with it; otherwise you’ll be wasting your time like the devs who built on Snips.

3 Likes

This reminds me of Jeff Geerling, who I follow on a few platforms. He seems to be doing great, but I don’t think I could stand doing all of the necessary social media promotion in addition to actual development.

Absolutely agree. One person can make such a difference, but for some reason there is very little support. In my time working for the U.S. government, I saw millions of dollars wasted (in my opinion) on projects that, even if successful, got us no closer to something actually useful.

Just to be clear: I went to Mycroft asking for a job; they didn’t come to me. I interviewed with 4 other companies as well: 3 of them said that if I were hired, Rhasspy was to be shut down immediately and everything I did going forward was going to be closed source. 1 company was currently open source, but the backend they planned to create was not going to be (remind you of Snips?).

I get the cynicism, but I’d suggest looking at the broader picture. Even if Mycroft gets acquired at some point in the future, everything is still open. Contrast this with Snips, where they published an amazing paper about their training backend, but it was only ever a promise that they would open source it – and they lied!

I still believe our two best defenses against the GAFAM gang are (1) don’t support any company that isn’t fully open source and, (2) build interoperable standards between voice assistants so that when they inevitably do get acquired or abandoned, we can just shift to a new project without starting from scratch.

8 Likes

@synesthesiam Congratulations on your new job. Topic wise it seems like the perfect fit.

Rhasspy is quite impressive considering that it was mainly made by one person. I have always been impressed by the amount of work and care you have put into it. Thank you.

But it was also sad to see how little traction Rhasspy was getting. There is a huge OSS community for smart home (Home Assistant and others), and a lot of those users are using Google or Amazon for voice control, so I hoped more would switch to Rhasspy. The end result is that it still very much depends on you; there are just not enough users and therefore not enough developers.

I hope that making Mycroft more OSS and 100% offline usable works out. Until then I hope Rhasspy lives on.

(I would have been willing to support it on Patreon btw. But I don’t think there are enough users to pay you a fair wage using Patreon)

What I would need to use Mycroft is: 100% offline usage, German TTS, and Hermes or a similarly open, usable protocol (I am using it to show feedback to voice commands on LED matrix screens).

My naive plan/hope/strategy would have been:

You’ve mentioned Nabu Casa: I think Rhasspy fits very well into their open smart home vision. They don’t really have a working offline voice assistant solution in HA. It would be great if they could support a few months of work on Rhasspy with a focus on out-of-the-box, user-friendly usage with HA (integration in HASS OS, integration with entities, auto-generation of slots, and some form of repository for intents/automations/scripts). With that done, they could market Rhasspy as the default offline voice solution for HA. Rhasspy would get more users from the HA community, and more users means more supporters (devs and possible Patreon supporters). The other issue is readily available hardware that you can put in a normal living room without much DIY work: basically a Pi with a speaker and good microphone(s) in a nice-looking case.

In any way: Thank you for your work on Rhasspy and good luck on your new job.

2 Likes

It’s fun that you mention that, as that is what I’ve been working on with Home Intent. It’s intended to bridge the gap between HA and Rhasspy by auto-creating slots/sentences/responses, so users can just connect it to Home Assistant and it manages Rhasspy for them.

In the next month or so, I’m planning to get better satellite support and start looking into the Hass add-on store.

1 Like

Congratulations Michael on the new job - it’s great you are working at something you love, getting appreciated for it financially, and being closer to your family!

I’ve had a quick look at Mycroft, and am also of two minds. Both projects will benefit from a closer association. I just hope that it continues the way you hope.

What is your relationship with Nabu Casa? If they can dedicate a programmer to ESPHome (15% of their user base), then why are they not doing more to provide an offline voice assistant to the 75% who currently use the big commercial options?

1 Like

Oh nice. Didn’t know about it. Will give it a try when I find some time.

Just a suggestion: I think it would be great to add, somewhere early on the site, that Home Intent is based on Rhasspy (with a link to this community) and is for use with Home Assistant. Or that it is the bridge between Rhasspy and Home Assistant, or something like that. As it is now, I would have thought “oh, a new alternative to Rhasspy” if I randomly found the website.

Do you have any plans to integrate something like “Apps” or scripts for functions that are more than just Home Assistant intents? (Maybe just add a simple UI to enter Python scripts that use some of the APIs developed by others in this community.)

Edit: I should have read more of the documentation before asking about the script/app issue. The Component feature is for that. Then my question would be: what about custom components and some form of repository for those?

Ah, I do actually mention it right away on the GitHub page. I’m actually not sure where people would head to first, but yeah, I’ll update the homepage of the docs to indicate it’s based on Rhasspy!

I do like the idea of a repository of custom components that are accessible via the UI! I don’t know of anyone who has written a custom component yet, but it’s definitely a consideration for down the line.

My pet beef with Mycroft was an uneasy feeling that it presented itself as far more than it was actually capable of.
It’s only in the Mark II that they actually process the incoming audio with DSP echo cancellation and beamforming; Rhasspy suffered the same problem, as it was really unusable in the presence of noise.
The delays, the cancelled crowdfunders, and this business with “patent trolls” just had my alarm bells ringing.
They have a new CEO and we will have to see which way it goes, but there is a strong possibility you could be right.

synesthesiam, like everyone, needs $ and has to work, and if you have got to work, you have to work.
I am not a fan at all of Mycroft because to me it still seems the goal is to present the semblance rather than the real thing, and hence why I am dubious.
As for working for them, his core skills are bang on the button and the guy’s got to work, even though I am not a fan of the company. Likely 99% of us are not doing what we want but are doing our best to earn $.

I have told synesthesiam what I think of Mycroft, but I don’t and shouldn’t have any opinion on what someone needs to do to earn $.

2 Likes