The best bet is probably the ESP32-S3, but we are going to see a load of very capable, low-cost microcontrollers that are perfect for wireless KWS roles.
I have still made no progress, as they have released another product called the ESP32-S3-BOX-Lite, and I really don’t care about screens, output or speakers; it’s purely the input I need, to be able to run a decent KWS and a speech enhancement pipeline.
The original ESP32-S3-BOX had an analogue loopback from the DAC to sync the AEC reference on a third ADC channel, so the idea of just putting 2x I2S mics on a standard dev kit became a show-stopper.
The new Lite box has a two-channel ADC, so hopefully a software update is on the way, and supposedly you can also buy the board alone.
I am going to keep calling them ‘KWS ears’ to stress how simple the needs are. The reason I keep focusing on cost is not just the cost of a single unit for a room; it’s that a room could contain multiple units that together form a distributed microphone array.
The softmax probability from a single KWS is a good enough metric: whichever ear in the array reports the highest value is the mic used for the current ASR sentence. The ones that didn’t hear the KW are unlikely to hear the ASR sentence either, and it’s that simple, as each ‘ear’ is completely ignorant of the others’ existence.
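As a minimal sketch of that selection rule (the ear names and the 0.8 threshold are purely illustrative, not any fixed protocol):

```python
# Minimal sketch of the "best ear wins" rule. Assumes each ear that
# detected the keyword reports its KWS softmax probability; ears that
# never crossed their own KWS threshold simply never report at all.

def pick_ear(reports, threshold=0.8):
    """reports: dict of ear-id -> KW softmax probability for this trigger."""
    if not reports:
        return None
    ear, score = max(reports.items(), key=lambda kv: kv[1])
    return ear if score >= threshold else None

# Three ears in earshot of the same keyword; the unit nearest the
# speaker scores highest and supplies the audio for the ASR sentence.
print(pick_ear({"kitchen-1": 0.97, "kitchen-2": 0.82, "hall-1": 0.64}))
# -> kitchen-1
```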
Hopefully the ESP32-S3 boards will follow the same economies of scale as previous ones and maybe get as cheap as $5. You could then have two or three in each room if you wanted, as each additional mic can be placed to provide further isolation from noise sources and bring far sources nearer.
I am not on an Espressif sales pitch, but it’s the only source of free speech enhancement (AEC & BSS) I know of. I don’t like how the base libs are blobs, but hey, if another microcontroller comes along, then fine; Espressif does have a history of making extremely cost-effective wireless microcontrollers.
I don’t want the ‘KWS ears’ to be a Rhasspy, Mycroft, Sepia, Project Alice or one of a plethora of projects all doing the same thing. I just want to set up a basic ‘KWS ear’ system that is simple, interoperable with all of them, and doesn’t pander to any other project’s protocols.
I couldn’t care less about branding or ownership; it’s just a very simple websocket client/server queue that is file-based on zones and acts as a bridge to the input of any ASR.
That is the only dictate: the zone file structure on the input matches the output where the ASR text is dropped.
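To make that concrete, here is a hedged sketch of the shape I mean, assuming a zones/&lt;zone&gt;/in directory layout, Python’s websockets library, and a framing where the first text frame names the zone; none of these choices are settled, they just illustrate how thin the bridge can be:

```python
# Sketch of the zone-based bridge: a websocket server that drops whatever
# an ear sends into zones/<zone>/in/, with the ASR expected to drop its
# text into zones/<zone>/out/. The framing (first text frame names the
# zone, binary frames are raw audio) is an assumption for illustration.

import asyncio, pathlib, time
import websockets  # pip install websockets

BASE = pathlib.Path("zones")

async def handle(ws):
    zone = await ws.recv()               # e.g. "kitchen"
    in_dir = BASE / zone / "in"
    in_dir.mkdir(parents=True, exist_ok=True)
    async for msg in ws:
        if isinstance(msg, bytes):       # raw audio chunk from the ear
            (in_dir / f"{time.time_ns()}.raw").write_bytes(msg)

async def main():
    async with websockets.serve(handle, "0.0.0.0", 8765):
        await asyncio.Future()           # serve forever

asyncio.run(main())
```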
Audio is coupled by a Linux asound loopback, not some weird and wonderful protocol; there is nothing other than the websocket on the server side of the KWS bridge. It probably doesn’t even need the file system, as the current sink of a loopback is likely more than enough info.
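For anyone unfamiliar, the loopback side needs nothing exotic: with the snd-aloop kernel module loaded, whatever is played into device 0 of the Loopback card can be captured from device 1. A minimal asound.conf sketch (the pcm name asr_in is mine, not any convention):

```
# Requires the snd-aloop kernel module (modprobe snd-aloop).
# The bridge plays decoded ear audio into "asr_in"; the ASR then
# captures the same stream from hw:Loopback,1,0 like any normal mic.
pcm.asr_in {
    type plug
    slave.pcm "hw:Loopback,0,0"
}
```

The ASR then just records from hw:Loopback,1,0 as if it were an ordinary capture device.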
It’s why I have been waiting for the product, as the system will be built up from the audio source path, with the aim of always boiling down to the lowest common denominator of simplicity and interoperability, without bloat.
It’s also why I have stayed on this forum, as there are many actors here, @synesthesiam to say the least, but I find it infuriating that many projects apply their own methods purely to have ‘their’ own methods, even though the Sepia initiative seems to be trying to address this.
This is Linux, this is open source, and all the pipeline stages of VoiceAI are distinct; we should be able to partition them and give choice at every stage, as Linux and open source do.
I am somewhat critical of the Mycroft MkII, but what they have is an excellent skill server, and I wish they would concentrate on that, as I would love to tack it onto the end of my ASR of choice, and so on.
But going back to ‘KWS ears’: I think open source can do more, be better and be more cost-effective, but it’s sheer stupidity to try to copy commercial offerings verbatim, as you are likely to fail, and there could be much better ways of doing things for less. One of them is integration, reuse and interoperability.
I haven’t ruled out the Pi either, as the Zero 2 and Pi 3A+ are both great products, but until someone provides effective AudioDSP utils I have a bottleneck for even the base function of an ‘ear’. It is still a platform that easily installs network-synced multiroom audio such as AirPlay, Snapcast and, I think, Squeezelite (is it synced?). That will take considerable work to port to microcontrollers, where efforts with the original ESP32 were slightly too constrained.
The 2-mic and 4-mic HATs for the Pi are extremely cost-effective, and all that is needed is the kind of efficient code the rest of our Linux audio system runs on, which isn’t Python. It’s sort of sad, as the hardware is capable.