Echo dot => No Alexa and Yes Rhasspy

Florian · November 25, 2022, 6:22pm

Hello,

My English is very poor, so I hope someone on the forum speaks French.

I have just installed Rhasspy on a VM with Docker. However, this machine has no speaker or microphone.

Is it possible to use Amazon Echo devices for Rhasspy, removing everything Amazon from them so that it stays purely local?

Hoping you can help me,
Thanks in advance!

donburch · November 25, 2022, 10:56pm

We wish! Google and Amazon have invested a lot to make very good hardware … but it is locked to their cloud services.
With their low hardware prices, I wonder how they make a profit.

Hugh_Barnard · November 26, 2022, 4:14pm

In my opinion (I don't have an Alexa), no, except by stripping it for components! But, at least for testing, a headset with a microphone would do while waiting for something "better"?

rolyan_trauts · November 26, 2022, 4:34pm

It's been strange watching Alexa: they were first, but they sort of came to a halt with the technology and instead added more to their hardware in terms of mics, speakers and amp.

Google dropped the number of mics to 2 but had a huge advantage in the ML IP they own, and now processes audio completely differently, under much less load than Alexa.
Google likely dropped beamforming for VoiceFilter-Lite, so in the price war between the two, Google has reduced its manufacturing cost since the first models whilst Amazon's has increased.
But this has taken an interesting turn, as the war was to garner services: in voice AI, Amazon almost became another Microsoft with a platform monopoly.
Google is now also the leader in cutting-edge offline ASR, running ridiculously powerful ML on a few watts on their Tensor chip, totally offline.

Prices have come down, with Black Friday deals offering a Google Pixel 6 for £200, and I was tempted purely to test it; that IP is already at budget price levels and will get lower.
Google has also been handing Amazon a beating in the cloud, again with ML, where the Google services offer far more per $/watt. Across a wider range Google is only losing $1.6Bn, but it doesn't care given the $65.1 billion in total revenue it brought in, whereas Amazon has been late to the party with their Graviton 3 Arm-based chips.

If open source is going to compete in any way, it needs to follow a similar infrastructure of heavy reliance on central ML, where my new toy, a Rock 5B, has performance close to a Pixel 6 phone.
It's an octa-core A76/A55 @ 2.2/1.8 GHz that gives approximately 2 TOPS worth of ML performance in about 5 watts, with a Mali-G610 MP4 GPU that likely gives the same 2 TOPS of ML but at 1 watt, and also an NPU with 3 cores of 2 TOPS each, giving 6 TOPS total. It was on OKDO for £140 with free delivery with the Black Friday code.
Mics are just mics, and you need multiple low-cost microcontrollers to service multiple zones and feed a single low-energy ML server that in terms of ML compute could be roughly 30x that of a Pi 4.

If you want the best cutting-edge ASR then buy a Pixel 6 phone, as Google is making little to no money on it but staking a claim in the future of embedded AI.

The same goes for Apple with their new M1 Arm-based computers, as the ML performance per watt is out of this world. They have even created a new Arm instruction set called AMX-2, so unlike an NPU or GPU it works in the same memory space, as it is part of the CPU, but at Apple prices. Neon is a coprocessor that works in a different memory space, so the copy there and back doesn't happen with AMX-2, and that is likely the x2 over Neon.

Benchmark results · Issue #89 · ggerganov/whisper.cpp · GitHub is a really interesting benchmark thread where users with Macs, Gravitons, Raspberry Pis and myself have been posting results; the Macs and Gravitons are obviously posting huge figures.

Florian · November 26, 2022, 7:09pm

I'll look into that.

On another forum I was advised to get a Raspberry Pi or an Orange Pi. I'm not sure which of the two to choose, as I plan to set up quite a few satellites, and PoE power would clearly be a plus. I'm waiting for opinions before picking one or the other, as long as it supports PoE.

I was also told about another site, where I found: 6 Mic Array for Pi - ReSpeaker
6 mics would be nice! But I was told there are driver problems... so I'm waiting for information on that too ^^

rolyan_trauts · November 26, 2022, 8:13pm

The 6-mic is bad, and it seems TDM is bad on the Pi generally. What TDM does is trade maximum sample rate for channels: think of the normal 192 kHz maximum for stereo I2S, which as 6 channels could be 64 kHz, but they are doing it at a more normal audio rate of 48 kHz.
The problem seems to be that there is no sync, so any one of what are really 3 stereo pairs could be pulled as the first word pair.
Put simply, the channels come in a completely random order, so even if a beamforming algorithm were available, depending on the algorithm it may not work.
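The arithmetic and the ordering problem described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, the slot budget is taken as simply "stereo rate times two", and the rotation by whole stereo pairs is one reading of the missing-sync description, not a trace from the actual driver.

```python
def tdm_rate_per_channel(stereo_max_rate_hz: int, channels: int) -> float:
    """Per-channel rate if a stereo link's slot budget is shared by `channels`."""
    slots_per_second = stereo_max_rate_hz * 2  # left + right slot per stereo frame
    return slots_per_second / channels


def observed_order(first_pair: int, pairs: int = 3) -> list:
    """Channel order seen by software when capture starts on stereo pair `first_pair`."""
    channels = list(range(pairs * 2))
    start = (first_pair % pairs) * 2
    # Without a sync marker the frame is rotated by a whole stereo pair.
    return channels[start:] + channels[:start]


rate = tdm_rate_per_channel(192_000, 6)
orders = [observed_order(pair) for pair in range(3)]

print(rate)
for order in orders:
    print(order)
```

A beamformer that assumes mic 0 is always the first sample would be fed a different physical mic on two of the three possible starts, which is why the missing sync matters.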

If you want to buy one, buy via PayPal, as you will likely want a refund; mine is gathering dust on my desk.

The only HAT with a working software beamforming algorithm that I know of is the 2-mic, as I am the only one to have created a working beamformer of any use on the Pi.

ProjectEars/ds at main · StuartIanNaylor/ProjectEars · GitHub, but like all the other hardware beamformers it's not very good unless you sync the KWS and beamformer to lock onto a command sentence.
I never did implement that part.

Also, with the resources of a Pi, the only type of beamformer that will run is a GCC-PHAT delay-sum; I tried others and they were slower than realtime.
This has further problems, as delay-sum needs a specific geometry, whilst the 6-mic was only designed to look like a beamformer: its geometry is totally wrong for delay-sum, and it cannot run any other.
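For readers unfamiliar with the term, here is a minimal pure-Python sketch of a GCC-PHAT delay estimate followed by a two-mic delay-and-sum. It is not the ProjectEars code. The function names are hypothetical, the naive DFT is only suitable for a short demo frame, and the test signal uses a circular shift so the result is exact, which real microphone audio would not be.

```python
import cmath
import random


def dft(x, inverse=False):
    """Naive O(n^2) DFT, adequate for a short illustrative frame."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = 0j
        for t in range(n):
            acc += x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n)
        out.append(acc / n if inverse else acc)
    return out


def gcc_phat_delay(ref, sig, max_lag):
    """Estimate by how many samples `sig` lags `ref` (negative: it leads)."""
    n = len(ref)
    ref_f, sig_f = dft(ref), dft(sig)
    cross = [s * r.conjugate() for r, s in zip(ref_f, sig_f)]
    # PHAT weighting: keep only the phase of the cross-spectrum.
    phat = [c / abs(c) if abs(c) > 1e-12 else 0j for c in cross]
    correlation = [v.real for v in dft(phat, inverse=True)]
    lags = range(-max_lag, max_lag + 1)
    return max(lags, key=lambda lag: correlation[lag % n])


def delay_and_sum(ref, sig, lag):
    """Undo the estimated lag on `sig`, then average the two channels."""
    n = len(ref)
    return [(ref[t] + sig[(t + lag) % n]) / 2 for t in range(n)]


rng = random.Random(0)
ref = [rng.gauss(0.0, 1.0) for _ in range(64)]
sig = ref[-3:] + ref[:-3]  # second mic hears the same frame 3 samples later
lag = gcc_phat_delay(ref, sig, max_lag=8)
aligned = delay_and_sum(ref, sig, lag)
print(lag)
```

The estimated lag only maps to a direction if the mic spacing and layout are known, which is the geometry dependence mentioned above.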

It's one of those wonderful pieces of tech that was created not because it can but because someone could.

Florian · November 26, 2022, 8:26pm

OK, so... if I've understood everything correctly, we clearly forget the 6-mic.
I gather there are several kinds of 2-mic boards; which one would you recommend? Do you have a link?

rolyan_trauts · November 26, 2022, 9:37pm

To use on a Pi, Florian?

They are essentially all the same, with a WM8960 codec; whatever the brand, the chip is actually the same and they all share the same drivers.
A clone one is fine, as to be honest I don't think it matters.
Some say the ReSpeaker has better mics than the Keyestudio, but I have my doubts and think they are likely the same.

The geometry isn't perfect, but it's really not that far off and they are OK. Like all HATs, though, having the mics onboard makes things awkward: it's better to have the ports facing the person speaking than lying horizontal, as that garners some natural reduction of incoming rear sound.

Also, again, being onboard makes it hard to isolate the mics against vibration and the audio output; in fact it's near impossible with the AEC/NS we have.

That's why I feel using mic-only satellites, or broadcasting to a wireless audio system such as RaspiAudio LMS, AirPlay or Snapcast, is just better: the simple physics of some distance to the mic.
But many get them because the interest is in playing with open-source voice systems rather than having a production-ready system.

If you are just going to have a mic, or 'Ear' as I call them, then I2S mic modules off AliExpress with the best specs you can find can be used with the Adafruit driver, which I think I can fix but never got round to (I think it records as 32-bit when the mics are actually 24-bit, which is probably why people complain they are quiet).

It works with any I2S (not PDM) mic, as far as I know.
They used to be really cheap on AliExpress but, like everything, have increased in price, and I have forgotten which have the best sensitivity and SNR, but all that info is available.
You can set them at an exact distance and likely mount and isolate them much better, in any direction relative to the board.
But you will not have any audio out, unless maybe via an HDMI audio extractor, and those only work when plugged into an HDMI monitor or screen.
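To put a number on the 32-bit versus 24-bit guess above, here is a small sketch. It assumes the 24-bit sample sits in the low bits of a 32-bit word and is then scaled as if it were a full 32-bit value. That is only one way such a mismatch could arise and is not a confirmed diagnosis of the Adafruit driver; all names are illustrative.

```python
import math

FULL_SCALE_24 = 1 << 23  # largest magnitude of a signed 24-bit sample
FULL_SCALE_32 = 1 << 31  # largest magnitude of a signed 32-bit sample


def as_float_32bit(word: int) -> float:
    """Scale the word as if all 32 bits carried audio."""
    return word / FULL_SCALE_32


def as_float_24bit(word: int) -> float:
    """Scale the word as the 24-bit sample it really is."""
    return word / FULL_SCALE_24


def level_db(x: float) -> float:
    """Level relative to full scale, in dB."""
    return 20 * math.log10(abs(x))


word = FULL_SCALE_24 // 2  # a sample at half of 24-bit full scale
correct = as_float_24bit(word)
misread = as_float_32bit(word)
loss_db = level_db(misread) - level_db(correct)
print(round(loss_db, 2))
```

Under that assumption the recording would sit roughly 48 dB below where it should, which would certainly be heard as "quiet".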

PS: I tend to use these for the Pi GPIO; the 2x +/- rows help to quickly make multiple connections to a single pin with Dupont jumper leads. Otherwise, solder up a cable.

They're cheap; I usually just get 5 at a time and keep them around.

donburch · November 27, 2022, 12:27am

I find it hard to recommend any particular hardware for a satellite at the moment.

Raspberry Pi was popular for Rhasspy, mostly because it used to be easily available and fairly cheap, but now they are hard to find and expensive. I believe Orange Pi is essentially a copy of Raspberry Pi. The RasPi Zero is a good size for a satellite, and it works OK, but I think it is a bit slow. The Zero 2 W or a RasPi 3A+ are better options … but hard to get.
For microphones, I have a reSpeaker 2-mic HAT, reSpeaker 4-mic HAT, and Adafruit 2-mic HAT boards, but I agree with rolyan that their differences are minor, and their driver does not make good use of the hardware.
The Raspberry Pi IQaudio Codec Zero uses a different chip to the reSpeaker devices, and has a different driver. I have not used this board, so cannot comment on it.
My latest satellite uses just a cheap USB microphone and gives much the same result. Of course I am just a user of this stuff, but rolyan is obviously an expert in the audio field.
There are other multi-microphone units (such as https://wiki.seeedstudio.com/ReSpeaker-USB-Mic-Array) with firmware providing features like Voice Activity Detection, Direction of Arrival, Beamforming, Noise Suppression, De-reverberation, Acoustic Echo Cancellation … but at a price.
Several conferencing microphones are similar.
rolyan has pointed out the ESP32-S3 chip as being a much better choice for a voice assistant … but I believe we are still waiting for the software. I am hoping that with @synesthesiam joining Nabu Casa (who are also behind ESPHome) this might change next year.

rolyan_trauts · November 27, 2022, 12:56am

Yeah, I should have mentioned what Don mentioned: almost any cheapo USB sound card (there are a few bad ones) and a unidirectional mic.
Also, Plugable does one that I think is now the only reasonably priced USB sound card with a stereo mic input.

What Raspberry Pi are doing, turning their backs on makers whilst they supply commercial customers, is completely and utterly crazy, as projects are starved and look at other solutions.
The ESP32-S3 hasn't got enough oomph for an all-in-one voice assistant. Espressif tried that with their ESP32-Box, and for the ASR I will use the technical term of crap (just too low powered, but they did cram it all in).
It makes a perfect wireless mic that could run a pretty hefty KWS model and bring a paradigm shift: a single central home brain, with distributed mics in a room and a wireless audio player. For a single room it's pretty expensive; for multiple rooms it's extremely competitive, and I have been banging on about it for what now seems ages.
Whisper.cpp Benchmarks

CPU	OS	Config	Model	Threads	Load [ms]	Encode [ms]
RK3588	Ubuntu20.04	NEON	tiny	8	226.48	2681.05
Raspberry Pi 4 - 2GB	OpenVoiceOS	NEON	tiny	4	743.37	10122.80

Also, someone will have to try this with a Pi 4, as I don't have one.
If I compile with -march=native -ffast-math and run just on the big cores I get the following, but maybe the Pi also gains much the same:

CPU	OS	Config	Model	Threads	Load [ms]	Encode [ms]
RK3588	Debian11	NEON	tiny	4	228.24	1177.55

On encode time the CPU alone is x3.775 a Pi 4, which makes its approx. $150 price quite good value compared to the 8 GB Pi 4, if you could get one. That ignores the native compile, which is x8.59.
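The ratios quoted above come straight from the encode times in the tables and can be reproduced as follows (the variable names are just for illustration):

```python
# Encode times in milliseconds, copied from the benchmark tables above.
pi4_encode_ms = 10122.80
rk3588_encode_ms = 2681.05         # generic NEON build, 8 threads
rk3588_native_encode_ms = 1177.55  # native build, 4 threads on the big cores


def speedup(baseline_ms: float, candidate_ms: float) -> float:
    """How many times faster the candidate is than the baseline."""
    return baseline_ms / candidate_ms


generic = speedup(pi4_encode_ms, rk3588_encode_ms)
native = speedup(pi4_encode_ms, rk3588_native_encode_ms)
print(f"generic build: x{generic:.3f}, native build: x{native:.3f}")
```

Note that the native figure compares an optimised RK3588 build against a generic Pi 4 build, so it is not a like-for-like comparison until someone runs the same flags on a Pi 4.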

With the right models it could service a whole house worth of wireless ESP32-S3 mics, mainly because the chance of two commands colliding is pretty low, given the nature of voice commands.

So could an Odroid, which is probably somewhere between the two (nearer the Pi 4 than the RK3588).

Ameridroid also do the Rock 5B.

If you're in Australia, China is probably your best bet, with Allnet?

rolyan_trauts · December 5, 2022, 10:13am

PS: there is a Strange Fruit available at a ridiculously low price.

That is a real deal, as the distro images are going to have to be really bad to make that a bad price.

sanyasa_sure · June 20, 2023, 1:48pm

Yes, I agree that this is the much better approach today!

Espressif has come up with something called an AFE that performs specialised voice processing on the voice commands, and it has even been accepted as an Amazon-qualified "Audio Front End" solution:

https://www.espressif.com/en/solutions/audio-solutions/esp-afe

To use it as a wireless mic like Rolyan suggested, I would suggest the following:

1. Connect a mic or two to an ESP32-S3 board.
2. Implement Espressif AFE's WakeNet first, to wake up the ESP32-S3 board on the keyword and capture the voice.
3. Apply the AFE's AEC, BSS and NS to the captured voice.
4. Send that refined voice to the Rhasspy base station via a websocket.
5. Rhasspy will recognise the intent and execute the command. It will also provide TTS output to say it has completed the command.
6. Attach a small speaker to the ESP32-S3 board used in step 1 (you just need a small speaker to play the output received from TTS).
7. Run a websocket server on the ESP32-S3 board that listens for incoming audio from Rhasspy's TTS service and plays it through the speaker.
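As a rough illustration of the hand-off to the base station, the sketch below wraps captured PCM in a WAV container and prepares a request for Rhasspy. Several things here are assumptions rather than part of the plan above: it uses Rhasspy's HTTP endpoint `/api/speech-to-intent` (as I recall it from the Rhasspy 2.5 documentation) instead of a websocket, the host name `rhasspy.local` is made up, port 12101 is assumed to be the default web port, and the audio is taken to be 16 kHz 16-bit mono. It is written in Python for readability; firmware on the ESP32-S3 itself would be C or C++.

```python
import io
import urllib.request
import wave


def pcm_to_wav(pcm: bytes, rate_hz: int = 16000) -> bytes:
    """Wrap raw 16-bit mono PCM in a WAV container."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)   # mono
        wav.setsampwidth(2)   # 16-bit samples
        wav.setframerate(rate_hz)
        wav.writeframes(pcm)
    return buffer.getvalue()


def build_request(wav_bytes: bytes, base_url: str = "http://rhasspy.local:12101"):
    """Prepare (but do not send) a POST of the WAV to the assumed endpoint."""
    return urllib.request.Request(
        url=f"{base_url}/api/speech-to-intent",
        data=wav_bytes,
        headers={"Content-Type": "audio/wav"},
        method="POST",
    )


pcm = bytes(16000 * 2)  # one second of silence stands in for captured voice
wav_bytes = pcm_to_wav(pcm)
request = build_request(wav_bytes)
# urllib.request.urlopen(request) would send it to a running base station.
print(len(wav_bytes), request.get_method(), request.full_url)
```

Whatever transport is chosen, the satellite side stays small: capture, clean up, package, send, then wait for audio to play back.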

The WakeNet and AFE SDKs are already provided by Espressif, but the hardest part is writing the C or C++ code for steps 1 to 7 and flashing it to an ESP32-S3 board! I am currently working on this using ESP Arduino code (which is taking longer due to my current work commitments and travelling).

I bought a used 2011 Mac mini with a Core i5 and 16 GB RAM for less than £100, erased macOS and installed Ubuntu on it. I have been using it as a base station for my Home Assistant, MQTT server, Node-RED and Rhasspy (4 GB dedicated to Rhasspy alone).

Steps 6 and 7 can instead be implemented on an RPi Zero. A DAC HAT with speakers can be attached to the RPi and used to play the TTS (text to speech) received from Rhasspy. Moreover, you can install Volumio or Max2Play on the Pi to make it a hi-fi wireless streamer for music using Apple AirPlay, an LMS server or Mopidy! But if you are looking for multiple wireless mics, skip this and attach a small speaker to the ESP32-S3 board itself! An ESP32-S3 board costs around £10 on AliExpress, so we can use several of them as wireless mics across the house! But yes, someone has to program steps 1 to 7 for the ESP32-S3, or wait until I come up with that solution in the meantime!

Alternatively, using a Pi with a USB mic and MAX9814 preamp mics, as suggested by Rolyan several times over several posts in this community, gives good performance too, albeit without the heavy AFE! But we then need to use Rhasspy's preferred wake word system on the Pi and use it as a satellite over websocket!

rolyan_trauts · June 26, 2023, 12:20am

GitHub - toverainc/willow: Open source, local, and self-hosted Amazon Echo/Google Home competitive Voice Assistant alternative is interesting but likely optimistic.

It's a shame Espressif doesn't make audio add-on boards rather than just full audio devkits, but I2S mics are a relatively easy addition, as said.

rolyan_trauts · June 30, 2023, 3:35am

Espressif did quite a nice guide: Espressif Microphone Design Guidelines (ESP32-S3, ESP-SR latest documentation).
The ESP32-S3-Box devkit is typical of what Espressif does: their devkits are technology demonstrators rather than products, which is why the above Willow sort of struggles. It's overloaded, as really the ESP32-S3-Box is an example with code and schematics, not a usable product, even if some use it as one.
I actually think Espressif probably do themselves a disservice here, as modules that can be used on all dev kits, rather than specific single dev kits, would create a more active dev arena.

There are some interesting things you can do with the ESP32-S3: in the above mic guidelines there is also a 3-mic version that adds an extra axis to the BSS algorithm of the ADF.
The ADF is really well documented, but I still think it needs a bit of a guru and some testing and dev work to optimise between the fast but small SRAM and the slower but much larger PSRAM.
It's not much work, but there seems to be far too much 'I' and too little open collaboration. What stops me attempting it is that, in my opinion, the upstream frameworks we have available are totally the wrong infrastructure, so it seems pointless to even try, even if the alternative is much simpler.
I am not sure why a more open dev roadmap hasn't been floated across communities from the likes of ESPHome and others, or why we have so many disparate single-dev, low-user frameworks of 'I'.

Wireless mics (Ears) are a lo-fi alternative that competes with hi-fi via lateral thinking, as not having a mic sat directly on a speaker makes audio processing much easier.
So you can drop AEC, as it's not needed, place your wireless audio in more standard room configurations on more standard and better amps and speakers, and get rid of these toy-like solutions.
Also, WakeNet is just a KWS, and because it's a closed blob, training your own is preferable, but Espressif offer a few ready-made models to help with dev, or a team you can contract out to.
WakeNet uses two models, a CRNN and a BC-ResNet, where likely the better model of WakeNet5 is hindered by licensed LX7 LSTM code.
I have doubts, because the resource allocation of the WakeNet blobs is merely enough to provide a tech demonstrator and is not that great in use, given the restrictiveness of a blob. I have mentioned before that a DS-CNN could be a better streaming model, not needing an LSTM, even if it doubles the number of params over a CRNN.

The RPi Zero will run Snapcast or Squeezelite clients, but don't use a DAC HAT when $3 DAC modules connected via Dupont leads are easily available.
https://www.aliexpress.com/item/1005001993192815.html
It's a shame the $15 RPi Zero 2 stock disappeared, as with the 2-mic HAT a $35 solution was possible. But why do it when a $10 ESP32-S3 solution allows much more affordable multi-room placement, for better coverage than single consumer-grade clones that are lacklustre?