Not sure whether you are referring specifically to the reSpeaker 4-mic or HermesLedControl; or the concept of FOSS or electronic devices generally.
I agree that reSpeaker is not worth investing in - but I think for a very different reason.
I was very disappointed to find that seeed had abandoned support for the reSpeaker range several years ago, and you needed to downgrade the OS to install the seeed drivers. I then found that HinTak has taken on the task of updating the reSpeaker drivers for newer kernels. Since then he has even been updating the official reSpeaker repository, and replied to an issue only 12 days ago. He is not actively developing, though, so it is likely that the driver doesn’t support a 64-bit OS.
And pretty much all of the other multi-mic boards seem to be based on the same seeed driver.
Similarly the HermesLedControl repository appears to have been updated within the last month.
Personally I think the “future” will be the ESP32-S3 and similar chips which combine processor and ADC, with enough grunt to do the DSP on-chip.
That is exactly what I was referring to. As far as I can see from the GitHub issues, you need to compile a custom kernel to get 64-bit support even somewhat working. The only reSpeaker that has “support” is the 2-mic, because it does not depend on the seeed drivers.
Yes … then I asked myself if I need 64-bit on a Raspberry Pi that I am using as a satellite … and the answer was “no”. Probably on a Base station, but that doesn’t have a microphone/speakers, so not an issue.
I haven’t paid attention to Seeed’s USB interface reSpeakers - but I’m pretty sure all the other models use the same driver.
It’s a real shame that there isn’t a good multichannel hat for the Pi. The reSpeaker 4/6-mic hats show it can be done; all that is missing is channel sync, so you don’t get the random channel ordering you currently get with the Seeed drivers.
I find it weird that Seeed continue to sell a product that they declared EOL and stopped supporting a couple of years ago.
Still, fixed-geometry hats are a poor fit for software beamforming, especially with the geometries we have been given, which are fubar.
Aliasing with an endfire array means a max spacing somewhere in the region of 30mm, and broadside 60mm max, whilst what was provided was something that looked like a mic array minus any audio engineering.
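To put rough numbers on that, here is a back-of-envelope sketch. It assumes the usual rules of thumb (spacing d ≤ λ/2 for broadside, d ≤ λ/4 for endfire) and speech content up to ~3 kHz; those assumptions are mine, not from the hat's documentation.

```python
# Back-of-envelope spatial-aliasing limits for a uniform mic array,
# assuming the common rules of thumb: spacing d <= lambda/2 for
# broadside, d <= lambda/4 for endfire, at the highest frequency
# of interest.

SPEED_OF_SOUND = 343.0  # m/s in air at ~20 C

def max_spacing_mm(max_freq_hz, endfire=False):
    """Largest mic spacing (in mm) before grating lobes appear."""
    wavelength_m = SPEED_OF_SOUND / max_freq_hz
    return wavelength_m / (4.0 if endfire else 2.0) * 1000.0

# Speech content up to ~3 kHz:
print(round(max_spacing_mm(3000)))                # broadside, ~57 mm
print(round(max_spacing_mm(3000, endfire=True)))  # endfire, ~29 mm
```

Those figures land close to the 60mm/30mm limits mentioned above; push the frequency of interest higher and the permissible spacing shrinks further.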
The MEMS mics on board are analogue. Why they are soldered on board, rather than it being just an ADC board with daughter mic boards on Dupont jumpers or ribbon cable (so it could be used with any multichannel analogue signal), is curious, and it makes it near impossible to provide any vibration damping or acoustic insulation.
Audio-wise, from drivers to audio engineering, it’s devoid of any practical use, and the LEDs just sparkle to hide the fact it’s a piece of total c-rap.
The 2-mic and the USB version are really the only reSpeaker mics with working audio, but they still have some pretty dodgy fixed geometries that, on a hat sitting directly on a Pi, leave you with practically no choice in mic positioning.
Maybe you should ask yourself again if you need 64-bit: every module from KWS and ASR to TTS uses models that can be quantised to 8-bit, and the NEON co-processor of a Pi running 64-bit can operate on 8 of those values in one instruction whilst 32-bit maxes out at 4. That is why 64-bit gives almost a 2x speed (or load) improvement over 32-bit with quantised models. But hey, enjoy the bright lights.
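For anyone unfamiliar with what "quantised to 8-bit" means in practice, here is a toy symmetric per-tensor scheme in plain Python; it is illustrative only, not the scheme of any particular framework.

```python
# Toy symmetric per-tensor int8 quantisation (illustrative only):
# store weights as int8 plus one float scale factor, so each weight
# takes 1 byte instead of 4 and a SIMD unit can process more of
# them per instruction.

def quantise_int8(weights):
    """Map floats to int8 in [-127, 127] with a single scale factor."""
    scale = max(abs(w) for w in weights) / 127.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantise(q, scale):
    """Approximate recovery of the original floats."""
    return [v * scale for v in q]

w = [0.5, -1.0, 0.25, 0.75]
q, s = quantise_int8(w)
print(q)                 # small integers, 1 byte each
print(dequantise(q, s))  # close to the original weights
```

The recovered values differ from the originals by at most half a quantisation step, which is usually an acceptable trade for the memory and throughput win.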
Hello, I am new here, and even after reading several threads, I am completely lost on the choice of a microphone for a satellite intended to sit in the main living room of the house (with potentially a little noise, and often music playing).
Frankly, I don’t know what to think or which audio input device to pick anymore. I just want to know what to buy right now so the satellite works as well as possible.
I came across some interesting comparisons (like here), but they are starting to get dated now…
and I don’t know what to choose now
between the different solutions and products that exist: …
There are microphones on all these devices, even the Anker soundcore v1, v2 and v3 (to be confirmed).
My main need is a good microphone (audio input) so the STT works as well as possible, even with ambient noise or music.
But if there is an audio output as well, that’s even better.
After that, maybe I shouldn’t dream too much.
If the whole thing is wireless (as could theoretically be done by devices like the Jabra Speak 510 to 750 or the Anker PowerConf from S3 to S500) with Bluetooth dongles, that would be the holy grail
So in my situation, the Anker PowerConf S330 is surely one of the best solutions that we know to be really functional, as confirmed by @C64ever (even if it is not wireless).
Dunno about the Soundcore - it has probably been mentioned before, but my memory fails me (a quick Google made it look like just a BT speaker). Bluetooth often seems a struggle for users in Rhasspy, as I’ve noticed in a few threads before.
I would stay away from BT and go for an easier install with USB. Depending on docker/not docker, and bluealsa or PulseAudio, it can get a bit confusing, and the shared SDIO combo WiFi/BT on the Pi can also be a little temperamental. One thing I do remember from an AirPlay/Spotify/BT project on a Pi: it kept disconnecting, so I did what the documentation said, disabled the onboard radio and used an external dongle. Dunno what the problem was, but with the external dongle it suddenly worked flawlessly.
The PureAudio Array Microphone Kit for Raspberry Pi 3 is just a stereo USB card with a pre-made 2-mic array, with a closed-source beamformer and KW spotter; whether anyone has ever integrated it with Rhasspy, I have forgotten.
The ReSpeaker_Mic_Array is more expensive than the Anker S330 and has no speaker, but is functionally very similar if you add a powered speaker. I do remember (I think it was @fastjack who said so) that it could be hissy and noisy.
The PS3Eye mic is just a USB mic with no beamforming or AEC algorithms you can really use with it, whilst the others are ready-made contained units with that already built in.
Nothing works well with ‘other’ ambient noise or music; some do quite a good job of filtering static noise, but conference mics have a different design focus than smart-assistant mics.
If ‘other’ means you’re playing media on the device itself, it’s not as bad as the situation where ‘other’ devices are playing media. But then again, even the latest and greatest from the likes of Google and Amazon can be poor in that scenario, especially the older models and Alexa.
It’s really hard to say what is good, as I can only compare against the Google & Amazon units (which I do keep testing when they come out), and these options are likely not as good as them. Whether they are good enough, only your own experience can answer.
The Respeaker 2-mic hat is probably the most budget friendly, but you still need a powered speaker, and the software install can be too much hassle; with the Jabra/Anker it’s all there and you just plug in, and often that sways the decision more than ‘barge in’ and Word Error Rate.
So it’s really hard and very subjective to say what is good, and probably easier to use a different criterion.
Jabra/Anker-like units for minimal software and maker fiddling, just plug and play, versus the Respeaker 2-mic or a USB sound card for a more budget-conscious but far more complex software and maker build. That said, many actually enjoy the maker work more than just sourcing something off the shelf.
That distinction is prob easier to make.
Didn’t read through all the replies here, but as I recently switched my approach to the voice/user interface bit, I have a spare Matrix Voice (without ESP32) and would give it away for free (just the postage would be nice).
(It’s working all fine and all the Matrix drivers/software are still available; I even got the LEDs to work nicely - even from within docker.)
Basically we are using stereo mics with a preamp wired to a USB sound card on the Pi!
I haven’t had the time to implement it this week, as I was travelling a lot over the last two weeks owing to work commitments, but I should be able to start on it today/tomorrow and will let you know the status!
This is more like using an ESP32-S3 board! It has a dual-core processor with special optimisations for running ML models. Espressif, the makers of the ESP32, have even implemented a pretty strong AFE solution that has all the good parts: noise suppression, BSS and AEC.
Their hardware has all the ML optimisations to run their AFE algorithms. The algorithms are there; you just need to enable them on the ESP32-S3 board, which calls for some skill with their ESP-IDF programming SDK. The AFE is even a qualified Amazon AVS solution for Alexa! Read more on it below:
Basically we need a base server, like an Intel NUC, on which you can run Rhasspy, Home Assistant and Node-RED. You can then use an ESP32-S3 board to implement the KWS and AFE algorithms, read the mic input, and pass it on to the Rhasspy base using MQTT for speech recognition and intent handling, executing the actions with the help of Home Assistant or Node-RED integrations!
And yes, there is a Rhasspy 3.0 developer preview that uses WebSockets, which sounds like a better solution than MQTT!
I am planning to use ESP IDF to use their AFE solution for mic inputs and KWS and pass the processed voice to Rhasspy base for the rest of pipeline actions !
While this will be my second DIY, which I plan to work on in the near future, I will detail the approach below!
I. On Base Server
Set up the Rhasspy 3.0 developer preview on a more powerful device, such as an Intel NUC. You can follow the Rhasspy documentation to install and configure the software. Set up the WebSocket server.
II. On ESP Device (KWS Server)
Set up AFE with wake word detection on an ESP32-S3 device using ESP-IDF
Use esp-skainet to continuously listen to audio and perform continuous Voice Processing (using AFE) for AEC, BSS/NS, VAD, WakeNet
Use WakeNet (part of esp-skainet) to perform wake word detection. When the wake word is detected, send the audio to Rhasspy using WebSockets. The WebSocket client is on the ESP32 and the WebSocket server is on Rhasspy in this case!
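The steps above leave open what actually travels over the socket. A minimal sketch of one possible framing follows; the message shape (a JSON "wake" header followed by raw PCM chunks) and all the field names are invented for illustration, not Rhasspy's or Espressif's actual protocol.

```python
# Hypothetical client-side framing for the ESP32 -> Rhasspy handoff:
# a small JSON "wake" event, then raw 16-bit little-endian PCM frames
# as the AFE pipeline outputs them. All names are made up here.

import json
import struct

def wake_event(wakeword, site_id):
    """JSON header the client might send when WakeNet fires."""
    return json.dumps({"type": "wake", "wakeword": wakeword,
                       "site": site_id})

def pcm_frame(samples):
    """Pack 16-bit signed samples little-endian for streaming."""
    return struct.pack("<%dh" % len(samples), *samples)

msg = wake_event("hey_rhasspy", "living_room")
frame = pcm_frame([0, 1000, -1000, 32767])
print(msg)
print(len(frame))  # 4 samples x 2 bytes each = 8 bytes
```

Keeping the header as text and the audio as binary frames keeps parsing on the server trivial while avoiding base64 overhead on the audio path.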
III. On Rhasspy / HA Base
Rhasspy performs speech recognition to convert speech to text, and then intent recognition.
HA receives the recognized text or intent from Rhasspy.
HA uses the recognised text to trigger actions on its entities , such as controlling home devices or sending data to other devices.
Rhasspy also sends TTS to a Squeezelite device - maybe another ESP32 with a DAC, or a Pi with a DAC. Configure Rhasspy to use TTS via WebSocket: in the “Text to Speech” section of the Rhasspy configuration page, select “Remote WebSocket Server” as the TTS provider and provide the IP address and port number of your ESP32 device.
IV. On esp-Squeezelite device or Pi with Squeezelite
Set up a WebSocket server on the ESP32 to listen for incoming WebSocket connections
When a WebSocket connection is established, read TTS audio data from Rhasspy.
Play the TTS audio data through the ESP32 / Pi speakers
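Before handing received TTS bytes to the speaker, the player side can sanity-check the WAV header (Rhasspy's TTS services typically emit WAV). A small stdlib-only sketch of that check; the in-memory payload is fabricated purely for the demonstration:

```python
# Parse the header of a WAV payload (as might arrive over the
# WebSocket) before handing the frames to the audio output.
# Pure stdlib; the payload is generated in-memory just to demo.

import io
import wave

def wav_info(payload):
    """Return (channels, sample_width_bytes, sample_rate, n_frames)."""
    with wave.open(io.BytesIO(payload), "rb") as w:
        return (w.getnchannels(), w.getsampwidth(),
                w.getframerate(), w.getnframes())

# Fabricate a tiny mono 16-bit 22.05 kHz clip of 100 silent frames:
buf = io.BytesIO()
with wave.open(buf, "wb") as w:
    w.setnchannels(1)
    w.setsampwidth(2)       # 16-bit samples
    w.setframerate(22050)
    w.writeframes(b"\x00\x00" * 100)

print(wav_info(buf.getvalue()))  # (1, 2, 22050, 100)
```

Knowing the sample rate and width up front lets the player configure its I2S/DAC output once instead of guessing, and lets it reject malformed payloads early.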
I would have to design a nice DIY enclosure to house the ESP32-S3 device with its mics, along with the Pi / another ESP32 with a DAC running Squeezelite.
And then connect the enclosure to my soundbar! In fact I could use a Digi hat on the Pi that gives me digital audio output through an optical out, and connect it to the optical in of my soundbar. The soundbar has its own set of DACs, amps and speakers anyway!
That way I can have a smart soundbar in living room that works as a Rhasspy voice assistant !
I prefer to use an ESP32-S3-DevKitC board with I2S mics for this DIY, so I don’t need to worry about ADC channels. In fact I can still connect an ADC board to my ESP32-S3 to feed in a reference signal from the Pi’s (or another ESP32’s) DAC. This way I can also try out the AFE’s AEC algorithm!
Also, having Squeezelite on the Pi / another ESP32 device makes it a multi-room audio player with LMS, AirPlay and DLNA capabilities. I prefer to use a Pi Zero 2 W with Max2Play to get the above features like Squeezelite, and also to use it as a Bluetooth audio receiver!
Well, I could also connect the enclosure to an AV receiver instead of a soundbar, making it a Rhasspy voice assistant plus a multiroom audio device and a Bluetooth receiver!
To be more specific, I haven’t seen any good Open Source options in my price range; and I reject getting locked into a multi-national company’s cloud offering.
Your approach 1 is reliant on the availability of Raspberry Pis, then adding extra hardware. When availability of the RasPi Zero 2 W returns to pre-covid prices it is a reasonable - but not particularly cheap or high quality - option.
I have been attracted to the ESP32-S3 idea since I first saw rolyan suggest it - one device with all the hardware on-board, and just enough grunt to run the wakeword detection. But alas, reality is still to live up to the promise. I now understand that the desirable software components are closed source and require the Espressif development environment, which is probably seen as a barrier for FOSS developers.
Personally I am mostly a user - if I can buy a device cheaply that does the job I want, I don’t care so much what technologies it uses internally … as long as it works, and continues to work even if the manufacturer goes broke or changes their policy. Imagine if Ford or BMW executives decided to remotely disable all their old-model cars to “encourage” customers to buy the new model!
Anyway … it looks like your steps I and III run on the same machine - a server doing conceptually the same tasks as a current Rhasspy Base system. Great, I see this is the way to go.
Would steps II and IV run on the same ESP32-S3 machine? If you are adding a second ESP32 or a RasPi to the mix, then where is the cost benefit?