What about the PureAudio Array Microphone Kit for RPi3 on Adafruit?
It's OK for a 2-channel broadside array, but it is just a 2-channel ADC sound card and a stereo mic with software.
I used to use an Edimax DreamBass as a 2x ADC USB sound card for $15 with 2x electrets; being round, they are really easy to drill for and fit in a grommet in an enclosure.
They seem to be EOL, and the only things I know of now with 2x ADC are https://www.aliexpress.com/item/4000589957704.html or, a bit less bulky and similar, https://www.watercoolinguk.co.uk/p/AXAGON-ADA-17-USB-20-HQ-sound-card_76055.html
Hello, I am new here, and even after reading several threads I am completely lost on the choice of a microphone for a satellite intended to sit in the main living room of the house (with potentially a little noise and often music).
Frankly, I don't know what to think or which audio input device to get anymore. I just want to know what to buy at this point so that the satellite works as well as possible.
I came across some interesting comparisons (like here), but they are starting to get dated now…
And I don't know what to choose now
between the different solutions and products that exist:
- ReSpeaker_Mic_Array_v2.1
- Playstation 3 eye
- PureAudio Array Microphone Kit for Raspberry Pi 3
- Jabra Speak 510
- Anker SoundCore 2
- …
@C64ever might give you a review, as at £69.99 the Anker S330 might be the best value for money for a "speakerphone" that you can just plug in.
The Anker Soundcore is just a BT speaker as far as I know, but you have a mixture of mic-only, speaker-only, and then the Jabra Speak 510, so it's hard to understand what you are looking for.
There are microphones on all these devices, even the Anker soundcore v1, v2 and v3 (to be confirmed).
My main need is a good microphone (audio input) so that the STT works as well as possible, even with ambient noise or music.
But if there is an audio output as well, that's better.
Beyond that, maybe I shouldn't dream too much:
if the whole thing were wireless (as could theoretically be done by devices like the Jabra Speak 510 to 750 or the Anker PowerConf S3 to S500) with Bluetooth dongles, that would be the holy grail.
So in my situation, the Anker PowerConf S330 is surely one of the best solutions that we know really works, per @C64ever (even if it is not wireless).
Dunno about the Soundcore; it has probably been mentioned before but my memory fails me (a quick Google made it look like just a BT speaker). Bluetooth often seems to be a struggle for Rhasspy users, as I have noticed in a few threads before.
I would stay away from BT and go for an easier install with USB, as depending on Docker/not Docker and BlueALSA or PulseAudio it can get a bit confusing, and the shared SDIO combo WiFi/BT on the Pi can be a little temperamental. One thing I do remember from an AirPlay/Spotify/BT project on a Pi is that it kept disconnecting; I did what the documentation said, disabled the onboard radio and used an external dongle. Dunno what the problem was, but with the external dongle it suddenly worked flawlessly.
The PureAudio Array Microphone Kit for Raspberry Pi 3 is just a stereo USB card with a premade 2-mic array, plus a closed-source beamformer and KW engine; I have forgotten whether anyone has ever integrated it with Rhasspy.
The ReSpeaker Mic Array is more expensive than the Anker S330 and lacks a speaker, but it is functionally very similar if you add a powered speaker; I do remember (I think it was @fastjack) that it could be hissy and noisy.
The PS3 Eye mic is just a USB mic with no usable beamforming or AEC algorithms, whilst the others are ready-made, contained units with that already built in.
Nothing works well with "other" ambient noise or music; some do quite a good job of static filtering, but conference mics have a different design focus than smart assistant mics.
If "other" means you are playing media on the device itself, it is not as bad as the situation where other devices are playing media, but then again even the latest and greatest from the likes of Google and Amazon can be poor in that scenario, especially the older models and Alexa.
It's really hard to say what is good; I can only make comparisons, as I do keep testing the Google & Amazon units when they come out, and these are likely not as good, but whether that is good enough only your own experience can answer.
The ReSpeaker 2-mic HAT is probably the most budget friendly, but you still need a powered speaker and the software and install can be too much hassle, whilst with the Jabra/Anker it's all there and you just plug in, and often that sways the decision more than "barge in" and Word Error Rate.
So it is really hard and very subjective to say what is good, and probably easier to use a different criterion.
Jabra/Anker-like units for minimal software and maker fiddling, just plug and play, versus the ReSpeaker 2-mic or a USB sound card for a more budget-conscious but far more complex software and maker build; many actually enjoy the maker side more than just sourcing off-the-shelf kit.
That distinction is probably easier to make.
I didn't read through all the replies here, but as I recently switched my approach to the voice/user-interface part, I have a spare Matrix Voice (without ESP32) and would give it away for free (just the postage would be nice).
(It's working all fine and all the Matrix drivers/software is still available; plus I got the LEDs working nicely, even from within Docker.)
So in my situation, the Anker PowerConf S330 is surely one of the best solutions that we know really works, per @C64ever (even if it is not wireless)
Yep! You can't go wrong with it. Going on 3 weeks now with this setup and both of mine are still working great.
Hey @copitz
I don't suppose you still have that spare Matrix Voice around, do you? It might be ideal for a little project that I'm working on at the moment. I'm based in the UK. Many thanks.
Well, I was in a similar situation a while ago, researching the best mics with DSP that can do NS and perform well in noisy environments, like with a TV running in the living room!
I echo what @rolyan_trauts, @donburch and @romkabouter say! There is no good integrated mic solution that can perform similarly to Amazon Echo / Apple / other commercial offerings!
I am working on the two DIYs below at the moment!
Approach 1
Please refer to HiFiBerry DAC+ADC or RaspiAudio Mic+V2 - #43 by rolyan_trauts, where I am working with @rolyan_trauts based on his recommendations!
Basically we are using stereo mics with a preamp wired to a USB sound card on the Pi!
I haven't had the time to implement it this week as I was travelling a lot over the last two weeks owing to work commitments, but I should be able to start on it today/tomorrow and will let you know the status!
Approach 2
This is more like using an ESP32-S3 board! It has a dual-core processor with special optimisations for running ML models! Espressif, the makers of the ESP32, have even implemented a pretty strong AFE solution that has all the good parts: noise cancelling, BSS and AEC.
Their hardware has all the ML optimisations to run their AFE algorithms! The algorithms are there and you just need to enable them on the ESP32-S3 board, which calls for some skill with their ESP-IDF programming SDK! The AFE is even a qualified Amazon AVS solution for Alexa! Read more on it below:
https://www.espressif.com/en/solutions/audio-solutions/esp-afe
https://docs.espressif.com/projects/esp-sr/en/latest/esp32s3/audio_front_end/README.html
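To make the "just enable them" part concrete, here is a minimal sketch of what the AFE setup and processing loop look like in ESP-IDF. The handle/config names (ESP_AFE_SR_HANDLE, AFE_CONFIG_DEFAULT, the aec_init/se_init/vad_init/wakenet_init flags, feed/fetch) are taken from the esp-sr examples and may differ between esp-sr versions, and read_mic_frame() is a hypothetical stand-in for your own I2S capture code:

```c
// Rough sketch of the esp-sr AFE loop on an ESP32-S3 (ESP-IDF + esp-sr).
// Names follow the esp-sr examples; check them against the esp-sr version you build with.
#include <stdint.h>
#include <stdlib.h>
#include "esp_afe_sr_models.h"
#include "esp_afe_sr_iface.h"

extern void read_mic_frame(int16_t *buf, int samples);   // hypothetical I2S capture helper

void afe_task(void *arg)
{
    esp_afe_sr_iface_t *afe_handle = (esp_afe_sr_iface_t *)&ESP_AFE_SR_HANDLE;

    afe_config_t afe_config = AFE_CONFIG_DEFAULT();
    afe_config.aec_init     = true;   // acoustic echo cancellation (needs a reference channel)
    afe_config.se_init      = true;   // BSS / noise suppression
    afe_config.vad_init     = true;   // voice activity detection
    afe_config.wakenet_init = true;   // WakeNet wake word engine

    esp_afe_sr_data_t *afe_data = afe_handle->create_from_config(&afe_config);

    // feed() takes fixed-size chunks of interleaved 16-bit samples (all channels).
    int feed_samples = afe_handle->get_feed_chunksize(afe_data)
                       * afe_config.pcm_config.total_ch_num;
    int16_t *feed_buf = malloc(feed_samples * sizeof(int16_t));

    while (1) {
        read_mic_frame(feed_buf, feed_samples);   // raw mic (+ reference) audio in
        afe_handle->feed(afe_data, feed_buf);     // runs AEC / BSS / NS / VAD / WakeNet

        afe_fetch_result_t *res = afe_handle->fetch(afe_data);
        if (res && res->wakeup_state == WAKENET_DETECTED) {
            // res->data / res->data_size now hold the cleaned single-channel audio;
            // hand it to the streaming code (see the WebSocket sketch further down).
        }
    }
}
```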
Basically we need a base server, such as an Intel NUC, on which you can run Rhasspy, Home Assistant and Node-RED! You can then use the ESP32-S3 board to implement the KWS and AFE algorithms, read the mic input and pass it on to the Rhasspy base using MQTT for speech recognition and intent handling, executing the actions with the help of Home Assistant or Node-RED integrations!
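If the MQTT route is used, the ESP side would be something like the sketch below, using ESP-IDF's esp-mqtt client. The broker address is a placeholder, and the hermes/audioServer/<siteId>/audioFrame topic plus the expectation that each chunk is WAV-wrapped are from the Hermes protocol as I remember it, so double-check them against the Rhasspy docs:

```c
// Sketch: publish audio chunks to the Rhasspy base over MQTT (esp-mqtt client).
// Broker URI and topic are placeholders/assumptions; verify the Hermes topic and framing.
#include <stdint.h>
#include "mqtt_client.h"

static esp_mqtt_client_handle_t s_mqtt;

void mqtt_audio_init(void)
{
    esp_mqtt_client_config_t cfg = {
        .uri = "mqtt://192.168.1.10:1883",   // note: IDF 5.x nests this as .broker.address.uri
    };
    s_mqtt = esp_mqtt_client_init(&cfg);
    esp_mqtt_client_start(s_mqtt);
}

// `chunk` should be a small WAV (header + PCM) if the Hermes audioFrame convention is followed.
void mqtt_publish_audio(const uint8_t *chunk, int len)
{
    esp_mqtt_client_publish(s_mqtt, "hermes/audioServer/livingroom/audioFrame",
                            (const char *)chunk, len, /*qos=*/0, /*retain=*/0);
}
```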
And yes, there is a Rhasspy 3.0 developer preview that has WebSockets, which sounds like a better solution than using MQTT!
I am planning to use ESP-IDF with their AFE solution for mic input and KWS, and pass the processed voice to the Rhasspy base for the rest of the pipeline actions!
While this will be my second DIY, which I plan to work on in the near future, I will detail the approach below!
Details on Approach 2
I. Setup Rhasspy / HA Base
- Set up the Rhasspy 3.0 developer preview on a more powerful device, such as an Intel NUC. You can follow the Rhasspy documentation to install and configure the software. Set up the WebSocket server.
- Set up HA
- Set up Node-RED
II. On ESP Device (KWS Server)
- Set up the AFE with wake word detection on an ESP32-S3 device using ESP-IDF
- Use esp-skainet to continuously listen to audio and perform continuous voice processing (using the AFE) for AEC, BSS/NS, VAD and WakeNet
- Use WakeNet (part of esp-skainet) to perform wake word detection. When the wake word is detected, send the audio to Rhasspy using WebSockets. The WebSocket client is on the ESP32 and the WebSocket server is on Rhasspy in this case (see the sketch after this list)!
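A rough sketch of that "send it to Rhasspy over WebSockets" step, using the esp_websocket_client component from ESP-IDF / esp-protocols. The ws:// URI and the idea of streaming raw 16 kHz, 16-bit mono PCM frames are my own assumptions about the Rhasspy 3 endpoint; only the client calls themselves are from the component:

```c
// Sketch: stream the AFE output to the Rhasspy base once the wake word fires.
// Uses esp_websocket_client; the endpoint URI and PCM framing are assumptions.
#include <stdint.h>
#include <stddef.h>
#include "esp_websocket_client.h"
#include "freertos/FreeRTOS.h"

static esp_websocket_client_handle_t s_ws;

void ws_stream_init(void)
{
    esp_websocket_client_config_t cfg = {
        .uri = "ws://192.168.1.10:13331/stream",   // placeholder: your Rhasspy base endpoint
    };
    s_ws = esp_websocket_client_init(&cfg);
    esp_websocket_client_start(s_ws);
}

// Call this with each cleaned chunk from afe_handle->fetch() while the command is spoken.
void ws_stream_audio(const int16_t *pcm, size_t bytes)
{
    if (esp_websocket_client_is_connected(s_ws)) {
        // One AFE chunk per binary WebSocket frame (16 kHz, 16-bit mono assumed).
        esp_websocket_client_send_bin(s_ws, (const char *)pcm, bytes, portMAX_DELAY);
    }
}
```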
III. On Rhasspy / HA Base
- Rhasspy performs speech recognition to convert speech to text, and then intent recognition
- HA receives the recognized text or intent from Rhasspy.
- HA uses the recognised text to trigger actions on its entities, such as controlling home devices or sending data to other devices.
- Rhasspy also sends TTS to a Squeezelite device, maybe another ESP32 with a DAC or a Pi with a DAC. Configure Rhasspy to use TTS via WebSocket: in the "Text to Speech" section of the Rhasspy configuration page, select "Remote WebSocket Server" as the TTS provider and provide the IP address and port number of your ESP32 device.
IV. On esp-Squeezelite device or Pi with Squeezelite
- Set up a WebSocket server on the ESP32 to listen for incoming WebSocket connections
- When a WebSocket connection is established, read the TTS audio data from Rhasspy.
- Play the TTS audio data through the ESP32 speakers / Pi speakers (a rough sketch follows this list)
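For the ESP32 side of step IV, here is a minimal sketch of a WebSocket endpoint that receives audio and pushes it out over I2S, assuming ESP-IDF's esp_http_server with CONFIG_HTTPD_WS_SUPPORT enabled and the legacy I2S driver already configured for the DAC. The /tts endpoint name and the idea that the sender pushes raw 16-bit PCM frames are my own assumptions (Rhasspy normally produces WAV), so treat the framing as illustrative:

```c
// Sketch: WebSocket "TTS sink" on the ESP32 (step IV) using esp_http_server + legacy I2S driver.
// Assumes i2s_driver_install()/i2s_set_pin() for I2S_NUM_0 were done elsewhere and that the
// sender pushes raw 16-bit PCM frames (an assumption, not something Rhasspy guarantees).
#include <stdlib.h>
#include <string.h>
#include "esp_http_server.h"
#include "driver/i2s.h"
#include "freertos/FreeRTOS.h"

static esp_err_t tts_ws_handler(httpd_req_t *req)
{
    if (req->method == HTTP_GET) {
        return ESP_OK;                          // WebSocket handshake is handled by the server
    }

    httpd_ws_frame_t frame;
    memset(&frame, 0, sizeof(frame));
    frame.type = HTTPD_WS_TYPE_BINARY;

    // First call with max_len 0 only reports the incoming frame length.
    esp_err_t ret = httpd_ws_recv_frame(req, &frame, 0);
    if (ret != ESP_OK || frame.len == 0) {
        return ret;
    }

    frame.payload = malloc(frame.len);
    ret = httpd_ws_recv_frame(req, &frame, frame.len);
    if (ret == ESP_OK) {
        size_t written = 0;
        i2s_write(I2S_NUM_0, frame.payload, frame.len, &written, portMAX_DELAY);
    }
    free(frame.payload);
    return ret;
}

void start_tts_ws_server(void)
{
    httpd_handle_t server = NULL;
    httpd_config_t config = HTTPD_DEFAULT_CONFIG();
    httpd_start(&server, &config);

    httpd_uri_t ws_uri = {
        .uri          = "/tts",                 // placeholder endpoint name
        .method       = HTTP_GET,
        .handler      = tts_ws_handler,
        .is_websocket = true,
    };
    httpd_register_uri_handler(server, &ws_uri);
}
```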
I would have to design a nice DIY enclosure to put the ESP32-S3 device with mics alongside a Pi / another ESP32 with a DAC running Squeezelite.
And then connect the enclosure to my soundbar! In fact, I could use a Digi HAT on the Pi that gives me digital audio output through an optical out! I can connect it to the optical in of my soundbar! The soundbar has its own set of DACs, amps and speakers anyway!
That way I can have a smart soundbar in the living room that works as a Rhasspy voice assistant!
Hardware preferred
I prefer to use an ESP32-S3-DevKitC board with I2S mics for this DIY so I don't need to worry about ADC channels! In fact, I can still connect an ADC board to my ESP32-S3 to pass a reference signal from the Pi's or another ESP32's DAC into this ADC! This way I can also try out the AFE's AEC algorithm!
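For reference, reading a pair of I2S MEMS mics on the ESP32-S3 looks roughly like the sketch below with the legacy I2S driver. The GPIO numbers are placeholders for whatever pins the mics end up wired to, and most MEMS mics deliver 24-bit data in 32-bit slots, so the samples get shifted down to 16-bit before going into the AFE:

```c
// Sketch: capture a stereo pair of I2S MEMS mics on an ESP32-S3 (legacy I2S driver).
// GPIO numbers are placeholders; adjust to your wiring.
#include <stdint.h>
#include "driver/i2s.h"
#include "freertos/FreeRTOS.h"

#define MIC_I2S_PORT I2S_NUM_1

void mic_i2s_init(void)
{
    i2s_config_t cfg = {
        .mode                 = I2S_MODE_MASTER | I2S_MODE_RX,
        .sample_rate          = 16000,                       // esp-sr expects 16 kHz
        .bits_per_sample      = I2S_BITS_PER_SAMPLE_32BIT,   // MEMS mics use 32-bit slots
        .channel_format       = I2S_CHANNEL_FMT_RIGHT_LEFT,  // two mics with opposite L/R select
        .communication_format = I2S_COMM_FORMAT_STAND_I2S,
        .intr_alloc_flags     = 0,
        .dma_buf_count        = 4,
        .dma_buf_len          = 512,
    };
    i2s_pin_config_t pins = {
        .bck_io_num   = 41,                                  // placeholder GPIOs
        .ws_io_num    = 42,
        .data_out_num = I2S_PIN_NO_CHANGE,
        .data_in_num  = 2,
    };
    i2s_driver_install(MIC_I2S_PORT, &cfg, 0, NULL);
    i2s_set_pin(MIC_I2S_PORT, &pins);
}

// Fill `out` with interleaved 16-bit L/R samples, taking the top 16 bits of each 32-bit slot.
void mic_read(int16_t *out, int samples)
{
    static int32_t raw[512];
    size_t bytes_read = 0;
    for (int done = 0; done < samples; ) {
        int want = (samples - done) > 512 ? 512 : (samples - done);
        i2s_read(MIC_I2S_PORT, raw, want * sizeof(int32_t), &bytes_read, portMAX_DELAY);
        for (int i = 0; i < (int)(bytes_read / sizeof(int32_t)); i++) {
            out[done++] = (int16_t)(raw[i] >> 16);
        }
    }
}
```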
Also, having Squeezelite on the Pi / another ESP32 device makes it a multi-room audio player with LMS, AirPlay and DLNA capabilities. I prefer to use a Pi Zero 2W with Max2Play to get the above features like Squeezelite and also use it as a Bluetooth audio receiver!
Well, I could also connect the enclosure to an AV receiver instead of the soundbar to make it a Rhasspy voice assistant as well as a multi-room audio device and a Bluetooth receiver!
To be more specific, I haven't seen any good open source options in my price range; and I reject getting locked into a multi-national company's cloud offering.
Your approach 1 is reliant on the availability of Raspberry Pis, then adding extra hardware. When availability of the RasPi Zero 2W returns to pre-covid prices it is a reasonable - but not particularly cheap or high quality - option.
I have been attracted to the ESP32-S3 idea since I first saw rolyan suggest it: one device with all the hardware on-board, and just enough grunt to run the wake word detection. But alas, reality is yet to live up to the promise. I now understand that the desirable software components are closed source and require the Espressif development environment, which is probably seen as a barrier for FOSS developers.
Personally I am mostly a user: if I can buy a device cheaply that does the job I want, I don't care so much what technologies it uses internally … as long as it works, and continues to work even if the manufacturer goes broke or changes their policy. Imagine if Ford or BMW executives decided to remotely disable all their old model cars to "encourage" customers to buy the new model!
Anyway … it looks like your steps I and III run on the same machine: a server doing conceptually the same tasks as a current Rhasspy Base system. Great, I see this is the way to go.
Would steps II and IV run on the same ESP32-S3 machine? If you are adding a second ESP32 or a RasPi to the mix, then where is the cost benefit?
Well yes, steps 1 & 3 are on an Intel NUC / any other SBC, but I would still recommend a used NUC. I bought an 8 GB Core i3 one for £60 here in the UK over eBay! I also have an old Mac mini from 2011 lying around! Or even if you have an old desktop, then yes, use it! This is the part that Amazon and the other commercial giants actually run in the cloud! So we are using a used Intel NUC or an old desktop to run Rhasspy's speech recognition and intent handling.
Step 2 runs on an ESP32-S3-DevKitC board, which is literally £10-£12.
And step 4 needs to run on a separate ESP32 device or a Pi with a DAC HAT. The reason is that I want to use the first device, based on the ESP32-S3, entirely for running the voice processing algorithms based on esp-skainet's AFE and wake word detection!
So for step 4 you can use an ESP32-WROVER dev board or an ESP32 Audio Kit; you can get either of these for less than £10.
You can also use a Pi Zero 2W / Pi 3A+ if you can get them, but that is entirely optional unless you want 32-bit / 384 kHz resolution coming off a DAC HAT on the Pi! You can also use a £5 DAC breakout with hi-res output and connect it to the Pi!
If you look at the total cost for steps 2 & 4, using the two different ESP boards I mentioned, it will be at least £20, or a maximum of £30 if you include the DAC and other required components such as the MEMS mics!
I can post links to the components you can use when I do that DIY and post the details. I used C a decade ago and am just polishing my skills on the ESP-IDF framework! Once done, I will do the DIY and post git repos that you can simply flash to the ESP32-S3 board for step 2. For step 4, there is already a git repo, Squeezelite-ESP32, that you flash onto a different ESP32 board!
Hello sanyasa, sure,
still following this thread here and cool to see you are working on a solution!
There is just one thing I don't completely understand yet about how this will work with your solution:
You mention in your "step 2" that you would also do AEC on the ESP device which does the wake word detection, but at the same time you have in your "step 4" a separate device which obviously does the audio output like TTS, but also multi-room audio.
So how exactly does AEC work then?
Let's assume you play a song in your multi-room audio system, played by the separate device from "step 4"; how does the wake word device in "step 2" know what is being played, so that it can do AEC?
I previously understood, also from @rolyan_trauts, that AEC would be an important step, and I would expect wake word recognition to be far worse if the currently played music cannot be "filtered out" correctly.
As a result, I always thought it would be important to have both wake word processing and multi-room audio output on the same device?
Best regards
Andreas
Good question! I have given it considerable thought too, and at first glance I thought AEC was not needed at all, since wake word processing and multi-room audio output are on separate devices!
I also agree that both wake word processing and multi-room audio output should be on the same device, using a hardware loopback or a virtual ALSA loopback when using a Pi as a Rhasspy satellite!
But there's more to AEC!
Acoustic Echo Cancellation (AEC) can be implemented using hardware loopback or reference signal. Here are the differences between the two approaches:
- Hardware Loopback: In the hardware loopback approach, the audio output sent to the speaker is routed directly back into the microphone input path through hardware connections. However, this approach may not be suitable for all applications, as it requires specific hardware support such as an extra ADC channel!
This is the reason why wake word processing and audio output should be on the same device when using a hardware loopback: a single clock controls the ADC input from the loopback and the DAC output, so both signals stay in sync and a comparison can be made to cancel the playback signal and separate out the voice command!
- Reference Signal: In the reference signal approach, a copy of the signal being played through the speaker is captured and used as the reference signal for AEC processing. The ESP AEC uses this approach! Instead of a recorded signal, we feed the output from the DAC on the other device (the one we use in step 4) into an ADC on the ESP32-S3 device we use in step 2. For this we need to connect a 1- or 2-channel ADC board to the ESP32-S3 (see the sketch below)!
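As I understand it from the esp-sr docs, that reference channel is simply handed to the AFE as one more interleaved channel in feed(), configured via pcm_config. The field names and channel order below are my reading of the docs and would need to be verified against the esp-sr version used:

```c
// Sketch: telling the esp-sr AFE that the feed data contains 2 mic channels + 1 AEC reference.
// Field names follow the esp-sr docs; verify against the version you build with.
#include "esp_afe_sr_models.h"
#include "esp_afe_sr_iface.h"

esp_afe_sr_data_t *afe_create_with_reference(void)
{
    esp_afe_sr_iface_t *afe_handle = (esp_afe_sr_iface_t *)&ESP_AFE_SR_HANDLE;

    afe_config_t cfg = AFE_CONFIG_DEFAULT();
    cfg.aec_init                = true;   // AEC consumes the reference channel
    cfg.pcm_config.mic_num      = 2;      // the two I2S MEMS mics
    cfg.pcm_config.ref_num      = 1;      // tap of the other device's DAC output via the ADC
    cfg.pcm_config.total_ch_num = cfg.pcm_config.mic_num + cfg.pcm_config.ref_num;

    return afe_handle->create_from_config(&cfg);
}

// feed() then expects the samples interleaved per frame as mic1, mic2, ref, mic1, mic2, ref, ...
// so the capture code has to merge the I2S mic samples and the ADC reference into one buffer.
```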
I still need to work this out when I do the DIY! In fact, it was @rolyan_trauts who suggested I use the reference signal when I initially pitched my idea to him! Of course, this is needed when we want to use all three of the below in the same enclosure:
- esp32-s3 device in step 2,
- another esp32 or Pi with dac in step 4
- set of speaker(s)
In such a case, sound from the speaker of one source causes echo in the mic input of the other when they are placed in the same enclosure.
This can happen due to acoustic coupling or crosstalk between the speaker and the microphone.
Acoustic coupling occurs when sound waves generated by the speaker propagate through the air and interact with the microphone.
To reduce or eliminate this effect, you can try using sound-absorbing materials in the enclosure, positioning the speaker and microphone in different locations within the enclosure, or using directional microphones that are less sensitive to sounds coming from certain angles.
Acoustic Echo Cancellation (AEC) can help in this situation where sound from a speaker of one source causes an echo in the mic input of another source when they are placed in the same enclosure. Other techniques such as acoustic treatment or physical separation of the speaker and microphone may also be necessary to achieve optimal audio performance.
But yes, assuming you are not putting speakers in the enclosure and are indeed connecting to a soundbar / AV receiver, then AEC may not be required, as the ESP AFE's BSS & NS would give good results!
But yes, this is something I have to test and ascertain when I do the DIY!
I am guessing that "Multi-room Audio" is a core requirement for you, and hence your step IV. I assume that your multi-room audio is for playing music (which you probably stream from Spotify or similar?).
For me it would be a "nice to have"; but my reference audio source is the TV/soundbar from my nVidia ShieldTV media centre.
I will certainly be following your project with interest. Wishing you the best of luck!
Step 4 is mainly for audio out / Rhasspy's responses when it completes execution of commands / TTS from Rhasspy … the easiest way to get audio out is to use the existing Squeezelite-ESP32 repo! Ironically, that repo also provides multi-room functionality, which is a nice add-on.
If I do not use Squeezelite-ESP32, I need to program the audio out myself using ESP's audio pipeline framework on top of the ESP-IDF SDK.
It's just a bit more effort, and I would rather focus on connecting mics to the ESP32-S3 device and implementing the AFE voice processing algorithms and esp-skainet's WakeNet for wake word recognition! The focus is more on creating a mic array with the advanced ML voice processing algorithms that ESP provides, such as BSS, NS & AEC! There is currently a lack of such mic arrays available commercially that we can integrate with Rhasspy!
Perhaps to get the audio out I might use the ESP audio pipeline without the Squeezelite repo and drop the multi-room functionality! That may not be in the near future, as I am planning a wireless 7.1 surround sound project after the ESP32 Rhasspy DIY: it involves sending 8 channels of audio data from a second-hand 7.1 Dolby Atmos AV receiver to 8 speakers over WiFi, without any cables connecting the speakers to the AV system!
I might end up using a cheap ESP32 board on each of those speakers, and at that point I will definitely need to work with the ESP audio pipeline framework! Perhaps I will then use that knowledge to reprogram the Rhasspy ESP32 DIY to use that framework for audio out rather than the Squeezelite solution!