Hello everybody,
how about creating a dedicated category: HARDWARE?
Then we could discuss different Raspberry Pis, mics, speakers and cases.
I think there is still no perfect solution for the satellites, meaning one that is cheap AND good.
I’m using Zeros and the ReSpeaker 2-Mic, but I’m sure there is something more.
Raspberry Pi 3 Model A+
The new Raspberry Pi 3 Model A+ looks good as an alternative to the Zero. It is more expensive than the Zero but has 4 cores. I will buy one to test it.
The right mic
For now, I use the ReSpeaker 2-Mic.
Does anybody know how its quality compares to the ReSpeaker 4-Mic Array?
Or maybe to others?
The good thing about the small version is that you can connect speakers to it. I think this is not possible with the 4-Mic.
Later on, I will create a case for the raspi, the mic and a small speaker.
I tried the new rpi3a+ and I think it reacts much better than the Zero.
In Germany you can find the Zero for a minimum of 16€, and the rpi3a+ costs around 22€.
I created a case for it. So you can now find three cases for a satellite:
@JGKK for me the Zero just doesn’t cut the mustard when $10 more gives 10x Zero performance on a single board.
Like Eben Upton, I think the Pi3A+ for $25 is probably the best product Raspberry Pi make, and it’s interesting at the moment what might happen, as the Zero architecture is getting very long in the tooth.
The CM4 module arrived with the base unit being $25 and maybe early next year we may see a Pi4A.
Maybe they can drop $5 off the Pi3A+ and it might continue at $20, who knows, but performance-wise the Pi3A+ is a good fit, whilst the Zero is possible but in reality will be creaking under load.
I hate the drivers of the ReSpeaker HATs. As a low-cost HAT the 2mic is generally the best, but for me I can get better results with a USB soundcard and mic.
We don’t have working opensource beamforming on Arm Linux, so any array is relatively pointless unless it’s built in.
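For anyone wondering what "beamforming" means at its simplest, here is a minimal delay-and-sum sketch in plain Python. It is only an illustration of the idea, not a working Arm implementation and not what any commercial device does. The function names, the linear-array geometry and the whole-sample rounding are all assumptions made for the example.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, approximate value at room temperature


def arrival_lags(mic_positions_m, angle_deg, sample_rate):
    """Relative arrival lag (whole samples) per mic for a far-field source.

    Hypothetical helper: mics sit on a line (positions in metres along x),
    and angle_deg is measured from that line. The mic nearest the source
    gets lag 0; the others hear the wavefront that many samples later.
    """
    direction = math.cos(math.radians(angle_deg))
    raw = [-(pos * direction / SPEED_OF_SOUND * sample_rate)
           for pos in mic_positions_m]
    earliest = min(raw)
    return [round(r - earliest) for r in raw]


def delay_and_sum(channels, lags):
    """Align each channel by its lag and average them.

    Sound from the steered direction adds up coherently, while sound from
    other directions is misaligned and partly averages out.
    """
    length = min(len(ch) for ch in channels) - max(lags)
    return [
        sum(ch[n + lag] for ch, lag in zip(channels, lags)) / len(channels)
        for n in range(length)
    ]
```

Real beamformers use fractional delays, adaptive weights and direction-of-arrival tracking, which is where the missing opensource work is.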
We do have working AEC algs that run fine on a Pi3, but several 4 mic arrays either don’t have audio out on board or don’t share the same clock, which AEC needs in order to work.
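To show why the shared clock matters, here is a rough sketch of the core of an echo canceller, using a textbook NLMS adaptive filter in plain Python. This is a swapped-in illustration, not one of the AEC algs referred to above. The function name and the parameter values are assumptions.

```python
def nlms_echo_cancel(mic, reference, taps=8, step=0.5, eps=1e-8):
    """Subtract an adaptive estimate of the speaker echo from the mic signal.

    mic:       samples captured by the microphone (speech plus echo)
    reference: samples sent to the speaker, on the SAME sample clock
    Returns the residual signal after the estimated echo is removed.
    """
    weights = [0.0] * taps   # current estimate of the speaker-to-mic path
    history = [0.0] * taps   # latest reference samples, newest first
    residual = []
    for m, r in zip(mic, reference):
        history = [r] + history[:-1]
        estimate = sum(w * x for w, x in zip(weights, history))
        error = m - estimate
        power = sum(x * x for x in history) + eps
        # Normalised LMS update: step size scaled by reference power.
        weights = [w + step * error * x / power
                   for w, x in zip(weights, history)]
        residual.append(error)
    return residual
```

The filter assumes that sample n of the reference lines up with sample n of the mic. If playback and capture run on separate clocks they drift apart, the echo path appears to keep moving, and the filter never settles.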
If you go for the 2mic then the driver here might be a better option than the awful respeaker one.
We are just waiting for Raspberry Pi to catch up with the new 64bit OS and actually include the kernel headers, even though there are scripts to enable them.
I have actually used the 2 mic, the 4 mic Pi and the 4 mic USB from ReSpeaker a lot over the last 3 years in personal and work projects. For me this post was mostly good because of the printable case, as I had been looking for a printable compact satellite case for a while and I love the all-in-one form factor with a small footprint.
I fully agree that the Zero is not an option anymore, as it’s impossible to run things like Precise on it and the single core just is not up to it. That’s why I love the Pi 3A+.
For me most of your concerns are not a problem, as I don’t use any of the assistants I build for media playback.
I actually found the performance of the 2 mic HAT to be adequate for most small and medium rooms and have been using it since the early Snips days, before Rhasspy or voice2json, which is my forte, even came into the game. I also used it with my own homebrew pocketsphinx solution, where I actually found that it worked better than the 4 mic Pi HAT (a thing that I find very interesting is that certain language models will work better with certain microphones, as they seem to be closer to the trained source material or something).
I also love the 2 mic for the easy combination with the tiny 3W mono speaker, as I only use the speaker for feedback, and in @kaykoch’s case this gives me a no-solder, all-in-one compact machine that’s easy on the eyes and will work well for my use cases.
I own and have used the 2 mic, 4 mic linear and 4 mic pi and didn’t really like any of them for what I could do with a cheap sound card and active mic module that leaves my gpio open for other use.
The 4 mic USB just seemed out of whack with the cost of the bill of materials, so I never bothered with it.
They were an interesting exercise that now occupy my spare parts bin, but as you can see I am not a fan.
Yes, but I am. I like the 2 mic for what it offers apart from the mic: cases I can readily print like the one above (very important, as my girlfriend wouldn’t allow me to have anything out in the open which looks too DIY), easy installation, a speaker that can be connected and that fits in the case, the very easily programmable multiple RGB LEDs, and the low cost of something like 9€ for a part that I have by the next day because it’s so easily available. I agree that the USB 4 mic is a steep sell for what it is and would love to see something with similar or better performance for a cheaper price.
All in all, right now the 2 mic Pi HAT especially is just a very well-rounded package for me. You buy one sub-10€ component that is here the next day and doesn’t need to be ordered from China, and you have a mic, a small amp, an RGB LED array and a GPIO button, no DIY needed. For my use case, where I have no media playback coming from the satellite, it’s a good fit for just that: compact, easy to assemble, all-in-one additional satellites based on the 2mic, a Pi 3A+ and the small speaker for a total of about 35€.
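On the "easily programmable" RGB LEDs: a small sketch of how a frame for them could be built, assuming the HAT’s three LEDs are APA102-type parts driven over SPI. That LED type is an assumption here, as is the function name. The sketch only builds the bytes and leaves the SPI write out.

```python
def apa102_frame(colors, brightness=8):
    """Build the byte stream for a short chain of APA102-style LEDs.

    colors:     list of (red, green, blue) tuples, each 0-255
    brightness: global brightness per LED, 0-31
    """
    if not 0 <= brightness <= 31:
        raise ValueError("brightness must be in 0..31")
    data = bytearray(4)  # start frame: 32 zero bits
    for red, green, blue in colors:
        # LED frame: 0b111 + 5-bit brightness, then blue, green, red.
        data += bytes([0xE0 | brightness, blue, green, red])
    data += b"\xff" * 4  # end frame, enough extra clocks for a short chain
    return bytes(data)


# Example: three LEDs showing red, green and blue at low brightness.
frame = apa102_frame([(255, 0, 0), (0, 255, 0), (0, 0, 255)], brightness=4)
```

The resulting bytes would then be written to the SPI bus with whatever SPI library you use on the Pi.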
Yeah, we all have different opinions and requirements, and what I was doing was looking at what I could do in comparison to commercially available units.
Because recognition via Google or Amazon is server based, their recognition accuracy in my tests was far superior.
The beamforming and DSP that is embedded in their silicon is just missing, and from echo to workplace noise we are all just playing at a poor imitation of the big data companies I want to beat.
It’s also not $35, as by the time you get SD, PSU, case and ancillaries you quickly find yourself quite a bit above a commercial product that is superior in every way apart from supposed privacy.
And due to their popularity there is a rake of second-hand units starting at about $15.
That is what got me thinking about audio. As I say, I want to beat big data and have that privacy, but this emulation of one of their products is, in all honesty, lacklustre to say the least.
I wouldn’t use that opensource product at home, definitely not for work, and the only place it has is as an enjoyable builder’s experiment into the world of AI assistants.
In fact I am probably doing the same again, as I am now playing with Peltier modules in a quest to make an as-near-as-possible silent dehumidifier, and likely by the time I finish I will come to the conclusion that the commercial guys just have me beat due to economies of scale.
The media thing wasn’t because that is how I use an assistant; it was my thinking on where I could beat big data, who are obsessed with consumer products, where I could create something better and also lower costs by diversification of use.
Part of the rationale for being a wifi sound system for a room is that it would give me the inputs from the many domestic sources that kill recognition stone dead, such as hi-fi & TV.
I started seeing distributed wide-array microphones as being separate from true stereo audio, and a home processor as likely to be more 2001 HAL-like so we can share a GPU.
In opensource we are desperately short of various audio processing algs, DSP and AI accelerators. You can build something up, but after you have, there are some like me who might be more critical of what you actually have.
It’s interesting and a good builder’s project, but the reality is you cannot do it for $35, and when finished it is maybe of questionable use apart from to hobbyists/enthusiasts.
So yeah, we all have different opinions and some are far more critical, but the simple fact is that a wired mic module is far more flexible because of the wires it connects with.
I don’t like the HATs at all, as hard GPIO connectors just make a whole lot of things unnecessarily hard for no advantage; if you are going to print a case you can create something far superior if you actually get a choice of placement.
I actually think opensource could beat commercial offerings if it stopped copying and redesigned around diversification of use. Because each manufacturer wants ‘its own’ product, the commercial stuff isn’t very cost effective or interoperable and, as said, could be beaten.
Is it that little box with a micro speaker? Nope, not for me, but each to his own. Like I say, it does have everything, but unless any of it is actually any good, what do you have?
You are right it really does depend on your use case. I had an amazon echo dot four years ago before I started my journey down the rabbit hole of diy assistants. When we had one I quickly learned that the main skills we use in my household are the weather now, today and tomorrow, multiple timers for cooking, the time and most importantly switching/dimming all the lights and turning scenes on.
We have been happily using my home-made assistants for those tasks for three-plus years now. The only request of my girlfriend was that it had to be able to tell random facts and jokes, which it can. So you see, for us it’s more a voice interface to our home than a smart assistant. Maybe you just aspire to a higher standard, but for me, using it in real life for the last three years, the ReSpeaker mics were never the limiting factor. The biggest improvements I saw were always on the side of the ASR models and systems.
The lack of beamforming for recognition with background or other predominant noise is a problem, and it comes down to effective opensource Arm algs being missing.
The only thing we have got is the PulseAudio webrtc module and it just doesn’t work, which is why it’s being dropped upstream.
That and rather dated AEC is sort of strange for Linux and opensource, as it is a mainstream function that we lack.
So for a while I have believed that, hardware-wise, distributed mics become separate from audio out, as both connect to a separate central processor for rooms.
The architecture and protocols have grown out of a “home assistant”, but this is a “voice” assistant, and all that is required is latency-adjusted, accurately network-timed audio over RTP.
It’s why I played with snapcast, as its basis is practically all there; it just needs a few tweaks and additions so that assistant audio is separate from the control protocol.
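To make the "latency-adjusted, network-timed audio" idea a bit more concrete, here is a minimal sketch of timestamped audio chunks in plain Python. It is not the RTP wire format and not snapcast’s protocol. The header layout, names and latency figure are all made up for illustration, and it assumes sender and receiver clocks are already synchronised (e.g. via NTP).

```python
import struct

# Hypothetical header: 32-bit sequence number plus 64-bit capture time in
# microseconds, network byte order. Real RTP uses a different layout.
HEADER = struct.Struct("!IQ")


def pack_chunk(seq, capture_us, pcm_bytes):
    """Prefix a chunk of raw PCM with its sequence number and capture time."""
    return HEADER.pack(seq, capture_us) + pcm_bytes


def unpack_chunk(packet):
    """Split a packet back into (seq, capture_us, pcm_bytes)."""
    seq, capture_us = HEADER.unpack_from(packet)
    return seq, capture_us, packet[HEADER.size:]


def playout_delay_us(capture_us, now_us, target_latency_us):
    """Time left until this chunk is due; a negative value means it is late.

    Every receiver handles the chunk at capture time plus a fixed target
    latency, so streams from different rooms stay aligned with each other.
    """
    return capture_us + target_latency_us - now_us
```

With a shared time base like this the audio path can stay on plain UDP, while control messages travel separately.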
Streaming audio in a UDP broadcast embedded in MQTT can work, but like the hardware it’s sort of the same: it has everything, but is it any good?
Kaldi and other ASR have worked well for some time; getting them to run on lesser hardware with smaller models has been much of the work needed to run on something like a Pi.
TensorFlow 2.0, PyTorch and various others can use far less processing power now, but the option was always there for a different type of infrastructure: a home cloud AI.
Everything is there, but from infrastructure and system to hardware, for an implementation on a Pi I am still asking that question, is it any good, and am probably being more critical than others.
All my gear is in my spare parts bin, as I decided nothing really quite hits the mark for me, and I was interested in whether other projects such as https://speechbrain.github.io/ might add some of the missing elements. Guess we will have to wait and see.
Rhasspy & Mycroft were really interesting and enjoyable, and so were the Echo & Google Home, but I currently use none.
Well, I think this is the big difference between us here. I have been using DIY voice interfaces in day-to-day life for three years now. I don’t use Rhasspy, so I can’t comment on the specifics of its architecture. All my assistants have been based around MQTT and Node-RED as the backbone from the get-go. This is why I’m the one developing the node-red integration for voice2json, which is in a very usable beta right now. I’m not an audio or hardware engineer, so I will stick with what I do know about, and so for me the most complete and easy hardware will be the one I pick at any given time.
That’s the problem: for something that is extremely audio-centric, we have many developers just dodging and turning a blind eye to the actual needs that a ‘Voice’ AI has.
There is a rake of AI projects that have the middleware but are lacking mainly around the audio input and KWS.
MQTT is a great lightweight messaging protocol but as an audio carrier it sucks big style.
MS is sort of frustrating as used to be a good developer but the effort now is just too much.