Hardware Advice? General Advice?

Hi, I’m completely new to this. I’ll state what I’m trying for.

I’m being threatened with an Amazon Echo, but I know all the damn thing will be used for is a voice activated radio. I’d like to cut that one off at the pass, but to be honest - I do like the idea of branching out into some home automation later.

But for now, I would like just that, a voice activated radio/streaming device where everything should run locally on a single device, without having an open mic broadcasting to some cloud service somewhere.

Software Questions:
Is it actually possible to integrate a music streaming service? Does anyone here have experience with that?

Hardware I’m eyeing:

Raspberry Pi 4 Model B (4GB variant)


Is this a powerful enough computer to run a standalone instance of Rhasspy with a ‘largely vanilla configuration’, with PicoTTS as the text to speech service?

Speaker: 30W, 4 Ohm impedance.

Microphone
https://wiki.seeedstudio.com/ReSpeaker_4_Mic_Array_for_Raspberry_Pi/
https://wiki.seeedstudio.com/ReSpeaker_Mic_Array_v2.0/

I already own this speaker so I’d like to use it, but I would like to use a ReSpeaker mic array.
It’s a 30W speaker, which means I’ll need an amp. But I can’t use an amp HAT for the Pi, as the Pi only has one I2S bus, which will be needed by the ReSpeaker.

If I choose the more expensive USB variant will that solve my problem? (aka, can I then use an amp hat?)
Or will I still have a headache somewhere when I’m configuring alsa?
If it doesn’t, and I still choose the more expensive one, can I amplify the signal coming out of the 3.5mm jack on the ReSpeaker? Or will that just introduce noise?
Will having a speaker so close to this microphone array cause me problems when trying to detect a wake phrase?

I gather that this hardware is probably misspec’ed, I’m open to suggestions.

My technical background:
I’m a professional game developer, so debatably I can program.
I’m an amateur electronics hobbyist, with a bunch of small Arduino and libopencm3 projects. Never touched audio though. Never used a Raspberry Pi as anything other than a CUPS server.
I daily drove linux for a few years, I’m fine with a command line, I’m fine with technical documentation for the most part.
I’m alright at modelling things in Fusion360, which is my plan for an enclosure, which will be 3D printed.

Hi there and welcome to Rhasspy :blush:.

I think I can help you with some of your questions.

It is completely possible. I’m using the TuneIn API to get the radio station broadcast URL from the station name (names are defined in advance in a Rhasspy slot) and mplayer to stream it.
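A minimal sketch of that flow, assuming the stream URLs have already been resolved (the TuneIn lookup itself is not shown, and the station names and URLs below are made-up placeholders). It maps a slot value to a URL and builds the `mplayer` command line:

```python
import subprocess

# Hypothetical station table: in the setup described above the URL would
# come from a TuneIn lookup; these names and URLs are placeholders.
STATIONS = {
    "radio one": "http://example.com/radio-one.mp3",
    "jazz fm": "http://example.com/jazz-fm.mp3",
}


def mplayer_command(station_name, stations=STATIONS):
    """Return the argv list that streams the given station with mplayer."""
    key = station_name.strip().lower()
    if key not in stations:
        raise KeyError(f"unknown station: {station_name!r}")
    # -really-quiet keeps mplayer from flooding stdout with status lines.
    return ["mplayer", "-really-quiet", stations[key]]


def play(station_name):
    """Start playback in the background and return the process handle."""
    return subprocess.Popen(mplayer_command(station_name))
```

Keeping the process handle around means a later "stop" intent can simply call `terminate()` on it.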

The RPi4 should be powerful enough (CPU wise) to run both Rhasspy and an audio streaming service. 4GB should be plenty enough RAM :+1:.

If you plan on playing audio and do vocal commands, you will require what is called AEC or Acoustic Echo Cancellation.

For AEC to work, you’ll have to output audio using the same hardware clock as the audio input or you won’t be able to control anything once the playback starts as it will saturate the mic.

There are multiple topics on this community forum about AEC. It is a pretty complex problem.

2 options:

  • Hardware AEC: The ReSpeaker Mic Array V2 does AEC natively using an onboard XMOS chip. Audio in and out through this device will work (same hw clock). An audio amp will have to be plugged into the speaker pins or the 3.5mm jack. Beware that the audio playback sample rate of the board is 16 kHz max.
  • Software AEC: A combination of MEMS mics, a DAC+amp on the RPi GPIO and software AEC has been reported to work (not as well as hw AEC though), but it requires more configuration and will use more CPU (no problem with a Pi 4 though).

Yes. Without AEC not a chance. With AEC and audio playback above a few Watts, I do not think the ASR will recognize the vocal commands. You’ll have to lower/mute the audio playback for the ASR to work when the wakeword is detected so it highly depends on the wakeword detection capacity (Amazon Echo wakeword detection is absolutely incredible). If you can get the wakeword to be detected reliably during audio playback with AEC then the rest will be quite easy.
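The "lower the playback when the wakeword fires" idea can be sketched as a tiny state machine. This is only an illustration: the topic names follow the Hermes MQTT convention that Rhasspy uses (check them against your version), while the `Ducker` class, the volume values and the `set_volume` callback are assumptions made up for the example.

```python
# Hermes-style topics; verify against your Rhasspy version.
HOTWORD_PREFIX = "hermes/hotword/"
SESSION_ENDED = "hermes/dialogueManager/sessionEnded"


class Ducker:
    """Lower playback volume on wakeword, restore it when the session ends."""

    def __init__(self, set_volume, normal=80, ducked=10):
        self.set_volume = set_volume  # hypothetical callback, e.g. a mixer wrapper
        self.normal = normal
        self.ducked = ducked
        self.is_ducked = False

    def on_message(self, topic):
        """Feed every received MQTT topic into this method."""
        if topic.startswith(HOTWORD_PREFIX) and topic.endswith("/detected"):
            if not self.is_ducked:
                self.is_ducked = True
                self.set_volume(self.ducked)
        elif topic == SESSION_ENDED and self.is_ducked:
            self.is_ducked = False
            self.set_volume(self.normal)
```

In practice `on_message` would be called from an MQTT client callback, and `set_volume` would wrap whatever mixer control the playback device exposes.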

The speakers and mics orientation, position and insulation will be paramount. 3D printed case will have to be specifically designed for this and measures will have to be taken to further insulate the mics from the speakers.

I highly encourage you to read some of this community’s topics about « AEC » to get a good overview before attempting this kind of setup.

Hope this helps :blush:


With my PSEye on the Pi 3B+ master and the 2-mic Seeed HAT on the Pi Zero satellites, it’s quite tolerant. I can mostly control Rhasspy while a TV show is running, as long as it is not too loud and the show doesn’t happen to say something Rhasspy turns into an intent. I am really surprised how well this works, with the PSEye 50cm left of my center speaker in the living room, or the satellite behind the laptop playing a movie in the bedroom.

Of course it will be best to isolate it as well as possible, but so far I have not felt the need to do anything here and have not run into trouble. I would say try it and improve if necessary.

Thanks for the trove of information!

Isn’t 16KHz landline quality audio?
Their product page mentions a 48KHz max sample rate, and there’s this issue on git, opened 2 years ago. Looks like they had a go at upping this in firmware, but it seems buggy.

Not so sure that’s actually fit for my purposes then. I was led there by this blog post on microphones for the ill-fated snips.ai

Matrix Creator (or more likely Matrix Voice as I don’t need all the stuff the creator has) looking like the runner up. With the PSEye as a contender simply on value.

Do you happen to know if there’s a similar limitation with the audio output of this board? If there is I think I’ll go PSEye and the software AEC approach. Can’t have it sounding like it’s playing telephone hold music.

Note that the Matrix Core/Voice does not do AEC at all (contrary to what is stated on their website). The only hw AEC board that works (and does not cost a fortune) is the ReSpeaker Mic Array V2.

For software AEC, you can look at @rolyan_trauts posts for some guidance/ideas. He worked extensively on low cost software echo cancellation with good results.

If you use an RPi 4 (with lots of CPU), I suggest looking into PulseAudio, which provides an echo cancellation module based on WebRTC that works pretty well and allows output above 16 kHz by doing some clock-syncing magic with WebRTC’s adaptive echo cancellation.
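As a rough sketch, loading that module from a script could look like this. `module-echo-cancel` and its `aec_method`, `source_name` and `sink_name` arguments are PulseAudio's own; the names `ec_source` and `ec_sink` are arbitrary choices for this example, and any further tuning is left out.

```python
import subprocess


def echo_cancel_command(source_name="ec_source", sink_name="ec_sink"):
    """Build the pactl argv that loads PulseAudio's WebRTC echo canceller."""
    return [
        "pactl", "load-module", "module-echo-cancel",
        "aec_method=webrtc",
        f"source_name={source_name}",  # virtual mic with the echo removed
        f"sink_name={sink_name}",      # play audio here so it can be cancelled
    ]


def load_echo_cancel():
    """Run the command; raises CalledProcessError if pactl reports failure."""
    subprocess.run(echo_cancel_command(), check=True)
```

Playback then has to go to the new sink and recording has to come from the new source, otherwise the canceller never sees the reference signal.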

Hope this helps.


Strangely I never got pulseaudio webrtc to work well, if you managed @fastjack please do tell :slight_smile:

I found https://github.com/voice-engine/ec to be by far the best, which is really strange: the ALSA plugins share the same SpeexDSP libs, yet echo cancellation is awful via the ALSA plugins but fine with the above repo. The same seems to happen with PulseAudio.
WebRTC is supposed to be great, but from what I have tried on the Pi with PulseAudio it’s pretty poor.

Like @fastjack, I am critical (maybe even more so) of some of the soundcards and supposedly high-tech EC USB mic arrays: there always seems to be a touch of snake oil, and some are soaked in it.

The secret is to get mic and audio output on the same card so there is no clock drift.
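To get a feel for why drift matters, here is a back-of-the-envelope sketch. The 50 ppm mismatch is an assumed figure for two independent crystals, not a measurement of any card mentioned here.

```python
def drift_samples(sample_rate_hz, drift_ppm, seconds):
    """Samples by which two unsynchronised clocks diverge over a period."""
    return sample_rate_hz * drift_ppm * 1e-6 * seconds


def drift_ms(drift_ppm, seconds):
    """The same divergence expressed in milliseconds."""
    return drift_ppm * 1e-6 * seconds * 1000.0


# Assumed example: 16 kHz capture, 50 ppm mismatch, one minute of playback.
print(round(drift_samples(16000, 50, 60)), "samples")
print(round(drift_ms(50, 60), 2), "ms")
```

Under those assumptions the reference signal and the captured echo slide apart by about 48 samples (3 ms) per minute, and an echo canceller has to keep re-adapting to that unless both sides share one clock.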
The cheapest is to use 2x I2S mic modules and the Pi 3.5mm jack, but the 3.5mm is infamously bad for hi-fi quality.
Most USB soundcards will do the job, but practically all of them have a mono mic input. That actually isn’t as bad as it sounds: coupled with an active mic module they are still quite sensitive.
I have only found a single USB card with a stereo mic input, the cheap Enermax AP001E DreamBass, but the ‘DreamBass’ is too pronounced for my liking.
Syba supposedly do one, but both of the 2 I purchased turned out to be fake, with a C-Media mono mic input.

Also, you can get an HDMI-to-HDMI+audio extractor for less than $10; couple that with 2x I2S mics and it is far superior to the 3.5mm of the Pi.

There are some ReSpeaker cards that have mic in and audio out on the same card, and they will work with a vanilla setup, but boy the drivers are poor, and it’s almost a dead cert you will end up in dependency hell or they will just stop working.

The above https://github.com/voice-engine/ec can be quite heavy on load, but it will run well on a Pi 3A+: it only runs while audio is playing, so the load doesn’t all land at once and it works well with a lesser CPU than a Pi 4.

PS: the SpeexDSP packages in the Raspbian repo are quite old; what I always do is install them, then grab the latest from GitLab and compile.
I have been meaning to do some new tests with the new 64-bit Raspbian OS, but until they include kernel headers it can wait.

If you need enough EC that ‘barge in’ during media playback works quite well without silly cost and hardware, then something from the above plus that repo is about the best hardware/software combination I have found.
Media playback is such a common requirement that you really need EC unless you want a Gump Rhasspy, as without it it’s not likely to hear you say stop.

https://www.scan.co.uk/products/enermax-ap001e-dreambass-usb-soundcard-plus-earphones-genie-with-integrated-80-hz-plus6-db-bass-boos

Cheap I2S mems mics

Supposedly better I2S mems mics (adafruit clones)

With the brilliant adafruit drivers and wiring tutorials here

Also its up to you how many active analogue mics you put on an input but cheap ones are

Fancier ones are

mems analogue ones are

I’ve just ordered 2 MEMS Adafruit clone mics on AliExpress to test a stereo mic input and Raspberry 3.5 jack output to put voice-engine/ec and pulseaudio to the test.

I also like the idea of not being stuck with specific expensive hardware so I can easily fix faulty parts if needed :wink: .

I’ll report back if I can get something up and running once the package arrives.


Yeah, if it’s any help, I found butchering some 12" Pi jumper leads worked well. For stereo you just need 5 wires if you loop GND out to L/R select on one mic and VCC on the other.
I managed to lightly solder a loop and force it into the back of the corresponding female connector.

I actually quite like it because, unlike HATs, it becomes quite easy to place and position your mics in an enclosure, as they are just a tiny board on the end of a wiring loom.

This sort of thing: you end up using more than 5 wires, but the connection to the Pi is just 5.


Great minds think alike :wink:

This also leaves GPIOs free for a LED ring or a display to get fancy. :man_dancing:


Exactly, another reason I don’t like some cards, even if I2C can be handy: far better to have easy access to all the unused GPIO.

The USB soundcard and powered analogue mic modules work really well too. Even the electret microphones are sensitive, and you can adjust gain via GPIO by setting a single pin to GND/VCC/float.
Again, that leaves the GPIO completely free apart from the VCC/GND pins used.

AGC seems to be a bad idea with EC as it seems to confuse the algorithm; I’m not really sure about the math behind it.

It’s not just because it’s cheaper: for me it has many advantages, and I now see it as the preferred hardware choice with the Pi.

The ultra-cheap I2S far-field mic could have been better and was far less sensitive than the active analogue mic going to a USB soundcard.
I meant to write a routine using GPIO to select the gain on the analogue mic, but never bothered as I had the gain slider practically at zero.
If you can pick up one of those Enermax USB soundcards in France, I suggest you do, as it’s a cheap stereo mic input and worth testing with analogue modules too.

I am always confused by MEMS mics in terms of orientation. I don’t think they are all the same, but many seem to want to be at 90° to the orientation you would use with a front-facing electret.

To clarify a little here.

The general options you’re recommending are;

  1. A commodity USB sound card, e.g. Enermax AP001E DreamBass, with I assume some off-the-shelf stereo microphone plugged into the stereo jack?

  2. A card that attaches to the Pi via the GPIO headers (for the ReSpeaker cards specifically, a ReSpeaker 2 mic array HAT or a 6 mic array HAT seem to be the ones with an output jack or speaker out).
    You say driver support is shoddy, how about for similar products? If I no longer care about hardware AEC, any idea how the Matrix Voice stacks up?

  3. Daisy chain a few I2S microphone breakout boards, then split off the audio channel from the HDMI. Could you clarify the advantages / trade-offs with the microphones you mention? Active analog vs… I can see MEMS comes in both forms, so passive microphones? I’m not clear on the distinction between these components here.

I’m not personally fussed about the GPIO headers being consumed by the sound card HAT. If I want expansion, I’ll do something shoddy like ram an Arduino Nano in there, as a lot of these boards still leave I2C free, so I could communicate easily enough.

Any commodity USB sound card, but rather than some off-the-shelf passive mic, use active analogue modules, as the Pi has the power sources.
I mentioned the Enermax because it has a stereo ADC, but mono is just as valid, and it’s also up to you how many mic modules you use on an input.
Most soundcards get a bad reputation not because of the mic input but because the passive mic used is relatively insensitive.

You can get relatively sensitive active mic modules. Some are much better than others, but none cost much more than a few $, and they really lend themselves to those who are going to DIY an enclosure.

The ReSpeaker card drivers (not the USB versions) lock you down to a specific kernel version, and what makes it shoddy is that there seems to be no real reason for it other than that they have chosen to do so.
The simplesoc drivers they use are very similar to the free drivers Adafruit use, and it’s a bit of a mystery to me why they are so problematic with common Linux software like PulseAudio.

There is also a slight bit of snake oil with all the far-field microphones, since the command voice has to be the predominant sound.
They work great in distributed noisy environments where the command voice is predominant, or from a distance in silence.
Across what can be very common noise sources such as TV or hi-fi, where the command voice becomes less predominant, far-field recognition drops drastically.
They do work, but they tend to be high-priced commodity products that don’t work as well in common situations as many presume.

Probably the 2-mic ReSpeaker is the best, as it’s cheap, and if the drivers bug you, well, it was cheap.
The USB versions tend to completely throw out of whack where I would see a completed voice AI landing price-wise.
I never tried the Matrix Voice, as I think for the price it’s a crazy solution.

The cheapest route: the Pi has an I2S interface, and 2x I2S MEMS mics can be sourced for a couple of $. Download the Adafruit drivers and there you go.
Use analogue mics on a soundcard; use I2S if you don’t have a sound card. There is another I2S interface on the Pi, but I think it’s purely a current driver problem that they conflict with things like 4x I2S mics, or 2x mics and a DAC.
I don’t know the full ins and outs of the I2S spec, but you cannot ‘daisy chain’ I2S mics the way you can parallel analogue ones.
On the Pi, left and right are sent alternately: as well as the bit clock there is also a channel (word select) clock, so 2x I2S mics multiplex into the left and right halves of each I2S frame.
With a single mic one is just missing.
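On the software side, that left/right multiplexing shows up as interleaved samples in the capture buffer. A small sketch of splitting them apart, assuming signed little-endian samples (32-bit is common for I2S MEMS mics, but check what your driver actually delivers):

```python
import struct


def split_channels(raw, sample_width=4):
    """Split interleaved L/R PCM bytes into (left, right) sample lists.

    Assumes signed little-endian samples, 2 or 4 bytes wide.
    Any trailing partial frame is ignored.
    """
    code = {2: "h", 4: "i"}[sample_width]
    frames = len(raw) // (2 * sample_width)
    count = frames * 2
    samples = struct.unpack(f"<{count}{code}", raw[: count * sample_width])
    return list(samples[0::2]), list(samples[1::2])


# Two stereo frames: (L=1, R=-1) then (L=2, R=-2).
raw = struct.pack("<4i", 1, -1, 2, -2)
left, right = split_channels(raw)
```

With only one mic wired, one of the two lists simply carries no useful signal.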

Passive mics just send a very low-level signal to the soundcard, which then amplifies it. Often the gain on the soundcard isn’t great, and the signal is very susceptible to noise (actually, the noise is amplified just as greatly as the signal is).
An active mic just has a little gain circuit at the mic itself, so the opposite is true: it is less likely to pick up noise, as the signal is much bigger and needs much less amplification on the soundcard.
I said above that some are better than others, but it’s more a matter of getting a reasonable match between signal level and soundcard amplification.

So here is a recap. Because the high-end offerings are not all that spectacular given the price hike, some see the lower-cost offerings as acceptable and even preferable.
Probably easiest is any USB soundcard with an active mic, as passive mic sensitivity can be lacklustre.
Anything other than mono input on USB audio seems to be extremely rare, apart from the Enermax, which gives you 2 channels and any number of summed mics on a channel if you wish.

If you’re not bothered about hi-fi quality but just want reasonable voice output, then the 3.5mm is acceptable, and all you need is 2x I2S mics for a very cheap and effective system.
If you then decide the 3.5mm audio is a bit damn awful, you can still upgrade to an HDMI-to-HDMI+audio extractor without your I2S mics becoming a waste of time and money.

This is just opinion, but self-contained single-unit consumer array microphones are probably a misnomer. Essentially they were the consumer encapsulation of wide-array conference microphone systems into a single unit, via some heavy DSP lifting that many still struggle to do well with what is available.

Due to the speed of sound, the time sound takes to travel between mics on a small consumer array a couple of inches in diameter is very different from that across a wide array in a conference room: it quickly changes from sub-millisecond to a few milliseconds.
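To put rough numbers on that, here is a quick sketch. The 5 cm and 3 m spacings are assumed example figures, and 343 m/s is the usual value for air at around 20 °C.

```python
SPEED_OF_SOUND_M_S = 343.0  # approximate, dry air at about 20 degrees C


def travel_time_ms(distance_m):
    """Time for sound to cover a distance, in milliseconds."""
    return distance_m / SPEED_OF_SOUND_M_S * 1000.0


# Assumed spacings: a small consumer array vs. mics spread across a room.
print(round(travel_time_ms(0.05), 3), "ms across 5 cm")
print(round(travel_time_ms(3.0), 2), "ms across 3 m")
```

That works out to roughly 0.15 ms for the small array against about 8.7 ms for the wide one, which is the sub-millisecond versus several-milliseconds gap described above.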
Then you get predominant noise interference, where a wide array has a huge advantage: one mic is likely to be nearer the noise and another nearer the command voice, so cancellation and detection are hugely simplified by positioning.

We have a current fad of trying to shoehorn audio capture into single consumer devices, where it can be even more costly than several wide-array capture points. In many ways it doesn’t make any sense, and it’s likely we will see satellite systems become the norm, as they are better by the pure and simple physics of sound, distance and positioning.

Respeaker 2 mic.

It’s a great little piece of hardware: all-in-one button, mics, I2C, audio out and a couple of LEDs.
Brilliant piece of hardware, such a shame about the drivers, as without doubt if they sorted them it would be a killer $10 HAT.

It allows you to stick it on a Pi and go. The only problem after that, for many, is how to get it into an enclosure effectively. Brilliant if you’re just going to stick it on a Pi, but from the point of view of an enclosure you’re left scratching your head as to why they designed it so.

Some of us like the more DIY options: even if it does take a soldering iron and a bit of building, it’s actually far more flexible and, depending on skill level, can give a really polished end product.
Many, when first looking, see cards and high-end modules as far superior, but by the end of a project or two this can often change.
Same with EC: many don’t think it’s needed, then after completion some media-playing activity comes to the fore, and nothing short of walking up to the mic and screaming “Stop” gets the thing to stop.

I have a 2-mic and a 4-mic ReSpeaker on my desk gathering dust. Yeah, they were great, but gathering dust is all they are likely to do now. They are a great all-in-one starter but very inflexible in terms of a finished enclosure, unless that is all you want, as apart from the driver the 2-mic is great.


@rolyan_trauts I think they sorted out the kernel issue for the Respeaker 2 Mics Hat (the fix has been pushed into the Raspbian kernel a few months ago). When I installed the drivers, it skipped the specific kernel headers download.

Hopefully, as pinning to an old kernel version was far from good.
I may dust mine off and give it a try, might not though.
I haven’t got a working voice AI at the moment; I don’t like the KWS options and a few other things with 2.5.
I do have 2x enclosures with snapcast, 50 watt amps and 8cm speakers running over wifi.
I say running, I don’t really use them, but they may stay assembled for a while.

When we get an alternative KWS, they will probably become a voice assistant, with one being a Pi 4 and the server, and my fave, the Pi 3A+, as a satellite mic in a stereo pair and part of my wide-array microphone.

Not done much lately but check the forums for what’s new and reply to the likes of the above.

@andywm PS my stereo is a bit lopsided, as it was more of a test than anything.

One side has a

The other.

I did not pay that much for the Tectonic speaker. I think it was under £20, but the link just shows the model (might have been £25, cannot remember now).
I was just doing some tests on quality difference, and actually the cheap Visaton and cheaper amp seem to make very little difference compared to the more expensive ones.

There are quite a few Class D 50 watt amps out there, all pretty good. Even the couple-of-$ cheap Chinese ones produce pretty much the same audio quality, as it’s the same chip.
The ratings are all FUBAR, as they are often rated at 100 watt, but that is with 10% THD into 2 ohm, and good luck finding a 2 ohm speaker, and sod running with 10% THD.

If you run off 24VDC into 4 ohm, it’s a good match for the approx 30 watt RMS speakers that are available. The TPA3116 amps are probably the best choice, and there are a lot of versions available.
I like the above 2 as they have a standby input: there is a slight hiss from them with no media playing, and you can enable standby from GPIO to stop that.

It was a toss-up between the Tectonic and a Dayton Audio, but I wanted to give a BMR a go. I will probably now keep to the cheaper speakers and the TPA3116: better amps were quite a jump up in cost, and better amp classes produce far more heat, so the heatsink needs far more space.
Everything is a compromise, but the TPA3116 and those reasonably cheap Visaton cones are actually really effective for what they do.

Memory came back as RS do them far cheaper than anyone else and was under £20.

https://uk.rs-online.com/web/p/speaker-drivers/8765297/

Cheers for the information.

I’m just going to get the ball rolling and order some stuff.

I have a power budget of 60W, as I want to run the thing off one of those USB C fast chargers. I ordered a TPA3116D2 amp board. For now everything will be 5V, but I want to get a USB C power distribution board, like a ZY12PDN, to give me a higher voltage rail for the amp to bring the current down a bit.

Going to stick with a single speaker for testing (and I don’t have budget for another 30W speaker), the Dayton Audio RS75-4 I mentioned in my original post. I may revise this if mono sound sounds awful…

Mic wise, I’ve gone for the ReSpeaker 2 Mic board for initial testing, just to give me something to write an initial configuration against. But I’ve also ordered some MEMS mics (still passive, will look into analogue active mics) and an Enermax AP001E DreamBass to evaluate. I also want to try HDMI audio if that proves to be an annoyance, but I’d like to find a digital to analogue converter for the signal that isn’t a giant box. Surely there’s a nicer solution for that.

You will never get the wattage with 5V. 24V gives a max of just over 40 watt; 12V is roughly a quarter of that, as output power scales with the square of the voltage.
I use 24VDC and just a simple buck regulator for the Pi.

Again cheap as chips, and it’s also 5.1V if via a Type-C.

I already had some CCTV 5.5mm barrel terminals and some Type-C adapters, so a little unorthodox but it works.
I just set the buck to 5V with no load, plugged it in and upped the volts a touch to 5.1.

But you’re on the right track. It’s probably a choice of USB or HDMI, and USB is probably better, apart from the DreamBass, which for some might not be so dreamy.
Probably something in the region of a 100nF DC blocking cap (in series) would attenuate the bass if it really annoys.
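A series cap working into the next stage's input resistance forms a first-order high-pass filter, so the effect of that 100nF can be estimated. The 10 kΩ input impedance below is purely an assumption (a typical line-input figure); the real value depends on what the cap feeds.

```python
import math


def highpass_cutoff_hz(capacitance_f, load_ohm):
    """-3 dB corner of a series capacitor into a resistive load: 1/(2*pi*R*C)."""
    return 1.0 / (2.0 * math.pi * load_ohm * capacitance_f)


# 100 nF into an assumed 10 kOhm input.
print(round(highpass_cutoff_hz(100e-9, 10e3), 1), "Hz")
```

Under that assumption the corner lands at about 159 Hz, which would take the edge off a bass boost; a lower input impedance pushes the corner higher, so it is worth checking before soldering.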

Mono sounds great: even with 2x speakers, if there isn’t physical spacing between them it will effectively be mono, with both channels mixing, as that is how our ears work.
I just have 2x units, one a master and the other a slave, running snapcast as a stereo pair, but these 2-speaker jobs in a tiny single enclosure are just a misnomer.

Your single speaker will be fine, and those Dayton Audio ones are supposed to be great.
If you don’t have 24V you will get nowhere near 60W.

As a tester cause they are so cheap try one of these on 5v and then give it 24v.

60W is my budget, it’s the max power the fast charger can deliver. The speaker is only 30W.

Yeah, I do have a handful of LM2596 based bucks actually. Let’s see then;

Speaker Impedance is 4 Ohms
We need to provide 30W for max volume.

Power is P = I*V, and Ohm’s law is V = IR.

So our target voltage is P = V^2/R ==> V = sqrt(PxR)
V = sqrt(30x4) ==> 10.95 V

and current is I = sqrt(P/R) ==> 2.74 A max current draw.
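The same arithmetic as a small sketch, treating the figures as RMS values at the speaker terminals. How they relate to the supply rail depends on the amp, so this is only a rough guide.

```python
import math


def rms_voltage(power_w, impedance_ohm):
    """Voltage needed across the load for a given power: V = sqrt(P*R)."""
    return math.sqrt(power_w * impedance_ohm)


def rms_current(power_w, impedance_ohm):
    """Current through the load for a given power: I = sqrt(P/R)."""
    return math.sqrt(power_w / impedance_ohm)


def power_at(voltage_v, impedance_ohm):
    """Power delivered into the load at a given voltage: P = V^2/R."""
    return voltage_v ** 2 / impedance_ohm


print(round(rms_voltage(30, 4), 2), "V")
print(round(rms_current(30, 4), 2), "A")
print(power_at(5, 4), "W at 5 V")
```

Because power goes with the square of the voltage, halving the voltage cuts the available power to about a quarter rather than a half.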

So I’m currently providing about half the voltage I need (so closer to a fifth of the power), which is fine for testing I think…
Got the amp today, it’s bloody loud enough already.

I don’t currently have any microphones, so I can’t test any further. But I might at least dd an OS onto an SD card for the pi. Speaking of which, when did the Pi switch to these micro connectors for everything. I don’t own a single device I can plug into micro-hdmi, nor do I have micro usb keyboards. SSH will be fine, but they’ve definitely complicated the out-of-box experience with these niche connectors.

Yep, a pain in the A changing the format. The Pi 4 really is a set-top box chip and it had another HDMI so they used it, but changing the connectors has had the same effect on me.
To be honest my fave for the $ is the Pi 3A+, and at least that still keeps the original connectors.
The Pi 4 2GB is great value also.

If you got some I2S mics, I have found the cheap ones seem to be as good as the GY-SPH0645 ones.

If you want Pi 3 performance with audio out and 4x I2S mics with DSP VAD, then the Rock Pi S is a killer price at $13.99.

There was a great post on editing the DTS.

Just remember she has a custom ROM, so if using the official image check the DTB name.

I might have a go with the USB audio gadget. With the EC software, even if not used as a KWS satellite, these will probably make great wifi speaker/mics if I struggle to get the USB audio gadget up and running.
I need to add a softvol and up the volume a bit, and also do the same with the DTB and enable the VAD.

We have had a lot of conversation always centred on the Pi, but for 4 mics with stereo out, the inbuilt codec and I2S is by far the cheapest option.
It’s actually 8 mics, as it has 4x Din for I2S L/R pairs, but the DSP VAD only works with 4 max.

I lost interest for a bit, but I think I am going to spend some time with this and do a write-up, as it’s a quad-core Cortex-A35 with all the audio onboard: just connect an amp and I2S mics.
It does have 8x ADC, but I have given up trying to work out the wiring and setup, hence the I2S.
