Complete Hardware Guide for Rhasspy - Beginners

Ok! I saw there are many hardware noobs like me who have some knowledge of Linux and software but lack hardware understanding!

This guide is for folks like me who are trying to build a Rhasspy voice assistant on a Pi and need help choosing the right audio hardware for Rhasspy, making DIY connections on the Pi, etc.

A big shoutout to @rolyan_trauts, without whose help I might not have dived into this adventurous world of choosing the right audio solutions to build hardware for Rhasspy! He is a sound engineering expert known across this forum for his very pragmatic advice on choosing the right audio hardware within a budget!

Please feel free to add your inputs or correct my mistakes, and I am happy to include them in my posts in this thread! And please let me know if you want me to continue posting about hardware basics. Your encouragement and curiosity to collaborate are all that matter for gaining collective knowledge on audio and hardware basics!

Ok, here comes the very first post!

Topic 1 - All about microphones

Active vs Passive Microphones

An active microphone contains an internal amplifier and requires a power source, such as a battery or phantom power, to operate. The amplifier inside the microphone boosts the audio signal, which results in a stronger and clearer output signal. The amplified signal can also better withstand the signal loss that often occurs over long cable runs.

A passive microphone, on the other hand, does not have an internal amplifier and does not require a power source to operate. Instead, the microphone converts sound waves into an electrical signal that is then transmitted to an external amplifier or recording device for further processing.

When it comes to voice assistants, such as those found in smart speakers, the type of microphone used is usually an active microphone.

Active microphones are preferred for voice assistant applications because they offer greater sensitivity and a better signal-to-noise ratio, which are important for accurately capturing voice commands and reducing background noise.
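To make "signal-to-noise ratio" a bit more concrete, here is a minimal Python sketch of how SNR is usually expressed in decibels from two RMS amplitudes. The function names `rms` and `snr_db` and the example numbers are my own assumptions for illustration, not values measured on any real microphone.

```python
import math

def rms(samples):
    """Root-mean-square level of a sequence of audio samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def snr_db(signal_rms, noise_rms):
    """Signal-to-noise ratio in dB from two RMS amplitudes (same units)."""
    return 20.0 * math.log10(signal_rms / noise_rms)

# Made-up example: speech at 0.1 RMS against a noise floor of 0.001 RMS.
# The speech is 100 times larger in amplitude than the noise.
print(f"{snr_db(0.1, 0.001):.1f} dB")  # prints "40.0 dB"
```

Every factor of 10 in amplitude between the voice and the noise floor adds 20 dB, so a higher number means the speech stands out more clearly from the background.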

In addition, active microphones are often designed with features such as directional sensitivity, beamforming, and noise cancellation, which help to optimize voice recognition and improve the user experience.

Passive microphones are less commonly used in voice assistant applications, primarily because they lack the built-in amplification and noise reduction features of active microphones.

While passive microphones can still capture voice commands, they may be less effective at distinguishing between speech and background noise, which can lead to errors in voice recognition and a less reliable user experience.

Diaphragm

A transducer is a device that converts energy from one form to another. Usually a transducer converts a signal in one form of energy to a signal in another.

A microphone diaphragm is a thin membrane that moves in reaction to external sound pressure variation. A microphone diaphragm is a key transducer component in converting acoustic energy into electrical energy.

There are 3 main types of microphone diaphragms:

  1. Moving-coil diaphragm (dynamic)
  2. Ribbon diaphragm (dynamic)
  3. Front plate diaphragm (condenser)

Refer to the link above for more information on microphone diaphragms!

MEMS Microphones

MEMS (Micro-Electro-Mechanical Systems) microphones use a tiny diaphragm that is etched onto a silicon wafer using microfabrication techniques. The diaphragm is suspended by thin, flexible arms that vibrate in response to sound waves, generating an electrical signal.

MEMS microphones can be either active or passive.

Active MEMS microphones have an integrated amplifier circuit and require a power supply, such as a battery or phantom power, to operate. The amplifier circuit provides gain to the microphone signal, which makes it stronger and easier to process. Active MEMS microphones are often used in applications where high sensitivity and low noise are required.

Passive MEMS microphones, on the other hand, do not have an integrated amplifier and do not require a power supply. Instead, the microphone element produces a small electrical signal that must be amplified by an external circuit or device. Passive MEMS microphones are often used in applications where low power consumption and a small form factor are important, such as in hearing aids and other medical devices.

In general, active MEMS microphones are more commonly used than passive MEMS microphones due to their higher sensitivity, lower noise, and greater ease of integration into electronic systems.

Electret Microphones

Electret microphones use a polarized electret film as a diaphragm that produces an electrical charge when it vibrates in response to sound waves.

An electret microphone can be either active or passive.

A passive electret microphone relies solely on the charge stored in the electret diaphragm to generate an output signal. It does not require any external power or amplifier circuitry, and its output voltage is typically quite low.

An active electret microphone, on the other hand, includes an amplifier circuit that boosts the output signal from the electret diaphragm. Active electret microphones require a power source, such as a battery or phantom power, to operate. The amplifier circuit in an active electret microphone can provide higher output voltage and improved signal-to-noise ratio compared to a passive electret microphone. Active electret microphones are commonly used in consumer electronics applications.
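As a rough illustration of what "low output voltage" and "boosts the output signal" mean in numbers, here is a small Python sketch. Data sheets usually give microphone sensitivity in dBV/Pa, where 1 Pa corresponds to a 94 dB SPL test tone. The -40 dBV/Pa figure and the 1 V target below are assumptions picked for round numbers, not the specs of any particular capsule, and the function names are hypothetical.

```python
import math

def sensitivity_to_volts(sens_dbv_per_pa):
    """Output voltage (V RMS) at 1 Pa (94 dB SPL) for a sensitivity in dBV/Pa."""
    return 10 ** (sens_dbv_per_pa / 20.0)

def gain_needed_db(v_in, v_target):
    """Voltage gain in dB needed to raise v_in up to v_target."""
    return 20.0 * math.log10(v_target / v_in)

# Assumed example capsule: -40 dBV/Pa sensitivity, amplified up to 1 V RMS.
v = sensitivity_to_volts(-40.0)
gain = gain_needed_db(v, 1.0)
print(f"{v * 1000:.1f} mV at 94 dB SPL, needs {gain:.1f} dB of gain to reach 1 V")
```

With these assumed numbers the bare capsule gives only about 10 mV for a fairly loud sound, which is why some amplifier stage, inside or outside the microphone, is needed before the signal is useful to a sound card.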

Electret and MEMS microphones in voice assistant applications

Both electret and MEMS microphones are commonly used in voice assistant applications.

Electret microphones are often used in consumer-grade voice assistant devices because they are inexpensive and can provide good enough voice recognition performance for many applications. They are also relatively easy to integrate into electronic systems and require minimal power. Electret microphones are well-suited for use in devices that are not battery-powered, such as smart speakers and home automation systems.

MEMS microphones, on the other hand, are preferred for more high-end voice assistant applications, such as those found in smartphones and wearables. MEMS microphones offer high signal-to-noise ratio and superior voice recognition performance, which is essential for accurate and reliable voice control. They are also extremely small and low power, making them ideal for use in compact, battery-powered devices.

Type of microphones used in Alexa, Siri and Google Voice Assistants

In Alexa, Siri, and Google Voice Assistants, both electret and MEMS microphones are used depending on the device and its specific design requirements.

For example, in Amazon Echo devices, which use Alexa voice assistant, both electret and MEMS microphones are used depending on the specific model.

For instance, the original Amazon Echo used a 7-microphone array that included both MEMS and electret microphones, while the Echo Dot uses a single electret microphone.

The newer Echo and Echo Plus devices use a 7-microphone array that includes MEMS microphones.

Similarly, in Apple’s Siri-enabled devices, such as iPhones and HomePod, MEMS microphones are used for their high sensitivity and low power consumption. Apple’s AirPods also use MEMS microphones for voice recognition.

In Google Home devices, which use Google Assistant, both electret and MEMS microphones are used depending on the specific model.

For example, the original Google Home device used two MEMS microphones, while the Google Home Mini used a single electret microphone.

How many mics, and of what type, are in the newer Nest Audio? Even from pics I cannot work it out.

MEMS supposedly have tighter tolerances, but to be honest I think the selection is mostly down to size and ease of processing off a reel.

That's just an approximate idea based on some specifications and data sheets of voice assistants I saw long ago! No idea about the newest Nest ones! However, the trend is that they use a combination of both MEMS and electrets in the high-end models and just a single electret in the Mini / Dot!

I think the original Google Home had 3 or 4, I forget whether MEMS or not, and Alexa something crazy like 6. I have forgotten to be honest, but I am aware that Google have actually reduced the mic quantity, whilst Amazon seems to still be using very similar tech to what they used on release.
Google have dropped to x2 I think, as they possibly don't even do beamforming at all and just rely on targeted voice extraction.
I have a hunch the newer Nest units have some clever algorithms and silicon that are part of their voice-filter-lite, and they have started offloading cloud functions to local processing, as servers cost money.
Amazon haven't, and I think it's one of the reasons Alexa generally is leaking money like a sieve.

That’s an interesting revelation that helps explain why money is going down the drain in the Alexa org! I do have a couple of friends heading the Alexa org in the UK and it’s always full of changing dynamics! But yeah, they rely a lot on their AWS cloud servers instead of local processing! Maybe that’s how they keep the processor and RAM requirements low on Alexa devices!

Amazon are not making money from their services, maybe apart from Amazon Prime music / video, and generally many of their ideas such as the ‘button’ and automatic Amazon shopping have been a fail.
I think on the Nest units only KWS is local, and even then a model might be generated server side. But they are slowly pushing more onto the device, and I expect that when the price is right the next-gen ‘Nest’ may contain the ‘Tensor’ TPU, with a similar process where a model may be generated server side but ASR is processed locally on the user’s energy, while the units generally still force or give preference to Google services.
I have forgotten the CPU Amazon have created, as I do with most things, but from hardware to services, when it comes to offloading processing and energy Amazon has been very slow to react and is leaking badly, whilst Google overall is boasting a healthy profit even if much of the voice services, in Google terms, is posting a small loss.

I’m running Whisper & Piper in docker containers on a QNAP NAS, integrated with my Home Assistant using the Wyoming Protocol, and it actually works pretty well when I run the Home Assistant FrontEnd in the Edge Browser under Windows 11 on my Dell portable i7 PC.
But on my Galaxy S23 Android mobile using the Home Assistant companion app, it is almost impossible to get a correct speech-to-text hit. Just to clarify, I run both clients on my local 5 GHz WiFi network. I assume that my mobile phone has a high quality microphone, as I’ve never had any issues with other apps (Google or Alexa). So what can it be that causes one to work well and the other to fail almost 100% of the time?