Audio output to Gstreamer?

Hi, I have recently started setting up Rhasspy. Light control is working, but like many others, I also want to play audio. For that, I use gmediarenderer, a GStreamer plugin that adds a DLNA renderer. As I do not need multiroom, synced audio, this seems by far the simplest, most compatible, and most stable solution.

Now the issue is that the aplay commands issued by Rhasspy seem to fail while GStreamer is outputting sound (easily testable by running gst-launch-1.0 audiotestsrc ! audioresample ! pulsesink and then trying TTS in Rhasspy). But as GStreamer has no issue playing multiple sources, I thought I’d simply direct audio from Rhasspy to GStreamer. Alas, that is where it ended for me.

I tried several parameters for Local Command and program /usr/bin/gst-launch-1.0, like filesrc location=/dev/stdin ! decodebin ! pulsesink but all of them gave a non-zero exit code error and none played any sound.

I’ll take both help about how to get Rhasspy to output sound to a local gstreamer, as well as an explanation why I shouldn’t do what I’m doing and do something else instead :wink: Almost all my Linux experience is on a server, audio output was never something I had to handle.

edit: I got it to not throw an error with this:
fdsrc fd=0 ! wavparse ! audioconvert ! autoaudiosink

Sadly, it also does not play the audio. I checked if it should work by doing cat StarWars3.wav | gst-launch-1.0 fdsrc fd=0 ! wavparse ! audioconvert ! autoaudiosink which correctly plays.

edit2: I went low level and tried this with a downloaded recording. It also plays properly :frowning:

 import subprocess

 inf = open("play-recording.wav", "rb")
 data = inf.read()
 inf.close()
 subprocess.run(["gst-launch-1.0", "fdsrc", "fd=0", "!", "wavparse", "!", "audioconvert", "!", "autoaudiosink"], input=data)

Maybe there are some good tips in this topic

You should be able to use an alsa mixer to output both sound sources, but you might have the most luck if you do it with the same user in the same user session.

Thanks, I’ll have to try the executable shell script as someone said the command didn’t work for them either.
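
For anyone wanting to try the wrapper-script route, here is a minimal sketch in Python rather than shell. It assumes Rhasspy's Local Command hands a complete WAV file to the program on stdin, which this thread suggests but does not confirm. `build_gst_command` and `play_wav_bytes` are made-up names, and the pipeline is the one from the first post with audioresample added. The main point is to surface GStreamer's stderr and exit code so that a silent failure becomes visible.

```python
import subprocess
import sys


def build_gst_command(sink="pulsesink"):
    """Return the argv list for a pipeline that plays WAV data from stdin."""
    return [
        "gst-launch-1.0",
        "fdsrc", "fd=0", "!",
        "wavparse", "!",
        "audioconvert", "!",
        "audioresample", "!",
        sink,
    ]


def play_wav_bytes(data, sink="pulsesink", run=subprocess.run):
    """Feed WAV bytes to gst-launch-1.0 and report failures on stderr."""
    if not data.startswith(b"RIFF"):
        # Assumption: the caller sends a complete WAV file. Anything else
        # would make wavparse fail, so say so explicitly.
        print("input is not a RIFF/WAV stream", file=sys.stderr)
        return 2
    result = run(build_gst_command(sink), input=data,
                 stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    if result.returncode != 0:
        # Pass GStreamer's own error text on so it ends up in the log.
        print(result.stderr.decode(errors="replace"), file=sys.stderr)
    return result.returncode
```

Saved as an executable script, it would end with `sys.exit(play_wav_bytes(sys.stdin.buffer.read()))`, and that script would be what Rhasspy points at instead of /usr/bin/gst-launch-1.0.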

@WallyDW I sadly do not understand what you mean :confused:

ALSA is the low-level sound driver in Linux, and it usually has a mixer device that can handle multiple inputs.

ALSA has issues with sharing the device between different users, and also with the same user if that user is logged into different sessions.
Gstreamer might actually have the same issues.

I’m trying to understand the problem. It looks like you fixed the issue but I’m confused by “it also plays properly” with a frowny face.

I was able to get both your gstreamer test to pulseaudio (“gst-launch-1.0 audiotestsrc ! audioresample ! pulsesink”) and Rhasspy TTS using your local command (“gst-launch-1.0 fdsrc fd=0 ! wavparse ! audioconvert ! autoaudiosink” or “gst-launch-1.0 fdsrc fd=0 ! wavparse ! audioconvert ! pulsesink”) playing at the same time (although the tts was a bit crackly), so I imagine there is a permission issue somewhere in your stack. Pulseaudio runs in user space, so permissions can get tricky, but I’m sure GStreamer can handle it.

Please let me know if you got it solved, and if not I’ll try to figure out how to troubleshoot.

As far as I know, ALSA isn’t user based; Pulseaudio is. ALSA is system wide, and even a user’s .asoundrc is just a per-user override of /etc/asound.conf, though I guess if you create a pcm in .asoundrc then that could be unique to that user.

If it was a user permission and you didn’t have access you probably wouldn’t hear it or something is terribly flawed with linux permissions.
Also default alsa doesn’t share devices and again if you try to use one in use it will fail as busy.
You have to set up with dshare/dsnoop for multiple access to sources/sinks.

Usually if it’s crackly then it’s a conversion problem, which is often caused by specifying hw: ALSA devices rather than the equivalent plughw:, which does auto conversion for you. It could also be something like Pulseaudio and whatever you’re using clashing over a single-access device that is not set up with dshare/dsnoop (dmix). I should say dshare is a parallel sink, a bit like a pulse monitor, if I remember rightly.

A common problem is the confusion caused by Docker: the device is shared by the docker run, but inside the container no asound.conf exists. You can be doing tests on the host which work because asound.conf does exist on the host, but unless you create one in the container or share it as well, there will not be one there, which has completely confused me a couple of times.

Correct, ALSA is not user based, and that is what makes it difficult.
The problem is that the devices have permissions too, but with devices a 644 is not user, group, everyone, but instead user in current session, user in other session, everyone else.
This means that even though you use a mix or share device, you will not be able to access it from more than one session with the standard permissions.
I have fought it for a long time trying to get Rhasspy and a Snapclient to run on the same machine, each with its own user.
I made it work somewhat, because I could actually make them play at the same time, but it was a hell of a fight to get the permissions set on the device. The permissions on the device are reset quite often, so you need to know when it happens and correct them again.
Multiple users and sessions are possible, but I would not recommend taking that road due to the complexity.

I had to scrap my solution anyway, because it turns out a mix/share/snoop device has no timing feedback, so Snapcast cannot sync with such a device.

I know now that the best way to share a sound device between users or sessions is to use sockets or pipes, but it is still a mess.
Pulseaudio can do it this way and Pipewire seems to have slightly better options, but it is nowhere near as easy as it should be.

I think the problems get worse because we have such a mixture: the firefighting solutions we have make things worse, when we could be using Pulseaudio the whole way through, and apps that are ALSA-only can use the pulse-alsa interface.
I have a really bad memory, but of everything in Linux to set up, it’s audio that causes me the most problems. Still, you can actually share ALSA devices with no problem if the IPC is set up correctly.
ALSA is the most confusing and unwieldy system ever created, made worse by the most horrendous documentation ever.
Pulseaudio is amazing for switching sources/sinks on the fly and could do exactly what you are trying to do with GStreamer. Generally that is what I think the problem is: don’t hack in complexity by bringing in multiple audio platforms; try to stay with the same one the whole way through, even if it means you have to research or develop the solution.

I know that if I set up my own apps on a clean host I can use ALSA and Pulseaudio without problems, share devices, and use Snapcast, as do many applications and desktop examples available.
Snapcast is a brilliant platform that solves the problem of network audio sync and works amazingly well, but the spaghetti of the audio chain in Rhasspy is likely causing confusion, and somewhere there is a problem where format or rate conversion is clashing, problematic, or wrongly set.

Pipewire is another attempt to set this right, but it is still not really complete. I haven’t played with it much beyond noticing that the latest distro releases ship it in conjunction with Pulseaudio.

Snapcast creates its own ring buffer and uses its own time signal embedded into the protocol. It takes a single source and adds a delay with the ring buffer; comparing network time to the embedded time data causes each client to inch forwards and backwards to a position in the ring buffer, so that the clients are roughly synced within whatever tolerance Snapcast has (100 msec?), if my bad memory is functioning at all.
Snapcast does not sync with a device; it purely uses it as a source for its own synchronisation and delivery method, as does Airplay.
A common one can be a 44100 & 48000 mismatch, as that can sound just crackly rather than a total ear sewer. Before you rule anything out, check that format, channels and sampling rate are all consistent, and wherever possible use the auto-converting plughw: rather than hw: direct.
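
As a quick way to run that consistency check on a WAV file, here is a small sketch using Python's standard `wave` module. The expected values (48000 Hz, 2 channels, 16 bit) are only placeholder assumptions for whatever your sink is actually configured for, and `describe_wav` and `format_mismatches` are hypothetical helper names.

```python
import io
import wave


def describe_wav(data):
    """Return (sample_rate, channels, bits_per_sample) for WAV bytes."""
    with wave.open(io.BytesIO(data), "rb") as wav:
        return wav.getframerate(), wav.getnchannels(), wav.getsampwidth() * 8


def format_mismatches(data, rate=48000, channels=2, bits=16):
    """Map each differing field to (actual, expected); empty dict if all match."""
    actual = dict(zip(("rate", "channels", "bits"), describe_wav(data)))
    expected = {"rate": rate, "channels": channels, "bits": bits}
    return {
        key: (actual[key], expected[key])
        for key in expected
        if actual[key] != expected[key]
    }
```

Running `format_mismatches(open("play-recording.wav", "rb").read())` on a TTS recording would show at a glance whether a resampling or channel conversion step is needed somewhere in the chain.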

Typically the deviation is below 0.2ms

Had to have a look, as my memory is bad and I thought 100 seemed high.

Pipewire is the future.
ALSA as a low-level driver is fine with one source, but once you start trying to mix any sources it becomes really complex really fast.
Pulseaudio removes some of the complexity compared to ALSA, but at the cost of high and fluctuating latencies.
Pipewire tries to take the best from Pulseaudio and ALSA: lower complexity, low stable latency, and added input, output and routing options.

The mess with Linux audio is that there are too many systems and none really the default.
Some only use ALSA, some use Pulseaudio, some even use OSS and then some have started to use Pipewire.
ALSA, Pulseaudio and OSS do not provide timing feedback for mix/share/snoop devices, and more and more programs need it to sync their channels.
Snapcast does not only take network timings into account, but also the latency in the audio device.
Here is my issue raised on Snapcast:

Rhasspy is also part of the problem, because it adheres to the old ALSA/Pulseaudio option, which makes it impossible to play together with other sources that require extra features.
Rhasspy and Snapcast would be a logical combination to mimic a Sonos, Google or Alexa smart speaker.

I don’t know why you are using dmix on 8 microphones as just summing 8 mics like that is totally pointless and has zero advantage over a single mic.
Just something I keep trying to make known, but it’s not your fault. I don’t know, as it’s not dmix as such; it’s dmix with your setup and hardware that is not reporting latency, maybe something to do with USB devices and that driver.
Scrap that setup and use a single channel if you don’t want to use the XMOS beamformer. 8 mics alone have zero benefit, as you also garner the sum of 8x SNR, and the spacing between mics creates small endfire/broadside arrays that act as various high-pass filters, even though relatively small.
The only saving grace is that the SNR on MEMS mics can be so good that you can get a tiny increase in sensitivity, but for the complexity it is minimal.
Invensense did a great application note that I post as a mic bible
Why do you have a ReSpeaker and then bypass the XMOS beamforming anyway? Also, why are you sending your mic to a Snapcast server? But hey, probably do what badaix said and see Discussions · badaix/snapcast · GitHub. I’m not a fan of ReSpeaker, but just because that USB device with that dmix setup doesn’t report latency does not mean none do. Even though it’s not a solution, I would probably start by scrapping that duplicated 8-mic setup, as audio-wise it’s a bit pointless and it would reduce some complexity.

The microphones were not the issue and I had not really set them up at that point. They worked somewhat okay with standard settings.

The issue was with the output and missing timing feedback on a mix/share/snoop device.
I also tried with the onboard sound chip, a HIFIBerry hat and a couple of USB cards.
They all had the same issue with missing timings and when I went to the ALSA developers they confirmed that these devices do not report timings.

Dunno, I haven’t got a USB ReSpeaker to test, but latency is usually set by the input buffers, and maybe you could set those yourself in the asound.conf.

period_size 1024
buffer_size 8192

That is sort of normal, and if you google it you will probably find where to insert it; I think it should be in the source slave, just under channels and so on.

I think it’s the overall buffer size that sets latency, and the period size is the individual chunks it fills to give you that buffer.
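
To put numbers on that, here is a rough worked example. It treats period_size and buffer_size as frame counts and assumes a 48000 Hz sample rate, which is not stated anywhere above; `frames_to_ms` and `describe_buffer` are hypothetical helper names.

```python
def frames_to_ms(frames, sample_rate):
    """Duration of `frames` audio frames in milliseconds."""
    return 1000.0 * frames / sample_rate


def describe_buffer(period_size=1024, buffer_size=8192, sample_rate=48000):
    """Summarise the timing implied by a period/buffer pair."""
    return {
        "period_ms": frames_to_ms(period_size, sample_rate),
        "buffer_ms": frames_to_ms(buffer_size, sample_rate),
        "periods_per_buffer": buffer_size // period_size,
    }


# Assumed 48 kHz device with the values quoted above.
info = describe_buffer()
print(round(info["period_ms"], 1), round(info["buffer_ms"], 1),
      info["periods_per_buffer"])
```

Under those assumptions the buffer holds 8 periods of about 21 ms each, so a full buffer adds roughly 171 ms; at 44100 Hz the same settings would come out a little longer.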

I have the 6-mic hat (non-USB) on its way, as I have been developing a software beamformer that I am going to test, but I haven’t got a Pi out or anything to test with; I will check when it turns up.
Also, you could do things backwards if you have the alsa-plugins installed, and share a Pulse device through ALSA like something here.
Again, you will probably have to google for better examples, but Pulseaudio will probably set up the period_size & buffer_size for you if you struggle.

Also checkout

My memory is vague, but I am pretty sure this has occurred and been fixed before.
But I am still confused, as you seem to have a dmix (playback) for mics?
Alsa Opensrc Org - Independent ALSA and linux audio support site: dsnoop is for mics (sources).

That is added latency.
The timing feedback is the actual latency added by the handling in the sound cards software and hardware. This is not something you can change as such. You can switch the driver of course, but then you just get another fixed value that can not be changed with that driver.
Not even my Native Instrument Traktor DJ 2 card will provide the timings in the mix/share/snoop setting.

Doing it backwards does not solve it. Pulseaudio does not provide the timing feedback either, since you do not get around the Alsa at the lowest level.
Normal Pulseaudio is Pulseaudio → Alsa (Low level drivers) → Soundcard.
The suggested guide is just Alsa (High level software) → Pulseaudio → Alsa (Low level drivers) → Soundcard.

I actually read that guide and used it a lot.
I could mix Rhasspy and Aplay commands all day long, even from other users and sessions, but it fails when it comes to Snapcast.

I have currently moved my Snapcast to Raspi Z with a Hifiberry soundcard.
It works, but I had hoped to use some echo cancellation by playing the music on the Respeaker and then capturing the commands for Rhasspy there, which seems to require that both happen on the same soundcard.

If you have a loopback channel you don’t need the same soundcard but yeah AEC (Think its non linear AEC) is very sensitive to clock drift so you want a sound card that shares the same clock for audio in/out.
If you have a loopback (a spare adc channel) then you connect the output of a soundcard to that loopback and gain sync to your mic channels.

I am pretty sure it’s the drivers. Those ReSpeaker cards have always been a pet hate of mine, as they used to fail on nearly every kernel increment.
They also do some really weird things: why it creates an 8-channel asound output for a device with a stereo jack is extremely curious.
Even in the asound.conf it has this line that is a complete misnomer

# use samplerate to resample as speexdsp resample is broken
defaults.pcm.rate_converter "samplerate"

Which is complete BS, as in practically every distro (there might be one I don’t know of) the speexdsp resampler is the default. The audio quality difference to samplerate is based on audiophile analysis where no one can hear a difference, but its load is considerably less, and everybody else but ReSpeaker uses it and doesn’t consider it broken.
Some of the stuff respeaker do and say is just a bit out of the normal as is the setup of the asound.conf that doesn’t require all that anyway.
Every time I use a respeaker from 2 mic or above I near always junk their asound.conf and do my own without dmix & dsnoop.

But to be honest, I don’t think it’s that dmix doesn’t work with Snapcast; rather, with dmix you have to set period_size & buffer_size, as it probably obscures the card’s initial settings.
Linux is not a real-time OS, so you can never run at the minimum latency of the card: you can never guarantee cycle times, and all audio needs some sort of buffer or you will get underruns and failures in the audio stream.

Prob easiest way is just to scrap the respeaker asound.conf and rename it asound.respeaker and do your own.
But also I do think if you do add the period and buffer setting it might fix the snapcast prob you have.

But if you don’t want to use that card its no bother to me as generally I don’t like them even if I have just purchased a 6 mic hat for some experimentation.

True, but it becomes a pretty complex setup then.
In my research for such a solution I found that if I tried to use multiple sound cards, the clock drift would rapidly increase the CPU power required to do the AEC as the time spectrum increased.
One way to limit the time spectrum was to get the timing feedback, but then the problem from before comes up.
If I do not have the timing feedback, then I need to guesstimate the most drift that can occur and always use that as the time spectrum to scan, and something told me that a Raspi4 might not even be enough then.
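
To get a feel for how fast two independent sound card clocks move apart, here is some rough arithmetic. The 50 ppm mismatch and the 16 kHz rate are illustrative assumptions, not measurements from any card mentioned here, and `drift_samples` and `drift_ms` are hypothetical helpers.

```python
def drift_samples(ppm, sample_rate, seconds):
    """Samples of offset accumulated between two clocks that differ by `ppm`."""
    return ppm * sample_rate * seconds / 1_000_000


def drift_ms(ppm, seconds):
    """The same offset in milliseconds (independent of the sample rate)."""
    return ppm * seconds / 1000


# Assumed example: 50 ppm mismatch, 16 kHz capture.
per_minute = drift_samples(50, 16000, 60)
per_hour_ms = drift_ms(50, 3600)
print(per_minute, per_hour_ms)
```

Under those assumptions the offset grows by 48 samples a minute, or 180 ms an hour, so without a shared clock or some other reference the window to scan keeps getting wider.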

With a loopback there is no extra load as theoretically the physical audio on the spare ADC is the same as what is hitting the mics, so one channel is your reference and in perfect sync.
That’s why you go to all the hassle of a loopback and a spare ADC channel connected to your audio output: it’s a low-load way of syncing audio in/out by adding extra hardware.

Hmm, might look into it if I decide to change the current setup. :slight_smile: