RTX Voice with Rhasspy?

solid · April 6, 2021, 7:50am

There is a Python script on GitHub that takes a WAV file, runs the RTX Voice filtering on it, and outputs a clean, noise-free .wav file.

That got me thinking. Maybe this method could be used to filter noise out of a microphone stream, using MQTT to send the mic audio, or maybe some other way? I would love it if someone could try to make this work with Rhasspy.

Let's say I have a PC with a GTX or RTX graphics card. Maybe someone could make a script that sends the mic audio from a Raspberry Pi to the RTX/GTX PC, lets the PC filter out the noise, and then sends the clean mic audio to a Rhasspy server? That would be really nice. This way we could play music and still get clean, noise-free voice from the microphone.
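A rough sketch of that relay idea, assuming plain TCP sockets instead of MQTT so it runs with only the standard library. The function names, the length-prefixed framing, and the `denoise` placeholder are all hypothetical; the real filtering step would have to be supplied by whatever runs on the GPU machine.

```python
import socket
import struct


def send_chunk(sock: socket.socket, pcm: bytes) -> None:
    """Send one audio chunk, prefixed with its length as a 4-byte big-endian int."""
    sock.sendall(struct.pack("!I", len(pcm)) + pcm)


def recv_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes, since recv() may return fewer than requested."""
    buf = bytearray()
    while len(buf) < n:
        part = sock.recv(n - len(buf))
        if not part:
            raise ConnectionError("socket closed mid-frame")
        buf.extend(part)
    return bytes(buf)


def recv_chunk(sock: socket.socket) -> bytes:
    """Receive one length-prefixed audio chunk."""
    (length,) = struct.unpack("!I", recv_exact(sock, 4))
    return recv_exact(sock, length)


def denoise(pcm: bytes) -> bytes:
    """Placeholder (assumption): the GPU noise filter would go here."""
    return pcm


def relay_once(from_pi: socket.socket, to_rhasspy: socket.socket) -> None:
    """On the GPU PC: take one chunk from the Pi, filter it, pass it on."""
    send_chunk(to_rhasspy, denoise(recv_chunk(from_pi)))
```

The Pi side would call `send_chunk` in a loop with raw mic audio, and the PC would call `relay_once` in a loop. How the cleaned audio is actually fed into Rhasspy is left open here.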

rolyan_trauts · April 6, 2021, 11:05am

Yeah, RTX Voice is amazing, but it only works really well on NVIDIA's RTX cards, which have the inference cores that are also part of their DLSS rendering.

Facebook also has https://github.com/facebookresearch/denoiser, which also needs considerable acceleration. Generally you're looking at a GTX 1080 or above, and with the current market the price is just bat shit crazy for many people.
The 1050 Ti/1060 don't run it well. Supposedly the 1080 does, but all the reviews I have seen use an RTX card.

A centralised, shared GPU-accelerated system is likely the way to go, as most research now is looking at RTX-style single-mic processing for mid- to high-end phones.

Chow-ai · June 10, 2022, 6:49pm

Did you get anywhere with this, @solid?
I am looking to get RTX Voice running on my Rhasspy base as a preprocessor to STT, but I'm not sure where to intercept the audio before it enters the STT, other than perhaps using external HTTP as the STT and writing a script that preprocesses the audio and then passes it into DeepSpeech.
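A minimal sketch of that external-HTTP idea, assuming Rhasspy's remote HTTP speech-to-text option POSTs WAV bytes to a URL and reads the transcription from the response body. `denoise_wav` and `transcribe` are hypothetical placeholders, not real RTX Voice or DeepSpeech calls; they only mark where the noise filter and the STT engine would plug in.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


def denoise_wav(wav_bytes: bytes) -> bytes:
    """Placeholder (assumption): run the noise filter on the WAV data here."""
    return wav_bytes


def transcribe(wav_bytes: bytes) -> str:
    """Placeholder (assumption): hand the cleaned WAV to DeepSpeech here."""
    return ""


class SttShim(BaseHTTPRequestHandler):
    """Accepts a WAV upload, preprocesses it, and returns plain-text STT output."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        wav_bytes = self.rfile.read(length)
        body = transcribe(denoise_wav(wav_bytes)).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, fmt, *args):
        pass  # keep the console quiet


def make_server(host: str = "127.0.0.1", port: int = 0) -> HTTPServer:
    """Build the shim server; port 0 lets the OS pick a free port."""
    return HTTPServer((host, port), SttShim)
```

Calling `make_server(port=12345).serve_forever()` (the port number is arbitrary) and pointing Rhasspy's remote STT URL at that address would put the script between the recorded audio and the STT engine.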

@rolyan_trauts Do you have a better method?

rolyan_trauts · June 12, 2022, 1:10am

You will have to work out how to integrate it yourself, as I have never been a fan of the Rhasspy audio pipeline.

RTX Voice is great, but I think the Broadcast SDK that contains it is Windows-only, as is RTX Voice itself.
RTX Voice acts as an intermediary. On Linux you would use ALSA snd-aloop, a double-ended virtual soundcard: whatever you play into the sink on one side becomes a source on the other side.
I think there are similar things on Windows, called virtual cables and suchlike.
RTX Voice is set up to use a mic, so you create a virtual mic like ALSA snd-aloop and play your input into it; the output from that is what you send to DeepSpeech or whatever.
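For the Linux side, the loopback routing can be sketched like this, assuming the snd-aloop module is loaded (`sudo modprobe snd-aloop`) and shows up as a card named `Loopback`, where audio played to device 0 appears for capture on device 1 of the same subdevice. The helper name, file paths, and the 16 kHz mono format are assumptions for illustration.

```python
import shlex


def loopback_commands(wav_in, wav_out, subdevice=0, card="Loopback"):
    """Build aplay/arecord command lines for the two ends of a snd-aloop card."""
    # Playback end: whatever is played here becomes the "virtual mic" signal.
    play = ["aplay", "-D", f"hw:{card},0,{subdevice}", wav_in]
    # Capture end: the capture parameters should match what the player
    # opened the device with (assumed here: 16-bit, 16 kHz, mono).
    record = [
        "arecord", "-D", f"hw:{card},1,{subdevice}",
        "-f", "S16_LE", "-r", "16000", "-c", "1", wav_out,
    ]
    return play, record


if __name__ == "__main__":
    for cmd in loopback_commands("input.wav", "captured.wav"):
        print(shlex.join(cmd))
```

The commands are only built and printed, not executed, so nothing touches the sound system until you run them yourself (for example with `subprocess.Popen`).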

SaneBow did an amazing job with https://github.com/SaneBow/PiDTLN, and ns.py sometimes gets overlooked even though it does an absolutely amazing job at very low load. It is likely the best Pi-based NS and is much lighter than the AEC version.

So, not being very Windows conversant, here is a breakdown of ALSA snd-aloop: "Playing with ALSA loopback devices | Playing with Systems". You need a Windows alternative, which I expect is available; it's just that I am not the one to ask about Windows.