Thoughts about security

farfade · July 20, 2020, 10:06pm

It’s late and my writing is messy… I hope this starts something, and that we’ll progress together to make this security study better day after day.

First of all, even if I have some ideas to improve security, I am very satisfied with my first inspection. The architecture is very well designed and the foundations are there to build something really serious from a security point of view!

General points :

architecture :
– make an architecture diagram showing the protocols and security involved in exchanges between modules
— authentication (does the receiver check that the caller is who/what it pretends to be ?)
— authorization (does the receiver check that the caller can ask to do that ?)
— in-transit cyphering (TLS)
– keep it simple and identify deviations from the norm defined in the architecture diagram (for instance : if you decide that the end-user interface must only make calls to the websockets or API endpoints, do not introduce RPC or something else)
code :
– validate input! If I expect an integer as a parameter, the code must check that the variable really contains an integer. Use input validation libraries. There are lots of inputs available through the web client (sentences.ini, profile.json,…), and a lot of attacks rely on injecting malicious code through unvalidated input.
– consider using github dependabot for dependencies vulnerability monitoring : https://docs.github.com/en/github/managing-security-vulnerabilities/about-alerts-for-vulnerable-dependencies
– consider requesting beta access to github security scanning https://github.com/features/security/advanced-security/signup (but it consumes github actions - do you have to pay for it even with this software of public interest ?)
– read and think about OWASP top-ten web app security risks : https://owasp.org/www-project-top-ten/
deployment : inform your end-users : add a security chapter in the documentation
– identify the default behaviour set up by the rhasspy installer, and the options supported (ex : supervisord – unix signal --> rhasspy ; rhasspy – DEFAULT : cleartext / OPTION : TLS --> MQTT server ; user agent (browser) – DEFAULT : HTTP / OPTION : HTTPS --> rhasspy webserver)
– warn about risks when using default setup and provide detailed doc and / or secured profile configuration option during installation
– document mitigation for unsecured points (ex : while there is no authentication on webserver / websocket, end-user must filter the access to the port with a firewall)
– tell users that anyone who finds a flaw in a released version can report it directly to you (instead of announcing it publicly on the forum)
structure your releases with stable / beta / security channels and make rules clear
– never release when you know you’re introducing a serious flaw (keep it for the beta channel)
– in security channel : only security flaws are fixed (no features additions in this channel, just security fixes)
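
To make the “validate input” point above a bit more concrete, here is a minimal Python sketch. The function names, the accepted port range and the site ID pattern are all hypothetical choices for illustration; they are not taken from the Rhasspy code base.

```python
import re

# Hypothetical rule: a site ID is 1 to 64 letters, digits, "_" or "-".
_SITE_ID_RE = re.compile(r"[A-Za-z0-9_-]{1,64}")


def parse_port(value: str) -> int:
    """Return `value` as a TCP port number, or raise ValueError."""
    text = value.strip()
    # isdecimal() alone also accepts non-ASCII digits, so require ASCII too.
    if not (text.isascii() and text.isdecimal()):
        raise ValueError(f"not an integer: {value!r}")
    port = int(text)
    if not 1 <= port <= 65535:
        raise ValueError(f"port out of range: {port}")
    return port


def parse_site_id(value: str) -> str:
    """Return `value` unchanged if it matches the allowed pattern."""
    if _SITE_ID_RE.fullmatch(value) is None:
        raise ValueError(f"invalid site id: {value!r}")
    return value
```

The idea is to reject anything that does not match an explicit whitelist, rather than trying to list everything that is dangerous.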

Specific points I noticed, ordered by severity, with my limited understanding of rhasspy’s internal architecture :

web server (used both for user interface and websockets)
– detail TLS setup in doc - provide optional default setup during installation enabling TLS
– add support for authentication (for end-users, and probably for a system user covering rhasspy internal / inter-module communication, with websockets ?)
mqtt dependency
– detail TLS setup in doc - provide optional default setup during installation enabling TLS
– merge all mqtt code in one unique library
– setup authentication (user / password)
– provide a default config file for the authorization needed on the MQTT side (topics needed, read / write) and doc to set it up during installation
– good point already there : never hardcode unsecured TLS (verification disabled…)
never run rhasspy as root
– well designed : can run as a normal user
– warn in documentation !
– provide a default debian service configuration (for systemd) with an unprivileged user
file permissions
– installation = check config files are not world-writable / check files containing secrets (passwords) are not world-readable
– identify and isolate files that must be modifiable by the user interface in a specific directory. Isolate the files that contain security configuration (for instance passwords) and write-protect those files from the OS user running rhasspy
“skins” : I don’t know what it is. But if it is about plugins, it has to be carefully thought through, because it basically opens the door to untrusted code execution.
supervisord
– explicitly forbid webserver startup in the configuration generator (include in the generator a systematic check that the file does not contain an [unix_http_server] section)
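
The supervisord check in the last item could look roughly like the following Python sketch. It assumes the generated configuration is available as text and is plain INI; the function name is hypothetical. Only `[unix_http_server]` comes from the list above, so other sections to forbid are left as a parameter.

```python
import configparser
from typing import Iterable, List


def find_forbidden_sections(
    config_text: str,
    forbidden: Iterable[str] = ("unix_http_server",),
) -> List[str]:
    """Return the forbidden section names present in a supervisord config."""
    # interpolation=None: supervisord uses %(...)s expansions that
    # configparser would otherwise try (and fail) to interpret itself.
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(config_text)
    forbidden_set = set(forbidden)
    return [name for name in parser.sections() if name in forbidden_set]
```

A generator could call this on its own output and refuse to write the file if the returned list is not empty.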

Hope it helps a little, and sorry if it is messy. Good night!

Cheers !

Daenara · July 20, 2020, 11:15pm

I read through this and while you make quite a few valid points, in my opinion quite a few of them would actually make rhasspy very hard to use for what it is intended for.

First of all, I do not expect anyone to inject any malicious code into my rhasspy assistant, since rhasspy itself is only run in my home network. If that is compromised, then I have other problems than the security of my home network. Sanitizing inputs and so on is still a good thing, but more to prevent unintended inputs from actually breaking the code, not as a security concern.

The biggest issues I see, however, are the file permission issues. Rhasspy is intended to be modified and played around with. The HermesLedControl even goes so far as to read the mqtt settings from the profile.json of rhasspy. While that is not the best behavior security-wise, it is pretty good from a user perspective, because it is basically a rhasspy addon and I only have to specify my mqtt server in one place. Rhasspy is not an out-of-the-box solution that you install and it does everything; it is a project for people to set their system up around, and also a project to play with.

As long as rhasspy is run as it is intended, namely as an offline voice assistant that runs in a somewhat isolated network (no access from the internet), security shouldn’t be the only and main aspect; usability and reasonably easy access to extend the system should also come into play. Untrusted code execution is not something that I can see happening in the normal rhasspy use case, because if it is in my home network then I had better have installed it myself, and if I install something I can’t trust then all the security in the world can’t help keep me safe, because then I am the security issue.

That said, I am not trying to say that security shouldn’t be thought about, but from reading through this I got the impression of a completely shut-off piece of software that makes tinkering around with it way too hard or even impossible because of security protocols.

farfade · July 21, 2020, 6:13am

Thank you Daenara

The art of security is to make it both bullet-proof and invisible to the end-user… you trust your bank without being an expert in why it is not hacked every morning. I did not explain it well in my first post, but I imagine two kinds of out-of-the-box security sub-profiles: one for the standard user who wants something running without questions and relies on his network perimeter protection (in this case some security can still be done in rhasspy, but focused only on external communication, like when rhasspy downloads profiles post-installation or communicates with cloud services (hass.io ?)), and another for the advanced user who is concerned about security and thinks that LAN perimeter protection won’t be sufficient for him.

I really think that the incredible work you all did on rhasspy could help more than isolated individuals on their small home networks. Think about helping and monitoring disabled people in charities with the satellite pattern… I’m sure there are lots of use-cases in larger organizations that are good for humanity. Snips was sold for millions of dollars. Rhasspy is worth millions of use-cases, directly or embedded in other global solutions that will use it without us even knowing about it. That is the power of free software.

Anyway, you’re right, I should have begun by saying that setting the risk appetite is the first thing to do, and that it should be stated clearly. If rhasspy is intended to be protected by something external (the LAN configuration around it), that has to be clearly specified in the documentation.

Finally, I only wrote this because @synesthesiam asked for it. I am not saying that everything I listed must be done. I am just sharing my thoughts and my conviction that security and usability can go together for the best.

Cheers

koan · July 21, 2020, 6:41am

Thanks for giving this some thought. A couple of the issues you raise have crossed my mind already, so there’s definitely some stuff we should try.

However, your list is quite broad and I don’t think we currently have the manpower to even look deeper into all of your points. I think it’s better if you open some issues on GitHub for the specific improvements you suggest, or even better, open pull requests where appropriate and you have a suggested solution.

farfade · July 21, 2020, 3:52pm

I agree with you, koan. I am really sorry, because I have problems with my personal manpower too and I have to make choices.

I was only answering synesthesiam, who was interested in thoughts. The only thing I could reasonably offer is to help review issue prioritization and pull requests, if you want.

To any reader : feel free to take some of the ideas and develop them by opening issues and pull requests !

koan · July 21, 2020, 5:20pm

Yes, thanks for your extensive list. If anyone is interested in working on Rhasspy’s security, this is an excellent list to start from.

I was already thinking about trying dependabot, and seeing it in your list made me try it. I’m evaluating it now in rhasspy-hermes-app to see if we can use it to follow up dependency updates more closely.

synesthesiam · July 21, 2020, 6:43pm

Thanks to everyone for their inputs! I see the tension between different viewpoints almost exactly as @farfade and @Daenara have articulated: Rhasspy is aimed at power users, but certainly has uses for people who may not fully understand the choices available.

For now, I think it’s best to focus on having good defaults for Rhasspy that don’t unnecessarily expose a machine running it. But I don’t want to lock everything down up front, making the typical Rhasspy user have to jump through hoops just to do basic things like hook up a websocket.

An HTTP authentication API like Home Assistant would be nice (long-lived access tokens), but I’d rather make that a “best practice” instead of an absolute requirement. I can’t be the only person annoyed at having to constantly paste giant strings into all my curl commands even on localhost.
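
A rough Python sketch of what an opt-in token check could look like, assuming the token is sent as an `Authorization: Bearer …` header. The function name and the “`None` means authentication is switched off” convention are hypothetical, chosen only to illustrate the “best practice, not a requirement” idea.

```python
import hmac
from typing import Mapping, Optional


def is_authorized(headers: Mapping[str, str], expected_token: Optional[str]) -> bool:
    """Check a bearer token; a None expected_token means auth is disabled."""
    if expected_token is None:
        return True  # hypothetical opt-in behaviour: no token configured
    # Real web frameworks usually provide case-insensitive header lookups;
    # a plain dict is used here to keep the sketch self-contained.
    header = headers.get("Authorization", "")
    scheme, _, presented = header.partition(" ")
    presented = presented.strip()
    if scheme.lower() != "bearer" or not presented:
        return False
    # Constant-time comparison avoids leaking the token through timing.
    return hmac.compare_digest(presented.encode(), expected_token.encode())
```

With this shape, a localhost curl call stays token-free as long as no token is configured, and becomes protected as soon as one is.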

One of the key security assumptions in Rhasspy is that it should only ever write to your profile directory or the system’s temporary directory. The default Docker run command helps enforce this by only mapping ~/.config/rhasspy/profiles as writable.
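
As a sketch of how that assumption could be checked in code (not how Rhasspy actually does it), a helper could resolve every target path and refuse anything outside the two allowed roots. The function name is hypothetical.

```python
import tempfile
from pathlib import Path
from typing import Union

PathLike = Union[str, Path]


def is_write_allowed(target: PathLike, profile_dir: PathLike) -> bool:
    """True if `target` is inside the profile dir or the system temp dir."""
    # resolve() collapses ".." segments and follows existing symlinks,
    # so "profiles/../../etc/passwd" cannot sneak past the check.
    resolved = Path(target).resolve()
    allowed_roots = [
        Path(profile_dir).resolve(),
        Path(tempfile.gettempdir()).resolve(),
    ]
    return any(
        resolved == root or root in resolved.parents
        for root in allowed_roots
    )
```

Note that a check like this is only a safety net inside the process; the Docker volume mapping remains the stronger barrier because it is enforced from outside.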

In a perfect world, the Rhasspy web server and all (supported) services would be restricted this way by default. In practice, there are ways to break this assumption; probably more than I know about. If we make sure that the web server is careful and that (by default) Rhasspy services do everything through the MQTT API, then security should come down to MQTT TLS.

I don’t think the “internal” MQTT broker needs to have TLS enabled by default, but a simple option to enable it would be nice. The default docker run command doesn’t expose the internal MQTT port (12183), so the web API is the only way of communicating with a Rhasspy Docker container by default unless you explicitly connect to an external broker.

More thoughts on security are welcome

farfade · July 21, 2020, 7:24pm

Mostly agree with you synesthesiam.

I just want to add a couple of things to your synthesis :

– the deb package does not come with docker behaviour (and this is a good thing, simple is beautiful), so we shouldn’t assume that docker is a protection. In fact, I would argue the opposite : if the deb package as delivered offers a security option, it will also be secured when wrapped in docker, without any further question.
– always think about TLS AND authentication. TLS lets the client know the server is really the server and protects data in transit. But without authentication, any client having access to the endpoint will be able to send whatever it wants.

synesthesiam · July 21, 2020, 8:16pm

That’s true

I know @koan has talked about MQTT topic restriction as a kind of authentication. Apps might declare up front which intents they plan to handle, and the broker then only gives them access to those topics.

Is it possible with MQTT TLS to only allow clients who present a valid certificate? That would make “access to the endpoint” effectively be an authentication mechanism then, no?

farfade · July 21, 2020, 8:37pm

Yes it is ! But keep in mind that it requires client certificate management, which is somewhat more complicated than password authentication (certificates must be signed by a certification authority and must be renewed before their expiry date). A long random password generated at installation and shared between MQTT and rhasspy would be a more realistic approach, remembering Daenara’s plea for usability.

Let me introduce you to the difference between authentication and authorization.
MQTT topic restriction is a form of authorization. You state, for instance, that the user “rhasspy-satellite-1” can do nothing except write on “hermes/rawWav” and read on “hermes/intent”.
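
To illustrate, the rule above can be written as a tiny lookup table. This is only a Python sketch of the idea with exact topic matches; a real broker enforces this in its own ACL configuration and also supports topic wildcards, which are left out here. The names `ACL` and `is_allowed` are hypothetical.

```python
from typing import Dict, Set

# user -> action -> topics; mirrors the example in the text above.
ACL: Dict[str, Dict[str, Set[str]]] = {
    "rhasspy-satellite-1": {
        "read": {"hermes/intent"},
        "write": {"hermes/rawWav"},
    },
}


def is_allowed(user: str, action: str, topic: str,
               acl: Dict[str, Dict[str, Set[str]]] = ACL) -> bool:
    """Default-deny: anything not explicitly listed is refused."""
    permissions = acl.get(user)
    if permissions is None:
        return False
    return topic in permissions.get(action, set())
```

Note that the function blindly trusts the `user` argument: authorization only makes sense once the caller’s identity has been established.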

But before authorization, you need authentication. This answers the question : is the user “rhasspy-satellite-1” really the one it pretends to be ? And here comes the standard way to do that : generate at installation time a long random password (also known as “a secret”) and share it between MQTT and rhasspy (in their configuration files, reasonably protected from read access).
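
A minimal Python sketch of that installation step, assuming a POSIX system. The function names and the one-secret-per-file layout are hypothetical; the point is only to show a random secret being created in a file that other users cannot read.

```python
import os
import secrets
import stat


def create_secret_file(path: str, nbytes: int = 32) -> str:
    """Generate a random password and store it readable by the owner only."""
    password = secrets.token_urlsafe(nbytes)
    # O_EXCL refuses to overwrite an existing secret; mode 0o600 means
    # read/write for the owner and nothing for group or others.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    with os.fdopen(fd, "w") as handle:
        handle.write(password)
    return password


def is_private(path: str) -> bool:
    """True if neither group nor others have any access to the file."""
    mode = os.stat(path).st_mode
    return (mode & (stat.S_IRWXG | stat.S_IRWXO)) == 0
```

An installer could run `is_private` on every file that holds a secret and warn when it returns `False`.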

And finally comes the question of protecting the password that is exchanged between “rhasspy-satellite-1” and the MQTT server. TLS comes into play here to make sure that the password is not sent in clear text over the network (if it were, anyone who is “on the way” could just read the password and use it), and to give the client the guarantee that it is sending the password to the MQTT server and not to an attacker mimicking the MQTT server.
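
On the client side, the “guarantee that it is the real server” part comes from certificate and hostname verification. Here is a small Python sketch using only the standard `ssl` module; the function name and the optional `ca_file` parameter (for a broker with a private CA) are hypothetical.

```python
import ssl
from typing import Optional


def make_client_tls_context(ca_file: Optional[str] = None) -> ssl.SSLContext:
    """Build a TLS client context that verifies the server's identity."""
    # create_default_context() already requires a valid certificate
    # (CERT_REQUIRED) and checks that it matches the host name.
    context = ssl.create_default_context(cafile=ca_file)
    # Assumption for this sketch: refuse anything older than TLS 1.2.
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context
```

The important part is what the sketch does not do: it never sets `check_hostname = False` or `verify_mode = ssl.CERT_NONE`, which would keep the encryption but drop the guarantee about who is on the other end.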

It is exactly the same story when we talk about security between the user agent (your Firefox browser, where you type “synesthesiam” and your password before getting to rhasspy) and the web server listening on port 12101; or when “rhasspy-satellite-1” wants to use websockets / the web API instead of MQTT.

And after all of that, session management comes into play (the long-lived access tokens you mentioned about HA, which are basically just another form of secret, but renewed more often than a password), in order to avoid sending your password at every call. It is not a problem to send the same password with each request when we talk about service-to-service exchanges (for instance from the rhasspy software to the MQTT software), but we cannot ask an end-user to type his password at every page change in his browser.

koan · July 22, 2020, 6:15am

@farfade if you’re interested, I have proposed a proof of concept of limiting what Rhasspy apps can do with Docker, Mosquitto ACLs and MQTT password authentication a while ago:

maxbachmann · July 22, 2020, 6:28am

Github actions for public projects are free so this should not be an issue

farfade · August 3, 2020, 8:42pm

Another reminder for the TODO list, which I didn’t see at first : do not share passwords internally on command lines (for instance, launching another rhasspy service from the master one with the MQTT password as a parameter).
Any user on the system can run “ps -elf” and see the processes of the other users… and the passwords on their command lines.

rolyan_trauts · August 3, 2020, 9:15pm

It’s still confusing, though: what is meant by “user on the system”? A voiceAi has a natural abstraction and you can say “ps -elf” all day.

It probably didn’t need any internal security and could have been wrapped by standard linux methods such as Stunnel and a vast amount of others, and to all the rest of the internal complexity you can say “ps -off”, but hey.

farfade · August 4, 2020, 4:39pm

Hello @rolyan_trauts

A “by-design” secured system is better than a system “the user can secure if he wants, with his knowledge, and external tools.”

I do not agree when you say “It probably didn’t need any internal security”. I’m a user of Rhasspy, and I need internal security.
For instance, to serve my needs, Rhasspy is hosted on a shared server with some openings onto the internet. Other users of that server do not need to know the rhasspy password for my shared MQTT server.

I think we can’t really say that nobody will ever do that, especially when addressing some of the “more serious” use-cases Rhasspy is able to address than “a little pi-zero next to my TV”.

You are right, a Rhasspy user concerned with security could audit Rhasspy and protect it by himself with external tools.

For instance, for this password-on-the-command-line problem, I mitigated the risk at the OS level by mounting /proc with hidepid=2. But it would be nonsense to expect every Rhasspy user to do that, assuming they want to and can !

Another, more mature approach is to design Rhasspy to read the password from the configuration file instead of from the command line. Transparent for the end-user, and (a little more) secure by design.
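
A minimal Python sketch of that approach. The JSON layout (`mqtt` / `password`), the `--config` flag and both function names are hypothetical and only serve the illustration; they are not Rhasspy’s actual interface.

```python
import json
from typing import List


def load_mqtt_password(config_path: str) -> str:
    """Read the MQTT password from a JSON configuration file."""
    with open(config_path, encoding="utf-8") as handle:
        config = json.load(handle)
    # Hypothetical layout: {"mqtt": {"password": "..."}}
    return config["mqtt"]["password"]


def build_service_command(executable: str, config_path: str) -> List[str]:
    """Build the argv for a child service without putting secrets in it."""
    # Only the path of the config file shows up in "ps" output;
    # the child reads the password itself with load_mqtt_password().
    return [executable, "--config", config_path]
```

The master process would then start the child with something like `subprocess.run(build_service_command(...))`, and the protection of the secret comes down to the permissions on the configuration file.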

By the way, I’m not saying Rhasspy has to do it immediately. I am just writing down a bunch of topics that can be investigated if someone wants to improve Rhasspy’s security. It deserves it.

Cheers, (and thank you again and again for the amazing work !)