Can you please take some time here to explain why their post was incorrect, with a brief technical explanation of why SELinux enforcement failed to stop the attack that exploits the particular vulnerability?
I realize you're busy, but it would be much more helpful than a curt statement that simply claims they are wrong.
This shouldn't be downvoted. Docker has been very openly hostile towards Red Hat in the past. To the point of openly mocking their developers at DockerCon.
Hi, I'm the founder of Docker. It's not my place to say whether comments should be downvoted or not, and I don't want to ignite teenage-drama arguments over who was mean to whom at recess - we get enough of that level of discourse on the US political scene these days.
But I think there is an interesting topic to address here, that we deal with a lot at Docker.
The problem in a nutshell: if you choose to take the moral high ground and only promote your products with positive marketing (and that is our strategy at Docker - you will never see any Docker marketing content criticizing competitors), you are vulnerable to bullying and unfair criticism by competitors who don't mind hitting under the belt. Then the question is: do you allow yourself to respond and set the record straight? Or would that just legitimize the criticism by bringing more attention to it? On the other hand, not responding is also risky because it emboldens the trolls to take more and more liberties with facts and ethics. This dilemma becomes more and more pressing as you become more successful and more incumbents start considering you a possible threat to their business. Some of these incumbents have been defending their turf for decades by perfecting negative messaging. Like one competing executive once told me - "we eat startups like yours for breakfast". This situation can be bad for morale also, because your team sees their work and reputation dragged in the mud, and can interpret their employer's silence as a failure to stand up and defend them.
The most perverse variation of this problem is when trolls start preemptively painting you as bullies. If that narrative sticks, then you're in trouble, because any attempt to set the record straight will be interpreted as hostility. Now you have two problems: defending yourself against the bullies AND defending yourself against unfair accusations of being a bully.
The root cause of the problem, I think, is the diminishing importance of facts and critical reasoning in the tech community. We are all guilty of this: when was the last time you repeated a factoid about "X doesn't scale", "Y isn't secure", "I heard Z is really evil" without fact-checking it yourself? Be honest. Because of this collective failure to do our own thinking and researching, bullies have a huge first-mover advantage.
I see a direct parallel between the problem of corporate bullying in tech and the problem of partisan bullying in politics. And I think in both cases, there is a big unresolved problem: how do you succeed and do the right thing? How do we collectively change the rules of the game to make bullying and negative communication a less attractive strategy?
I tried really hard to make this a constructive post about a topic I care about. If you interpret any of this as hostile or defensive, that is not at all the intention.
When the top comment is a claim that we at Red Hat post incorrect information and that we at Red Hat are expected to delete said supposedly incorrect information without any technical explanation on why said information is incorrect, I do wonder who is the bully in your opinion.
The post is in fact incorrect. The reason Nathan is not sharing more technical details is to protect the security of Red Hat users.
Also, if hypothetically the full details made Red Hat look bad, is it fair to assume you would be calling Nathan hostile for sharing them? In that scenario is there any course of action we could follow that would satisfy you?
The article is about how SELinux helps in mitigating or even blocking paths that would lead to a working exploit.
The article explicitly states the CVE number and the fact that updated packages are available.
The article IMHO doesn't attack nor provoke Docker and its people. Yet the first comment posted here DOES contain direct accusations against Red Hat. I don't think that's helpful nor needed. That's all.
I still think that SELinux and Docker are a good combination and this article helps in understanding why.
The title of the article is "Docker 0-Day Stopped Cold by SELinux." The title strongly implies that SELinux would have prevented the issue in the CVE even without the fixes Docker provides.
Then the text of the article states:
"This CVE reports that if you exec'd into a running container, the processes inside of the container could attack the process that just entered the container. If this process had open file descriptors, the processes inside of the container could ptrace the new process and gain access to those file descriptors and read/write them, even potentially get access to the host network, or execute commands on the host. ... It could do that, if you aren’t using SELinux in enforcing mode."
So, not only does the title make this suggestion, but the text of the article downright says it.
If the claim is wrong, then Docker's security team is right to correct it. However, I think they should do so in a forum other than in the comments of a HN post, be thorough in their explanation, and maintain a professional, polished tone in any communications.
And, of course, Red Hat should correct and/or clarify the post as well.
Work with RedHat and let them issue a correction; HN seems like hardly the forum to be calling this out in such a manner. The fact that you even made the "bully" post leads me to believe that Docker views RedHat as a threat and their post promoting SELinux as an affront to Docker. I'd wager it didn't appear that way to the many individuals that understand both the benefits of systems like SELinux and AppArmor, as well as the difficulty in promoting them. Very interesting at the least; one could read a lot into that.
Honestly, Docker stepped in it here. This appears to be a theme, and with your posts in particular. I personally can't think of another project that causes more drama on HN.
SELinux is great, and it somewhat mitigates this CVE, which is what defence in depth is indeed supposed to do. There is a big difference between "Docker 0-day stopped cold by SELinux" and "Runc CVE mitigated by SELinux" which is what a factual headline would be, as it is not a 0-day, and affects the majority of container runtimes (it was derived from an lxc CVE originally). You should always use defence in depth to make exploits harder, but you should always have the humility to understand that security is hard, and is an ongoing task to make every part of the system more secure, while keeping it usable.
This drama is a bit... odd in tone to me. It feels as if there's a huge confrontation looming, where right now all Docker came out and said was "SE Linux only barely helped us, our software was actually so terrible that even with the best effort of SE Linux, it still was vulnerable".
This makes Docker look worse.
The only way SE Linux can look bad is if you were on the fence about its efficacy, and are only now hearing it can't even stop Docker's problems.
"A number of developers from RedHat were once very involved in the project. However, these developers had a very arrogant attitude towards Docker: They wanted docker changed so that it would follow their design ideas for RHEL and systemd. They often made pull requests with poorly written and undocumented code and then they became very agressive when those pull requests were not accepted, saying "we need this change we will ship a patched version of Docker and cause you problems on RHEL if you don't make this change in master." They were arogant and agressive, when at the same time, they had the choice of working with the Docker developers and writting quality code that could actually be merged. Another thing they often said was along the lines of "systemd is THE ONE AND ONLY INIT SYSTEM so why do you care about writing code that might be portable?" Or "even though the universal method works with systemd now. A redesign has already been planned in systemd, so do it the new systemd way and don't be portable."(This is in response to the fact that Docker writes to cgroups, and systemd would like to be the "sole writter to cgroups" some time in the future.) I think everyone got fed up with those people, and Docker has rightly pushed them out."
I think this is a pretty poor understanding of the vulnerability. Yes, runc was split out from Docker but it now is maintained by many companies, including Red Hat - so to suggest that Docker software was "so terrible" doesn't sit right.
Not to mention that this is a fairly gnarly CVE - a great catch by SUSE and Docker - and claiming the software is terrible because it contained this seems like a real stretch.
I'm not saying Docker is bad, only that having a Docker engineer come out and say 'no our bug wasn't fixed yet' shouldn't be the start of a confrontation.
It's not infrequent to see tension between vendors around the wording and timing of security disclosures; Google's Project Zero provides frequent examples of this.
As far as disagreements on the best way to handle a security disclosure go, this one is pretty straightforward and was entirely avoidable. The vulnerability in question is a runC vulnerability; it affects equally all products with a dependency on runC (not just Docker). The vulnerability had already been patched in runC, and an update to Docker had already been released and announced. So it was not a zero-day. A few vendors (not just Red Hat) have incorrectly announced to their users that they didn't need to upgrade to the latest version of Docker because their enterprise-grade commercial platform would "stop the vulnerability cold". In the case of Red Hat, the commercial differentiator is what they call "security-enhanced Linux" using SELinux.
These vendors are under a lot of pressure to justify the high cost of their enterprise subscription by demonstrating concrete value. A great way to do that is to describe a scary vulnerability in a well-known product like Docker, and show that buying their product is the best protection against it. That is why the article talks about a "Docker vulnerability" instead of a "runC vulnerability" - Docker is a better-known product so the story will be more impactful that way. And it's also why the vulnerability is qualified as a "zero-day" even though it wasn't: it makes the vulnerability scarier.
Red Hat was privately contacted to inform them of their mistake. They privately acknowledged the mistake. When the article hit the Hacker News front page, they were again privately informed of that as well. In spite of multiple requests, after several days they have still not corrected the article. This puts the security of Red Hat users at risk by continuing to tell them that an upgrade to Docker 1.12.6 is not necessary. This is especially disappointing because of the obvious conflict of interest. It was a perfect opportunity for Red Hat to set the bar high for themselves and remove any doubts that they might put the security of their users before commercial interest.
The saddest part is that RHEL and SELinux have genuine security benefits that could be explained in very compelling ways without these shady marketing tactics.
BTW, can you confirm that SELinux in enforcing mode really prevents exploitation of this runC vulnerability? If so, the argument over the post's correctness comes down only to Red Hat's marketing war.
Because if the answer is "No", and there's some other way to bypass SELinux and exploit this bug, that raises a graver accusation against Red Hat - a false statement about the vulnerability workaround.
Thank you for the clarification of your point. It really is a perfect example of the Red Hat marketing.
Can you please give a link to the announcement from Red Hat or someone else urging their users that they don't need to upgrade? That would be the last thing needed to close the question.
The blog post being discussed here is the latest example. NOTE: the blog post has since been updated without acknowledging the inaccuracies in the earlier version.
$ wdiff -n -3 first latest
======================================================================
[-Docker 0-Day Stopped Cold by-] SELinux
======================================================================
SELinux {+Mitigates docker exec Vulnerability+}
======================================================================
Fixed packages [-have been-] {+are being+} prepared and shipped for RHEL
======================================================================
[-Centos.-] {+CentOS.+}
======================================================================
[-Stopping 0-Days with-] SELinux
======================================================================
SELinux {+Reduces Vulnerability+}
======================================================================
[-How about a more visually enticing demo? Check out this animation:-]
======================================================================
we were glad to see that our customers were [-safe-] {+safer+} if running containers with setenforce 1
======================================================================
{+Even with SELinux in enforcement, select information could be leaked, so it is recommended that users patch to fully remediate the issue.+}
{++}
{+This post has been updated to better reflect SELinux’s impact on the Docker exec vulnerability and the changing threat landscape facing Linux containers.+}
======================================================================
I'm not sure that the first version of the post can be considered a recommendation not to upgrade. It just shows how the RedHat people were happy to see that the bug was prevented by another subsystem. I, as a sysadmin, would be happy to know that I'm not obligated to urgently upgrade everything I have. For most sysadmins it can be considered a workaround that is already in place.
You as a Docker developer see the post as an attack on your project. But most sysadmins and kernel developers see it as a nice example of the fruits of long, invisible work: a well-cared-for system with carefully configured security restrictions saving you from some vulnerabilities.
Anyway, that doesn't mean anyone is underestimating Docker and your great work. Sorry you've been stressed by all this noise.
Don't know if it was marketing material when you published it in that whitepaper, but it definitely became marketing material when the @Docker twitter account tweeted it (https://twitter.com/docker/status/768232653665558528).
I don't think the material you're referring to qualifies as criticizing competitors at all.
- The comparison table is part of an independent study, not authored or commissioned by Docker.
- The table shows the strengths and weaknesses of different container runtimes; weaknesses are highlighted for all of them, including Docker
- The table is used in Docker material to illustrate the point that independent security researchers consider Docker secure. Nowhere do we make the point that other products are insecure. I encourage you to read the whole material and decide for yourself.
- The context for this material was to respond to a massive communication campaign painting Docker as insecure.
Even when written, the study made inaccurate claims about rkt (the SELinux support in rkt is identical to that in Docker, because rkt uses the same code from libcontainer - there is literally no difference there). It certainly wasn't an accurate depiction of rkt's state of security as of August.
(Disclaimer: I implemented some, but definitely not all, of those security features in rkt, and I currently work at CoreOS)
Come on. This comes across as entirely disingenuous.
"This was independent, not authored or commissioned by us." - "the fact that we posted it on our Twitter as "why your containers are safer with Docker", posted a lengthy blog article in no way means we were criticizing competitors..."
"Nowhere do we make the point that other products are insecure." No, just "less secure".
"The context we were in was responding to a massive communication campaign ... " - so you were responding to criticism by what, exactly? Oh, yeah, that independent study that you just decided to post about?
Haters gonna hate, man. It's fine to correct misinformation, but in the long run -- and whether they say so or not -- there's much greater respect earned in taking the high ground and not dragging oneself down into the muck of name-calling, ascribing malicious intent to others, and other ill behaviors.
If I were your counselor, I'd advise you to do nothing other than stick to the facts; make the best product you can; delight your customers; take pride in the great work you do; and apologize openly and honestly when you make avoidable mistakes. You can't make everyone happy, so focus on the people you can, and aim to exceed their expectations.
>Haters gonna hate, man. It's fine to correct misinformation, but in the long run -- and whether they say so or not -- there's much greater respect earned in taking the high ground and not dragging oneself down into the muck of name-calling, ascribing malicious intent to others, and other ill behaviors.
This is a naive perspective that doesn't bear out in the real world. It's important to know that by taking the "high road", you are putting yourself at a competitive disadvantage. Someday, those with fewer scruples may have to pay the piper and their dubiously-maintained prosperity may disintegrate ... but then again, maybe not.
Most often, the truth is that large companies are pretty ruthless, and have consolidated such a huge amount of control that it's extremely difficult to do anything about anything they do or have done. They control the messaging, they have a reputation that supersedes any complaint an individual may make, etc. Those companies do slowly atrophy, but usually it's more because they've lost sight of the founder's vision that originally connected with the masses than that they're engaging in questionable tactics.
If you're taking a position out of principle, that has to suffice for itself, because it probably will cost you in material terms.
That's a funny comment, because you are "taking liberties with facts" exactly as lamented by shykes above.
You start with a grain of truth — something that actually happened in reality. In this case, it was a joke protesting systemd hegemony.
Perhaps you thought that joke was in poor taste. But let's leave that aside for now.
So you start with an actual fact. Then, you exaggerate/falsify it, changing the details pretty wildly, and present this story of something that supposedly happened. In fact, nothing like that happened... but it sort of feels like something that might have happened. It vaguely resembles the actual event that did happen (in that, a Docker employee did wear a badge with an opinionated phrase on it at a conference).
The key thing, though, is that what you describe is completely and utterly different from the thing that actually happened in the real world[1].
You might not even be the person who changed the details to make the story more compelling (and false). Maybe you got this information from a post shared on Facebook, or from an email forwarded by your uncle.
Either way, though, the impact of your comment is to pollute the body of discussion and degrade the collective understanding of this topic. (If this process feels familiar, it's because it is exactly the process that eventually caused the failure of the American democracy... just at a much smaller scale).
Personally I don't have any stake in the Docker/RedHat relationship and I don't care about it. I only looked up what actually happened[1] because the idea of a Docker employee wearing an official badge that says "I reject red hat patches" seemed so unlikely to have occurred that it sent my bullshitometer into the red.
Suggestion: when something smells like bullshit, don't eat it without conducting a bit of research.
That badge seems pretty unprofessional to me. I would discipline my employees for that sort of behavior, especially if it occurred at my own conference.
Meh...I wouldn't. It's definitely an ingroup-humor signaling thing, but I find it hard to believe that someone would read that and seriously get offended unless they're being self-righteous.
He stated the source, and information about his Docker affiliation is readily available. HN guidelines discourage signing comments:
Please don't sign comments; they're already signed with your username. If other users want to learn more about you, they can click on it to see your profile.
There's a huge difference between having a generic signature for every comment you post and disclosing an affiliation that adds validity to the claims made in the comment.
It doesn't say "don't sign all your comments", it simply says "don't sign comments". Also, it should be interpreted in the light of the fact that modern netiquette on other sites like Stack Overflow which have usernames is to never sign your posts.
Here it is to disclose an affiliation which people would otherwise forget to check, given the nature of the 'battle'.
Also, there is an assumption that the signature contains up-to-date information and/or does not change over time; otherwise the historical record suffers. The signature has since changed and no longer reflects the position/information at the moment of writing.
I agree with how both jwildeboer (Jan) and shykes (Solomon) approached this. Much appreciated in this case.
But yes, in a normal situation, this is irrelevant and the username signature is sufficient.
I don't know that there is a huge difference between those two. What I do know is that in this case there was no difference of any significance.
The comment was signed with his username, and his Docker affiliation was disclosed under said username. That was all that was needed to add validity to the claims in the comment.
All HN comments have that "generic signature". All HN users are free to disclose information about themselves on their profile, and all HN readers are free to click usernames to learn more about the people who comment on HN.
Give it a rest. This is a semi-anonymous forum where people's identities aren't tied to their usernames. This isn't name dropping, it's providing helpful context.
I have no information here, but it's certainly possible that both sides are not willing to publicly disclose the full extent of the vulnerability. I think that's less wise than usual given what Red Hat is writing and how disputed it is, but that's probably their standard practice.
Some of the comments from Red Hat previously implied that they thought the vulnerability could only be exploited via ptrace, which SELinux denied by default for Docker containers. That's definitely not true; ptrace was used in the PoC because it's easy and likely to win the race condition, but you can also grab file descriptors out of /proc/$pid/fd.
However, the blog post appears to show SELinux stopping attacks that don't involve ptrace, because SELinux forbids writing to an open file or an open network socket that has the wrong context. If Docker believes there are attack vectors that aren't covered by the default SELinux policies (such as writing to something that's not a regular file or network socket), they might be unwilling to disclose that too loudly until Red Hat gets around to saying "Uh, actually please patch".
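For readers less familiar with the second vector mentioned above: any process that passes the kernel's access/dumpability check on a target can inspect that target's open descriptors through procfs, with no ptrace involved at all. A rough illustration of the mechanism (the PID and fd number are made up, and normal DAC/SELinux checks still apply to each step):

    ls -l /proc/1234/fd     # enumerate the open file descriptors of PID 1234
    cat /proc/1234/fd/3     # read through descriptor 3, if the security checks allow it

That is why blocking ptrace alone does not close the hole; the policy also has to prevent access to the descriptors themselves.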
Your usage is actually correct. Which is great, considering many native English speakers get this one wrong. The heuristic we hear in school is something like "use 'affect' as a verb and 'effect' as a noun," which like many grammar heuristics is of course an oversimplification of reality. Usage of 'effect' as a verb isn't super common in general conversation among native English speakers; I think most would choose to say something like "establish authority" instead in this case, but your intention is still clear.
Because the other comment didn't spell it out: effect is correct there. Effect as a verb means something like "to cause to happen". Don't pretend effect/affect is just a noun/verb split. Both words have meanings as both verbs and nouns. It's best to just learn both meanings of each instead of following some rule that's wrong a fair amount of time.
> the benefits of having SELinux in no way means you shouldn't update
Where is this explicitly mentioned in the post?
I got the opposite impression reading this story titled "Docker 0-Day Stopped Cold by SELinux", with the closing statement "When we heard about this vulnerability we were glad to see that our customers were safe".
I'm sure your customers are glad to hear that as well, but it sounds like the Docker folks have reason to believe SELinux doesn't fully mitigate this vulnerability.
> When a 0day hits, you first assess the impact, define your solution and start to work.
That sounds like a sensible approach and very much related to why I raised concerns about the original title and closing statement earlier. Someone could easily have assessed that there was zero impact (based on the first revision* of your marketing material) if SELinux was enabled and, consequently, find no need to "define a solution and start to work" - why would you update when your OS vendor explicitly says you're safe?
It would have been extremely easy to recommend that your customers install the updated packages, but you didn't do that initially - despite such a recommendation being quite standard, and despite being warned by the people who found and fixed this vulnerability that customers should still update.
Instead you seem to have used this as a marketing opportunity at the expense of your own customers' security. As it turns out, SELinux did not in fact fully mitigate the issue (by Red Hat's own admission in the updated blog post and CVE).
---
* I'm referring to the first revision because the post and CVE have since been updated several times (as pointed out elsewhere in this discussion). A recap of some of the changes that are relevant to this exchange and the phrasing I mentioned earlier:
1. The title "Docker 0-Day Stopped Cold by SELinux" has been renamed to "SELinux Mitigates container Vulnerability" -- accurately reflecting the fact that it was not:
a) a Docker (it was runc)
b) 0-Day (patches were released for runc afaik)
c) Stopped Cold (it was mitigated but still leaking information)
2. The closing statement "When we heard about this vulnerability we were glad to see that our customers were safe" has been changed to the slightly more long-winded, less catchy (but fortunately also less misleading): "When we heard about this vulnerability we were glad to see that our customers were safer if running containers with setenforce 1. Even with SELinux in enforcement, select information could be leaked, so it is recommended that users patch to fully remediate the issue."
3. The sentence you referred to "Fixed packages have been prepared and shipped for RHEL as well as Fedora and Centos." has been changed to "Fixed packages are being prepared and shipped for RHEL as well as Fedora and CentOS.". Honestly haven't looked into whether the packages were actually released at the time this post was published (?), but I'll assume Red Hat didn't change the wording here for no reason - there's quite a difference between updates that "are being prepared" rather than "have been prepared".
Do you have a PoC? Even if you don't, please add the information to the security@opencontainers.org thread. I'm guessing that it would involve overwriting program code before the SELinux policy is set by runC?
SELinux used to be one of those things you'd disable immediately upon installing a new RHEL/CentOS box for all the troubles it would cause. But default policies have evolved a lot, making this the wrong thing to do, for a few years now. But people still do it out of habit.
If you ignore SELinux, it won't cause issues besides the occasional need to run "restorecon" (which one gets into the habit of doing whenever an "access denied" error happens when permissions seem otherwise correct).
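For anyone who hasn't built that habit yet, a typical sequence looks something like this (the path is just an example; a file moved out of $HOME keeps its old label):

    ls -Z /var/www/html/index.html    # shows the wrong label, e.g. user_home_t
    restorecon -Rv /var/www/html      # reset labels to what the policy expects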
But one problem still remains. SELinux is (very) complex and people (myself included) have a very hard time grokking its base concepts. This limits adoption greatly, and I'm still to find a decent document that starts from the simple stuff and lets one build a mental concept of how it works before jumping into the more complicated (real-world) use cases.
I worked at Red Hat when SELinux was made mandatory. Every single customer was hit with 'avc denied'. Having to explain that avc was 'Access Vector Cache' and that 'Access Vector Cache' was part of SELinux, and trying to convince somebody to patch the messages to just say 'selinux denied' was nightmarish.
That said: SELinux is one of the only things that can make shared Docker hosting (ie, where the containers are actually isolated from the host and each other) possible.
You're absolutely right. The problem is that it really doesn't matter if it is amazing and works fine out of the box: people won't use it because SELinux is generally perceived as a technology that is too complicated and will break things if you don't disable it.
At this point, very few people will bother to learn the few bits you need to know to troubleshoot and fix any issues you might encounter; and any benefits you get from using it are not worth the effort.
I used Fedora from 12 to 21 (? I think) and always left SELinux enabled, and it just works. For the few things that failed (I remember two issues, one with an experimental build of Chromium and another one with OpenVPN and certificates in $HOME), I submitted a bug report and created my own rule to work around the issue.
If I managed to run a desktop system with SELinux on, it should be possible (and potentially easier) to use it properly on a server.
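For what it's worth, the workflow for that kind of local workaround rule is fairly short once you know it (the module name here is made up):

    # turn the logged denials for the offending program into a local policy module
    grep chromium /var/log/audit/audit.log | audit2allow -M local_chromium_fix
    semodule -i local_chromium_fix.pp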
> people won't use it because SELinux is generally perceived as a technology that is too complicated
It's amazing how bad things have got with the majority of developers and admins that they refuse to learn things that are difficult and instead simply turn it off. It's not like all of them are too young to remember when nothing in their system was easy and they actually had to learn about what they were doing.
I'm looking at disabling selinux on some of the systems I work on. But I don't think it's for a lack of trying to use and understand it. It's not just a matter of it being complicated; I find the system cryptic, and information about how to do things correctly is very difficult to find.
While I agree with your sentiment, I think I have tried to give selinux a fair chance, but I've reached the point where it seems to add more overhead than benefit for us. On top of that, I'm likely making mistakes with my configuration that make it too permissive: with my limited understanding, I'm putting together modules that make my stuff work, but I don't actually understand the implications of some of the permission decisions I'm making.
And my entire team seems to be struggling with selinux, and a little bit fed up with it, because we keep running into it blocking things, simple things, against our intuitions of the way our system and selinux should work.
It might be the perfect access control system, but to me the UX is horrible. Or maybe I just hate to admit it, but I've utterly failed at building a mental model for how it works, and how I can effectively interact with it to do what I need to do.
So while you might be right, at least in the case of selinux, I'm not sure I agree that it's simply a matter of developers and admins refusing to learn things that are difficult. In the selinux case, I think it goes deeper: it creates a cognitive load such that someone needs to invest a significant amount of effort to truly learn it.
I agree that the UX is bad, and I think that's because they were wearing heavily SELinux-tinted glasses when they designed it. As an example, even if you correctly expanded `chcon` to "change context" the name of the command itself tells you nothing about it being SELinux related. In contrast, `aa-enforce` and friends hint that they're all AppArmor commands. (This is more useful when trying to remember the name of the command you typed a few months ago.)
This is of course difficult to get right - one would need to be aware of what everything does in detail, but still know what it would be like to use it with no prior knowledge.
Just an agreement. I make a point of using some LSM everywhere I can, but in practice selinux is still hard to debug properly. The level of abstraction in the modules doesn't help here. System-wide integration means that patching a single app profile sometimes involves patching the system profile package, sometimes just one module. And that's just on a redhat-like system - anything else gives no guarantees about selinux working at all.
Sure, apparmor has fewer options, but it's trivial to manage a profile along with the package itself. The learning and reporting system is much simpler. And there's no need for debugging the "where did I miss the context setting this time" problem in deployment.
I can deal with both and either one is needed. But selinux simply wastes my time way too often.
> it creates a cognitive load such that someone needs to invest a significant amount of effort to truly learn it.
To benefit from SELinux, I don't think you need to "truly learn" it... but you do need to invest some effort developing troubleshooting skills that are specific to SELinux. But the thrust of your point is right. We need to invest some effort to benefit from it.
I think postings like RedHat's are useful because they show us in a concrete way why it might be worth the small effort to develop the troubleshooting skills, or even worth the large effort to really understand SELinux.
There is a line between something being difficult and something causing any part of the system to randomly stop working with no explanation whatsoever except, if you're lucky, a single log line somewhere that will give no results when searched for on Google.
Well - devs and admins spend the minimum effort required to reach their primary goal. And I believe most of the time that it is in line with their employer's policy.
I'm too young to remember how it was in the old times (I would be glad to hear a story or two), but since my very first job, the internal policy (the real one - not the one presented to customers or auditors) has always been "screw the firewall, selinux, and the principle of least privilege. Disable everything so it works right now and grant the dev team root access to the prod environments - they need it for an urgent customer issue!".
Humans are designed to obtain their goals while conserving as much energy as possible. As a rule, people will always take the lowest-effort route to get what they want. At a macro scale, this can only be counteracted by designing systems that strongly discourage specific low-effort-but-harmful routes, and even then, there will always be some sector of the population that doesn't grok the downside of taking the "easier" way.
I think one of the big things that has impacted SELinux adoption is that everyone has sort of seen it as a proactive booster rather than a really necessary part of a secure environment. I'm sure a lot of that is because lots of people are used to administering systems that were around before SELinux (and similar) was. For something that's perceived as a bonus point, it interferes far too often.
Most people don't realize SELinux is on until it does something really bad like stopping a database from restarting correctly or otherwise harming what's supposed to be a stable environment. When those are the stakes, SELinux does not have, or at least does not make immediately clear, a sufficient value proposition to incentivize the admin to fix the rules rather than just turning the whole thing off.
For example, to contrast with the OP, when SELinux stops Docker from doing something, the impulse is not going to be "Yay, SELinux stopped Docker from doing something dangerous! Thank you glorious SELinux!" Instead, it will be "Ugh, SELinux again getting in the way of stuff. I just need to disable that. I get enough headaches from Docker as-is and SELinux probably just isn't modern enough to handle my uber-charged stack with all of its new-fangled features. Disabling!"
I agree with your post, except for one point: system design shouldn't be focused on strongly discouraging harmful low effort routes, but on ensuring that the good routes are less effort than the harmful ones. That is, while it can be achieved by raising the effort of the harmful routes, it can also (and might be better) achieved by lowering the effort of the desired routes. For example, if journalctl showed the relevant AVC messages when showing the log of a failing unit, it would remind the administrators they may need to adjust it. (Except actually mention SELinux in the message, nobody knows what AVC is.)
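To make that concrete, here is roughly what an admin has to do today to discover that a failed unit was an SELinux denial (the unit name is illustrative):

    systemctl status myservice    # unit failed, exit code but no useful reason
    journalctl -u myservice       # nothing obviously wrong in the unit's own log
    ausearch -m avc -ts recent    # the denial only shows up in the audit log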
To be fair, the documentation on SELinux sucks. Git is also complex, but there is a lot of good documentation for it. When I first tried to develop an SELinux policy it took me a month to learn. Too many things are undocumented, or are documented but it's not clear where.
> that they refuse to learn things that are difficult and instead simply turn it off.
I wish I could do this with our Logstash cluster. It requires so much hand-holding, and troubleshooting can be quite opaque. Last night the logstash indexing service had 'just stopped', and was sending 57000-character-long json loglines into its own log.
And if they do fix a bug, you can't just upgrade one component, you have to upgrade logstash and elasticsearch and kibana... and maybe whichever beats you're using.
"I used Fedora from 12 to 21 (? I think) and always left SELinux enabled, and it just works."
I've had all sorts of weird errors over the years when I set up new CentOS machines.. one thing or the other. And then I remember to turn off SELinux, and it suddenly works. Over and over again.
It's going to take a lot of convincing to get me to leave SELinux on when it's caused so many problems over the years.
Did you consult the audit.log? Did you pump it through audit2allow? The audit2allow tool will even tell you if the issue can be fixed by setting a boolean on the SELinux config. Most of the stuff I have run into recently can be fixed by setting a boolean.
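As a sketch of what that looks like in practice (the boolean below is just one common example):

    grep denied /var/log/audit/audit.log | audit2allow -w    # explain why each denial happened
    getsebool -a | grep httpd                                # list related booleans
    setsebool -P httpd_can_network_connect on                # flip one persistently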
Yeah, SELinux does require learning it, but it adds a lot. I recently helped a friend of mine fix his PHP CMS because it was hacked. The PHP was hijacked and it started attacking other instances in the hosting provider's network. If only he had not turned off SELinux, it would have prevented outgoing connections from the http server.
The thing that kills it for me is that Fedora ships with broken default configuration.
Install Fedora, reboot and log in. Chances are within minutes you'll start seeing "selinux denied" messages popping up, complaining about services, files and policies you've never heard of. How is anyone but a seasoned RHEL admin supposed to know what to do with that?
Sort of. I tried the latest workstation release recently and the installer was broken.. you need a bug tracker account to file a bug, so you need your browser to work, which is rough when your system doesn't.
That's the issue with LSMs in general. They all work, but historically have been difficult to configure and impossible to maintain. That's changing.
There used to be only one (SELinux), however, there's competition now from other LSMs. Smack, AppArmor, Tomoyo, etc. In part, that's why SELinux is improving.
I've tried them all and settled on Tomoyo. The documentation is outstanding and it is (to me at least) the easiest LSM to reason about and configure.
> I'm still to find a decent document that starts from the simple stuff and lets one build a mental concept of how it works before jumping into the more complicated (real-world) use cases.
That sounds pretty cool, but where do rules come from?
Suppose some author wrote some daemon. Is the packager responsible for writing the rules? It sounds like having packagers understand SELinux rules is a lot of responsibility, and if upstream is cross-platform, they might not care about such specific needs so as to provide it.
Also, what happens if I write some small app? Do I need to write its rules? If it has no rules applied to it, then it's basically game over, because SELinux sounds like it works ONLY if it applies to all processes.
A packager is generally responsible for SELinux policies if they aren't suitable for inclusion in the core policy. As a developer, if you want to write them, please do, but certain aspects rely on things like where binaries and application data are stored, so you can't always write a policy that won't need tweaking on a specific distribution.
SELinux policies take some time to write the first time or two, but typically running in permissive mode and running your app with a permissionless context will give you everything you need to include in one.
Admins have the hard job when they move default data directories around; it takes time to get used to running 'semanage fcontext' in addition to setting file system permissions.
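A minimal example of that, assuming a web root moved to a non-default path:

    semanage fcontext -a -t httpd_sys_content_t "/srv/mysite(/.*)?"    # teach the policy about the new path
    restorecon -Rv /srv/mysite                                         # apply the labels to existing files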
Read that a while ago and it didn't make it click for me.
If I had just stumbled across it without a recommendation and without recognising the name of the author, I'd easily have dismissed it as some kind of trolling, but that's maybe just me.
What I want is:
1. How to do tasks x, y and z (with explanations on why).
2. Complete documentation of all commands, settings files etc.
Usually 2. is somewhat covered in the official docs and 1. is available in blog posts etc. Last I checked 1 was not covered in any way when it comes to SELinux.
Making a coloring book out of it is nowhere on my list.
Too late to edit, but let me be perfectly clear that I am totally fine with people making coloring books etc; I just wish there was more how-to, but that might just be me.
It's very hard to come back from such a popular image of "100% broken and must be disabled immediately". Even if SELinux evolves to absolute perfection, the damage to its image is done, and that will take a long time to change, regrettably.
It should not have been shipped so green.
> If you ignore SELinux, it won't cause issues besides the occasional need to run "restorecon" (which one gets into the habit of doing whenever an "access denied" error happens when permissions seem otherwise correct).
Sounds like it's still pretty broken, IMHO. I should never see an "access denied" error on a host I control, unless I misconfigured it.
The truth is, the defaults MUST work on all common scenarios all of the time for these things to be successful. Otherwise, people will only see the downsides (and the upsides are rarely visible, and rarely outweigh the downsides).
> I should never see an "access denied" error on a host I control, unless I misconfigured it.
What do you mean? Even stock UNIX will give a permission denied error if you try to run an executable without `chmod +x`-ing it or `rm -rf /boot` as a regular user.
> SELinux is (very) complex and people (myself included) have a very hard time grokking its base concepts.
I wonder what a clean slate OS design would look like. One that satisfied the same requirements without any concerns about backward compatibility with POSIX history.
The MLS model was too difficult to adapt to commercial use. Biba was good for stopping malware from overwriting files. They still preferred something more flexible. SCC then invented type enforcement in another high-assurance system:
The Flask architecture combined that tech with a microkernel. SCC, acquired by McAfee, added type enforcement to a BSD OS for their Sidewinder firewall. The next work, by Mitre, was a proof-of-concept for OSS, adding it to Linux. That plus a pile of incremental additions is called SELinux. I'm sure you'll find the LOCK design a lot cleaner, as it was originally intended. ;)
Also worth noting are the KeyKOS system (esp with KeySAFE), the capability-security machines, and one language-based mechanism:
How about Magenta [0] by Google, on top of Little Kernel (LK), a neat and modern microkernel design? (It's part of Google's work-in-progress complete OS named Fuchsia, which appeared briefly a while ago on tech news sites.)
Microsoft's Midori project. It never saw the light of day, but there are some very interesting blog posts about it. The Redox project has some leanings in this direction.
Robigalia is interesting but not nearly ready: https://gitlab.com/robigalia (their website has a cert error right now. It seems like I've seen a lot of those these days)
It's a Rust userland built upon seL4. seL4 is very simplified in order to meet their verification goals, so Robigalia has to implement some interesting resource sharing primitives on top of it to get things to work. It could be interesting.
> If you ignore SELinux, it won't cause issues besides the occasional need to run "restorecon" (which one gets into the habit of doing whenever an "access denied" error happens when permissions seem otherwise correct).
You also have to put things in the correct place. For instance, your VPN certificates should be in $HOME/.cert so restorecon knows they should have the home_cert_t label.
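In other words, labels only follow the policy if the file lives where the policy expects it. A quick illustrative check:

    mv ~/Downloads/client.crt ~/.cert/
    restorecon -Rv ~/.cert
    ls -Z ~/.cert/client.crt    # should now show home_cert_t rather than the old label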
When I asked them about this and why they do not even tell the customer about this, they said it is so that they can reset the password when requested through the control panel.
You need to be in the context that will allow the change of password. For example, if this is an agent, you need to be able to execute code in the agent. If this is from a DHCP hook in early boot, you need to execute in that hook.
That's totally not equivalent to the use of "setenforce 0".
Yes, you are right. However, when I calculate risks and the protection/price ratio, I see no difference: to the first two significant digits, a system with `setenforce 0` is equal to a system with the rule, because the root password is the most significant risk, and the risk of attack through the web UI is the same and just as high. Other risks are less significant than that, and SELinux does not reduce them significantly.
It sounds like you're saying the security position can be simplified to one dimension. It can't. Unless you're able to provide risk and exposure for all users and somehow make them the same for everyone (they're not).
Then you write that SELinux doesn't reduce the risk. It definitely does that for webapps executing wrong commands, utility services being exploited for local access, many attempts at race conditions via shared directories, etc. For example almost all interesting use cases for imagetragick exploits are severely limited by properly configured selinux. Once in a while there's going to be an issue with a trivial exploit which everybody and their dog will use to scan the whole internet before you have time to patch. This is what LSM is great to protect against.
I'm not saying that the risk is not reduced. I'm saying SELinux is not effective, so it's better to spend my time/money on something that can reduce risk significantly.
I've seen no 0-day exploits to date which were stopped by SELinux. For example, a trojan can use the apache process as a malware host, without reading from or writing to disk at all. SELinux will not stop that even in theory.
It will not. But looking at what exploits normally do - that would be uncommon and targeted. A lot of common stuff will only drop a stage 2 downloader and try to execute it. Doesn't work - move to the next target.
For most of automated exploitation, selinux is perfectly capable of intervening.
Since when is a security issue which is known to and patched by the vendor a "0-day"? Have there been any reported exploitations of this vulnerability in the wild before it was publicly disclosed with a patch made immediately available?
No, there were not, and it was disclosed to the related vendors some weeks earlier. Some people seem to think "0 day" is just a l33t term for "vulnerability".
I think all they are suggesting is that running SELinux can protect you from vulnerabilities that no one knows about yet. Obviously they can't give an example of a vulnerability that is still a 0-day.
SELinux is one of our main protections against user abuse. Our project (webminal.org) provides free terminal access to anyone, so we need protection. Behind the scenes we rely on things like SELinux/Quota/Pam.d/limits.conf/rootkits etc. SELinux has a steeper learning curve than the others, but it's worth a ton.
I'm not sure if the situation has changed, but I recall installing Fedora (and another distro I cannot recall) years ago, and SELinux would kill sshd when a connection was received, on a clean, out-of-the-box installation, making the host inaccessible.
This sort of super-critical bug makes software go immediately onto my blacklist, and it's very hard to come back from that - it basically meant that it had to be disabled immediately, because its defaults were completely broken.
I installed Fedora Server 25 a week or so ago on a small server, sshd was open (with an extremely strong random password) and fully functional at install time. SELinux has caused no issues, I actually chose to install docker in the installer at install-time and it came pre-configured to play nice with SELinux.
Firewalld on the other hand, I'm still figuring out (firewall-cmd is useful, but trying to translate iptables rules -> firewalld is proving harder than I expected)
One of the reasons I like OpenBSD. Linux has a habit of taking well established things and replacing them with incompatible things (firewalld, systemd for example).
I have enough to do, I don't need to throw away years of acquired experience every few releases. It's one thing if the replacements are clearly better, but for me they just seem to be new, different ways to do the same things.
I'd argue that (at least in recent years) OpenBSD does a lot of replacing as well, but I happen to like their direction, where the replacements are simpler rather than more complex.
Every time I mess with Fedora I have to screw around with selinux trying to figure out how to make ~/.ssh/authorized_keys work. Every time I need to google the stupid magic incantation that makes things work (that normally should "just work") because I can just never remember it.
I have always run SELinux on Fedora/CentOS/RHEL and I don't remember a time where I had issues with authorized_keys. The only thing I recall recently about ssh is that it complains if the files in .ssh are not mode 600.
I'm not talking about RHEL4. I'm talking recent Fedoras (within the last year or 2). Making a brand new ~/.ssh/authorized_keys file has never worked for me without running restorecon (which is the thing I can never remember).
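For the record, the incantation in question is just a relabel of the directory (assuming the default targeted policy):

    restorecon -Rv ~/.ssh                  # puts the ssh_home_t label back on authorized_keys
    matchpathcon ~/.ssh/authorized_keys    # shows what the label is supposed to be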
SELinux is badly designed, and it's not surprising people don't use it. Experts are supposed to simplify complexity. You can't design convoluted and complex applications that are user-hostile and break things without warning and then accuse people of laziness. Few who deploy apps take security casually, and SELinux is not the only way to gain security.
In a typical scenario, deploying software can already get hairy, and with SELinux in the way you could end up going down multiple rabbit holes and squandering hours, only to discover that SELinux is disabling some functionality without clearly logging what exactly it is disabling. That's why most advice online is to disable it.
Given that it's connected to the NSA and that Redhat tried its best to get it into the kernel at one time, that's all the more reason for anyone concerned about the NSA to avoid it. Security experts like the author of Grsec also don't think too highly of it.
Spender/GRSec mainly doesn't like it because he has a competing RBAC which people pay for. It might be better, it might not.. but I don't have a GRSEC license.
I just want to update this to clarify that it blocks ptrace, but this is only part of the issue and you shouldn't rely on AppArmor to mitigate this CVE entirely.
Here's a write-up showing how AppArmor can protect Docker containers and the underlying host... quote from the article, "So without even patching the container we have prevented rogue pid from spawning using a correct security profile with AppArmor."
AppArmor and SELinux share the same set of LSM hooks underneath, so there's comparatively little difference in capability. The big fight is over the philosophy of how they are configured (very broadly: AppArmor tends to tolerate ad hoc configuration by doing what you want instead of what you say, and SELinux likes to break everything it doesn't understand perfectly).
I really wish SELinux would evolve some better configuration tools/culture, but after a decade and a half I despair of this ever happening. Every single Fedora release I leave it enabled on my personal machine thinking "THIS will be the moment where I really puzzle it out". Then I disable it a week later when I realize how much annoying work configuring rules for my rando backup and device management scripting will be, and when I see that it still lacks rules for a bunch of in-distro tools I need to use.
Last I looked, Apparmor still applied rules based on file names, not on types assigned to the underlying objects. That's a big limit on writing correct policies: every hardlink is like a firewall-spanning device.
I see it mentioned so often, but in practice... is that really an issue for you? You need to have root privs with unrestricted access (or at least hardlink creation in the directory explicitly allowed by apparmor) for that. That means the attack would have to look something like:
1. gain access to the system (unrelated exploit)
2. elevate to root with a profile which can create hardlinks at all
3. have access to a directory unrestricted in the profile
4. have a local, unrestricted application which can be exploited by hardlink manipulation
This is theoretically possible on some systems. But it's a massive effort and fairly easily mitigated by having a profile for all running services which disallows hardlinks in the first place. As far as risk of service exploitation goes, this should be a fairly minimal one. (and requires targeted approach)
AppArmor is a DAC+MAC and SELinux is an RBAC+MAC. AppArmor is not nearly as powerful. Grsec is much more effective than AppArmor at preventing a wide range of attacks, but SELinux is more or less king.
My understanding is: yes, it is blocked by AppArmor.
Based on seeing email chatter about this report in the context of Cloud Foundry[0]. Under the hood CF uses runC, partly to allow AppArmor to be applied.
Disclosure: I work for Pivotal, the majority contributor of engineering to Cloud Foundry.
I would recommend not giving that advice, and would recommend that all your customers upgrade. There are a number of ways to exploit this, and as stated elsewhere in this thread, Red Hat's advice that SELinux entirely mitigates this is not correct; it is highly likely that it can be exploited on Cloud Foundry too. I would never advise people not to upgrade software when there is a CVE just because someone says there is a mitigation - it is very high risk.
Disclosure: I worked on testing the fixes for this CVE.
It was our understanding from the original report that the vulnerability was mitigated by AppArmor disabling ptrace, by no user process running as pid 1 inside the container, and because in CF buildpack apps, user processes run as unprivileged users. This is the stance communicated in the CVE report.
However, with some further consideration and updated information yesterday, we decided it would be prudent to patch and release immediately to be on the safe side. This was communicated to the Cloud Foundry Security team.
User processes running as unprivileged users may be sufficient to mitigate. AppArmor did not always do so during testing. But an upgrade is highly recommended, as there were several ways to exploit this and different races, including one related kernel bug that was fixed in very recent releases.
> It was our understanding from the original report that the vulnerability was mitigated by AppArmor disabling ptrace, by no user process running as pid 1 inside the container, and because in CF buildpack apps, user processes run as unprivileged users. This is the stance communicated in the CVE report.
I'd have to think about this further, but I'm not convinced that would be sufficient protection (accessing /proc/$pid/fd has a different set of access requirements to ptrace -- it's a dumpability check basically). However, since you've already sent patches around it's all good.
Disclosure: I discovered, wrote patches for and helped with coordination of this vuln.
Agree. This was the initial stance, when the focus (our misunderstanding) appeared to be on ptrace as the only vulnerability rather than ptrace as the means to easily exploit the vulnerability. Once we had a better understanding, we were also not convinced that this provided total protection.
I expect Red Hat to issue a retraction shortly. We notified them last night that this post was incorrect.
Source: Security at Docker.