Two basic principles will take you far with regard to AWS security:
1. Avoid long lived credentials (access key/secret key pairs) and IAM users whenever possible, and instead have everything assume roles for access to resources in the same as well as other accounts. Temporary credentials from STS are the way to go.
2. Turn on IMDSv2 by default for EC2 instances. v1 is susceptible to SSRF attacks so even if you’re attaching an instance profile to an instance and doing everything else right, you’ve got a problem if somebody finds a weakness in the app running on the instance.
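For what it's worth, point 2 can be enforced at the account or org level rather than per instance. Here's a sketch of a deny statement (usable in an IAM policy or SCP) built on the documented `ec2:MetadataHttpTokens` condition key, so that instances simply can't be launched without IMDSv2 required:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "RequireImdsV2OnLaunch",
      "Effect": "Deny",
      "Action": "ec2:RunInstances",
      "Resource": "arn:aws:ec2:*:*:instance/*",
      "Condition": {
        "StringNotEquals": { "ec2:MetadataHttpTokens": "required" }
      }
    }
  ]
}
```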
Regarding 1, my own CLI runs an aws-vault setup such that I need to enter a TOTP code to assume a privileged role.
It sounds like really basic security: my saved credential can't go delete servers on its own. Absolutely everyone, from AWS-certified people to apparently renowned developers, has gotten stuck into me over this, with complaints ranging from "I'd quit anywhere that did that" to "bad setup, proper IAM never requires MFA". I'm sure there's eventually going to be some form of blow-up where long-term saved credentials come out as the problem, and it will blow people's minds.
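For the curious, that workflow looks roughly like this. A hypothetical ~/.aws/config profile (account ID, role, and user names are placeholders) where aws-vault picks up mfa_serial and prompts for a TOTP code before vending credentials for the role:

```ini
# hypothetical profile; ARNs are placeholders
[profile admin]
role_arn = arn:aws:iam::111122223333:role/Admin
source_profile = default
mfa_serial = arn:aws:iam::111122223333:mfa/my-user
```

Then something like `aws-vault exec admin -- aws sts get-caller-identity` prompts for the OTP and runs the command with short-lived STS credentials, without ever exporting long-lived keys.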
Do you get such pushback over the need for MFA, or more generally for the use of roles, or something else?
MFA seems like an essential defense-in-depth measure to ensure that a compromise of the locally held IAM key is not enough on its own to compromise your AWS account.
It's just the workflow. MFA is fine with passwords and the console, but everyone has been educated to believe that once an IAM key is saved on disk, requiring anything more on top of it just isn't how it's done.
Maybe that's because AWS' own aws-cli setup encourages you to store these credentials on disk in plaintext in a standard-named file in your home directory, and their best story for temporary roles is to invoke `aws sts get-session-token` and copy paste values from the JSON output to env vars.
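A hypothetical sketch of that copy/paste step, using a canned sample of the JSON that `aws sts get-session-token` prints (in real use you'd feed it the actual output): it just turns the Credentials block into the env-var exports the CLI expects.

```python
import json

# Canned sample of get-session-token output; these are not real credentials.
# In real use `raw` would come from:
#   aws sts get-session-token --serial-number <mfa-arn> --token-code <otp>
raw = """{"Credentials": {"AccessKeyId": "ASIAEXAMPLE",
  "SecretAccessKey": "EXAMPLESECRETKEY",
  "SessionToken": "EXAMPLESESSIONTOKEN",
  "Expiration": "2030-01-01T00:00:00Z"}}"""

creds = json.loads(raw)["Credentials"]

# Emit the three exports you'd otherwise copy/paste by hand.
exports = "\n".join(
    f"export {env}={creds[key]}"
    for env, key in [
        ("AWS_ACCESS_KEY_ID", "AccessKeyId"),
        ("AWS_SECRET_ACCESS_KEY", "SecretAccessKey"),
        ("AWS_SESSION_TOKEN", "SessionToken"),
    ]
)
print(exports)
```

Tools like aws-vault exist precisely to make this dance (and the caching and re-prompting around it) disappear.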
It's really disappointing that aws-cli doesn't easily support this type of workflow, when using MFA and setting up multiple AWS accounts with cross-account roles are two things AWS themselves recommend as security best practices.
Don't get me started on how you can only have a single U2F key attached to your root user.
I agree with that being a major part of the problem.
Regarding root, I always create an account with a console login that can remove the root user's MFA or reset its password. It becomes the recovery account, gets its own MFA key, and ideally never gets used again once tested.
MFA gets in the way of automation, and large scale adhoc actions.
Say I want to load a huge amount of data, or do a ton of s3 uploads. My role assumption via MFA / TOTP lasts 4 hours.
Now say you are doing lots of different "big data" flows, how do you assume privileges when you need them?
I understand what MFA / TOTP brings to security, but security people pretend like my job is about logging into a box once-in-a-while for a couple minutes, and that is the ONLY USE CASE.
Least privilege is another thing that sounds great to a security guy but is a nightmare for development. So whenever you do anything off the beaten path (diagnose new systems deployed, try new database backends, do some data moves or loads, backup/restore), uhoh, you spend a ton of time diagnosing shortfalls and decoding error messages. The security guy DOES NOT CARE.
S3 is a perfect example. S3 should be simple. It is a nightmare of permissions on the user side, permissions on the bucket, and even worse across accounts. Tiresome. Magic numbers/dates for the "version" field policy, what is with THAT? ACLs/Policies stomping over each other.
The funny thing is, ask a security person to first enable cloudwatch and dig through the data to find issues and vulnerabilities, or ask security people to actually produce well-tuned IAM roles for what you need. Guess what? They don't want to do it. They want to sit around making checklists and sending memos.
Isn't IMDSv2 still insecure against attackers who can initiate arbitrary http requests? I really don't get why all big cloud providers are so keen to preserve that vulnerability.
IMDSv2 lead here! Overall the IMDS is a pretty massive security win ... it means that software and scripts can use short-lived ephemeral credentials that are tied to an instance role very easily. That rules out the kinds of credential leakage issues mentioned in this article, but also filesystem traversal issues and so on that have been common in software historically. That's the big why.
It's a HTTP interface so that the most software can benefit from that, whether it's a shell script using curl or something written in Java, Scala, Python, Node.js, Rust, whatever ... pretty much everything can make HTTP requests. Of course over time that HTTP interface became a target itself and when applications have SSRF issues, badness could ensue.
IMDSv2 limits reachability in four ways to protect against SSRF:
1. The IMDS can be turned off entirely if you prefer.
2. Access requires a PUT request, and our analysis showed that SSRFs granting the ability to PUT are very rare. Most SSRF issues only grant GET, because they are a URL or header escaping issue that doesn't give up control of the method.
3. Requests with an X-Forwarded-For header are denied, so if a misconfigured open proxy or WAF adds this header, you get protection.
4. The request has to come from the local box, because we set the IP TTL to "1", which means it isn't routable off-box. This protects against misconfigured NATs and routers.
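As a toy model only (not AWS's actual implementation), the four checks described above amount to something like:

```python
def imds_allows_token_request(method: str, headers: dict,
                              arrived_locally: bool,
                              imds_enabled: bool = True) -> bool:
    """Toy sketch of the IMDSv2 reachability checks on a token request."""
    if not imds_enabled:                 # 1. the IMDS can be turned off entirely
        return False
    if method != "PUT":                  # 2. sessions require a PUT request
        return False
    if "X-Forwarded-For" in headers:     # 3. proxied/forwarded requests are denied
        return False
    if not arrived_locally:              # 4. TTL=1 keeps traffic on-box
        return False
    return True

# A plain local PUT is allowed; a GET-only SSRF or a misconfigured
# proxy that appends X-Forwarded-For is not.
assert imds_allows_token_request("PUT", {}, True)
assert not imds_allows_token_request("GET", {}, True)
assert not imds_allows_token_request("PUT", {"X-Forwarded-For": "203.0.113.9"}, True)
```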
The AWS article and docs on IMDSv2 really bury the lede, and people may take away “protects against SSRF” when AWS should really say “protects against SSRF for GET/POST methods only”. It’s fine as an additional layer, but they kind of oversell it.
I don’t get why the IMDS is TCP at all. IMO these services should be designed so that the VM root user can access them using an out-of-band mechanism only, and VM non-root users cannot access them at all without explicit action from root.
It’s dorky, but 9pfs over virtio would work for this purpose, at least with a Linux guest.
At some point though you start making it too hard for legitimate applications and services to make use of the original purpose of the service.
IMDSv2 is really aiming to do two things:
1. Require a session to access credentials. You can only initiate the session with a PUT request to prevent some reverse proxy and WAF misconfigurations from becoming the front door. If your reverse proxy still supports PUT requests, IMDSv2 is going to reject anything with an X-Forwarded-For header
2. Once the session is established, it can only be used from the instance where the session began as you get an instance-specific token - so you can't take credentials and then plug them into your own client to start exfiltrating data
Reading a value out of /sys/fs/imds/key seems like just the right amount of barrier to entry. Most languages and environments support this out of the box with a line or two of code. Some may treat it as a privileged operation, which is a good thing. Operations like fetch() can only read it if they’re wildly misconfigured to the point of having local filesystem access, which is a rather higher barrier to entry than plain SSRF.
And this supports regular file modes and ACLs, whereas getting this right with iptables/nftables is awkward at best and requires enabling a firewall.
I'm not sure what can really be done about that while still allowing custom AMIs with arbitrary operating systems. "Talk to local network" is a pretty generic interface.
AMIs are already specialized for virtualization. To get full performance, guests need a whole pile of specialized drivers. Most are industry standards, but they’re still specialized for VMs. Adding one more for an IMDSv3 and similar mechanisms from other providers seems straightforward to me.
Sure would help if I wasn't using some slapdash webscraping thing like gimme-aws-creds from Nike to do this. AWS isn't particularly helpful at providing role assumption, and really, neither is Okta.
And it's hilarious that ssh is being treated as some massive vulnerability by the security folks, and they are pushing enterprisey crap like Teleport instead, which crashes / agents go dead / doesn't work in low memory.
I tried to reverse engineer the python code of gimme-aws-creds and it was parsing/scraping an HTML response from AWS. If that's what it has to do, well ok, but the fact it wasn't a json service was a bit WTF
Yeah, SAML is, as far as I know, always headed. My copy of aws-vault is bright enough to launch against our AWS SSO provider(1), which secretly is also SAML to Okta, but I guess if your company's setup doesn't use AWS SSO then Nike must have thought parsing HTML to be The Solution™
The trouble with blog posts like this is that they are effectively click-bait written to drive traffic to a blog (i.e. aiming to ensure high ranking on search).
I say that because taking this blog post as an article, it makes one major false assumption: Credential Leakage = Compromise.
In reality, anyone who knows AWS IAM knows that you can:
- Activate/Deactivate credentials
- Time-gate policies
- IP-restrict policies
- Use roles with STS temporary credentials
- Use roles in conjunction with the externalId attribute
- Many more things I've missed out....
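As a sketch of the time-gating and IP-restriction points, a single IAM policy statement can combine documented global condition keys. The bucket name, CIDR range, and cutoff date below are placeholders:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "s3:GetObject",
      "Resource": "arn:aws:s3:::example-bucket/*",
      "Condition": {
        "IpAddress": { "aws:SourceIp": "203.0.113.0/24" },
        "DateLessThan": { "aws:CurrentTime": "2030-01-01T00:00:00Z" }
      }
    }
  ]
}
```

With conditions like these, a leaked key is useless outside the allowed network and time window.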
Yes sure you probably should rotate your credentials yada yada (or perhaps use the "new" X.509 auth method instead).
But that doesn't detract from the fact that this blog is taking a cheap shot and not really taking into account the strengths of AWS IAM: the security granularity that makes it possible (with a correct, layered config) to comfortably say Credential Leakage DOES NOT equal Compromise. You just need to take a minute to RTFM, sit down, and plan your IAM setup, instead of just doing it on a whim.
AWS IAM is actually one of the best bits of AWS. It's one of the areas where AWS leads its competitors (IMHO).
What's even worse is that IAM and Trusted Advisor will tell you what stupid things you have done. At no point does the article give guidance on turning that on, because it would compromise their sales pitch.
Also, if you push AWS keys to GitHub (this happens from time to time at even the best organisations), GitHub will nuke the rev. And even if you did it successfully, there should be no damn keys that are usable anyway, because anything human-issued should be via SSO and time-windowed or have MFA required. Everything else should be based on assuming roles.
We did release some (hopefully) actionable guidance alongside the study[1].
Trusted Advisor is a fair point, but note that most of its security checks only come with the Business or above AWS support plan. IAM Access Analyzer is a great service but it currently supports only 6 resource types[2].
We'll look into adding both, appreciate the feedback!
FWIW all the interesting checks in Trusted Advisor (literally 100% of the good ones) are gated behind an AWS Support contract. We don't use Trusted Advisor because it's useless without a support contract, and as a small company we don't want to pay for a support contract that we don't otherwise need. I think it's not so great that AWS charges for these automated checks.
> Leakage of credentials sure has led to quite a few compromises.
No.
As per my post above, they "led to compromise" because lazy sysadmins were too inept to RTFM and make full use of IAM.
I mean, I could post my IAM credentials right here. I've even been lazy and haven't rotated them for 5 years (yes I've already slapped myself on the wrist !).
But they'll be naff all use to you because:
- My credentials are deactivated when I'm not using them.
- The STS policies they are linked to are IP-restricted
- The Role policies they are linked to are also IP-restricted
- The Role policies have minimal necessary privilege
- The Role policies can only be invoked by an assumed role
- The Roles have externalID attribute set using a UUID as value
Thus even if I copy/pasted my credentials right here, you wouldn't even get past step 1 let alone any further.
And that's just a simple example. You can go much more layered and granular than that.
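For example, the externalId layer lives in the role's trust policy. A sketch, with a placeholder trusted account ID and a placeholder external ID value that you'd generate yourself (e.g. a UUID):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "AWS": "arn:aws:iam::111122223333:root" },
      "Action": "sts:AssumeRole",
      "Condition": {
        "StringEquals": { "sts:ExternalId": "REPLACE-WITH-YOUR-UUID" }
      }
    }
  ]
}
```

Without passing the matching `--external-id` to `sts assume-role`, the assumption fails even for the trusted principal.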
It's not rocket science, it's not difficult, it's just a case of the old PPPPPP (Prior Planning Prevents P* Poor Performance).
We can't expect everybody to get to the level of understanding you have about the fine points of AWS IAM. It's a scalpel shotgun, an extremely sharp tool that can easily blow your company's entire leg off.
Companies generally totter along until about 50 engineers / 100 employees before hiring a single security-focused engineer. When those first few security engineers come on board, you can bet they'll have a backlog of at least several months' if not a couple years' worth of research and remediation before they get anywhere near taming the company's IAM setup. The vast majority of companies are small and probably aren't where they need to be in terms of IAM practices.
AWS adoption has become a lot more mainstream among companies. 10 years ago it was almost a secret weapon for forward-thinking (and VC-money-burning) companies that let them out-scale competitors. Unfortunately their defaults, documentation, guard rails, etc are still set up for those bleeding-edge, hyper-competent, top-0.01% companies.
Who is supposed to understand this then if not everybody? If you think of this as a "security focused engineer" problem then you've already lost.
If an engineer doesn't understand how to write secure software then they will continue to write insecure software, and there will be leaks and compromises. You're just externalizing risk and costs to your customers.
I'm going to disagree with you on "it's not difficult". I've been using AWS for fifteen years (since S3 came out) and I don't know how to configure any of the things you list there - at least not without doing deep research into every single one of them.
Not saying they're impossible, but they are definitely difficult to figure out!
I imagine they just log in to the console and activate/deactivate them. But that seems like more steps than necessary. Ideally you'd just use aws sts. In short, you request a temporary set of access keys, authenticating the request with both your access keys and MFA. This returns a set of short-lived credentials that you then use. By using a policy attached to the user/role, you make sure that the only thing they can request is a set of temporary access keys, with all kinds of conditionals like TTL, source IP, and MFA.
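A simplified sketch of the AWS-documented "force MFA" pattern: deny everything except `sts:GetSessionToken` until MFA is present. (The full documented policy also carves out the actions a user needs to enroll their own MFA device; this stripped-down version assumes MFA is already set up.)

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "DenyAllExceptSessionTokenWithoutMfa",
      "Effect": "Deny",
      "NotAction": "sts:GetSessionToken",
      "Resource": "*",
      "Condition": {
        "BoolIfExists": { "aws:MultiFactorAuthPresent": "false" }
      }
    }
  ]
}
```

With this attached, the long-lived key alone can do nothing but bootstrap a short-lived, MFA-authenticated session.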
You can quite easily monitor cloudtrail for malicious activity and use cloudwatch to notify you when someone uses your credentials from, say, an unknown IP, outside of working hours, etc.
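For example, with CloudTrail delivering into a CloudWatch Logs group, a metric filter along the lines of the CIS-benchmark pattern for unauthorized API calls can feed an alarm (the alarm and notification wiring are omitted here):

```
{ ($.errorCode = "*UnauthorizedOperation") || ($.errorCode = "AccessDenied*") }
```

Similar filters keyed on `$.sourceIPAddress` or `$.userIdentity` cover the unknown-IP and off-hours cases.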
>> Leakage of credentials sure has led to quite a few compromises.
> No.
> As per my post above, they "led to compromise" because lazy syadmins have been too inept to RTFM and make full use of IAM.
You're contradicting yourself: "No, it hasn't led to compromises" but "they led to compromise because ..."
It's possible that (as you argue) AWS IAM provides strong security controls and, simultaneously, a lot of people use it in a way where credential leak _does_ represent compromise. My going-in expectation is that both of these are true. But this post isn't about whether AWS IAM is any good. Nor does the post talk about solutions -- I don't see a call to action to buy Datadog (aside from the banner in the footer that doesn't look connected to the post). So your criticism of "this is a non-problem -- everybody's just using it wrong" sounds pretty lame.
> So your criticism of "this is a non-problem -- everybody's just using it wrong" sounds pretty lame.
I don't follow your argument.
Perhaps let me try with simpler example.
I'll tell you my credit card PIN number is 7341.
Following your line of argument, you'll stand there screaming until you're blue in the face: "that's a secret credential leak, your card is compromised".
To which I say: "well, my card is in my wallet, and my wallet is in my pocket, so watcha' gonna' do with that PIN number chum ?"
So its the same with AWS.
I could tell you one of my access keys is AKIA3CWKZKKGZLHN and its secret access key is QOF0yG/lvqqAcklAHCDzPKRtk9D5oPnY.
But watcha gonna do with it ? Try logging in ? To what ? And even if you knew which AWS service, even with just low-hanging fruit security layers of IP range restrictions and time-gating you still won't get anywhere. Once we start adding extra layers such as roles and STS on top, then frankly you're more likely to win the jackpot on the lotto twice in a row.
I do get what you're saying. And yes, a credential leak is not _equivalent_ to a compromise. And yes, if you're doing things well, it won't be. (Although I think most people would say one of your layers of defense is breached and one ought to rotate the credentials, right? Otherwise, why have the credentials at all?) I think that's an important, somewhat tangential point.
Maybe we came to this with different assumptions? My going-in assumption is that most people have not set up those extra layers that you're talking about. For them, a credential compromise is tantamount to an account compromise. Thus, a post like this is relevant to raise awareness of credential compromise (and sure, maybe a missed opportunity to talk about those other layers one could add).
Is your going-in assumption that most people _are_ using those extra layers and so the post is pointless because it _erroneously_ implies that a lot of folks might be more exposed than they think? That's not what I thought you were saying. I thought you were saying: "if a credential leak compromises your account, then you're doing it wrong". That might be true, but if there are lots of people doing it wrong, then it doesn't matter.
(I don't like your PIN example because in real life, the initial conditions set by the bank are that you know your PIN and you have your card. You have to go out of your way to expose both. By contrast, with the AWS credentials, you have to have taken the extra steps you mention to establish the extra layers that you're talking about (IP range restrictions, time gating, etc.))
I don’t know why you’re getting downvoted, because you’re completely right. Perhaps people are reading your comment as “credential leakage doesn’t decrease security at all”, but that doesn’t seem to be what you’re saying.
A public/leaked set of IAM credentials, using reasonable AWS access policies, should be considered more secure than a private set of credentials that just has unrestricted access to everything.
People should be working on the assumption that their IAM credentials WILL be leaked and work back from there.
That leak may be accidental on Github, it may be via a hack, you might have accidentally pasted it into an email, it doesn't matter, the mechanism of leak is irrelevant.
Blast radius / Attack surface / Layered security ... whatever your choice of words, that's what you need to do and that's what AWS IAM gives you the tools to do.
As I've previously said, its really not that much effort to add the extra layers of security (e.g. you can start with easy stuff, low-hanging fruit like IP/Time constraints and then move onto more advanced stuff later).
In fact I'd argue the detractors here who have been busy downvoting me and arguing against me could have put a lot of extra security on their IAM credentials in the same timeframe. ;-)
Something I learned a few years back: Never read vendor or conference blogs.
The number one cause of AWS security issues is human error or not understanding what you are doing. AWS provides a hell of a lot of tooling, documentation and guidance to manage those risks out of the box. What doesn't help is buying a vendor product first and assuming it's going to do magic unicorn farts and make everything ok. What it will do is cost you a ton of money and time to tick a box somewhere that seemed like a good idea.
Someone selling a solution to those is selling you snake oil.
All of the issues identified will be picked up by Trusted Advisor or will complain loudly on the IAM dashboard. If you don't notice that or don't use it, then it's your funeral.
I find this to be pretty unhelpful. "Do better" isn't useful or actionable otherwise everyone would already be doing it. "Trusted Advisor/ The Dashboard will tell you" is empirically unhelpful, as demonstrated by the post.
These are all interesting and worth looking at, but I found some metrics weird too. For example:
> 40 percent of organizations have at least one IAM user that has AWS Console access and does not have multi-factor authentication
Yeah, I hope your last resort, non-root account is locked in a safe place and has no multi-factor on it. Statistics are cool, but some context behind the findings is also useful.
Similar story with the active root accounts. If they exist only on paper in a safe, that's fine.
The root user is not an IAM user and probably is not counted among those statistics. Enumerating IAM users won't include the root user, so it depends on how exactly they got that 40% metric
(Root user ARN is arn:aws:iam::555555555555:root versus IAM users which have ARNs like arn:aws:iam::555555555555:user/USERNAME)
One of the authors here - confirming that "40 percent of organizations have at least one IAM user that has AWS Console access and does not have multi-factor authentication" is about IAM users and does not include the root user.
Alternative interpretation: for large customers with thousands of accounts, Datadog is too expensive - so small shops are over-represented in their data.
It surprised me too, but then I wonder if many organizations are only giving DataDog access to one account and that’s how they are pulling their numbers for this
These figures should be interpreted as lower bounds, because organizations using Datadog may not monitor all of their AWS accounts, such as those used for testing or development purposes.
Frankly, I would expect something deeper and more accurate from a large company such as Datadog.
This piece is clearly "content marketing", or even "technical content marketing". The purpose is to spread awareness about Datadog and possibly lure potential clients into looking at their product offerings.
There's nothing wrong with it per se, but I would only take the bait if it provides some really insightful information. This piece doesn't seem to add much value.
An example of something more useful, IMHO, is this report about cloud native threats [0]. (I think it's quite stupid to ask for name, company, etc, in order to be able to download the report. Anyway...) If I'm actively into security, it provides useful information.
As an example, the report cites that "It costs $430,000 in cloud bills and resources for an attacker to generate $8,100 in cryptocurrency revenue.". This information tells me that these attackers can generate a lot of damage, and it suggests I should do something to control the spending in my cloud infra, in case one of these attacks is successful.
Author here - sorry you feel like this is content marketing. I identify myself as a cloud security engineer, so that's a clear antigoal. The intent is to show what's the systematic adoption of cloud security controls _that matter_, based on real-world data breaches. We also published a follow-up post[1] with actionable advice for practitioners on how to turn on some of these mechanisms (such as S3 Public Access Block).
Would love to hear your thoughts on how we can make it "deeper" and "more accurate" in the future!
I think it's problematic that most security features (configurable by the user) are switched off by default and require some effort to set up. This is getting better though, and for S3 buckets specifically a lot of additional features have shown up lately to "solve" this.
Can you give an example? Everything I can think of is either default-deny or minimally scoped, and "system" IAM roles tend to just not exist until you need them for the first time.
Ah yeah for IAM the defaults seem sane, true!
The things that I've come across recently: encryption at rest is usually disabled (EBS, SQS, S3, SNS, Kinesis, CloudWatch log groups, CloudTrail). Some of these have server-side encryption where it's easy to just check the box; others require you to set up KMS keys etc.
Audit logs are in many cases disabled by default (RDS, S3, OpenSearch, ELB).
S3 does not require TLS requests by default.
ECR does not have image scanning enabled by default.
Also, new accounts have almost all regions enabled and a default VPC in each (plus subnets, route tables, security groups, an internet gateway, and a DHCP option set). Keeping unused VPCs around is not recommended, but I suppose it makes onboarding easier.
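On the S3-doesn't-require-TLS point, the fix is the AWS-documented bucket policy pattern that denies any request made without TLS (the bucket name is a placeholder):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "DenyInsecureTransport",
      "Effect": "Deny",
      "Principal": "*",
      "Action": "s3:*",
      "Resource": [
        "arn:aws:s3:::example-bucket",
        "arn:aws:s3:::example-bucket/*"
      ],
      "Condition": {
        "Bool": { "aws:SecureTransport": "false" }
      }
    }
  ]
}
```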
Good points. Totally agree about encryption - I think S3 is a legacy case where SSE-S3 is implemented differently to SSE-KMS, but still I'd be on board with KMS encryption (using an AWS managed key) as the default.
Audit logging costs money, so I'm on the fence about that.
A default VPC is easy to disable in enterprise deployments, but for the rest of us it is necessary to do quick tests with EC2-adjacent services - I'd be in favour of it not existing until you try to launch something though.
Looks like one of the DD directors had a falling out with AWS because they just found out AWS is about to announce a new whizbang service at re:Invent that will eat their lunch.
As someone who frequently criticises AWS, I have to say that if there's one thing it does right, it's security. Over-complex, but way better than classic hosting.
> 40 percent of IAM users have not used their credentials in the past 90 days (access key or console password), affecting 70 percent of organizations.
I am curious where the OP got this information. I cannot imagine AWS sharing it, and the Datadog or GitGuardian study mentioned in the article can't track data for such a large fraction of AWS users either. Just curious...
We have started using wiz.io to detect these and many more problems with AWS infra. It uncovered several misconfigured (open) services and ancient access keys
> 1. Avoid long lived credentials (access key/secret key pairs) and IAM users whenever possible, and instead have everything assume roles for access to resources in the same as well as other accounts. Temporary credentials from STS are the way to go.
> 2. Turn on IMDSv2 by default for EC2 instances. v1 is susceptible to SSRF attacks so even if you’re attaching an instance profile to an instance and doing everything else right, you’ve got a problem if somebody finds a weakness in the app running on the instance.
Also heavily recommend checking out Scott Piper’s SCP Best Practices (https://summitroute.com/blog/2020/03/25/aws_scp_best_practic...) that still hold up even if being a few years old at this point.