I PM the secret scanning team at GitHub and wanted to mention what GitHub did behind the scenes here. GitHub scans every commit to a public repo for secrets one of our secret scanning partners may have issued. We forward those candidate secrets to the issuing partner, and they take action. In some cases they auto-revoke the secret (AWS normally does this, I believe), in some cases they notify the user, and in some cases the response is configurable.
I checked that GitHub detects these tokens myself - within 1 second of the commit GitHub had notified AWS and Slack of the leak. AWS and Slack will then have taken action and informed the token owner, which in this case is Thinkst Canary, rather than Andrezj (the OP). I believe AWS normally auto-revoke, but they may have a custom setup with Thinkst Canary's tokens that allows Thinkst to continue to monitor them even once compromised.
Finally, GitHub actually delays the indexing of our search by a couple of seconds to ensure that, for normal cases, our secret scanning partners have time to take action before anyone else can find the tokens.
We're always looking to make secret scanning at GitHub better, so feedback as always welcome. It's also fascinating (and validating!) to see what happens to exposed tokens.
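The scan-and-forward flow described above can be sketched roughly like this. Everything here is a hypothetical illustration: the pattern set, partner names, and grouping logic are my assumptions, not GitHub's actual implementation (the real partner pattern list is not public).

```python
# Hypothetical sketch of the scan-and-forward flow: find candidate secrets
# in commit content, group them by issuing partner, and hand each batch off.
import re
from collections import defaultdict

# Illustrative patterns only; the real partner pattern set is private.
PARTNER_PATTERNS = {
    "aws": re.compile(r"AKIA[0-9A-Z]{16}"),          # AWS access key ID prefix
    "slack": re.compile(r"xox[baprs]-[0-9A-Za-z-]{10,}"),  # Slack token prefix
}

def candidates_by_partner(commit_text):
    """Group candidate secrets found in commit content by issuing partner."""
    found = defaultdict(list)
    for partner, pattern in PARTNER_PATTERNS.items():
        found[partner].extend(pattern.findall(commit_text))
    return dict(found)

# Each batch would then be sent to that partner's verification endpoint;
# the partner validates the token and revokes or notifies as it sees fit.
```

Note that the scanner never needs to know whether a candidate is real; that decision stays with the partner, which is what keeps false positives invisible to the user.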
I think they meant it would autoinvalidate the tokens which might be valid.
I think the math on an AWS secret and access key would make them ridiculous to brute force, but other types of keys might be an interesting attack vector.
Bypassing rate limits and amplification, presumably. Just generate a huge list of keys and push once to attack many services at once with the whole set.
What about the birthday paradox, though? I.e. the attacker doesn't need to brute force a specific key, just any key... I'd guess that for AWS the search space is still large enough for this not to be a problem (but I didn't do the math).
I believe AWS secrets are 240 bits. That is a pretty massive space. I don't know how many active secrets are out there, but I think someone would need to get very lucky to collide before the attack was noticed and stopped.
Other partners' secrets may be more susceptible.
Edit: I did not consider the paired access key which is another 70 or so bits. I think you'd need to collide on both to make someone have a bad day.
You need to guess both to use them, but you only need to guess the secret to get it revoked. GitHub does not check that the corresponding access key is somewhere in the repo too before taking action. You are right about this being impractical though.
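To put rough numbers on the birthday argument above: even granting the attacker an absurd number of guesses against every live secret at once, the 240-bit space makes a collision hopeless. The counts of active secrets and guesses below are made-up assumptions for illustration.

```python
from math import log2

SECRET_BITS = 240       # approximate entropy of an AWS secret access key
ACTIVE_SECRETS = 10**9  # assumed number of live secrets worldwide (a guess)
GUESSES = 10**12        # assumed number of random keys an attacker tries

# With n guesses against m independent targets, the chance of at least one
# hit is roughly n * m / 2^bits while that value is small.
p = GUESSES * ACTIVE_SECRETS / 2**SECRET_BITS
print(f"hit probability ≈ 2^{log2(p):.0f}")  # ≈ 2^-170, i.e. never
```

The paired access key ID only makes things worse for the attacker; the secret alone is already far out of reach.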
I suspect this is relying more on a "birthday paradox" approach. The goal wouldn't be to invalidate a particular secret, but rather that with a relatively small number of randomly generated secrets, you would be taking advantage of this setup to invalidate at least some.
Trying to use millions of generated tokens is not really feasible.
Most services will throttle or block you quickly. Also often you would need to know the permissions the token has to get any access.
Writing millions of generated tokens to a text file and pushing them to GitHub is easy.
There is obviously no meaningful benefit to doing this, except potentially breaking some random deployments until they can replace the keys.
There are a few considerations on that one, but one very practical reason is the developer experience of dealing with false positives.
False positives are one of the big problems in secret scanning. Some partners issue credentials with patterns that make them very hard to distinguish from innocuous strings. For example, a Datadog token looks identical to a commit SHA. We would never block developers from pushing commits to GitHub just because they had 40 character hexadecimal strings in them!
GitHub's partnership approach works around the false positive problem by having the token issuer check whether a token is real and take action only if it is. However, this is a one-way communication from GitHub - the token issuer doesn't need to tell us whether the candidate secret we sent them was real or not, and in most cases we never know.
As a result, we can't replicate the zero false positive experience in a pre-receive hook (i.e., before the commit is pushed to GitHub). There would also be performance considerations from making 30+ http requests as part of a pre-receive hook.
In the future, we're looking at creating a pre-receive hook solution that focuses on patterns with a very low false positive rate. There are already some open source solutions that do this (links below) - in fact the OP linked to one from his Twitter thread. If/when GitHub offers this, it will definitely be opt-in, rather than opt-out!
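The false positive problem described above can be illustrated with two regexes (both patterns are illustrative, not GitHub's actual rules): a bare 40-hex-character pattern matches every git commit SHA, so it could never be used to block pushes, while a prefixed format like AWS's AKIA is distinctive enough to act on.

```python
import re

# A Datadog-style token and a git commit SHA are both 40 hex characters:
hex40 = re.compile(r"\b[0-9a-f]{40}\b")
commit_sha = "2fd4e1c67a2d28fced849ee1bb76e7391b93eb12"
assert hex40.search(commit_sha)  # blocking on this would break every push

# A prefixed format has a very low false positive rate:
aws_key_id = re.compile(r"\bAKIA[0-9A-Z]{16}\b")
assert not aws_key_id.search(commit_sha)
assert aws_key_id.search("aws_access_key_id = AKIAIOSFODNN7EXAMPLE")
```

This is why a pre-receive hook restricted to low-false-positive patterns is feasible where a general one is not.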
I think so, and we 100% should, but I just did a test and the secret I committed was still working a full minute after I converted the repo. Could be that the scan was in a queue, could be that it didn't trigger.
I'll dig into it and make sure this works and is fast - converting a repo to public is a critical moment to do a full scan of its git history.
Private repos need a different approach, but committing secrets to them can still be a problem.
If a secret is committed to a private repo then anyone with read access to that repo could use it. That might give those users more permissions than they're supposed to have. It's particularly a problem in large organisations, where thousands of developers may have access to a private repo, but should not necessarily have direct access to production infrastructure.
That said, the risk tradeoff when a secret is found in a private repo is different to when one is found in a public repo. If it's a personal private repo that no-one else has access to, the risk may be limited. If it's a corporate repo with hundreds of contributors, someone almost certainly wants to be aware of it. Even then, each organisation will want to respond in different ways, perhaps depending on who has access to the repo, and what access the leaked secret granted.
I'd be remiss not to say that GitHub has a beta offering for private repo secret scanning that we launched in May. It's a paid feature, targeted at large, security-conscious organisations, that scans your git history and each new commit for secrets and displays them in the GitHub UI.
Private repos can be turned public, intentionally or by mistake. Repos can be exported to give software to third parties. Also, git users clone repos, which means those secrets are copied everywhere. Can you make sure those stay private too? Do you make your developers encrypt their laptops or delete repos from them before they leave their house or office?
Also, it's possible that when you have a secret in a private repo, it accidentally leaks when you deploy that repo to a public server. And it's easier to do this than you'd think, e.g. by a mix of a few unrelated changes by different developers.
Also, when an attacker gets access to one private repo by some means, you don't want them to pwn your whole organization.
Just use a fucking blog, man. I'm so sick of threads like this.
Edit: I'm sorry, this came off as way more aggressive than I intended. I get why people use twitter to share stuff like this, but it's much harder to archive, find or reference in the future, not to mention it being much less readable than a simple webpage.
To anyone reading this, please consider publishing your findings on a blog as well as on twitter.
Hey, OP here. I agree that a blog post would be more readable. In this particular case I just didn't expect it to catch fire. If I had, I would have spent more time on the form. I won't make that mistake again (i.e. in the future I'll use a blog post as the main driver of such a Twitter thread).
Don't worry, post wherever you want. Content existing somewhere is better than someone thinking it may take too much effort and not writing it in the first place. People may have their preferences about publishing platform, but telling someone off for not following that preference is not fair.
old.reddit.com is better than the abomination that is the most recent Twitter redesign. Though new Reddit is truly terrible, probably even worse than Twitter.
It is amazing how fast and effective those bots are.
I remember one time I installed Windows 95/98. I wanted the PC to be on internet but did not have a firewall for Windows. But I knew the internet address where I could get one.
So after installing Windows I took my chances, connected to the internet, downloaded the firewall asap, installed it, and was already too late. The PC was compromised within 10 minutes and I had to reinstall it.
I think secret detection is great overall, but the only times I've run into it are false positives with client side API keys that are by their nature public.
For example, I recently configured something to use the Google calendar API from JavaScript on the client. It's fully safe to check in this key, since it is intended to be run in client-side JavaScript anyway, but I was still nagged about it.
It's a difficult challenge. Secret detection is probabilistic; without checking the credentials, it's nearly impossible to distinguish, with 100% accuracy, a true positive from a false one. But it has made big improvements. What detection solutions have you been using?
I get automated emails that I didn't sign up for from GitGuardian:
"GitGuardian has detected the following Google Key exposed within your GitHub account."
My understanding was that they could use an API to check whether it was a real key, but perhaps that doesn't say whether it is a client-side or server-side key?
Is there a way (outside Github) that adversaries can get access to the "full feed" of commits? I don't understand how the attackers can find a new key from all the changes that must go into github across millions of repos, within 11 minutes.
The firehose is simply the /events endpoint of GitHub API v3 of all public events. It's delayed by 5 minutes. Anyone has access (subject to rate limits of course, which is 5000/hr when authenticated?). You can even have a look at the response in your browser, without any authentication: https://api.github.com/events
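A minimal way to peek at the firehose from code. The /events endpoint and the PushEvent type are real GitHub API v3 concepts; the helper functions are my own sketch, not an official client.

```python
import json
import urllib.request

def fetch_public_events(token=None):
    """Fetch one page of GitHub's public events firehose."""
    headers = {"Accept": "application/vnd.github+json",
               "User-Agent": "firehose-demo"}
    if token:
        # Authenticated requests get a much higher rate limit.
        headers["Authorization"] = f"Bearer {token}"
    req = urllib.request.Request("https://api.github.com/events",
                                 headers=headers)
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

def push_event_repos(events):
    """Extract repo names from the PushEvents in one page of results."""
    return [e["repo"]["name"] for e in events if e["type"] == "PushEvent"]
```

A scanning bot would poll this, fetch the commits behind each PushEvent, and grep them for key patterns - which is all the "11 minutes to compromise" result requires.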
There are bots (some even run by security and threat intel companies) feeding off of the firehose. For a public display of one type of scanning functionality, take a look at shhgit[0,1].
Is the firehose public or do these companies have a relationship with github? If the latter, I assume github doesn't give the firehose feed to attackers who are only looking for AWS keys.
Yes you will, but only because GitHub already recognised this class of problems and came up with their own solution [1]. Bear in mind that it works only for vendors that integrated, so while it's true for AWS it might not be for your FOO API.
This would make a good blog post. Maybe they should consider making one so we can have the article [in an easily readable/shareable/updateable form] after it's deleted from Twitter
OP here. I'm planning to do so, however it will require more work (better description of the problem, wider description of viable solutions, additional case studies). Most probably it will land on Medium and Dev.to.
Couldn't we come up with a standard format for secret keys, that would make it obvious they are a secret and which service they're from? This would make scanners easier to implement, and would remove the requirement to partner with GitHub to get your key format supported.
AWS uses an `AKIA` prefix for access keys (but none for secrets), SendGrid uses an `SG.` prefix on API keys, etc.
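With a prefix convention, identifying the issuer becomes a simple lookup rather than a partnership integration. The AKIA and SG. prefixes are real; ghp_ is the prefix GitHub itself later adopted for personal access tokens; the mapping table as a whole is just an illustration.

```python
# Illustrative prefix-to-issuer table; real scanners track many more.
KNOWN_PREFIXES = {
    "AKIA": "AWS access key ID",
    "SG.": "SendGrid API key",
    "ghp_": "GitHub personal access token",
}

def identify_issuer(candidate):
    """Return the issuing service for a prefixed token, or None."""
    for prefix, service in KNOWN_PREFIXES.items():
        if candidate.startswith(prefix):
            return service
    return None
```

A prefix also eliminates the commit-SHA ambiguity problem: an unprefixed 40-hex token is indistinguishable from ordinary repo content, but a prefixed one never is.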
Greetings fellow Hackers! OP here. I see that my experiment got some traction which means more awareness should be spread about this class of bugs.
For starters I recommend reading the "How Bad Can It Git" [1] and "Detecting and Mitigating Secret-Key Leaks in Source Code Repositories" [2] papers.
After that you can read "How I made $10K in bug bounties from GitHub secret leaks" [3] and some notable reports on HackerOne Hacktivity [4] [5] and [6]. This last one is interesting - leaking secrets is not only about the code repository! Actually it's about the entire toolset used for software development, hence secret scanning could (should?) be performed in other places such as CI/CD logs or even Slack messages [7].
Anyhow, back to code repositories. GitHub and GitLab both recognized secrets as a problem, so they came up with solutions. If you use GitHub you can easily integrate GitGuardian [8] into your workflow ($$$), but even if you don't, GitHub provides you with the Secret Scanning feature [9] (both are mentioned within the Twitter and HN threads). If you use GitLab you have a Secret Detection feature [10] at your disposal, BUT in order to use it you need to set up Auto DevOps (that's why in my experiment GitLab didn't alert me - I just pushed commits to my public repo but didn't set up anything).
Apart from built-in solutions provided by GitHub and GitLab, one can use tooling of their own choice. For this I'd recommend two types of solutions: proactive and reactive. For proactive security, as mentioned in the Twitter thread, you can use Talisman [11] as pre-commit hook. For reactive security you can use GitLeaks [12] (used by GitLab) or similar tools - there are many of them but one stands out, namely truffleHog [13] which can sniff each and every commit across all branches (also used by GitLab).
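A proactive check along the lines Talisman takes can be sketched in a few lines. The patterns below are illustrative only; a real tool checks many more, plus entropy. Installed as .git/hooks/pre-commit (and made executable), a script like this scans the staged diff and aborts the commit on a hit.

```python
# Minimal pre-commit secret check, in the spirit of Talisman:
# scan the staged diff and reject the commit if anything matches.
import re
import subprocess

PATTERNS = [
    re.compile(r"AKIA[0-9A-Z]{16}"),                          # AWS access key ID
    re.compile(r"-----BEGIN (?:RSA |EC )?PRIVATE KEY-----"),  # private key header
]

def find_secrets(diff_text):
    """Return all pattern matches found in a unified diff."""
    return [m.group(0) for p in PATTERNS for m in p.finditer(diff_text)]

def staged_diff():
    """The staged changes, as `git diff --cached` prints them."""
    return subprocess.run(["git", "diff", "--cached"],
                          capture_output=True, text=True).stdout

# In the actual hook script, a non-zero exit aborts the commit:
#   sys.exit(1 if find_secrets(staged_diff()) else 0)
```

The reactive tools do the mirror image of this: instead of the staged diff, they walk every commit on every branch after the fact.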
What if you already committed a secret to the public repository? Start with revoking it, and continue with this tutorial [14].
You mean adversaries? No. For token generation I used https://canarytokens.org/ so the only information I got was about the token being triggered, but not the context in which it was triggered.
BTW. GitHub (apart from GitGuardian) also has Secret Scanning feature [1] that basically allows the provider to act on the leaked secret. Amazon is integrated and it should invalidate and inform the owner but this also went to Thinkst, not me, so I don't know if it was actually invalidated and alerted.
* List of GitHub secret scanning partners: https://docs.github.com/en/free-pro-team@latest/github/admin...
* Thinkst Canary tokens: https://www.canarytokens.org/