What will happen when you commit secrets to a public Git repo? (twitter.com/andrzejdyjak)
133 points by 0xad on Nov 7, 2020 | hide | past | favorite | 64 comments


Cool experiment!

I PM the secret scanning team at GitHub and wanted to mention what GitHub did behind the scenes here. GitHub scans every commit to a public repo for secrets one of our secret scanning partners may have issued. We forward those candidate secrets to the issuing partner, and they take action. In some cases they auto-revoke the secret (AWS normally does this, I believe), in some cases they notify the user, and in some cases the response is configurable.
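The flow described above - match candidate secrets in pushed content, then forward them to the issuing partner - can be sketched as a simple pattern scan. This is an illustrative sketch only; the pattern names and regexes below are assumptions, not GitHub's actual implementation:

```python
import re

# Hypothetical partner patterns -- real ones are more precise and are
# matched against every commit pushed to a public repo.
CANDIDATE_PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "slack_token": re.compile(r"\bxox[baprs]-[0-9A-Za-z-]{10,}\b"),
}

def scan_blob(text):
    """Return (issuer, candidate) pairs to forward to the issuing partner."""
    hits = []
    for issuer, pattern in CANDIDATE_PATTERNS.items():
        for match in pattern.finditer(text):
            hits.append((issuer, match.group(0)))
    return hits

commit = 'aws_key = "AKIAIOSFODNN7EXAMPLE"  # oops'
print(scan_blob(commit))  # [('aws_access_key_id', 'AKIAIOSFODNN7EXAMPLE')]
```

The key point is that GitHub only finds *candidates*; verifying them and taking action is the partner's job.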

I checked that GitHub detects these tokens myself - within 1 second of the commit GitHub had notified AWS and Slack of the leak. AWS and Slack will then have taken action and informed the token owner, which in this case is Thinkst Canary, rather than Andrzej (the OP). I believe AWS normally auto-revokes, but they may have a custom setup with Thinkst Canary's tokens that allows Thinkst to continue monitoring them even once compromised.

Finally, GitHub actually delays the indexing of our search by a couple of seconds to ensure that, for normal cases, our secret scanning partners have time to take action before anyone else can find the tokens.

We're always looking to make secret scanning at GitHub better, so feedback is, as always, welcome. It's also fascinating (and validating!) to see what happens to exposed tokens.

* List of GitHub secret scanning partners: https://docs.github.com/en/free-pro-team@latest/github/admin...

* Thinkst Canary tokens: https://www.canarytokens.org/


Couldn't auto-revocation be used for a "DoS" attack of sorts by generating a lot of randomized tokens and pushing them to any repo?

I realize that the search space is huge for many token types, but it seems viable.


Selecting (reading) data is very fast most of the time, so if no token matches I don't think this will result in a DoS.


I think they meant it would auto-invalidate any tokens that happen to be valid. I think the math on an AWS secret and access key would be ridiculous to brute force, but other types of keys might be an interesting attack vector.


If you were guessing valid tokens why would DoSing be more valuable to you than use of the token?


Bypassing rate limits and amplification, presumably. Just generate a huge list of keys and push once to attack many services at once with the whole set.


What about the birthday paradox, however? i.e. the attacker doesn't need to brute force a specific key, just any key... I guess for AWS the search space is still huge enough for it not to be a problem (but I didn't do the math)


I believe AWS secrets are 240 bits. That is a pretty massive space. I don't know how many active secrets are out there, but I think someone would need to get very lucky to collide before the attack was noticed and stopped.

Other partners' secrets may be more susceptible.

Edit: I did not consider the paired access key which is another 70 or so bits. I think you'd need to collide on both to make someone have a bad day.
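A rough back-of-the-envelope check of the claim above, under the stated assumption of a 240-bit secret space (and a deliberately generous guess of a billion active keys):

```python
from math import log2

# With K active secrets drawn from a space of size 2^bits, each random
# guess hits *some* valid secret with probability ~ K / 2^bits, so the
# expected number of guesses for one hit is 2^bits / K.
def guesses_for_one_expected_hit(bits, active_keys):
    return 2 ** bits // active_keys

# Generous assumption: a billion (~2^30) active AWS secrets.
g = guesses_for_one_expected_hit(240, 2 ** 30)
print(log2(g))  # 210.0 -- still ~210 bits of guessing, hopeless
```

So even the "any key, not a specific key" framing doesn't help the attacker here: the space is far too large relative to the number of live secrets.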


You need to guess both to use them, but you only need to guess the secret to get it revoked. GitHub does not check that the corresponding access key is somewhere in the repo too before taking action. You are right about this being impractical though.


Ah ok. I wasn't sure how that part worked.


seems like you could just log in directly at that rate


As spydium suggests in a sibling comment, I was not referring to overwhelming Github infrastructure but to mass invalidation of guessed tokens.


If you can guess keys and secrets, you probably wouldn't use that power to invalidate them!


I suspect this is relying more on a "birthday paradox" approach. The goal wouldn't be to invalidate a particular secret, but rather that with a relatively small number of randomly generated secrets, you would be taking advantage of this setup to invalidate at least some.


The point still stands: If you could do that, you would use them, not invalidate them.


Trying to use millions of generated tokens is not really feasible. Most services will throttle or block you quickly. Also often you would need to know the permissions the token has to get any access.

Writing millions of generated tokens to a text file and pushing them to GitHub is easy.

There is obviously no meaningful benefit to doing this, except potentially breaking some random deployments until they can replace the keys.


Why not refuse to publish a detected secret at all until the repo owner takes an action to allow it?


There are a few considerations on that one, but one very practical reason is the developer experience of dealing with false positives.

False positives are one of the big problems in secret scanning. Some partners issue credentials with patterns that make them very hard to distinguish from innocuous strings. For example, a Datadog token looks identical to a commit SHA. We would never block developers from pushing commits to GitHub just because they had 40 character hexadecimal strings in them!
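The ambiguity described above is easy to demonstrate: a pattern for a 40-character hexadecimal token matches commit SHAs just as readily, so the pattern alone can never distinguish them - only the issuer can. (The "token" below is a stand-in, not a real key format.)

```python
import re

# A 40-character hex string could be a commit SHA or a credential; the
# regex cannot tell them apart, which is why issuer verification matters.
hex40 = re.compile(r"\b[0-9a-f]{40}\b")

commit_sha = "da39a3ee5e6b4b0d3255bfef95601890afd80709"  # SHA-1 of the empty string
fake_token = "0" * 40  # stand-in for a Datadog-style API key

assert hex40.fullmatch(commit_sha) and hex40.fullmatch(fake_token)
```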

GitHub's partnership approach works around the false positive problem by having the token issuer check whether a token is real and take action only if it is. However, this is a one-way communication from GitHub - the token issuer doesn't need to tell us whether the candidate secret we sent them was real or not, and in most cases we never know.

As a result, we can't replicate the zero false positive experience in a pre-receive hook (i.e., before the commit is pushed to GitHub). There would also be performance considerations from making 30+ http requests as part of a pre-receive hook.

In the future, we are looking at creating a pre-receive hook solution that focuses on patterns that have a very low false positive rate. There are already some open source solutions that do this (links below) - in fact the OP linked to one from his Twitter thread. If/when GitHub offers this, it will definitely be opt-in, rather than opt-out!

* https://github.com/thoughtworks/talisman/

* https://github.com/awslabs/git-secrets
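The low-false-positive approach those tools take can be sketched as a check over a staged diff, restricted to distinctively prefixed key formats. This is a hypothetical sketch, not the logic of Talisman or git-secrets:

```python
import re

# Hypothetical low-false-positive patterns: only keys with distinctive
# issuer prefixes, so ordinary hex strings never trigger a block.
LOW_FP_PATTERNS = [
    re.compile(r"\bAKIA[0-9A-Z]{16}\b"),          # AWS access key ID
    re.compile(r"\bSG\.[0-9A-Za-z._\-]{20,}\b"),  # SendGrid-style API key
]

def diff_has_secret(diff_text):
    """Return True if the diff text appears to contain a prefixed secret."""
    return any(p.search(diff_text) for p in LOW_FP_PATTERNS)

# In a real hook you would feed in the output of `git diff --cached -U0`
# and exit non-zero on a hit; here we just check a sample diff hunk.
sample = '+aws_key = "AKIAIOSFODNN7EXAMPLE"'
print(diff_has_secret(sample))  # True
```

Restricting the hook to prefixed formats is what keeps the developer experience tolerable: a plain 40-hex-character string never blocks a push.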


Cool! Thanks for the explanation.


Think about it in terms of incentives and nudges.


Awesome, thanks for the background information!


Do you also scan when a private repo is changed to public?


I think so, and we 100% should do, but I just did a test and the secret I committed was still working a full minute after I converted the repo. Could be that the scan was in a queue, could be that it didn't trigger.

I'll dig into it and make sure this is working and is fast - it's a critical time to do a full scan of the repo's git history.


So scanning is only done on public repos?


GitGuardian scans on every event, including the public event (when a repo is made public), and will alert if secrets are found within.


Why is secret scanning enabled only for public repos and not for private ones?


Private repos need a different approach, but committing secrets to them can still be a problem.

If a secret is committed to a private repo then anyone with read access to that repo could use it. That might give those users more permissions than they're supposed to have. It's particularly a problem in large organisations, where thousands of developers may have access to a private repo, but should not necessarily have direct access to production infrastructure.

That said, the risk tradeoff when a secret is found in a private repo is different to when one is found in a public repo. If it's a personal private repo that no-one else has access to, the risk may be limited. If it's a corporate repo with hundreds of contributors, someone almost certainly wants to be aware of it. Even then, each organisation will want to respond in different ways, perhaps depending on who has access to the repo, and what access the leaked secret granted.

I'd be remiss not to say that GitHub has a beta offering for private repo secret scanning that we launched in May. It's a paid feature, targeted at large, security-conscious organisations, that scans your git history and each new commit for secrets and displays them in the GitHub UI.



Because it should be OK to commit secrets to private repos - that's why they're _private_, after all, right?


No, that's not why.

If you have secrets, encrypt them.

Private repos can be turned public, intentionally or by mistake. Repos can be exported to give software to third parties. Also, git users clone repos, which means those secrets are copied everywhere. Can you make sure those copies stay private too? Do you make your developers encrypt their laptops or delete repos from them before they leave their house or office?


Also, it's possible that when you have a secret in a private repo, it accidentally leaks when you deploy that repo to a public server. And it's easier to do this than you'd think, e.g. by a mix of a few unrelated changes by different developers.

Also, when an attacker gets access to one private repo by some means, you don't want him to pwn your whole organization.


I suppose you could still XOR your secret S with a random bitstring B, then commit both S^B and B. Am I missing something?
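For what it's worth, here is what that XOR split looks like in practice. The catch it illustrates: this defeats pattern-based scanners, but provides zero confidentiality, because anyone who can read the repo has both halves and can trivially recombine them:

```python
import secrets

secret = b"AKIAIOSFODNN7EXAMPLE"          # example secret S
pad = secrets.token_bytes(len(secret))    # random bitstring B
masked = bytes(a ^ b for a, b in zip(secret, pad))  # S ^ B

# Both `masked` and `pad` would be committed; recombining is one line,
# so this only hides the secret from scanners, not from readers.
recovered = bytes(a ^ b for a, b in zip(masked, pad))
assert recovered == secret
```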


Just use a fucking blog, man. I'm so sick of threads like this.

Edit: I'm sorry, this came off as way more aggressive than I intended. I get why people use twitter to share stuff like this, but it's much harder to archive, find or reference in the future, not to mention it being much less readable than a simple webpage.

To anyone reading this, please consider publishing your findings on a blog as well as on twitter.


Hey, OP here. I agree that a blog post would be more readable. In this particular case I just didn't expect it to catch fire. If I had, I would have spent more time on the form. I won't make that mistake again (i.e., in the future I will use a blog post as the main driver of such a Twitter thread).


Don't worry, post wherever you want. Content existing somewhere is better than someone thinking it may take too much effort and not writing it in the first place. People may have their preferences about publishing platform, but telling someone off for not following that preference is not fair.

Thanks for writing about your experiment.


Thank you!


Or post it on Reddit or Medium or whatever if you can't be bothered with a blog.

Twitter "threads" need to die.


Reddit and Medium are no better in terms of weight and complexity.


old.reddit.com is better than the abomination that is the most recent Twitter redesign. Though new Reddit is truly terrible, probably even worse than Twitter.


> Twitter "threads" need to die.

I would even remove "threads". Sick of all the hate and fakedom.


You can use Nitter to make it a bit more readable and a bit less bloated. https://nitter.net/andrzejdyjak/status/1324360905237372929


It is amazing how fast and effective those bots are.

I remember one time I installed Windows 95/98. I wanted the PC to be on the internet but did not have a firewall for Windows. But I knew the internet address where I could get one.

So after installing Windows I took my chances: connected to the internet, downloaded the firewall asap, installed it - and it was already too late. The PC was compromised within 10 minutes and I had to reinstall it.


Where did the malicious code come from?


Well, not from the firewall - that was highly trusted software, and there were no problems with it.

It was just that there were some holes in Windows that were exploited by bots.



I think secret detection is great overall, but the only times I've run into it are false positives with client side API keys that are by their nature public.

For example, I recently configured something to use the Google calendar API from JavaScript on the client. It's fully safe to check in this key, since it is intended to be run in client-side JavaScript anyway, but I was still nagged about it.


It's a difficult challenge. Secrets detection is probabilistic, without checking the credentials it's nearly impossible to determine, with 100% accuracy, a true vs a false positive. But it has made big improvements. What detection solutions have you been using?


I get automated emails that I didn't sign up for from GitGuardian:

"GitGuardian has detected the following Google Key exposed within your GitHub account."

My understanding was that they could use an API to check whether it was a real key, but perhaps that doesn't say whether it is a client-side or server-side key?


Is there a way (outside Github) that adversaries can get access to the "full feed" of commits? I don't understand how the attackers can find a new key from all the changes that must go into github across millions of repos, within 11 minutes.


The firehose is simply the /events endpoint of GitHub API v3, listing all public events. It's delayed by 5 minutes. Anyone has access (subject to rate limits of course, which is 5,000/hr when authenticated?). You can even have a look at the response in your browser, without any authentication: https://api.github.com/events

Docs:

https://developer.github.com/v3/activity/events/#list-public...

https://docs.github.com/en/free-pro-team@latest/rest/referen...
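A scanner consuming that feed basically pulls the commit SHAs out of each PushEvent and then fetches those commits to scan. The sketch below works on a trimmed sample of the real /events response shape rather than hitting the live endpoint:

```python
import json

# Trimmed shape of one item from https://api.github.com/events.
sample = json.loads("""
[{"type": "PushEvent",
  "repo": {"name": "octocat/hello-world"},
  "payload": {"commits": [{"sha": "abc123", "message": "add config"}]}}]
""")

def push_commits(events):
    """Yield (repo, sha) for every commit in PushEvent items -- the data a
    firehose scanner would fetch and then scan for secrets."""
    for event in events:
        if event["type"] == "PushEvent":
            for commit in event["payload"]["commits"]:
                yield event["repo"]["name"], commit["sha"]

print(list(push_commits(sample)))  # [('octocat/hello-world', 'abc123')]
```

With the feed delayed 5 minutes and paginated, 11 minutes from push to first abuse is entirely plausible for a bot polling this endpoint in a loop.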


There are bots (some even run by security and threat intel companies) feeding off of the firehose. For a public display of one type of scanning functionality, take a look at shhgit[0,1].

0: https://www.shhgit.com/

1: https://github.com/eth0izzle/shhgit


Is the firehose public or do these companies have a relationship with github? If the latter, I assume github doesn't give the firehose feed to attackers who are only looking for AWS keys.


The author of shhgit wrote a great post on how it works using the public GitHub API: https://darkport.co.uk/blog/ahh-shhgit!/


You'll probably get an email from AWS that your account is compromised and you have 5 days to rotate your keys or your account could be terminated.

Then everyday they email you to see if you made any progress rotating the keys.

I made this meme about it that my boss didn't find funny.

https://imgur.com/ZCUu9rr


Yes you will, but only because GitHub already recognised this class of problems and came up with their own solution [1]. Bear in mind that it works only for vendors that integrated, so while it's true for AWS it might not be for your FOO API.

I giggled at the meme.

[1] https://developer.github.com/partnerships/secret-scanning/


In some cases, github will require you to remove the offending file from the commit history - or make the repo private.

e.g. https://github.com/aliostad/deep-learning-lang-detection


This would make a good blog post. Maybe they should consider making one so we can have the article [in an easily readable/shareable/updateable form] after it's deleted from Twitter


OP here. I'm planning to do so, however it will require more work (better description of the problem, wider description of viable solutions, additional case studies). Most probably it will land on Medium and Dev.to.


Couldn't we come up with a standard format for secret keys, that would make it obvious they are a secret and which service they're from? This would make scanners easier to implement, and would remove the requirement to partner with GitHub to get your key format supported.

AWS uses an `AKIA` prefix for access keys (but none for secrets), SendGrid uses an `SG.` prefix on API keys, etc.
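A shared prefix convention would reduce scanning to a lookup table. The registry below is hypothetical (though AWS's `AKIA` and SendGrid's `SG.` are real, and GitHub later adopted exactly this idea with its `ghp_` token prefixes):

```python
# Hypothetical shared prefix registry: any scanner could identify the
# issuer from the prefix alone, no GitHub partnership required.
PREFIX_REGISTRY = {
    "AKIA": "AWS access key ID",
    "SG.": "SendGrid API key",
    "ghp_": "GitHub personal access token",
}

def identify(token):
    """Return the issuing service for a token, or None if unrecognized."""
    for prefix, issuer in PREFIX_REGISTRY.items():
        if token.startswith(prefix):
            return issuer
    return None

print(identify("AKIAIOSFODNN7EXAMPLE"))  # AWS access key ID
```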


Greetings fellow Hackers! OP here. I see that my experiment got some traction which means more awareness should be spread about this class of bugs.

For starters I recommend reading the papers "How Bad Can It Git" [1] and "Detecting and Mitigating Secret-Key Leaks in Source Code Repositories" [2].

After that you can read "How I made $10K in bug bounties from GitHub secret leaks" [3] and some notable reports on HackerOne Hacktivity [4] [5] [6]. The last one is interesting - leaking secrets is not only about the code repository! It's actually about the entire toolset used for software development, hence secret scanning could (should?) be performed in other places such as CI/CD logs or even Slack messages [7].

Anyhow, back to code repositories. GitHub and GitLab both recognized secrets as a problem, so they came up with solutions. If you use GitHub you can easily integrate GitGuardian [8] into your workflow ($$$), but even if you don't, GitHub provides you with the Secret Scanning feature [9] (both are mentioned within the Twitter and HN threads). If you use GitLab you have a Secret Detection feature [10] at your disposal, BUT in order to use it you need to set up Auto DevOps (that's why GitLab didn't alert me in my experiment - I just pushed commits to my public repo but didn't set anything up).

Apart from the built-in solutions provided by GitHub and GitLab, one can use tooling of their own choice. For this I'd recommend two types of solutions: proactive and reactive. For proactive security, as mentioned in the Twitter thread, you can use Talisman [11] as a pre-commit hook. For reactive security you can use GitLeaks [12] (used by GitLab) or similar tools - there are many of them, but one stands out, namely truffleHog [13], which can sniff each and every commit across all branches (also used by GitLab).

What if you already committed a secret to a public repository? Start by revoking it, then continue with this tutorial [14].

gl, hf.

[1] https://www.ndss-symposium.org/ndss-paper/how-bad-can-it-git... [2] https://people.eecs.berkeley.edu/~rohanpadhye/files/key_leak... [3] https://tillsongalloway.com/finding-sensitive-information-on... [4] https://hackerone.com/reports/716292 [5] https://hackerone.com/reports/396467 [6] https://hackerone.com/reports/496937 [7] https://github.com/PaperMtn/slack-watchman [8] https://www.gitguardian.com/ [9] https://developer.github.com/partnerships/secret-scanning/ [10] https://docs.gitlab.com/ee/user/application_security/sast/#s... [11] https://github.com/thoughtworks/talisman [12] https://github.com/zricethezav/gitleaks [13] https://github.com/dxa4481/truffleHog [14] https://docs.github.com/en/free-pro-team@latest/github/authe...


Thanks for sharing. Did you also investigate what they actually did with the keys?


You mean the adversaries? No. For token generation I used https://canarytokens.org/ so the only information I got was about the token being triggered, but not the context in which it was triggered.

BTW, GitHub (apart from GitGuardian) also has a Secret Scanning feature [1] that basically allows the provider to act on the leaked secret. Amazon is integrated and should invalidate the key and inform the owner, but those alerts also went to Thinkst, not me, so I don't know whether it was actually invalidated and alerted on.

[1] https://developer.github.com/partnerships/secret-scanning/


Nice honey pot experiment.


Thanks!



