Microsoft's GitHub account allegedly hacked, 500GB stolen (bleepingcomputer.com)
392 points by badRNG on May 7, 2020 | 114 comments



Sounds sketchy given what the employee from Microsoft commented. The article is also not completely up to date with their “interesting” findings. For example, while a language projection for the Windows Runtime to Rust is interesting, it is also a public repository: https://github.com/microsoft/winrt-rs I’d take this article with a grain of salt until we hear more.


I am pretty sure anything from Microsoft on GitHub is intended to be open source eventually. There's no reason they wouldn't have proprietary projects in their own internal version control systems.


Microsoft had multiple Github Enterprise accounts before the acquisition, owned by multiple teams independently inside of Microsoft. They chose to use these instead of Microsoft's own internal repository (some VSS-based thing I think), which management ordered them to use.

More internal Microsoft code was being hosted by Github instead of by Microsoft.

However, the Microsoft account on Github seems to only be public repositories, or repositories being prepped to be public. Actual internal stuff is hosted by one or more Github Enterprise accounts; the alleged hack does not claim those were hacked as well.


Ah source code at MS. Always amusing to me. VSS [1] (Visual Source Safe) was terrible and I always wondered how "real" companies could use it. From what I heard, MS never did use it, they bought it and sold licenses. That's it.

That being said, Microsoft actually moved Windows to Git [2] years and years ago. Presumably they did the same with everything else. Team Foundation Server (TFS) supports Git, so they probably have the critical stuff on that TFS still, rather than GitHub. Especially since those repos are huge.

[1] https://en.wikipedia.org/wiki/Microsoft_Visual_SourceSafe [2] https://arstechnica.com/information-technology/2017/02/micro...


I left Microsoft around 2013 when the transition to Git was announced.

At the time, most teams (that I encountered) used something called Source Depot. I was told that back in the day, MS licensed the source code from Perforce, forked it, and that became Source Depot. I do know there were some teams using TFS and I'm sure there were some smaller teams using Git too.

I'm not sure how the transition went, since I left before it started, but I would bet it's still at least somewhat ongoing. Some of the people on my team that had been at MS for a while were EXTREMELY skeptical of change.


They moved to Git, from what I can tell as an outsider, as part of the cleanup duty after "Ballmer fucked Microsoft", which had produced a toxic and soon-to-be-fatal culture over there.

Github ended up being the final piece they needed after years of effort. Most of what I said in the earlier comment was what I pieced together from multiple (ex-)Microsoft people who chose to talk about it, over years.

TFS AFAICT started as just a repo manager (in the sense that it does what Github does) for VSS and Perforce, with more modern support for other things, including Microsoft's gargantuan Git monorepo... but a lot of people at Microsoft still like Github Enterprise better.

I imagine given this, whatever TFS does better, Github Enterprise will learn to do over the next few years. Microsoft is big on eating their own dogfood now, but they don't seem to want to do Google silliness where they have 20 products that do the same thing and let them fight it out in FFA arena combat (even Lync is still Lync underneath, it just keeps getting new frontends).


> Lync is still Lync

mmm.. I know what you're saying but the change to Teams from a VOIP perspective substantially changed some of the under-the-hood stuff. They stood up a facade to offer lukewarm integration from SfB/Lync IP phones (authentication and basic calling, but that's about it). AFAIK this wasn't just a breaking change but a re-arch of some of the backend. Point being, they may be evolutions but the move to Teams is not just a new front-end. Substantial evolution has happened since we first installed server stacks to support Lync, even though we still see Lync fingerprints and junk DNA everywhere.


Actually if you are taking a stab at Teams I met a developer who said the lead architect did work on Lync but it is in fact from scratch.


> some VSS-based thing I think

It was Source Depot, a fork of Perforce.

However, IIRC they moved to a Git-based Git Virtual File System for Windows development a few years back.


>Git Virtual File System

How's that coming along, I wonder.


It was released, and was used for the Windows repo. Search "vfs for git" - it's open source.

Then macOS removed kernel extensions, so they came up with a different approach and released that too. Search "scalar" to see it.


It may be useful to note that "scalar" is mostly (albeit not entirely) a git (auto) configuration tool using (now) built-in tools like sparse checkouts and sparse checkout "cones". It shows how many of Microsoft's improvements have made it upstream into git itself. (And it is a stated goal of "scalar" that everything eventually should be.)


Microsoft moved from VSTS Source Control to Git over 3 years ago.

VSS hasn't been used in a decade or more.


Right, I last used VSS in like 2008 or so and it was a pretty tired old tool by then. I mean, it worked, especially in the world it was built for in late 90's (corporate on-premise LAN stuff) - I can't remember any big complaints with it, although I do remember an occasional corruption, but was never f'd over by it. Unsuitable to the distributed and scaled world of today for sure.


In this sense "hacked" doesn't make sense, if everything was intended for open source.


The culprit would now have the ability to merge pull requests and make changes to the open source code's master branch. So if one of these owner accounts was hacked, then yes, this is a hack.


Sure, but isn't the point of git to be able to roll back all changes on any branch? And since GitHub is owned by Microsoft, they themselves would surely be able to regain ownership of the account.

This seems like a non-story.


How much code with malicious modifications has been pulled from these repos since then? There's no evidence this happened, but we can't just handwave it away with "we can undo changes".


Having openpgp-signed commits would prevent such an issue.


Would it?

Couldn't the "hacker" have simply generated a new GPG key and added it to the account he had control of?


The users would have specific keys trusted.
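
A minimal sketch of what pinning specific keys could look like, assuming the commit metadata has been exported with `git log --format='%H %G? %GF'` (commit hash, signature status, signing-key fingerprint). The function name `untrusted_commits` and the fingerprints used with it are hypothetical. The point is only that a key the attacker generates and adds to the hijacked account is not in the locally pinned set, so commits signed with it still get flagged.

```python
from typing import Iterable, List, Set


def untrusted_commits(log_lines: Iterable[str], trusted_fingerprints: Set[str]) -> List[str]:
    """Return hashes of commits that lack a good signature from a pinned key.

    Each line is expected to look like "<hash> <status> <fingerprint>",
    i.e. the output of: git log --format='%H %G? %GF'
    Unsigned commits have status "N" and no fingerprint.
    """
    # Normalise the pinned fingerprints (strip spaces, upper-case).
    trusted = {fp.replace(" ", "").upper() for fp in trusted_fingerprints}
    rejected = []
    for line in log_lines:
        parts = line.split()
        if not parts:
            continue
        commit = parts[0]
        status = parts[1] if len(parts) > 1 else "N"
        fingerprint = parts[2].upper() if len(parts) > 2 else ""
        # Assumption: only status "G" (good signature) is accepted here,
        # and only when the signing key is one of the pinned fingerprints.
        if status != "G" or fingerprint not in trusted:
            rejected.append(commit)
    return rejected
```

This only helps users who actually keep their own list of trusted fingerprints and check it; a "verified" badge shown by the hosting site would not distinguish the attacker's freshly added key.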


I think there are still two concerns here even if everything is intended to eventually become open source.

One, this doesn't address exactly how the repo was compromised. Likely one of the many folks with access had their credentials compromised, but until we know, there may be risk to other projects. Two, as the article mentions, it may not have had all passwords or API keys scrubbed.


Considering Microsoft owns GitHub, there's no reason their internal teams wouldn't be choosing to host more code on there.


They also have a competing service: Azure DevOps (formerly Team Foundation Server, Visual Studio Team System), which somehow has barely anything to do with Azure.

I believe that the stuff they're open sourcing goes on GitHub, while the internal tools go on Azure DevOps. They also have their own VCS (Git alternative) called Team Foundation Version Control (TFVC), though I have no clue why they keep that thing around. They've also had Microsoft Visual SourceSafe, but even they themselves didn't really use that.

On a side note, they really suck at naming things.


> which somehow has barely anything to do with Azure.

This is some weird Microsoft thing. I think Windows 10 development is now handled by the "Azure" org[1]

[1]:https://www.zdnet.com/article/how-microsofts-azure-organizat...


Azure Devops is increasingly based on the same code as Github.

Github Actions and Azure Devops Pipelines, for example, utilize the same infrastructure in my understanding.


It's a lot nicer to use on the GitHub side, and more organized (.github/workflows/foo.yaml vs a yaml in the root of the project for all "Actions")


Azure DevOps does not require a YAML file in the root of the repository either - it's quite common to have them in a 'ci' folder.

While I prefer GitHub Actions in most respects, duplicated setup work and jobs are handled much better by the Azure version currently IMO.


That acquisition happened a blink of an eye ago, in the grand scheme of things. Things don’t change that fast, especially when you have the sheer amount and scope of DevOps and integrations that MS has. With the possible (but even then still unlikely) exception of greenfield projects, there’s basically no chance that anything but open source (or soon to be open source) MS projects were hosted on GitHub.


Not my field, but security would clearly be a greater issue for repos held in a publicly accessible site vs those only accessible over a VPN.

Sure, dogfooding is good, but commercially sensitive repos would seemingly - to my naive view - be better on private servers?


GitHub has an offering you can run on your own servers: GitHub Enterprise Server


They probably had their own internal version control for their IP before they bought GitHub. Can't imagine they suddenly migrated all their stuff to GitHub.


Microsoft is moving everything to GitHub. We were at a MS training center recently for Azure DevOps training and the trainer said they were porting DevOps features without equivalent to GitHub and bringing a few things like YAML to DevOps in preparation for the transition.


> In a directory listing and samples of other private repositories sent to BleepingComputer, the stolen data appears to be mostly code samples, test projects, an eBook, and other generic items.

Other than private keys or sensitive info being left behind, this doesn't appear to be severe. Looks like a nothingburger given the data, until more is released.

> Microsoft employee Sam Smith replied to Under the Breach's tweet stating that he thought the leak was fake as "Msft has a “rule” that GitHub repos must be public within 30 days."

Curious, what does microsoft use internally? Instance of github enterprise? Azure devops?


> Curious, what does microsoft use internally? Instance of github enterprise? Azure devops?

I guess they have internal Git servers, since they develop VFS for Git[1] to handle large numbers of files in Git, but IIRC GitHub doesn't support it yet

[1]https://vfsforgit.org/


Yes. Most teams are using git repos in Azure DevOps. Anything in GitHub is supposed to be made public pretty quickly.


When I was in Azure last year we used an internal Git repo that, I think, was named 'onebranch'. There were only two things we posted to a private github.

First was our description of our JSON apis, which would eventually be sent to a public github.

The second was some internal documentation we had for our internal clients. I don't think there were any 'secrets' in there, but it would be documentation for the internal side for internal clients that no outsider could ever use.


As far as I know, everyone uses azure devops. I don’t know of any teams committing to github, but it would make sense only things that will be open sourced will be placed there.


Yes, basically everything is azure devops internally, spread across numerous tenants for different orgs.


I don't think the potential leak of source code is the worry for Microsoft here, it is the fact that they got access in the first place, as that could have security implications for other projects.


Well if MS can't secure their own code on GitHub, surely anyone with private repos has to be thinking to jump ship (at least until the flaws are identified and fixed)?


When I interned on the Office team in 2014, we used something called Source Depot. I’m assuming it’s changed since then.


Many teams have used the public github.com version. You can see all the .NET platform development happening in the open.


I used Azure DevOps while at NERD in 2019.


I looked at this yesterday. 90% of it is garbage files from a Chinese developer, or projects that have been open-sourced for 3-4 years.

That's not how "leaks" and "hacked" work, the last time I checked.


> This evening, a hacker going by the name Shiny Hunters contacted BleepingComputer to tell us they had hacked into the Microsoft GitHub account, gaining full access to the software giant's 'Private' repositories.

Well, someone asked the other day whether or not private repositories on GitHub were safe: [0] I think you now have a concrete answer regardless if this is true or not. I have already made the case to privately self-host, especially if you're a large enterprise, and preferably on-site [1][2], to avoid these types of attacks and, in the process, to reduce costs, as many were discussing in another HN discussion [3]. But here we are.

If they can do it to Microsoft, they can do it to anyone else who has a GitHub account.

[0] https://news.ycombinator.com/item?id=23057769

[1] https://news.ycombinator.com/item?id=22960579

[2] https://news.ycombinator.com/item?id=22868406

[3] https://news.ycombinator.com/item?id=23089999


> If they can do it to Microsoft, they can do it to anyone else who has a GitHub account.

I don't think this is necessarily true. Microsoft's org, like any large org, has a large number of users with access. Its security is dependent on each one of those many accounts being secure.

A smaller org, or an individual, can secure their repositories much more easily as there are fewer entry points.

They haven't mentioned whether this hack was achieved by compromising individual account credentials, or by compromising the Github platform itself. If it's the latter, you may be right, but I suspect it's more likely the former.


Isn't the upside of hosted platforms like this that they have teams of people securing and monitoring the platform, which can be a bit much for one person who's self-hosting? I do self-host other things but the article doesn't say anything about how the breach might've occurred (e.g. 2FA not enabled?).


That's the SaaS sales mantra repeated. It may or may not be true no matter how appealing the argument is.

Ultimately, your weakest point ends up being humans who are prone to mistakes. You can mitigate some of those mistakes with technology but you can't mitigate all of them. So SaaS may help shore up certain attack vectors but it may increase focus on the remaining vectors and may potentially make failure points more significant (more impact for a security breach from a large provider vs less impact of a security breach from systems of independent providers). Some of that can be mitigated with smart designs, but you lose some advantage of traditional "security through obscurity" which has some value (though it shouldn't be relied on as a failsafe).


The counterargument is that the interests of a SaaS platform like Github are in the ongoing viability of the service, while my interests are only about my data in the service.

Those are only somewhat aligned, as anyone with a dispute about terms of service can tell you.

> which can be a bit much for one person who's self-hosting

If your repo serves one person, why do you need your repo to be hosted in public at all? `git init` and a backup are all you need.


Many consider hosting the repo (privately) on Github etc to be the backup.


> I have already made the case to privately self-host

What makes you think you can do a better job than Microsoft or github?


Could this stop? Every time some heretic evades the "cloud" and self-hosts, people (whose income presumably depends on the "cloud") spread FUD.

Here's the security of the "cloud":

https://arstechnica.com/information-technology/2012/03/hacke...

Why on earth should a maintained server that just runs git over ssh be less secure?


It's an overused and abused argument but it's not a null argument (i.e. just FUD). It has enough validity not to overcorrect the other way. As a general rule, organizations at least need to carefully consider the true cost commitments of providing even near-par level of security with their own internal resources as they could get 'out-of-the-box' from a cloud provider. It's easy for organizations to imagine they will, quite another for most to actually pull it off in an auditable fashion. The minute an org starts opening holes in their firewalls to accommodate remote access or using cloud-based tools for remote access, I start to get skeptical (e.g. how well is that network segregated, anyway?). The sheer volume of internal process and policy dependencies that need to be managed and maintained to "do it right" is a supremely tough burden for SMBs, for instance.


Smaller target?


Exactly. Same reason you shouldn't upload your private keys to a popular, centralized entity.


Private GitHub repositories are private the same way that Facebook messages are private - private from your roommate, not from the people who own the platform or determined attackers.


Would be nice if git could store encrypted data and decrypt files on checkout. Repositories could be truly private that way.


Unfortunate timing[0] but that is pretty much what Keybase had implemented a few years ago[1]

1: https://keybase.io/blog/encrypted-git-for-everyone

0: https://blog.zoom.us/wordpress/2020/05/07/zoom-acquires-keyb...


Nothing stops one from putting encrypted artifacts into a git repo; encryption could be done via hooks. Except this would negate the delta storage: each version would be completely different, and non-diffable.

One can just encrypt the .git folder and wrap the git client to handle the encryption/decryption on use. It's always a question where and how well do you keep the keys.
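
A rough sketch of why that happens, under stated assumptions: `toy_encrypt` is a hypothetical stand-in for a real cipher (a SHA-256 keystream with a fresh random nonce, for illustration only and not for real use), and `delta_size` approximates delta storage by compressing the new version with the old one as a preset zlib dictionary. Git's actual packfile delta format is different; this only illustrates the effect.

```python
import hashlib
import os
import zlib
from typing import Tuple


def toy_encrypt(key: bytes, plaintext: bytes) -> bytes:
    """Toy stream cipher for illustration only. Do not use for real data."""
    nonce = os.urandom(16)  # fresh nonce per call, as real tools use
    stream = bytearray()
    counter = 0
    while len(stream) < len(plaintext):
        block = hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        stream += block
        counter += 1
    body = bytes(p ^ s for p, s in zip(plaintext, stream))
    return nonce + body


def delta_size(old: bytes, new: bytes) -> int:
    """Bytes needed to store `new` when `old` is available as a dictionary."""
    compressor = zlib.compressobj(zdict=old)
    return len(compressor.compress(new) + compressor.flush())


def compare(old: bytes, new: bytes, key: bytes) -> Tuple[int, int]:
    """Return (delta size for plaintext, delta size for encrypted versions)."""
    plain = delta_size(old, new)
    encrypted = delta_size(toy_encrypt(key, old), toy_encrypt(key, new))
    return plain, encrypted
```

With two nearly identical plaintext versions the first number should be tiny, while the second should be roughly the full size of the file, since the two ciphertexts share nothing the compressor can reuse.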


Use a gpg smartcard (yubikey or similar). This is how I store Ansible Vault secrets.

You’re absolutely right about the deltas. Initially I had one secrets file per environment, but as my projects grew I ended up breaking them out to a file per environment-project. Both for storage reasons and because it’s difficult to modify one encrypted file from multiple branches without writing plaintext secrets to disk.


This is leaping to a huge conclusion, but you are correct that if this was a Github data breach, this is clearly a much bigger issue. However, if that was the case, and this "leak" happened on March 28th, and the individual claims to no longer have access to the account, I trust that Github would have proactively communicated with their users about such a large-scale event, especially after having fixed it.

This, if true, is almost definitely a compromise and use of a single user's access credentials, which were then rotated (thus the attacker losing access).

I'm not saying that credential stuffing isn't a large-scale problem (I strongly believe that it is, and have even dedicated time to some potential solutions in the past), but jumping from "someone lost their credentials" to "omgz github can't be trusted!" is a bit of a disingenuous leap.


The most likely explanation is they phished an employee. How does self-hosting prevent that?


> I think you now have a concrete answer regardless if this is true or not.

How do we have a concrete answer if this is not true?


> If they can do it to Microsoft, they can do it to anyone else who has a GitHub account.

It happened to Cisco as well a while back, I have a copy of that source somewhere.


You probably shouldn't readily confess to hoarding stolen property in an online forum.


It's not a crime to possess information (not property) that was illegally obtained, in either the USA or my own country, provided you didn't counsel or encourage the original theft.


The fact that you did not commit the initial theft does not put you in the clear. Source: a lawsuit that I won with exactly that theme.

I don't know where you are but the bulk of the jurisdictions would not look favorably upon you. I agree that your loose interpretation of the law might work out in your favor. But just like downloading copyrighted material is illegal so is downloading copyrighted data from a source that you know does not have the option to legally give you a license to copy or use that data. So from one aspect of the law you are in the clear, from another this is an open-and-shut case of copyright violation and on top of that you will have to work real hard to prove that you weren't the one to steal it in the first place using the 'upload to someplace anonymous, then download it again' trick to whitewash the data.

Some risks are worth taking, this particular one I'd think long and hard about it if the counterparty is the proverbial 800 pound gorilla.


>The fact that you did not commit the initial theft does not put you in the clear. Source: a lawsuit that I won with exactly that theme.

In terms of criminal law, it is favorable to me. Source: I was criminally accused and have a court judgement clearing me of wrongdoing for possessing information that was stolen by 3rd parties and published online before I obtained it.

>So from one aspect of the law you are in the clear, from another this is an open-and-shut case of copyright violation

I am satisfied that merely possessing stolen information and not distributing or profiting from it is not a copyright violation if the source code can even be copyrighted.

>But just like downloading copyrighted material is illegal so is downloading copyrighted data from a source that you know does not have the option to legally give you a license to copy or use that data

I have not agreed to be bound by any licenses from Cisco before downloading the data nor did I necessarily know what it was before downloading a zip from a file sharing site.

I'm happy to discuss this over email if you want me to reach out for debate.


This got me thinking.

How many companies, in terms of Market Cap are currently relying on GitHub Private Repo for their source code?

And how do very large enterprises or financial institutions (which are like the foundation of modern-day society) handle their source code? I presume they won't use GitHub for anything important?


GitHub has an offering where you self host it on your own hardware or in your own cloud.


This sucks :(


So many people were paranoid about Microsoft reading their private source code post-acquisition; turns out it was the other way :)


I think that "leaked" would be a better choice compared to "stolen". You can't "steal" source code, unless you somehow remove the original (such as if the source code is stored on paper in a safe somewhere and there are no copies and someone goes and steals it).


or if during the hack, the code was deleted after copying

regardless, I agree.


So a closed source software company buys an open source tool company, and inadvertently makes closed source open source!

Or, in other words, if you want to keep something private, don't put it in the "cloud"!


Me and a friend were having coffee and were discussing secrets something like 10 years ago. The conclusion of our conversation was "Everything always comes out" (translated from Swedish [context was some gossip that eventually leaked about our common friend]) which boils down to that the only way you can really ensure something stays secret forever, is by only having it in your mind and not sharing it. As soon as you share it _anywhere_, there is a risk of it leaking somewhere.

The lesson I carry is that the more secret something is, the closer to my brain it is. Top-secret = only in my head, little bit secret = encrypted on my harddrive, little less secret = encrypted in the cloud, not secret at all = just dumped in a Google Drive account


Mentally, tell yourself there are some very personal photos in the data.

Then most people think more carefully about where they store it :)


>"Everything always comes out" (translated from Swedish)

"What's done in darkness will come to light" is an oft-used adage in English of Biblical origin. Many variations. I agree with it, too.


Someone else's compu... yeah, yeah. Yawn. Move along.


> Overall, from what was shared, there does not appear to be anything significant for Microsoft to worry about, such as Windows or Office source code.

> In a directory listing and samples of other private repositories sent to BleepingComputer, the stolen data appears to be mostly code samples, test projects, an eBook, and other generic items.


From the article:

> Microsoft employee Sam Smith replied to Under the Breach’s tweet stating that he thought the leak was fake as “Msft has a “rule” that GitHub repos must be public within 30 days.”

Does that mean MS bans the use of GitHub for permanently storing private repos?


We (the TypeScript team) have a bunch of private repos (blog post drafts, planning docs, reproduction repos, rando internal tooling) on GitHub that are many years old. I'm pretty sure that Sam was mistaken here.


To be fair, we had to file for exceptions to the 30 day policy for a lot of those. Not that any of them are terribly important to be private; a private GitHub repo is just a convenient discussion forum for collaboratively composing blog posts and such (change tracking and reviews are so nice). The blog post one, in particular, probably gets a pass because everything composed on it is eventually published in the open anyway (albeit without the discussion and editorial history).


Comment above says their private stuff is on azure devops


Gentle reminder that any private repository is sensitive, because the people pushing to them might not be as careful with what they push, because it's private.

There are hundreds of different kinds of credentials that can be hidden all throughout the history of a Git repo (in code, in logs, in comments, binary blobs, etc). If you don't have a very robust credential scanner operating continuously, and you have a large organization, you probably have active credentials hidden in your private repos.
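
As a very small illustration of what such a scanner does, here is a sketch, assuming the text to scan has already been dumped from history (for example with `git log -p --all`). The three patterns and the names `PATTERNS` and `scan_text` are illustrative assumptions, a tiny subset of what a real scanner would cover, and they will produce both false positives and false negatives.

```python
import re
from typing import List, Tuple

# Illustrative patterns only; real scanners ship hundreds of these.
PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private_key_block": re.compile(
        r"-----BEGIN (?:RSA |EC |DSA |OPENSSH )?PRIVATE KEY-----"
    ),
    "password_assignment": re.compile(
        r"(?i)\b(?:password|passwd|secret)\b\s*[:=]\s*\S+"
    ),
}


def scan_text(text: str) -> List[Tuple[int, str]]:
    """Return (line number, pattern name) for each credential-like match."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                findings.append((lineno, name))
    return findings
```

Running this over the full patch history rather than just the current checkout matters, because a secret that was committed and later deleted is still present in old commits.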


Considering these repos are meant to be made public within 30 days, I'd hope Microsoft employees would be more careful when pushing. Leaving it to a last minute cleanup sounds like a recipe for disaster.


The most impressive thing is that Microsoft has more than 3k projects (with forks) on GitHub as public repos


Not sure if anyone knows, but if you use AWS, you can actually create your own repository there for your organization. It doesn’t have the GitHub UI or features like issues, but it should be a step in the right direction, where your organization owns your private repos.


Having your repo in Amazon's cloud instead of Microsoft's cloud doesn't mean that you "own" it to a greater degree, does it? It's just a different company holding the keys for you.


True. It's just a lot safer because of how AWS has their permissions/access/ACL setup. Not to mention the attack surface is less since you don't expose your repo organization name, repo users, etc like they have it out in open in GitHub.


How's this different from doing the same thing in Azure?


So is there a list somewhere to see if we were affected?


You're not affected. Even if real this is not a breach of user data.


GitHub reliability is not high enough to support the way people are using it. The administration of the site is not predictable enough, the site has frequent downtime, and there are security issues.


500gb is too big, any information about the content?


Plot twist, all the data was open source


Awfully boring things to breach. Not very exciting except for the fact that some employee probably installed nudez.exe.


I feel vindicated for my own insistence that private repos also get scanned and purged of secrets


Epic HTML hacker here


cool.


TL;DR: Microsoft failed to protect its own secrets


if it is in the cloud, it will be eventually hacked


Wouldn't that mean any computer connected to the internet shares that risk?


yeah, that's why in an average on-prem setup you usually would have firewalls, DMZs, IDS/IPS and all the good stuff - and people have been doing on-prem security for decades and accumulated an enormous wealth of knowledge and practice.

In the cloud, it's all new. People are still figuring out how to deploy their software so that it works both for users and developers. That's why on average on-prem is more secure than cloud.


"New"? The cloud is not new. AWS was launched in 2006.

Also firewalls/DMZ/IDS has nothing to do with a SaaS offering from GitHub. That on prem setup you mentioned would be practiced by GitHub.

My opinion of general on-prem security is that it's often haphazard and updates are almost never applied.

But also a lot of on prem security practices hamper developers and users. Because they rely on decades old outdated ways of working.

Also your comment makes no sense as this is likely to be a hack through stolen credentials.


Maybe I'm crazy but I'm not sure all that security makes things more secure than the cloud.

I feel like the hacks I hear about are pretty evenly distributed between cloud and on prem type setups... and most of the big ransomware attacks are almost exclusive to on prem.


Why?


Because cloud security (and cloud configuration in general) is hard. People check sensitive stuff into GitHub repos all the time and misconfigure IAM policies just so that things work (Capital One).

It is a new and ever-changing field; there are many cloud vendors, and their product lines and configurations change all the time. That means it will take a lot of time until the majority of IT specialists become familiar with configuring a secure cloud and the majority of users of those cloud services stop making security mistakes.


That only covers some cases, not the absolute of everything in the cloud eventually being hacked. Plenty of people are competent at secure cloud based systems. Your claim and your support don't match. I expected this to go more in the direction of: cloud providers are such big juicy targets they'll just be infiltrated by advanced persistent threats who in turn gain illicit access to everything hosted.


WE JUST WANT SUNRISE BACK


Microsoft stands to benefit from getting their private code exposed, because they can use it to claim that open source competitors are ripping it off. https://en.wikipedia.org/wiki/ReactOS#Internal_audit https://www.theregister.co.uk/2019/07/03/reactos_windows_res...


That sounds like a fairly weak argument, because IP protections still apply to open source, except for trade secrets. The degree of protection is dictated by a project's license or, in the absence of a license, by a nation's default copyright status.

In your first ReactOS example from 2006 it wasn't even Microsoft that discovered or claimed the problematic code violation. It doesn't look like Microsoft was involved at all.


From your Wikipedia link:

> Also, the 2004 leaked Windows source code was not seen as legal risk for ReactOS, as the trade secret was considered indefensible in court due to broad spread.

This sounds like Microsoft can't benefit from a public leak.



