Insert remark on why we use a centralized service for a distributed source contr...

jasode · on July 31, 2017

>Insert remark on why we use a centralized service for a distributed source control system,

Because Linus didn't put features into Git that Github solves.

You must break apart the different features of Github:

#1) communication (issue tracking, bug reports, pull requests, README.md landing page, etc.)

#2) hosting disk storage & bandwidth

#3) distributed source code merges based on content hashes (SHA1) instead of the centralized lock/unlock (check in / check out) model of CVS/SVN.

Git itself only takes care of #3.

Github handles #1 and #2 (and also gets #3 by being built on top of Git).

You can't go back in time and wonder if Linus should have addressed #1 and #2 because he wasn't interested in starting a hosting company. Instead, he focused on the data format (Merkle trees, BLOBs, SHA1) and a sync protocol (git pull, etc) for Git.
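To make the "content hashes" part concrete, here is a minimal sketch of how Git derives a blob's object ID in a SHA-1 repository: it hashes a short header (`blob`, the byte length, a NUL byte) followed by the file content. The function name `git_blob_id` is made up for illustration and is not part of Git.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Return the ID Git gives a blob with this content (SHA-1 object format)."""
    # Git prepends "blob <size>\0" to the raw bytes before hashing.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


if __name__ == "__main__":
    # The empty blob has the well-known ID e69de29bb2d1d6434b8b29ae775ad8c2e48c5391.
    print(git_blob_id(b""))
```

Because the ID depends only on the content, two clones that hold the same file agree on its name without asking a central server, which is what lets the merge model work without locks.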

If people wonder why we can't just use email for #1 (communications), you have to see that Github has become a "Schelling Point"[1]. Attempting to use email groups & mailing lists will not prevent the emergence of a Schelling point. Email can be a workflow for existing contributors (e.g. contributors of the Linux kernel source) but it's not convenient for discovery of new repositories (e.g. the web's "landing page" of a repo).

As for #2 (hosting), not everybody who wants to share a repository wants to pay for a $9.99/month VPS or another hosting plan from a web hosting provider. It would also be inconvenient to host it from the home laptop and punch a hole through the ISP router to make it work. Github solves hosting+bandwidth for free for modest non-commercial projects.

To restate, Linus' Git is a distributed _protocol_ but Github is a _service_ acting as a platform for the distributed protocol.

[1] https://en.wikipedia.org/wiki/Focal_point_(game_theory)

DaiPlusPlus · on July 31, 2017

#1 can be done by creating a separate submodule repo that only stores docs+issues files. It's up to the repo's users to agree on a system by which the files should be organized, but it's doable.

I'd propose directories "issues/open", "issues/closed", with each issue filename being "{created:yyyy-MM-dd} - {subject}.md". Symlinks could be used to track ownership/responsibility if each repo contributor has their own directory in the repo too.
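A rough sketch of that naming scheme, assuming issues are plain Markdown files inside the repo. The helper names `issue_path` and `create_issue`, and the rule for replacing path separators in the subject, are assumptions for illustration only.

```python
from datetime import date
from pathlib import Path


def issue_path(root: Path, created: date, subject: str, state: str = "open") -> Path:
    """Build issues/<state>/<yyyy-MM-dd> - <subject>.md under the repo root."""
    # Assumption: path separators in the subject are replaced so that the
    # file always stays inside the state directory.
    safe_subject = subject.replace("/", "-").replace("\\", "-").strip()
    return root / "issues" / state / f"{created.isoformat()} - {safe_subject}.md"


def create_issue(root: Path, created: date, subject: str, body: str) -> Path:
    """Write a new open issue as a Markdown file and return its path."""
    path = issue_path(root, created, subject)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(f"# {subject}\n\n{body}\n", encoding="utf-8")
    return path
```

Committing the resulting file is then an ordinary `git add` and `git commit` in the docs+issues repo.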

apetresc · on July 31, 2017

That's an awful lot of extra work to insure against maybe 1 hour per year of downtime.

sleepybrett · on July 31, 2017

Just because present state is 99.7579% uptime (for the month) doesn't mean it will always be so.

You back up your data, why shouldn't you back up your github data?
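For scale, the uptime figure can be turned into hours of downtime. This is a small sketch that assumes the percentage covers a 31-day month (July); `downtime_hours` is a made-up helper name.

```python
def downtime_hours(uptime_percent: float, days: int) -> float:
    """Convert an uptime percentage over a period of `days` into hours of downtime."""
    return (1 - uptime_percent / 100) * days * 24


if __name__ == "__main__":
    # Assumption: the 99.7579% figure is measured over a 31-day month.
    print(round(downtime_hours(99.7579, 31), 2))
```

Under that assumption, 99.7579% works out to roughly 1.8 hours of downtime in the month.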

manigandham · on July 31, 2017

Backup is one thing, choosing to run a crappy manual system just in case a vendor goes down is entirely different.

DaiPlusPlus · on July 31, 2017

It doesn't have to be a "crappy manual system" - I'm simply suggesting that given that git itself is a damned good distributed versioning database for arbitrary content, then we might as well also use it for distributed issue-tracking. A simple offline-mode browser-based editor that lives in a single HTML file within the repo would provide a nice GUI on top.

Hmm, I think I might be on to something... anyone want to start a project?

jasode · on July 31, 2017

>git itself is a damned good distributed versioning database for arbitrary content, then we might as well also use it for distributed issue-tracking.

For what it's worth, it's interesting to see that the Fossil distributed SCM includes an issue tracker but they made a deliberate architecture decision to not propagate the tickets data.[1] They had a chance to make your "distributed-issues-tracking" idea a 1st-class concept in Fossil but decided against it.

Also, the issues/tickets feature is just one example. Github will continue to evolve to add more and more sophisticated SDL/ALM (application lifecycle management) features, like JIRA and Microsoft Team Foundation Server. Those features are not easy to implement in a peer-to-peer SCM with practical usability.

[1] https://fossil-scm.org/xfer/doc/trunk/www/qandc.wiki

DaiPlusPlus · on Aug 1, 2017

Thank you for the link. I read through their justifications and I think using a git-submodule solves both of their problems: polluting the main project history and the permissions issue. Using directories for mutually-exclusive state grouping (e.g. "closed"/"open"/"new") solves the directory problem.
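A minimal sketch of the directory-as-state idea, assuming one directory per state under a single issues root. The names `STATES`, `current_state`, and `move_issue` are hypothetical; the point is only that an issue file lives in exactly one state directory at a time.

```python
from pathlib import Path
from typing import Optional

# Assumed state names, taken from the example in the comment above.
STATES = ("new", "open", "closed")


def current_state(issues_root: Path, filename: str) -> Optional[str]:
    """Return the state directory that holds the issue, or None if it is absent."""
    found = [s for s in STATES if (issues_root / s / filename).exists()]
    if len(found) > 1:
        # The same file in two state directories breaks mutual exclusion.
        raise ValueError(f"{filename} is in several states: {found}")
    return found[0] if found else None


def move_issue(issues_root: Path, filename: str, new_state: str) -> Path:
    """Change an issue's state by moving its file to another state directory."""
    if new_state not in STATES:
        raise ValueError(f"unknown state: {new_state}")
    old_state = current_state(issues_root, filename)
    if old_state is None:
        raise FileNotFoundError(filename)
    target = issues_root / new_state / filename
    if old_state != new_state:
        target.parent.mkdir(parents=True, exist_ok=True)
        (issues_root / old_state / filename).rename(target)
    return target
```

Since a state change is a file rename, it shows up in history as a move of the issue file rather than an edit to some shared index.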

Goladus · on July 31, 2017

The reason is github did a fantastic job of implementing useful features. The visual design is unmatched and they have done a great job implementing developer oriented integrations and social features.

A more federated approach to this sort of thing might have been nice, but so far nothing I have seen comes close to the value-add offered by github.

xutopia · on July 31, 2017

You're sidestepping the main reason I believe it worked so well. It benefits from the network effect. It is a collaborative tool and people like to have their work on there so others can collaborate with them.

Goladus · on July 31, 2017

That's important but I think people over-estimate it. It's both. I predict that if you analyzed the github network, you'd find many hubs are based around companies that chose to move their workflows to github based on features other than network effects. Or at least, the existing network was only one of many reasons.

manigandham · on July 31, 2017

As a business, we (and most other companies I know) chose github for features and performance. Nice that other open source stuff is there but it doesn't matter for what we pay for.

n-gauge · on July 31, 2017

Totally agree - though I wish they would show files > 2 MB in the web editor.

I develop directly on github - I even make all my commits to the Master branch, as this allows me to code with a nexus 7 tablet if necessary.

So this outage was a PITA for me. However I have plan B and started up tomcat...

Saved the day!

P.S. Thanks github for the free hosting! I can't really complain.

_skel · on July 31, 2017

Lots of people care, but we also recognize that the advantages of using a centralized system outweigh the disadvantages for many use cases.

tambourine_man · on July 31, 2017

Lots of answers so I'll try to address them all here. It was a rhetorical question. We know what GitHub offers. I'm not a fan of its UI, particularly on mobile, but that's beside the point.

The point is: why did we fail, once more, to have a distributed solution, even when the underlying tech assumes it?

Email was the last widely successful distributed medium. And it's dying, unfortunately.

Of course centralized services are easier to implement and use. Doesn't mean we should settle.

manigandham · on July 31, 2017

We didn't fail, nobody built it (probably for lack of demand).

tambourine_man · on July 31, 2017

Nobody built it = failure to provide a solution

manigandham · on Aug 4, 2017

Ok, that's not really the same thing but either way, what's the point of having a decentralized project management system?

Code is one thing: having the entire history locally clearly has many benefits, but it pushes more work towards the merging stage. When it comes to issues and discussions, it's often much easier to have a single source of truth without worrying about merge conflicts.

And the issue with github being down isn't an issue of centralization as much as it is about availability of a service. You're free to use github enterprise or gitlab and host the service yourself if you feel you'll get better reliability and performance. However, I'm pretty sure you won't beat github's overall reliability and performance without a significant investment of time and resources.

Perhaps having a simple read-only offline cache of the latest project management state is a good middle-ground for most of the problem and it shouldn't be that hard to do - but again that's up to how much demand there really is for it.
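One way such a read-only cache could look, as a sketch only: try the remote service first, save a snapshot on success, and fall back to the last snapshot when the service is unreachable. `refresh_cache` and the `fetch_issues` callable are hypothetical; a real fetcher would call whatever API the hosting service provides.

```python
import json
import time
from pathlib import Path
from typing import Callable, List


def refresh_cache(fetch_issues: Callable[[], List[dict]], cache_file: Path) -> List[dict]:
    """Return the latest issues, falling back to the local snapshot when offline."""
    try:
        issues = fetch_issues()
    except OSError:
        # Network errors (ConnectionError, urllib's URLError, ...) derive from
        # OSError. Serve the last snapshot if one exists, otherwise re-raise.
        if cache_file.exists():
            return json.loads(cache_file.read_text(encoding="utf-8"))["issues"]
        raise
    snapshot = {"fetched_at": time.time(), "issues": issues}
    cache_file.write_text(json.dumps(snapshot, indent=2), encoding="utf-8")
    return issues
```

The cache is read-only by design: nothing edited offline is pushed back, so there are no merge conflicts to resolve and the hosted tracker stays the single source of truth.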

tambourine_man · on Aug 6, 2017

I'll quote myself for emphasis: Of course centralized services are easier to implement and use. Doesn't mean we should settle.

oliv__ · on July 31, 2017

That's it. I'm starting a github on blockchain.

Rjevski · on July 31, 2017

Count me in when you launch an ICO.

tambourine_man · on July 31, 2017

I wish you weren't kidding.

s73ver · on July 31, 2017

Because most things are easier when you can have one canonical source of truth.