GitHub forking has one big flaw (2011)

cryptica · on June 26, 2015

I think it's only fair that the original repo should be the most promoted one. Usually, the original author has put a lot of thought and effort into coming up with the idea and turning it into a popular open source project.

You don't want to create an environment in which forkers can easily steal credit from the original author(s). It takes a lot of passion and goodwill for someone to start a new open source project. I think they deserve some credit.

If an 'owner' no longer feels up to the task of managing their project, GitHub lets them transfer it to someone else. That's what happened with ExpressJS and it worked out fine.

tessierashpool · on June 26, 2015

> I think it's only fair that the original repo should be the most promoted one. Usually, the original author has put a lot of thought and effort into coming up with the idea and turning it into a popular open source project.

OP brings up the example of a project where the original repo shouldn't be the most promoted one, because it's been abandoned, while plenty of other repos are still alive. I blogged about the same thing earlier this month and used the same example.

if you haven't encountered this problem, you will. it is absolutely a problem every developer will encounter. it's only a matter of time.

somebody starts a great project, doesn't have time to keep it alive, and the community fractures because GitHub has no way to differentiate between "original repo" and "canonical repo."

not going to rant about this because I already did in a blog post: https://www.pandastrike.com/posts/20150610-thought-experimen...

> If an 'owner' no longer feels up to the task of managing their project, GitHub lets them transfer it to someone else. That's what happened with ExpressJS and it worked out fine.

this is a ludicrous statement. the Express.js transfer of ownership was a ridiculous fiasco full of angry drama, hurt feelings, and core developers resigning from the project.

documented here: http://gilesbowkett.blogspot.com/2014/07/the-bizarre-bazaar-...

also, the idea that you can solve this problem by having the original owner transfer ownership doesn't make any sense. the whole problem is that the original owner isn't paying attention at all, doesn't care in the first place, and wouldn't know who to transfer ownership to, if they did care.

it happens all the time.

cryptica · on June 26, 2015

>> this is a ludicrous statement. the Express.js transfer of ownership was a ridiculous fiasco full of angry drama, hurt feelings

True, I should rephrase; from the consumer's point of view it turned out fine :p The project is still healthy.

About ExpressJS, I did read something about one of the main developers not even being aware that the transfer was happening until the last minute. I also heard that there might have been money involved and it probably wasn't a fair process. There are a lot of ethical dilemmas there. A transfer of ownership doesn't have to be this nasty though.

omribahumi · on June 26, 2015

When do forks become a new project?

Would Xorg still be a fork of XFree86? Would Ubuntu still be a fork of Debian?

Also, sometimes a project is abandoned and a fork is still maintained. Don't you think that the fork should be in the spotlight in that scenario?

chris_wot · on June 26, 2015

Do you consider LibreOffice its own project!

McElroy · on June 26, 2015

I for one do. They've diverged heavily since forking from OpenOffice. LibreOffice is a great project, I use it myself and I recommend it to all my friends and family.

chris_wot · on June 27, 2015

That was meant to be ended with a question mark. Sorry.

tacone · on June 26, 2015

A root repository pretty much always has a big advantage, and it's usually the one with the most stars and watchers. If you were to sort everything only on the basis of popularity, not much would change, except perhaps a few notable exceptions. It would just be more fair.

z3t4 · on June 26, 2015

> You don't want to create an environment in which forkers can easily steal credit from the original author(s).

In most licenses I've read on Github, even the very permissive ones, the creator will still have copyright of all forks, even if all his/her code has been replaced.

tenfingers · on June 26, 2015

Also, of note, is that currently if you "fork" using git and push it into a github repository, there's no way to re-attach/hint github about the original ancestor.

The problem is compounded by the fact that doing anything related to the ancestor, such as pull requests, or even just diffs, will not be possible.

I submitted a feature request to the github folks years ago, but nothing has really happened (I was just advised to delete and fork the repository again).

Not that it's hard: you could determine the ancestor and different lineages just using the hashes of the commits upon the first push to github. You could also do it completely offline, it wouldn't matter.
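A minimal sketch of that hash-based check, done completely offline. It assumes two repositories are related if they share a root commit (a commit with no parents); the repository names `original`, `copy`, and `unrelated` are throwaway examples created just for the demonstration, and GitHub is not known to do anything like this. Shallow clones may not contain the true root commit, so they would need a different comparison.

```shell
#!/bin/sh
# Sketch: decide whether two repositories share history by comparing
# their root commits. All repository names here are hypothetical.
set -e

# Identity for the throwaway commits (placeholder values).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)

# "original" stands in for the upstream project.
git init -q "$work/original"
git -C "$work/original" commit -q --allow-empty -m "initial commit"

# "copy" is a plain clone with one extra commit: a fork GitHub knows nothing about.
git clone -q "$work/original" "$work/copy"
git -C "$work/copy" commit -q --allow-empty -m "local change"

# "unrelated" starts from its own, different root commit.
git init -q "$work/unrelated"
git -C "$work/unrelated" commit -q --allow-empty -m "something else"

# Print the root commits (commits without parents) reachable from HEAD.
roots() {
    git -C "$1" rev-list --max-parents=0 HEAD
}

# Succeed when the two repositories have at least one root commit in common.
shares_ancestry() {
    for root in $(roots "$1"); do
        if roots "$2" | grep -qxF "$root"; then
            return 0
        fi
    done
    return 1
}

if shares_ancestry "$work/original" "$work/copy"; then
    linked_copy=yes
else
    linked_copy=no
fi

if shares_ancestry "$work/original" "$work/unrelated"; then
    linked_unrelated=yes
else
    linked_unrelated=no
fi

echo "copy shares history with original: $linked_copy"
echo "unrelated shares history with original: $linked_unrelated"
```

Comparing root commits is only one heuristic; `git merge-base` between fetched branches would also work once both histories are in the same repository.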

There's also quite a number of forks available on github which aren't really visible because of that. I know that for some of my own projects and smaller projects that I checked, a code search would actually reveal many non-linked repositories. And I also know why: I often don't fork on github (why would I if I know nothing about the project yet?), I just shallow clone locally. Forking on github doesn't serve any purpose until you actually change the code, which oftentimes has already been done locally.

kpcyrd · on June 26, 2015

    git clone git@github.com:original/repo.git
    # make some commits
    git remote rename origin upstream
    # fork the repo on github
    git remote add origin git@github.com:your/repo.git
    git fetch origin
    git push -u origin master

tenfingers · on June 26, 2015

Sure it's possible, however the only way to do it afterwards is to delete the repository and fork it manually again. Deleting the repository is only fine if there are no issue/wiki/data associated with it.

The main problem is not omitting the "fork the repo on github" part voluntarily. Sometimes you're just not aware that you're pushing a repository which already has some ancestor on github, while you originally cloned from the main author's website instead. This has happened to me countless times.

The graph network is totally useless in these cases.

zimbatm · on June 26, 2015

This would be perfectly fine if all the project data was stored into git. Unfortunately github issues, comments, project metadata, webhook setup, team membership, ... are not part of the repository.

TazeTSchnitzel · on June 26, 2015

Not to mention past and outstanding pull requests.

nathankleyn · on June 26, 2015

Amusingly, Bitbucket has since removed a lot of their useful fork information after a redesign that took place between now and this article's publish date (2011) [1].

One approach to this problem, as the article mentions, is to list by popularity - however what would this mean? If it's by the number of "stars", not many people curate their lists to keep them up to date. It would have to be some kind of rolling popularity measure, perhaps number of unique users who've cloned a repository in the last month or something.

[1]: https://bitbucket.org/site/master/issue/5009/list-of-project...

qznc · on June 26, 2015

It affects only one part of the rant, but I wonder why Github considers it necessary to publicly fork a project. Often, I want to push a single fix. I would like to

    git clone git@github.com:foo/bar.git
    # fix locally
    git commit
    git push  # creates pull request

msandford · on June 26, 2015

God that would be awesome, wouldn't it? I've got a few one-change repos cluttering up my GitHub account. I don't do anything with them now that the one feature I needed is back upstream. Should I delete them? I don't know.

jvanier · on June 26, 2015

Sure, delete it! The pull request contains all the information needed about the actual change.

fapjacks · on June 26, 2015

Good to know! That is extremely helpful.

ghthor · on June 26, 2015

While that would be convenient, I have no idea how it would work. You're asking to push to something you don't have rights to modify.

Git evolved with a pull workflow because the problem it was made to solve was the pull workflow of Linus and the kernel. This inherently means you must self-host your changes while they're being reviewed and accepted.

carussell · on June 26, 2015

... which GitHub totally messes up, by the way. You have to go ahead and create a (superfluous) on-GitHub fork to file a pull request. Not a problem if the maintainers know how to use git and are willing to pull from you without using the GitHub UI, but there are tons of people whose only exposure to git is through GitHub and stops there.

As I wrote to an acquaintance earlier this week while venting about GitHub (and the condescending remarks you're liable to get from people who equate it with git and will assume that a tendency to stay off the former means you're unfamiliar with the latter):

"Coming from a background where wiki pages would be hosted on wikis and submitting [code] changes for review is as simple as a) creating a patch and b) attaching it for review, as I look at all the unnecessary (>3x) overhead that GitHub imposes and all the people who don't have a problem with it and feel that it's good and proper and normal, I feel like I'm in crazytown."

Further reading: Mozillians' comments on Gregory Szorc's post "Please Stop Using MQ"[1]. Pay particular attention to everything that Gijs has to say.

1. http://gregoryszorc.com/blog/2014/06/23/please-stop-using-mq...
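For anyone who has only seen the GitHub flow, here is a rough sketch of the "create a patch, attach it for review" workflow using plain git. The repositories `upstream` and `contrib` and the `outbox` directory are made-up local stand-ins; in practice the patch file would travel by email or as a bug tracker attachment rather than through a shared temp directory.

```shell
#!/bin/sh
# Sketch: contribute a change as a patch file, with no hosted fork involved.
# All paths and names are hypothetical placeholders.
set -e

export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)

# The maintainer's repository.
git init -q "$work/upstream"
git -C "$work/upstream" symbolic-ref HEAD refs/heads/main
echo "hello" > "$work/upstream/README"
git -C "$work/upstream" add README
git -C "$work/upstream" commit -q -m "initial commit"

# The contributor clones it and commits a fix locally.
git clone -q "$work/upstream" "$work/contrib"
echo "world" >> "$work/contrib/README"
git -C "$work/contrib" commit -q -am "README: add a second line"

# a) Create a patch from the most recent commit.
git -C "$work/contrib" format-patch -1 -o "$work/outbox" > /dev/null
patch_file=$(ls "$work/outbox"/*.patch)

# b) The maintainer reviews the file and applies it, keeping author and message.
git -C "$work/upstream" am -q "$patch_file"

applied_subject=$(git -C "$work/upstream" log -1 --format=%s)
echo "upstream now has: $applied_subject"
```

`git format-patch` and `git am` keep the original author and commit message intact, which is what makes this usable as a review workflow.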

rquirk · on June 26, 2015

It sounds like how Gerrit works, from a user's POV at least. You can push to a repo that you don't really have write permission to, and it goes into Gerrit. The post-push scripts create a sort of branch-tag thingy from master with your commits on, and so when the Gerrit review passes ("pull request accepted") the change is merged/rebased/cherry-picked onto the latest stuff. If the review is rejected then the temporary branch is dropped and that's that. Since all the review items in Gerrit are just git references, you can use all the usual git commands on them (pull or fetch it, then merge) if you know the Gerrit tag, but since they are strange branchless things they are not pulled down by default in a normal clone.

It's harder to explain than to use actually. Ah! there's a bit of a wrinkle with gerrit in that it uses a local hook to insert an ID into commits, so rebasing or cherry picking knows which commit to reference. But that might be optional, it'd be like cherry-picking a pull request, I think github doesn't close the original in that case? Not sure on that though.

qznc · on June 26, 2015

You can selectively deny and modify pushes. For example, gitolite features read/write permissions per branch. You could allow unknown pushers to create new branches but never modify existing ones. UI wise you want to mark those rogue branches and filter them out on most occasions. For example, just have them in a "sub-dir".

It should feel like the inverse of the "check out pull requests locally" trick. https://help.github.com/articles/checking-out-pull-requests-...
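For reference, a sketch of the "check out pull requests locally" direction. GitHub exposes each pull request under `refs/pull/<id>/head`, so `git fetch origin pull/<id>/head:<branch>` brings one down. To keep this runnable without a network, the `server` repository below is a local stand-in and the pull request ref is faked with `git update-ref`; the ID `1` and the branch name `pr-1` are arbitrary.

```shell
#!/bin/sh
# Sketch: fetch a pull request ref into a local branch.
# "server" imitates a GitHub-hosted repository; names are hypothetical.
set -e

export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)

git init -q "$work/server"
git -C "$work/server" symbolic-ref HEAD refs/heads/main
git -C "$work/server" commit -q --allow-empty -m "initial commit"

# Fake pull request #1: a commit reachable only from refs/pull/1/head.
base=$(git -C "$work/server" rev-parse HEAD)
tree=$(git -C "$work/server" rev-parse 'HEAD^{tree}')
pr_commit=$(git -C "$work/server" commit-tree "$tree" -p "$base" -m "proposed fix")
git -C "$work/server" update-ref refs/pull/1/head "$pr_commit"

# A normal clone only brings branches and tags, not refs/pull/*.
git clone -q "$work/server" "$work/local"

# Fetch the pull request head explicitly into a local branch.
git -C "$work/local" fetch -q origin pull/1/head:pr-1

fetched=$(git -C "$work/local" rev-parse pr-1)
fetched_subject=$(git -C "$work/local" log -1 --format=%s pr-1)
echo "pr-1 points at: $fetched_subject"
```

The inverse idea from the comment above would be a server that accepts pushes into a similar quarantined ref namespace; that part is not something this sketch implements.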

singularity2001 · on June 26, 2015

    git clone git@github.com:foo/bar.git
    # fix locally
    git commit
    git push  # creates pull request

THAT ... would increase git user engagement by a factor of x>>0

caboteria · on June 26, 2015

Github can make some changes as administrative actions even though there's no UI to do it. For example, I have a project that was originally a fork but became the upstream when the fork went offline. Github was able to "break" the link from my fork to the dead upstream so mine became the upstream.

It's not as easy as DIY but it's just a support request.

IshKebab · on June 26, 2015

I've noticed this too. Loads of github projects have dozens of forks with identical Readmes. The only way to work out the difference between them is to look at the commit messages on the Network tab.
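If you'd rather not click through the Network tab, something like the following works from the command line: add the fork as a second remote and list what it has that the root doesn't. This is only a sketch; `root`, `fork`, `mine`, and the branch name `main` are placeholders built locally for the demo, and with real projects the remote URLs would point at GitHub.

```shell
#!/bin/sh
# Sketch: see how a fork differs from the root repository.
# Repositories and branch names are hypothetical placeholders.
set -e

export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)

# The root repository.
git init -q "$work/root"
git -C "$work/root" symbolic-ref HEAD refs/heads/main
git -C "$work/root" commit -q --allow-empty -m "initial commit"

# A fork with one extra commit.
git clone -q "$work/root" "$work/fork"
git -C "$work/fork" commit -q --allow-empty -m "fix crash on empty input"

# Your own clone of the root, with the fork added as a second remote.
git clone -q "$work/root" "$work/mine"
git -C "$work/mine" remote add fork "$work/fork"
git -C "$work/mine" fetch -q fork

# Commits the fork has that the root lacks, and the reverse count.
only_in_fork=$(git -C "$work/mine" log --oneline origin/main..fork/main)
ahead=$(git -C "$work/mine" rev-list --count origin/main..fork/main)
behind=$(git -C "$work/mine" rev-list --count fork/main..origin/main)

echo "fork is $ahead ahead, $behind behind"
echo "$only_in_fork"
```

A fork that is zero commits ahead is one of the identical-Readme copies and can be ignored.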

musically_ut · on June 26, 2015

I wrote a Chrome/Firefox extension to address the problem of finding "notable" forks of original repos. These are often community-supported versions of the original which would have been hard to find otherwise: Lovely Forks ~ https://github.com/musically-ut/github-forks-addon

ryanbrunner · on June 26, 2015

I agree with the idea that "not all forks are considered equal" insofar as GitHub should do a better job of surfacing notable forks, rather than a fork that fixes a small environmental issue specific to a single person, or ones that don't make notable changes at all.

In terms of not elevating the root to special status, I disagree. Recognizing one particular repository as canonical is a feature, not a bug. As much as git itself doesn't place any special significance on a particular repository, the culture of open source development does. Linus Torvalds can certainly say that his Linux repository doesn't hold special status, but that's just not true beyond a technical level. It's useful to be able to say "this is the main, supported version of this library"

MereInterest · on June 26, 2015

It is good to have a canonical version, but the canonical version and the root version may be entirely different. The author brought up this point, talking about a project that he had started, which was later forked. That fork has many more features, and should be considered the canonical version, but isn't.

lsaferite · on June 26, 2015

So, what are your choices to elect the canonical one then?

First project pushed is canonical and can pass the baton?

Seems better to drop the idea of a root or canonical version totally. Linking forked projects together with hashes for the network graph seems like a good idea personally.

amelius · on June 26, 2015

> I want to add a feature or fix a bug but I want to share the changes upstream or with other people that may find it actually useful. I fixed a bug for my own environment. These changes may break everyone else and so nobody should probably use this fork. I want to lock a version of a project away in a safe place that I know won’t change or break later and may use it as a point to send changes back up later. (This partially due to some design issues with git submodules.) I want to experiment. My changes are probably interesting but not ready for primetime, but if it works out it may become something fruitful.

Isn't that what "branches" are for?

IshKebab · on June 26, 2015

You can't make a branch on a project you don't have write access to.

btown · on June 26, 2015

Solution: Just make sure your fork has better SEO than the root! Work hard at promoting your fork, have blogs link to you as the best thing since sliced bread, etc. After all, since SEO is all about grassroots efforts, it's not at all like you're kowtowing to a central entity's policies and trying to work around their ridiculous restrictions. Because that would be counter to the distributed nature of Git and FOSS in general, right?

Oh, wait.

/s