GitHub has 11,995,200 open issues

holman · on April 27, 2016

Fun to see this up here. Originally when I built this there was some discussion on whether it made any sense to show a global view; most people are just going to use the issues dashboard to look at their own issues, obviously. Honestly, I just thought it was cool to have filters across the whole site, so I left it in (which was only an option given how quick it was to calculate and return these results in Elasticsearch — that's also part of the reason the numbers are fluctuating a bit, as some have pointed out here).

Still wish more people knew about this dashboard view into Issues. Even though it's now a prominent link in the header, I don't think the page got to be something I was really happy with — most of the work was done in the final week before we shipped Issues, so it was somewhat an afterthought. There's a ton of power in there, but it's hidden away behind an arcane syntax that I, the creator of the damn thing, can't really remember at this point, two years later, ha. Still dig the overall motivation behind the page, though!

willvarfar · on April 27, 2016

Issues could be integrated much better with forks.

If I have a fork of something, I should see not just issues people post in my repo, but issues people post in other forks.

These are differentiated visually, and perhaps don't trigger notifications.

When someone fixes the issue in another fork, I should see a 'patch pending' kind of thing, and get a notification.

robbiemitchell · on April 27, 2016

This might be another reason why Holman thinks branches are better than forks.

cf. https://twitter.com/holman/status/661365207143333888, https://twitter.com/holman/status/661357354827448321

pavel_lishin · on April 27, 2016

He seems to think that this is only the case for organizations, not open source projects: https://twitter.com/holman/status/661384740927242240

tanepiper · on April 27, 2016

The real question here is not why this is here, but why "explore" was removed (I'm aware I can just go to /explore; however, if the link is still somewhere, it's not clear where). Github is still hard for project discovery.

parhamn · on April 27, 2016

Yeah they hid it under your user icon when you log in, which is bizarre since exploring is one of the best ways to be 'social' on github.

nathan-muir · on April 27, 2016

> Still wish more people knew about this dashboard view into Issues

I only discovered the dashboard recently after using github for years!

Having the overview was actually the missing piece in getting control of my workload & priorities. So thank you!

shoyer · on April 27, 2016

Sorted by +1s: https://github.com/issues?q=is%3Aopen+is%3Aissue+sort%3Areac...

Here are the top three:

1. Contribution graph can be harmful to contributors (https://github.com/isaacs/github/issues/627)

2. proposal: generic programming facilities (https://github.com/golang/go/issues/15292)

3. Proper tabs for open files (https://github.com/Microsoft/vscode/issues/224)

Can't say I'm terribly surprised!

limelight · on April 27, 2016

Keep in mind that this sort has a heavy bias towards newer issues (since +1s are new).

Here are the most commented/contentious issues: https://github.com/issues?q=is%3Aopen+is%3Aissue+sort%3Acomm...

ohitsdom · on April 27, 2016

What's going on with all the repeated comments for those top ones? I'm not a frequent user of Github issues, is it something automated perhaps gone wrong?

limelight · on April 27, 2016

It looks like some projects have infrastructure set up to automatically log runtime errors to GitHub issues.

ekns · on April 27, 2016

Funny how it says there's 479830 pages worth of results and even has a link to that "last page".

I checked and in practice it "only" returns the first 400 pages of results.

g_sch · on April 27, 2016

Isn't this because deep pagination is really costly? I've seen other production systems (especially using elasticsearch) where this limit is in place.

From Elastic:

>Deep Paging in Distributed Systems

>To understand why deep paging is problematic, let’s imagine that we are searching within a single index with five primary shards. When we request the first page of results (results 1 to 10), each shard produces its own top 10 results and returns them to the coordinating node, which then sorts all 50 results in order to select the overall top 10.

>Now imagine that we ask for page 1,000—results 10,001 to 10,010. Everything works in the same way except that each shard has to produce its top 10,010 results. The coordinating node then sorts through all 50,050 results and discards 50,040 of them!

>You can see that, in a distributed system, the cost of sorting results grows exponentially the deeper we page. There is a good reason that web search engines don’t return more than 1,000 results for any query.

https://www.elastic.co/guide/en/elasticsearch/guide/current/...
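The arithmetic in the quoted passage can be reproduced with a small sketch. This is only a model of the description above, not Elasticsearch code; the function name and the `offset`/`size`/`shards` parameters are made up for illustration (they loosely mirror the `from` and `size` search parameters).

```python
def deep_paging_cost(offset: int, size: int = 10, shards: int = 5) -> dict:
    """Count the results handled to serve one page, following the quoted description."""
    # Each shard has to produce its own top (offset + size) results.
    per_shard = offset + size
    # The coordinating node receives and sorts the candidates from every shard.
    sorted_by_coordinator = per_shard * shards
    # Only one page is returned; everything else is discarded.
    discarded = sorted_by_coordinator - size
    return {
        "per_shard": per_shard,
        "sorted": sorted_by_coordinator,
        "discarded": discarded,
    }


first_page = deep_paging_cost(offset=0)       # results 1 to 10
deep_page = deep_paging_cost(offset=10_000)   # results 10,001 to 10,010
print(first_page)
print(deep_page)
```

In this simple model the number of results the coordinating node sorts grows in proportion to the offset and the shard count, which is why a hard cap on page depth is a common safeguard.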

stephengillie · on April 27, 2016

That sounds like an issue. I wonder if someone's reported it...

smonte · on April 27, 2016

One more error to be reported in Danny Tuppeny's post from today http://blog.dantup.com/2016/04/have-software-developers-give...

ashmud · on April 27, 2016

The primary product I work on limits (on-screen) results to 10k results (after any filtering) regardless of page size. Aside: However, this was based on our most complicated queries at the time, which have since been simplified.

gyhchang · on April 27, 2016

The first link 404's for me, is it working for everyone else?

_lflx · on April 27, 2016

I had to login and then click the link.

kuschku · on April 27, 2016

And another case of HTTP Error codes being misused. Kinda annoying by now.

s4chin · on April 27, 2016

All links work for me. I think you need to login.

siddhant · on April 27, 2016

Probably off topic, but how do you get such a sort to work nicely (sorting on count from another table), without creating redundant data (in this case, maintaining the +1 count in the main issue table as well as maintaining a separate table for +1s), or just delegating the sort to an external search service?

takno · on April 27, 2016

There's no obvious reason why it would be particularly slow to get counts from another table. You're probably going to want to offer full text searches and searches on columns you wouldn't choose to index online as well though, at which point using a search service makes most sense
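As a minimal sketch of getting the counts from another table, here is the join-and-count approach in SQLite. The `issues` and `reactions` tables, their columns, and the sample rows are all invented for illustration; nothing here reflects GitHub's actual schema.

```python
import sqlite3

# Hypothetical schema: one row per issue, one row per +1 reaction.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE issues (id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE reactions (issue_id INTEGER REFERENCES issues(id), username TEXT);
CREATE INDEX reactions_issue_idx ON reactions(issue_id);
INSERT INTO issues VALUES (1, 'first'), (2, 'second'), (3, 'third');
INSERT INTO reactions VALUES (2, 'a'), (2, 'b'), (2, 'c'), (1, 'a');
""")


def issues_by_plus_ones(connection):
    """Return issues sorted by reaction count, without storing a redundant counter."""
    return connection.execute("""
        SELECT i.id, i.title, COUNT(r.issue_id) AS plus_ones
        FROM issues AS i
        LEFT JOIN reactions AS r ON r.issue_id = i.id  -- keep issues with zero +1s
        GROUP BY i.id
        ORDER BY plus_ones DESC, i.id ASC
    """).fetchall()


rows = issues_by_plus_ones(conn)
print(rows)
```

The index on `reactions(issue_id)` is what keeps the per-issue count cheap; whether that holds up at very large scale is a separate question from the one asked here.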

scrollaway · on April 27, 2016

It also seems impossible to get beyond page 400 on either of those results.

https://github.com/issues?page=400&q=is%3Aopen+is%3Aissue&ut...

lunactic · on April 27, 2016

Fun that the first one is also the one with the most -1s

jank66 · on April 27, 2016

6.6 million open issues were created by GoogleCodeExporter. https://github.com/issues?q=is%3Aissue+is%3Aopen+author%3AGo...

nightpool · on April 27, 2016

Wow, that's kinda crazy. Maybe the title should be "over half of open issues on Github created by GoogleCodeExporter" :P

BtM909 · on April 27, 2016

It makes sense that this will only grow if people will export their (remote) repositories to Github. ;-)

butu5 · on April 27, 2016

Even though there are many open issues on GitHub, how can someone with little development experience, or a newbie, start contributing?

When asked this question, many suggest first using a particular piece of code in your own project and then contributing to that project by raising issues or fixing them. As a beginner, people may start with very popular frameworks like Ruby on Rails or Node.js. Considering their complexity and maturity, it's extremely difficult, if not impossible, to start contributing.

I am thinking that, somewhere down the line, some form of hand-holding or mentorship is needed, where a mentor gives small tasks, offers tips or advice, reviews the first pull request, etc. This would definitely boost contributions to open source projects.

There may be several people providing mentorship, but I feel it's not structured: how does a newbie know that someone willing to help exists? The only way I can think of now is to spam lots of people randomly by looking at their GitHub profiles.

Please suggest how to encourage new developers to contribute more to open source and help close the open issues.

asimuvPR · on April 27, 2016

It's hard for experienced people, too. The issue is more about the lack of structure in some open source projects and the time availability to teach "noobs" a codebase. One thing that has worked for me in the past is to join the development mailing list, try and understand what they are talking about and go look at the code to try and figure out the issue. Then trace back all the discussion to try and find if any of my questions/suggestions have been proposed. If not then I make a very simple case for the solution. If yes then I keep quiet and only comment when things need clarification. Slowly you will pick up the project and be able to contribute.

If the project lacks any kind of communication channels and is hosted on some online repo then by all means open an issue and ask about contributing. Make sure to ask which issues are the most important to fix and which are the smaller but most annoying ones. Offer to document the project too.

It's not easy but it is fulfilling once you get underway.

innerspirit · on April 27, 2016

http://up-for-grabs.net/ attempts to make contributing easier for new developers but it still falls flat IMO, there are few really bite-sized issues you can tackle and even those are going to require you to read a lot of project code and discussion to figure them out.

50CNT · on April 27, 2016

Django has a django-core-mentorship mailing list[0] for people interested in starting to contribute, a guide on contributing[1] and a selection of issues tagged as easy-pickings[2] that are suitable for beginners to work on.

I haven't personally tried it, but I did think it was cool when I stumbled over it.

[0][https://docs.djangoproject.com/en/dev/internals/mailing-list...] [1][https://docs.djangoproject.com/en/dev/internals/contributing...] [2][https://code.djangoproject.com/query?status=!closed&easy=1]

jsmeaton · on April 27, 2016

I don't think that list is very active unfortunately. Also the easy pickings list has been mostly completed which doesn't leave a whole lot of room for newbies to contribute.

Funnily enough, having Tim (a paid contributor, also Core dev) do so much of the community work means there is less low hanging fruit for new contributors to get stuck in to.

lewispollard · on April 27, 2016

IRC is often your best bet. If you find a project you'd like to contribute to, see if they have an IRC channel, there will always be regulars there who have a lot of experience with the projects and will almost definitely have advice to give to beginners wanting to contribute.

peruvian · on April 27, 2016

Node is extremely friendly to new developers and has labels for "good for beginner" issues as well as a community very passionate about helping others. You should give it a try before giving up.

BTW, contributions can mean documentation or website markup. You probably won't fix a major bug right off the bat.

johndoe90 · on April 27, 2016

I wonder if one day GitHub will announce the World's Issue Closing Day. The day every programmer will try hard to close their issues. Though, isn't it what we do every day?

labster · on April 27, 2016

GitHub didn't announce it, but I'm a fan of Bit Rot Thursday: http://blogs.perl.org/users/zoffix_znet/2016/01/bit-rot-thur...

pavel_lishin · on April 27, 2016

Tech Debt Thursday would be a little more alliterative.

labster · on April 27, 2016

/t/, /d/, and /θ/ are all pretty close, but I've never thought of it as alliteration. But I suppose it does count. Thanks for expanding my literary toolbox.

daw___ · on April 27, 2016

55,272 of those are marked as "help wanted" -- feeling bored?

https://github.com/issues?utf8=&q=is%3Aopen+is%3Aissue+label...

pavel_lishin · on April 27, 2016

Can this be filtered by primary project language? I'm not going to be much help to anyone whose project is mostly C or Ruby.

r3bl · on April 27, 2016

Just add something like language:Python and you're good to go. :)

Mikushi · on April 27, 2016

I am indeed a bit bored. How do you filter by "help wanted"? It's not too clear on this interface.

daw___ · on April 27, 2016

You can use the "label" filter:

https://github.com/issues?utf8=&q=is%3Aopen+is%3Aissue+label...

Remember to wrap multi-word labels in quotes.
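For anyone building these URLs by hand, here is a rough sketch of assembling the query from the qualifiers mentioned in this thread (`is:open`, `is:issue`, a quoted `label:`, and `language:`). The helper name is made up; only the standard library is used.

```python
from urllib.parse import urlencode


def issues_search_url(*qualifiers: str) -> str:
    """Join search qualifiers with spaces and URL-encode them into the q parameter."""
    query = " ".join(qualifiers)
    return "https://github.com/issues?" + urlencode({"q": query})


url = issues_search_url(
    "is:open",
    "is:issue",
    'label:"help wanted"',  # multi-word label, so it is wrapped in quotes
    "language:Python",
)
print(url)
```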

kristopolous · on April 27, 2016

Hah, after this commit, it'll make it 11,995,199 baby!

Frenchgeek · on April 27, 2016

Until someone smarter than you discovers it actually makes it 11,995,205 at least...

sleepychu · on April 27, 2016

They could be closing an issue.

mryan · on April 27, 2016

I think Frenchgeek's joke was that closing one issue can introduce new bugs, resulting in a net increase of issues.

masklinn · on April 27, 2016

Don't you mean 11,995,236?

M4v3R · on April 27, 2016

It has even more (20M) closed issues, which is a sign that on average the OS community is healthy and active :).

ptman · on April 27, 2016

"Closed, works for me"

ashmud · on April 27, 2016

Even worse: Simply, "Closed." No reason given. Had to hop on IRC dev channel to find out.

kek918 · on April 27, 2016

For a second there I thought GitHub itself had 12 million internal issues

schneems · on April 27, 2016

A bit late to the party. I find that many maintainers are left with a mountain of issues and very few eyeballs to help process them. I made a tool that helps others get involved with your open source projects to, hopefully, help keep your issue count manageable. Check it out: https://www.codetriage.com

jbergknoff · on April 27, 2016

Why is the default issue filter "is:open"? When I have an issue with a project, I never want to restrict focus to open issues. In fact, I'd much rather land on a closed issue where it turns out the issue was recently fixed, or there is a workaround, a better approach, etc.

jhgg · on April 27, 2016

Interestingly enough, when refreshing, the count of closed issues varies wildly, and when looking at closed issues, the count of open ones varies wildly, +/- a few million. I wonder what causes that.

aiiane · on April 27, 2016

My guess? They're giving an estimation based on talking to a few shards of a much larger sharded system rather than trying to actually get canonical results for every shard - since it's unlikely that you need a precise count across that many repositories (which would be really expensive to calculate in real time).
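To illustrate that guess (and it is only a guess, not a description of how GitHub computes the number), here is a toy sketch of extrapolating a total from a random sample of shards. The shard counts and function name are invented.

```python
import random


def estimate_total(shard_counts, sample_size, rng):
    """Extrapolate a global count from a random sample of per-shard counts."""
    sampled = rng.sample(shard_counts, sample_size)
    # Scale the sampled sum up to the full number of shards.
    return sum(sampled) * len(shard_counts) // sample_size


# Hypothetical per-shard open-issue counts.
shard_counts = [90_000, 120_000, 150_000, 80_000, 200_000, 110_000, 95_000, 130_000]

exact = estimate_total(shard_counts, len(shard_counts), random.Random(0))
estimates = [estimate_total(shard_counts, 3, random.Random(seed)) for seed in range(5)]
print(exact, estimates)
```

Sampling every shard gives the exact total, while sampling only a few gives a different answer depending on which shards happened to be asked, which would look like a count that jumps around on refresh.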

ryanlol · on April 27, 2016

404 already

Edit: not sure what makes this comment so controversial (at least 5 downvotes already); the link does indeed 404 if you aren't logged in.

Kristine1975 · on April 27, 2016

GitHub now has 11,995,201 open issues.

sachkris · on April 27, 2016

It is not 404. You need to log in.

Washuu · on April 27, 2016

That is a terrible design by them. It should be 403 Forbidden.

sachkris · on April 27, 2016

No, 403 implies the resource is unavailable even after authorization. 401 Unauthorized may be the right one here.

Matt3o12_ · on April 27, 2016

Giving a 401 indicates that there might be a resource, though, which can also be harmful.

It is fairly common to return a 404 to unauthorized users (or users with not enough permission) so you don't give away meta information. Granted, for the public search, it should return an appropriate error code but they should not do that for private repositories. Thus I think it is fair to assume that they have a policy: if user/guest does not have sufficient permission, always return an error 404.

shangxiao · on April 27, 2016

It's a pattern to prevent information leakage

danneu · on April 27, 2016

That makes sense for endpoints like /admin, but it's more confusing than it's worth for users when the endpoint is otherwise rather public. Well, just see this comment thread.

As an example, in this case with the /issues page, redirecting to `/login?redirect-to=/issues` would be more user-friendly since it signals that the page exists but you must authenticate.
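A minimal sketch of the two policies being discussed, assuming a hypothetical handler that only knows the requested path and whether the endpoint is otherwise public. The function and its arguments are invented for illustration and say nothing about how GitHub actually routes requests.

```python
from typing import Dict, Tuple
from urllib.parse import quote


def respond_unauthenticated(path: str, is_public_endpoint: bool) -> Tuple[int, Dict[str, str]]:
    """Pick a status code and headers for a request from a logged-out user."""
    if is_public_endpoint:
        # The page's existence is no secret: send the user to log in and come back.
        return 302, {"Location": "/login?redirect-to=" + quote(path)}
    # For sensitive endpoints, hide whether anything exists at this path.
    return 404, {}


issues_response = respond_unauthenticated("/issues", is_public_endpoint=True)
admin_response = respond_unauthenticated("/admin", is_public_endpoint=False)
print(issues_response, admin_response)
```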

Washuu · on April 27, 2016

I assume to prevent exposing the names of private repositories, correct? For the main (global) search page it would seem reasonably easy to just omit that from the search results.

stephengillie · on April 27, 2016

This way it can't be brute-scraped either.

dayta · on April 27, 2016

GitHub returns a 404 when you're not logged in, so ryanlol's statement is correct.

Just try it, before claiming it is not

curl -I https://github.com/issues?utf8=%E2%9C%93&q=is%3Aopen+is%3Ais...

anonfunction · on April 27, 2016

This is a nice query[1] to view all open issues for your org's private repos:

https://github.com/issues?q=is%3Aopen+is%3Aissue+is%3Aprivat...

aurelien · on April 27, 2016

I love github ... but sometimes it also contains incredibly idiotic stuff: the most commented issue is about nothing! 16,000+ comments of wind -> https://github.com/issues?q=is%3Aopen+is%3Aissue+sort%3Acomm... https://github.com/peej/to.uri.st/issues/128

stephengillie · on April 27, 2016

This number sounds like the number of unread emails in some inboxes. Some have embraced Inbox Zero - is there a similar movement for issues, something like "Bug Zero"?

tonyedgecombe · on April 27, 2016

I've had a policy of no known bugs for a long time, no matter how trivial they are. I'm lucky though in that I don't have a manager sitting over me measuring my rate of feature creation.

jimmytidey · on April 27, 2016

So if creating Wikipedia took 100 million hours, closing the world's GitHub issues might be a task about one order of magnitude smaller than creating Wikipedia...

tommorris · on April 27, 2016

Once done with that, Wikipedia kinda has a backlog too...

https://en.wikipedia.org/wiki/Wikipedia:Backlog

anrao_arao · on April 27, 2016

https://github.com/wting/autojump/issues/353 - Yay, I'm mentioned in one of the GitHub open issues (which actually isn't an issue anymore). I wonder how many open issues like this exist that are worth closing!

nikolay · on April 27, 2016

So, roughly a third of all issues are open. I think it would be nice if GitHub created a daily/weekly/monthly/annual "State of the Hub" kind of analysis for the entire ecosystem with drill downs and stuff.

tonyedgecombe · on April 27, 2016

20,928,924 closed so we must be doing something right.
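For reference, combining this closed count with the 11,995,200 open issues from the title gives the open share mentioned above. A quick check (both numbers fluctuate on refresh, so treat this as a snapshot):

```python
open_issues = 11_995_200    # from the submission title
closed_issues = 20_928_924  # closed count quoted in this comment

total_issues = open_issues + closed_issues
open_share = open_issues / total_issues

print(f"{total_issues:,} issues in total, {open_share:.1%} of them open")
```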

yev · on April 27, 2016

Interesting to see that whenever one refreshes the page - the number changes.

Curious to see issue-per-minute value :D

dragthor · on April 27, 2016

I wonder what the percentage is for "actual" issues?

I see a lot of support & pilot error questions.

babo · on April 27, 2016

Pretty misleading title: all open GitHub projects have that many issues altogether.

musicalentropy · on April 27, 2016

I love this one : "question1-what did you do in the past two years?"

petrey · on April 27, 2016

On the other hand there's approximately 21 Million Closed Issues.

tbolt · on April 27, 2016

Got 11,995,200 problems but a repo aint one

therealmarv · on April 27, 2016

GitHub itself doesn't have that many open issues (nobody would use it) ;)

striking · on April 27, 2016

You can only see the first 400 pages, unfortunately :(

chrisfosterelli · on April 27, 2016

I find it particularly odd how the number of results in the top right corner changes depending on what page you're on, as well.

diimdeep · on April 27, 2016

And it's growing

floordaemon · on April 27, 2016

There is nothing at the URL specified. It's a 404.

What did we miss?

githubSearchSUK · on April 27, 2016

It's a SHAME Github is trying to protect its search results.

I am often left in front of this situation when hunting for code using advanced search parameters -- they are preventing people from searching efficiently.

Does anyone know what their motivation behind this is?

holman · on April 27, 2016

Not really sure what you're getting at, but I'm assuming you mean searching for specific syntax or language aspects.

GitHub's definitely not "protecting" shit; it's just that search is a hard problem, and searching code is a really hard problem, at least at the scale they're at. They're running one of the largest Elasticsearch clusters in the world, and a lot of significant things in code are stop words (or not words at all) in most search databases. Not to mention you need to invalidate entire repo indexes when you force push, etc. It just takes a lot of resources, and like anything, will get better over time.

githubSearchSUK · on April 28, 2016

I was under the impression that since the page returned 404 after being posted here, they removed the ability to search using these filters, at the very broad range it was used at.

Now the page is back and I'm not sure what to make of it.

lifeisstillgood · on April 27, 2016

It's not going to be an easy job to be fair - I also find the search frustrating - I would appreciate the creation of an overarching (elasticsearch?) index across all their stores but I would quake at implementing it.

It's a frustrating, thankless task, of course, but if they're looking for a competitive moat, that would make GitLab and Atlassian quake.