GitHub has 11,995,200 open issues (github.com/issues)
262 points by chris-at on April 27, 2016 | 91 comments


Fun to see this up here. Originally when I built this there was some discussion on whether it made any sense to show a global view; most people are just going to use the issues dashboard to look at their own issues, obviously. Honestly, I just thought it was cool to have filters across the whole site, so I left it in (which was only an option given how quick it was to calculate and return these results in Elasticsearch — that's also part of the reason the numbers are fluctuating a bit, as some have pointed out here).

Still wish more people knew about this dashboard view into Issues. Even though it's now a prominent link in the header, I don't think the page got to be something I was really happy with: most of the work was done in the final week before we shipped Issues, so it was somewhat of an afterthought. There's a ton of power in there, but it's hidden away behind an arcane syntax that I, the creator of the damn thing, can't really remember at this point, two years later, ha. Still dig the overall motivation behind the page, though!


Issues could be integrated much better with forks.

If I have a fork of something, I should see not just issues people post in my repo, but issues people post in other forks.

These would be differentiated visually, and perhaps wouldn't trigger notifications.

When someone fixes the issue in another fork, I should see a 'patch pending' kind of thing, and get a notification.


This might be another reason why Holman thinks branches are better than forks.

cf. https://twitter.com/holman/status/661365207143333888, https://twitter.com/holman/status/661357354827448321


He seems to think that this is only the case for organizations, not open source projects: https://twitter.com/holman/status/661384740927242240


The real question here is not why this is here, but why "Explore" was removed (I'm aware I can just go to /explore, but if it's linked somewhere, it's not clear where). GitHub still makes project discovery hard.


Yeah, they hid it under your user icon when you're logged in, which is bizarre since exploring is one of the best ways to be 'social' on GitHub.


> Still wish more people knew about this dashboard view into Issues

I only discovered the dashboard recently, after using GitHub for years!

Having the overview was actually the missing piece in getting control of my workload and priorities. So thank you!


Sorted by +1s: https://github.com/issues?q=is%3Aopen+is%3Aissue+sort%3Areac...

Here are the top three:

1. Contribution graph can be harmful to contributors (https://github.com/isaacs/github/issues/627)

2. proposal: generic programming facilities (https://github.com/golang/go/issues/15292)

3. Proper tabs for open files (https://github.com/Microsoft/vscode/issues/224)

Can't say I'm terribly surprised!


Keep in mind that this has a heavy bias towards newer issues (since +1 reactions are new).

Here are the most commented/contentious issues: https://github.com/issues?q=is%3Aopen+is%3Aissue+sort%3Acomm...


What's going on with all the repeated comments on those top ones? I'm not a frequent user of GitHub Issues; is it something automated gone wrong, perhaps?


It looks like some projects have infrastructure set up to automatically log runtime errors to GitHub issues.


Funny how it says there are 479,830 pages' worth of results and even has a link to that "last page".

I checked and in practice it "only" returns the first 400 pages of results.


Isn't this because deep pagination is really costly? I've seen other production systems (especially ones using Elasticsearch) where this limit is in place.

From Elastic:

>Deep Paging in Distributed Systems

>To understand why deep paging is problematic, let’s imagine that we are searching within a single index with five primary shards. When we request the first page of results (results 1 to 10), each shard produces its own top 10 results and returns them to the coordinating node, which then sorts all 50 results in order to select the overall top 10.

>Now imagine that we ask for page 1,000—results 10,001 to 10,010. Everything works in the same way except that each shard has to produce its top 10,010 results. The coordinating node then sorts through all 50,050 results and discards 50,040 of them!

>You can see that, in a distributed system, the cost of sorting results grows exponentially the deeper we page. There is a good reason that web search engines don’t return more than 1,000 results for any query.

https://www.elastic.co/guide/en/elasticsearch/guide/current/...
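The quote's arithmetic can be reproduced with a small sketch, assuming the plain from/size model it describes (the function name and the five-shard default are illustrative, not part of any Elasticsearch API): each shard produces its own top `from + size` hits, and the coordinating node merges them all to keep just `size`. Written out this way, the number of hits produced and merged grows linearly with the offset, multiplied by the shard count, which is still plenty to justify a hard cap like the 400-page limit people are hitting here.

```python
def deep_page_cost(from_offset: int, size: int = 10, shards: int = 5) -> dict:
    """Hits handled for one from/size request spread over `shards` shards.

    Hypothetical helper modelling the Elastic guide's example: every shard
    must return its own top (from_offset + size) hits, because any of them
    could belong on the requested page once all shards are merged.
    """
    per_shard = from_offset + size     # hits each shard sorts and returns
    merged = per_shard * shards        # hits the coordinating node sorts
    discarded = merged - size          # hits thrown away after the merge
    return {"per_shard": per_shard, "merged": merged, "discarded": discarded}


if __name__ == "__main__":
    # Results 1-10, then results 10,001-10,010 as in the quote.
    for offset in (0, 10_000):
        print(offset, deep_page_cost(offset))
```

For an offset of 10,000 this gives 10,010 hits per shard, 50,050 merged and 50,040 discarded, matching the numbers in the quote.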


That sounds like an issue. I wonder if someone's reported it...


One more error to be reported in Danny Tuppeny's post from today http://blog.dantup.com/2016/04/have-software-developers-give...


The primary product I work on limits on-screen results to 10k (after any filtering), regardless of page size. As an aside, that limit was based on our most complicated queries at the time, which have since been simplified.


The first link 404s for me; is it working for everyone else?


I had to login and then click the link.


And another case of HTTP error codes being misused. Kinda annoying by now.


All links work for me. I think you need to login.


Probably off topic, but how do you get such a sort to work nicely (sorting on count from another table), without creating redundant data (in this case, maintaining the +1 count in the main issue table as well as maintaining a separate table for +1s), or just delegating the sort to an external search service?


There's no obvious reason why it would be particularly slow to get counts from another table. You'll probably also want to offer full-text search and search on columns you wouldn't choose to index in your main database, though, at which point using a search service makes the most sense.
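For the question above, a minimal sketch using Python's built-in sqlite3 module: the +1 count is computed at query time with a LEFT JOIN and GROUP BY, so no redundant counter column is stored. The table names, column names and sample rows are all hypothetical. The composite primary key on the reactions table doubles as an index on `issue_id`; even so, ranking every issue by a computed count touches every reaction row, which is one reason a denormalized counter or a search service becomes attractive at GitHub's scale.

```python
import sqlite3

SCHEMA = """
CREATE TABLE issues (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
-- One row per (issue, user) +1; the composite primary key also serves
-- as an index for lookups by issue_id.
CREATE TABLE reactions (
    issue_id INTEGER NOT NULL REFERENCES issues(id),
    user_id  INTEGER NOT NULL,
    PRIMARY KEY (issue_id, user_id)
);
"""

TOP_BY_PLUS_ONES = """
SELECT i.id, i.title, COUNT(r.user_id) AS plus_ones
FROM issues AS i
LEFT JOIN reactions AS r ON r.issue_id = i.id  -- keep issues with zero +1s
GROUP BY i.id, i.title
ORDER BY plus_ones DESC, i.id
LIMIT ?
"""


def top_issues(conn: sqlite3.Connection, limit: int = 10) -> list:
    """Return (id, title, plus_ones) rows, most +1s first."""
    return conn.execute(TOP_BY_PLUS_ONES, (limit,)).fetchall()


def demo() -> list:
    """Build a tiny in-memory database with made-up rows and rank it."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO issues (id, title) VALUES (?, ?)",
                     [(1, "issue A"), (2, "issue B"), (3, "issue C")])
    conn.executemany("INSERT INTO reactions (issue_id, user_id) VALUES (?, ?)",
                     [(2, 10), (2, 11), (2, 12), (1, 10)])
    return top_issues(conn)


if __name__ == "__main__":
    print(demo())
```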


It also seems impossible to get beyond page 400 on either of those results.

https://github.com/issues?page=400&q=is%3Aopen+is%3Aissue&ut...


Fun that the first one is also the one with the most -1s


6.6 million open issues were created by GoogleCodeExporter: https://github.com/issues?q=is%3Aissue+is%3Aopen+author%3AGo...


Wow, that's kinda crazy. Maybe the title should be "over half of open issues on Github created by GoogleCodeExporter" :P


It makes sense that this will only grow as people export their (remote) repositories to GitHub. ;-)


Even though there are many open issues on GitHub, how can someone with little development experience, a newbie, start contributing?

When asking this question, many suggest that we should first use the particular piece of code in our own project and contribute to that project by raising issues or fixing them. As beginners, people may start out using very popular frameworks like Ruby on Rails or Node.js. Considering their complexity and maturity, it's extremely difficult, if not impossible, to start contributing.

I think that, somewhere down the line, some form of hand-holding or mentorship is needed: a mentor gives small tasks, offers tips or advice, reviews the first pull request, etc. This would definitely boost contributions to open source projects.

There may be several people offering mentorship, but I feel it's not structured: how does a newbie know that someone willing to help exists? The only way I can think of right now is to spam a lot of people at random by looking at their GitHub profiles.

Please suggest how to encourage new developers to contribute more to open source and help close the open issues.


It's hard for experienced people, too. The issue is more about the lack of structure in some open source projects and the time available to teach "noobs" a codebase. One thing that has worked for me in the past is to join the development mailing list, try to understand what they are talking about, and go look at the code to try to figure out the issue. Then I trace back through the discussion to see whether any of my questions or suggestions have already been raised. If not, I make a very simple case for the solution. If so, I keep quiet and only comment when things need clarification. Slowly you will pick up the project and be able to contribute.

If the project lacks any kind of communication channel and is hosted on some online repo, then by all means open an issue and ask about contributing. Make sure to ask which issues are the most important to fix, and which are small but the most annoying. Offer to document the project, too.

It's not easy but it is fulfilling once you get underway.


http://up-for-grabs.net/ attempts to make contributing easier for new developers, but it still falls flat IMO: there are few truly bite-sized issues you can tackle, and even those will require you to read a lot of project code and discussion to figure them out.


Django has a django-core-mentorship mailing list[0] for people interested in starting to contribute, a guide on contributing[1] and a selection of issues tagged as easy-pickings[2] that are suitable for beginners to work on.

I haven't personally tried it, but I did think it was cool when I stumbled over it.

[0] https://docs.djangoproject.com/en/dev/internals/mailing-list...
[1] https://docs.djangoproject.com/en/dev/internals/contributing...
[2] https://code.djangoproject.com/query?status=!closed&easy=1


I don't think that list is very active unfortunately. Also the easy pickings list has been mostly completed which doesn't leave a whole lot of room for newbies to contribute.

Funnily enough, having Tim (a paid contributor and core dev) do so much of the community work means there is less low-hanging fruit for new contributors to get stuck into.


IRC is often your best bet. If you find a project you'd like to contribute to, see if they have an IRC channel, there will always be regulars there who have a lot of experience with the projects and will almost definitely have advice to give to beginners wanting to contribute.


Node is extremely friendly to new developers and has labels for "good for beginner" issues as well as a community very passionate about helping others. You should give it a try before giving up.

BTW, contributions can mean documentation or website markup. You probably won't fix a major bug right off the bat.


I wonder if one day GitHub will announce a World Issue Closing Day: the day every programmer tries hard to close their issues. Then again, isn't that what we do every day?


GitHub didn't announce it, but I'm a fan of Bit Rot Thursday: http://blogs.perl.org/users/zoffix_znet/2016/01/bit-rot-thur...


Tech Debt Thursday would be a little more alliterative.


/t/, /d/, and /θ/ are all pretty close, but I've never thought of it as alliteration. But I suppose it does count. Thanks for expanding my literary toolbox.


55,272 of those are marked as "help wanted" -- feeling bored?

https://github.com/issues?utf8=&q=is%3Aopen+is%3Aissue+label...


Can this be filtered by primary project language? I'm not going to be much help to anyone whose project is mostly C or Ruby.


Just add something like language:Python and you're good to go. :)


I am indeed a bit bored. How do you filter by "help wanted"? It's not too clear from this interface.


You can use the "label" filter:

https://github.com/issues?utf8=&q=is%3Aopen+is%3Aissue+label...

Remember to wrap multi-word labels in quotes.
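The qualifiers used in this subthread (`is:open`, `is:issue`, `label:`, `language:`) are just space-separated terms in the `q` parameter, so URLs like the ones above can be built with Python's standard `urlencode`. A sketch only: the helper name is hypothetical, and it merely encodes the query string without checking that GitHub understands each qualifier.

```python
from urllib.parse import urlencode


def issues_search_url(*qualifiers: str) -> str:
    """Join search qualifiers with spaces and URL-encode them as `q`.

    Hypothetical helper: it only encodes the query string, it does not
    check that GitHub understands each qualifier.
    """
    query = " ".join(qualifiers)
    return "https://github.com/issues?" + urlencode({"q": query})


if __name__ == "__main__":
    # Multi-word labels need quotes inside the qualifier itself.
    print(issues_search_url("is:open", "is:issue",
                            'label:"help wanted"', "language:Python"))
```

The quotes around the multi-word label survive as `%22`, and spaces become `+`, the same encoding seen in the links in this thread.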


Hah, after this commit, it'll make it 11,995,199 baby!


Until someone smarter than you discovers it actually makes it 11,995,205 at least...


They could be closing an issue.


I think Frenchgeek's joke was that closing one issue can introduce new bugs, resulting in a net increase of issues.


Don't you mean 11,995,236?


It has even more (20M) closed issues, which is a sign that, on average, the open source community is healthy and active :).


"Closed, works for me"


Even worse: Simply, "Closed." No reason given. Had to hop on IRC dev channel to find out.


For a second there I thought GitHub itself had 12 million internal issues


A bit late to the party. I find that many maintainers are left with a mountain of issues and very few eyeballs to help process them. I made a tool that helps others get involved with your open source projects to, hopefully, help keep your issue count manageable. Check it out: https://www.codetriage.com


Why is the default issue filter "is:open"? When I have an issue with a project, I never want to restrict focus to open issues. In fact, I'd much rather land on a closed issue where it turns out the issue was recently fixed, or there is a workaround, a better approach, etc.


Interestingly enough, when refreshing, the count of closed issues varies wildly, and when looking at closed issues, the count of open ones varies wildly, +/- a few million. I wonder what causes that.


My guess? They're giving an estimate based on talking to a few shards of a much larger sharded system rather than trying to get canonical results from every shard, since it's unlikely that you need a precise count across that many repositories (which would be really expensive to calculate in real time).
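That guess can be illustrated with a toy model, assuming (and this is only an assumption, not confirmed GitHub or Elasticsearch behavior) that the count is extrapolated from a random subset of shards. When shards hold uneven numbers of documents, each refresh samples different shards, so the estimate jumps around while the exact total stays put. The function name and shard sizes are made up.

```python
import random
from typing import Sequence


def estimate_total(shard_counts: Sequence[int], sampled: int,
                   rng: random.Random) -> int:
    """Extrapolate a global count from a random subset of shards.

    Toy model only: sums the counts of `sampled` randomly chosen shards
    and scales by (number of shards / sampled shards).
    """
    picked = rng.sample(list(shard_counts), sampled)
    return round(sum(picked) * len(shard_counts) / sampled)


if __name__ == "__main__":
    # Uneven shards: the exact total is fixed, the estimate is not.
    shards = [900, 1_100, 2_500, 300, 1_700, 800, 1_200, 1_500]
    print("exact:", sum(shards))
    for seed in range(3):
        print("estimate:", estimate_total(shards, 3, random.Random(seed)))
```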


404 already

Edit: not sure what makes this comment so controversial (at least 5 downvotes already); the link does indeed 404 if you aren't logged in.


GitHub now has 11,995,201 open issues.


It is not 404. You need to log in.


That is a terrible design by them. It should be 403 Forbidden.


No, 403 implies the resource is unavailable even after authentication. 401 Unauthorized may be the right one here.


Giving a 401 indicates that there might be a resource, though, which can also be harmful.

It is fairly common to return a 404 to unauthorized users (or users without enough permission) so you don't give away meta information. Granted, for the public search it should return an appropriate error code, but they should not do that for private repositories. So I think it is fair to assume that they have a policy: if the user/guest does not have sufficient permission, always return a 404.


It's a pattern to prevent information leakage


That makes sense for endpoints like /admin, but it's more confusing than it's worth for users when the endpoint is otherwise rather public. Well, just see this comment thread.

As an example, in this case with the /issues page, redirecting to `/login?redirect-to=/issues` would be more user-friendly since it signals that the page exists but you must authenticate.
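The two positions in this subthread can be combined into one rule, sketched below under stated assumptions: a private resource gets a bare 404 so its existence isn't leaked, while a page that exists for everyone but needs a login redirects to the login form. The function name is hypothetical, and `redirect-to` is simply the parameter name from the comment above, not necessarily what GitHub uses.

```python
from urllib.parse import urlencode


def unauthenticated_response(path: str, is_private: bool) -> tuple:
    """Decide what an anonymous visitor gets for a login-only page.

    Hypothetical policy: private resources get a plain 404 so their
    existence isn't leaked; public-but-login-required pages redirect
    to the login form, carrying the original path along.
    """
    if is_private:
        return 404, {}
    return 302, {"Location": "/login?" + urlencode({"redirect-to": path})}


if __name__ == "__main__":
    print(unauthenticated_response("/issues", is_private=False))
    print(unauthenticated_response("/some-org/secret-repo", is_private=True))
```

Note that `urlencode` percent-encodes the slash, so the redirect target comes out as `/login?redirect-to=%2Fissues` rather than the literal form in the comment; both decode to the same value.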


I assume it's to prevent exposing the names of private repositories, correct? For the main (global) search page, it would seem reasonably easy to just omit those from the search results.


This way it can't be brute-scraped either.


GitHub returns a 404 when you're not logged in, so ryanlol's statement is correct.

Just try it before claiming it isn't:

curl -I 'https://github.com/issues?utf8=%E2%9C%93&q=is%3Aopen+is%3Ais...'


This is a nice query[1] to view all open issues for your org's private repos:

https://github.com/issues?q=is%3Aopen+is%3Aissue+is%3Aprivat...


I love GitHub... but sometimes it also contains incredibly idiotic stuff: the most-commented issue is about nothing! 16,000+ comments of hot air -> https://github.com/issues?q=is%3Aopen+is%3Aissue+sort%3Acomm... https://github.com/peej/to.uri.st/issues/128


This number sounds like the number of unread emails in some inboxes. Some have embraced Inbox Zero - is there a similar movement for issues, something like "Bug Zero"?


I've had a policy of no known bugs for a long time, no matter how trivial they are. I'm lucky though in that I don't have a manager sitting over me measuring my rate of feature creation.


So if creating Wikipedia took 100 million hours, closing the world's GitHub issues might be a task about one order of magnitude smaller than creating Wikipedia...


Once done with that, Wikipedia kinda has a backlog too...

https://en.wikipedia.org/wiki/Wikipedia:Backlog


https://github.com/wting/autojump/issues/353 - Yay, I'm mentioned in one of the open GitHub issues (which actually isn't an issue anymore). I wonder how many such open issues are out there that are worth closing!


So, roughly a third of all issues are open. I think it would be nice if GitHub created a daily/weekly/monthly/annual "State of the Hub" analysis of the entire ecosystem, with drill-downs and stuff.


20,928,924 closed so we must be doing something right.
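Checking the "roughly a third" remark against this figure: dividing the title's open count by open plus closed gives about 36%, a little over a third. A trivial sketch with the thread's snapshot numbers, which fluctuated between refreshes:

```python
def open_share(open_count: int, closed_count: int) -> float:
    """Fraction of all issues (open + closed) that are still open."""
    return open_count / (open_count + closed_count)


if __name__ == "__main__":
    # Snapshot figures quoted in this thread; both moved between refreshes.
    share = open_share(11_995_200, 20_928_924)
    print(f"{share:.1%} of issues are open")
```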


Interesting to see that the number changes whenever one refreshes the page.

Curious to see the issues-per-minute rate :D


I wonder what the percentage is for "actual" issues?

I see a lot of support & pilot error questions.


Pretty misleading title: it's all open GitHub projects together that have that many issues.


I love this one: "question1-what did you do in the past two years?"


On the other hand there's approximately 21 Million Closed Issues.


Got 11,995,200 problems but a repo aint one


GitHub itself doesn't have that many open issues (or nobody would use it) ;)


You can only see the first 400 pages, unfortunately :(


I find it particularly odd how the number of results in the top right corner changes depending on what page you're on, as well.


And it's growing


There is nothing at the URL specified. It's a 404.

What did we miss?


It's a SHAME GitHub is trying to protect its search results.

I often run into this situation when hunting for code using advanced search parameters; they are preventing people from searching efficiently.

Does anyone know what their motivation is?


Not really sure what you're getting at, but I'm assuming you mean searching for specific syntax or language aspects.

GitHub's definitely not "protecting" shit; it's just that search is a hard problem, and searching code is a really hard problem, at least at the scale they're at. They're running one of the largest Elasticsearch clusters in the world, and a lot of significant things in code are stop words (or not words at all) in most search databases. Not to mention you need to invalidate entire repo indexes when you force push, etc. It just takes a lot of resources, and like anything, will get better over time.


I was under the impression that since the page returned 404 after being posted here, they removed the ability to search using these filters, at the very broad range it was used at.

Now the page is back and I'm not sure what to make of it.


To be fair, it's not going to be an easy job. I also find the search frustrating; I would appreciate an overarching (Elasticsearch?) index across all their stores, but I would quake at implementing it.

It's a frustrating, thankless task, of course, but if they're looking for a competitive moat, that's one that would make GitLab and Atlassian quake.



