GitHub Language Trends (redmonk.com)
113 points by spatten on May 4, 2014 | 65 comments



FWIW many projects written in web languages like PHP seem to be treated as javascript by Github because they also have javascript code (which happens to be more significant / larger than the underlying code). It's unfortunate that there is no way to specify the primary language of a project.

The Github language system is also somewhat unpredictable: https://github.com/SheetJS/test_files seems to alternate between AppleScript and Shell with each commit (even if no .scpt or .sh file was changed or added)


Indeed, Github's language detection is absolutely useless, and a manual setting is very much needed. One of my projects is a header-only C++ lib, which is detected as 100% C; another is a fairly typical C++11 codebase, which is detected as 50% C and 50% C++; a third is a Python project with some big example 3D asset files in three.js format, which is listed as 1.2% Python and 98.8% Javascript. So at least for my github projects, the language classification is completely off.


You can avoid 3rd party libraries being counted in by following some conventions, see the patterns ignored by Github's linguist https://github.com/github/linguist/blob/master/lib/linguist/...
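
A rough sketch of how convention-based filtering like this can work, assuming a handful of made-up path patterns. The names `VENDOR_PATTERNS`, `is_vendored`, and `own_files` are hypothetical, and the patterns are illustrative only, not Linguist's actual list:

```python
import re

# Illustrative patterns only (an assumption); this is NOT Linguist's real list.
VENDOR_PATTERNS = [
    r"(^|/)vendor/",            # anything under a vendor/ directory
    r"(^|/)node_modules/",      # installed npm packages
    r"(^|/)jquery([^/]*)\.js$", # jquery.js, jquery-1.11.0.js, ...
    r"\.min\.js$",              # minified JavaScript
]
_COMPILED = [re.compile(p) for p in VENDOR_PATTERNS]


def is_vendored(path):
    """Return True if the path matches any of the vendor patterns."""
    return any(rx.search(path) for rx in _COMPILED)


def own_files(paths):
    """Keep only the paths that do not look like third-party code."""
    return [p for p in paths if not is_vendored(p)]


paths = [
    "app/views.py",
    "static/vendor/bootstrap.js",
    "static/js/jquery-1.11.0.js",
    "static/js/site.js",
    "static/js/site.min.js",
]
print(own_files(paths))
```

With paths like these, only `app/views.py` and `static/js/site.js` would be counted, which is why putting libraries in a conventionally named directory helps.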


Why not just hash popular files and skip the ones that match?
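
A minimal sketch of that idea, assuming the repository contents are already in memory and that a set of hashes of well-known library files is available. `sha256_of`, `count_lines_excluding_known`, and the sample data are all hypothetical:

```python
import hashlib


def sha256_of(data):
    """Hex SHA-256 digest of a bytes object."""
    return hashlib.sha256(data).hexdigest()


def count_lines_excluding_known(files, known_hashes):
    """files maps path -> bytes. Returns line counts per file extension,
    skipping any file whose hash is in known_hashes."""
    totals = {}
    for path, data in files.items():
        if sha256_of(data) in known_hashes:
            continue  # byte-identical copy of a known library file
        ext = path.rsplit(".", 1)[-1] if "." in path else ""
        totals[ext] = totals.get(ext, 0) + len(data.splitlines())
    return totals


# Hypothetical repository: a stand-in "library" file plus the project's own code.
jquery = b"/* pretend this is jQuery */\n" * 9000
repo = {
    "static/jquery.js": jquery,
    "app.py": b"print('hi')\n" * 200,
    "static/site.js": b"x();\n" * 50,
}
known = {sha256_of(jquery)}

print(count_lines_excluding_known(repo, known))  # {'py': 200, 'js': 50}
```

The obvious limitation is that an exact hash only catches byte-identical copies; a different release, a patched copy, or changed line endings would slip through.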


Until December 2013, if you included non-minified Bootstrap Javascript files in your project, Linguist (the library Github uses to detect languages) would count them towards the Javascript LOC count for your project[0].

I had a number of small web server projects that were incorrectly classified as >95% Javascript for this reason.

Fortunately this was fixed, but I imagine there are a number of other low-hanging fruit that are causing projects to be misclassified as well.

[0] https://github.com/github/linguist/pull/856


If a project mainly consists of JavaScript it should be a JavaScript project, no?


Well, to take an example: I work on the Mozilla Developer Network (MDN). MDN is mostly a wiki, and the underlying codebase for that is all in Python. There's also the Demo Studio, which lets people upload web technology demos, and that's Python too.

But the wiki also has an embedded script-ish language (letting people create macros/templates for specific purposes and re-use them across multiple articles). That's JavaScript-based, and so is implemented in node.js. And there's a WYSIWYG editor for the wiki pages, which is a couple of MDN-specific plugins we've developed, plus off-the-shelf components (the editor + jQuery, which we also use for a few other things on-site).

So MDN is a Python project with a couple JS utilities attached. And if you set up a local copy, you can pretty easily see that Python is by far the heaviest-used language. But due to the way GitHub counts and reports statistics, it shows up as a JavaScript project.

To pick on an easy example: the copy of jQuery we have in our repository weighs in at just over 9,000 lines. So just having jQuery means you need to write over 10k lines of code in order to get the "real" language of your project recognized.
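
The arithmetic can be sketched as follows, assuming a plain per-line tally (the real classifier may weigh files differently). The project sizes are made up for illustration; only the roughly 9,000-line jQuery figure comes from the comment above:

```python
def language_shares(loc_by_language):
    """Percentage of total lines per language, rounded to one decimal."""
    total = sum(loc_by_language.values())
    return {lang: round(100 * n / total, 1) for lang, n in loc_by_language.items()}


def primary_language(loc_by_language):
    """The language with the most lines."""
    return max(loc_by_language, key=loc_by_language.get)


# Hypothetical line counts for the project's own code.
own_code = {"Python": 6000, "JavaScript": 500}
# One vendored library, roughly the jQuery size mentioned above.
vendored = {"JavaScript": 9000}

combined = dict(own_code)
for lang, n in vendored.items():
    combined[lang] = combined.get(lang, 0) + n

print(primary_language(own_code), language_shares(own_code))
# Python {'Python': 92.3, 'JavaScript': 7.7}
print(primary_language(combined), language_shares(combined))
# JavaScript {'Python': 38.7, 'JavaScript': 61.3}
```

Under these assumed numbers, a project that is 92% Python by its own code gets reported as a JavaScript project once a single library is checked in.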


One could argue that jQuery and other libraries shouldn't be included in the repository proper; instead they could be submodules or be listed in the manifest of a package manager such as bower.


From the bower docs:

"N.B. If you aren't authoring a package that is intended to be consumed by others (e.g., you're building a web app), you should always check installed packages into source control."


The Node core dev actually went back and forth on this, and came to the conclusion to not commit dependencies to the repo, and introduced "shrinkwrap" [1].

It felt wrong committing dependencies to the repo, which I agree with. I hate my diffs being drowned in external changes, I'd rather see that someone simply upgraded a dependency. Plus it does skew the project, both for what the primary language is, as well as how much a committer is contributing.

I would love to see this eventually in Bower! They have an issue for it [2].

[1] "Why not just check node_modules into git?" http://blog.nodejs.org/2012/02/27/managing-node-js-dependenc...

[2] https://github.com/bower/bower/issues/505


I agree. Hovertruck quoted a piece from the Bower documentation, and I can see the reasoning behind it: if you're working on a web application, committing the library removes the possibility that the dependencies cannot be resolved because a library has been removed from the package manager. There are benefits to both options, but my personal preference is to keep third party libraries out of the VCS.


:D Just to take this a little further, at my old employment, our Ops wanted the dependencies to be committed to the repo, to make rolling back faster, since `npm install` can take a while. Also, there was a time the npm registry went down, wreaking havoc.

But that shouldn't put weight in either direction. Instead, there's a much deeper issue here: Ops should be building packages for deployment. Whatever that means, RPMs, VMs, Docker images, ... Just separate the concerns. Seal it up, and store it forever and ever, perfectly.

Because even if dependencies are there, safe and happy, most applications have a build process. Which takes time, and could be different on the machine it's building on. Only you can prevent production fires.


No. The way a lot of languages handle 3rd party dependencies is via a manifest file of some kind -- requirements.txt. These get installed on the system when deployed.

JavaScript, however, tends to require (or at least follow the practice of) having a copy of your JS 3rd party dependencies in your repo.

So you may have 30,000 lines of python dependencies and 20,000 lines of JavaScript dependencies, but only the JS shows up in your project files to get counted.

This isn't even asking the question of if the new code in the project or the total enabling code is what should be counted. If we count total enabling code, do we count the Linux kernel implicitly too?


Most HTML projects would then be "Images". My plain html project is marked as 'CSS', my arduino project (which contains a python file, but no java) is marked as 'Java'; that's 2/5 errors.

The files that get changed/updated are the ones that matter. If most of your diffs are PHP, then you're mostly working with PHP.
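
One way to measure that, sketched under the assumption that the input is the text produced by something like `git log --numstat --format=` (one `added<TAB>deleted<TAB>path` line per changed file, with `-` for binary files). `churn_by_extension` and the sample text are hypothetical, and renames are not handled specially:

```python
import os
from collections import Counter


def churn_by_extension(numstat_text):
    """Sum added + deleted lines per file extension from numstat-style text."""
    churn = Counter()
    for line in numstat_text.splitlines():
        parts = line.split("\t")
        if len(parts) != 3:
            continue  # blank lines or anything that is not a numstat row
        added, deleted, path = parts
        if added == "-" or deleted == "-":
            continue  # binary files have no line counts
        ext = os.path.splitext(path)[1] or "(none)"
        churn[ext] += int(added) + int(deleted)
    return churn


# Made-up sample in the numstat layout described above.
sample = (
    "10\t2\tapp/models.php\n"
    "3\t1\tstatic/site.js\n"
    "-\t-\timg/logo.png\n"
    "\n"
    "30\t5\tapp/views.php\n"
)

print(churn_by_extension(sample).most_common())
# [('.php', 47), ('.js', 4)]
```

A library that is checked in once and never touched again contributes a single large commit here, so windowing the log to recent history would reduce its weight further.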


> Violating all expectations and trends, new Java users on GitHub even grew as a percentage of overall new users, while everything else went downhill. This further supports the assertion that GitHub is reaching the enterprise.

This is more likely to be due to the rise of Android since 2009.


I would also expect a few CS students to push their homework to github nowadays, and most homework is Java.


Javascript is really hard to measure I believe, the numbers are always skewed because of 3 things:

1. A lot of repositories include 3rd party libraries.

2. A lot of software includes a web interface, even if the backend language is something else, but the LOC for Javascript can be equal or even higher because of 1.

3. JSON is counted as Javascript sometimes


It would be much more useful to show the absolute numbers. Or provide the raw data for others to do more meaningful graphs.


It depends on what story you are trying to tell... Absolute numbers might not be all that meaningful when you are talking about relative popularity. Or it might be. Data visualization is hard because you typically have to trade information density for comprehensibility.

I assume the data came from http://www.githubarchive.org/


I don't know that this is a good indication of trends. As the article (somewhat shallowly) shows, there are stories behind these graphs.

Ruby has probably just settled to a normal position post early adopter. Shows that Ruby is still strong.

Javascript is probably building the sorts of libraries that other languages already have. These guys have had a lot of work to do.


Yes, Ruby was very much "overrepresented" in the early days of github, so the decline in its growth relative to other languages is fairly normal sounding, rather than indicative of its actual decline.

Also, Javascript is something that is going to get checked in to lots of web projects one way or another. I wonder if they weed out duplicate copies of, say, jQuery.


> I wonder if they weed out duplicate copies of, say, jQuery.

These stats have not in my experience been very reliable at all, and tend to massively overcount javascript. I've had a few projects tagged 'javascript' when in fact they were web projects and just had some common js libraries like jquery included (ruby and golang projects).


> I wonder if they weed out duplicate copies of, say, jQuery.

Linguist tries to filter out things like that:

https://github.com/github/linguist/blob/master/lib/linguist/...


"Tries" is the operative word here. There is a documented history of utter incompetence in Linguist.


Likewise for most of GitHub's other libraries. Sundown is some of the worst code I've ever seen.


Ah, it is only on internet forums where someone can argue that a sharply downward trending graph shows that a language is 'still strong'.


"GitHub is a specific community that’s grown very quickly since it launched [writeup]. It was not initially reflective of open source as a whole but rather centered around the Ruby on Rails community"

Github has moved out of a niche over the past five years, and the graph demonstrates that.


Not really. It is universal that statistics can easily misrepresent if one doesn't take a critical look at choice of data and how it is presented. For example in this case the focus on new repos/issues/etc. is suspicious to me. It could very well be that we just see a rise in one-off Javascript uploads to Github here, instead of a comparison of sustained development on mature projects.


This looks like commercial ecosystems (Java, JavaScript) migrating to GitHub rather than hobbyist ecosystems (Ruby, Python, Perl) being in decline.


I think a large part of the rise in Java code is because of a large amount of open source android projects. (applications and mods/forks of the operating system)


Say what? Ruby isn't for hobbyists, and neither is Python (I don't know Perl). But you just wanted to hear yourself say that, didn't you?


It isn't entirely untrue. Ruby/Python are more popular in the hobbyist space and less so in the enterprise. Java is the opposite.

It shouldn't be seen as a reflection of the platform.


You have set up a false dichotomy here between enterprise and hobbyist.


Does the scientific computing community count as hobbyist? It seems to be a significant niche for python these days, largely due to numpy.

Also, as another user noted, hobbyist/enterprise is a false dichotomy. There's shedloads of non-hobbyist code that is also non-enterprise.


Github should really just let developers specify the language of their repo themselves. Bitbucket got this right in the first place. Auto detection of the language sounds cool, but it doesn't work for most web projects.


It just needs a dropdown for the primary language like Bitbucket has. The results would probably be different.


GitHub is slowly starting to reflect the software world at large, although the true picture is Java and C leading by a huge, huge margin. I don't expect GitHub to ever fully reflect that, as most Java shops, and nearly all C shops, would never host their code on GitHub.


I think it's safe to say that most projects on github are libraries and snippets, not finished products. Not even Ruby shops push all of their actual work to github. But at least there is a culture of sharing libraries, which drives the Ruby % up, and I don't see this happening a lot with C.


The last 2 graphs look like they were drawn by someone who hates colour blind people.


Very interesting graphs, I was surprised that CSS has had a recent uptick but as somebody who has specialized in responsive layout in the past two years I guess that uptick represents my life too.

I can't get enough language statistics on Github! I run 'gitinspector' on my web server to compute language stats in individual git repositories, but one thing I haven't been able to figure out is how to chart the language stats for one git repository over time in a branch.

Does anybody know how you can chart language use over time in one repository?


I have projects (markdown repos) marked as CSS projects for which I haven't written more than 10 lines of CSS. Bootstrap makes it appear so, I guess.


To my mind, the big trend is toward polyglot programming, which perhaps reveals what a transitional and perhaps revolutionary time this is in the world of computer programming. This paragraph struck me as the most important:

"Almost every language shows a long-term downhill trend... My initial guess is that users of languages below the top 12 are growing in share to counterbalance the decreases here. It’s also possible that GitHub may leave some users unclassified, which would tend to lower everything else’s proportion over time."


>Language detection is based on lines of code


This is important to note. As a huge ruby fanboy, one of my favorite things about the language is the set of shortcuts and syntactic sugar provided to make things easier.

I saw the decline and got a bit worried, then came here and had those fears assuaged. I do wonder if the LOC difference has anything to do with people getting more familiar with the language and doing more things in less code.


It would be interesting to see which languages are rising. In those stats, everything except Javascript seems to be declining, and the total relative decline is much larger than JS growth, so something must be growing, but what is it?


One of their reasons for the growth of Javascript:

> the JavaScript development philosophy that encourages bundling of dependencies in the same repo as the primary codebase

I wonder how much this is still an issue with the rise of npm, bower and other package managers.


The node_modules directory is usually added to .gitignore. I don't think it is usually committed by accident, as it results in a huge file list in diffs and in the output of the git status command.


That's what he's saying. While the author's statement may not necessarily be incorrect, the trend these days is to manage your JS (both server and client side code) with package managers like NPM and Bower.


2010-2011 appears to be the "year of the great inflection point"


What is the cause for the correlation between Java and JavaScript?


Probably random noise. Or else because it kind of tracks the arc of github going mainstream, and Java and Javascript are very mainstream.


Java: Rise of Android ~2009

Javascript: Random noise.


Not sure if it's reasonable to make this call, but it seems the dynamically typed languages have a significantly higher percentage of issues overall.


It could be that dynamic languages have a lower barrier of entry, which encourages many more people to fork the code, find bugs, and make pull requests. It's easy for a PHP guy to find and report bugs in a JavaScript project. It's not so easy for a JavaScript guy to find bugs in a C++ project.


I find the complete lack of Go surprising.


Go is a new language, so that's not surprising. It is also overrepresented here on HN. You don't see Go jobs in the wild, for example.



I do not think of Go as something that would stand on its own as a job. Instead, if you already have a site (like Drupal, which is my primary skill) and you want to add a very performant REST interface, then Go provides a pleasant way to write one.


I would really question the choice of Drupal in the first place, or any use of the PHP platform.


There's still nothing, by far, that matches the versatility of what you can do from the Drupal UI.


> You don't see Go jobs in the wild..

Do you fellas expect that to change over the next few years? I want to learn a new language, Go seems alluring, though I'd like to put myself on the right side of history.


> I want to learn a new language, Go seems alluring, though I'd like to put myself on the right side of history.

* if you want to learn a new language because you enjoy learning new languages learn Go since you find that interesting

* if you want to learn a new language to make you a better overall developer learn a language different than what you're used to (i.e. if you've only worked with Python learning Ruby won't benefit you as much as learning a functional language like Clojure or a lower level language like C)

* if you want to learn a new language solely (or primarily) for the job prospects (and it sounds like you do) learn one that presently has a large number of openings (like Java) instead of learning a new language and hoping it flourishes (you can always learn Go [or whatever] when/if it becomes widely used)


Outside of HN, at the enterprise level (RedMonk's main customers), no one cares.


Presumably the big Perl spike is the migration of CPAN modules over to Github en masse.


In my experience, GitHub's so-called "detection" gets it wrong far more often than right. It's worse than useless -- it's misleading.



