FWIW many projects written in web languages like PHP seem to be treated as javascript by Github because they also have javascript code (which happens to be more significant / larger than the underlying code). It's unfortunate that there is no way to specify the primary language of a project.
The Github language system is also somewhat unpredictable: https://github.com/SheetJS/test_files seems to alternate between AppleScript and Shell with each commit (even if no .scpt or .sh file was changed or added)
Indeed, Github's language detection is absolutely useless, and a manual setting is very much needed. One of my projects is a header-only C++ lib, which is detected as 100% C; another is a fairly typical C++11 codebase, which is detected as 50% C and 50% C++; a third is a Python project with some big example 3D asset files in three.js format, so it is listed as 1.2% Python and 98.8% Javascript. So at least for my github projects, the language classification is completely off.
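One plausible reason a header-only C++ library comes out as C is that `.h` is shared by both languages, so a classifier that goes by extension alone has to guess. Below is a minimal sketch, not Linguist's actual logic, of an extension lookup with a content check for `.h` files. The `EXTENSIONS` table and the `CPP_HINTS` keyword pattern are illustrative assumptions.

```python
import re

# Illustrative extension table (an assumption, far smaller than any real one).
EXTENSIONS = {
    "c": "C",
    "cpp": "C++", "cc": "C++", "cxx": "C++", "hpp": "C++",
    "py": "Python",
    "js": "JavaScript",
}

# Hypothetical hint list: constructs that show up in C++ but not in C.
CPP_HINTS = re.compile(r"\bnamespace\b|\btemplate\s*<|\bclass\s+\w+|std::|\bconstexpr\b")


def classify(path: str, source: str) -> str:
    """Guess a file's language from its extension, peeking at the text of .h files."""
    name = path.rsplit("/", 1)[-1]
    ext = name.rsplit(".", 1)[-1].lower() if "." in name else ""
    if ext == "h":
        # An extension-only lookup would have to answer blindly here.
        return "C++" if CPP_HINTS.search(source) else "C"
    return EXTENSIONS.get(ext, "Other")


if __name__ == "__main__":
    print(classify("include/vec.h", "namespace lin { template <typename T> struct V {}; }"))
    print(classify("include/util.h", "int add(int a, int b);"))
```

Heuristics like this still misfire on C++ headers that happen to look like plain C, which is exactly the case a manual setting would cover.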
Until December 2013, if you included non-minified Bootstrap Javascript files in your project, Linguist (the library Github uses to detect languages) would count them towards the Javascript LOC count for your project[0].
I had a number of small web server projects that were incorrectly classified as >95% Javascript for this reason.
Fortunately this was fixed, but I imagine there are a number of other low-hanging fruit that are causing projects to be misclassified as well.
Well, to take an example: I work on the Mozilla Developer Network (MDN). MDN is mostly a wiki, and the underlying codebase for that is all in Python. There's also the Demo Studio, which lets people upload web technology demos, and that's Python too.
But the wiki also has an embedded script-ish language (letting people create macros/templates for specific purposes and re-use them across multiple articles). That's JavaScript-based, and so is implemented in node.js. And there's a WYSIWYG editor for the wiki pages, which is a couple of MDN-specific plugins we've developed, plus off-the-shelf components (the editor + jQuery, which we also use for a few other things on-site).
So MDN is a Python project with a couple JS utilities attached. And if you set up a local copy, you can pretty easily see that Python is by far the heaviest-used language. But due to the way GitHub counts and reports statistics, it shows up as a JavaScript project.
To pick on an easy example: the copy of jQuery we have in our repository weighs in at just over 9,000 lines. So just having jQuery means you need to write over 9,000 lines of code in order to get the "real" language of your project recognized.
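The arithmetic is easy to sketch. The function names below are hypothetical, the example counts lines for simplicity (my understanding is that Linguist actually weighs files by size in bytes), and the 6,000-line Python figure is made up for illustration.

```python
from typing import Dict


def language_shares(counts: Dict[str, int]) -> Dict[str, float]:
    """Each language's fraction of the total count."""
    total = sum(counts.values())
    return {lang: n / total for lang, n in counts.items()}


def primary_language(counts: Dict[str, int]) -> str:
    """The language with the largest count wins outright."""
    return max(counts, key=counts.get)


def lines_to_overtake(own: int, other: int) -> int:
    """Extra lines of your own language needed to strictly exceed `other`."""
    return max(0, other - own + 1)


if __name__ == "__main__":
    # Hypothetical project: 6,000 lines of Python plus a vendored ~9,000-line jQuery.
    counts = {"Python": 6000, "JavaScript": 9000}
    print(primary_language(counts))       # JavaScript
    print(language_shares(counts))        # {'Python': 0.4, 'JavaScript': 0.6}
    print(lines_to_overtake(6000, 9000))  # 3001
```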
One could argue that jQuery and other libraries shouldn't be included in the repository proper; instead, they could be a submodule or be listed in the manifest of a package manager such as bower.
"N.B. If you aren't authoring a package that is intended to be consumed by others (e.g., you're building a web app), you should always check installed packages into source control."
The Node core devs actually went back and forth on this, concluded that dependencies should not be committed to the repo, and introduced "shrinkwrap" [1].
It felt wrong committing dependencies to the repo, and I agree. I hate my diffs being drowned in external changes; I'd rather see that someone simply upgraded a dependency. Plus it does skew the project, both in what its primary language appears to be and in how much each committer appears to contribute.
I would love to see this eventually in Bower! They have an issue for it [2].
I agree. Hovertruck quoted a piece from the Bower documentation, and I can see the reasoning behind it: if you're working on a web application, committing the library removes the risk that a dependency can't be resolved because it has been removed from the package manager. There are benefits to both options, but my personal preference is to keep third-party libraries out of the VCS.
:D Just to take this a little further: at my old job, our Ops team wanted the dependencies committed to the repo to make rollbacks faster, since `npm install` can take a while. Also, there was a time when the npm registry went down, wreaking havoc.
But that shouldn't tip the scales in either direction. Instead, there's a much deeper issue here: Ops should be building packages for deployment. Whatever that means (RPMs, VMs, Docker images, ...), just separate the concerns. Seal it up, and store it forever and ever, perfectly.
Because even if the dependencies are there, safe and happy, most applications have a build process, which takes time and could turn out differently depending on the machine it runs on. Only you can prevent production fires.
No. A lot of languages handle 3rd-party dependencies via a manifest file of some kind, such as requirements.txt, and those dependencies get installed on the system at deploy time.
JavaScript, however, tends to require (or at least encourage) keeping a copy of your 3rd-party JS dependencies in your repo.
So you may have 30,000 lines of python dependencies and 20,000 lines of JavaScript dependencies, but only the JS shows up in your project files to get counted.
And that's before even asking whether the project's new code or the total enabling code is what should be counted. If we count total enabling code, do we count the Linux kernel implicitly too?
Most HTML projects would then be "Images". My plain HTML project is marked as 'CSS', and my Arduino project (which contains a Python file, but no Java) is marked as 'Java'; that's 2 out of 5 wrong.
The files that get changed/updated are the ones that matter. If most of your diffs are PHP, then you're mostly working with PHP.
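Measuring by churn instead of by the size of the current tree is straightforward to sketch: feed it the output of something like `git log --numstat --format=`, which prints tab-separated added/deleted/path lines for each commit. The `EXT_TO_LANG` table is a hypothetical stand-in for a real extension map, and rename handling is deliberately crude.

```python
from collections import Counter

# Hypothetical, deliberately tiny extension map.
EXT_TO_LANG = {"php": "PHP", "js": "JavaScript", "py": "Python", "css": "CSS"}


def ext_of(path: str) -> str:
    """Extension of the (new) path; roughly handles git's 'dir/{old => new}' rename notation."""
    if " => " in path:
        path = path.split(" => ")[-1].rstrip("}")
    name = path.rsplit("/", 1)[-1]
    return name.rsplit(".", 1)[-1].lower() if "." in name else ""


def churn_by_language(numstat: str) -> Counter:
    """Sum added + deleted lines per language from `git log --numstat` output."""
    churn = Counter()
    for line in numstat.splitlines():
        parts = line.split("\t")
        if len(parts) != 3:
            continue  # blank lines between commits
        added, deleted, path = parts
        if added == "-" or deleted == "-":
            continue  # binary files report "-" instead of line counts
        churn[EXT_TO_LANG.get(ext_of(path), "Other")] += int(added) + int(deleted)
    return churn


if __name__ == "__main__":
    sample = "10\t2\tsrc/index.php\n3\t1\tpublic/app.js\n\n5\t0\tsrc/db.php\n-\t-\tlogo.png\n"
    print(churn_by_language(sample).most_common())  # [('PHP', 17), ('JavaScript', 4)]
```

A vendored jQuery that is committed once and never touched again contributes a single burst of churn, instead of dominating every snapshot of the tree.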
> Violating all expectations and trends, new Java users on GitHub even grew as a percentage of overall new users, while everything else went downhill. This further supports the assertion that GitHub is reaching the enterprise.
This is more likely to be due to the rise of Android since 2009.
Javascript is really hard to measure, I believe; the numbers are always skewed because of 3 things:
1. A lot of repositories include 3rd party libraries.
2. A lot of software includes a web interface even when the backend is written in another language, and because of point 1 the Javascript LOC can equal or even exceed the backend's.
It depends on what story you are trying to tell... Absolute numbers might not be all that meaningful when you are talking about relative popularity. Or it might be. Data visualization is hard because you typically have to trade information density for comprehensibility.
Yes, Ruby was very much "overrepresented" in the early days of github, so the decline in its growth relative to other languages is fairly normal sounding, rather than indicative of its actual decline.
Also, Javascript is something that is going to get checked in to lots of web projects one way or another. I wonder if they weed out duplicate copies of, say, jQuery.
> I wonder if they weed out duplicate copies of, say, jQuery.
These stats have not in my experience been very reliable at all, and tend to massively overcount javascript. I've had a few projects tagged 'javascript' when in fact they were web projects and just had some common js libraries like jquery included (ruby and golang projects).
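As for weeding out duplicate copies: one simple approach is to key every file by a hash of its contents and count each distinct blob once, so the thousandth identical copy of jQuery adds nothing. This is a minimal sketch under that assumption; it says nothing about how GitHub actually computes its numbers, and it won't catch copies that differ by even one byte (different versions, or minified vs. not).

```python
import hashlib
from collections import Counter
from typing import Iterable, Tuple


def bytes_by_language(files: Iterable[Tuple[str, bytes]], dedupe: bool = True) -> Counter:
    """Total bytes per language; with dedupe=True, identical file contents count once."""
    seen = set()
    totals = Counter()
    for lang, content in files:
        digest = hashlib.sha256(content).hexdigest()
        if dedupe and digest in seen:
            continue  # an exact copy we've already counted
        seen.add(digest)
        totals[lang] += len(content)
    return totals


if __name__ == "__main__":
    jquery = b"/* pretend this is jquery.js */\n" * 300
    files = [("JavaScript", jquery), ("JavaScript", jquery), ("Ruby", b"puts 1\n")]
    print(bytes_by_language(files, dedupe=False))
    print(bytes_by_language(files))
```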
"GitHub is a specific community that’s grown very quickly since it launched [writeup]. It was not initially reflective of open source as a whole but rather centered around the Ruby on Rails community"
Github has moved out of a niche over the past five years, and the graph demonstrates that.
Not really. Statistics can easily mislead if one doesn't take a critical look at the choice of data and how it is presented. In this case, for example, the focus on new repos/issues/etc. is suspicious to me. It could very well be that we're just seeing a rise in one-off Javascript uploads to Github here, rather than a measure of sustained development on mature projects.
I think a large part of the rise in Java code is due to the large number of open-source Android projects
(applications and mods/forks of the operating system)
Github should really just let developers specify the language of their repo themselves. Bitbucket got this right from the start. Automatic language detection sounds cool, but it doesn't work for most web projects.
GitHub is slowly starting to reflect the software world at large, although the true picture is Java and C leading by a huge, huge margin. I don't expect GitHub to ever fully reflect that, as most Java shops, and nearly all C shops, would never host their code on GitHub.
I think it's safe to say that most projects on github are libraries and snippets, not finished products. Not even Ruby shops push all of their actual work to github. But at least there is a culture of sharing libraries, which drives the Ruby % up, and I don't see this happening a lot with C.
Very interesting graphs. I was surprised that CSS has had a recent uptick, but as somebody who has specialized in responsive layout for the past two years, I guess that uptick represents my life too.
I can't get enough language statistics on Github! I run 'gitinspector' on my web server to compute language stats in individual git repositories, but one thing I haven't been able to figure out is how to chart the language stats for one git repository over time in a branch.
Does anybody know how you can chart language use over time in one repository?
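One way to do it is to walk the branch's commits and, for each one, total the blob sizes in that commit's tree by file extension; `git rev-list --reverse` and `git ls-tree -r -l` provide the raw data. This is a minimal sketch under those assumptions: it groups by extension rather than by a real language map, it doesn't unquote unusual paths, and since it runs `git ls-tree` once per commit, on a long history you'd probably want to sample every Nth commit.

```python
import subprocess
from collections import Counter


def git(repo: str, *args: str) -> str:
    """Run a git command in `repo` and return its stdout."""
    return subprocess.run(["git", "-C", repo, *args], check=True,
                          capture_output=True, text=True).stdout


def parse_ls_tree(output: str) -> Counter:
    """Bytes per file extension from `git ls-tree -r -l` output."""
    sizes = Counter()
    for line in output.splitlines():
        meta, path = line.split("\t", 1)
        _mode, obj_type, _obj, size = meta.split()
        if obj_type != "blob":
            continue  # submodule entries are type "commit" with size "-"
        name = path.rsplit("/", 1)[-1]
        ext = name.rsplit(".", 1)[-1].lower() if "." in name else "(none)"
        sizes[ext] += int(size)
    return sizes


def history(repo: str = ".", branch: str = "HEAD"):
    """Yield (commit date, bytes-per-extension) for each commit, oldest first."""
    for sha in git(repo, "rev-list", "--reverse", branch).split():
        date = git(repo, "show", "-s", "--format=%cI", sha).strip()
        yield date, parse_ls_tree(git(repo, "ls-tree", "-r", "-l", sha))


def to_csv(rows) -> str:
    """One CSV row per commit, one column per extension, ready for any charting tool."""
    rows = list(rows)
    exts = sorted({e for _, sizes in rows for e in sizes})
    lines = [",".join(["date", *exts])]
    for date, sizes in rows:
        lines.append(",".join([date, *(str(sizes[e]) for e in exts)]))
    return "\n".join(lines)
```

Usage would be something like `print(to_csv(history("path/to/repo", "master")))`, then plotting the columns as stacked areas.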
To my mind, the big trend is toward polyglot programming, which perhaps reveals what a transitional and even revolutionary time this is in the world of computer programming. This paragraph struck me as the most important:
"Almost every language shows a long-term downhill trend... My initial guess is that users of languages below the top 12 are growing in share to counterbalance the decreases here. It’s also possible that GitHub may leave some users unclassified, which would tend to lower everything else’s proportion over time."
This is important to note. As a huge Ruby fanboy, I'd say one of my favorite things about the language is its shortcuts and syntactic sugar, which are there to make things easier.
I saw the decline and got a bit worried, then came here and had those fears assuaged. I do wonder if the LOC difference has anything to do with people getting more familiar with the language and doing more things in less code.
It would be interesting to see which languages are rising. In those stats, everything except Javascript seems to be declining, and the total relative decline is much larger than JS growth, so something must be growing, but what is it?
The node_modules directory is usually added to .gitignore.
I don't think it usually gets committed by accident, since that results in a huge file list in diffs and in `git status` output.
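The "something must be growing" step follows from shares summing to 100%: if the tracked languages' shares fall by more than JS gains, the remainder (languages outside the top 12, plus anything left unclassified) has to absorb the difference, even while every tracked language's raw count rises. The numbers below are invented purely to illustrate that arithmetic.

```python
def shares(counts):
    """Fraction of the total for each category, rounded for readability."""
    total = sum(counts.values())
    return {k: round(v / total, 3) for k, v in counts.items()}


# Invented new-repo counts for two years; "Other" stands for everything below the top 12.
year1 = {"JavaScript": 200, "Ruby": 150, "Other": 100, "Unclassified": 50}
year2 = {"JavaScript": 300, "Ruby": 180, "Other": 300, "Unclassified": 220}

print(shares(year1))  # {'JavaScript': 0.4, 'Ruby': 0.3, 'Other': 0.2, 'Unclassified': 0.1}
print(shares(year2))  # {'JavaScript': 0.3, 'Ruby': 0.18, 'Other': 0.3, 'Unclassified': 0.22}
```

Here Ruby's count grows by 20% while its share drops from 30% to 18%, so a falling share on its own says little about absolute decline.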
That's what he's saying. While the author's statement may not necessarily be incorrect, the trend these days is to manage your JS (both server and client side code) with package managers like NPM and Bower.
It could be that dynamic languages have a lower barrier of entry, which encourages many more people to fork the code, find bugs, and make pull requests. It's easy for a PHP guy to find and report bugs in a JavaScript project. It's not so easy for a JavaScript guy to find bugs in a C++ project.
I don't think of Go as something that would be a job on its own. Instead, if you already have a site (like Drupal, which is my primary skill) and you want to add a very performant REST interface, then Go provides a pleasant way to write one.
Do you fellas expect that to change over the next few years? I want to learn a new language, Go seems alluring, though I'd like to put myself on the right side of history.
> I want to learn a new language, Go seems alluring, though I'd like to put myself on the right side of history.
* if you want to learn a new language because you enjoy learning new languages learn Go since you find that interesting
* if you want to learn a new language to make you a better overall developer learn a language different than what you're used to (i.e. if you've only worked with Python learning Ruby won't benefit you as much as learning a functional language like Clojure or a lower level language like C)
* if you want to learn a new language solely (or primarily) for the job prospects (and it sounds like you do) learn one that presently has a large number of openings (like Java) instead of learning a new language and hoping it flourishes (you can always learn Go [or whatever] when/if it becomes widely used)