Mods- put a (2006) in there please. The number-crunching ecosystem has changed drastically in 10 years. He also didn't mention what compiler he's using. Intel's ICC had a huge advantage over gcc 2.95.3 (or whatever was used in '06) especially in numerical methods. (Though, I'm not sure when Intel's MKL became semi-freeware; at which point there was a significant performance jump if one opted to use the lib appropriately in tandem with gcc.)
This guy also could have at least linked to a repo of his code... though none of that fancy GitHub stuff was around, Sourceforge and CVSweb were around back then. Without the source we can't really accurately judge, well, much of anything: compiler flags, JVM tuning (was it warm?), source code, data sets, or hell, even a mention of compilers.
Edit: Though his point held; hm, not quite 'true' categorically, but you could argue it. Around '06, Fortran was still widely used, at least in bioinformatics (sequencing, attempts at heuristic folding .. stuff) and weather prediction, and was a perfect fit for linear algebra.[1] It was accessible and fast enough to be used by post-docs who didn't write software for a living, so presumably the ecosystem was founded and then continued to develop within Fortran. Even if it was possible to outperform your simulations in C++ (I'm sure you could get at least fairly close), all of the work you were building off of (e.g. with your colleagues, advisors, etc.) was Fortran-based.
[1] This is all hearsay from my father's / his doctoral students work from the late '90s/early '00s when I was in high-school, so I can't really offer sources on hand, but if someone really wants me to, I can fire off an email to him to gather specifics.
Fortran is still used in Physics. They have a lot of libraries, and they usually have some program from a previous paper that almost does what they want now, and they know the technology stack, so many physicists are still not planning to change.
(I work with a Physics group. Sometimes I find programs with an old part that still uses numbered lines for the "do" loops. We are still trying to get rid of numbered lines ...)
I work for a company working with hydrodynamics (Potential Flow) and all of our main hydro code is written in Fortran, and we don't plan on moving away from it. It's too easy for non-software-oriented engineers to write performant code in, with a low barrier to entry.
Indeed. This post is nothing more than "I did this and look at the results" but offers nothing for us to repeat his tests. Plus yeah it is a decade old, things have changed a hell of a lot in the past decade.
>> This post is nothing more than "I did this and look at the results" but offers nothing for us to repeat his tests.
That's a valid gripe to have, but it should be said that most scientific papers that report simulation or algorithm results do exactly the same. Reproducing results is totally left to whoever feels like (and has the time to) dispute the claims.
Which is why you don't often see much in the way of disputation of performance claims, btw (yeah, ok, also because people take part in competitions where you get the chance to put your code where your mouth is).
But scientific papers aren't comparing Language X v.2.7.5 vs Language Y v. 16.9.1. They're comparing Algorithm X vs Algorithm Y, and usually also give theoretical results. Unless there is a big dramatic change in computer hardware, results comparing some algorithms using good implementations in a single well-optimized language will likely stand the test of time well.
No, because the comparisons are usually "baseline version of algorithm I want to beat" vs "highly optimized and hand-tweaked version of the algorithm I have a vested interest in."
At least in my (and my colleagues') areas of CS, you usually ask the original authors of a paper you want to compare to. A large percentage of them will be glad to provide you with their code, for it goes a long way towards ensuring that their algorithms aren't grossly misrepresented. If they don't (because the code is a mess, they can't find it or get it to work any more, or for whatever reason), you'll have to re-implement it. We take great care to match their performance and have, on several occasions, beat it (compared to the published results, running on similar hardware). It's a lot more work, but in our group we really try to make fair comparisons. Maybe not everyone is so inclined, but I'm sick of reading these stereotypes over and over again.
>> Maybe not everyone is so inclined, but I'm sick of reading these stereotypes over and over again.
I'm sorry if my comment upset you. I'm really not trying to advance any stereotypes and rather just report what my experience is so far, as a Masters student (rather than a researcher). I've also discussed this with one of my professors, who does research mainly in NLP and he told me the same thing.
Personally I find it hard to understand why it's not compulsory for academics to point to a public repository where anyone can find the code that goes along with their papers. I've heard arguments for and against sharing code and particularly data, so I'm not saying that I am necessarily right, but it's really just weird to read "our system beats the best results reported so far" with nothing but a few figures in a table to back that up. And with so much infrastructure and tools around to support sharing code (just think of github) I really don't see what's stopping people.
I didn't mean to attack you personally - that sentiment gets echoed a lot on HN and I replied to your comment, meaning to address all of those that I read before but didn't reply to, so please don't take it personally.
My area is in algorithmics, which is a lot less of a "hot topic" and there's less competition and more cooperation all-around. That may be an important factor in the willingness to share code.
I do agree with you on publishing code and have open-sourced the code to some of my research. Unfortunately I never heard back from anyone, maybe my topics just aren't hot enough ;) I'm a bit unsure whether to publish code for ongoing projects (where some aspects have been published, but others are still in the works for future publications) - in that case there is some incentive to keep the source closed until the project is finished.
Here is the problem: it is possible to match or outperform Fortran in C++, but it is not simple. You need to use specific libraries and templates. Fortran can do this with no effort. If you're writing numeric code, it is still the place to go for high performance.
An underlying point to this article that is still true is that C isn't inherently fast -- you have to work together with the compiler to make sure it generates what you want.
Why is this even worth pointing out? Well, in many languages communities (Common Lisp is a good example), the speed argument comes up, and it's pointed out that carefully working with the compiler (declares in all the right places and so on) makes the implementation competitive, and this is often dismissed as "too much work", ignoring the fact that even in C, there's no free lunch.
When one recognizes that good performance often requires a dialogue with the compiler, one starts to recognize that a criterion for performant language implementations is not "it's just like C" but rather "it's possible to tell the compiler to make this fast".
This makes many alternate languages viable choices for high-performance work (and eliminates some!), at least based on the current state of their implementations.
Imagine if as much work had gone into their compilers as has gone into the production C and Fortran compilers of today.
Except that the default, basic, recommended style for writing in C and writing in some other language X often produces very reasonably fast code in C, and horribly slow code in X. Which was the whole reason people advanced the argument that you need to use contortions to get X to generate code that even approaches the speed of normal, uncontorted C.
Except for the little fact that in the 80's and 90's most hobby coders still managed to write better Assembly than C compilers for home computers were able to generate.
C compilers generate fast code in modern times because of the amount of industry money spent on writing C optimizers since the industry adopted the language.
I agree, although I think this is mostly cultural.
If we're only talking about single-threaded, CPU-bound code, this is basically true of any imperative language without fancy data structures: even a fairly naive native compiler will produce "pretty fast" code in our modern times where the standard for pretty fast is pretty low.
It seems intuitive that the further a language is from the machine, the harder the compiler has to work to make that language's idiomatic constructs efficient on a machine.
The kicker here is that C is no longer so close to the machine: its model is still a PDP-11, and today's machine is not. This idea is what I think is still useful about the original article, for all its other faults.
I admire compilers like MLton that actually deliver on the promise of higher-level languages without the usual performance compromises for abstraction. Hopefully we'll see more of this as time goes on, as well as more languages like Sequoia that are close to the new machine.
I imagine this is why so many game studios choose to use a mix of the two. C or C++ is used to write the engine code, which is kept just as simple as possible and is in charge of doing all the fiddly bits with the hardware. Unless you're on an embedded system with limited resources though (where the size of your code can actually matter) everything else is usually done in some sort of scripting language with its own reasonably fast JIT. This is possibly sub-optimal, but it enables really fast iteration on the parts of the code that need to change and go through rapid test cycles.
Deciding when to spend the energy and resources to optimize that one routine is the result of good benchmarks and code profiling. Identify your bottlenecks, and optimize the hell out of those. For everything else though, you need to ship a product, and the human effort is largely wasted on a less-called routine that makes the character blink every 173rd frame.
Yeah, but there is no "fast JIT" on iOS and PS4 (not sure about X1) at least because they forbid creating memory pages with "Executable" bit, so usually AOT compilation is still ftw in game dev.
The author is wrong about the details but right regarding the general point. Wrongness:
1. C99 has the restrict keyword to declare no aliasing.
2. there are still many places dropping down to assembly is better than what your compiler can do, mostly since C is too portable to provide access to processor flags directly (see strlen.s/asm for your platform for an example, e.g. http://lxr.free-electrons.com/source/arch/arm64/lib/strlen.S ).
3. (nitpick) Arrays are actually types distinct from pointers in C/++ (e.g. char a[5]; char * b="asdf"; a=b; will not compile, and arrays aren't interchangeable with const-value pointers like char * const either - try passing a pointer to a function accepting an array).
3. nitpick to the nitpick: your example is a bad way to prove that arrays and pointers are distinct:
a = a; // also doesn't compile
const int x, y;
x=y; // also doesn't compile :)
The following is more helpful I think:
a=b; //error
b=a; //fine!
sizeof a != sizeof b; // true
0.8 seconds for C and 2.3 seconds for C++? That discrepancy is a huge red flag in this analysis, given that the code is so naive and simple that the program should effectively be the same between C and C++, so I would expect them to have nearly identical timings.
We can't know how naive and simple the code is. As far as I can see, the timings have nothing to do with the code snippet in the blog post. Instead they are from some totally separate program, with the source code not available in any language.
If I read the OP correctly the benchmark used an implementation of the Longest Common Subsequence algorithm, expected to run in O(n³) time and O(n²) space. It's a dynamic programming algorithm and so not quite trivial, but, hey, it's an algorithm. The coding is never the biggest complication with those.
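For reference, here's a minimal sketch of the textbook two-sequence LCS dynamic program (the article's exact variant isn't shown, so this is an assumption; the classic version fills an (m+1)×(n+1) table in O(mn) time and space):

#include <stdlib.h>
#include <string.h>

/* Length of the longest common subsequence of strings a and b. */
size_t lcs_length(const char *a, const char *b)
{
    size_t m = strlen(a), n = strlen(b);
    size_t *t = calloc((m + 1) * (n + 1), sizeof *t);  /* DP table, zero-initialized */
    if (!t) return 0;
    for (size_t i = 1; i <= m; i++) {
        for (size_t j = 1; j <= n; j++) {
            if (a[i-1] == b[j-1]) {
                t[i*(n+1) + j] = t[(i-1)*(n+1) + (j-1)] + 1;
            } else {
                size_t up = t[(i-1)*(n+1) + j], left = t[i*(n+1) + (j-1)];
                t[i*(n+1) + j] = up > left ? up : left;
            }
        }
    }
    size_t result = t[m*(n+1) + n];
    free(t);
    return result;
}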
As a benchmark it's probably OK, but language benchmarks in general are never very useful. Like, the op essentially makes the claim "you can't code this any faster in X language". Well, maybe you can't, but someone else can. Unless you have a way to run exactly the same code in n languages, or somehow produce exactly the same binary from n languages, then there's no real comparison, sadly.
But- we're programmers and that sort of thing is our favourite sport, innit.
Something else that struck me as wrong: the op counts the time the JVM needs to start up as part of the time it takes for the program to complete. That seems completely wrong, right? Otherwise we should count the time the physical machine took to boot up - and then the OS to load - in the time each benchmarked implementation took to complete. The JVM is a virtual computer, so why treat it any differently than the physical one?
No. The JVM should be included too, as it is part of what the language needs to run. If you want to compare including the boot of the real machine, then go for it - to run the JVM you still need a real machine to boot - but then you're adding noise, as the machine is the same for the two programs and can, due to external, out-of-control reasons, have different timings. But still, the JVM IS part of the language and the program, unless you compile to native.
To add to this: The same goes for C's process startup (at least the OS won't call main with those arguments; the C stdlib performs some tasks before your code runs), or interpreter startup. Unless you're writing in assembler and call the OS directly, almost every programming language will have some overhead when starting. Granted, managed languages like Java are much worse than many others.
But that's the point, isn't it? You can't ever compare two programs written in different languages claiming to test "the same program" written in different languages. Because in the end either they're not the same program, so there's no comparison, or they're the same program, so what are you really comparing?
I think the reason for comparing two programs in two different languages is exactly to get an idea of how expressive one language can be in producing a performant program, given that the inputs and outputs are the same; the way in which the code is written is the second factor being evaluated. The benchmarks game is the perfect example of this: anyone can write a better version of a program and improve the language's "score" until no one is able to improve it further. At that point we can see how the language design impacts the resulting program. This should be the only way to compare languages.
I maintain a programming language that can be built as C or C++. There is no significant difference in the overall execution time of "make tests" when compiled either way, and the executable sizes are close. (No options are used to disable any C++ features; just the compiler command is switched.)
If you halfway know what you are doing, C++ code has at most a few percent overhead. I hope that C/C++ will find a better successor, but this kind of naive benchmark doesn't help.
If you halfway know what you're doing, C++ code ought to be at least as fast, as you have the advantage of a richer type system to help the compiler - no passing void pointers everywhere.
The exact meaning of the statement "C is efficient" is that there is no built-in overhead: no virtual machine, no garbage collector, etc. Incidentally, the same is true for C++, with its mantra "you don't pay for what you don't use", except that there is a caveat: mindless use of the C++ standard library classes can easily result in significant overhead - mostly as a consequence of liberal use of the heap for memory allocation. (But therein also lies an advantage: you could write C++ as if it were Java and still get better efficiency.)
Reeks of an anti-C bias. This comment by the author is flat out wrong, and I quote:
"""
`int x[1000]` in C or C++ doesn’t have much meaning. It’s semantically equivalent to declaring “x” as `int *x`. You can reassign X any time you want, to make it point to a different location.
"""
C is also a pretty different language style-wise than C++. Aside from that, they're both good languages for building runtimes since their memory model resembles physical memory and they can run in an unhosted (runtime-less) environment.
It's not obvious what he was talking about. In context he was responding to this comment:
"While I know that C compilers are really conservative about this, doesn’t it matter as to how x and y are declared? After all if they are dclared as:
int x[1000], y[1000];"
Those array declarations aren't function argument declarations (except in pre-C89), so not sure why he would talk about semantic equivalence in response to that if it only applies to function arguments.
Ehhhhh.... I can't help but think that it's never completely necessary to use totally unrestricted pointers in 'C'. Might be convenient, but very nearly certainly not necessary. Even in systems calls and library functions, it's possible to never dereference a pointer in an unconstrained manner. But you have to be prepared to reason rigorously about those constraints.
Noting that the semantics of array access semi-equate to pointer dereference is well beside the point.
This fellow's "for" loop:
for (int i = 0; i < 20000; i++) {
    for (int j = 0; j < 20000; j++) {
        x[i][j] = y[i-2][j+1] * y[i+1][j-2];
    }
}
shows that you don't even have to have unrestricted pointer access to have bounds violations - what's "y[i-2]" when i==0?
'C' is not for everybody. It's a systems programming language. As the space of activities related to computers grows, there's no good reason to continue to erect a strawman about the language and set fire to it.
You can't pass an array to a procedure without it becoming a pointer. But that is not even the issue ... the issue is that even if you know something is an array, any pointer of any type could be pointing anywhere into the interior of that array.
That's not the problem.
The problem is any other live pointer could point into that array, for all the compiler knows.
Unless the array was created locally and never aliased, if you ever call through some set of procedures that the compiler doesn't have complete visibility into, it is possible that that code obtains a pointer into that array and then uses it.
The compiler has to be VERY conservative.
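A minimal sketch of what that conservatism costs (hypothetical example, not from the article): because sum and a might point into the same array, the compiler generally has to write the running total back to memory on every iteration, and it can't vectorize the loop without a runtime overlap check.

/* The compiler must assume a[i] could alias *sum, so the updated
   total is typically stored back to memory each iteration before the
   next element is loaded. */
void accumulate(double *sum, const double *a, int n)
{
    for (int i = 0; i < n; i++)
        *sum += a[i];
}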
True, the efficiency argument is wrong. Thinking about your algorithm and knowing where you actually need to be efficient are far more important.
What I appreciate about C in particular is that it seems harder to "program by accident" than it is in most languages. C was my first language. After I learned C, I programmed in a higher level language for a while later before discovering that its runtime threw exceptions on 'array subscript out of bounds.' I discovered it by looking over someone else's shoulder in a lab. I realized then that I had just learned not to do that because C was so unforgiving.
We can argue whether the kind of discipline of thought that you gain from that experience is good or even useful today, but I suspect it is. It's different from learning how to do mental math in a world full of calculators - that's just raw mental exercise.
I think that C is a very good first language to learn, not because it hardens you against errors, but because it has a very simple mental model that you can use to understand how other languages might work in terms of C semantics, which provides useful clues for what the costs are of features in those languages.
I'm pretty sure that if C was the first language presented to me back in college, I would have found a new major.
C was more convenient than doing assembler, but not much more, compared to say Pascal. (Having learned BASIC, FORTRAN 77, Pascal, PDP-11 ASM, Lisp at school, and some dBASE at work, before we switched over to C at school, which seemed like a big step backward)
var s : string; multi : integer;
...
s := 'Hello' + ' ' + 'world';
multi := 616;
writeln( s, ' ', multi : 4);
vs
char s[ 256 ]; int multi;
...
strcpy( s, "Hello" );
strcat( s, " " );
strcat( s, "world" );
multi = 616;
printf( "%s %4d\n", s, multi );
It's a shame no work was put into good Modula 2 compilers (for various platforms) around 85-90, rather than running with C as the standard. Imagine all the security disasters that probably would have been avoided.
> In C and C++, there’s no such thing as an array – there’s just pointers, which you can subscript and a shorthand for pointer arithmetic and indirection
This is fallacy #1 about C. C does have arrays which constitute a contiguously allocated piece of memory, it's just that the identifier decays to a pointer in most contexts. When dealing with arrays, there is no indirection involved.
He's obviously making comparisons to Fortran's arrays, which would be equivalent to using `restrict` for pointers (in addition to passing the size). This enables some compiler optimizations that can't be done if there is aliasing (ie. two pointers pointing to overlapping areas of memory).
In other words, this claim in the original article is incorrect: "But there's no way to write code in C or C++ that guarantees that. "
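For illustration (a hedged sketch, not the article's code), C99's restrict is how you hand the compiler that same no-aliasing guarantee; C++ compilers typically spell it __restrict:

/* The restrict qualifiers promise that, within this function, dst and
   src do not overlap, so the loop can be safely vectorized - roughly
   the guarantee a Fortran compiler gets from array arguments by default. */
void add_arrays(double * restrict dst, const double * restrict src, int n)
{
    for (int i = 0; i < n; i++)
        dst[i] += src[i];
}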
I have a hunch the Python version did not use numpy, because numpy calls out to a BLAS library in either C or Fortran to vectorise stuff and should have been close to C and C++ in performance.
Edit: numpy was available in 2006, and even earlier than that.
Really good points! For me, the best approach to efficiency in terms of development speed and performance is to use a fast language for critical code paths and then make it accessible using a scripting language. Python and C++ work really great in this respect, as it's very easy to generate Python bindings for a C++ library using e.g. SWIG, SIP or Boost. Lua is also a good choice as an embedable scripting language and widely used e.g. in the game industry (it's faster than Python and has a tiny code footprint).
Recently I also investigated Go as my new "fast language"; generating bindings for other languages such as Python is still very difficult though, and many of the emerging tools (e.g. gopy) are not very well maintained or outright broken. If integration with other languages was easier, I think Go would be a serious contender for this kind of scenario.
It can be very fast by itself, and can also be written extremely high level & is very easy to form bindings to other languages, thanks to the multiple dispatch. Even small things such as having full unicode support is very nice to write maths code in a very 'pen and paper' kind of way.
Yes, Julia is interesting, but for most use cases the language is (to my knowledge) still significantly slower than C, while not having the great library ecosystem that e.g. Python has. This might change in the future though, and I know that really bright people like Stefan Karpinski are behind the language, so I'm sure it might become a serious contender to Python in the future.
Beware of putting too much into these benchmarks. They are far too simple to say anything about larger systems. In particular, large systems are to a great extent measured in how well they optimize interactions between different modules in the system. Stuff such as link-time-optimization is not going to be shown via the rather simple benchmarks in the shootout.
Loved the blog post. Too many people just love jumping the gun and saying that C/C++ are fast languages. I believe it was Chandler Carruth who said that C/C++ aren't performance languages, but rather languages that give you the most control over your performance.
If there were a universal, relatively simple formula for general efficiency, then compilers could be built with such an algorithm to render certain languages more efficient than others. Alas, there is no such thing.
Efficiency, in the current situation (before true AI), is always specific. The compiler may implement certain specific optimizations that enable efficiency for certain specific applications -- inline here and there, vectorization for trivial cases, loop unrolling for some patterns -- but hoping the compiler can work out true efficiency is naive.
So true efficiency often requires real intelligence. I'll give one example: the famous platform-specific linear algebra library, BLAS. It is really efficient, but FORTRAN is not the real reason; it is due to human experts tweaking it.
Given that real efficiency requires human tweaking, there is some advantage to choosing C over most other languages, in that C allows more room and a more straightforward path for tweaking.
Comparing C against other languages (not FORTRAN, which is very much like C), the efficiency of the language is achieved by its compiler doing less rather than more. Because the compiler is doing less, the path to efficiency is more straightforward (if the coder sees it). With C, getting efficiency is programming. In comparison, for some higher-level programming languages, efficiency means having faith in the language designers and compiler implementers, and tweaking often requires specific knowledge of the language and compiler. For example, knowing that using higher-order functions can be more efficient than a straight loop -- that is beyond general knowledge of programming.
On this, I would like to differentiate C from C++. The C++ language (and compiler) is much more complicated than C, and much of its efficiency comes from compiler tweaking. For example, the '-O' flag is often not so significant for a C program, but it often makes a night-and-day difference for a C++ program (often -O2 or even -O3 is necessary). For C++, a significant portion of efficiency depends on the compiler, and as such requires additional knowledge from the coder, making efficiency harder to reach. Of course, theoretically one can write C in C++, and many use that theory to conclude that C++ is as efficient as, if not more efficient than, C. That is not the case in reality. Few can contain themselves to a C subset when writing C++ (if they can, why C++ instead of C?). So in reality, if you are using C++, you almost certainly are not writing C, and the path to efficiency is often more complicated compared to C.
Efficiency is measured by cost and result. Doing nothing ends up with no result; how is that "most efficient"? And what is the definition of effectiveness? ... maybe the word you are looking for is convenience?
The claim that Fortran arrays are easier for the compiler to optimise than raw pointers in C/C++ may well be true. But I imagine that there has been progress on compiler technologies to address auto-vectorisation in C/C++ since the post was written (and since C99, C has the restrict keyword too). Can anyone comment?
FORTRAN disallows pointer aliasing, which allows the compiler to perform certain instruction re-ordering optimizations. As far as I know this helps more with instruction level parallelism, as opposed to auto-vectorization. C99 introduced the restrict keyword in order to solve the pointer aliasing / instruction reordering issue.
Auto-vectorization is a bit different: it uses the mathematics of integer polyhedra to perform loop transformations, allowing the compiler to replace scalar operations with SIMD vector operations. The Intel C compiler and IBM XL C compiler have been quite good at this for a long time (at least on simple loops that don't really require transformations). More recently GCC implemented the "Graphite" optimizer to handle this type of auto-vectorization, and LLVM has the "Polly" optimizer. FORTRAN automatically benefits from these optimization improvements because all these compilers have FORTRAN front-ends that translate the language to an intermediate representation, where the optimizers don't "know" what the source language was.
I think the pointer aliasing issue still haunts C because the "restrict" keyword may not be commonly used. So FORTRAN seems even today to have a reputation of being "faster" than C for numerical codes.
That's not "C is inefficient", it's "Sequental computation is inefficient". Want a lot of "number-crunching"? Learn CUDA (or similar non-hardware-locked thing). It's quite simple to get into and you can get orders of magnitude speed boost. Screw Fortran, it's 2016!
The whole point is - it's not about a concrete language (CUDA-C syntax is almost the same as C, you know). It's about the way of thinking and completely different parallel algorithms like Blelloch scan.
There is no compiler for any language that can turn an arbitrary sequential algorithm into a parallel one, and there never will be.
Whilst C-based apps can be made to run efficiently, it's not efficient to code with the usual tools compared to other languages and IDEs (think big customer development bills), and that's even before you get into the debate of whether some code runs faster with multiple threads on multicore CPUs, whilst bearing in mind the RAM and bus limits, CPU cache collisions, available CPU instruction sets and so on.
More people really should get over the "awesomeness" of the clockwork turk machine, unless they are easily pleased?
I believe modern C/C++ has the restrict keyword which tells the compiler that pointers don't overlap. I'm not sure if this applies to arrays, but if so, it should be fairly trivial for a compiler to optimise parallel execution so that it matches Fortran or OCaml.
While I agree with the author about the main topic, the actual content does not illustrate the point well.
The author wrote a naive loop in a bunch of languages and got a faster result in OCaml (0.3 sec) than in C (0.8 sec). Right, but the code was a totally naive loop:
for (int i = 0; i < 20000; i++) {
    for (int j = 0; j < 20000; j++) {
        x[i][j] = y[i-2][j+1] * y[i+1][j-2];
    }
}
The fact that C has a bunch of libraries for vectorization of such code, or that there are a million compiler directives that can speed that code up a lot, is somehow missing from the article.
So, the conclusion is that a compiled OCaml code is twice as fast as a naively written unoptimized loop in C.
The benchmark results are from several implementations of a longest common subsequence algorithm, the nested loops shown earlier are probably not part of that algorithm and were only meant to illustrate aliasing issues.
That's right: the author wrote a bad argument because they made an invalid performance comparison: naive vs. sophisticated. Is there a C code that can be written that compiles to assembly which is as efficient (or more) than OCaml? Likely so, and if the author doesn't even attempt to do that, I don't find their argument convincing.
I recently learned that Knuth created a virtual instruction set (MMIX) specifically because he found that comparing algorithms with high level languages was too prone to problems. Using a direct instruction set that executes on a virtual machine means that algorithm analysis can focus on the algorithm.
Now, there is something to be said for languages which produce code that is nearly as efficient as the most efficient code you can produce with C, especially if it's easier to write it. But that's not the author's point.
The numbers are indeed not independently verifiable, and it looks like they are actually from 2000, not 2006 when the article was written:
I can’t give you the code I did the LCS tests on – as I said, it was six years ago! It’s somewhere on a backup tape, and I’d need to get permission from my employer to release it.
However the OP denies your "naive C vs. sophisticated OCaml" allegation and claims it was rather the reverse:
At the time, I had just come off of a project where I was part of a team implementing a C++ compiler. C++ was definitely my strongest language at the time. The code in all of the languages was as carefully hand-optimized as I could make it.
And at the time, I had *never* used OCaml; I had used its predecessor, Caml-Light for some experiments about 4 years earlier. But I was *far* from a highly skilled Caml hacker.
>> Is there a C code that can be written that compiles to assembly which is as efficient (or more) than OCaml? Likely so, and if the author doesn't even attempt to do that, I don't find their argument convincing.
That is the author's point: you certainly can achieve the same speed in C, but you have to do extra work because the compiler is unable to do things like discover the absence of aliasing. So you have to build this into the code manually, which in turn makes it more complex.
Yeah, the thing about low-level languages like C is that they are so tied to the machine -- and CPUs are now immensely complex beasts --- that your program, to be fast, has to be written to take advantage of that machine's characteristics: using target-specific intrinsics (e.g. SSE, AVX) or designing data structures around the size of the cache line.
Whereas a higher-level language can truly abstract over all these things, and perform target-specific optimizations where it sees them.
OK, so let's turn it around: are there cases where an expert user who wants to get the most out of the machine can generate faster code in OCaml than the fastest implementation an experienced C author can write?
I don't care about the average case- I care about the maximum case (because I work on performance sensitive code).
The fastest code possible is determined by the underlying hardware and is ultimately independent of the language you use. The only reasons that you would not attain this speed limit are that you are either not aware of the best machine specific implementation, that you don't know how to express it in a given language or that a given language does not allow expressing it.
Chances are good that even really good developers are not aware of the best implementation because it might depend on pipeline latencies, constraints on which instructions can execute in parallel and so forth. That is what compilers are good at and I don't think there is an in principle difference here between C and OCaml.
The other two points are more involved. On the one hand C offers, for example, things like intrinsics and seems generally more apt to control what machine code will get generated as compared to OCaml. On the other hand C is less expressive when it comes to conveying higher level semantics to the compiler to inform its optimization decisions.
So if you are aware of the optimal implementation - which, as mentioned above, you may not be - you are in a good position with C to get (close to) the desired result if you put in the extra effort to express the semantic details. On the other hand OCaml will certainly make it harder or even impossible to express your desired machine specific implementation. But then again the OCaml compiler can use information not (easily) available from C programs to guide its optimization and could come up with a sequence of instructions better than what the developer could think of.
In principle OCaml seems to have an edge over C because of the stronger semantics making it easier for the compiler to find a good implementation possibly beating the human. On the other hand, if the compiler does a bad job and you realize it, then C probably offers you the better tools to force the compiler to honor your will.
To finally answer your question, it seems highly unlikely that there are cases where one could in principle not write a C program as fast or faster than the equivalent OCaml program. On the other hand it seems not so unlikely that a developer might not be aware of what that C code would have to look like.
I don’t do numeric methods, but I occasionally solve performance-sensitive CPU-bound problems, in areas like CAD/CAM, 3D graphics, and GIS.
An average C++ CPU-bound code isn’t radically faster compared to e.g. C#. I suppose that’s true for other higher-level compiled languages.
But the good thing about C++ is that, after I've profiled which parts of the algorithm take the most time, I can gradually replace my average C++ code with SSE or AVX intrinsics. Sure it's harder to read, write and understand, but if done right, on modern hardware it can become several times faster.
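A rough sketch of that kind of rewrite (hypothetical function names; assumes SSE is available and n is a multiple of 4 to keep it short - a real version would handle the remainder and alignment):

#include <xmmintrin.h>  /* SSE intrinsics */

/* Plain scalar version: element-wise product of two float arrays. */
void mul_scalar(float *dst, const float *a, const float *b, int n)
{
    for (int i = 0; i < n; i++)
        dst[i] = a[i] * b[i];
}

/* Hand-vectorized version: processes four floats per iteration. */
void mul_sse(float *dst, const float *a, const float *b, int n)
{
    for (int i = 0; i < n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(dst + i, _mm_mul_ps(va, vb));
    }
}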
The gist of this article is that in 2006 some C compilers would produce less efficient machine code than some other languages' compilers for certain specific types of problems.
I don't think languages that are used as much as C and C++ ever truly get "replaced".
But as for "being a viable choice instead", we intend for Rust to be so. The devil, of course, is in the details: for example, if you're on an obscure or proprietary platform, a C compiler may be your only choice. In the context of this article, while LLVM will autovectorize some things, direct control of SIMD hasn't made it into stable yet, and requires a nightly today. But as far as the language and its power goes, our goal is to be, yes.
Unfortunately that site is quite questionable, although it still might be the best resource. The person who runs it changed the interface so that it's harder to compare languages than it used to be, and he purposely doesn't include many implementations and languages that people send him.
There is no ISPC or Julia, the C and C++ benchmarks use SIMD intrinsics specific to gcc, the benchmarks are run on core 2 processors... etc. etc.
Either way the interface is not even close to being as useful as it was, and he has removed languages and implementations that were already up and running.
It's a shame no one has stepped up to provide a better alternative! I don't see any reason why it shouldn't accept, test, and post all submissions automatically using something like a Docker container.
"If you're interested in something not shown on the benchmarks game website then please take the program source code and the measurement scripts and publish your own measurements."
(And remember to include the license file in your redistribution.)
It's true that for extremely simple programs like the author's sole microbenchmark, non-C languages can get a speed boost over C from reasoning about aliasing. But this doesn't scale. The author's argument breaks down as soon as you ask for an example benchmark that is larger and smells like something real.
It's hard to take any post on efficiency seriously if the author doesn't indicate that they understand the difference between a cache miss and a branch misprediction. Things like that are so huge in modern processors that you will have trouble understanding how things work without them.