My experience with the computer language shootout

jerf · on April 3, 2011

It's true that it's arbitrary; the problem is, there's no non-arbitrary alternative, so it's a weak charge to make.

Unless you're going to allow every language to converge on simply blasting some bytes into RAM and executing optimized machine-language code, and then watch all the languages cluster at the exact same location (give or take how long it takes to load the machine code), you have to draw some lines about what's a "real" implementation and what isn't. If I were running it, I'd be even more strict and insist that the solutions must be "idiomatic"... and what's idiomatic and what isn't would also be arbitrary, because there is no escape from the "arbitrary". Yet the end result is useful. Not totally determinative, but it's not anywhere near "useless" either.

akkartik · on April 4, 2011

Combining your comment and DarkShikari's (http://news.ycombinator.com/item?id=2404278), it seems like a good shootout should simply compare the simplest, most idiomatic solution possible in each language.

stewbrew · on April 4, 2011

The problem with a comparison of idiomatic solutions is that they rely in different degrees on what's in their libraries. IIRC the shootout tries to avoid this by comparing similar solutions using the same algorithm that are actually implemented with the interpreter/compiler in question. IMHO that's the more sensible way to do this.

akkartik · on April 4, 2011

Perhaps that concern is eliminated with good test programs? As an extreme example, if I wanted to compare how different languages do array slicing it's meaningless to force python to implement array slicing without the [:] operator. But then this whole test program is meaningless anyway.
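
To make the slicing example concrete, here is a minimal sketch of the same operation written both ways. The helper name manual_slice is made up for illustration; it is not taken from any shootout program.

```python
def manual_slice(xs, start, stop):
    """Hypothetical helper: copy xs[start:stop] element by element, without [:]."""
    out = []
    for i in range(start, stop):
        out.append(xs[i])
    return out


data = list(range(10))

idiomatic = data[2:5]               # what a Python programmer would actually write
by_hand = manual_slice(data, 2, 5)  # what a "no slicing operator" rule would force

print(idiomatic, by_hand)
```

Both produce the same list; the only difference is how much of the work is done by the language's built-in machinery, which is exactly what such a rule would hide.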

You're right that there's a lot of nuance here, and perhaps there's no better solution. It's not even clear to me if there's a meaningful question to be answered. "This ocean's warmer than that one? What part? In summer or winter? Did you measure near a blue whale's fart?"

igouy · on April 4, 2011

If that's what you'd like to see then please make the measurements and publish the programs for that comparison!

akkartik · on April 4, 2011

No I think I'll just sit here and complain ineffectually, but thanks for the suggestion!

I can work on only one thing at a time, but I can complain about many things at once. I think both functions are useful.

I wasn't even aware of this particular problem. Now this thread's taught me (and others) to utterly ignore the shootout. That's useful even if nobody builds a replacement.

igouy · on April 4, 2011

What "particular problem"?

akkartik · on April 4, 2011

That the solutions shown are filtered through an arbitrary set of hoops.

igouy · on April 4, 2011

There's nothing arbitrary about wanting a Python program to work with CPython and Python 3.

akkartik · on April 4, 2011

I don't see how your one-line response is addressing the original article's criticisms, especially in the final paragraph.

I'm also disappointed that you respond to a rebuttal by switching your argument, without acknowledging whether the rebuttal is correct or not. I'm going to stop responding to this thread.

igouy · on April 4, 2011

Do you think that final paragraph is somehow The Truth?

>> "It's also not possible to send any messages once your ticket has been marked as closed, meaning to dispute a decision you basically need to pray the maintainer reopens it for some reason."

Truth: There's a public discussion forum!

I'm disappointed that Alex Gaynor never mentioned that the first problem with his program was a bug in CPython, but instead kept that to himself for his blog.

I'm disappointed that Alex Gaynor never mentions in his blog that the next version of his program didn't work on x64 - it hung and timed-out after 1 hour.

Joseph La Fata contributed a Python pi-digits program the same week - his program worked first time on x86 and x64, on PyPy and CPython and Python 3 and only used ctypes to get to GMP.

2 days ago Joseph La Fata contributed a Python spectral-norm program - his program worked first time on x86 and x64 on PyPy and CPython and Python 3.

Do you see the difference yet?

What do you think the blog entry "My experience with Alex Gaynor" would be like?

akkartik · on April 4, 2011

I wish you'd said all this up front rather than all the rhetoric so far!

igouy · on April 5, 2011

Rhetorical questions: What part of the blog could you have fact checked? What part of my comments could you have fact checked?

akkartik · on April 5, 2011

I hate rhetorical questions. Just say what you want to say. Most people won't react to them like me, but they won't expend the effort to parse them and understand your point either.

I'm not going to fact check every story I read. When I actually do something that requires choosing a platform I may look at the shootout. Or I may just try a few little programs myself.

You may think I'm being unfair, or moronic. But I suspect most people are like me. Even you, when you don't notice. PR feeds on the interested-but-uninvolved.

I find this thread tragic. People could have seen the shootout's side, but now most of them will remain uninformed because the herd passed through while you were asking rhetorical questions. It's the shootout's loss.

sambe · on April 4, 2011

I like your standard but I think the real criticism being made is not that it's arbitrary but that it's arbitrary in different ways for different languages. This starts to look like intentional dishonesty. Another standard closer to the status quo would be "anything portable". Why does one language require portability and not another? What is the actual problem with the example given? There doesn't appear to be one.

igouy · on April 4, 2011

> Why does one language require portability and not another?

Because more than one language implementation was measured for that language but only one language implementation was measured for the other languages.

If only one language implementation was shown for Ruby it would be Ruby 1.9 - not JRuby

If only one language implementation was shown for Lua it would be Lua 5.1.4 - not LuaJIT

If only one language implementation was shown for Python it would be Python 3 - not PyPy

PyPy and LuaJIT are being treated more favourably than other language implementations.

enneff · on April 4, 2011

The irony here is that the proposed PyPy benchmark is more Python than the previous CPython-oriented one.

DarkShikari · on April 3, 2011

In my experience language shootouts in general are often terrible by design -- with these being just a prime example.

A lot of the problem is in the questions they're designed to answer versus the questions people use them to answer.

For example, if I'm comparing Python and C, I typically want to know "how much slower would my program be in Python?", not "how much slower is my program in Python if I spent so much time hyper-optimizing it that I might as well have written it in C?"

But the test cases usually try to answer the latter, not the former.

dkarl · on April 4, 2011

It might be more reasonable than it seems at first glance. It's true that it's good to know how fast typical code runs, but there's another important question: when I run into performance problems and need to optimize a bottleneck, how fast can I make it before I have to resort to non-portable code or C extensions that complicate my build process?

dagw · on April 4, 2011

Anybody writing, for example, Python code to solve these sorts of problems in the real world would instantly reach for numpy. Which, while not part of the core language distribution, is pretty close to being a standard library for most Python programmers. I'm sure several of the other languages have similar libraries that are being ignored in these benchmarks. Without taking things like that into account, these results don't say too many useful things about real world performance.

igouy · on April 4, 2011

They say what difference numpy might make!

http://shootout.alioth.debian.org/u64q/program.php?test=spec...

dagw · on April 4, 2011

Cool, I missed that

justincormack · on April 4, 2011

Languages/implementations also vary in how much overhead switching to C costs you, especially in loops, say, which e.g. the pidigits benchmark does sort of measure.
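
A rough way to see that boundary cost is to call a trivial C function from inside a loop and compare it with the built-in equivalent. This is only a sketch under assumptions: it presumes a POSIX system where ctypes.CDLL(None) exposes the C library, the function names loop_builtin and loop_via_c are invented here, and the timings will vary by machine and implementation.

```python
import ctypes
import timeit

# Assumption: a POSIX system where CDLL(None) exposes the C library's symbols.
libc = ctypes.CDLL(None)
libc.abs.argtypes = [ctypes.c_int]
libc.abs.restype = ctypes.c_int


def loop_builtin(n):
    """Sum |i| for i in -n..-1 using the built-in abs()."""
    total = 0
    for i in range(-n, 0):
        total += abs(i)
    return total


def loop_via_c(n):
    """Same sum, but every iteration crosses into C through ctypes."""
    total = 0
    for i in range(-n, 0):
        total += libc.abs(i)
    return total


builtin_seconds = timeit.timeit(lambda: loop_builtin(10_000), number=10)
via_c_seconds = timeit.timeit(lambda: loop_via_c(10_000), number=10)
print(f"builtin abs: {builtin_seconds:.4f}s, libc abs via ctypes: {via_c_seconds:.4f}s")
```

The two loops compute the same value, so any difference in the printed times comes from the call path rather than the arithmetic.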

davidw · on April 4, 2011

> In my experience language shootouts in general are often terrible by design

My experience is that this is the sort of thing where everyone rags on it, but no one actually attempts to provide something "better".

gaius · on April 4, 2011

Because there's always a "for what" in there. As an analogy, consider MMA. On the surface, it seems to answer the question "what's the best martial art?". But really, it only answers the question of what is the best martial art for fighting a single opponent in an octagon-shaped ring in front of an audience aiming for a submission? And the answer is of course Brazilian Jiu-Jitsu, exactly the answer the founders of UFC wanted...

Former cop Rory Miller writes about this in his book: the police experimented with BJJ and found it useless. Why? Because in BJJ you pin your opponent on his back because it makes a better show for the audience, but as a cop you always pin your opponent on his front so you can handcuff him!

davidw · on April 4, 2011

Ok, sure, but I'd rather see a rough attempt at getting some numbers than just throwing your hands up in the air and saying "gee, that's a hard problem".

Also, I think that we all know enough about programming and languages and their many uses that we can talk directly about it, rather than about an analogy.

akkartik · on April 4, 2011

Why does somebody saying "I wouldn't use this." have to provide an alternative? If such a statement is backed by reasons I find it interesting to read. They're saving me trouble trying it out, just like any other review.

davidw · on April 4, 2011

Why? Because people want to know how fast languages are, and it's not a stupid question if you are considering it in a wider context.

And they're going to do benchmarks.

So you can either complain that they're not good, or you can try and improve them.

akkartik · on April 4, 2011

"So you can either complain that they're not good, or you can try and improve them."

Yes, that sentence is literally correct. But it sounds like it's saying one option is not useful. And I still haven't heard a single reason why reviews of benchmarks are bad.

Responding to a criticism with "those who can't do criticize" is super, super boring. It's been done to death. You're just tarring all criticism with an overly broad brush. If it's bad criticism why is it worth responding to? And if it's plausible criticism why aren't you focusing on the actual details?

davidw · on April 4, 2011

Bitching about language benchmarks has been done to death too.

akkartik · on April 4, 2011

Who are you saying is bitching? OP or DarkShikari or me?

kingkilr · on April 4, 2011

Maybe there's a space for "if you spent an average amount of time optimizing" :) For example my PyPy optimizations took maybe 4 hours total, with 0 time spent looking at assembly.

igouy · on April 4, 2011

> the questions they're designed to answer

You'd think there'd be some kind of statement about that?

http://shootout.alioth.debian.org/help.php#why

> the questions people use them to answer

You'd think there'd be some kind of advice about that?

http://shootout.alioth.debian.org/dont-jump-to-conclusions.p...

> I typically want to know "how much slower would my program be in Python?"

And we should all know the answer - It depends on how you wrote your program in C and it depends on how you write your program in Python.

akkartik · on April 4, 2011

The question marks seem to suggest that you're poking a hole in grandparent's argument, but it seems like you're both in agreement that the shootout is misused. What am I missing?

igouy · on April 4, 2011

"terrible by design"

Something can be well designed and yet still be misused.

akkartik · on April 4, 2011

'Terrible by design' isn't the same as 'terrible design'. I think he was saying it's deliberately bad for some use case.

glenjamin · on April 3, 2011

I don't see why the same program has to run on PyPy, CPython and Python3. Isn't the idea to do different implementations for each language? I know technically they expose the same (or a similar) language, but they're dissimilar under the hood.

I'd imagine the implementation varies across the other languages by more than just syntax.

The point that worries me the most is the amount of microoptimisation being applied to these "benchmark" programs, making the results more or less pointless for real-world use.

igouy · on April 4, 2011

There's nothing to say that all of CPython, Python 3, and PyPy must be measured.

For the moment, they all are being measured, so it's interesting to see that programs written for CPython might perform badly with PyPy, and programs written for PyPy might perform badly with CPython.

glenjamin · on April 6, 2011

I think in general, programs written in "Python" will perform better in PyPy than CPython, but the current submissions are hyper-optimised for the implementation details of CPython.

igouy · on April 10, 2011

PyPy was shown for 2 years. PyPy performance with programs hyper-optimised for PyPy would have been shown if someone had contributed them.

Is libc.write a great example of programs written in "Python"?
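
For readers who haven't seen it, reaching libc's write() through ctypes looks roughly like the sketch below. This is not the submitted program; it assumes a POSIX system where ctypes.CDLL(None) exposes the C library, and the wrapper name libc_write is made up for illustration.

```python
import ctypes
import os

# Assumption: a POSIX system where CDLL(None) exposes the C library's symbols.
libc = ctypes.CDLL(None, use_errno=True)
libc.write.argtypes = [ctypes.c_int, ctypes.c_char_p, ctypes.c_size_t]
libc.write.restype = ctypes.c_ssize_t


def libc_write(fd, data):
    """Hypothetical wrapper: write bytes to fd via libc's write(), return the count."""
    written = libc.write(fd, data, len(data))
    if written < 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
    return written


# Write into a pipe and read it back, so the effect is visible without touching stdout.
read_fd, write_fd = os.pipe()
count = libc_write(write_fd, b"3.14159\n")
os.close(write_fd)
echoed = os.read(read_fd, 64)
os.close(read_fd)
print(count, echoed)
```

Whether code like this counts as "Python" is the judgment call being argued over; the sketch only shows what the call involves.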

wriq · on April 3, 2011

It's a shame that it all seems to be up to one person and what he will allow considering how often the shootout is referenced in discussions. I wonder if Mike Pall's experience was similar when he posted the Lua/LuaJIT versions.

mikemike · on April 4, 2011

Same experience here with LuaJIT. My submissions using low-level types (byte arrays and such) were put into 'interesting alternative', too.

Almost all other languages can use byte arrays, when they are the appropriate data structure for the job. The C submissions make heavy use of GCC extensions, Haskell gets to use mutable (OMG!) byte arrays and Free Pascal has about as much in common with Wirth's Pascal as the name.

But Python and Lua are not allowed to do that? Apparently not all languages are treated equally. Dismissing submissions by resorting to a flawed definition of 'standard' and then suppressing further debate is really lame.

I contributed almost all of the Lua programs to the shootout, but I do not feel particularly encouraged to continue contributing any programs.

igouy · on April 4, 2011

Here's how Alex Gaynor compared his program to your programs-

"It's not used to do any crazy hackery like the LuaJIT one, just to access the libc write() function."

mikemike · on April 4, 2011

So what? Look at the C, Ada or ATS submissions for even more 'crazy hackery'. My programs are quite straightforward in comparison.

igouy · on April 4, 2011

Please show that there's some substance behind your "suppressing further debate" accusation.

Obviously your opinion wasn't suppressed on proggit.

Obviously your opinion wasn't suppressed here.

And nothing's been done to stop you posting in the discussion forum or commenting in the tracker.

scythe · on April 4, 2011

Lua still has no regex-dna because the standard string functions aren't proper regexes (no choice operator) and LPeg isn't available as a Debian package (which is the real problem).
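
The missing piece is alternation, i.e. the | in a pattern. A small sketch in Python's re module shows the kind of pattern involved; the pattern is in the style of the regex-dna variants, and the sequence is invented for illustration rather than taken from the benchmark input.

```python
import re

# Assumption: a made-up sequence for illustration, not shootout input data.
sequence = "cc" + "agggtaaa" + "gg" + "tttaccct" + "aa" + "agggtaaa"

# One pattern, two alternatives: this "choice" is what Lua's built-in
# string patterns cannot express.
variant = re.compile("agggtaaa|tttaccct")
matches = variant.findall(sequence)

print(len(matches), matches)
```

Without a choice operator, each alternative would have to be searched for separately and the results merged by hand.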

'course, it'd be hard for LuaJIT to do much better in the shootout than it does now. It's beating C#.

rtaycher · on April 4, 2011

Better (actually faster on 1/3, smaller on 1/3) than mono!!!/c# http://shootout.alioth.debian.org/u32/benchmark.php?test=all... , mostly slower than java -server http://shootout.alioth.debian.org/u32/benchmark.php?test=all... . Still impressive.

mikemike · on April 4, 2011

If the submissions using low-level types were included in the total score, LuaJIT would show at around 1.5x instead of 2.5x.

pygy_ · on April 4, 2011

AFAIK, the LuaJIT entries use custom, optimized scripts.

igouy · on April 4, 2011

They certainly use Mike Pall's expert skill as a Lua programmer and implementer of LuaJIT ;-)

But they are Lua programs measured on both the Lua interpreter and LuaJIT.

(Programs that rely on the LuaJIT only FFI library and won't work with the Lua interpreter are shown separately.)

igouy · on April 4, 2011

It's up to you!

Take the measurement scripts (download from the Help page), measure programs and publish your measurements.

onan_barbarian · on April 3, 2011

The Shootout's C implementations do occasionally turn into a festival of SSE intrinsics and the associated unrolling. Some benchmarks, anyway.

Still, not using SSE when it's available is dumb and perhaps all the other languages need to start playing, too. A little trickier in Python, of course....

kingkilr · on April 4, 2011

Well, that's what the compiler is for! PyPy uses SSE under the hood for various float handling things.

onan_barbarian · on April 4, 2011

Unfortunately, SSE - especially in integer-land, so I suppose we're talking SSE2 and beyond - often requires more specialist care and feeding than any automatic method (compiler or run-time) can provide.

Some cases aren't hard to pick up (e.g. bulk operations on big arrays) but others require trickery of the kind that compilers usually don't (or couldn't) have.

This isn't made easier by the notoriously non-orthogonal nature of the SSE integer operations and the rather limited number of ways that you can get in and out of SSE-land (to, say, affect a conditional or get something into a GPR).

igouy · on April 4, 2011

Also "J2SE platform version 1.4.2 now uses SSE and SSE2 instruction sets for floating point computations".

Luyt · on April 4, 2011

I once wrote a function which uses MMX to convert a (possibly very large) string to lowercase. That was quite a bit faster than the library's strlwr() function. http://codepad.org/BeDqS1Ws

onan_barbarian · on April 4, 2011

Great stuff. I think there are some potential tricks here to reduce the number of comparisons - maybe do a parallel subtract by k to pull down 'A' to -128 (smallest possible byte) then do your comparison against (ord('Z')-k). Or maybe push up 'Z' to +127..?

That way you can get a single comparison and can replace a pcmpgtb and pand with a single subtract. Then switch it to SSE2, unroll and you're good to go.
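
To spell the trick out, here is a scalar Python sketch that emulates the signed 8-bit arithmetic one byte at a time. It is not the MMX code from the link; the names to_signed_byte, lower_byte and lower_bytes are invented here, and a real version would apply the same steps to many bytes per instruction.

```python
def to_signed_byte(value):
    """Wrap an integer to a signed 8-bit value, as byte-wise SIMD arithmetic would."""
    return ((value + 128) & 0xFF) - 128


K = ord("A") + 128          # subtracting K maps 'A' to -128, the smallest signed byte
UPPER_LIMIT = ord("Z") - K  # after the same shift, 'Z' lands on -103


def lower_byte(c):
    """Lowercase one byte value using a single signed comparison."""
    shifted = to_signed_byte(c - K)
    # Everything outside 'A'..'Z' wraps to a value above UPPER_LIMIT, so one
    # signed greater-than test (the job pcmpgtb does) replaces the two-sided check.
    is_upper = not (shifted > UPPER_LIMIT)
    return c | 0x20 if is_upper else c


def lower_bytes(data):
    """Apply lower_byte to every byte of a bytes object."""
    return bytes(lower_byte(c) for c in data)


print(lower_bytes(b"Hello, WORLD [@]"))
```

The point of pulling 'A' down to the minimum value is that no byte can fall below the range any more, so only the upper bound needs testing.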

Alternately, http://www.azillionmonkeys.com/qed/asmexample.html in section 11 ("Converting Uppercase") contains a brainsmashing version of this entirely in SWAR ("SIMD Within a Register"), which could be adapted with a certain amount of pain (largely due to the absence of a double-quadword bitshift in SSE2, which is retardlepated).

robryan · on April 4, 2011

Maybe you should have to submit and have it run against a secret program that covers many areas of an implementation, so it's unknown precisely what needs to be optimized to improve on the test. Have a limit for how often you can retest it too, so the only real way to be the fastest is to optimize lots of stuff, which is a net win for language users.

scythe · on April 4, 2011

Why not at the least accept the first submission for the PyPy entries? It shouldn't matter that it doesn't run on CPython -- if it's idiomatic Python and it runs on PyPy it looks valid.

At the same time, I can see where the guy is coming from with ctypes.