Only fast languages are interesting (scottlocklin.wordpress.com)
130 points by spindritf on Nov 30, 2011 | 86 comments



There are a few things wrong with this benchmark. In Lush, he's measuring the time it takes to sum 3 million pre-computed random numbers. In Clojure, he's measuring the time it takes to generate 3 million random numbers, and then sum them.

It also looks like he's using two different data structures. In Clojure, it's a lazily generated linked list of objects; in Lush, it appears he's using a vector pre-allocated to the right size.

He also doesn't mention which versions of Clojure and Lush he's using, or how much memory Lush uses. He complains that a JVM with a 130MB heap is too small, but the 30-million-element array in Lush would have been almost twice that size if populated with 64-bit doubles.

For reference, it takes me 571ms to run the 300 thousand number example that took him 861ms. If I factor out the random-number generation from the Clojure benchmark, it cuts the time down to 120ms.
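
Something like this separates the two (a hypothetical sketch, not the exact code I timed):

  ;; doall realizes the lazy seq up front, so (time ...) measures
  ;; only the summation, not the random-number generation.
  (let [xs (doall (repeatedly 300000 rand))]
    (time (reduce + xs)))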

If I then use a pre-generated primitive array and the specialized areduce function (to match his specialized idx-sum function), I get 16ms for 300k, 22ms for 3m, and 230ms for 30m.

So performance is on par with Lush, so long as you compare like for like.


"The point is, Lush has datatypes for fast numerics: it’s designed to do fast numerics. Clojure doesn’t have such datatypes..."

He knows he's not comparing like for like. What he apparently doesn't know is that there is a better data type.


What's the better datatype? I'm looking at doing some numeric programming, and I wondered about Clojure for this.


If your aim is raw speed, not a lot beats a standard JVM array.
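
For example (a minimal, hypothetical sketch):

  ;; A primitive double[] from Clojure: with the hint, loads and stores
  ;; compile to plain JVM array instructions, with no boxing.
  (def ^doubles xs (double-array 1000000))
  (aset xs 0 3.14)   ; primitive store
  (aget xs 0)        ; primitive load => 3.14
  (alength xs)       ; => 1000000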


This is flamebait, especially the headline.

There are various tools you can use, and using a screwdriver instead of a hammer won't help you much if what you have is nails.

From my point of view, you may get back to me when Lush has data structures for concurrent programming and does (nearly) pauseless GC on multicore machines with heaps of 12GB composed of complex data structures, not just vectors of numbers. This is what I get with Clojure and for my applications it works great. Oh, we also do numerics, quite a bit in fact. But whenever I need actual raw-metal double-adding performance, I use an appropriate tool, e.g. an extension/library. For certain string operations we even went down to x86-64 assembly, with great results.

What you describe is a JVM limitation, not a Clojure limitation — and you're right in only one thing: Clojure is not the right tool for you.


Digression here, but I like the "(nearly)". This bit always amuses me about garbage collection wonks. The pauseless bit is a real-time requirement. Saying your GC has great latencies or is "(nearly) pauseless" is tantamount to telling a real-time engineer that your system only fails some of the time. It makes you look dumb.

GC is great. GC makes a ton of things simpler. GC as implemented in popular environments still sucks for real-time use.


The 'pauseless' bit is not just a real-time requirement. A website that stalls for 2 seconds once every 10 minutes is unacceptable to me. That application thus requires a garbage collector that is nearly pauseless: the pauses are small enough that your users won't notice them. So I don't think it makes anyone look dumb at all to speak of a 'nearly pauseless' garbage collector.


I can relate to this. Nothing is more disappointing than putting together a nice piece of code and then seeing it run too slow to be practical.

A while ago, I wrote some physics simulation in Python. The code turned out to be really neat and quite idiomatic, but it ran very slowly: I was seeing only 10-20 frames per second when I was shooting for something closer to 120. All I was doing was simulating a cube suspended on a plane with four dampened springs, and I was only simulating one of those where I wanted lots of 'em.

The problem was that my code was allocating and freeing way too many objects. The proper solution would have been to go from a neat idiomatic approach, where the code resembles the mathematical equations it simulates, to some kind of structure-of-arrays-style code with NumPy. Needless to say, there isn't much fun in using Python that way, so I might as well write the damn thing in C.
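
(For the flavor of the structure-of-arrays idea, an illustrative sketch in Clojure rather than my actual Python:)

  ;; Object-per-body, like my idiomatic code: many small allocations.
  (def bodies (vec (repeatedly 10000 (fn [] {:x (rand) :y (rand)}))))

  ;; Structure of arrays: one primitive array per field, no per-body objects.
  (def xs (double-array 10000))
  (def ys (double-array 10000))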

Later I went on to rewrite that code in C, running it on my GPU with OpenCL. I did write it in a structure-of-arrays style, so maybe the Python experience taught me a valuable lesson. Now it runs fast enough for my purposes.


With a decent implementation you can write performant code in a dynamic language.

An example in Common Lisp using the SBCL compiler on an Intel Core2 Duo @ 2.66GHz w/ 6G of RAM:

  CL-USER> (defparameter *nums* (make-array 3000000 :element-type 'float :initial-element 0.0))
  *NUMS*
  CL-USER> (dotimes (i (length *nums*)) (setf (aref *nums* i) (random 30.0)))
  NIL
  CL-USER> (time (reduce #'+ *nums*))
  Evaluation took:
    0.375 seconds of real time
    0.379975 seconds of total run time (0.379975 user, 0.000000 system)
    [ Run times consist of 0.197 seconds GC time, and 0.183 seconds non-GC time. ]
    101.33% CPU
    1,000,511,208 processor cycles
    96,010,096 bytes consed
  
  4.4951772e7
  CL-USER>
C isn't always the answer (though it often is a pretty good default).


I think it's naïve to speak of "performant code" in general. Sure, I used SBCL (and CMUCL, and AllegroCL) to get blazing double-type single-cpu performance. But then I wanted to make use of multiple cores, which is when I ran into a brick wall. Things are better with SBCL now than they used to be, but people still run into bugs related to multithreading, not to mention that CL has little support for concurrent programming.

The definition of "performance" is different for every application. For some, it's multiplying long vectors of doubles. For others, it's making use of 16 cores for complex data structures operations. For others it's about servicing tens of thousands of mostly idle TCP connections (tried to do that with AllegroCL, failed). For others yet, it's about achieving consistent response times, which means concurrent and nearly pauseless GC.


In other words, choose the right tool for the job.

However I think the OP was comparing dynamic vs. static languages. Ey made a prototype physics simulation in Python whose source code was nice to read but performed terribly. Ey then decided to write it in C (not a bad choice, mind you).

It seemed to me that the OP found the readability of Python (and dynamic languages in general) a useful trait to have, while lamenting that their implementations were too slow to be practical for eir purposes.

Therefore I proposed an example of a dynamic language with the property of being easy to read while being fast in implementation.

Otherwise I think we violently agree.


You'll get an extra factor of performance if you do a

  (loop for rand across *nums*
        summing rand)
instead of using the generic reduce.


If you're after performance, don't use the generic reduce; use loop, and the time drops from 0.38 seconds to 0.049 seconds.

  (time (loop for x  across *nums* sum x))
  Evaluation took:
  0.049 seconds of real time


I'm in the same situation right now. Can you suggest a good plotting library for C/C++? I miss matplotlib.


I've used Qwt with decent results. It may be very heavy if you're not already using Qt, though, and the way its 2D plot is implemented makes it unusable if you need to update your data many times per second (I eventually just put my data in a texture and displayed it via OpenGL, doing the scaling manually). Another solution is to call gnuplot, Matlab, or whatever through the command line.


> nice piece of code

Too many developers are paying way too much attention to the code being "nice" or it being in a "nice" language, and not enough attention to "does it work" and "is it fast".


The problem is that when working with prototype code, you want to be able to understand what's going on. Code in structure-of-arrays style is harder to read. It would be nice if it ran with at least some kind of performance close to what you need in the end.

We definitely need "nice" languages, because they give a huge productivity boost over writing stuff in C or another low-level language.

However, many nice languages tend to be implemented with an interpreter, a crappy garbage collector, and a giant lock that prevents any real use of threads. This makes them unwieldy for applications where latency or throughput matters.

Recently the trend has been towards languages implemented with a compiler, using smart static typing disciplines and an LLVM-based backend. This is a very positive and welcome change.


Nice code is maintainable code. High-level languages exist for the sole purpose of making code "nice."

You pay enough attention to "is it fast" and you'll find yourself writing everything in assembler. There's a time and place for that, sure, but when you have a high-level language that can theoretically optimize idiomatic code into something faster, it should.


> You pay enough attention to "is it fast" and you'll find yourself writing everything in assembler

Not everything. Not even then. Just the good bits maybe. Sometimes none of it. It might make me prefer C# over Java for example, because C# can lay down structs linearly in memory so I can throw them to the graphics card directly. Or maybe I'd do all the hard work in C and SWIG it so I could use Ruby.

But what I certainly wouldn't do is attempt to write a physics simulation in python, "idiomatic" or not.


"But what I certainly wouldn't do is attempt to write a physics simulation in python"

Depends on whether performance is a hard requirement or not - I've written engineering simulations of parts of power stations (including one nuclear plant) in Lisp, and they weren't particularly fast, but nobody was bothered because they weren't used interactively.

Eventually I did write Lisp to generate C++ code for the same models when we did want better performance and it worked really well - but I only did that as an optional step when someone needed the extra performance.


> But what I certainly wouldn't do is attempt to write a physics simulation in python, "idiomatic" or not.

http://numpy.scipy.org/

Life can be good.


But that's the thing - I would argue that NumPy code is not "idiomatic" Python, which uses built-in operations/structures like list comprehensions, tuples, and dictionaries. I had a similar experience during extensive NumPy coding where I thought, why am I not just writing this in Fortran?


> pay enough attention to "is it fast" and you'll find yourself writing everything in assembler

Or microcode. Or dedicated hardware.


Your view might be skewed by sites such as HN.

Most code is written under such conditions that "niceness" is item 101 on the priority list. Think thedailywtf and Dilbert.


Depends on what you're working on and what the goal is. I'm frequently on call for maintenance of a distributed system. During review I often return code that is not nice enough, where "nice" means "will I understand this at 3am, seconds after I get woken up?". I couldn't care less if it takes twice as long to execute. (Please don't confuse this with not caring about complexity, though - I do care if it's N^2 and doesn't need to be.)

This doesn't mean it's not fun to wrap some long function into a reduce of a list comprehension. That goes into my fun projects, not stuff that's supposed to run 24/7 and be worked on by others.


I can't remember the last time I complained about the speed of something other than my AT&T data connection.


The title is link bait to the point of nearly trolling. He didn't do his research; he should have jumped on the Google group or Stack Overflow before writing this. Someone named Mike in the comments on the OP comes up with a fix that uses Java primitives and is very fast, in the range of his Lush examples.

http://scottlocklin.wordpress.com/2011/11/30/only-fast-langu...

There's almost nothing to see here, except maybe that there should be better documentation on this kind of optimization.


All the Clojure examples test linked lists!

The first example lets lazy evaluation of the random numbers leak into the timing; toss a (dorun tmp) before the timing for a little more sense.

The type annotation ^doubles is useless - it tells Clojure to expect the definition to produce an array of doubles, but the definition then binds it to the same old linked list.
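
Spelled out (a hypothetical sketch of the mismatch):

  ;; The hint promises a double[], but repeatedly still builds a lazy
  ;; linked list of boxed Doubles; the hint changes nothing.
  (def ^doubles tmp (doall (repeatedly 3000000 rand)))

  ;; To honor the hint, realize the numbers into a primitive array:
  (def ^doubles tmp (double-array (repeatedly 3000000 rand)))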


I was going to point this out. As always, if you aim for high performance, you should know your data structures. And use the right ones.


Can you fix his code and paste it? Why not have him run the bench and post it as an update?


Based on the work of a previous commenter, I posted this code in the blog comments:

  (defn add-rands []
    (let [ds (double-array 30000000)]                      ; primitive double[]
      (dotimes [i 30000000] (aset ds i (Math/random)))     ; fill before timing starts
      (time (areduce ds i res 0.0 (+ res (aget ds i))))))  ; unboxed summation loop
This adds 30,000,000 numbers in 73ms on my machine. His Lush code added 30,000,000 in 180ms. I estimate my computer is twice as fast as his, putting them on par. Hopefully he will run the code so we can see all the run times on the same machine. (I never could get his Lush code running.)

Of course, the Clojure code here is fairly involved for some basic stuff, but if you did things like this often, it would not be hard to add some nice syntactic sugar over it.
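
For instance, a hinted helper function would do (a hypothetical sketch; assumes Clojure 1.3+ primitive hints):

  ;; Sums any primitive double[] with an unboxed loop.
  (defn sum-doubles ^double [^doubles ds]
    (areduce ds i res 0.0 (+ res (aget ds i))))

  (sum-doubles (double-array (repeatedly 30000000 rand)))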


To me, this reads like someone who used C to add a large list of numbers recursively and concluded C was trash because they ran out of stack.

There's no denying there is a specific problem. People are certainly entitled to decide that solving that problem is too much hassle for them. But how seriously should we take their opinions when they do that?

I'd say it depends on the quality of the effort they did invest. In this case, the OP admits that they used the wrong data structure in Clojure and the right one in Lush, which makes the entire exercise meaningless.


http://pastebin.com/2fGefex4

  $ gcc -o addalot --std=c99 -Os addalot.c
  $ ./addalot
  [1500117.315354] 0.000032 per iteration on average.
Works fine for me.


This is not a correct C program; it's a C program for a compiler with tail-call optimization (or with a stack large enough for it to work). There's no TCO in the C standard.


There is no restriction against TCO in the C standard either. TCO is a valid optimization for a compiler to apply. So what is incorrect about it?


Because it relies on a particular implementation of the C compiler, not on the C standard. Take the -Os flag away, or use a compiler that doesn't do TCO, and the program has a bug.

See also:

* using memcpy() for overlapping regions (relies on the particular implementation of memcpy(); was exposed by a change in glibc - http://lwn.net/Articles/414467/);

* memory aliasing assumptions;

* passing NULL, 0 to memcpy() (http://code.google.com/p/spiped/source/detail?r=8);

etc.


It is ANSI standard C code. There is an implementation limitation (stack size) that this program runs into. However, that doesn't violate the C standard in any way, because the standard doesn't specify available stack space.

So the program is compatible with certain C implementations in certain configurations. Given that the C standard allows an implementation to pick any stack size it likes (including zero), we could claim that about all valid C programs that make any use of the stack.


> There's no TCO in the C standard.

That doesn't make tail-recursive C programs any less correct.


They're not incorrect, but that's mostly because the C standard also allows you to assume an infinite stack, so recursive programs that ask for absurd stack depths don't violate the standard with or without TCO. But if in practice the program only runs successfully when compiled with TCO, it's not very portable, in that it relies on non-guaranteed compiler features to actually work.


> In this case, the OP admits that they used the wrong data structure in Clojure and the right one in Lush, which makes the entire exercise meaningless.

This right here is the crux of the flawed hypothesis of the entire article.


Languages where the upgrade path to fastness isn't horrible are okay with me too. In Common Lisp, for example, you can start adding type declarations in speed-critical areas; Clojure has something similar (I'm not as familiar with it, so I'm not sure how it compares to CL's type declarations).
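
In Clojure the rough equivalent is type hints on the hot path (an illustrative sketch, not code from the article):

  ;; Unhinted: boxed math over a seq.
  (defn sum-sq [v] (reduce + (map * v v)))

  ;; Hinted: primitive array and unboxed arithmetic in the critical section.
  (defn sum-sq ^double [^doubles v]
    (areduce v i acc 0.0 (+ acc (* (aget v i) (aget v i)))))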


Probably should have added "for large-scale numeric computations" to the title :P


Yes, but then the flamebait/trolling factor would be gone. The title is such obvious flamebait that I hesitated whether to flag it.


Here is another place where Go hits the sweet spot between performance and expressiveness/simplicity.

Also by giving you control over how memory is laid out you can be much more efficient both in space and time, and when you need to go even faster you can optimize things quite tightly.


Guy Steele said it better, though. He wants not the fastest language, but the fastest way to get to the solution. That may mean a language that isn't as fast to run, but is faster to program in.


Interesting that the people who complain about performance are almost always in, ahem, 'quantitative finance'. The LMAX stuff, this guy...

I'm just a simple caveman, but I can't help but think the world wouldn't be dramatically worse off if the HFT bots ran a touch slower.


Learn to be calmer and more patient. If you use another language and another VM, maybe you need to think in another way. Maybe this language or VM really isn't strong at what you want to do, but is at other tasks.

Also, you might stop thinking in absolutes. There are no fast languages. Every good, well-known language has its strengths and weaknesses, and every optimization is a tradeoff. If you get stronger at one thing, you MUST get worse at something else. So if you have a fixed set of problems, you really just need to find out which language covers your problem space best. And if you find it, that doesn't mean this language is better than the others.


Synthetic benchmarks are rarely useful.

If you want to benchmark two languages you need to do this:

Find N coders skilled in language X and N coders equally skilled in language Y. Have all of them code several small but realistic systems of various kinds, then benchmark the results and compare. Better yet, benchmark real, live systems of comparable purpose built using different languages. Anything else leaves you open to having your results dominated by differences in level of coding skill, quirks specific to your synthetic benchmark, and effects that are only prominent for tiny, overly simplistic programs.


Very well-written, but it's a non-story.

Check the comments. Mike has contributed a Clojure solution using `double-array` that runs fast, without blowing up the heap with a linked list.


But there are no fast languages, only fast implementations...


... and benchmarks that compare linked lists to contiguous vectors


Not sure what he means by "interesting". I'm often delighted to learn new languages, even ones that exist only on paper.


One more benchmark for Clojure fans:

  R> system.time(sum(runif(3000000)))
     user  system elapsed 
    0.100   0.000   0.101 
  R> system.time(sum(runif(30000000)))
     user  system elapsed 
    1.037   0.020   1.058 
Not as good as Lush, yet human syntax and Scheme scoping can compensate.


Mathematica

  In[1]:= AbsoluteTiming[
   Nest[RandomReal[] + # &, 0, 3000000]]

  Out[1]= {0.3276005, 1.50067*10^6}

  In[2]:= AbsoluteTiming[
   Nest[RandomReal[] + # &, 0, 30000000]]

  Out[2]= {2.9952052, 1.5001*10^7}
Not bad I think.


Oh god, you know what's just as bad as a slow language? That WordPress iPad theme. Please stop using it; it's ridiculously unusable.


So I think you should stop with Clojure and start coding in asm.


This guy should look at Scala: Functional, runs on the JVM, and has performance comparable to "normal" Java code.


You mean like this guy? https://news.ycombinator.com/item?id=3292555

Scala vs. Clojure is not something to discuss for this use case, I guess. As https://news.ycombinator.com/item?id=3293261 points out in that other thread, you won't write (recognizable) Scala (or Java. Or Clojure) code anymore if you want the best performance.


With respect to the second link: It always helps to know your data structures & their purposes--no matter what language you use, recognizable or not.

With respect to performance: you might want to try writing a C-like solution and running the code distributed over several machines.


A language can't be fast; a language is a notation. Only implementations can be fast or, perhaps, generate fast code.

For an example, look at GHC versus some toy-Haskell whipped up in Common Lisp. Now, with those two implementations to hand, is Haskell 'fast' or 'slow'?


> A language can't be fast; a language is a notation. Only implementations can be fast or, perhaps, generate fast code.

Theoretically, maybe. But it is nearly impossible to come up with an implementation of Ruby that is as performant as a compiled C program (and retains the characteristics of Ruby).

So it is definitely true that it is easier to compile some languages to fast machine or byte code than others. As a consequence, people will call Ruby slow, and C fast.


> But it is nearly impossible to come up with an implementation for Ruby, that is as performant as a compiled C program

PyPy is getting there.

JIT, runtime type detection, and path optimization go a long way. And it's much easier to write correct Ruby than correct C.

When I went to college, the HP-41 was HP's top-of-the-line calculator and all the rich kids had them. I had a BASIC-programmable CASIO PB-700. In the end, I was able to finish tests in less time than it took the HP guys to program their calculators to spit out the answer. I think it was the first time (1986 or so) I realized a more expressive programming language could be a decisive advantage over your competition.


"And it's much easier to write correct Ruby than correct C."

No, it isn't. C may be spartan, but its failure modes are predictable and manageable. A ruby program can be incorrect for a host of reasons completely unrelated to the program (interpreter bugs, garbage collection problems, etc.) and the language as a whole provides features (dynamic typing, method injection, exceptions) that make it harder to write correct programs.

Ruby may make it easier to write programs, but it definitely doesn't make it easier to write correct programs.


Eh, Ruby's failure modes are predictable too. I've never tracked a bug down to a "garbage collection problem", whatever that is. But I certainly have come across problems with C's lack of type safety, lack of memory safety, lack of GC, etc.


> PyPy is getting there.

...on some benchmarks. It's not near the stage where it beats C or C++ in general.

> And it's much easier to write correct Ruby than correct C.

C is not very type safe, but there are languages that are and have comparable performance to C (Ada, C++, etc.).


Rubinius?


"A language can't be fast; a language is a notation. Only implementations can be fast or, perhaps, generate fast code."

That's not entirely true. A language's semantics limit the possible implementations.

And yes, I know about the sufficiently smart compiler. ;)


This is true, and I'm often peeved by exactly the same conflation. However, in cases like Clojure, where there's (as far as I know) really only a single implementation out there to be used, it's easy to blur the lines. Python is getting less so thanks to PyPy, though my perception of that is that thus far it's more on an "awareness" level than it actually being in widespread production use (though hopefully as it matures we'll see more of the latter).

The languages mentioned in this blog post present an interesting sort of in-between case though, since at some level you could almost regard them as being a collection of implementations of the same language (Lisp), rather than as actually distinct languages. I realize that's stretching some definitions a bit, but I think there's kind of a continuous spectrum there between "different language" and "different implementation of the same language". I mean, GCC implements extensions to C that aren't in the C standard and aren't (to my knowledge) supported in, say, MSVC; some refer to this as "GNU C", though I'd guess most non-language-lawyers looking at it would probably still call it "C" anyway.

On the whole though, I agree with you.


I agree with your main point but think Clojure is not a good example as there are 3 "official" implementations: the JVM one, the CLR one and ClojureScript.


Haskell is fast ;-) (and beautiful)

   main = putStrLn . show . sum . take 3000000 $ repeat 1
(Ignoring the random generation, that does not play a role here, does it?)


Neat!

What if random generation played a role? How would that look, and what times do you get? Just curious.


With System.Random loaded,

    putStrLn =<< show . sum . take 3000000 . randomRs (0, 100::Int) `fmap` newStdGen
::Int is necessary to nail down exactly which numeric type we're dealing with, as nothing else in the expression nails down an exact type.

Personally I prefer loading Control.Applicative and using <$> where the `fmap` is, but I thought I'd keep the dependencies low. This demonstrates both how adding IO to an expression can be a bit of a pain, and that Haskell has come up with tools to deal with this without turning it into the naive "do" block with three or four lines a newbie would turn out.

I wouldn't expect miracles on the random number generation speed. Also in this context you may not be getting the best optimizations that you would get if you really cared. It so happens that Real World Haskell uses a virtually identical example in its optimization chapter, where by the end they get down to what is essentially the optimal non-SIMD assembler, if you're really interested in the dirty details: http://book.realworldhaskell.org/read/profiling-and-optimiza...


   putStrLn =<< show . sum . take 3000000 . randomRs (0, 100::Int) `fmap` newStdGen
That is truly beautiful.


In Matlab:

   >> tic;sum(rand(1,3e6));toc
   Elapsed time is 0.059096 seconds.


If you want to play golf, I bet J can beat that. Haskell is not a domain-specific language like Matlab is.


More practical example: CPython vs. PyPy. Also applies to CPython and CRuby vs. their JVM and CLR counterparts like Jython, IronPython, JRuby, etc.

However, a language will impose some limitations on the implementation, be it a compiler or an interpreter. Dynamic typing is perhaps the most concrete example of this.
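
Clojure gives a concrete taste of that (a small sketch; dynamic typing forces reflective dispatch unless you add a hint):

  (set! *warn-on-reflection* true)

  ;; The type of s is unknown at compile time, so the call goes
  ;; through reflection.
  (defn len [s] (.length s))          ; => Reflection warning

  ;; A hint restores a direct, statically resolved call.
  (defn len [^String s] (.length s))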


The idea of a 'fast language' is still nonsense, given how much faster the best interpreters of dynamic languages are compared to the code generated by the worst compilers of static languages.


OK. So what is the faster Clojure implementation for him to try this out on? I don't know of one.


Or you could just use Java...

The title says only fast languages are interesting, then rules out using Java because it's not lispy enough :/

Either you want speed, or you want your favorite syntax and high level stuff. You can't have both.

FWIW my favorite syntax happens to be C/Java/assembly-type stuff. I'd hate to be one of the developers who hates those; it must put you at a big disadvantage - as shown in the original post.


Why not? Apparently Lush gets it right: C-like speed on numerics with a Lisp-ish syntax this guy likes. Yes, C-style stacks are going to beat Lisp-style closures, but there is no reason a "fancy" language can't do fast matrix operations, which are 99.9% of numerical code anyway.

Even Scheme has (rather un-Lisp-ish) vectors in addition to the standard linked lists.


> C-style stacks are going to beat Lisp-style closures

It has been a long time since the "λ the ultimate..." series of papers gave the lie to this myth.


I once made the joke that a Java web app usually catches up with a Django one in about six months. The Java app is faster, but the Django one starts serving users three months before the Java one enters live beta.

And if you are serious about number crunching, my friends tell me you should use FORTRAN, not Java.


Here's the summation problem in Fortran, quite succinct, IMO:

    real, dimension(30000000) :: a

    call random_number(a)

    print *, sum(a)
That's 30 million single-precision reals; it takes 0.04s on my machine.


Except if you're like 99% of engineers whose favorite syntax is C/C++-ish and are happily using C, C++, C#, Java or maybe Python.


Remember, kids: "real" programmers use FORTRAN.





