Timing data comparing CClasp to C++, SBCL and Python (drmeister.wordpress.com)
45 points by OopsCriticality on Aug 1, 2015 | hide | past | favorite | 22 comments



People are missing the point with the Python code. The C++ code and the Lisp code aren't particularly optimized, either. The point of the benchmark is to compare the relative speeds of roughly the same Fibonacci code out of the box.

On SBCL, for example, a memoized recursive Fibonacci is about twice as fast on my machine as the given Lisp code, also running in SBCL.
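The memoization the commenter mentions can be sketched in Python (the commenter's version was Common Lisp on SBCL; this is an illustrative translation, not their code):

```python
from functools import lru_cache

# Memoized recursive Fibonacci: each fib(n) is computed once and cached,
# turning the exponential naive recursion into a linear number of calls.
@lru_cache(maxsize=None)
def fib(n):
    if n < 3:
        return 1
    return fib(n - 1) + fib(n - 2)

print(fib(78))  # 8944394323791464, matching the benchmark's result
```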

I'm more suspicious about why the C++ code is so slow.

Edit: I wrote my own C++ Fib code and tried benchmarking it outside of Clasp (https://gist.github.com/jl2/4d74958b02b3caea2f5c). It routinely ran in less than 0.005 seconds, which seemed too fast, so I looked at the assembly output. AFAICT, the compiler is smart enough to realize fib() has no side effects and is "pure", and is computing the value at compile time, reducing the function call to essentially be myfib = 8944394323791464. It almost seems like an unfair comparison, but since it's comparing compiler performance, I think it's relevant information.


Thank you, you summed it up better than I could.


I was a bit curious, so I did a short experiment:

http://nbviewer.ipython.org/urls/dl.dropbox.com/s/l9naqibqyt...

Using numba and adding a one-line decorator to the function (with no other changes whatsoever), we get around two orders of magnitude of speedup.

Note - this doesn't involve any fancy re-writing, annotating, etc, you literally just add a decorator.

Writing really fast numerical code in Python is very easy. There's absolutely no reason not to use numba if you have small functions that just do number crunching. numba can even inline other numba functions - so you don't even pay the function call overhead.
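The decorator pattern described above looks like this. numba is an assumed third-party dependency here; the sketch falls back to the undecorated function if it isn't installed, so the code still runs either way:

```python
# Sketch of the numba.jit pattern: decorate a plain numeric loop and let
# numba compile it to machine code. The try/except fallback is only so the
# example runs without numba installed; with numba, remove it.
try:
    from numba import njit
except ImportError:
    def njit(f):      # no-op fallback: run as ordinary Python
        return f

@njit
def fib(n):
    p1, p2 = 1, 1
    for _ in range(n - 2):
        p1, p2 = p1 + p2, p1
    return p1

print(fib(78))
```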


It's funny, it reminds me of old arguments about how CL code could be made fast by adding a tiny (declare ...). Now it's Python's turn. That speed increase is damn impressive, though.


I love numba, but I am never successful at compiling it.

Using the Anaconda distribution causes some issues, as it is not compatible with some other libraries and requires installing everything through conda.

Most packages use Cython for this reason instead.


I failed to get numba installed on a Raspberry Pi 2, and I'd been wanting to try Nim, which turned out to be easy to compile on the RPi running Ubuntu.

  nim -r -d:release c rw_fibn.nim                                                  
  ...

  Hint: operation successful (12152 lines compiled; 2.632 sec total; 8.870MB; Release Build) [SuccessX]
  /home/rw/git/Nim/examples/rw_fibn

  Result = 8944394323791464
  elapsed time: 5.376946926116943
My first try at translating Python to Nim:

$ cat rw_fibn.nim

  import times

  proc fibn(reps: int64, num: int): int64 =
    var z: int64
    for r in 1..reps:
      var
        p1, p2: int64 = 1
        rnum: int = num - 2
      for i in 1..rnum:
        z = p1 + p2
        p2 = p1
        p1 = z

    return z

  var start: float = times.epochTime()
  var res = fibn(10_000_000, 78)
  var finish: float = times.epochTime()

  echo("Result = ", res)
  echo("elapsed time: ", finish - start)
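For reference, the same iterative loop in Python reads almost line for line like the Nim above (my own sketch; the actual benchmark source from the post may differ):

```python
import time

def fibn(reps, num):
    # Repeat the iterative Fibonacci `reps` times, as in the Nim version.
    z = 0
    for _ in range(reps):
        p1, p2 = 1, 1
        for _ in range(num - 2):
            z = p1 + p2
            p2 = p1
            p1 = z
    return z

start = time.time()
res = fibn(1, 78)  # reps kept small here; the benchmark used 10_000_000
print("Result =", res)
print("elapsed time:", time.time() - start)
```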


Really impressive, but in CClasp (which will get even faster in the future), you'll have (once it's finished) this kind of optimization for ALL functions, both built-in and your own.



Your code runs about 85x faster than the original Python code, which makes it just 1.2x slower than the C code they took as the baseline, so about 3x faster than the CClasp code. With just a decorator.


In Python 2.7, shouldn't you be using xrange() rather than range()? xrange() produces values lazily, whereas range() will actually create the entire list and then iterate over it.

In case anyone isn't aware: in Python 3, range()'s implementation was effectively replaced with that of xrange().
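The laziness is easy to see in Python 3, where range carries xrange's old behavior: it stores only start/stop/step, so its size stays tiny no matter how long it is (a quick illustrative check, not from the benchmark):

```python
import sys

# Python 3's range is lazy, like Python 2's xrange: it never builds a list.
r = range(10**9)
print(sys.getsizeof(r))  # a few dozen bytes, not gigabytes
print(len(r))            # length is known without materializing anything
print(5 in r)            # membership is computed arithmetically
```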


For me, range (the posted source) took 95 seconds, and xrange was only a little better at 85 seconds.

I think most of the benefit of xrange comes from the decreased memory usage, not from lower CPU usage. But xrange is definitely closer to what the other code is doing.


that's almost a 10% improvement, not just a little better!


It's Python 3 code; he used the print function without importing it from __future__.


CClasp is looking really, really interesting. I still enjoy SBCL, but on the right project, why not try CClasp?


Cython would make sense in this context (~19 seconds for me).


>>would make sense in this context<<

Not really -- "I don’t want to start an argument about the speed of SBCL vs C++ here, my point is that CClasp has come a long way from being hundreds of times slower than C++ to within a factor of 4."


I meant only that Cython results would tell me more about relative CClasp performance than a comparison with CPython does.

I didn't want to get into the optimization game either, but I'm happy someone here reminded me to try numba.


Using Numba (you just annotate the function with @numba.jit) speeds it up from 51 seconds to 0.46 for me.


Brilliant. ~0.49 seconds for me too.


Or even Pypy. This is how they measure up:

    % python2 fibtime.py
    elapsed time: 83.330840 seconds

    % python3 fibtime.py 
    elapsed time: 89.325043 seconds

    % pypy3 fibtime.py
    elapsed time: 5.799757 seconds



Should be a lot faster with type annotations.



