Timing data comparing CClasp to C++, SBCL and Python (drmeister.wordpress.com)
45 points by OopsCriticality on Aug 1, 2015 | hide | past | favorite | 22 comments



People are missing the point with the Python code. The C++ code and the Lisp code aren't particularly optimized, either. The point of the benchmark is to compare the relative speeds of roughly the same Fibonacci code out of the box.

On SBCL, for example, a memoized recursive Fibonacci is about twice as fast on my machine as the given Lisp code, also running in SBCL.
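The memoization the commenter mentions can be sketched in Python (the commenter's version was Common Lisp on SBCL; this is an illustrative translation, not their code):

```python
from functools import lru_cache

# Memoized recursive Fibonacci: each fib(n) is computed once and cached,
# turning the exponential naive recursion into a linear number of calls.
@lru_cache(maxsize=None)
def fib(n):
    if n < 3:
        return 1
    return fib(n - 1) + fib(n - 2)

print(fib(78))  # 8944394323791464, matching the benchmark's result
```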

I'm more suspicious about why the C++ code is so slow.

Edit: I wrote my own C++ Fib code and tried benchmarking it outside of Clasp (https://gist.github.com/jl2/4d74958b02b3caea2f5c). It routinely ran in less than 0.005 seconds, which seemed too fast, so I looked at the assembly output. AFAICT, the compiler is smart enough to realize fib() has no side effects and is "pure", and is computing the value at compile time, reducing the function call to essentially be myfib = 8944394323791464. It almost seems like an unfair comparison, but since it's comparing compiler performance, I think it's relevant information.


Thank you, you summed it up better than I could.


I was a bit curious, so I did a short experiment:

http://nbviewer.ipython.org/urls/dl.dropbox.com/s/l9naqibqyt...

Using numba and adding a one-line decorator to the function (with no other changes whatsoever), we get around two orders of magnitude of speedup.

Note - this doesn't involve any fancy re-writing, annotating, etc, you literally just add a decorator.

Writing really fast numerical code in Python is very easy. There's absolutely no reason not to use numba if you have small functions that just do number crunching. numba can even inline other numba functions - so you don't even pay the function call overhead.
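The decorator pattern described above looks like this. numba is an assumed third-party dependency here; the sketch falls back to the undecorated function if it isn't installed, so the code still runs either way:

```python
# Sketch of the numba.jit pattern: decorate a plain numeric loop and let
# numba compile it to machine code. The try/except fallback is only so the
# example runs without numba installed; with numba, remove it.
try:
    from numba import njit
except ImportError:
    def njit(f):      # no-op fallback: run as ordinary Python
        return f

@njit
def fib(n):
    p1, p2 = 1, 1
    for _ in range(n - 2):
        p1, p2 = p1 + p2, p1
    return p1

print(fib(78))
```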


It's funny, it reminds me of old arguments about how CL code could be made fast by adding a tiny (declare ...). Now it's Python's turn. That speed increase is damn impressive, though.


I love numba, but I am never successful at compiling it.

Using the Anaconda distribution causes some issues, as it is not compatible with some other libraries and requires installing everything through conda.

Most packages use Cython for this reason instead.


I failed to get numba installed on a Raspberry Pi 2, and I'd been wanting to try Nim, which turned out to be easy to compile on the RPi running Ubuntu.

  nim -r -d:release c rw_fibn.nim                                                  
  ...

  Hint: operation successful (12152 lines compiled; 2.632 sec total; 8.870MB; Release Build) [SuccessX]
  /home/rw/git/Nim/examples/rw_fibn

  Result = 8944394323791464
  elapsed time: 5.376946926116943
My first try at translating Python to Nim:

$ cat rw_fibn.nim

  import times

  proc fibn(reps: int64, num: int): int64 =
    var z: int64
    for r in 1..reps:
      var
        p1, p2: int64 = 1
        rnum: int = num - 2
      for i in 1..rnum:
        z = p1 + p2
        p2 = p1
        p1 = z

    return z

  var start: float = times.epochTime()
  var res = fibn(10_000_000, 78)
  var finish: float = times.epochTime()

  echo("Result = ", res)
  echo("elapsed time: ", finish - start)
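For reference, the same iterative loop in Python reads almost line for line like the Nim above (my own sketch; the actual benchmark source from the post may differ):

```python
import time

def fibn(reps, num):
    # Repeat the iterative Fibonacci `reps` times, as in the Nim version.
    z = 0
    for _ in range(reps):
        p1, p2 = 1, 1
        for _ in range(num - 2):
            z = p1 + p2
            p2 = p1
            p1 = z
    return z

start = time.time()
res = fibn(1, 78)  # reps kept small here; the benchmark used 10_000_000
print("Result =", res)
print("elapsed time:", time.time() - start)
```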


Really impressive, but in CClasp (which will get even faster in the future), you'll have (once it's finished) this kind of optimization for ALL functions, both built-in and your own.



Your code runs about 85x faster than the original Python code, which makes it just 1.2x slower than the C code they took as the baseline, so about 3x faster than the CClasp code. With just a decorator.


In Python 2.7, shouldn't you be using xrange() rather than range()? xrange() produces values lazily, whereas range() will actually create the entire list and then iterate over it.

In case anyone isn't aware: in Python 3, range()'s implementation was effectively replaced with that of xrange().
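The laziness is easy to see in Python 3, where range carries xrange's old behavior: it stores only start/stop/step, so its size stays tiny no matter how long it is (a quick illustrative check, not from the benchmark):

```python
import sys

# Python 3's range is lazy, like Python 2's xrange: it never builds a list.
r = range(10**9)
print(sys.getsizeof(r))  # a few dozen bytes, not gigabytes
print(len(r))            # length is known without materializing anything
print(5 in r)            # membership is computed arithmetically
```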


For me, range (the posted source) took 95 seconds, and xrange was only a little better at 85 seconds.

I think most of the benefit of xrange comes from the decreased memory usage, not from lower CPU usage. But xrange is definitely closer to what the other code is doing.


that's almost a 10% improvement, not just a little better!


It's Python 3 code; he used the print function without importing it from __future__.


CClasp is looking really, really interesting. I still enjoy SBCL, but on the right project, why not try CClasp?


Cython would make sense in this context (~19 seconds for me).


>>would make sense in this context<<

Not really -- "I don’t want to start an argument about the speed of SBCL vs C++ here, my point is that CClasp has come a long way from being hundreds of times slower than C++ to within a factor of 4."


I meant only that Cython results would tell me more about relative CClasp performance than a comparison with CPython does.

I didn't want to get into the optimization game either, but I'm happy someone here reminded me to try numba.


Using Numba (you just annotate the function with @numba.jit) speeds it up from 51 seconds to 0.46 for me.


Brilliant. ~0.49 seconds for me too.


Or even Pypy. This is how they measure up:

    % python2 fibtime.py
    elapsed time: 83.330840 seconds

    % python3 fibtime.py 
    elapsed time: 89.325043 seconds

    % pypy3 fibtime.py
    elapsed time: 5.799757 seconds



Should be a lot faster with type annotations.



