Aren't many bash programs written in C? So this implies Julia is somehow faster than C? Obviously that can't quite be true, but I can definitely imagine that algorithms implemented in Julia could be far faster than other algorithms, since the community is heavily influenced by hardcore mathematicians with a strong emphasis on speed. Probably most of the algorithms in Julia are state of the art and push the boundaries on time complexity.
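I was talking about bash scripts. But outperforming C with Julia is perfectly possible. Julia's JIT compiler specializes code for the concrete functions and types it is called with, so it can inline away the overhead of many function calls, which C typically cannot do. A simple example is a sort that takes a comparison function: C's qsort calls the comparator through a function pointer on every comparison, while Julia can compile a version of the sort with the comparison inlined.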
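To make the function-pointer point concrete, here is a minimal C sketch. The names sort_with_ptr, sort_inlined and less_double are hypothetical, and insertion sort stands in for a real sort algorithm. Both functions run the same algorithm; the only difference is whether each comparison is an indirect call or an inlined `<`.

```c
#include <stddef.h>

/* Hypothetical illustration, not code from any real library. */

/* A comparison passed as a function pointer, like qsort's comparator. */
typedef int (*less_fn)(double, double);

static int less_double(double a, double b) { return a < b; }

/* Generic insertion sort: every comparison is an indirect call through
   `less`. When this code lives in a separately compiled library (as
   qsort does in libc), the compiler cannot inline `less`. */
void sort_with_ptr(double *v, size_t n, less_fn less) {
    for (size_t i = 1; i < n; i++) {
        double key = v[i];
        size_t j = i;
        while (j > 0 && less(key, v[j - 1])) {
            v[j] = v[j - 1];  /* shift larger elements right */
            j--;
        }
        v[j] = key;
    }
}

/* Same algorithm, specialized by hand for `<` on doubles, so the
   comparison compiles to a single inlined instruction. */
void sort_inlined(double *v, size_t n) {
    for (size_t i = 1; i < n; i++) {
        double key = v[i];
        size_t j = i;
        while (j > 0 && key < v[j - 1]) {
            v[j] = v[j - 1];
            j--;
        }
        v[j] = key;
    }
}
```

In C you only get the second form by writing it yourself (or generating it with macros). In Julia every function has its own type and methods are compiled per argument type, so the specialized version can be produced automatically for each comparison you pass in.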
High-level, functional-style code using things like map and filter can often be JIT-compiled to machine code as efficient as a hand-written loop.
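As a rough illustration (a C sketch with hypothetical function names, not Julia's actual compiler output): a filter/map/sum chain evaluated naively materializes a temporary array at each stage. Once the small closures are inlined and the stages are fused, as can happen with a lazy Julia generator like `sum(x^2 for x in v if x > 0)`, what remains is essentially the single loop you would write by hand.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical illustration of fusion; both compute the sum of squares
   of the positive elements, adding terms in the same order. */

/* Unfused: filter, then map, then reduce, with a temporary per stage. */
double sum_sq_pos_unfused(const double *v, size_t n) {
    if (n == 0) return 0.0;
    double *kept = malloc(n * sizeof *kept);
    double *sq = malloc(n * sizeof *sq);
    if (kept == NULL || sq == NULL) { free(kept); free(sq); abort(); }

    size_t m = 0;
    for (size_t i = 0; i < n; i++)   /* filter: keep x > 0 */
        if (v[i] > 0) kept[m++] = v[i];
    for (size_t i = 0; i < m; i++)   /* map: x -> x^2 */
        sq[i] = kept[i] * kept[i];
    double s = 0.0;
    for (size_t i = 0; i < m; i++)   /* reduce: sum */
        s += sq[i];

    free(kept);
    free(sq);
    return s;
}

/* Fused: one pass, no allocations. */
double sum_sq_pos_fused(const double *v, size_t n) {
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        if (v[i] > 0) s += v[i] * v[i];
    return s;
}
```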
Fortran is considered faster than C for numerical code, and well-polished Fortran libraries like BLAS are already being outperformed by Julia.
For typical systems programming, where you need tight control over memory or real-time behavior, C will still have the edge. But for anything that crunches lots of numbers, like data analysis or machine learning, Julia will likely outperform everybody else.
And the "how" behind Octavian.jl is basically LoopVectorization.jl [1], which helps make optimal use of your CPU's SIMD instructions.
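One transformation in this family, sketched in C: splitting a reduction into several independent accumulators so that SIMD lanes and the CPU pipeline stay busy. This is an assumption-labeled illustration with hypothetical names; LoopVectorization.jl chooses its own unrolling with a cost model, and this hand-unrolled loop is not its output.

```c
#include <stddef.h>

/* Hypothetical illustration, not LoopVectorization.jl output. */

/* Straightforward dot product: each add depends on the previous one, and
   without flags such as -ffast-math the compiler may not reorder
   floating-point adds, so this stays one serial dependency chain. */
double dot_naive(const double *x, const double *y, size_t n) {
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        s += x[i] * y[i];
    return s;
}

/* Unrolled by 4 with independent accumulators: the four chains can be
   packed into one SIMD register or overlapped in the pipeline. This
   reassociates the sum, so results may differ in the last bits. */
double dot_unrolled(const double *x, const double *y, size_t n) {
    double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += x[i] * y[i];
        s1 += x[i + 1] * y[i + 1];
        s2 += x[i + 2] * y[i + 2];
        s3 += x[i + 3] * y[i + 3];
    }
    for (; i < n; i++)  /* leftover elements */
        s0 += x[i] * y[i];
    return (s0 + s1) + (s2 + s3);
}
```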
Currently there can be some nontrivial compilation latency with this approach, but since LV ultimately emits custom LLVM IR, it's actually perfectly compatible with StaticCompiler.jl [2] following Mason's rewrite, so stay tuned on that front.
Thanks. But how does LoopVectorization.jl help here, compared with C/Fortran code optimized for the CPU? Is this mentioned somewhere in their docs?
The basic answer is that LLVM doesn't do as good a job with some kinds of vectorization because it works on a lower-level representation. There are several reasons for this. One is that LoopVectorization is permitted to replace elementary functions (such as exp or log) with hand-written vectorized equivalents; another is that it makes better use of gather/scatter instructions.
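For anyone unfamiliar with gathers, here is a small C sketch (a hypothetical function; the vector path assumes an x86 CPU with AVX2 and falls back to a plain loop otherwise). Summing x[idx[i]] needs loads from arbitrary addresses, and a single gather instruction performs several of those loads at once into one vector register.

```c
#include <stddef.h>
#include <stdint.h>
#ifdef __AVX2__
#include <immintrin.h>
#endif

/* Hypothetical example: sum x[idx[i]], an indirect, non-contiguous load. */
double sum_indexed(const double *x, const int32_t *idx, size_t n) {
    size_t i = 0;
    double s = 0.0;
#ifdef __AVX2__
    /* AVX2 path: one gather loads 4 doubles from 4 arbitrary indices. */
    __m256d acc = _mm256_setzero_pd();
    for (; i + 4 <= n; i += 4) {
        __m128i vi = _mm_loadu_si128((const __m128i *)(idx + i));
        acc = _mm256_add_pd(acc, _mm256_i32gather_pd(x, vi, 8));
    }
    double lanes[4];
    _mm256_storeu_pd(lanes, acc);
    s = (lanes[0] + lanes[1]) + (lanes[2] + lanes[3]);
#endif
    /* Scalar path: the leftover elements, or the whole loop without AVX2. */
    for (; i < n; i++)
        s += x[idx[i]];
    return s;
}
```

Compile with -mavx2 (GCC/Clang) to enable the vector path. Whether a gather beats scalar loads depends on the CPU; on some microarchitectures gathers are relatively slow, which is part of why deciding when to use them matters.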