> "I don’t see Julia a good replacement of Python. Julia has a long startup time. When you use a large package like Bio.jl, Julia may take 30 seconds to compile the code, longer than the actual running time of your scripts."
I think this is the crux of it. Python may not be the fastest language out there, but it's definitely one of the fastest languages to develop with, which is why it's such a hit with the academic research crowd.
Static typing and compilation are great features for many use cases, but if you are just prototyping something (and most academics are never going to write "production" code), it's nice to be able to try something, test it and then iterate and try again. When you have to compile, that iteration loop takes longer.
A few people here mentioned GoLang as a good alternative. Personally, I love the language, but it's really not very good for quick prototyping. As an example, the code won't compile with unused variables. This is great for production code but very impractical when you are testing some new algorithm.
I don't think this is entirely accurate. When you develop software, you don't repeatedly unload and load your dependencies (which is what causes Julia's startup time). You probably only develop one package, which you then reload once in a while. That causes very little lag in Julia. Similarly, when working interactively, you only suffer from startup time when you launch your session and use your heavy dependencies for the first time.
Also, for what it's worth, I recreated his benchmarks in Julia and found a startup time of 5-6 seconds on my laptop. Not nothing, but compared to the time any FASTQ-processing operation usually takes, it's insignificant.
Julia's compile-time latency does prevent you from e.g. calling a short Julia script in a loop, or from using Julia scripts that both have large dependencies and perform very short tasks. But I find that to be not very common in my workflow - especially since I usually don't use Julia as a small glue layer between C programs the way Python is often used.
A "prototype" software is sometimes a nice way to name some junk code. When I read that most academics are never going to write production code, they're prototyping, they need something to develop fast... it looks like most academics value fast writing more than correctness, which is not my experience.
As an anecdote, a friend of mine is an applied mathematician specializing in PDE and fluid dynamics. He used to code with C++, because that was what was used in the labs where he worked. Then he discovered Python and he thought it was so much easier, and still quite fast thanks to optimized libraries (written in C++ with a Python interface). But after a few months his enthusiasm had disappeared because of the runtime errors of his Python code. He didn't want to go back to full C++, though, and he still codes mostly with Python.
> "He didn't want to go back to full C++, though, and he still codes mostly with Python."
Exactly. It seems like he understands the tradeoffs and prefers quick iteration over "correctness".
In my experience, for computational "academic" code, the hard part is often coming up with the correct algorithm, not implementing that algorithm correctly. In software engineering, it's often the opposite.
You said it's not your experience, but then you give an anecdote where, because fast writing was prioritized, he ended up with runtime errors...
Maybe the two-language problem is not a problem after all. I've been coming around to the idea of writing the core algorithms in a fast language with a lot of features around correctness, and the rest in whatever language is most productive. Learning another language isn't so bad as long as it's not an utter beast like C++.
Ideal if you can swing it. One of my professors used to prototype in Python for development speed and then re-implement in C++ for runtime speed. You get the best of both worlds in a sense.
That looks super fun! Are they still using this? I was at MS a few years before that, still dealing with a few legacy things in A+. There were hopes that we'd switch to something like F#, but we ended up just re-doing most of the mortgage analytics in kdb/q, which was fun in its own way.
Yep still used (pre and post trade, processing millions of orders and billions of market data events per day).
I'd like to grow it into a successor to kdb, though kdb is very entrenched in finance and the company is extremely litigious (they threatened to sue me).
F# is great too, and all of the other ML variants. We'll get there eventually, it's inevitable, but there's a lot of institutional inertia.
Note, Heng Li[0] is a significant figure in bioinformatics software. Most notably, he is the author of BWA[1] (Burrows-Wheeler Alignment), which performs a large percentage of all sequence alignments worldwide.
The computer language benchmarks game[1][2][3] may be of interest here. It benchmarks C, Python, Javascript, and Julia, in several tasks involving FASTA input (regex-redux, k-nucleotide, and reverse-complement), but implementations are bespoke rather than relying on libraries. Relative timing is much more favorable for Julia in these benchmarks. Looks worse for Python outside regex-redux.
I recently spent some time in the Lua section of the benchmarks game. It is a sad place for a few reasons:
- Lua programs cannot use shared memory concurrency or subprocesses with 2-way communication with the master process.
- Lua programs run on a very slow runtime compared to the fastest Lua runtime.
My impression after this is that for languages that aren't super fast and don't include all the primitives one could want, benchmarks like reverse-complement are mainly measuring whether the language's standard library includes some C function that does the bulk of the work.
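To make that concrete, here's a rough Python sketch of the same effect (not a benchmarks-game program): the standard-library path pushes the whole per-byte loop into C, while the plain-Python path interprets it byte by byte, and on a typical CPython build the timings differ by an order of magnitude or more:

    import timeit

    # Reverse-complement 1 MB of sequence two ways.
    COMP = bytes.maketrans(b"ACGT", b"TGCA")
    seq = b"ACGT" * 250_000

    def revcomp_stdlib(s):
        # bytes.translate and the reversing slice both run as C loops inside CPython.
        return s.translate(COMP)[::-1]

    def revcomp_pure(s):
        # The same work expressed as an interpreted per-byte loop.
        table = {65: 84, 67: 71, 71: 67, 84: 65}  # A->T, C->G, G->C, T->A (byte values)
        return bytes(table[b] for b in reversed(s))

    assert revcomp_stdlib(seq) == revcomp_pure(seq)
    print("stdlib:", timeit.timeit(lambda: revcomp_stdlib(seq), number=10))
    print("pure:  ", timeit.timeit(lambda: revcomp_pure(seq), number=10))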
I would love it if Isaac included LuaJit and pypy in the benchmarks game, but ultimately I get it; it's just one guy's project, and he doesn't want to spend the time to maintain it across the entire incredible diversity of programming languages/implementations[1].
To a great extent any "language" benchmark (for languages that don't compile to efficient machine code) is certainly a benchmark of the language's standard library. I'm not sure there's a way around that reality. Are there external Lua libraries that allow shared-memory concurrency? If so, it's probably worth opening an issue asking whether those libraries could be allowed[2]; it might just be that nobody has submitted a program making use of Lua shared concurrency.
No, it does not mean that. I'll submit my multicore pfannkuchen-redux and reverse-complement when I get around to it, and look at other problems after that.
The pfannkuchen-redux is just a bit hampered by uneven work sharing.
For reverse-complement, it's a bit more trouble to work around the lack of 2-way communication. My implementation writes the entire input to stdout, then workers use fseek on stdout, which only works if you are piping the output of the command to a file. That is, it generates correct output if you run "lua blah.lua > out" but not if you run "lua blah.lua | cat > out". Additionally, since there's no pwrite and no way of getting a new open file description for stdout, I must cobble together a mutual exclusion mechanism to prevent workers from seeking while another worker tries to write.
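For contrast, here's roughly what that output-stitching pattern looks like in Python, where os.pwrite and per-worker file descriptors make the mutual exclusion unnecessary (a minimal POSIX-only sketch with made-up chunks and a made-up output path, not the benchmarks-game program):

    import os
    from multiprocessing import Process

    def worker(path, offset, chunk):
        # Each worker opens its own descriptor and writes its already-computed
        # chunk at a fixed offset; os.pwrite never moves a shared file position,
        # so no locking is needed.
        fd = os.open(path, os.O_WRONLY)
        os.pwrite(fd, chunk, offset)
        os.close(fd)

    if __name__ == "__main__":
        chunks = [b"AAAA", b"CCCC", b"GGGG"]   # stand-ins for per-worker output
        path = "out.bin"
        # Pre-size the file so every offset is valid before the workers start.
        with open(path, "wb") as f:
            f.truncate(sum(len(c) for c in chunks))

        procs, offset = [], 0
        for c in chunks:
            p = Process(target=worker, args=(path, offset, c))
            p.start()
            procs.append(p)
            offset += len(c)
        for p in procs:
            p.join()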
A design objective of PUC-Rio Lua is to be pure ANSI C. I'm not certain, but my impression is that this imposes some unreasonable restrictions on the implementation. An additional design objective is to be small.
I think people don't usually write Lua programs intending to run them inside the binary you get when you build PUC-Rio Lua without any additional C libraries. Libraries like LPeg and lua-gumbo are Lua wrappers around C code. For C libraries that do not have Lua wrappers, people can more or less paste the preprocessed C header file into their Lua source file and use Luajit's FFI to use the library. This last approach is similar to how the Python regex program mentioned elsewhere in these comments works :). It's also common to use frameworks like Openresty or Love2d that provide the innards of some complex threaded program to user Lua code.
Outside of benchmark games and work, I'm working on some code that uses threads and channels, but the threads and channels are provided by liblove.
So I guess I can say, it has been addressed, but it won't be addressed in the standard library.
I'm aware. To a large extent both regex-redux and pidigits are measures of the overhead of FFI for all non-C/C++/Rust languages. Rust is very cool in that one for actually using a regex engine implemented in Rust; definitely has my admiration.
But yeah, just wanted to make sure people are actually looking at the code. The top submission for Python, for example, is not how I've ever seen anyone use regexes in Python. That's an important dimension to evaluate in these discussions! (But not the only one, of course.)
> Rust is very cool in that one for actually using a regex engine implemented in Rust
One wonders how long it will be until someone submits a Rust program that uses PCRE2.
Looking through the Julia code, I saw quite a bit of memory management, which did surprise me. Generally with Julia, you want to allocate memory once, to avoid the penalties of reallocating memory and GC.
I pulled the code down and placed the data on a ramdisk, so that physical disk I/O wouldn't impact the benchmark measurements.
I built the C code and ran the two Julia versions. My timings looked like this:
version    t(raw)    t(gz)
c1         1.47s     8.31s
jl1        3.80s     15.82s
jl2x       5.85s     17.86s
py1        fails
py2        6.92s     29.62s
I don't have lua, nim, or crystal on my machine. This is Julia 1.4.1 BTW. Running Linux Mint 19.3 on my laptop, 5.3.0-51-generic kernel.
Beyond putting the data on the ramdisk, and compressing it with pigz for the compressed version, no optimizations were done. Putting the data on the ramdisk benefits all of the versions.
My thoughts: The author noted that Julia has long startup/run times. This is true, for the first compilation of modules you use. As a reflex these days I (and I am guessing most Julia users) do a "using $MODULE" after adding it. This makes the startup times less painful, for most modules used. Plotting, with the Plots module, is still a problem, though it has gotten dramatically better over time.
Basically, if you run your code more than once, with the modules already compiled into your cache, the nice part is that startup time is significantly better - enough that Python drops back to being slower than Julia. If the startup time on the first run matters (think of it like a PyPy compilation step along with a run of the code), and you'll only ever run the code once, and only for the half minute or less that this example takes, then use whatever you are comfortable with.
FWIW, the author noted, with implied disdain, that Julia users are telling them that they are "holding the phone wrong." Looking over the code, specifically all the memory allocation bits, I could see why. Basically, I'm not sure how much of that, if any, is actually needed.
That said, this is a very limited critique of the tests. I like to see "real world" examples of use. Kudos to the author for sharing!
[edited to "fix" table ... not sure how to do real tables here]
I would be curious to see what the performance is like in Go. I've been experimenting with it a bit lately, and coming from a mostly C and Python background, I have thus far found it easy to pick up. I haven't tested the performance for my own use cases very much yet, but I am told it compares very well.
I think the real challenge for scientific computing (I'm a graduate student, so this is most of the programming I do) is that there is already a huge network effect around NumPy + SciPy + matplotlib and friends. Golang just doesn't quite have the library ecosystem yet, although gonum[0] shows some potential.
In my limited experience so far, I think Go is good in an environment where most people are going to have experience with C and/or Python. It also makes it much harder to write truly crappy code, and it's much easier to get a Go codebase to build on n different people's workstations than C.
Having written a lot of Python, and relatively little Go, I think I would prefer to write scientific code in Go if the libraries are available for whatever I'm trying to do.
It's also much easier to integrate Go and C code, compared to integrating C and Python.
Personally I think numerical code should still mostly be written in C++; right now it still has by far the widest choice of options for doing so. It is also relatively easy to interface with Python. For example, xtensor, libtorch/ATen, and arrayfire all have straightforward Python interoperability via pybind11.
Finally, no other language except maybe Fortran has seamless parallelisation support and first-class low-level numerical primitives developed by vendors. Sometimes you will get a massive performance increase just by adding #pragma omp parallel for.
Even for visualization some python libraries will suddenly fall off a cliff (Altair) once you reach a moderately large number of datapoints.
I would definitely agree that it depends on what kind of scientific computing you are doing.
For big numerical stuff and things that need to run on supercomputers, C/C++/FORTRAN are definitely very relevant and I don't see that changing. Likewise for edge stuff that has to run on bare metal or embedded, I think we're still going to be using C/C++ for a long time to come.
"Scientific computing" is a huge range of different use cases with very different levels of numerical intensity and amounts of data. I doubt very much that there would ever be a one-size-fits-all approach.
However in the context of the OP, I'm arguing that Go would be preferable to Python for the purpose of writing bioinformatics models, and certainly more suitable than Lua or JavaScript.
Of course Python can sometimes be very performant if you leverage NumPy/SciPy, since those are ultimately bindings into the FORTRAN numeric computing libraries of yore. But if we're talking about writing the inner loop, and the choices are Go, Python, Lua, and JavaScript, I think Go is going to win that on the performance and interoperability fronts handily (I omit Crystal, as I am not familiar with it).
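As a rough illustration of the gap being described (a sketch, not taken from the article's benchmarks): the same dot product written as a Python-level inner loop versus a single NumPy call that dispatches once to compiled code:

    import timeit
    import numpy as np

    x = np.random.rand(1_000_000)

    def dot_pure(a, b):
        # The "inner loop written in Python" case: one interpreted iteration per element.
        total = 0.0
        for i in range(len(a)):
            total += a[i] * b[i]
        return total

    def dot_numpy(a, b):
        # Same arithmetic, dispatched once to NumPy's compiled kernel.
        return float(a @ b)

    print("pure python:", timeit.timeit(lambda: dot_pure(x, x), number=3))
    print("numpy:      ", timeit.timeit(lambda: dot_numpy(x, x), number=3))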
Even in the context of bioinformatics my comment applies. With modern C++ libraries you are able to replicate the NumPy user experience almost line by line. A baseline FASTQ parser in modern C++ would look nothing like the fastq.h C parser the author presented. Naive versions of sequence alignment algorithms like Needleman-Wunsch are easily implemented in C++ as well, and you can even do most of your development in a Jupyter notebook with cling/xeus.
I'll take your word for it; I haven't worked in that field.
I do still think it would be interesting to see a comparative benchmark though. I know the Go compiler tries to use AVX and friends where available. I doubt it will ever beat a competent programmer using OpenMP to vectorize, though (though goroutines might be competitive for plain multithreading).
A relevant consideration too -- OpenMP seems to be moving in the direction of supporting various kinds of accelerators in addition to CPUs, so your C++ code has a better chance of being performance-portable to an accelerator if you need it.
Note that the Go compiler is actually fairly immature compared to C++ compilers. It does not have any AVX autovectorization. Any AVX optimization is manual assembly by a library author.
I would expect Rust to be faster. I never claimed Go would be faster than Rust, but that it would be faster than Python.
I think for Rust, the barrier to entry is high in that it is a difficult language to learn. Admittedly, I don't know Rust, but people I know who do have said as much.
I think Go strikes a good balance of being easy to learn and use, having a rich standard library, and also being performant.
I reviewed the first Julia benchmark. The second one is based on this not-very-thorough "experimental" lib I've never heard of from a person not very well-versed in Julia, so I don't think that's representative at all.
For the uncompressed FASTQ benchmark, my relative timings differ from his, with Julia being ~30% faster. It's probably because he installed an old version of FASTX.jl, which would also explain his seemingly outdated comment about the FASTX.jl source code. Another 30% can be shaved off by disabling bounds checks, which would put Julia as the fastest program on the list, around the speed of C. However, I don't think the speed increase is worth it, since FASTQ files are basically always compressed in real life.
For the compressed FASTQ file, it seems this is entirely explained by CodecZlib.jl being 2x slower than whatever his C solution is using. However, when profiling CodecZlib.jl, it just spends all its time calling `zlib` written in C (calling C has near-zero overhead in Julia). So I have no idea why his benchmark is slower there.
This does not seem to be a "real" library. It has a GitHub repo, but no README. It seems to be a Julia implementation of some other C library, probably experimental.
Without having dug into the specifics of this particular package, I can say generally speaking that average package quality has in my view been improving a lot over the past few years; two years ago the language was pre-1.0 and people were still figuring a lot of things out. Some packages of course were excellent even then, but those tend to still be actively maintained.
I know many people don't consider Common Lisp a particularly fast language, but in my experience you can actually make it quite fast depending on what you need.
When using SBCL, for example, your application is compiled to the native instruction set. Moreover, you get control over the optimization level for each piece of the code separately. Even more, you get complete control over the resulting native assembly - something you don't get to do with other higher-level languages.
One of the production applications I did in Common Lisp was parsing a stream of XDP messages (https://www.nyse.com/publicdocs/nyse/data/XDP_Common_Client_...) with a requirement for very low latency. I made the parser in Common Lisp so that it generates optimal binary code from XML specification of message types, fields, field types, etc. using a bunch of macros.
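The rough shape of that generate-a-parser-from-a-spec idea, translated into Python terms (a hypothetical field spec, nothing like the real XDP schema; the Lisp version compiles to native code via macros rather than building a format string):

    import struct

    # Hypothetical message spec: (field name, struct type code) pairs, standing in
    # for what would be derived from the XML message definitions.
    SPEC = [("msg_type", "H"), ("seq_num", "I"), ("price", "q"), ("qty", "I")]

    def make_parser(spec, byte_order="<"):
        # Build the parser once from the spec; each parse is then a single
        # C-level unpack rather than per-field interpreted work.
        fmt = struct.Struct(byte_order + "".join(code for _, code in spec))
        names = [name for name, _ in spec]
        def parse(buf, offset=0):
            return dict(zip(names, fmt.unpack_from(buf, offset)))
        return parse, fmt.size

    parse_msg, msg_size = make_parser(SPEC)
    raw = struct.pack("<HIqI", 101, 42, 1234500, 300)
    print(parse_msg(raw))  # {'msg_type': 101, 'seq_num': 42, 'price': 1234500, 'qty': 300}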
The goal of the application was to proxy messages to the actual client of the stream. The proxy was there so that it was possible to introduce changes to the stream in real time without requiring any components to be restarted. Using the REPL I was able to "deploy" any arbitrary transformation on the messages. The actual client of the messages was a black-box application that we had no control over, but which we sometimes had problems with when it received something it did not like.
I liked Common Lisp in particular because it does not force you to make your performance decisions upfront. You can develop your application using very high-level constructs and then you get the option to focus on the parts that are critical for your performance. Macros allow you to present a DSL to your application but then have full control over the code that is actually working beneath the DSL.
If everything fails, calling C code is a breeze in Common Lisp compared to other languages.
I explored Common Lisp for high performance code once (I work in HFT), the biggest issue was that it didn't have native support for "arrays of structs" (where all the structs would be values stored "unboxed" adjacent in memory). I know it'd probably be possible to write a library to do that, but that'd be a huge amount of work compared to just using a language with existing support for unboxed arrays.
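Not Lisp, but just to show the memory layout in question: NumPy's structured dtypes are one example of how a high-level language can expose unboxed arrays of structs (the quote record here is made up):

    import numpy as np

    # Each element is a fixed-size "struct" stored inline in one contiguous buffer,
    # rather than an array of pointers to boxed objects.
    quote = np.dtype([("price", np.float64), ("size", np.int32), ("side", "S1")])
    book = np.zeros(1_000_000, dtype=quote)

    book[0] = (101.25, 300, b"B")
    print(quote.itemsize, book.nbytes)  # bytes per record, total buffer size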
Well, I did some algorithmic trading (not really HFT but still monitoring the market and responding within 5 microseconds of incoming messages).
I would not use Common Lisp on the critical path, because I would end up basically rewriting everything to ensure I have control over what is happening, so that some kind of lazy logic does not suddenly interrupt the flow to do its deferred work.
A large part of the application was basically about controlling the memory layout, cache usage, messaging between different cores, ensuring the branch predictor is happy, etc., which would be really awkward in Common Lisp (technically possible, but practically you would have to redo almost everything). We also experimented with Java, with the end result being that the code looked like C but was much more awkward.
I have, however, successfully used Common Lisp to build the higher layer of the application that was orchestrating a bunch of compiled C code and also did things like optimizing and compiling decision trees to machine code or giving us REPL to interact with the application during trading session.
>giving us REPL to interact with the application during trading session
This to me is what the big appeal of Common Lisp for a trading system could be, particularly if it allowed live recovery from errors (dropping somebody into the debugger, rather than just core-dumping), which could save a lot of money by reducing downtime. But as you say it would require redoing everything to make the code fit latency constraints and be cache friendly, which would be a lot of work.
Crystal is really exciting. It somehow flew under the radar, but when I heard 'fast, and Ruby-like with types' I became very excited.
There's even a Rails-like [0] framework being developed!
Last weekend I spent a couple of hours getting Lucky up and running (it took some doing; I had to borrow a lot from people's Docker images to get it booted and working).
It's a big 'watch this space' situation. The macro system for metaprogramming is very easy to understand in Crystal [1] as well.
I'd recommend you check out https://amberframework.org too. I also like Crystal, but Lucky kind of turned me off considering the steep learning curve coming from Rails, whereas Amber is much more similar.
I think it would be much better if someone asked a group of experts for each language to come up with their best implementation for the tasks. If I just spent a few hours implementing a benchmark in a handful of languages I wasn't proficient with then I think the benchmark rankings would be nearly random or merely dependent on my priors.
> A good high-level high-performance programming language would be a blessing to the field of bioinformatics. It could extend the reach of biologists, shorten the development time for experienced programmers and save the running time of numerous python scripts by many folds.
The author might like to look at J for specific calculations (and Futhark for similar tasks).
I use Haskell, it's not quite on par with C though. Other MLs are wonderful too.
C++ can be a nice language if you are disciplined with it, but being disciplined means not using every feature and using features only when they make code clearer or higher quality.
Unfortunately, it's not only you who must be disciplined, but all of your colleagues/collaborators and to a lesser extent the authors of any libraries you happen to depend upon (or will depend upon). Which is a shame, because writing C++ can be a whole lot of fun, but the aforementioned invariants rarely hold.
Interesting and roughly in-line with what I've read elsewhere. One must note, however, that performance on a couple of benchmarks does not generalize to all problems. Still - perfect is the enemy of good, and it may be that when looking for a high-level language one might have to settle for "good enough."
Forgive my lack of specificity, as I am quite new to Julia; however, I believe that after the initial compile takes place, you do experience significantly faster runtimes, so you can "forgive" the one-time delay.
I think you can also have VS Code just load up the packages you expect to use on a regular basis at startup.
But I agree. If you complain that compilation takes 11 seconds when the program runs in 30, then I wonder whether that's really a use case where you need every last bit of performance.
Now, on a program that runs for two days straight in Python or Matlab, and Julia reduces that time by half, I can deal with a bit of compile time.
The author got good results from the language that he says he’s proficient in, and bad results from those he says he’s a beginner with. (Read some comments here below explaining his Julia mistakes.) He ran each benchmark one time only—not a very good methodology.
Yeah, because his benchmark is also his ability to quickly write efficient code.
> I am equally new to Julia, Nim and Crystal.
He might be more familiar with a certain style or paradigm, but he wants to switch from Python to something significantly faster while jumping through as few hoops as possible.
This is sort of interesting but isn't informative without any idea as to why some languages are slower. What is the difference between the LLVM-IR/assembly produced by C and the one produced by Crystal/Nim/Julia?
Compilers aren't different from pretty much every other kind of software, for which it's generally painfully obvious that two programs that target the same data _format_ will not necessarily output the exact same _data_, especially when they have wildly differing end goals.
I don't think every article discussing benchmarks has to restate that the differences between programming languages are not just syntactical to be informative.
I'm not sure I follow. What I meant is that a benchmark over a specific task is not really informative without a comparison of why it is slower in some languages compared to others.
Very interesting. If I understand correctly, this article is mainly focused on bioinformatics and performance of high level languages in that area.
I wonder why BioPerl (https://bioperl.org/, https://en.wikipedia.org/wiki/BioPerl) has not been included, it would have been interesting. Perl is considered surprisingly fast for certain classes of tasks (for an interpreted language of course, no point in comparing to C for example).
Perl isn't really an interpreted language. It's compiled to a sort of bytecode when you run the program, and executed on a sort of virtual machine. Similar to Java or C#, but much faster to compile. So fast that most people don't even know it's happening.
As for performance, it really helps if you write idiomatic Perl code. There may be more than one way to do it, but some ways are better. For example, an explicit for loop over a list has to be compiled to byte code that steps through the looping, but if you use map or grep the byte code calls a pre-written optimized function that loops as fast as code written in C would. The more idiomatic your code is, the more optimized calls like that you get, and the faster your code will run.
'Interpreted' is often used to mean 'not compiled to machine code'. Perl fits that bill. So do, say, CPython and Lua, in contrast to JIT-compiling runtime systems like PyPy or LuaJIT or AOT-compiled languages like C or Rust.
> Nim supporters advised me to run a profiler. I am not sure biologists would enjoy that.
We need better profilers! Ones that don't require anything more complicated to use than passing a --profile-me parameter to the interpreter/compiler, and whose output can just be dragged into a pretty, user-friendly and fast application (included with the language runtime), and where reported results are both trustworthy and correspond to actual locations in the source code.
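For reference, CPython's built-in profiler is already close to that shape of workflow, minus the pretty application on top (a minimal sketch with a made-up work() function):

    import cProfile
    import pstats

    def work():
        # Stand-in for the code you actually care about.
        return sum(i * i for i in range(1_000_000))

    # Profile it and dump the stats to a file...
    cProfile.run("work()", "run.prof")
    # ...then print the ten most expensive call sites, mapped back to source locations.
    pstats.Stats("run.prof").sort_stats("cumulative").print_stats(10)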
The Luajit implementations should perhaps use FFI C structs, especially if the main issue for one of them is the lack of arrays of structs (other than FFI C structs)
It's perhaps worth pointing out that programming languages are not themselves fast or slow; it is the programming language implementations that are fast or slow. (This is a particularly sore point in the programming languages research community.)
There are certainly languages that it is essentially impossible to implement to run fast.
Like, one can certainly write a slow C... but writing a fast Python (real Python, I mean, with full reflection, run-time introspection, etc..) would be...near impossible.
Yes, I know there are projects that make large subsets of Python run fast, but it's that last 5% that kills you.
Wouldn't that require some code to redefine `int.__add__` or `int.__radd__` between iterations of the loop? Which I would file under "bizarre shit that shouldn't normally happen." Before the loop starts, you'd have to override `int.__add__` to modify itself every time it's called, or something crazy.
If we're talking about custom classes and not ints, maybe it's a bigger problem. But if PyPy doesn't allow the required introspection to make this work, how does it run anything at all?
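To make the distinction concrete (a small CPython sketch with a made-up Num class): the custom-class case really can be patched at runtime, which is exactly the kind of thing a JIT has to guard against, while int's slots are locked down:

    class Num:
        def __init__(self, v):
            self.v = v
        def __add__(self, other):
            return Num(self.v + other.v)

    a, b = Num(1), Num(2)
    print((a + b).v)  # 3

    # Swap the method on the class at runtime; existing instances pick it up immediately.
    Num.__add__ = lambda self, other: Num(self.v * other.v)
    print((a + b).v)  # 2

    # Builtins are different: their slots can't be reassigned in CPython.
    try:
        int.__add__ = lambda self, other: 0
    except TypeError as e:
        print("int is off limits:", e)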
Most languages have a single implementation, and the definition of the language, and especially its ecosystem, is married to it. Some have more, but usually the ecosystem is popular for only one of them, because they are not entirely compatible.
Nowadays by a language people usually mean the whole package.
If the goal here is maximum performance, that's going to require an AOT-compiled language, which implies static typing. Is that acceptable to scientists used to writing Python and R scripts?
Bioinformatician here. It's certainly a drawback. There is no question that Python etc. are more expressive and quicker to write in than C is. Some bioinformaticians swear by static languages for reasons other than speed (e.g. static type checking).
There is always a need for scripting languages. Not just for speed of development, but for interactive data manipulation and visualization. Static languages are a no-go in that regard.
My bet is on Julia. Although slower than C by itself, I think that in practice, Julia code written by bioinformaticians will be faster than C code wrapped in Python, or C libraries used in inefficient workflows from the shell. That's certainly been my experience so far.
Why not take a look at Kotlin? High level, statically typed, fast iteration, good runtime performance, great support for parallelism (via Java's APIs for it).
I very well might! As I said, I really do need an interactive language for some (most?) tasks, but it would be nice to complement it with a static language. I've been drawn to Rust since it seems to be enjoyable and less full of arcane obscurities than C/C++, but I'll take a look at Kotlin.
There's a Kotlin REPL and IntelliJ has this feature called 'workspaces' that's meant for interactive use. But it's not the primary focus of the team, for sure.
Nah, that’s not true. Something like Common Lisp (AOT compiled, btw) you can write without types and then add them in later to bump performance by facilitating unboxing and removal of inline type checks. It is a tradeoff because you lose some safety, but it can lead to some pretty fast code.
The inclusion of javascript makes me want to dismiss this whole article. If that's a serious candidate then I know he didn't look very thoroughly. But no, go would be just as horrible.
I actually got to the end and said "never heard of Fortran?".
Modern Fortran would be a good option; 'right' is a different matter. Certainly, Fortran 2003/2008/2018 is an expressive language (way more than F77), has an extensive ecosystem with good integration with C, has compilers that generate very fast code, handles parallel computation and SIMD as first-class language features, is already well established in the STEM world, etc., etc.
The biggest downside of Fortran is it's called 'Fortran' and so many people are unwilling to believe it's changed since the 70s.
One note: Fortran 2018 compliance in the compilers is still evolving, so not all features are in all compilers yet. Fortran 95/2003 support should be solid, and most compilers have all of 2008 in.
Treating sequences as strings of characters is a peculiarity, because sequence data really isn't much like text. The operations you would expect to perform on text (uppercase, lowercase, tokenize at word boundaries, etc.) are irrelevant. The operations you actually perform on sequences (reverse, reverse complement, align, edit distance) are a separate set. No reason why FORTRAN couldn't do great for that.
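For example, edit distance is a small dynamic-programming routine over the two sequences, with nothing text-library-shaped about it (a rough Python sketch of the classic Levenshtein recurrence):

    def edit_distance(a, b):
        # Classic dynamic programming, keeping only the previous row.
        prev = list(range(len(b) + 1))
        for i, ca in enumerate(a, 1):
            cur = [i]
            for j, cb in enumerate(b, 1):
                cur.append(min(prev[j] + 1,                 # deletion
                               cur[j - 1] + 1,              # insertion
                               prev[j - 1] + (ca != cb)))   # substitution / match
            prev = cur
        return prev[-1]

    print(edit_distance("ACGT", "AGT"))         # 1 (delete the C)
    print(edit_distance("GATTACA", "GCATGCU"))  # 4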
My first ever professional program was a source code formatter (for Jovial language) written in Fortran (77 I think). Writing a parser in Fortran (early 80's) is an exercise I am glad I never had to repeat.
It is the right option for anything numerical and array-based. Any super-cool array-based feature that people see in MATLAB and Python was inherited from Fortran 90. Yes, that is true. Sadly, modern Fortran is treated quite unfairly within the field of computer science and engineering because of people's lack of knowledge of modern Fortran, which reminds me of the famous quote: "People hate each other because they do not know each other, and they do not want to know each other, because they hate each other".
It doesn't come as a shock but more of a sigh - there's always one. 90% of devs I know use macs. Of them, basically 100% use homebrew. So for 90% of developers, installing crystal is as easy as I said. So the criticism that it's hard to install is, for the most part, invalid.
Of course the build-everything-from-scratch gentoo linux crowd is going to have a harder time but isn't that part of the masochistic appeal?
I mean, all well and good if the only package manager you've ever heard of on macOS is Homebrew, but there are others _and_ Homebrew is of questionable enough quality/has made enough questionable decisions (especially with the last major upgrade) that many people are justified in abandoning or not using it in the first place.
And that's beside the fact that 1) outside of your bubble, more devs use Windows than any other OS, 2) the person who wrote this article isn't even a software engineer, and 3) the tests weren't even run on macOS.
The author has a lot of familiarity with JVM based tooling for the operations he describes in the blog. I'm not aware of a Kotlin implementation but have seen him commenting on both Java and Scala implementations over the years. My assumption is the performance would be similar to those.
It can really depend how you write it. In some instances you can just throw things together and it can look a fair bit like you might write it in what many would consider high level languages.
On the other hand, there are times when you really need to get into the nitty gritty of things for performance reasons, and it ends up feeling quite low level.
It's longer than the Python version[0], since I'm going out of my way to not allocate for every string. But, and maybe I'm biased here, it doesn't really feel like low-level code.
I would qualify a language as high level if you're not expected to deal with pointers in 'normal' use. Yes, Rust has pointers, but using them is NOT normal.
C/C++/C#/D, etc are not high level by this criteria.
There are also of course degrees to all this. Rust is lower level than a lot of languages by virtue of having native pointer uses, even if it's frowned on.
There's also something to be said for language features, but I can't quite put my finger on it.
I disagree. Rust still requires thinking about low-level details like lifetimes, aliasing, whether/when to use `Box` or `Rc`, etc. Even when these are not explicitly spelled out (as with lifetime elision), the programmer still has to be aware of them. Plus, modern C++ also abstracts away raw pointers by your definition, and has `unique_ptr` and `shared_ptr`, which are the same as `Box` and `Rc` respectively. Bare pointers are frowned upon in modern C++ too. Furthermore, the OP talks about languages high-level enough to be used by biologists while maintaining acceptable performance.
Problems like the benchmark in the post are actually really simple to write in rust. They're all essentially streaming algorithms with some simple parsing.
Fwiw Rust has completely replaced C++ for high-performance bioinformatics code in my workflows. Sooo much easier to write than C++!
The difference between pointers and references can easily become muddled. If you have a language with pointers that are non-nullable, type- and memory-safe, is that not high-level?
I think a good real-world example that exposes the problem in your definition is Go. Go has pointers, and using them is normal. However, Go does not have pointer arithmetic, and outside of unsafe (like Rust) its pointers are memory-safe. I consider Go to be higher-level than C/C++ for this reason, and many others (GC, channels, defer, etc.) - I'd also consider it lower-level because of its non-answer/cop-out to error handling.
But what is special about pointers compared to references? If you have a language with pointers that are type-safe and memory-safe how is this distinctive?
I mean, even Python has references to variables. You can't escape references (as opposed to values).
The difference is one of model. Pointers are an exposure of the underlying computer architecture. Whereas references are more of a property of common language design. In theory, you could not have pointers but still have references.
> Pointers are an exposure of the underlying computer architecture. Whereas references are more of a property of common language design.
Sorry, I just don't follow. How are the pointers in Go more exposing of the underlying architecture than a reference? (I'm using Go as an example to make it concrete, but any language with similar properties will do).
The syntax and some of the semantics of assignment and rebinding differ between, say, Go pointers and Python references, but that's the point I'm contesting: I don't see how one is necessarily higher-level than the other. If you add automatic memory management, nil pointer checks, and removal of any "undefined behavior", pointers aren't necessarily low-level. It wasn't the pointer, it was the memory safety.
I personally think that once you tease it out that it becomes a semantics argument that unfortunately doesn't shed much light on what is "high-level".
I feel that defer is lower-level than C++ RAII which happens automatically in destructors without manually specifying. C has compiler-specific defers, and you can emulate defer using C++ RAII (though clunky).
i think there needs to be a new set of terms. ah, that probably wouldn't help, because people would then mismatch each other's ideas of what terms mean and which set of terms correspond to which ideas. still, if people are calling c low level then what should assembly be called?
Low-er? High and low are relative adjectives almost everywhere they're used. C is a low level programming language relative to most other programming languages; it has fewer layers of abstraction than say, Java. In a conversation discussing x86 assembly, LLVM IR, and C, then C would be high level. But that's usually not the conversation being had; there are a hundred times as many Java programmers as there are x86 assembly programmers.
We don't need a new set of terms. Relative terms are fine, they're just context sensitive.
Imo the problem is that 'high' is a relative term. It's calibrated according to whatever the commonly used abstraction level is for the term-speaker.
The other problem is that level doesn't really mean anything, or it means something different in each language. Is it feature count? Abstraction potential? Memory-addressed vs objects? Statically or dynamically typed? Closeness of fit to the machine it's running on (imagine a lisp on a lisp machine, or x86 on an emulator - is that high or low level?)? Etc.
If "level" is ill-defined then low and high aren't relative terms, they're meaningless. But there is a lowest either way, the machine code. Then high level programming languages have the programmer writing code in a model independent of machine architecture. Those models are't one dimensional, so let's not bother trying to put them in order from low to high, let's talk about their features.
That's how i'd have it if it were up to me to decide, but that isn't how language evolves.
OP had a specific kind of high-levelness in mind, one that Rust does not fall into. C is provided as a performance baseline, which is a common benchmark. The languages are compared against Python - Nim and Crystal are closer to Python than Rust is. For example, GC is a language-level requirement implied by the author.
Rust is cool, but it does not have a horse in this race I'm afraid.
Both of these are good development languages, particularly for large projects. However, for bioinformatics or other science/academic projects, they have neither the wide range of libraries and ease of writing quick scripts that python has, nor the speed of something like C.
I think that for most academic uses, ease of development is really important. Remember, most of these people don't have CS backgrounds, which makes Python a good choice because it's easy to learn and doesn't require much setup. Its main drawback is that it's slow, but this doesn't matter in many cases. When I was in grad school, I'd use a small dataset to try to figure out the best algorithm, and then if I had to leave my computer running overnight to run on a larger dataset, it usually wasn't a big deal.
On the other hand, sometimes you just need something really fast. In that case, C is your darling.
Java/C# make lots of tradeoffs on both sides, which works well for software developers but not necessarily researchers (who don't have to worry about code being maintainable).
Trying to get biologists to care about computers is like telling someone to be excited that they need to get all their wisdom teeth pulled and have a cavity filled on the same day. Throughout all my schooling and in my working life, I was pretty much the only one among the biologists I met who actually knew much of anything about computers and programming beyond what was needed to write reports and enter data.
Most biologists I met were far more comfortable in the pissing rain, up the mountain, in the middle of nowhere collecting animal shit than in front of a keyboard.
If there were a Venn diagram of biologists and computer and technology enthusiasts the overlap would need a micrometer to be read.
Disclaimer: Please take this extremely generalized statement, likely offensive to the small few in that tiny overlap I mentioned, with a grain of salt. Please don't take it too seriously; it's just from my own narrow sampling of people I've interacted with, which may or may not be representative of the overall population.
I know about bioinformatics and the contributions of biology to computer science and vice versa. I've personally worked in both field sampling and data analysis. My overly offensive generalized statement was a light jab at field biologists I've known, including my own friends whom I've debated on this subject, who tend to dislike computers and will go as far as to avoid even Excel, hand-writing their data and doing all their math on a calculator. It really wasn't meant to be taken seriously, and I suppose it would go over the head of anyone who hasn't spent a lot of time in the field with biologists.
Okay, sorry about the misinterpretation, but it should have been clear to you that ggp was referring to computational biologists or bioinformaticians, making your observations rather unrelated.
Bioinformaticians use Python, R, Java, Julia, Groovy, JS and more (I use Kotlin and Python mostly, but that's in a niche of bioinformatics; I don't touch genes or proteins)... Same as in software dev, there are a lot of beliefs. So you have the club of JS people that hate Java, the club of Pythoners that hate R, and the people that just use whatever allows them to solve their problem - they are the ones that achieve the most, but also the ones you hear about the least.
And that is the problem. I understand Julia and Python there. I'd understand Haskell (as every biologist is a mathematician too). But to list JS and Lua, and not have C# and Java?
AFAIK a lot of chemistry equipment actually is built with Java and C#.