My favorite things that are coming with Julia 1.0

ScottPJones · on June 1, 2018

The creators of Julia have been focusing on technical, numerical, scientific computing first, however, the language was always intended to also be a good general purpose programming language (which it is) [I was lucky enough to attend Jeff Bezanson's Ph.D. thesis defense at M.I.T. 3 years ago - and was able to ask him that question there]. I used to be a full-time C/C++ programmer, focused on performance of large systems, however, since learning Julia I haven't had to go back to writing C or C++ even once in over 3 years, since I can write even the sort of low-level code I typically do in Julia, faster and easier than in other languages, and get the same performance.

nur0n · on June 1, 2018

I'm strongly considering Julia as an alternative for C/C++. A few questions if you don't mind: * How does Julia's GC play into the performance characteristics of these systems? * What tools does Julia offer for fine-grained control of memory? * It appears that Julia uses a rich runtime, do you notice it "weighing down" on performance? * Do you ever feel the need to drop down to C/C++ for hot code paths?

ScottPJones · on June 2, 2018

I do have to be aware of the GC, and try to use techniques to avoid lots of allocation. I haven't needed much in the way of fine-grained control of memory, what sorts of things were you looking for? A lot of that "rich runtime" (i.e. including the kitchen sink for linear algebra stuff) has been moved to stdlib, but still packaged with Julia, but even before, it really didn't seem to affect performance except for slowing down the time to build julia (esp. on the Raspberry Pi!). In over 3 years, I've never needed to write anything in C/C++ (except a couple of times I wrote something out in C just to demonstrate the Julia was generating as good or better code (and I mostly write rather low-level stuff). I have it in mind to learn Rust, which I think would be much better than either C or C++ for that sort of stuff (to make a small robust library of functions for some hot code paths), and it integrates well with Julia (using ccall), but the day that Julia's performance hasn't been good enough as not yet come, at least not for me.

nur0n · on June 9, 2018

Thank you! I was not looking for anything in particular. Julia's design is everything I've been craving from a language; good performance seemed too good to be true. Due to never having worked on large systems, my judgement is of limited use. Your feedback convinced me to take a serious look at Julia. Looking into the discussions regarding Julia's implementation (so cool that they are easily accessible on GitHub/Discourse) convinced me to commit to learning it.

EDIT: grammar

twic · on June 1, 2018

> It will be interesting to see exactly how nothing and missing play out in-practice. I feel like if all things go well, then in most circumstances operating with a missing should return a missing (i.e. properation), and operating with a nothing should throw an error (Much like a NullReferenceError).

We all know that having a null value in your language is a mistake, but i have to admit that it had not occurred to me that the fix was to have two null values.

zimablue · on June 1, 2018

The root of "null is a mistake" isn't that null is a mistake, it's that it's a mistake to allow null into a type system in such a way that it can cause non-compile-time type errors.

So Julia is typed ( I think) but it's not a mistake in any way to have null in a dynamic language by this argument.

Null is a thing and if you don't have it you reinvent it, it's the typesystem breaking that's the problem, maybe monads effectively have nulls it's just that they then get protected by the typesystem.

one-more-minute · on June 1, 2018

Right, Julia is dynamically but nonetheless very strongly typed. If you write `x::String` then x is a string; you won't ever get a surprise null pointer error from that code, and `nothing` doesn't tend to propagate far from where it appeared (the core problem in languages like Java).

In future we'll likely have shortcuts for `T?` as `Union{T,Nothing}`; effectively a dynamic version of optional/maybe types, as in Swift and others.

masklinn · on June 1, 2018

> We all know that having a null value in your language is a mistake, but i have to admit that it had not occurred to me that the fix was to have two null values.

Javascript is even better as it has 3 (null, undefined and NaN, and while NaN is available in most every language it is pretty common in JS as its conversion APIs tend to return NaN instead of error).

petre · on June 1, 2018

How is 3 null values better than one? Fortunately Dart has fixed this nonsense.

jimbokun · on June 1, 2018

(That's the joke.)

parvenu74 · on June 1, 2018

What is the target market or prime use case for Julia? What makes it better than alternatives? I've _heard_ of this language but haven't studied it; I'm trying to understand if this is more of a niche language or something which could see more mainstream adoption like Go or Rust.

amrrs · on June 1, 2018

It is targeted at places where Numerical Scientific computing is required. It's not just Matlab, Julia also targets Python and R where as these languages are used primarily as prototyping languages, Julia's pitch is that are good enough to become a production level code. For that matter even Google in its Tensorflow for Swift article referred Julia's this link. Also another advantage of Julia is its not trying to replace Python or R but to work alongside with them and interoperability.

From Swift for Tensorflow doc:

Julia is another great language with an open and active community. They are currently investing in machine learning techniques, and even have good interoperability with Python APIs. The Julia community shares many common values as with our project, which they published in a very like-minded blog post after our project was well underway. We are not experts in Julia, but since its compilation approach is based on type specialization, it may have enough of a representation and infrastructure to host the Graph Program Extraction techniques we rely on.

eigenspace · on June 1, 2018

this discussion [0] showed up recently on the julia forums regarding what julia pacakges can be considered 'state of the art' ie. just don't have equivalents out there in other langauges and there was some interesting discussion, especially in the 3rd comment from Chris Rackauckas where he evaluated many of the unique things out there in the julia pacakge ecosystem.

I think the sort of pacakges showcased here is a good indication of the sorts of niches julia is worming its way into.

[0] https://discourse.julialang.org/t/what-package-s-are-state-o...

Certhas · on June 1, 2018

I can second that. Rackauckas DifferentiaalEquations library is pretty much peerless out there. And it is enabled by Julias blend of features.

Edit: It's just amazing to have genuinely lightweight abstractions in your language. I could make Python code fast, but it precluded using much of the language. Switching to Julia has vastly simplified some of our code. If not for Julia we might have ended up going for C++ (or Python+Rust) but I can take a scientist with little knowledge of programming and get them productive in Julia in a fraction of the time it would take for C++ (or Rust).

danso · on June 1, 2018

When you say "Switching to Julia has vastly simplified some of our code", can you talk about the scope of what you use Julia for? Application code? Notebooks?

Certhas · on June 1, 2018

Something like: home written library + notebook. The user of the library needs to be able to specify the behaviour of objects critical to the performance. This makes the C library + Python for scripting paradigm thoroughly unsuited.

We've been limiting our research to make it fit the programming paradigm.

aschampion · on June 1, 2018

It's mainly targeting MATLAB, and to a slightly lesser extent scientific and numeric programming in Python and R. It's a well thought out language that allows you to write MATLAB-like high level code with an easy gradient for progressive typing and optimization to near C level performance. It's a bit harder to sell versus Python, since Python has enormous value in the ecosystem, community, and ubiquity. Also, because it primary targets MATLAB a lot of the standard libraries try to have similar ergonomics, which is a bit of a waste of a great tool to recreate a poor interface.

stabbles · on June 1, 2018

It's definitely targeting Python and R to the same extent as MATLAB, in the sense that it claims to solve the two-language problem that is so apparent in these languages. MATLAB, Python and R are easy scripting languages, but as soon as you have to do heavy computations, you're forced to call C / FORTRAN libraries. Julia on the other hand is prove that we can have a high-level scripting language that runs as fast as C and Fortran. Combine this with Julia's generic programming and type system, and you can easily run your algorithm with floats, complex numbers, arbitrary precision, etc etc.

Even if Julia wraps a library like Tensorflow, its API is looking really nice compared to Python [1]:

  using TensorFlow

  sess = TensorFlow.Session()

  x = TensorFlow.constant(Float64[1,2])
  y = TensorFlow.Variable(Float64[3,4])
  z = TensorFlow.placeholder(Float64)

  w = exp(x + z + -y)

  run(sess, TensorFlow.global_variables_initializer())
  res = run(sess, w, Dict(z=>Float64[1,2]))
  Base.Test.@test res[1] ≈ exp(-1)

[1] https://github.com/malmaud/TensorFlow.jl

blablablerg · on June 1, 2018

To compete with R it needs something like the tidyverse.

elsherbini · on June 1, 2018

I agree - declarative in memory dataframe manipulation is extremely powerful. And the composability of plotting in the tidyverse is really nice as well.

It looks like there are the beginnings of both of these in Julia:

[0] http://gadflyjl.org/stable/

[1] https://github.com/JuliaStats/DataFramesMeta.jl

ViralBShah · on June 1, 2018

I often ask myself the question, how can Julia do things that R cannot do. After all, when something is good at doing something, why replace it?

Part of this is why we did JuliaDB: http://juliadb.org/ and continue to try push the boundaries on parallelism, missing data, OnlineStats.jl and making data manipulation and modeling that much easier.

ScottPJones · on June 1, 2018

In some sense, it doesn't really need to compete with R, many times it's better just to use the R, Python, Java, C++, packages via RCall, PyCall, JavaCall, Cxx, or use the built-in ccall to use libraries written in any number of languages that conform to the C ABI (C, Fortran, Rust, ...).

I've joked before, about how there is no such thing as a "One Language To Replace Them All", however, I feel Julia is the best candidate for the "One Language To Rule Them All", since while it solves the "two-language" problem in many cases, you can use it bind code written in many languages together (hopefully in a bit nicer fashion than the "One Ring" bound the other rings and their users!)

psandersen · on June 1, 2018

Agree completely with this, tidyverse is whats keeping me in R when I'd prefer to mostly use Python.

bunderbunder · on June 1, 2018

Right now it’s a hard sell vs Python, but I can imagine Python running out of runway soon. A lot of it’s existing scientific and statistical computing stack is built around the assumption that you’ll be working with data that conveniently fits in memory. Once you’ve sized out of pandas/scipy/scikit, your next major option is Spark, which is certainly powerful, but is also unwieldy.

I could see something like Julia earning a lot of mindshare if it had a really polished solution for the space between, “my data is hundreds of megabytes”, and, “my data is hundreds of gigabytes”.

throwawaymath · on June 1, 2018

Speaking as someone who uses Python with half a terabyte of memory, I think you're underestimating how much memory these labs will use. In my experience most HPC architecture is optimized first by rewriting the code in the same (already fast) language or library, then by increasing hardware resources (especially among distributed nodes), then by seeking a new library in the same ecosystem, and finally by moving to a new language if they have to.

Moving to a new language has more friction than basically anything else unless there's a real language feature missing or the budget doesn't allow for more compute hardware. Hundreds of gigabytes is well below where academic and industry labs will start having to think about these problems. It's going to be really tough to displace Python with anything equally as general purpose.

This is all to say that I buy that Julia can shine more than Python for I/O bound HPC, but it really shouldn't be I/O bound until you have terabytes of data (and likely tens of terabytes). And aside from that, the Python numerical computing ecosystem includes a lot more than just Numpy and Pandas. As other commenters have mentioned, you can use Dask if your hot data has grown into the terabyte range. Anaconda includes a lot of libraries which can bail you out of situations once you've left the familiar world of Pandas data frames.

Avshalom · on June 2, 2018

It's not so much hard drive I/O as it is network I/O. Both the obvious waiting for non parallel data, but also the time it takes to get something physically across a room from one cpu to another.

https://en.m.wikipedia.org/wiki/Gustafson%27s_law

For instance.

VHRanger · on June 1, 2018

All high performance code is IO bound.

Even highly tuned c++ code spends most of it's time on the CPU waiting on cache misses. Its a pretty exceptional usecase where your code is not IO bound

throwawaymath · on June 1, 2018

Wait what? I'm using "I/O bound" in the sense of CPU operations waiting on reads/writes to e.g. a disk. If you consider an operation waiting on the cache to be I/O bound, what do you consider to be CPU bound or cache bound? And what terminology would you use to refer to operations which are waiting on the disk as opposed to the cache? What about memory instead of the cache?

I think it's useful to differentiate between an operation which is purely CPU bound (i.e. it's just constantly calculating without a reference) and an operation which is cache bound (faster than memory but still bound over the CPU). But calling operations I/O bound when they're sufficiently optimized that they live in the cache and don't even hit memory, let alone disk is an abuse of terminology. In the context of what I'm talking about, most HPC is absolutely not I/O bound unless it's using SATA/SAS drives instead of cache and memory.

And circling back to my original point, most research labs which can afford it will sufficiently optimize their code and hardware so that they don't hit the disk unless they're working with terabytes of data. Python, C++ and R provide numerous packages between the three of them for numerical computing across each of these bottlenecks, so I don't think Julia can rely on differentiating itself by shining in an I/O bound setting (i.e., waiting on disk). And if it does, "hundreds of gigabytes" isn't really the data size in which people are (in my opinion) going to overcome the friction of a new language and ecosystem just to harvest those benefits.

VHRanger · on June 2, 2018

CPU bound in HPC circles usually means you are somewhere in the order of magnitude of the maximum FLOPs your cpu can do (eg. GHz clock of the cpu * cores number of operations in terms of GB/s of data). That's almost never the case.

Operations waiting on the disk are "disk bound", IO bound is the generic term for data access, instead of cpu processing, being the bottleneck (which is usually the case, just a question of where).

I agree that Julia probably doesn't have a huge advantage, that said, having been stuck with slow Python code before, you're often stuck with rewriting large parts of the system in c++ or another low level language. That's the reality of Python, but at least it's not the most painful thing to do.

dnautics · on June 1, 2018

the old joke is "HPC is the art of taking a cpu-bound computation and making it I/O bound".

repsilat · on June 1, 2018

It's not an especially weird use of terminology. "I/O bound" means "waiting on reads and/or writes." Cache or RAM or disk, it's all about communication throughput and latency, efficient access patterns etc.

In a compute-bound workload the CPU spends the bulk of its time actually retiring instructions, not stalled waiting on data.

Think about it from the perspective of what the FPU sees -- once it has done that FMA operation, does it have the data it needs to do the next one, or does it need to sit on its hands for a while?

The cache hierarchy, cache-friendly data structures and algorithms -- they all aim to reduce time spent waiting on IO.

throwawaymath · on June 1, 2018

I understand that, but I don't think it's useful to use use the term "I/O" in the theoretical sense of the concurrency problem. It's also not typical nomenclature - for example, see https://stackoverflow.com/questions/868568/what-do-the-terms.... We have terms like "CPU bound", "cache bound", and "memory bound", but we don't really have "disk bound" in common usage. This is because the common usage is "I/O bound".

Theoretically speaking we can model any process as one which has to wait and one which doesn't have to wait. But in modern usage we have a variety of types and speeds for reads and writes. When I/O simply means reads and writes, you lose all the practical granularity you'd otherwise get by decomposing the reads/writes into different bottlenecks. It's philosophically elegant, but practically unhelpful for optimizing HPC and distributed systems when, as the responder said, it's rare to be CPU bound.

I also think that the context of my original comment is pretty clearly using I/O in the modern sense of disk usage. Responding with a correction that everything is I/O bound is vacuous, not insightful.

ChrisRackauckas · on June 1, 2018

>it's rare to be CPU bound.

Not when solving (partial) differential equations, which is what I am using Julia for.

repsilat · on June 1, 2018

Also: CPU-bound problems used to be much more common. We spent a lot of effort making CPUs fast though, and memory access didn't keep up. It's why we have these deep caches, it's why we have out of order execution and speculative branch prediction -- to keep processors fed with data.

Used to be you could count cycles, now that's only really true in the simplest of cases with trivial memory access patterns. Now high-information branches are much more expensive than cycle counting would have us believe, ditto pointer-chasing.

ssfrr · on June 1, 2018

Better cache access patterns are one of the reasons Julia's dot-broadcasting[1] is super cool. If you have big vectors `a` and `b`, the expression `sin.(a .+ exp.(b))` will do a single pass over your data, calling `sin(a[i]+exp(b[i]))` for each element, rather than creating big temporary arrays for the intermediate expressions and looping through multiple times.

Then because putting all those dots can be unwieldy, there's the `@.` macro which puts a dot on all your function calls.

[1]: https://julialang.org/blog/2017/01/moredots

CyberDildonics · on June 1, 2018

I'm not sure how this makes sense. Tuning C++ for speed is mostly weeding out cache misses through memory access patterns. If memory is accesses linearly the prefetcher will get it ahead of time. If cache sizes are taken into account, you can not only cut down on memory latency, but memory bandwidth as well.

wenc · on June 1, 2018

> Once you’ve sized out of pandas/scipy/scikit, your next major option is Spark, which is certainly powerful, but is also unwieldy.

There's also Dask [1], a native Python framework for distributed computations (by Anaconda). Irina Truong gave an excellent talk at PyCon 2018 about it [2]. I had never thought to look into Dask because Spark worked well for my use cases, but it has a lot of advantages over Spark (e.g. speed -- it's faster and more lightweight than PySpark and has no JVM serialization overhead) if you're using Python. Dask also runs on Kubernetes clusters, so scaling is not an issue.

And yeah, a huge amount of important data analysis work will continue to be done on data that fits in memory. Data analysis on distributed datasets is important, but from what I can tell, outside of certain domains it's certainly not the majority of the data analysis work out there.

[1] http://dask.pydata.org/en/latest/spark.html

[2] https://www.youtube.com/watch?v=X4YHGKj3V5M

mindB · on June 1, 2018

Julia actually has that, though polishing is still ongoing. Checkout JuliaDB [0]. It works well for noodling around in a REPL but can also smoothly deal with huge data and processing distributed across multiple computers, all while leveraging the native Julia ecosystem which is much nicer than numpy and with lower overhead.

[0] https://juliadb.org

costrouc · on June 1, 2018

Python does not make these assumptions. There are Python tools that exist to solve these problems that are equally as powerful as other language's solutions. The two that I believe right now address these problems best are dask and mpi4py. Mpi4py can achieve very low latencies but given that it's based on MPI it can be complex to use. dask is the most user friendly and as a Python user is clearly easier to use than spark. Paired with numba you can get equivalent performance to distributed C programs.

mattip · on June 1, 2018

Dask is another choice for distributed out-of-memory data structures but still within the python ecosystem

metaobject · on June 1, 2018

Language implementation-wise, can anyone explain why/how Julia is able to get close to C-level performance? Is it doing some extra steps under the hood (JIT compilation?) that Python and R aren't doing?

ScottPJones · on June 1, 2018

Julia's JIT compilation is rather different than what is referred to as JIT compilation in other languages, such as Java or JavaScript, where the language is interpreted (which may be interpreting instructions from a virtual machine such as the JVM), and the run-time decides if some code is being hit frequently enough to warrant compilation to native code. Julia first compiles to an AST representation (also expanding macros, etc), performs type inference, etc. When a method is called with types that haven't been used before to call that method, that's when Julia does it's magic and compiles a version of that method specialized for those types, using LLVM to generate the final machine code (just like most C and C++ implementations these days, as well as Rust and others). That also means that it's rare for Julia to have to dynamically dispatch methods based on the type of the arguments, which is one of the things that can really slow down other languages with dynamic types.

johnmyleswhite · on June 1, 2018

The easiest way to think about Julia's performance is closely related to the observations that inspire tracing JIT's for many languages -- most code in dynamic languages doesn't make use of the features that make efficient compilation impossible. Julia's response to that observation was to build a dynamic language that lacked some of the most extreme features in Python or R that act as barriers to efficient compilation.

simonbyrne · on June 1, 2018

It's also worth noting that Julia's JIT isn't tracing: it does all its compilation before the code is run (unless it hits a path which hasn't been run before, or wasn't inlined, in which case it runs the compiler again). I've heard it described as "really an ahead-of-time compiler that just runs really, really late".

vanderZwan · on June 1, 2018

BTW, is there a write-up of what those blocking features are? I don't recall ever seeing a blog about that. Could be an interesting e"if you want to make a JIT-friendly language, don't do this, do this instead" type of article.

simonbyrne · on June 1, 2018

I agree that would be great. The crude answer is: make it easier for a computer to figure out what will happen when you run the code.

The example I usually use is allowing integers to overflow, instead of automatically promoting to arbitrary precision (Python), or converting to a sentinel value (R). Integers are used in a _lot_ of places, so inserting these checks (or worse, access to heap-allocated memory) makes it difficult to optimise. (throwing an error might be a reasonable alternative in some cases).

Another is that you make it easier for the compiler to figure out things about an object, such as its size (e.g. you can declare the types of the fields of a Julia struct) and whether or not it can be mutated (immutable objects are easier to optimise).

ScottPJones · on June 1, 2018

I wouldn't use that as a primary example (allowing integers to overflow), because one of the great things about Julia is that it is incredibly easy to define your own types that will simply work, that for example, do checked arithmetic on integers (SaferIntegers.jl, I think is one, or don't want a limit (BigInt, which is included in Julia). Julia gives the programmer the choice, and not only that, allows the programmer to create their own choices.

vanderZwan · on June 1, 2018

> The example I usually use is allowing integers to overflow, instead of automatically promoting to arbitrary precision (Python), or converting to a sentinel value (R).

IIRC Julia used to automatically promote integers, is this the main reason why this was dropped?

simonbyrne · on June 1, 2018

No, I don't believe integers ever promoted on overflow (or at least not since 2012).

If an operation involves two different integer types, they do promote to the larger one (i.e. an Int64 + a BigInt will give a BigInt).

vanderZwan · on June 1, 2018

Oh no, I'm very certain of this: I distinctly recall a github issue where people complained that addding two 32bit integers resulted in a 64bit integer, which was justified as giving more correct answers due to potential integer overflow.

StefanKarpinski · on June 1, 2018

You're talking about different types of promotion. `Int32 + Int32` did once upon a time give an `Int64` on 64-bit systems. However, that was true regardless of the values of those integers—it was entirely predictable from their types alone. The type of promotion that's being talked about here is promoting `Int + Int` to `BigInt` but only when the values being added are too big to be stored in an `Int`. Julia has never done that.

vanderZwan · on June 1, 2018

Ah, ok. I can imagine that that is more JIT-friendly.

repsilat · on June 1, 2018

One of them is being able to override `setattr` and `getattr` at runtime in Python. It can be pretty tricky to prove it can't happen, so (unless you have optimistic and pessimistic codepaths) you get into a situation where every attribute lookup makes indirect function calls and hash-table lookups.

improbable22 · on June 1, 2018

Yes, it's JIT compiled.

And my (very crude) understanding is that the stronger type system makes this much easier than in Python. The compiled version of any function is specific to the types of its inputs, and thus need not contain any further checks: simple functions often end up with literally the same assembly as C would produce.

dnautics · on June 1, 2018

It's "extremely lazy ahead of time compiled", is one way I've described the compilation model, since you're basically never executing code in an interpreted fashion (usually jits let you do either). Also, typically jit's choice of when to but may be non-deterministic, or deterministic but difficult to understand. When Julia chooses to compile is pretty easy to understand

ScottPJones · on June 1, 2018

I believe though that there is some work being done on actually directly interpreting the AST, in cases where going through all the work of generating LLVM IR and compiling that to native code is unnecessary, particularly when it is code that is only run once when a package is compiled the first time.

skolemtotem · on June 1, 2018

If anyone wants to do more research on this, the keyword is "monomorphization".

StefanKarpinski · on June 1, 2018

Perhaps the best way to understand what makes Julia fast is to watch these two videos about Python and R and what makes them so hard to optimize:

https://www.youtube.com/watch?v=qCGofLIzX6g

https://www.youtube.com/watch?v=HStF1RJOyxI

Take everything mentioned in these videos that make Python and R really hard to optimize and don't do those things :D

jabl · on June 1, 2018

Yes, it's using JIT compilation (last I checked, they are using LLVM as the backend). Combined with a language design that takes JIT compilation into account from the get-go, making the problem much easier than trying to use a JIT later on (see e.g. PyPy).

fareesh · on June 1, 2018

I often wonder what inspires folks to start from scratch in the face of a gigantic ecosystem like the one that Python brings with it, which will also keep improving.

mbauman · on June 1, 2018

In the case of Julia, you can check out the original motivation back in 2012 or this recent answer on the message board.

https://julialang.org/blog/2012/02/why-we-created-julia

https://discourse.julialang.org/t/julia-motivation-why-weren...

Gravityloss · on June 1, 2018

If you have some Matlab background, working with Python is frustrating. It is hard to explain, but vectors and matrices should be the primary concepts, with absolutely minimum extra glue needed.

goatlover · on June 1, 2018

That's definitely a thing. I like Pandas, but the syntax is a bit cumbersome compared to R or Julia.

prestonh · on June 1, 2018

I don't have much experience with Matlab, so forgive me if this is incorrect, but numpy should be able to do everything that matlab can at comparable speeds. No one performing matrix/vector-like operations is doing so with standard Python lists if numpy is available.

Gravityloss · on June 1, 2018

I'm not talking about execution speed but the human interface. The syntax of Python just is not nice and using libraries just adds more and more boilerplate.

I don't expect anyone who has not spent a lot of time with Matlab to "get" it.

ChrisRackauckas · on June 1, 2018

Here's how I explain it to people who don't get it.

https://cheatsheets.quantecon.org/

Python works, but it's far from elegant in this domain.

repsilat · on June 1, 2018

I know what you mean. In MATLAB

  [A, B]

concatenates two matrices/vectors, whereas in Python it "wraps" them in an `n+1`-dimensional "matrix".

That said, you get most of what you want with libraries. In NumPy you won't write

  [aRow + bRow for aRow, bRow in zip(aMat, bMat)]

because you'll just call `numpy.concatenate`.

Also, Python is mostly not used for mathematical work, and programmers tend to assume matrices are scary or only useful for mathematical work, so it has a "boring" syntax more suited for operating on single items at a time, with lots of loops.

geoalchimista · on June 2, 2018

> "The syntax of Python just is not nice and using libraries just adds more and more boilerplate."

I think that is a necessary trade-off for Python as a "general-purpose" programming language. I had used MATLAB and IDL quite intensively before I moved on to Python and R. When writing MATLAB, I felt like a scientist and did not have to bother with programming practices, like coding style, unit tests, writing functions instead of scripts, etc. But Python forces me to think like both a scientist and a programmer. (For example, every time you write `np.array([1, 2, 3])` instead of `[1, 2, 3]` it reminds you that array operation is not a free lunch offered by the language; it comes from the NumPy library. Also, it keeps the namespace pure.) I personally like this way better. But I also agree that not everyone likes it. (In my institution, researchers are kinda split half-and-half between Python and MATLAB.)

acdha · on June 1, 2018

It's interesting because those were the reasons I heard from MATLAB users for switching to Python: moving to a language with a cleaner, less ad-hoc design and less boilerplate / copy-paste code made a big difference once you had more than a little code.

Has the language improved dramatically in the last few decade?

ChrisRackauckas · on June 1, 2018

Python is much better than MATLAB in non-mathematical domains. Hell, MATLAB didn't even have arrays of strings until 2017! But in the mathematical area, SciPy is really lacking syntax-wise. Here's a quick side-by-side comparison which is also useful for teaching between the languages:

https://cheatsheets.quantecon.org/

You can go as far as to say that Python is verbose in many cases, and non-intuitive in others (A @ B?).

That said, you would never want to write a webapp in MATLAB, so as people expand from "math scripting" to "programming" they run into MATLAB issues which have absolutely no good solution. This is where the Python pickup comes from: it's still decent for scientific computing, but it also is an actual programming language. However, Julia keeps the nice syntax of MATLAB in the mathematical domain, keeps the technical computing focus of its community, adds some speed, and also is a general-purpose languages where webservers etc. are being written. In that sense, Julia is a really good fit for people looking to ditch MATLAB.

(A lot of MATLAB's bad syntax was bolted on later. It started as MATrix LAB, and later became a programming language. You can easily see the elegance of its initial design, and the terrible choices when extending it.)

simonbyrne · on June 1, 2018

Oh no, it's the 90's: we need classes, right now!

Gravityloss · on June 1, 2018

Matlab is good for non-software engineers and scientists to explore and solve problems. Once you have a solution there, write it up in some proper language.

I don't think it is likely for a software engineer to understand Matlab's niche and effectiveness. It comes from a different direction.

A ball point pen, a paint brush and a piece of drafting graphite all have their uses.

twtw · on June 1, 2018

I'm an electrical engineer oriented towards signal processing and controls, and I'm also incapable of understanding Matlab's niche and effectiveness. I wish Matlab could just vanish. It's a glorified calculator that has mutated over the years into a crap programming language, with random features just bolted on and the strangest semantics of all time.

It also has the drawback that most of its users generate write-only code, so everyone that learns it also learns to write code that way.

geoalchimista · on June 2, 2018

I second this. And to make matters worse, a lot of MATLAB users are not aware of coding style. Poorly readable MATLAB code stinks. I sometimes rather wish to read Fortran 90 instead of MATLAB code.

goatlover · on June 1, 2018

But numpy is the underlying library used by the rest of the SciPy stack, right? I use Pandas, so taking that as an example, it can be slow when you're doing stuff that isn't mostly leveraging Numpy. If I have to loop over a relatively large dataset to do some complicated row checking/filtering stuff, then it can be very slow, and I might as well take a coffee break. You can rewrite that into using just the numpy values array and it will be performant, but you lose all the nice Pandas features when you do that.

Also, if I'm just restricted to using Pandas on a laptop or small server instance, then loading in a several gigabyte csv file can really tax the memory.

ScottPJones · on June 1, 2018

Sometimes languages get stuck by their history, and (without really becoming a rather different incompatible language) the only way forward is to start from scratch. Also, Julia is good at letting you use those old ecosystems, C and Fortran libraries, Python, Java, R, all from the comfort of home (Julia)

ChrisRackauckas · on June 3, 2018

Well, there are big areas where the ecosystem is quite underdeveloped, like differential equations, which have a lot of holes that need new algorithms and improvement. It would be extremely difficult to develop all of the necessary algorithms in C/C++/Fortran, and pretty much impossible in Python/MATLAB (I tried at first), but it's a breeze to tackle this in Julia. So for these kinds of scientific computing areas where there's tons of work with few people with the necessary expertise, Julia is a great way to start getting some good implementations out there for people to use.

kgwgk · on June 1, 2018

Hubris. Which is one of the three great virtues of a programmer.

simonbyrne · on June 1, 2018

I'm pretty sure the other two were as well.

cdsousa · on June 1, 2018

In part, the fact that there are major problems moving the "giant" to where one would like (speed), e.g., unladen swallow, pypy, pyston, ...

ptero · on June 1, 2018

Scientific computing is a major use case. Just as a data point, last month I needed a small library to test a signal processing approach and tried to write that in Go (a good, real problem to better learn Go). I did not get far before I gave up and re-did it in C++. Julia or numpy would work just as well (I just did it quicker in C++).

What killed me with Go was lack of generics (a lot of copied code to implement exactly the same integration approach for real and complex functions; different accuracies, etc.); clumsy way to express formulas (no operator overloading to provide a clean syntax for matrix operations) was a pain as well.

I have seen many times that the use cases of scientific computing would benefit from the functional programming paradigms, efficiency with basic math operations and ability to define operations that match standard scientific notation (matrix algebra, convolutions, etc.), but there are few languages that do well in all of those.

drjesusphd · on June 1, 2018

It's for high-performance numerical work. The language of choice in scientific computing is still Fortran, and for good reason. Julia has become my go-to language for new codes, but Fortran still has its place. In particular, Julia is slow to be adopted by supercomputers.

StefanKarpinski · on June 1, 2018

Slow but not non-existent:

https://juliacomputing.com/case-studies/celeste.html

It's tough to crack the old school high-performance computing world, which is very slow to change programming tools, but Julia is the first and only high-level dynamic programming language to do it. Now that it's been shown to be possible to exceed a petaflop/second in Julia without having to write any gnarly, low-level C++ code, I suspect we'll start to see it happening more in the future.

hpccurious · on June 2, 2018

Did Celeste rely on pure Julia code to achieve its petafloppiness or were there other libraries at play, say Intel MPI, OFED and IB drivers, RDMA functionality, any custom C++ at all, etc? How much in the way of low-level distributed bit twiddling was part of the development of that application? Would anyone not named Keno be able to write a similar application?

StefanKarpinski · on June 5, 2018

Libraries like LAPACK and MPI were standard C/Fortran ones wrapped from Julia but all of the custom application code was written in Julia. No doubt the application would not have been so successful without the extremely talented team that worked on it—including Keno, who was described by the project PI as a “one man army”—but that’s true of any record-setting supercomputing project.

tavert · on June 2, 2018

All of the MPI communication was happening in C code (the built-in distributed code in Julia exists, and sometimes works, but that's the nicest thing you can say about it - it has not been leaned on heavily at all, and is often awkward to use). Lots of low level bit twiddling was involved in Celeste. Without Keno that application would have been lucky to top 100-200 teraflops.

tejtm · on June 1, 2018

We are in luck as Julia seamlessly integrates with FORTRAN(and C)[0]. That is, if set up properly, LLVM does not care what language it is being asked to JIT compile even when they are mixed together in the same file.

I am hoping for a nice web assembly[1] emitted from Julia

[0] https://docs.julialang.org/en/stable/manual/calling-c-and-fo...

[1] https://webassembly.org/

ChrisRackauckas · on June 3, 2018

This has some working prototypes:

https://github.com/MikeInnes/WebAssembly.jl

kgwgk · on June 1, 2018

> Fortran

> codes

I think there is a large overlap between users of that language and users of that word.

dnautics · on June 1, 2018

You can very easily deploy Julia in supercomputing centers via singularity containers (assuming the SC center supports singularity).

geoalchimista · on June 2, 2018

> "The language of choice in scientific computing is still Fortran, and for good reason."

Except that this is not always true. R, Scala, and spark are heavily used in bioinformatics. Physicists and quantum chemists nowadays use C++ more often than Fortran when they build new models and libraries. Astronomers use Python. The only scientific fields that get stuck forever with Fortran seem to be climate modeling and weather forecasting (and they continue to use Fortran 90 and pretend the most recent Fortran 2008 standard does not exist).

Edit: I'm not saying Fortran isn't good. It's the best for bit twiddling. Offloading the performance critical part from Python or other high-level languages to Fortran is great. The problem is, in an academic setting, a lot of scientists untrained in programming choose to write programs from scratch in Fortran, which is too taxing on the debugging, testing, and maintenance time.

zem · on June 4, 2018

when i was a high-energy-physics grad student back in 1997 we were already using c++ rather than fortran

oblio · on June 1, 2018

It's meant to be primarily a scientific language and the developers are primarily focusing on this. Post 1.0 who knows? It definitely looks like something which could be used as a general purpose programming language.

tavert · on June 1, 2018

Academic research. Or prototyping where you want higher performance than Python.

For anything else that requires long-term maintainability and has a business or someone's job depending on it, Julia is a risky choice to make. Few, if any, large organizations have done so yet, and usually don't want to go first.

Well after 1.0, if Julia Computing (who employ almost all of the core developers) is on sustainable financial footing and any large organizations have made investments into adopting and supporting Julia, it might make some sense for non-academic use cases.

IshKebab · on June 1, 2018

It's a competitor to Matlab. So scientific computing. You probably won't see applications, servers, etc. written in it like you would with Go or Rust.

gaius · on June 1, 2018

It’s really a competitor to Q, not so much Python or Go or whatever

noobermin · on June 1, 2018

I'm surprised the author didn't touch on the massive changes to Pkg. Now Pkg is a lot more like npm, allowing for "project" level packages rather than requiring global packages.

oxinabox · on June 1, 2018

Good point, that is huge, I was intending to mention it as the start (and have done so now).

It like the getfield overloading and a few other things really deserve a post of their own, so I wasn't going to try and push them into this one.

ScottPJones · on June 1, 2018

He explained that at the beginning of his post - things like the Pkg changes are getting quite a lot of press already, he wanted to highlight the things that people might not have realized were coming in v0.7/v1.0.

photon-torpedo · on June 1, 2018

Also the new Pkg is massively faster.

dyu- · on June 1, 2018

Great ffi perf too! From the benchmark over at github.com/dyu/ffi-overhead, julia is on par with luajit.

ChrisRackauckas · on June 3, 2018

Wow! Thanks for sharing! Julia and LuaJIT's JITing strategy really shine here, calling shared libraries faster than C itself!

_jordan · on June 1, 2018

Phenomenal language. compiles applications to a single binary which makes it super convenient to move around environments

oblio · on June 1, 2018

How? Everything I've read about Julia seems to show that you're supposed to pass the source code files around.

https://medium.com/@sdanisch/compiling-julia-binaries-ddd6d4... -> notice that's a third party package.

ninjin · on June 1, 2018

Indeed, which comes with its own advantages and disadvantages. As someone that has to always care about performance, Machine Learning, I am very happy not to have to worry about dependencies not being compiled to take advantage of the latest and greatest CPU instructions on a specific box in a very heterogeneous cluster. Now, you could just compile it all on each deployment, but something as large as say TensorFlow takes ages and not everyone gets to enjoy having great devops at your beck and call.

oblio · on June 1, 2018

I'm not sure I understand what you mean: you prefer compilation on the target machines or you prefer a static binary?

ninjin · on June 1, 2018

Compilation on the target machine, in my case. Performance trumps portability for me, but I can totally see why this is not the case for everyone.

oblio · on June 1, 2018

In theory that can be solved with the best of both worlds: have the runtime and JIT in the final binary. That way you still get easy deployment with 1 big file and high performance after the JIT warms up, which should be fast.

I think it's just a problem of manpower on the Julia developer side.

ninjin · on June 1, 2018

I wholeheartedly agree, one could also note how this is Java-esque. My hope is that someone far more well-versed than me in this area will find the time and push for it. On that note, one wonderful thing about Julia is how easy it is to contribute to the core language since it is written in Julia. It really is amazing to see people from so many disciplines come together to push a “language for science as a whole”.

pkofod · on June 1, 2018

What do you mean that it's "third party"?

oblio · on June 1, 2018

It's not part of the official Julia package. javac comes from Oracle, rustc comes from the Rust developers, etc.

Sean1708 · on June 1, 2018

JuliaLang are the Julia developers, PackageCompiler.jl is no more third-party than cargo is.

oblio · on June 1, 2018

Oh, my bad, the original link was: https://github.com/SimonDanisch/PackageCompiler.jl. Now I see that redirects to JuliaLang.

However, unless I'm missing something, that argument isn't that much stronger.

1. Is the package actually promoted and actively maintained by the Julia developers?

Or even better:

2. Is the package part of the default Julia installation?

And even:

3. Is the package documented on the Julia site, as part of an officially recommended process?

ChrisRackauckas · on June 1, 2018

>1. Is the package actually promoted and actively maintained by the Julia developers?

Yes.

>2. Is the package part of the default Julia installation?

No, mostly because it's not done yet and only experimental right now.

>3. Is the package documented on the Julia site, as part of an officially recommended process?

No, mostly because it's not done yet and only experimental right now.

oblio · on June 1, 2018

In that case, sorry for the inaccurate statement.

However, go to my original comment and replace "third party" with "it's not done yet and only experimental right now" and the meaning stays the same... :)

ChrisRackauckas · on June 1, 2018

Indeed, but it shows that it has moved up a notch and people are taking it more seriously now. The biggest issue is that all of the compiler guys have thus far been working on making breaking changes to the language. Once v1.0 is out (alpha is already tagged! Features are frozen!), then they will have some open hands :).

oblio · on June 1, 2018

I guess my comment was a sign of Julia moving to the big leagues now. Expectations are high :)

Sean1708 · on June 1, 2018

Ah sorry, I didn't realise that the original link pointed somewhere else.

adrianhel · on June 1, 2018

I had a look at Julia like 2-3 years ago and thought it seemed really nice to use.

The performance is of course also awesome.

So I hope we get more traction for Julia as a general purpose language. But for now I'm sticking with Node.

a-dub · on June 1, 2018

Can you run it with the plotting libraries loaded and have whos() return in less than 10 seconds yet?

I was excited about v0.6 but then when I learned it took >15s to do whos(), I completely gave up...

kescobo · on June 2, 2018

Depending on which plotting package you use. Plots.jl itself is pretty slow to get started still, though there are ideas on how to improve that.

That said, once you're up and running, plots (and everything else) are super snappy. When I'm doing plotting stuff, I'm usually doing it interactively, and when I'm scripting it, it's because I'm plotting hundreds or thousands of things (and then the startup time is vanishingly small).

a-dub · on June 2, 2018

I first thought the whos() issue was some JIT thing... but no, after multiple runs it was still taking 10s of seconds.

For context, I come from using MATLAB interactively. When doing that kind of work, the MATLAB 'whos' command is like /bin/ls, it's hard to do much without it.

Is there a different command in julia for showing all numerical arrays/matrices/tensors in memory and their dimensions from the repl? whos() seems like another direct analog of MATLAB, but maybe there's something better that isn't unreasonably slow?

ivirshup · on June 6, 2018

I'm a little late to the party, but I think something like this should do the trick:

  [names(Main) typeof.(eval.(names(Main)))]

a-dub · on June 8, 2018

runs crying back to MATLAB

sidcool · on June 1, 2018

And we are struggling to connect Julia to Postgres DB.

StefanKarpinski · on June 1, 2018

Have you looked at https://github.com/invenia/LibPQ.jl? The folks at Invenia are very sharp and generally produce excellent, reliable packages, so I would expect this to be of high quality. The primary author, Eric Davies, in particular, is a well known high caliber Julia contributor :)

sidcool · on June 1, 2018

I am using it, but it's quite inconvenient. The return types are wrong. Documentation is sparse. I would like contribute to it.

mbauman · on June 1, 2018

Please do! Not only are the folks at Invenia technically talented, they're also really kind and wonderful. I'm sure I'm not speaking out of turn to say that I think they would very much appreciate bug reports, API feedback, and especially doc and code contributions!

sidcool · on June 1, 2018

I have started with documentation and coffee samples that I m trying. Let's see where it goes with code contribution. I am especially trying to efficiently convert the hideous NamedTuples to DataFrames.

jb1991 · on June 1, 2018

Would anyone recommend using Julia for things like web servers? Python is used for web servers, and is slow. But Julia doesn’t really seem marketed to that application domain, despite that I would expect it to do a better job with the demands on a server due to its performance.

I know web frameworks exist for Julia, but I’m wondering how practical it is to actually use this language for that purpose?

goatlover · on June 1, 2018

If your only requirement is performance, then you'd probably be better off using Go or Node. My guess is you would use Julia for a web server when it ties into other things you're using Julia for.

But maybe I'm wrong here. It seems that "use the right tool" turns into use Python or JS or Excel for all the things.

oblio · on June 1, 2018

I guess you’re not up to date: Excel is getting JS scripting soon and most likely Python scripting not long after that :D

goatlover · on June 2, 2018

Haha, well then! World conquest is surely right around the corner with that combo. Maybe allow for some Rust plugins and have it run on blockchain, just to be on safe side.

ffriend · on June 1, 2018

Almost certainly yes. Although it doesn't mean it's the best choice for your specific task.

Python had a long history of web server development. I think, aiohttp may be considered state of the art now. Let's measure its performance:

    from aiohttp import web
    
    async def handle(request):    
        return web.Response(text="Hello")
    
    app = web.Application()
    app.add_routes([web.get('/', handle),])
    
    web.run_app(app)

using `wrk` for testing:

    $ wrk -t1 -c1000 -d30s http://127.0.0.1:8080/
    Running 30s test @ http://127.0.0.1:8080/
      1 threads and 1000 connections
      Thread Stats   Avg      Stdev     Max   +/- Stdev
        Latency   164.59ms   17.12ms 537.35ms   83.97%
        Req/Sec     6.05k     0.95k    7.74k    74.33%
      180749 requests in 30.08s, 26.72MB read
    Requests/sec:   6008.75
    Transfer/sec:      0.89MB

So we have ~6k rps on a single CPU core. As far as I remember, Tornado has ~4k rps, while built-in Flask server can process only about ~1k requests per second . Yes, you are unlikely to use Flask dev server in production, but for aiohttp it's indeed a recommended way.

Now let's measure Julia's HTTP.jl server:

    using HTTP

    HTTP.listen() do request::HTTP.Request
       return HTTP.Response("Hello")
    end

which gives:

    $ wrk -t1 -c1000 -d30s http://127.0.0.1:8081/
    Running 30s test @ http://127.0.0.1:8081/
      1 threads and 1000 connections
      Thread Stats   Avg      Stdev     Max   +/- Stdev
        Latency   105.95ms  108.35ms   1.99s    98.40%
        Req/Sec     9.65k     1.69k   13.66k    80.21%
      271917 requests in 30.09s, 16.10MB read
      Socket errors: connect 0, read 0, write 0, timeout 274
    Requests/sec:   9035.73
    Transfer/sec:    547.67KB

So it's 9k (with a few failed requests though).

This doesn't include any routing, input data parsing, header or cookie processing, etc., but it amazes me how good the server is given that web development is NOT considered a strong part of the language.

The downside of Julia web programming is the number of libraries and tools (e.g. routers, DB connectors, template engines, etc.) - they exist, but are quite behind Python equivalents, so gotchas are expected. Yet I'm quite positive about future of web programming in Julia.

eigenspace · on June 1, 2018

I'm not knowledgeable on this topic at all, but I can tell that the bulk of julia development hasn't involved a lot of thought towards web technologies. So I wouldn't be surprised to learn that the infrastructure just isn't really there to support web servers well in julia.

That said, my personal favourite thing about julia is not actually its speed but the fact that it's an incredibly expressive, flexible and composable language (I'd say julia learned all the mot important lessons lisp had to teach). If I had to guess, I'd think that building the infrastructure for devs to work on web servers from the ground up may very well be easier in julia than in most other langauges, including python. I may be off the mark though.

One other note since it was mentioned elsewhere, there actually is work being done right now to have julia compile to WebAssembly code which could be pretty cool!

rz2k · on June 1, 2018

I remember reading some posts early on, either on Reddit or in the mailing list, where people asked about its suitability for web frameworks and servers. The idea was that what it let you abstract away, and what it made easy to do at a low level would work well in that problem space, too.

However, it was pretty actively discouraged. I suspect they didn't want a lot of voices influencing the direction of the language while it was still being formed. A lot of what make R a poor general purpose language help make it effective in its specific domain for its specific users.

It might be worth people taking another look at it after 1.0 is released, as long as they have low expectations about influencing the direction of the language's development.

st1ck · on June 1, 2018

It'd make sense for numerical computations. Otherwise, there are like 10 faster-than-Python languages I'd consider instead of Julia (incl. Erlang/Elixir, Go, Rust, Java/Scala, Haskell).

wenc · on June 1, 2018

> Python is used for web servers, and is slow

What is the specific context for this? Serving concurrent users? Or raw processing?

If it's the former, there are ways around this. If you'd like to replace Python for this, I'd look at something like Go rather than Julia.

If it's the latter, Julia might help, but there are a bunch of tradeoffs to consider. Julia is a language that was primarily designed for numerical computation.

jb1991 · on June 1, 2018

Honest question, just how fast is Julia anyway? Because performance is typically the main reason anyone would want to use it over alternatives. A lot of people justify Julia because, as noted in other comments, it is both high-level yet claims to be nearly as fast as C:

> "Julia is the fastest modern open-source language for data science, machine learning and scientific computing...with the speed, capacity and performance of C, C++..." [4]

That's a bold claim! And there is much evidence to the contrary. I often find that the language falls into a similar trap of many other languages where the creators are evangelists not quite sharing the whole picture. Writing fast Julia code is not always a pleasant experience as you often need to fight the easier idioms. It is marketed as fast, but really, how fast is it?

[0] http://www.zverovich.net/2016/05/13/giving-up-on-julia.html

"What’s disappointing is the striking difference between the claimed performance and the observed one. For example, a trivial hello world program in Julia runs ~27x slower than Python’s version and ~187x slower than the one in C."

[1] https://www.ibm.com/developerworks/community/blogs/jfp/entry...

"We can code in Julia in a way similar to Python's code. However, that code is slower than it should. One way to speed Julia is to take into account the Fortran ordering it uses by looping on j before looping on i in the second loop. We also add decorators to speed up the code."

[2] https://www.codementor.io/zhuojiadai/julia-vs-r-vs-python-st...

"...comparing R's sorting speeds to Julia's is not the complete story, even though on the surface R appears faster, and from a users' perspective, (once the data is loaded) R is still the king of speed."

Stackoverflow has quite a few user posts [3] from folks trying to get their code to perform at the advertised speed vs popular alternative languages. It's not so easy.

[3] https://stackoverflow.com/questions/20613817/julia-julia-lan...

[4] https://juliacomputing.com/

montalbano · on June 1, 2018

Regarding [0], startup time is a commonly acknowledged issue which affects the 'hello world' benchmark. (Can you even call it a benchmark? Maybe if startup time is what you're comparing then it makes sense, other wise I don't agree.)

I used to call Fortran code from Python for a particularly complex function (Mittag-Leffler function) from Python, I then rewrote everything in Julia and it runs approx 10-20x faster. So there you go, I'm a primary source.

Edit: just for clarity: The Mittag-Leffler function itself runs just ever so slightly faster in Julia than Fortran-called-from-Python (due to the probably inefficient way I was sending and receiving the data to/from a Python subprocess, which itself is an example of the difficulty of combining two languages). It is the program as a whole which runs 10-20x faster.

ziotom78 · on June 1, 2018

My experience tells that computationally intensive codes are roughly 10% slower that their C++/Fortran counterparts. So fare we are implementing a data analysis pipeline for a cosmological telescope (LSPE/STRIP), and we are migrating a number of codes my team developed for the Planck/LFI instrument. As data formats and a few details of the instrument changed, we decided to rewrite everything from scratch instead of manually fix and recheck every line of code, and we chose Julia 0.6.

The 10% figure comes from a comparison of the performance of the codes whose logic has been left virtually unchanged, of course.

What is awesome is the ability to run codes within Jupyter (like Python), and the ability to run "for" loops on large arrays fast and with little memory consumption (unlike Python). The 10% increase in computational time is fully absorbed by the reduced development/validation time!

ChrisRackauckas · on June 1, 2018

This 10% is due to some aliasing optimizations that haven't been added to the compiler yet, at least for the differential equations code we've looked at. But now that the compiler guys have some more free time...

Certhas · on June 1, 2018

This is mirrored in my experience with dynamical simulations and mathematical optimization work.

stabbles · on June 1, 2018

Concerning the "Giving up on Julia" article, it's very dated and dubious. Please read [1].

About the second quote: this is not about Julia being slower than Python; it's about avoiding cache misses. Obviously you should exploit contiguity of the data in memory. Secondly, the 'decorators' like "@inbounds" just ensure that Julia does not redundantly checks bounds, otherwise the inner loop has branches and cannot exploit SIMD.

[1] https://tk3369.wordpress.com/2018/02/04/an-updated-analysis-...

jb1991 · on June 1, 2018

That's a fair rebuttal. But if the original author struggled to get fast Julia working, maybe it's a similar experience to the very many users of Stackoverflow who ask how to get their code to run at marketed speed. It appears to not be obvious or idiomatic in Julia to really get the speed it advertises.

stabbles · on June 1, 2018

Fortunately there is documentation about performance [1], which is about everything from type stability to SIMD.

[1] https://docs.julialang.org/en/latest/manual/performance-tips...

improbable22 · on June 1, 2018

Last week's discussion on matrix multiply:

https://news.ycombinator.com/item?id=17164737

I guess the claim is that you could write all that in Julia, and hope to be competitive, while you could never do this in (pure) Python. It would still be a lot of work, and would require you to think about the cache, there's no way around that.

The selling point here (it seems to me) is that you can easily doodle up the most naive version of whatever algorithm you're thinking about, and then start optimising to the degree needed, where needed, without having to start over in C or something.

jb1991 · on June 1, 2018

I thought Python offers an entry into the performance space via Cython, which seems to nearly map directly to C. Is Cython not a viable alternative to Julia?

simondanisch · on June 1, 2018

I wouldn't say so. Cython is quite literally C for python, basically making all your python code look like C with most disadvantages, while not always reaching C performance.

Julia in comparison is a fully featured language, with a lot to offer - you can write highly generic + fast code in a very high-level way. E.g. have a look at https://medium.com/@Jernfrost/defining-custom-units-in-julia... You can see, that Julia can be more elegant than python while being a lot faster.

ChrisRackauckas · on June 1, 2018

One of the main issues is that, because you compile things separately with a context-switch managed by python in the middle, if you pass a Cython compiled function to code that is calling compiled code (C/Fortran/etc.), then you still get a huge overhead. We tested this with ODE solvers, and things like Numba+SciPy odeint were still about 10x slower than they should be because of this phenomena ([1] mentions some of our tests). In the end, we found a very good reason to make sure the whole stack can compile together!

[1]: http://juliadiffeq.org/2018/04/30/Jupyter.html

improbable22 · on June 1, 2018

Yes, this is aimed at the same people, I believe. Doodle in Python and then work harder on the critical part.

I think this works best when you have some loops pushing arrays of Float64 around. And worst if you'd like to pass some library a million little functions to optimise which depend on some strange data type you just defined... Cython is quite a limited sub-language. But certainly many current Julians made a living this way in the past.

eigenspace · on June 1, 2018

To be fair, the majority of the stackoverflow questions on how to speed up julia are from people brand new to the language coming from Python or whatever. Its relatively easy to optimize julia code, but I think its understandable that someone fresh out of Python might struggle for a little while to get all the performance benefits one might expect.

Especially when you consider what would result from that same user spending that same amount of effort trying to learn C.

oxinabox · on June 1, 2018

I actually could hardly care less about julia's speed. It is fast enough.

What matters to me is that it is actually really nice to write. The syntax is elegant, the community is excellent, and the core of the language's semantics being multiple dispatch is a game changer.

stochastic_monk · on June 1, 2018

This is absolutely correct. The benchmarks they claim are misleading, and under fair conditions, Julia simply isn't the speed/efficiency solution they market themselves to be.

I think that the ML community's decision to write core operations in C or C++ and provide Python wrappers is the way to go about it if you want the flexibility of scripting.

eigenspace · on June 1, 2018

That is absolutley false. Look closely at the 'benchmarks' shown in the above article and tell me if they are at all compelling. They measure Julia's startup time and compilation time. IF you wanted to compare Julia to C there, you'd want to include the time it takes to compile your hello_world script.

Julia's performance optimizations have to far been mostly focused on intensive numerical computations where the startup time and compilation time are merely a constant overhead that are irrelevant to high performance numerics.

civility · on June 1, 2018

I do not have to recompile my C program every time I run it. Suggesting I'd want to include that is just silly.

eigenspace · on June 1, 2018

:shrug:

I don't have to recompile a julia function every time I run it so long as I don't close my julia session. If I do need to close and reopen my julia session a lot for some reason, I'd just statically compile the function. In practice, one rarely needs to close a julia session and recompile functions and if one does do it occassionally, the compile time while a little annoying is not too bad.

Compile time will also be actively worked on to reduce it post 1.0 once all the breaking changes are done and pressing bugs are squashed.

stochastic_monk · on June 1, 2018

No, it's not. My comment is unrelated to startup and the issue in this article but more generally about Julia as a language of failed promises.

The biggest issue with the standard Julia benchmarks is that they're not using what would be optimal C code to compare against.

In addition, if you look at the Julia issues on GitHub, you'll find hundreds of performance regressions where code performs more than 10 times as slow as what they expected/claimed at one point. It's not reliably fast, even when written by Julia experts/developers.

eigenspace · on June 1, 2018

> The biggest issue with the standard Julia benchmarks is that they're not using what would be optimal C code to compare against.

Yes, the C code benchmarks are not optimal. Neither are the julia benchmarks or any of the other languages for the matter. The julia devs made a hard decision with those benchmarks and decided that if they allowed arbitrary optimization, the benchmarks would become more a measure of who spent the most time and knowhow writing the benchmarks for ______ language. Instead, they tried to keep the code for all the languages at a reasonable level and avoided super specialized magic. That may make some uncomfortable, but keep in mind that the julia code used in the benchmarks also has a lot of room for improvement. Some julia devs are absolute wizards and getting performance out of julia code if you let them go crazy.

> In addition, if you look at the Julia issues on GitHub, you'll find hundreds of performance regressions where code performs more than 10 times as slow as what they expected/claimed at one point. It's not reliably fast, even when written by Julia experts/developers.

Julia 0.7 (which is still in its alpha build by the way) included a ground-up replacement of julias iteration protocol and a reworking of a ton of code optimization routines. If you think you or anyone else in the world could take a codebase as large as julia's and replace fundamental parts of it without seeing performance regressions anywhere you're delusional.

There are a number of performance regressions, some mysterious and some not mysterious and they will all be worked on because the julia devs take regressions very seriously. It won't be instantaneous, but I do not doubt that the bulk of them will be eliminated promptly.

There are also orders of magnitude more performance improvements in 0.7-alpha than there are regressions, they just aren't filed as issues. Some of these performance improvements, especially with broadcasting, are state of the art and not seen in other languages.

Calling julia a "language of failed promises" because the alpha build of a pre 1.0 version has some performance regressions is sensationalist and disingenuous.

Certhas · on June 1, 2018

This is ridiculous. On top of eigenspaces comment, the promise was never that any odd Julia code would be fast. You can write untyped/dynamic code and that will be faster than Python but not near C, or you can put a little work into ensuring type stability (from experience this is easier than writing in C from the start, especially as you already have a working prototype that you are iterating in the same language) and you get excellent results.

stochastic_monk · on June 1, 2018

I was very careful to state this the people writing the code I was talking about were Julia experts/developers, not just anyone writing arbitrary Julia code.

When you're doing that much typing, you might as well write C or C++.

ninjin · on June 1, 2018

Okay, with this comment your comment on preferring C++(11?) and pybind11 [1] I think I am finally getting your angle. Let’s see if I can bridge the perspectives.

If I am reading you correctly, you are a library builder, an infrastructure creator, and comfortable caring about the bottom line when it comes to performance. You most likely prototype an algorithm in some high-level language, ensuring that it works, then push it down into C++ in order to make it scale to cool problems, lastly you may create bindings to a higher level programming language so that others without your C++ acumen can benefit from your labour. A lot of great code is written this way, some that I rely on in my day-to-day work would be OpenBLAS, TensorFlow, and PyTorch.

I can only really speak for myself, but I know that there are many like me, we are researchers and to us code is only incidental. We are judged based on our ability to churn out as many papers and results as possible, in as little time as is humanly possible (ever wondered why academic code can be absolutely awful?). We rarely know the structure of the solution a-priori, rather, we start throwing techniques at things and try to make the experiments run. At some point, a (wild?) performance bottleneck appears and we just want to get around it as soon as possible. Now, some like myself have been in the Python/Cython world for years and you can get around a lot of bottlenecks this way. However, it comes at the expense of additional boilerplate and mastering which parts of the Python programming model you must throw overboard, not to mention how you make your Cython code interact with pure Python code from libraries that others have written. This is where Julia shines, it allows you to much easier go between this “productive” and “performance” mode, to me, this is worth its weight in gold. It is not for everyone, but if I ever have a law named after me I would be happy if it was “Nothing is for everyone”.

[1]: https://news.ycombinator.com/item?id=17206370

stochastic_monk · on June 1, 2018

You’re exactly right. Thank you for bridging the gap in perspectives. I used to do a lot in cython but found that the glue code was taking more effort than writing the whole application in C++.

Certhas · on June 2, 2018

Glad that someone managed the translation. Another note: I'm a trained mathematician, I'm not afraid of a type system (nor are the physicists I work with). In fact I was missing one dearly in Python. If that was all that was needed to write C++ instead of Python we all would.

Yet that's how Julia looks to us: Python + a sensible Type system.

eigenspace · on June 1, 2018

I can see how one might naïvely think that, but in my experience and the experience of everyone I’ve talked to who uses julia, gradually improving the performance in your bottlenecks using only Julia is much nicer that just working in C or dropping down from python to Cython.

You can improve much more gradually as needed, retain all the full language features and have much less menta overhead, needing only to keep Julia in your head.

Julia devs make performance regressions when replacing critical central code components just like everyone else does. The advantage is that julia makes it easier to reason about and fix those regressions

simonbyrne · on June 1, 2018

I take it from your GitHub profile that you're an experienced C++ programmer, in which case you're probably correct.

Personally, I still find C++ to be effectively a black box, meaning that if I want to understand it or make changes, I'm at the mercy of the maintainers or willing colleagues.

ninjin · on June 1, 2018

> I think that the ML community's decision to write core operations in C or C++ and provide Python wrappers is the way to go about it if you want the flexibility of scripting.

Isn’t Google’s “Swift for TensorFlow” move in direct opposition to this statement? I think you are right that it gets you 95% of the way, but that the final 5% of performance and portability simply will not be there.

https://github.com/tensorflow/swift/blob/master/docs/DesignO...

ChrisRackauckas · on June 1, 2018

It's not just 5%. If you have any API which takes a higher order function, for example a differential equation or optimization package, then even if your code is compiled and the user's input is compiled, you still have to hit Python in the middle, and that context switch can be the most costly part of an optimized code, making Numba+SciPy about 10x slower than it should be. So yeah, Python + compiled code is not a viable solution in all cases, or in fact what seems to be most of scientific computing (but not data science where things tend to not include function input).

http://juliadiffeq.org/2018/04/30/Jupyter.html

stochastic_monk · on June 1, 2018

There can be a significant hit in performance due to using Python. I generally prefer developing pure C++, but being able to test/prototype in Python is great. pybind11 makes interoperability easy and efficient.

ScottPJones · on June 1, 2018

What about using Cxx.jl? You can have a nice little REPL for your C++ code. I haven't seen pybind11, does it give you that ability?

stochastic_monk · on June 1, 2018

No, but cling [1] does.

I might be interested in trying cxx.jl, thank you for the pointer.

[1]: https://github.com/vgvassilev/cling

j605 · on June 1, 2018

Hello world program measures startup time which is not what julia is optimized for right now. I''ve heard that they are looking into it but it is not as important as startup time in python where we have a lot of python scripts as binaries which are executed repeatedly or interactively(like "hg status").

ChrisRackauckas · on June 1, 2018

Julia v0.7/v1.0 comes with an interpreter which doesn't have any JIT startup time (because it's an actual interpreter). It's slow because it cannot use precompiled code yet, which is why it's not documented, but it's on the "more to come" list.

sivakon · on June 2, 2018

Do you have a link to share? Sounds interesting.

ChrisRackauckas · on June 2, 2018

It's one of the things being developed on the down low right now, but you can find PRs like:

https://github.com/JuliaLang/julia/pull/24473

which are making this:

https://github.com/JuliaLang/julia/blob/master/src/interpret...

a compilation-free version of Julia. At the same people there is discussion about how to cache native compilation results after precompilation, so that way packages can store all of their compiled code and users can directly use it without compilation through the interpreter. This second part isn't done yet, but it'll be interesting to see what happens when it has a truly dynamic + precompiled form.

Certhas · on June 1, 2018

That's a sweeping statement without any backing argument. Others have debunked the blog post enough (even if you seem to want to cling on to it).

I know several efforts that are transitioning Python + C Libraries to Julia. It's simply much much nicer and simpler to write fast code in Julia than in the Python/C paradigm.

Derbasti · on June 1, 2018

This has been my experience as well: Python-style Julia code often does not run faster than in Python. Only of you rewrite it in C-style do you get C-like performance.

This is not necessarily a bad thing. Just an observation from me translating my Python code to Julia.

d--b · on June 1, 2018

Where are we compared to then (http://www.zverovich.net/2016/05/13/giving-up-on-julia.html)?

eigenspace · on June 1, 2018

Theres a discussion on that post in another thread below, but suffice to say the julia community is pretty exasparated with the constant need to respond to that post when its main source of the speed claims was to measure the perofrmance of a hello world program in julia and included the compilation and startup time in the benchmark.

Thats a deceptive and unhelpful 'benchmark', especially when the C++ compilation time was not included in its benchmark. Julia compilation and statup times are slow if you are expecting the feel of the Python interpreter, however, julia's compilation and statup time add a finite, constant overhead to its preformance and so are completely irrelevant for high performance numrical computing.

If you were wanting to spawn many julia instances and make them execute a single command then exit, yes julia would be a terrible choice of that (unless you turn of compilation and use its interpreter or statically compile your program) but for the most part measuring startup time just isn't relevant unless it takes a minute or something absurd.

In julia the only libraries I know of that take a minute to load are plotting libraries (once you make your first plot the library is blazing fast again) and thats been considered such a big issue that a new plotting library Makie is well under way which is supposed to be fully statically compiled in julia in order to slash the first time to plot.

abakus · on June 1, 2018

One thing I always dislike about julia is the .+ .* .< ... all those dots for element-wise operations. It is inherited from Matlab, but is a bad design because more common usage should have shorter operators.

ssfrr · on June 1, 2018

Whether elementwise or linear algebra operations are more common depends on your use case. I’m personally super happy not to need to write np.dot(A, b) when I could just write A*b, or a’b for an inner product, which matches the math and will do the right thing with complex numbers.

eigenspace · on June 1, 2018

Theres an @. macro that will broadcast all the following operations. ie.

exp.(v.^2) .+ (10 .* u)

can be written as

@. exp(v^2) + (10 * u)

with no added runtime cost.

Bromskloss · on June 2, 2018

Ooo, it must be zero-based indexing! Right?!

oxinabox · on June 4, 2018

Zero-based indexing was kinda mentioned in the post. About indexing with `begin`

mafm · on June 1, 2018

Upvote because I think this is the 1st time a posting from the ucc website has made the HN front page.

wirrbel · on June 1, 2018

Julia always was one of the languages I wondered whether I should have a closer look at (do python data-science stuff mostly, so it is an interesting project). However, reading Dan Luu writing about his experiences with members of the Julia community https://danluu.com/julialang/ I try my best to not get anywhere near it. Directly cited:

    Update: this post was edited a bit to remove a sentence about how friendly
    the Julia community is since that no longer seemed appropriate in light of
    recent private and semi-private communications from one of the co-creators
    of Julia. They were, by far, the nastiest and most dishonest responses I've
    ever gotten to any blog post. Some of those responses were on a private
    discussion channel; multiple people later talked to me about how shocked
    they were at the sheer meanness and dishonesty of the responses. Oh, and
    there's also the public mailing list. The responses there weren't in the
    same league, but even so, I didn't stick around long since I unsubscribed
    when one the Julia co-creators responded with something bad enough that it
    prompted someone else to to suggest sticking to the facts and avoiding
    attacks.  That wasn't the first attack, or even the first one to prompt
    someone to respond and ask that people stay on topic; it just happened to
    be the one that made me think that we weren't going to have a productive
    discussion. I extended an olive branch before leaving, but who knows what
    happened there?


    Update 2, 1 year later: The same person who previously attacked me in
    private is now posting heavily edited and misleading excerpts in an attempt
    to discredit this post. I'm not going to post the full content in part
    because it's extremely long, but mostly because it's a gross violation of
    that community's norms to post internal content publicly. If you know
    anyone in the RC community who was there for the discussion before the
    edits and you want the truth, ask your RC buddy for their take. If you
    don't know any RC folks, consider that my debate partner's behavior was so
    egregious that multiple people asked him to stop, and many more people
    messaged me privately to talk about how inappropriate his behavior was. If
    you compare that to what's been publicly dredged up, you can get an idea of
    both how representative the public excerpts are and of how honest the other
    person is being._

johnmyleswhite · on June 1, 2018

As a person who knows both of the people involved in that debacle, I think it's fair to say that there's a lasting interpersonal conflict between two specific people (one of whom is the author of the post you've linked to) that's been framed as a community-wide issue.

montalbano · on June 1, 2018

Without taking a side here (and declaring for the sake of openness that I use Julia regularly and it is very helpful for the scientific work I do), I think it is worth noting that this was from 2014 (the language has moved forward greatly since then) and the inappropriately negative response described seems to be from 1 developer and not representative of the team.

Edit: To add, Julia is now my go-to language for any scientific/numeric programming and would be my number 1 suggestion to anyone else doing similar work.

eigenspace · on June 1, 2018

For what it's worth, I got involved in the Julia community about a year ago and people have been nothing but friendly to me and toleant of my sometimes excessively dumb questions. I don't doubt that some people have gotten into nasty disputes but as far as I can tell, that seems to be pretty far from the norm.

jb1991 · on June 1, 2018

I've seen that post as well, and often wondered if it was fair characterization of overall Julia community (maybe it is), or is the result of a single bad apple at the top that overshadowed an otherwise pleasant community.

Unfortunately, it doesn't take much negative energy to spoil a community, even if it comes from just one person. That it was a language co-creator is troubling but not necessarily a reason to avoid the language if the rest of the community is nice (which maybe it is, maybe it isn't).

However one thing I have noticed is that most language communities do tend to follow an attitude set by the language creator(s). This seems to have played out quite a bit for Clojure, Python, Elm, Elixir, and other languages where, at least to me, the overall shared perspective of the community is closely aligned to the personal attitudes and opinions of the language author(s).

ScottPJones · on June 1, 2018

I don't feel at all that it's a fair characterization of the overall community. I'm sure I've been the target of more of what Dan Luu experienced than anyone else in the Julia community, from the same source, but I haven't let that stop me from contributing as best I can to the community, by answering questions (on Gitter and Discourse), helping with code, and by contributing my own packages to help out in areas where I felt Julia needed some extra attention. I also think that most of the community is more aligned with the example set by Jeff and Viral. Both show great civility, patience, and a willingness to listen to others, in my experience. Finally, with regards to the person that both Dan and I have had problems with (and I have no personal knowledge of that feud, that happened before I had discovered Julia), I respect him a lot (even if he feels I "attack" him when I try to bring up technical issues with code / designs that he's been involved with), I will always be grateful for his role in creating Julia, I think he's a great promoter of Julia, at conferences, etc., and he often has quite a lot of good, well-thought out things to say on GitHub and Discourse.

wirrbel · on June 3, 2018

Don't you ever wonder how many excellent contributions to Julia were not made, because that person overstepping boundaries?

ssfrr · on June 1, 2018

Hopefully I can clarify some of the "maybe it is, maybe it isn't" in this post - I've found the Julia community to be almost universally full of helpful and friendly folks, mostly scientists and/or people with a language nerd bent.

I'm not sure if there's a fair way to quantify drama in a community, but it hasn't been a big issue for me relative to all the positive interactions I've had with other julians.

cormullion · on June 1, 2018

I thinking quoting from just one side of the argument is unfair, particularly if you’re drawing conclusions.