The Google Brain team said it considered Julia as well for TensorFlow, but "since its compilation approach is based on type specialization, it may have enough of a representation and infrastructure to host the Graph Program Extraction techniques [they] rely on"[1].
It sounds like they're saying they think it can be used, but "The Julia community shares many common values as with our project, which they published in a very like-minded blog post[1] after our project was well underway."
I think the key point is "after our project was well underway" -- that if they were starting now, they'd likely consider Julia much more seriously than they did at that time, which was before projects like Zygote got started.
Later on from your link:
"[We] picked Swift over Julia because Swift has a much larger community, is syntactically closer to Python, and because we were more familiar with its internal implementation details - which allowed us to implement a prototype much faster."
As always, a disclaimer, I am deeply involved in the Julia community so beware of my biases.
Honestly, I think the best argument (and the only one I buy) for Swift is the latter part: “…we were more familiar with its internal implementation details - which allowed us to implement a prototype much faster.” – just like I said about a year ago [1]. If you are sitting on a team deeply familiar and passionate about a language – Swift – what kind of managerial fool would not let them take a stab at it? Especially with Lattner’s excellent track record.
The ideas behind Zygote date to somewhere around spring 2017, but I think it took about a year to hammer out compiler internals and find time to hack, so you are still right that nothing was public when Google settled on Swift – I think there has been at least one Mountain View visit over XLA.jl though, but do not quote me on that one.
The race is still on and I am looking forward to seeing what all the camps bring to this budding field. I have worked with an excellent student on SPMD auto batching for his thesis project and we now have some things to show [2]. This is still a great time to be a machine learning practitioner and endlessly exciting if you care about the intersection between Machine Learning and programming languages.
My only request would be for Jeremy to explain “Swift for TensorFlow is the first serious effort I’ve seen to incorporate differentiable programming deep in to the heart of a widely used language that is designed from the ground up for performance.” to me. Is it the “serious” and/or “widely used” subset where the Julia camp is disjoint? =)
Nice comment, thanks. While I have several years of professional TensorFlow under my belt, since last fall I have spent a fair amount of time at home trying Julia + Flux. I find Flux very nice to use but it is a chicken and egg problem: I think Flux needs a lot more users in its community to really take off. TensorFlow and PyTorch dominate mindshare.
It is indeed a chicken and egg problem, but that is always the case and is mostly an argument for the status quo. When I teach there is always a handful of students asking “Why examples in Julia? Python is what is used in industry.” and my response is always the same: “Yes, true, and I could have used the same argument against Python in favour of Java in 2004. I think Julia has a reasonable potential of being ‘the future’ based on my experience as a scientist and programmer, so my hope is that I am giving you a glimpse of the future. Besides, are you not at a university to learn a diverse toolbox rather than being cast into an employee mould?”.
There must be points when a technology takes off and becomes mainstream; what predates those points? That, to me, is the interesting question. In 2015 Torch (Lua) dominated the mindshare, so why did TensorFlow succeed then? I think Lua itself caused it, lack of a coherent object model, etc. – sure as heck it was not the speed, as I joked around by writing `import TensorFlow as TensorSlow` in my scripts for at least a year past the initial release. There was resistance against writing Python bindings for Torch, but in the end it happened and PyTorch was born; at this point TensorFlow dominated the mindshare. Why has PyTorch now become the favoured framework among all my colleagues and students, despite them being solidly in the TensorFlow camp prior to this? I think the answer is eager execution, and the move in TensorFlow 2.0 to mimic exactly this speaks in my favour. So what would the Julia moment be then? If I knew, I would tell you, but I and several others in the Julia community are at least hard at work cracking this nut.
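For anyone who has not used both styles, here is a toy sketch of the difference between the graph model and eager execution. It is plain Python, and `Node`, `const` and `run` are hypothetical names made up for illustration; none of this is TensorFlow's or PyTorch's actual API.

```python
# Toy contrast between deferred (graph) and eager execution.
# Node, const and run are hypothetical helpers, not a real framework API.

class Node:
    """A deferred operation: records what to compute, computes nothing yet."""

    def __init__(self, op, inputs=(), value=None):
        self.op = op
        self.inputs = inputs
        self.value = value

    def __add__(self, other):
        return Node("add", (self, other))

    def __mul__(self, other):
        return Node("mul", (self, other))


def const(x):
    """Wrap a plain number as a leaf of the graph."""
    return Node("const", value=x)


def run(node):
    """Walk the recorded graph and only now produce numbers."""
    if node.op == "const":
        return node.value
    a, b = (run(i) for i in node.inputs)
    return a + b if node.op == "add" else a * b


# Graph style: build the expression first, execute it in a separate step.
graph = const(2.0) * const(3.0) + const(1.0)
graph_result = run(graph)

# Eager style: each line yields a concrete value you can print or debug.
eager_result = 2.0 * 3.0 + 1.0
```

In the graph style, `graph` is only a description until `run` is called, which is what makes it easy to ship elsewhere and awkward to step through in a debugger.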
Just like I said a year ago, I am biased in favour of the bazaar and big tent that Julia represents. Swift will not have physicists, mathematicians, ESA employees, HPC people, etc. present and I would miss them as they bring wonderful libraries and viewpoints to the Julia bazaar. For example, I think TensorFlow with its initial graph model was designed that way precisely because it fit the mindset of non-practitioners with a compiler background – or perhaps it was the desire to “write once, deploy anywhere”? PyTorch could then take a big piece of the pie because they saw eagerness to be essential due to their academic/practitioner background. Only time will tell who is right here and I think me favouring Julia is not a safe bet, but certainly a reasonable one given the options available.
Great response, thanks! I am sure Julia will keep growing in use and influence. BTW, really off topic, but I am sometimes inconvenienced by the opposite problem of being very used to using older languages like Common Lisp (and to a lesser degree Prolog) that are still very relevant but are losing mind share. On the other hand, just this hour an old customer contacted me to do work in Common Lisp and Prolog, so there is still some interest in older languages.
Do you think this might just be people wanting something familiar "at the bottom"?
E.g. both Swift and Julia (and Rust, and Clang ...) share a common underlying backend (LLVM), whereas in Common Lisp, it's ... Common Lisp all the way down, and it could potentially be a much better world, but it would just require duplicating too much stuff to be a viable candidate?
Julia is a GC-based language. If you are OK with that, then there is a whole world of other languages with similar disadvantages. For high performance numerical computing most GC-based languages won’t cut it, and that fact is typically swept under the rug by creating a C++ implementation wrapped by bindings in your GC-based language.
High performance numerical computing is one area where garbage collection doesn't really matter though, unlike real-time applications that need consistency more than raw speed. An occasional milliseconds-long (or even seconds-long) stop-the-world GC pause won't really affect anything in an hours-long or days-long simulation, training run or calculation. Not to mention you don't really allocate that much in many of those applications; in ML, for example, it is not unusual to allocate all tensors once and just update the values during the training loop (usually on GPUs or TPUs), so there won't be anything to even trigger GC.
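A minimal sketch of that allocate-once pattern, in plain Python with the standard `array` module standing in for tensors. The toy objective, the buffer names and the learning rate are all assumptions for illustration; Python still creates temporary float objects for each scalar, so this only shows the buffer-reuse pattern, not a truly allocation-free loop.

```python
from array import array

# Preallocate parameter and gradient buffers once, before the loop.
n = 4
weights = array("d", [0.0] * n)
grads = array("d", [0.0] * n)
target = array("d", [1.0, 2.0, 3.0, 4.0])  # toy data, assumed for the example
weights_buffer_id = id(weights)

learning_rate = 0.5
for step in range(50):
    for i in range(n):
        # Gradient of 0.5 * (w - t)^2, written into the existing buffer.
        grads[i] = weights[i] - target[i]
    for i in range(n):
        # In-place update: no new array is created per step.
        weights[i] -= learning_rate * grads[i]
```

The same two buffers are reused on every step, so the number of live arrays stays constant no matter how long the loop runs.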
And for Julia in particular, there isn't really a need for bindings over C/C++ (the whole purpose of the language is not requiring another language for performance). Many of the most popular libraries for numeric computing are written 100% in Julia, including Flux.jl [1] for Machine Learning and DifferentialEquations.jl [2]. Plus even real-time applications can be done with careful programming, since Julia allows writing programs with either no or minimal allocations, for example [3].
One can argue that automatic reference counting is a form of garbage collection. It is, however, deterministic and that’s why some engineers tend to prefer it.
It is not deterministic in the presence of deeply nested data structures like acyclic graphs, where the amount of time spent running cascading deletes cannot be fully calculated. Likewise the amount of stack space cannot be fully determined. Finally, sharing pointers across threads introduces non-deterministic cache locks and delays.
Herb Sutter has a quite good CppCon talk about it.
Many engineers prefer it due to cargo cult and anti-tracing GC bias, despite years of CS papers proving the contrary since the early 80's.
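Both halves of this argument can be seen in a small sketch, assuming CPython, whose primary memory management is reference counting (other Python implementations may behave differently). The `Node` class and `build_chain` helper are hypothetical names for illustration.

```python
# Assumption: CPython, where objects are released as soon as their
# reference count reaches zero.
freed = []


class Node:
    """A link in a singly linked chain that logs its own destruction."""

    def __init__(self, name, child=None):
        self.name = name
        self.child = child

    def __del__(self):
        # Called when the last reference to this node goes away.
        freed.append(self.name)


def build_chain(length):
    """Build a chain where each new head owns the previous head."""
    head = None
    for i in range(length):
        head = Node(i, head)
    return head


chain = build_chain(1000)
before = len(freed)
del chain            # drops the only reference to the head ...
after = len(freed)   # ... and every node in the chain is released right here
```

The release point is predictable, but the cost of that single `del` grows with the size of the structure hanging off it, which is the cascading-delete problem described above.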
It is deterministic wrt. the object lifetime which is exactly what you need if your "delete" is actually controlling access to some shared resource that must be freed promptly when it goes out of use. The drawbacks you mentioned are largely-unavoidable consequences of that requirement, and they mostly matter when the allocation graph is large and complex. It's why obligate RC as found in Swift is likely a bad idea, and even inferior to obligate tracing GC - but other languages can do better than that, e.g. introducing arenas that can be used to allocate or free objects as a group, in a single operation.
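As a rough sketch of the arena idea, here is a hypothetical bump allocator in Python. The `Arena` class and its methods are invented for illustration and do not model any particular language's arena API; Python itself would not normally be used this way.

```python
class Arena:
    """Hypothetical bump allocator over one preallocated buffer."""

    def __init__(self, capacity):
        self.buffer = bytearray(capacity)
        self.offset = 0

    def alloc(self, size):
        """Hand out the next `size` bytes of the buffer as a writable view."""
        if self.offset + size > len(self.buffer):
            raise MemoryError("arena exhausted")
        start = self.offset
        self.offset += size
        return memoryview(self.buffer)[start:start + size]

    def reset(self):
        """Release every allocation at once by rewinding the offset."""
        self.offset = 0


arena = Arena(64)
a = arena.alloc(16)
b = arena.alloc(32)
a[0] = 1                      # writes go straight into the shared buffer
used_before_reset = arena.offset
arena.reset()                 # one operation frees both allocations
```

Freeing is a single constant-time step regardless of how many objects were allocated, at the price that views handed out earlier still alias the buffer and must not be used after `reset`.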
Plenty of tracing GC also offer the same control over arenas and deterministic destruction, one just needs to actually use them, e.g. D, Modula-3, Mesa/Cedar, Active Oberon, Eiffel, System C#, C# 7.3/8...
See links below. Did you look at or consider Julia?
--
https://julialang.org/blog/2018/12/ml-language-compiler
https://arxiv.org/abs/1810.07951