The syntax is quite verbose. Any way to make it better with macros? Coming from the number-crunching world makes me wish that the syntaxes for lists and arrays in OCaml were swapped.
On a different note, I hope Owl allows flexibility over memory layout in the future. Interoperating with Fortran is often necessary. Besides that, compile-time reshaping of n-dimensional arrays is very useful for writing efficient code. As far as I know Julia can do this with macros, maybe without macros too.
> OCaml’s Bigarray can further use kind GADT to specify the
> number type, precision, and memory layout. In Owl, I only
> keep the first two but fix the last one because Owl only
> uses C-layout, or Row-based layout in its implementation.
> See the type definition in Ndarray module.
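Concretely, that kind GADT is plain Bigarray; a minimal illustration (standard library only, nothing Owl-specific) of how the kind value fixes both the OCaml element type and the underlying precision:

let x32 = Bigarray.Genarray.create Bigarray.float32 Bigarray.c_layout [|3; 3|]
(* x32 : (float, Bigarray.float32_elt, Bigarray.c_layout) Bigarray.Genarray.t *)
let c64 = Bigarray.Genarray.create Bigarray.complex64 Bigarray.c_layout [|3; 3|]
(* c64 : (Complex.t, Bigarray.complex64_elt, Bigarray.c_layout) Bigarray.Genarray.t *)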
There are some changes in the pipeline that should make the syntax for slicing lighter in OCaml 4.10, by eliminating the need for the outermost [| |] or [ ]. For instance,
let a = x.%{ [|2; 3; 4|] }
x.%{ [|2; 3; 4|] } <- 111.
let a = x.${ [[0;4]; [6;-1]; [-1;0]] }
might be simplified to:
let a = x.%{ 2; 3; 4 }
x.%{ 2; 3; 4 } <- 111.
let a = x.${ [0;4]; [6;-1]; [-1;0] }
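For context, these .%{ } forms are user-defined indexing operators (available since OCaml 4.06). A minimal sketch of how such an operator can be wired up over a plain Bigarray Genarray (illustrative only; Owl's actual definitions differ):

let ( .%{} ) x idx = Bigarray.Genarray.get x idx
let ( .%{}<- ) x idx v = Bigarray.Genarray.set x idx v

let () =
  let x = Bigarray.Genarray.create Bigarray.float64 Bigarray.c_layout [|5; 5; 5|] in
  x.%{ [|2; 3; 4|] } <- 111.;
  Printf.printf "%f\n" x.%{ [|2; 3; 4|] }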
It is possible to go further with ppxs (OCaml's version of macros), but at a greater readability cost.
I think flexibility becomes a double-edged sword over time: when you need to read, understand and modify some code with overloaded operators, it can take longer to figure out what's happening.
> I think flexibility becomes a double-edged sword over time
Agreed in general.
For the specific case of number crunching though, sooner or later you will want to call out to a Fortran library. That, and with DNNs being so popular now, flexibility over how 'tensors' are laid out in memory by an upstream library, or how they have to be laid out to call another library, is quite crucial.
> when you need to read, understand and modify some code with overloaded operators, it can take longer to figure out what's happening.
If overloading has been a resounding success anywhere, it is in scientific computing. Unless someone goes overboard, it helps rather than hinders understanding by matching traditional mathematical notation.
I had non-arithmetic syntax in mind when making this comment, such as indexing. That can be as simple as getting the i-th element, or getting the elements corresponding to a mask, etc. This makes the code easy to write, but less easy to know what's actually happening later.
How does Owl handle representing arrays of Float64s (for example) in BLAS-compatible format? That is, as contiguous memory blocks, instead of having each float value individually heap-allocated and boxed, with the array being an array of pointers to these values. That ability seems like the most basic requirement for a language in which scientific computing is done, if for no other reason than to call BLAS, LAPACK and other Fortran/C/C++ libraries. Vanilla OCaml doesn't support this (as far as I'm aware), so you'd need something like NumPy, which grafts typed arrays onto Python, but for OCaml. Of course, OCaml already has typed arrays, unlike Python; there's just a legacy insistence that all collections work with pointers to values, since otherwise you might need to compile generic code more than once (god forbid). Is such an "efficient array library" part of Owl? Any tidbits on how it works?
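(For what it's worth, OCaml's standard Bigarray module, which Owl builds on per the quote near the top of the thread, provides exactly this: unboxed, contiguous buffers that can be handed to C or Fortran through the FFI; plain float array is unboxed too. A minimal sketch:)

let () =
  (* one contiguous block of a million unboxed doubles; no per-element boxing *)
  let a = Bigarray.Array1.create Bigarray.float64 Bigarray.c_layout 1_000_000 in
  Bigarray.Array1.fill a 0.;
  a.{0} <- 3.14;  (* built-in .{ } indexing for Bigarray *)
  Printf.printf "%f\n" a.{0}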
Static typing. I'm doing image-related and ML-related research at the moment, and I'm a seasoned OCaml user. So I tried Python for a bit; it was such a pain that I've decided to stick with Owl.
Without static typing, the discoverability is so low that you're literally feeling pain tinkering with python. REPL experience with static types is so much better. With numpy and scipy, I had to stick to documentation all the time even to do the most trivial things. What kinds of arguments could I pass to the plotting function? Read the docs. Does this work with dense or sparse matrix? Does it work with array? Read the docs.
In OCaml you can derive most of that information from the type; take a plotting API, for example.
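Something along these lines, say; this signature is illustrative, not Owl's actual plotting API:

(* the type alone says: the labels are optional strings, the data must be
   a dense double-precision matrix, and the call is used for its effect *)
val plot : ?title:string -> ?xlabel:string -> ?ylabel:string -> Owl.Mat.mat -> unit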
> Without static typing, the discoverability is so low that you're literally feeling pain tinkering with python.
> REPL experience with static types is so much better. With numpy and scipy, I had to stick to documentation all the time even to do the most trivial things.
> What kinds of arguments could I pass to the plotting function? Read the docs. Does this work with dense or sparse matrix? Does it work with array? Read the docs.
I don't think that has anything to do with static vs dynamic typing. I can do all of that in Julia, which is a dynamically typed language.
1) I can start typing a function name and hit tab to complete it, or show matching alternatives with possible argument combinations and their types listed.
2) As I start typing arguments I can hit tab more times to see matching completions. The REPL will take into account the type of arguments I've already given when offering possible completion alternatives.
3) If I get a type that I'm not sure what I can do with, I can write methodswith(sometype) to get a list of all functions taking an argument of type sometype.
4) Wondering if something works with dense, sparse, arrays etc.? Check the function signature. Usually it will say AbstractArray, and sparse and dense matrices are subtypes of AbstractArray. Not sure? You can quickly check that in the REPL by asking AbstractSparseArray <: AbstractArray, or just look at the subtypes of AbstractArray with subtypes(AbstractArray).
Having said that I might want to check out Owl in the future. I have had a number of temporary engagements with statically typed functional languages. Haskell was interesting, but I found it to be impractical and too much of a time investment. OCaml may be more pragmatic, but it has often seemed a bit verbose to me.
Yeah, as someone with a little Julia experience looking at this, OCaml just seems to have a lot of boilerplate in weird areas.
The first thing I do when evaluating languages for myself is to look at processing text files line by line or all at once and splitting strings. If that is complicated, then the language might not be well suited to my needs.
With types, a repl, native support for scientific computing and sparse matrices, LLVM JIT, LLVM assembly code inspector, all kinds of macros and support for macros, s-expression macro inspector, speed, easy package manager, large community, good graphics packages...etc, I'm really liking Julia.
It's not Julia's dynamic typing, but Julia's multimethod approach. For each function you don't have one set of arguments that must be generic enough to cover all cases (for example f(number, number)), but you have one visible implementation for each combination of types (f(integer, complex), f(complex, integer), f(T, T) where T <: Number...) which makes it easier to know what the function actually covers. And the REPL (and editors plugins) has some good tools for searching for available methods, documentation and source code.
Though while Julia even has sum and product types, it's not common for people to go to the level of detail of an ML language (plus you can't dispatch on the values of a struct), and there is no formal way to annotate an implicit interface for now, so each method will usually not be as clear from types alone as in OCaml.
In Julia every function has a number of associated variants. We call them methods, not to be confused with methods in OOP.
So for every function there is a table of methods. Each method takes a different number of arguments or arguments with different types.
The REPL is able to introspect this table. So when you type a function name in Julia and hit tab, it will essentially dump this table. If you start filling in some arguments, it will use the types of those arguments to filter this table, showing you the only remaining choices.
You can even go in reverse. The function `methodswith` allows you to provide a type and Julia will search through the methods of every function to show every method accepting that type as one of its arguments.
Julia stores a lot of metadata with objects and functions. You would be surprised how powerful this system is once you start using it. E.g. you can jump to the definition of a function that was generated on the fly in a for loop. The function, when created, keeps track of the file and line number it was created at, even if that happened through metaprogramming.
My understanding/experience is that Julia has optional typing. Meaning it's dynamically typed, but supports type annotations that often improve performance and can be used to enforce types (I think).
A lot of Julia code looks statically typed, but if you want to code in a "pure" dynamic style (e.g., for prototyping, or just because it's more convenient or better for whatever reason) you can. Type annotation is seen as adding information for the compiler to use, and as increasing clarity in specification, but not as a necessity.
This is different from other dynamic languages I'm familiar with where type specification and annotation isn't built in to the same extent, and different from static languages that require type specification all the time.
To me, Julia's approach feels the best of the languages I've used. I've grown to like statically typed languages more over time, but there are some situations where it can create huge headaches (e.g., where the type structures of a library, etc. are poorly organized or unclear).
Actually, type annotations in Julia do not improve performance (and in some pathological cases can even reduce performance). The Julia JIT compiler will always infer the type at compile-time regardless of annotation and will (almost) always produce the optimal code.
The reasons for type annotations are multiple dispatch (multimethods), documentation, deliberately restricting the polymorphism of a function, and the rare times when the compiler will not be able to infer the best type.
Interesting. I swear for a long time this (the assertion that type annotations can improve performance) was in the Julia documentation, so much so I stopped reading it. But maybe something changed? There have been a lot of changes.
Julia does not have optional typing. Type annotations are used to specify which function implementation applies to specific argument types.
E.g. if I write a function definition as foo(x::Int, y::String), that means that code is run whenever the first two arguments are Int and String. While if another definition is specified as foo(x::DateTime, y::Float64), this other definition applies when the arguments are dates and floating-point values.
Types are not used to improve performance. Rather they are a necessity to tell Julia what chunk of code should be run for particular types.
Often you use it to narrow down what types a function applies to. E.g. bar(x::Number) means the bar function requires some sort of number. You cannot provide a string. However, the compiler will not warn you if you use a string in a bar call. Instead you get an exception at runtime telling you there is no method of bar that takes a string as argument.
Multiple dispatch is so helpful when designing programs and interfaces. It’s one of the biggest things I miss from Erlang/Elixir when I’m using JS/Ruby, besides pattern matching and function guards.
The JuliaPro IDE apparently has good completion. Plus, the language might be dynamic, but it uses the same approach as Dylan regarding type inference, meaning that although dynamic, it infers possible types from context.
Obviously not if you are comparing it to the OCaml bytecode compiler. Doing a full AOT compilation to native code, though, depends pretty much on how those 3rd-party libraries are delivered and on the optimization levels being used.
A lot of those benefits are not direct properties of static typing. For example, Common Lisp(s) can have a very nice REPL experience like this. With numeric stuff you have probably typed the interface anyway, and your environment can interrogate that. I always liked CL for research code for this reason.
Sorry, I meant type inference. In python you could import all the data and deal with types later. Things that aren’t the expected type can be dealt with individually. It’s not pretty but it’s fast and it works.
Have you experienced any problems with static typing in these situations? I appreciate the value of static typing but I’m not sure if it offers substantial benefit when working interactively with data.
“The flexibility of dataframe largely comes from the dynamic typing inherently offered in a language. Due to OCaml’s static type checking, this poses greatest challenges to Owl when I was trying to introduce the similar functionality.”
“To be efficient, Dataframe only takes maximum the first 100 lines in the CSV file for inference.
If there are missing values in a column of integer type, it falls back to float value because we can use nan to represent missing values.
If the types have been decided based on the first 100 lines, any following lines containing the data of inconsistent type will be dropped.”
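The described fallback amounts to a rule like this (a toy sketch of the stated behaviour, not Owl's actual code):

(* a column of ints with missing fields is promoted to float,
   so that nan can stand in for the missing values *)
let infer_kind ~all_ints ~seen_missing =
  match all_ints, seen_missing with
  | true, false -> `Int
  | _ -> `Float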
> In python you could import all the data and deal with types later.
You can't; you need to know what you are parsing: a number, a complex number, a symbol, etc.
> deal with types later
What does that mean? A dynamically typed language is still typed; all expressions have types.
Just like in Python, you can define types in-place with polymorphic variants and objects, and OCaml will infer their types.
let instant_complex = object method re = 3.14 method im = 0.0 end
would infer
val instant_complex : < im : float; re : float > = <obj>
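Functions over such objects get row-polymorphic types inferred as well, so the type keeps telling you what is required:

let magnitude c = sqrt ((c#re ** 2.) +. (c#im ** 2.))
(* inferred: val magnitude : < im : float; re : float; .. > -> float *)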
> If there are missing values in a column of integer type, it falls back to float value because we can use nan to represent missing values
Yeah, it's soundness vs completeness. If you want to, you can make your field optional, or make it a subtype of an object which could be None. There is no difference from Python here.
It's just a choice made in favor of soundness and convenience (because using `int option` in case of missing ints would be inconvenient for the most part).
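For instance, a rough sketch of the two representations side by side (hypothetical types, not Owl's actual dataframe internals):

type column =
  | Int_col of int option array    (* sound, but awkward at every use site *)
  | Float_col of float array       (* missing values encoded as Float.nan *)

let sum_floats c =
  (* nan-encoded missing values must be filtered wherever they matter *)
  Array.fold_left (fun acc v -> if Float.is_nan v then acc else acc +. v) 0. c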
Nobody prohibits a Python-like solution in OCaml:
(* the sketch made concrete: a virtual base class with a None-like value *)
class virtual number = object method virtual is_none : bool end
let none : number = object method is_none = true end
(* "float_num" rather than "float", to avoid shadowing the built-in type *)
class float_num (v : float) = object inherit number method is_none = false method value = v end
etc. Dynamic typing can easily be replicated within any static language; it's just not the reason people use static languages, which is soundness.
> You can't; you need to know what you are parsing: a number, a complex number, a symbol, etc.
pandas (Python) has the upper hand here. A lot, if not most, real-world data will have values of the wrong type interspersed. pandas will still let you read in the table and then deal with these problematic values. For example, reading in the data and then dropping all values that don't conform to the expected type could likely be done in 2-3 lines.
But the advantage GP may be speaking of is that you can still do a lot of useful stuff with the data even if you leave the bad values in there.
You could of course parse every column as a string and then cast to an appropriate type at run time, interactively. For timestamps and such, pandas can look at the data to figure out the exact datetime format. Not sure what one loses in OCaml, especially if one is working at the REPL.
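The string-first approach translates directly; a quick sketch (parse_col is a hypothetical helper, not a library function):

(* parse a raw CSV column, keeping failures as None for later inspection *)
let parse_col (raw : string array) : float option array =
  Array.map float_of_string_opt raw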
A large part of the benefits originates from the OCaml language per se, IMO. Owl inherits these advantages for free. E.g., it takes advantage of various compiler backends, so you can (almost) use the same code for both the web backend and frontend if you are a full-stack developer.
From a numerical lib designer's perspective, a holistic design (note: which does not necessarily mean a monolithic lib) leads to a more coherent and consistent result (e.g. in its APIs), which makes a large software system easier to optimise and maintain. As a young and experimental system, Owl seems to be doing a quite decent job so far, IMO.
However, from a (broad) user's perspective, being very honest, I don't believe one should replace the other, despite how much I love Owl myself. Each tool has its own pros and cons deeply rooted in its internal design; it depends on the language it uses, its design assumptions and goals. I think the choice of tools is really a matter of project requirements, coders' own skills, personal taste, management choice, as well as context (i.e. what kind of apps you are going to build, for whom, and where).
The motivation section of the documentation covers this in detail [1].
From what I understand, the key points are:
1. The author claims that various libraries in Python (Scipy, Pandas, etc.) have a large amount of duplicated code and overlap if you look deeply into the code. Having one library with a holistic design prevents this overlap and allows for easier optimisation.
2. He further claims that because "numerical computing is built atop of a small amount of abstract number types (e.g. real and complex), then derives a fat set of advanced numerical operations for various fields", unlike systems programming, holism, such as in Owl, is preferable to reductionism.
For me, as a Scipy/Flask/Postgres user it would be about completing the stack. Python’s great but for core things I’d love to have an ML-inspired alternative that’s even half as convenient. I watch this space with great interest.
A great choice of companion language to your Python stack would be Nim: https://nim-lang.org/ (not necessarily ML-inspired, but extremely productive and extremely fast). Bonus: v1.0 is right around the corner for Nim. If you're willing to venture out to a place with an evolving scientific ecosystem, Nim is a great choice for doing scientific computing as part of a CPython stack (easy integration both ways, since Nim compiles to both C and C++).
I'm a fan of functional languages and believe their implementation choices induce a unique approach to problem solving. Not necessarily better, but different, which is enough on its own to justify such projects. While I prefer the functional approach, I'm well aware that that's in large part just personal preference. What's true, however, is that every language makes certain things easier and other things more difficult as a result of what's prioritized. The locale of functional and typed languages is less visited and worth exploring.
By keeping Part 1 as a placeholder they are potentially missing out on a lot of new users who are not familiar with functional programming but are willing to learn it through the use case of scientific computing.
OCaml and F# are great languages, but rather than lament what language this excellent math library/framework is implemented in, lament that there is no F# equivalent, or that projects like Owl outside Python don't really get the attention they need. While having multiple such frameworks per language makes little sense, each language having its own mathlib ecosystem makes a lot of sense from a diversity perspective.
We should want to see such examples grow not just for F# or OCaml but also for Haskell, Rust, Scala and others. Then we should prioritize, as much as possible, inter-operation of libraries, model definitions and outputs.
> Would've been less work for them to write nice functional wrappers around existing .NET libraries.
What? Why would they do that? Unix is a standard in the field of scientific computation, so wrapping BLAS and LAPACK totally makes sense. Besides, OCaml already had good BLAS and LAPACK bindings.
> OCaml works great in windows and have good .Net bindings.
Perhaps I should clarify. While the language itself might work, the ecosystem didn't seem great. The last time I checked (less than a year ago), opam was mostly broken on Windows. opam is the officially recommended package manager. I didn't want to invest in a system where the official support was virtually nonexistent. Looking at it now, there does seem to be a port of opam for Windows, and the first commit in that repo was less than a year ago, so quite possibly after I last checked.
If it works, I may just switch to it (although in my opinion any software requiring cygwin still means poor Windows support...).
> Does it have bindings for the majority of popular libs? For example are there Gstreamer and Gtk F# bindings? Do they work on both .Net Core and Mono?
There is GTK#[0], which claims support for both. There is Gst#[1], although they claim it is in early stage development.
> The last time I checked (less than a year ago), opam was mostly broken on Windows
opam has had a Windows version based on cygwin for a long time [1].
> any software requiring cygwin still means poor Windows support
And software requiring .Net has good Linux support? Since when does requiring an additional runtime mean poor support?
> While the language itself might work, the ecosystem didn't seem great.
Sure, just like the .Net ecosystem, which is totally absent on Linux: no WPF, no Media Foundation. And bindings to local facilities, like GStreamer, are lacking.
Yet the only real metric of the maturity of the support is whether people use it in production. And people heavily use OCaml on Windows in production.
Which ones? A free one analogous to libmath that works with float32 would be a good start.
I am serious; I would love to know of some .Net libraries for number crunching that one would miss if one were to move off the .Net platform. Or did you mean non-numeric libraries?
That's the problem. Everybody has to guess. Everyone thinks there is something out there that somebody else knows about. This is a problem I don't have in some of the other ecosystems.
That I would have to (i) write my own cos, or (ii) take a dependency on a commercial 3rd-party library that will likely not be around for long, or (iii) call a native library through an FFI, for something as simple as the cos of a float32, is not terribly exciting.
I have to guess because that kind of work was only relevant to me 5 years ago, so I lost track of what everyone doing data analysis with .NET is using nowadays.
Oh, I have been guessing all year long for several years, although it is very relevant to what I do.
BTW, I did not intend to imply that you are being unhelpful. Scientific computation just isn't a first-class citizen of the .Net ecosystem. Either you have to write your own or take a dependency on a commercial library for even the simplest stuff, and pray that they will be around.
http://ocaml.xyz/chapter/slicing.html