
What's the one-minute pitch for using this, over say Python or Julia? I couldn't figure this out from a quick skim.



Static typing. I'm doing image-related and ML-related research at the moment, and I'm a seasoned OCaml user. I tried Python for a bit, but it was such a pain that I decided to stick with Owl.

Without static typing, the discoverability is so low that you're literally feeling pain tinkering with Python. The REPL experience with static types is so much better. With numpy and scipy, I had to stick to the documentation all the time, even to do the most trivial things. What kinds of arguments can I pass to the plotting function? Read the docs. Does this work with a dense or a sparse matrix? Does it work with an array? Read the docs.

In OCaml you can derive most of this information from the types. Take, for example, the plotting API:

http://ocaml.xyz/apidoc/owl_plot.html

Much of this is simply obvious within a REPL session without looking at the docs. Simply evaluating a function name shows you its type.


> Without static typing, the discoverability is so low that you're literally feeling pain tinkering with python.

> REPL experience with static types is so much better. With numpy and scipy, I had to stick to documentation all the time even to do the most trivial things.

> What kinds of arguments could I pass to the plotting function? Read the docs. Does this work with dense or sparse matrix? Does it work with array? Read the docs.

I don't think that has anything to do with static vs dynamic typing. I can do all of that in Julia, which is a dynamically typed language.

1) I can start typing a function name and hit tab to complete it, or show matching alternatives with possible argument combinations and their types listed.

2) As I start typing arguments I can hit tab more times to see matching completions. The REPL will take into account the type of arguments I've already given when offering possible completion alternatives.

3) If I have a type and am not sure what I can do with it, I can write methodswith(sometype) to get a list of all functions taking an argument of type sometype.

4) Wonder if something works with dense, sparse, arrays etc? Check the function signature. Usually it will say AbstractArray, and sparse and dense matrices are both subtypes of AbstractArray. If you're not sure, you can quickly check that on the REPL by asking AbstractSparseArray <: AbstractArray, or you can just look at the subtypes of AbstractArray with subtypes(AbstractArray).

Having said that I might want to check out Owl in the future. I have had a number of temporary engagements with statically typed functional languages. Haskell was interesting, but I found it to be impractical and too much of a time investment. OCaml may be more pragmatic, but it has often seemed a bit verbose to me.


Yeah, as someone with a little Julia experience looking at this, OCaml just seems to have a lot of boilerplate in weird areas.

The first thing I do when evaluating languages for myself is to look at processing text files line by line or all at once and splitting strings. If that is complicated, then the language might not be well suited to my needs.

With types, a repl, native support for scientific computing and sparse matrices, LLVM JIT, LLVM assembly code inspector, all kinds of macros and support for macros, s-expression macro inspector, speed, easy package manager, large community, good graphics packages...etc, I'm really liking Julia.


int32<->int64 boundaries deep in Python models have cost my team many an hour in debugging


Ugh, don't remind me. Seems we are siblings in pain.


Although Julia is a dynamic language, the way it uses type inference and type annotations, you can also achieve a similar experience.

Naturally, OCaml benefits from almost 20 years of existence.


How does Julia’s dynamic typing augment discoverability? Does it magically tie into a code completion engine somehow?


It's not Julia's dynamic typing, but Julia's multimethod approach. For each function you don't have one set of arguments that must be generic enough to cover all cases (for example f(number, number)), but you have one visible implementation for each combination of types (f(integer, complex), f(complex, integer), f(T, T) where T <: Number...), which makes it easier to know what the function actually covers. And the REPL (and editor plugins) have some good tools for searching for available methods, documentation and source code.

Though while Julia even has sum and product types, it's not common for people to go to the level of detail of an ML language (plus you can't dispatch on the values of a struct), and there is no formal way to annotate an implicit interface for now, so each method will usually not be as clear from types alone as in OCaml.


In Julia every function has a number of associated variants. We call them methods, not to be confused with methods in OOP.

So for every function there is a table of methods. Each method takes a different number of arguments or arguments with different types.

The REPL is able to introspect this table. So when you type a function name in Julia and hit tab, it will essentially dump this table. If you start filling in some arguments, it will use the types of those arguments to filter this table, showing you the only remaining choices.

You can even go in reverse. The function `methodswith` allows you to provide a type and Julia will search through the methods of every function to show every method accepting that type as one of its arguments.
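
To make the method-table idea concrete for readers who don't know Julia, here is a minimal Python sketch. It is not how Julia implements dispatch. The registry METHOD_TABLE, the decorator method, and the helpers candidates and methodswith are all hypothetical names made up for this illustration. It only shows the shape of the idea: one entry per combination of argument types, filtered by the arguments typed so far, and searchable in reverse by type.

```python
# Hypothetical registry: function name -> list of (signature, implementation).
METHOD_TABLE = {}


def method(name, *signature):
    """Register an implementation of `name` for one combination of argument types."""
    def register(impl):
        METHOD_TABLE.setdefault(name, []).append((signature, impl))
        return impl
    return register


def candidates(name, *args):
    """Signatures of `name` still compatible with the arguments supplied so far."""
    return [
        sig for sig, _ in METHOD_TABLE.get(name, [])
        if len(sig) >= len(args)
        and all(isinstance(arg, typ) for arg, typ in zip(args, sig))
    ]


def methodswith(some_type):
    """Every registered method that accepts `some_type` in some argument position."""
    return [
        (name, sig)
        for name, methods in METHOD_TABLE.items()
        for sig, _ in methods
        if any(issubclass(some_type, typ) for typ in sig)
    ]


@method("scale", int, float)
def scale_int_float(x, y):
    return x * y


@method("scale", str, int)
def scale_str_int(s, n):
    return s * n


@method("describe", str)
def describe_str(s):
    return f"text of length {len(s)}"
```

Calling candidates("scale") lists both signatures, candidates("scale", 2) narrows the list to the (int, float) entry, and methodswith(str) finds both scale(str, int) and describe(str).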

Julia stores a lot of metadata with objects and functions. You will be surprised how powerful this system is once you start using it. E.g. you can jump to the definition of a function that was generated on the fly in a for loop. When a function is created, it keeps track of the file and line number where it was created, even if that happened through metaprogramming.


My understanding/experience is that Julia has optional typing. Meaning it's dynamically typed, but supports type annotation that often improves performance and can be used to enforce types (I think).

A lot of Julia code looks statically typed, but if you want to code in a "pure" dynamic style (e.g., for prototyping, or just because it's more convenient or better for whatever reason), you can. Type annotation is seen as increasing the information available to the compiler and the clarity of the specification, but not as a necessity.

This is different from other dynamic languages I'm familiar with where type specification and annotation isn't built in to the same extent, and different from static languages that require type specification all the time.

To me, Julia's approach feels the best of the languages I've used. I've grown to like statically typed languages more over time, but there are some situations where it can create huge headaches (e.g., where the type structures of a library, etc. are poorly organized or unclear).


Actually, type annotations in Julia do not improve performance (and in some pathological cases can even reduce performance). The Julia JIT compiler will always infer the type at compile-time regardless of annotation and will (almost) always produce the optimal code.

The reasons for type annotations are multiple dispatch (multimethods), documentation, deliberately restricting the polymorphism of a function, and the rare times when the compiler is not able to infer the best type.


Interesting. I swear for a long time this (the assertion that type annotations can improve performance) was in the Julia documentation, so much so I stopped reading it. But maybe something changed? There have been a lot of changes.

Then again, maybe I just misunderstood something.


Julia does not have optional typing. Type annotations are used to specify which function implementation applies to specific argument types.

E.g. if I write a function definition as foo(x::Int, y::String), that code is run whenever the first two arguments are an Int and a String. If another definition is specified as foo(x::DateTime, y::Float64), that other definition applies when the arguments are a date and a floating-point value.

Types are not used to improve performance. Rather, they are a necessity to tell Julia what chunk of code should be run for particular types.

Often you use them to narrow down which types a function applies to. E.g. bar(x::Number) means the bar function requires some sort of number. You cannot provide a string. However, the compiler will not warn you if you use a string in a bar call. Instead you get an exception at runtime telling you there is no variant of bar that takes a string as an argument.
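
For comparison, the closest thing in the Python standard library is functools.singledispatch, which dispatches on the type of the first argument only, so it is a loose analogy rather than Julia's multiple dispatch. The sketch below reuses the name bar from the comment above. The explicit TypeError in the fallback is my own choice, standing in for the runtime exception Julia raises when no method matches.

```python
from functools import singledispatch
from numbers import Number


@singledispatch
def bar(x):
    # Fallback: reached only when no registered type matches the argument.
    raise TypeError(f"no method bar({type(x).__name__})")


@bar.register(Number)
def _(x):
    # Chosen for int, float, complex and any other Number subtype.
    return x * 2
```

As in the Julia case, nothing complains when bar("three") is written. The failure only shows up when that call actually runs.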


Multiple dispatch is so helpful when designing programs and interfaces. It’s one of the biggest things I miss from Erlang/Elixir when I’m using JS/Ruby, besides pattern matching and function guards.


JuliaPro IDE apparently has good completion, plus the language might be dynamic, but uses the same approach as Dylan regarding type inference, meaning although dynamic it infers possible types from context.

Plus there are always type annotations as well.


I'm sure it also replicates the near instant compile times.


Obviously not, if you are comparing it to the OCaml bytecode compiler. Doing a full AOT compilation to native code, however, depends very much on how those third-party libraries are delivered and on the optimization levels being used.


A lot of those benefits are not direct properties of static typing. For example, Common Lisp implementations can have a very nice REPL experience like this. With numeric stuff you have probably typed the interface anyway, and your environment can interrogate that. I always liked CL for research code for this reason.


How do you deal with data you haven’t seen before? Importing a large csv for example where a lot of the data is the wrong type or the wrong format.


You can read, say, the top N lines of the file and perform type inference on the column data to determine the type of the column. This is how it's done in the F# "type provider" for CSV: https://fsharp.github.io/FSharp.Data/library/CsvProvider.htm...
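
A minimal sketch of the idea in Python, using only the standard library. This is not the F# type provider's algorithm. The helper names infer_cell and infer_schema, the default of 100 sample rows, and the int/float/str widening order are assumptions chosen for illustration, and the input is assumed to have a header row and well-formed rows.

```python
import csv
import io
from itertools import islice


def infer_cell(text):
    """Return the narrowest of int, float, str that can parse `text`."""
    for caster in (int, float):
        try:
            caster(text)
            return caster
        except (ValueError, TypeError):
            pass
    return str


def infer_schema(csv_file, sample_rows=100):
    """Guess one type per column from the first `sample_rows` data rows."""
    order = [int, float, str]  # widening order: a column only ever gets wider
    reader = csv.DictReader(csv_file)
    schema = {name: int for name in reader.fieldnames}
    for row in islice(reader, sample_rows):
        for name, text in row.items():
            seen = infer_cell(text)
            if order.index(seen) > order.index(schema[name]):
                schema[name] = seen
    return schema


sample = io.StringIO("id,price,label\n1,2.5,a\n2,3,b\n")
schema = infer_schema(sample)
```

Here schema maps id to int, price to float and label to str. Anything past the sampled rows is never looked at, which is exactly where the trouble with later rows of a different type comes from.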


I actually made just that for OCaml: https://github.com/nv-vn/OCamlCSVProvider


Like in any other language, write a parser? What do you mean by the wrong format?


Sorry, I meant type inference. In Python you could import all the data and deal with types later. Things that aren't the expected type can be dealt with individually. It's not pretty, but it's fast and it works.

Have you experienced any problems with static typing in these situations? I appreciate the value of static typing but I’m not sure if it offers substantial benefit when working interactively with data.

“The flexibility of dataframe largely comes from the dynamic typing inherently offered in a language. Due to OCaml’s static type checking, this poses greatest challenges to Owl when I was trying to introduce the similar functionality.”

“To be efficient, Dataframe only takes maximum the first 100 lines in the CSV file for inference. If there are missing values in a column of integer type, it falls back to float value because we can use nan to represent missing values. If the types have been decided based on the first 100 lines, any following lines containing the data of inconsistent type will be dropped.”

http://ocaml.xyz/chapter/dataframe.html
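
The policy in that quote can be sketched in a few lines of Python. This is my reading of the quoted text, not Owl's implementation. The function names parse_cell, infer_kind and load are hypothetical, and rows are assumed to be equal-length lists of strings with the header already removed.

```python
import math


def parse_cell(text, kind):
    """Parse one cell; an empty cell in a float column becomes nan."""
    if kind is float and text == "":
        return math.nan
    return kind(text)  # raises ValueError when the text does not fit


def infer_kind(cells):
    """int if every cell parses as int, float if numeric or empty, else str."""
    for kind in (int, float):
        try:
            for text in cells:
                parse_cell(text, kind)
            return kind
        except ValueError:
            continue
    return str


def load(rows, sample_size=100):
    """Infer column types from the first rows, then drop rows that do not fit."""
    sample = rows[:sample_size]
    kinds = [infer_kind(column) for column in zip(*sample)]
    table = []
    for row in rows:
        try:
            table.append([parse_cell(t, k) for t, k in zip(row, kinds)])
        except ValueError:
            pass  # inconsistent with the inferred types: the row is dropped
    return kinds, table


rows = [["1", "x"], ["", "y"], ["3", "z"], ["oops", "w"]]
kinds, table = load(rows, sample_size=3)
```

The first column has a missing value inside the sample, so it is inferred as float and the gap becomes nan. The fourth row lies outside the sample, does not parse as a float, and is silently dropped.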


> In python you could import all the data and deal with types later.

You can't. You need to know what you are parsing: a number, a complex number, a symbol etc.

> deal with types later

What does that mean? A dynamically typed language is still typed; all expressions have types.

Just like in Python, you can define types in-place with polymorphic variants and objects, so OCaml would infer their types.

    let instant_complex = object method re = 3.14 method im = 0.0 end

would infer

    instant_complex : < re : float; im : float >

> If there are missing values in a column of integer type, it falls back to float value because we can use nan to represent missing values

Yeah, it's soundness vs completeness. If you want to, you could make your field optional, or make it a subtype of an object type that could be None. There is no difference from Python here.

It's just a choice made in favor of soundness and convenience (because using `int option` in case of missing ints would be inconvenient for the most part).

Nothing prohibits a Python-like solution in OCaml:

    class virtual number = ...

    let none : number = object .. end

    class float = object inherit number ... end

etc. Dynamic typing can easily be replicated within any static language; it's just not why people use static languages, which is soundness.


>You can't, you need to know what you are parsing, a number, a complex number, a symbol etc.

pandas (Python) has the upper hand here. A lot, if not most, real world data will have values of the wrong type interspersed. pandas will still let you read in the table and then deal with these problematic values. For example, reading in the data and then dropping all values that don't conform to the type that is expected could likely be done in 2-3 lines.

But the advantage GP may be speaking of is that you can still do a lot of useful stuff with the data even if you leave the bad values in there.

For all its warts, pandas really is amazing.


You should try using R. You're missing out.


In the context of this conversation (dynamic vs static), R is in the same boat as Python.

And I don't know about today, but when I used pandas years ago, it was much faster than R.


R's data.table package is much faster and much nicer to use than pandas.


In Python (pandas) you pass the type specification like this when calling read_csv():

---

dtype : Type name or dict of column -> type, optional

Data type for data or columns. E.g. {'a': np.float64, 'b': np.int32, 'c': 'Int64'}

--- https://pandas.pydata.org/pandas-docs/stable/reference/api/p...

You could of course parse every column as a string and then cast to an appropriate type at run time interactively. For timestamps and such pandas can look at the data to figure out the exact datetime format. Not sure what one loses in OCaml especially if one is working at the repl.
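
A stdlib-only sketch of the "read everything as strings, cast later" workflow, for readers without pandas at hand. The sample data, the column names sensor and reading, and the helper try_float are made up for illustration. With pandas the same idea would go through read_csv and a later conversion step.

```python
import csv
import io

raw = io.StringIO("sensor,reading\na,1.5\nb,n/a\nc,2.0\n")
rows = list(csv.DictReader(raw))  # every value is still a str at this point


def try_float(text):
    """Return (value, None) on success or (None, original text) on failure."""
    try:
        return float(text), None
    except ValueError:
        return None, text


good, bad = [], []
for row in rows:
    value, problem = try_float(row["reading"])
    if problem is None:
        good.append((row["sensor"], value))
    else:
        bad.append((row["sensor"], problem))
```

The load step never fails, and the one unparseable reading ends up in bad, where it can be inspected or fixed by hand.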


A large part of the benefits originates from the OCaml language per se, IMO. Owl inherits these advantages for free. E.g., by taking advantage of the various compiler backends, you can use (almost) the same code for both web backend and frontend if you are a full-stack developer.

From a numerical lib designer's perspective, a holistic design (note that this does not necessarily mean a monolithic lib) leads to a more coherent and consistent design (e.g. of the APIs), which makes a large software system easier to optimise and maintain. As a young and experimental system, Owl seems to be doing quite a decent job so far, IMO.

However, from a (broad) user's perspective, and being very honest, I don't believe any one tool should replace another, however much I love Owl myself. Each tool has its own pros and cons deeply rooted in its internal design, depending on the language it uses and on its design assumptions and goals. I think the choice of tools is really a matter of project requirements, coders' own skills, personal taste, management choice, as well as the context (i.e. what kind of apps you are going to build, for whom and where).

liang


The motivation section of the documentation covers this in detail [1].

From what I understand, the key points are:

1. The author claims that various libraries in Python (Scipy, Pandas, etc.) have a large amount of duplicated code and overlap, if you look deeply in the code. Having one library with a holistic design prevents this overlap and allows for easier optimisation.

2. He further claims that because "numerical computing is built atop of a small amount of abstract number types (e.g. real and complex), then derives a fat set of advanced numerical operations for various fields", unlike systems programming, holism, such as in OWL, is preferable to reductionism.

[1]: http://ocaml.xyz/chapter/intro.html#motivation


For me, as a Scipy/Flask/Postgres user it would be about completing the stack. Python’s great but for core things I’d love to have an ML-inspired alternative that’s even half as convenient. I watch this space with great interest.


A great choice of companion language for your Python stack would be Nim: https://nim-lang.org/ (not necessarily ML-inspired, but extremely productive and extremely fast). Bonus: v1.0 is right around the corner for Nim. If you're willing to venture out to a place with an evolving scientific ecosystem, Nim is a great choice for doing scientific computing as part of a CPython stack (easy integration both ways, since Nim compiles to both C and C++).


I'm a fan of functional languages and believe their implementation choices induce a unique approach to problem solving. Not necessarily better, but different, which is enough on its own to justify such projects. While I prefer the functional approach, I'm well aware that's in large part just personal preference. What's true, however, is that every language makes certain things easier and other things more difficult as a result of what it prioritizes. The territory of functional and typed languages is less visited and worth exploring.



