Sorry, I meant type inference. In Python you can import all the data and deal with the types later; values that aren’t of the expected type can be handled individually. It’s not pretty, but it’s fast and it works.
Have you run into any problems with static typing in these situations? I appreciate the value of static typing, but I’m not sure it offers a substantial benefit when working interactively with data.
“The flexibility of dataframe largely comes from the dynamic typing inherently offered in a language. Due to OCaml’s static type checking, this poses greatest challenges to Owl when I was trying to introduce the similar functionality.”
“To be efficient, Dataframe only takes maximum the first 100 lines in the CSV file for inference.
If there are missing values in a column of integer type, it falls back to float value because we can use nan to represent missing values.
If the types have been decided based on the first 100 lines, any following lines containing the data of inconsistent type will be dropped.”
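The inference policy quoted above can be sketched in plain OCaml. This is a minimal sketch, not Owl's implementation: `col_type`, `column`, `infer_column` and `load_column` are hypothetical names, a missing value is assumed to be an empty (or blank) cell, and only a single column is handled, so a non-conforming cell is dropped rather than its whole row.

```ocaml
(* Hypothetical column types; not Owl's actual representation. *)
type col_type = Int_col | Float_col | String_col

type column =
  | Ints of int array
  | Floats of float array
  | Strings of string array

(* Assumption: a blank cell means "missing". *)
let is_missing s = String.trim s = ""

(* Decide the column type from at most the first [limit] cells. *)
let infer_column ?(limit = 100) (cells : string list) : col_type =
  let sample = List.filteri (fun i _ -> i < limit) cells in
  let present = List.filter (fun s -> not (is_missing s)) sample in
  let has_missing = List.length present < List.length sample in
  let all p = present <> [] && List.for_all p present in
  if all (fun s -> Option.is_some (int_of_string_opt s)) then
    (* Missing ints force a float column so that nan can mark the gaps. *)
    if has_missing then Float_col else Int_col
  else if all (fun s -> Option.is_some (float_of_string_opt s)) then Float_col
  else String_col

(* Parse every cell with the inferred type; cells that don't conform are dropped. *)
let load_column ?limit (cells : string list) : column =
  match infer_column ?limit cells with
  | Int_col -> Ints (Array.of_list (List.filter_map int_of_string_opt cells))
  | Float_col ->
    Floats
      (Array.of_list
         (List.filter_map
            (fun s -> if is_missing s then Some Float.nan else float_of_string_opt s)
            cells))
  | String_col -> Strings (Array.of_list cells)

let () =
  (* With a 3-cell sample, "oops" is never seen during inference and gets dropped. *)
  match load_column ~limit:3 [ "1"; ""; "3"; "oops" ] with
  | Floats a -> Printf.printf "float column with %d values\n" (Array.length a)
  | Ints _ | Strings _ -> print_endline "unexpected column type"
```

The small `limit` in the usage line only makes the "decided on the first N lines, later mismatches dropped" behaviour visible on four cells.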
> In python you could import all the data and deal with types later.
You can't; you need to know what you are parsing: a number, a complex number, a symbol, etc.
> deal with types later
What does that mean? A dynamically typed language is still typed; every expression has a type.
Just like in Python, you can define types in place with polymorphic variants and objects, and OCaml will infer their types.
let instant_complex = object method re = 3.14 method im = 0.0 end
OCaml infers:
instant_complex : < re : float; im : float >
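The polymorphic-variant half of that claim can be sketched the same way: `parse_cell` returns variant tags that are never declared anywhere, and OCaml infers an open variant type listing all four of them. The function and tag names are hypothetical, chosen only for this illustration.

```ocaml
(* No type declaration: the variant tags are introduced in place. *)
let parse_cell s =
  match int_of_string_opt s with
  | Some i -> `Int i
  | None ->
    (match float_of_string_opt s with
     | Some f -> `Float f
     | None -> if String.trim s = "" then `Missing else `Text s)

(* Consumers pattern-match on the inferred tags. *)
let describe = function
  | `Int i -> Printf.sprintf "int %d" i
  | `Float f -> Printf.sprintf "float %g" f
  | `Missing -> "missing"
  | `Text s -> Printf.sprintf "text %S" s

let () =
  List.iter (fun s -> print_endline (describe (parse_cell s))) [ "42"; "3.5"; ""; "n/a" ]
```

You still decide what you are parsing (the order of the checks is the "type inference" here), but no type has to be written down up front.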
> If there are missing values in a column of integer type, it falls back to float value because we can use nan to represent missing values
Yeah, it's soundness vs. completeness. If you want, you can make the field optional, or make the values a subtype of an object type that can be None. There is no difference from Python here.
It's just a choice made in favor of soundness and convenience (using `int option` for missing ints would be inconvenient most of the time).
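The trade-off can be made concrete with a small sketch that loads the same integer column both ways: as `int option`, which keeps the integers exact but makes every consumer handle `None`, and as `float` with `nan` for the gaps, which is the Owl behaviour quoted earlier. The names and the sample column are hypothetical, and any unparsable cell is treated as missing here.

```ocaml
let cells = [ "4"; ""; "6" ]

(* Option 1: exact ints, with missing values explicit in the type. *)
let as_int_options = List.map int_of_string_opt cells

(* Every consumer must decide what None means; here it is skipped. *)
let sum_ints xs =
  List.fold_left (fun acc c -> match c with Some n -> acc + n | None -> acc) 0 xs

(* Option 2: fall back to float and mark the gaps with nan. *)
let as_floats =
  List.map (fun s -> match float_of_string_opt s with Some f -> f | None -> Float.nan) cells

(* Plain float code works unchanged, but nan propagates unless filtered out. *)
let sum_floats xs = List.fold_left ( +. ) 0.0 xs

let () =
  Printf.printf "int option sum: %d\n" (sum_ints as_int_options);
  Printf.printf "naive float sum is nan: %b\n" (Float.is_nan (sum_floats as_floats));
  Printf.printf "float sum without nan: %g\n"
    (sum_floats (List.filter (fun x -> not (Float.is_nan x)) as_floats))
```

The float version is more convenient because existing numeric code accepts it as-is; the option version is the one where the compiler forces you to think about the gaps.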
Nothing prevents a Python-like solution in OCaml:
class virtual number = ...
let none : number = object .. end
class float = object inherit number ... end
etc. Dynamic typing can easily be replicated in any statically typed language; it's just not the reason people choose static languages. That reason is soundness.
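A runnable version of the class sketch above, under some assumptions: the names `cell`, `int_cell`, `float_cell`, `text_cell` and the `as_float` method are hypothetical (the names `number` and `float` are avoided so the built-in `float` type isn't shadowed). Each cell is an object that may or may not convert to a float, much like the values in a Python object column, and the decision is made per value at run time.

```ocaml
(* Hypothetical base class: every cell can print itself and may be numeric. *)
class virtual cell = object
  method virtual to_string : string
  (* Default: not a number, like Python's None or a stray string. *)
  method as_float : float option = None
end

class int_cell (n : int) = object
  inherit cell
  method to_string = string_of_int n
  method! as_float = Some (float_of_int n)
end

class float_cell (x : float) = object
  inherit cell
  method to_string = string_of_float x
  method! as_float = Some x
end

class text_cell (s : string) = object
  inherit cell
  method to_string = s
end

(* An immediate object standing in for Python's None. *)
let none : cell = object
  inherit cell
  method to_string = "None"
end

(* A heterogeneous column, like a pandas object column. *)
let column : cell list =
  [ (new int_cell 1 :> cell); (new float_cell 2.5 :> cell); none;
    (new text_cell "n/a" :> cell) ]

(* Sum whatever is numeric and ignore the rest. *)
let sum_numeric (cells : cell list) =
  List.fold_left
    (fun acc c -> match c#as_float with Some x -> acc +. x | None -> acc)
    0.0 cells

let () =
  Printf.printf "%s -> %g\n"
    (String.concat ", " (List.map (fun c -> c#to_string) column))
    (sum_numeric column)
```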
> You can't, you need to know what you are parsing, a number, a complex number, a symbol etc.
pandas (Python) has the upper hand here. Much, if not most, real-world data has values of the wrong type interspersed. pandas will still let you read in the table and then deal with the problematic values. For example, reading in the data and then dropping every value that doesn't conform to the expected type could likely be done in 2-3 lines.
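For comparison, the "drop what doesn't conform" step is also short in OCaml once a column is held as strings. This sketch assumes the column has already been read into a `string list` (CSV reading itself is out of scope), and `raw_prices` is made-up data.

```ocaml
(* A column as read from a file, before any typing decisions. *)
let raw_prices = [ "10.5"; "n/a"; "12"; ""; "9.75"; "#REF!" ]

(* Keep only the cells that parse as floats; everything else is dropped. *)
let prices = List.filter_map float_of_string_opt raw_prices
let dropped = List.length raw_prices - List.length prices

let () =
  List.iter (Printf.printf "%g\n") prices;
  Printf.printf "dropped %d cells\n" dropped
```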
But the advantage GP may be referring to is that you can still do a lot of useful work with the data even if you leave the bad values in.
You could, of course, parse every column as a string and then cast it to an appropriate type interactively at run time. For timestamps and the like, pandas can inspect the data to work out the exact datetime format. I'm not sure what one loses in OCaml, especially when working at the REPL.
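A sketch of the "parse every column as a string, then cast" approach, assuming the rows are already split into fields (a real CSV parser would also handle quoting). `column` and `cast_column` are hypothetical helpers; `cast_column` keeps bad cells as `Error` values instead of dropping them, so you can work with the good values while the bad ones stay in the data for inspection, roughly the "leave the bad values in" workflow described above.

```ocaml
(* Rows already split into fields; every cell stays a string for now. *)
let rows = [ [ "alice"; "31" ]; [ "bob"; "unknown" ]; [ "carol"; "27" ] ]

(* Extract column [i] as strings. *)
let column i = List.map (fun row -> List.nth row i) rows

(* Cast a string column with any parser, keeping failures as Error. *)
let cast_column (parse : string -> 'a option) (cells : string list) =
  List.map (fun s -> match parse s with Some v -> Ok v | None -> Error s) cells

let ages = cast_column int_of_string_opt (column 1)

(* Work with the good values... *)
let valid_ages = List.filter_map Result.to_option ages

(* ...and inspect the bad ones separately. *)
let bad_cells = List.filter_map (function Error s -> Some s | Ok _ -> None) ages

let () =
  let total = List.fold_left ( + ) 0 valid_ages in
  let mean = float_of_int total /. float_of_int (List.length valid_ages) in
  Printf.printf "mean age %.1f over %d rows; unparsed: %s\n"
    mean (List.length valid_ages) (String.concat ", " bad_cells)
```

At the toplevel, each `cast_column` call can be retried with a different parser until the column's type is settled.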
http://ocaml.xyz/chapter/dataframe.html