I will always wish that Python weren't the lingua franca of sorts for machine learning.
For me, it's always frustrating to have to guess the types of parameters in functions without appropriate documentation, especially considering that I don't know much of the language.
Combined with the fact that static analyzers for Python can very easily get slow (e.g. completions for Tensorflow always take multiple seconds, which IMO is unacceptable), it just doesn't make the language seem like a reasonable choice to me.
I was working on Tensorflow bindings for Dart, which is Java-level statically-typed now, and in porting functions from Python found myself absolutely lost as to what any of the code actually did.
I might be the only one who dislikes Python to this degree, though.
Nah, I'm with you. Trying to do the same in Haxe. There's not a first class language for typed tensor ops. Type information could include dimensionality, and remove a whole bunch of stupid, time consuming errors.
What about Julia? It’s what we’ve been teaching our Stanford ML class with and it’s fast! Also typed and JIT with multiple dispatch. Arrays here don’t have the Numpy/Scipy weirdness of matrices vs. ndarrays and linalg is truly first-class.
This comes from someone who used to despise the language, but it’s truly come a long way.
I'm still amazed that Julia hasn't taken off in six years. It's this great language that solves most of the problems that other (scientific computing) languages have, and hardly anyone uses it. I use it all the time for personal projects, but I use Python at work. Looking forward to the day I can use Julia for everything.
My primary issue with Julia is that it has a relatively high latency in a REPL environment. I and many people I know primarily use REPL environments (e.g. Jupyter Lab) for scientific computing, so this is a pretty relevant use-case. For example, if I start Julia and type
[1 2 3; 4 5 6; 7 8 9] ^ 2
I have to wait about 5 seconds for a response (on a first generation Core i7, SSD). On the other hand, running the following in a fresh Python interpreter is almost instantaneous:
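import numpy as np

# the NumPy equivalent of the Julia expression above
np.linalg.matrix_power([[1, 2, 3], [4, 5, 6], [7, 8, 9]], 2)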
Unfortunately, in most scenarios the actual execution speed (where Julia is far superior) is secondary. People just tend to run larger experiments over night; and as long as you can express your code in terms of numpy matrix operations, Python is fast enough.
Just tried this on my (much slower) Intel m3-6Y30 (microsoft surface) processor and it worked in just over a second. What Julia version are you running? Speed has been improving steadily with new releases.
> Unfortunately, in most scenarios the actual execution speed (where Julia is far superior) is secondary. People just tend to run larger experiments over night;
I don't think there are only two scenarios, one requiring instant feedback and dependent on fast startup time, and the other where programs can be run overnight. There are infinite cases in-between. And crucially what about programs that take several days if not weeks (such as the biomechanical data analysis I do in my work)? Execution speed for me (and many others) is essential and doing this work in Python is a pain (I was using it for the same kind of problems before I switched to Julia).
> For example, if I start Julia and type
[1 2 3; 4 5 6; 7 8 9] ^ 2
Put that in a function and run it twice. The second time will be blazing fast since it's jitted. That's the workaround for REPL/Notebook usage. In my experience with Notebooks, I end up having to rerun the code all the time, so it will only be slow the first time around. And I've had my share of Python code that took 5 seconds or more to complete, every single time.
That taking 5 seconds is very strange. I have an early Core M (mobile laptop chip, much slower than Core i7, which is a desktop chip) and that expression takes 0.7 seconds at a fresh prompt. That's still much worse JIT compilation delay than we'd like it to be, but 5 seconds is either a very bad configuration or perhaps a bit of hyperbole? There are other situations like time-to-first-plot where compile times do cause a really serious delay that is a very real problem—and a top priority to fix.
Tried again this morning after rebooting the computer -- turns out I was low on RAM yesterday evening. After starting Julia a few times to make sure it is cached I get the following results:
time julia -e '[1 2 3; 4 5 6; 7 8 9] ^ 2'
real 0m1.629s
And for Python/Numpy
time python -c 'import numpy as np; print(np.linalg.matrix_power([[1, 2, 3], [4, 5, 6], [7, 8, 9]], 2))'
real 0m0.103s
Edit: Julia Version is 0.6.3, installed directly from the Fedora 28 repositories.
Note that the majority of that time is just loading Perl 6.
time perl6 -e 'Nil'
real 0m0.156s
Perhaps someone could create a slang module for Julia in Perl 6, as that would be a fairly easy way to improve its speed.
(Assuming Julia is easy to parse and doesn't have many features that aren't already in Perl 6)
It's not a great general purpose language the way Python is. Neither is Matlab, so that's okay if your competition is Matlab, but not if the competition is Python, C++, etc.
Meaning the Python standard library and common libraries are geared toward general purpose more than Julia.
I'm not sure that the Julia language itself is lacking in any general sense. It's just more geared toward scientific computing, but there's nothing about the language making it that hard to write general code. It's not R.
Right, if you stick strictly to use cases covered by NumPy, Pandas, Matplotlib etc, there are better options than Python. But many real programs need other things too, and that’s why they start in and stick to Python regardless.
Glad you're enjoying it! "Hardly anyone uses it" isn't really accurate: Julia's in the top 50 languages on the Tiobe index (between Rust and VBScript this month) [1] and in the top 30 of the IEEE Spectrum language rankings (IIRC, paywall) [2]. I'd say that's quite the opposite of "not taking off". Anecdotally, there are a lot of people on StackOverflow giving excellent, accurate answers these days and I have no idea who they are, which feels like a significant place for a language to get. Getting all the way to the top will take a bit of time :)
Honestly, just wanted to give a huge thanks for such an awesome language! I’ve truly been converted as a huge Python person into Julia and it’s been slowly taking over my research workflow since it’s just so fast and actually fun!
My apologies for the wording. I should have just said I wish it was used more in industry nowadays. And thanks for creating Julia btw! It's been a very enjoyable and productive language to program in.
After the mess that was writing a numerical library that interfaced with scipy but used numpy arrays, I’m actually slowly porting most of my daily workflow into Julia. It’s just gotten faster and the code is much easier to read.
As Haskell begins to get more and more dependently typed features this could definitely be an exciting possibility (I think there are folks already working on this).
I have been looking to get into C++ with CNTK actually... I think this might be the next big thing. People don’t realise how close to a high-level language C++ is nowadays with C++14 compliance in every major compiler
In fact I am sure that advocates of Go, Rust et al are really comparing them to C++98 and would be very pleasantly surprised by C++14
I love C++ but it is hard to deny the vast workflow improvements that come with languages that utilize a REPL, like Python or Julia or Matlab/Octave. Being able to poke at your code or data and experiment without a compile/run/debug cycle is a huge advantage to productivity.
A few advocates of safer systems programming languages, regardless of which ones, happen to use C++, are up to date with C++17 and follow up on ISO C++'s work.
Thing is, no matter how much better C++ gets, preventing some groups from using it as "C compiled with a C++ compiler" is tilting at windmills.
Python is my preferred language to work in, and has been for many years.
It's not the only language I've used, though, and I've had basically the inverse of your problem from time to time: people who think running an API-documentation generator over their statically-typed code, so that it just spits out a big list of functions with their return types, parameter names and parameter types, is "documentation".
Now, it may be you come from a world where that's the standard way to do things and you've gotten used to it and mostly don't notice just how much further research and digging it takes to figure out from "Frobnitzable bunglify(Widget[] w, int c)" what it is exactly that it, um, does. But it's hard to knock people for not writing documentation when advocating for a language feature that encourages people not to write documentation!
My experience is that it's easy to fail to produce useful documentation in just about any programming language, and that languages' choice of type systems does not correlate significantly with the quality of documentation of key libraries and tools.
I think there's a fair bit to be said for API discoverability. I was working with the TypeScript compiler API earlier this year, which is an excellent piece of software but not particularly well documented, and I was able to discover a fair bit of what I needed through Intellisense and well-named functions.
Python is also fairly good in this regard, though not quite as convenient, with the help() function (it's been a long time since I saw one of these but I do recall a handful of libraries breaking help with "clever" metaprogramming).
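For example, something along these lines in a REPL (json here is just an arbitrary stdlib module for illustration):

import json

help(json.dumps)  # prints the signature plus the docstring right in the REPL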
Ceteris paribus, I'd rather have the nice completions.
My point is mostly that real, good, documentation is a lot of effort no matter the language or its type system, and that once you've worked in a language long enough you internalize whatever processes you use to compensate for common shortcomings in that language's documentation. Which then means it's often not a fair comparison to be thrown into a new language, requiring new processes to compensate, that you aren't used to and so notice more.
I used to point out the same thing way back in the day when people argued that tables were "easier" for web page layout than CSS, because of all the hacks involved in doing CSS layout 15-ish years ago. It wasn't actually that tables didn't require hacks, it was that people had been doing tables long enough they'd forgotten how much of it consisted of hacks, and so the new and unfamiliar stuff they had to learn to make CSS work only seemed more complicated.
Well-designed APIs can be self-documenting to a significant extent with the right choice of names (and things they name, as well). But it takes as much if not more effort to do this right, as it does to write quality docs.
There's always Go - Gorgonia[0] aims to be a Tensorflow/PyTorch replacement. Performance is somewhat comparable (somewhat because it's always in flux as I keep updating it). Heck, it even uses Hindley-Milner type inference (not that it's particularly useful)[1]
I totally agree, it's kind of depressing. With deep learning we had an opportunity to pick a new language, because people would have used whatever language was necessary to get what tf (& theano) provide. If Google had written tf in Julia, Swift, Nim, etc., our whole world could be different.
I also strongly dislike python, but I've come to accept that it has its niche and that my main issue (beyond the parts I believe are design flaws) is that its correct niche is far smaller than the domain it's actually used in.
It's the bash/perl of the 90s, and not the language you should be writing your deliverable in. Unfortunately, python happens to work well for getting something functioning, and turning that into something that functions well gets called "maintenance". I've given up that fight and now I'll just stay in python land for as long as it takes to isolate it into a separate process and then go and write the rest of my code in something I prefer.
And so you know f(a, b, c) accepts floats, but what good is that if you don't actually know what a, b and c are? Which is the reason I hate the term "type safe". It's type checked, but the behavior may or may not be safe.
The most appropriate languages for ML should really be functional first, statically typed languages like OCaml or F# or maybe Scala.
The problem might be that many people doing ML don't really know how to program beyond a basic level, and at first glance a language like Python seems "easier". But they get by anyway since the difficult part of ML is often not really the programming as such, and if your program is just a 100 line script calling Tensorflow then any language can work, however annoying it may be.
Things like TensorFlow for Swift are probably partially motivated by the same problem you mention.
Swift overall seems like a good possible static language for ML, but gaining popularity is a slow thing. Actually, I can't really see many alternative languages besides it that could gain popularity, except maybe Julia.
> I will always wish that Python weren't the lingua franca of sorts for machine learning.
Absolutely ... I feel like the ML world is caught in a strong local minimum with Python: it gets people on board due to the exploratory ease, but then the effort of learning a different system once they already understand things in that ecosystem is just too great. Unfortunately it is highly suboptimal for writing highly structured, complex code.
You should have a look at DL4J and ND4J if these things frustrate you. Particularly with a dynamic JVM language like Groovy they give you something close to the best of both worlds.
Jedi in Vim, PyCharm/IntelliJ, and whatever tool VSCode uses (which I think is Jedi).
Same experience everywhere. Once I type "tf" - a hang for multiple seconds, presumably while it loads the entire Tensorflow library. The completion then works smoothly, until I hit space or otherwise exit the completion.
Once I type "tf" again, the cycle repeats, and I have to wait multiple seconds.
You'd think the tool might cache libraries instead of reloading them every time. Seems like it doesn't at all.
You need YouCompleteMe. It uses Jedi internally, but the completion is async, i.e. the UI is not blocked. It also caches completions, so you should get completion candidates reasonably fast the second time you input tf.
IntelliJ completion should index and cache and it's always async. I just tried it with the current one (and current tf) - the response is not instant (tf has a ridiculous number of toplevel definitions) but it's much faster than 2 seconds.
I'm a fan of this multi-language trend of gradual and optional typing. Static typing has definite benefits, but there are also diminishing returns on productivity as a language becomes more and more statically safe. Oftentimes, type safety is a rabbit hole that can easily turn a language into what some call a "puzzle language," where the effort to please the compiler becomes non-intuitive and takes a long time to figure out. Some of the classes in Swift for example have to adopt a hundred protocols just to compile. Rust is an even more extreme example. You might get extra safety, but at significant development cost.
Gradual typing in Python is a happy medium that lets the developer decide (to some degree) on the workflow up front. It's a nice compromise for a wide range of work.
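A minimal sketch of what that workflow looks like, with a made-up function: start untyped, then layer hints on later without touching the runtime behaviour.

# prototype stage: no annotations, just get something working
def discount(price, rate):
    return price * (1 - rate)

# later: the same function with hints layered on for mypy / the IDE;
# the runtime behaviour is identical
def discount_typed(price: float, rate: float) -> float:
    return price * (1 - rate)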
> Rust is an even more extreme example. You might get extra safety, but at significant development cost.
This is not the indisputable fact that you seem to be portraying this as. I've written similar amounts of Rust and Python, and it is much much easier and faster for me to write robust software in Rust than in Python. Python might let me get to a prototype more quickly in some cases, but the amount of time spent on maintenance after the fact puts Rust very clearly in the lead in my experience.
There's no doubt that Rust's learning curve is much much steeper than Python's, but this is a very different thing than saying something general about overall development cost.
Do you not find that lack of a REPL-oriented workflow inhibits productivity? I sure do. I love C++ and don't use Rust beyond hobby interest, but being able to do things interactively is definitely an advantage in languages like Python.
The nice thing about gradual typing is if you get to a proof-of-concept, you can then layer in types for static analyzers to make it more robust. Sure it's not Rust-level safety, but it works for very demanding projects. Heck, all of Instagram is built on Python, and they use the static type analysis in their workflow just like this.
It's just a compromise everyone needs to find on their own or for a specific project.
You're kind of missing my point. I'm specifically not talking about gradual typing. I'm calling your claims about development cost into question.
I'm personally not a huge fan of gradual typing. I've tried it, and it didn't work well for both technical and social reasons. But I'm not here to argue that point.
>Rust is an even more extreme example. You might get extra safety, but at significant development cost.
I don't buy this argument. If you want your program to do what you intend you have to know the types anyway. I've never spent significant time on finding the right type, even in Rust.
I have spent significant time on incorrect code, because of functions that accepted inputs with the wrong type and didn't complain, because the wrong type had the same method.
I have spent significant time on looking up what a function can do, because the IDE can't figure out what my parameter is.
I have spent significant time on programs that ran into runtime problems every once in a blue moon that would have been easily detected by a good type system at compile time.
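A tiny made-up example of that first failure mode (wrong type, same methods, no complaint anywhere):

def total_length(items):
    # meant for a list of strings, but a single string also iterates happily
    return sum(len(x) for x in items)

print(total_length(["hello", "world"]))  # 10, as intended
print(total_length("hello"))             # 5: wrong input, yet no error anywhere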
Articles like this just show me that every language that needs to be maintainable for large projects will turn into a strongly, statically typed one at some point. Whether it's type hints or TypeScript.
But static typing != static typing. There are many different degrees of type safety, some much more strict than others. It is often about more than just "finding the right type," as you say. How much does the compiler require features of a type to be implemented, even if they are not used, in order for it to be a certain type, or just use one featureset of a protocol/trait that it adopts? You can easily write types that eventually require a large amount of boilerplate in some languages to just do something fairly basic that is not as complex as what the compiler asks you to do.
The problem with Python is that the default is untyped. I want gradual static typing for the code I write, but that should be on top of a base of already well-typed libraries. It's one reason I like Groovy - it's a language that lets you optionally and incrementally type your code, but it sits within an ecosystem of mostly statically typed languages, so you aren't fighting a massive legacy of culture and code that just doesn't care about declaring types.
Although Apache Groovy sits within an ecosystem of statically typed languages, Groovy itself has a long history of dynamic typing only. Its static typing was only introduced in version 2, and even then people don't use it very much -- Groovy's primary use is as a DSL specifying Gradle build scripts. Legacy Groovy code is typically littered with `def` and `{a, b-> ...}` everywhere.
Absolutely, most groovy code is written in the dynamic style. But it's still way better off than Python, where that is true AND it's true for all the libraries it uses. With Python you can add as many type hints as you want, you are still going to get stuck at the first call that reaches outside your own code where nobody cared about types. With Groovy that call is going into Java or typed GDK code 95% of the time (unless you are talking about Grails, which I'm not a big fan of for that reason), and hence you have the benefit of 95% of the code already being typed by default.
I think you're vastly underestimating what makes a script language "scripty" - not having to declare types is only one factor and probably not the most important.
Type annotations more completely describe a program. This can be useful if your code-base's requirements are subject to rapid change (e.g. websites). But when your program is something which has been well thought-out from the start, and thus whose requirements don't change during the course of development, I trust a program which has been statically-typed to not develop bugs over time. case in point: /usr/bin/git
One key takeaway I had is that Python is closing in on its own “Perl 6” moment. It seems to me (and this is just my opinion) that as I’m adding type hints, the complexity has gone up considerably. (Don’t think about the simple syntax; rather, think about all the gotchas and edge cases that are going to cause productivity issues for developers in larger codebases.)
I really wonder (and have for a while - only now even more so), if Python has gone about it the wrong way.
I don’t develop often anymore because it’s no longer my job, but really do love it when I get to spend time on it. And python, for many reasons, is my go to language (sorry for the pun).
But now I’m starting to look at Swift and Rust, figuring if I’m going to go thru these headaches, I might as well get a more modern language that runs fast as blazes as a nice side benefit.
I’d love some guidance from anyone who’s further down the road with this issue.
Type hints are almost always a crutch or fix for bad design/unclear data flow/`None` abuse.
The idealist in me wishes type hints had never been introduced and we would just write obvious, beautiful code. The pragmatist/realist is happy because type hints are a gradual improvement. The pessimist is saying that the bad devs who introduce awful data flow and abuse None also won't bother using type hints/getting them right, so why bother.
> Type hints are almost always a crutch or fix for bad design/unclear data flow/`None` abuse.
If you've ever worked on a Python code base that you didn't design and write from the ground up, you would know that these are inevitable, and in many cases encouraged by the language. Even if you did design and write it from the ground up, and even if you meticulously document every class and helper method, the reduction in cognitive load is tremendous.
And if you are writing a library or API that other people will use, it's an extra layer of documentation convenience. Why force people to peruse your 200 word docstring when they could just see the signature pop up in their IDE?
edit -
Consider this example:
from typing import NamedTuple, Sequence, Tuple

Country = NamedTuple('Country', [
    ('iso2', str),
    ('iso3', str),
    ('full_name', str),
    ('short_name', str),
    ('other_names', Sequence[str]),
])

class Person:
    __slots__ = ('age', 'name', 'birth_country')
    age: int
    name: str
    birth_country: Country

    def force_standardize_name(self) -> Tuple[str, str]:
        """Attempt to coerce a name into standardized First, Last form."""
        ...
Maybe my design is bad, but just having all that information written right in the source code makes it so much easier for me to work. Not to mention, I can actually run the Mypy static checker on this and verify that my code does what I think it does.
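e.g. a deliberately wrong assignment (made-up value) that mypy flags but plain Python happily executes:

p = Person()
p.age = "thirty-five"  # mypy: incompatible types (str vs int); CPython itself won't complain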
> If you've ever worked on a Python code base that you didn't design and write from the ground up, you would know that these are inevitable, and in many cases encouraged by the language.
I have. Maybe it is "inevitable", but saying Python encourages this is a stretch. Those things happen in other languages, it's just called `null` abuse instead.
> Not to mention, I can actually run the Mypy static checker on this and verify that my code does what I think it does.
Until it doesn't, e.g. at runtime. Type hints are a nice tool, but are not a panacea. They are not the same as variable types. This is probably the most dangerous thing about them. If you want that, Python is not the best choice, type hints don't change that.
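A small illustration of that gap, with a made-up function:

def double(x: int) -> int:
    return x * 2

print(double("ha"))  # prints "haha"; the hint is ignored at runtime, only a checker would object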
I think Swift might be closer to what you want than Rust.
I like both, but Rust and Swift are really for two very different groups and solve different problems. Rust is for the C/C++/assembly programmers who would never even think of using Java or Python. Swift is a modern Java.
If you're coming from Python, I think Swift is closer to what you want since Rust will probably seem more complex for reasons that don't have benefit for the Python-style use case.
The only thing in which Swift relates to Java is syntax. Apart from that, they couldn't be more different.
And if you start digging deeper, you get to see that Swift is a very complex language, too. It's easy to get started, probably easier than Rust, but it has its own kind of quirks.
Swift does have a lot of syntax, but it's quite well designed, and that contributes to the reference guide being organised in a useful manner. So that complexity is more manageable than in C++ or Rust.
Once they can remove lots of the ObjC compatibility parts, it should get better too.
On one hand it's nice that there's a standard way to specify types in Python now, but TBH, if I'm going through the trouble of specifying types then I'm going to use a language that has them built in and fully exploits them. In particular, not using them for performance tuning seems like a big miss.
Agree. At that point it's just better to go back to Java or <insert your favorite statically typed language here>. Java has a broad solution to type safety and a standard doc format for code. Sure there are awkward corner cases around things like type erasure but I find Java for the most part just works.
Disagree, performance is a side-effect of type safety... its main use is to prevent bugs and help devs reading code. Also, we use Python for the library ecosystem, easy-to-read syntax etc., which Java will probably never match. For performance, use Go or Rust and get everything Java has plus an active modern community.
Performance and type safety are absolutely linked. The only major reason JS is still 5-10x slower than Java is looseness of types preventing optimizations.
The Java community is still vibrant and much larger than Go and Rust put together. Modern stuff like RxJava, Lombok, MapStruct, streams, Guava, Gson, Retrofit, DropWizard, Vert.x, etc... Make Java a breeze these days. The learning curve is just steep.
And the Java ecosystem is easily larger than Python. Python may be nearly as popular now, but Java has been in the top three for nearly two decades.
A lot of shops use "vanilla Java" out of ignorance, without modern libraries, and it's crufty as hell. It's similar to shops still hacking on jQuery vs. those that have awakened and use React and TypeScript - practically a different language.
"modern." I hear that word a lot. What are the concrete benefits of those languages beyond subjective properties like readability? COBOL is readable if you work with it regularly.
Just as an example the Golang scheme for managing package dependencies still leaves much to be desired compared to the maven ecosystem commonly used in Java projects.
> Just as an example the Golang scheme for managing package dependencies still leaves much to be desired compared to the maven ecosystem commonly used in Java projects.
"One of the main selling points for Python is that it is dynamically-typed."
This is actually quite a surprise to me, I am using python despite this as I work as an analyst and many libraries are written for python. I've never seen any advantages of dynamic typing, in fact I tend to avoid languages with it if I can.
Are there any valid reasons to have it in a language?
Most typesystems are limited in what they can express, and as they get more complicated in what they can express they tend to get more complicated to understand.
A typesystem is a limitation you place on your code which you accept in the hope that the things which it proves true (problems it avoids) are worth the limitations that you've accepted. You can set everything to Object, but the culture that develops around statically typed code tends to mean that people are really afraid to ever do this, for fear of being ostracised.
From a data moving point of view there's a couple of examples I hit in production where a typesystem isn't helping me =>
* DataFrames tend to gain/lose columns on every operation, so to represent them in types I'd need a zillion types with code duplication or an interface for each field name. Some operations aren't even possible to type like pivot_table. So tons of my code is really not helped by typing (see the sketch just after these bullet points)
* Everything turns into a DAG of data flows, you can do this with typing but the overhead would mean that no one would come up with something like mdf (I don't use mdf in prod but it's impressive for specific applications)
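To make the first point concrete, a sketch (column names made up) of how the column set drifts out of reach of the annotation:

import pandas as pd

def add_total(df: pd.DataFrame) -> pd.DataFrame:
    # the annotation only says "a DataFrame"; the column gained here is invisible to any checker
    out = df.copy()
    out["total"] = out.sum(axis=1)
    return out

prices = pd.DataFrame({"q1": [1, 2], "q2": [3, 4]})
print(add_total(prices).columns.tolist())  # ['q1', 'q2', 'total']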
If you want a better defense of dynamism google Rich Hickey, a point he made:
There are many choices to be made for typesystems: you can have generics or not, higher-kinded types, dependent typing. Maybe it's better not to enforce these choices right at the root language level and allow them to happen at a higher level (be plugged in).
> DataFrames tend to gain/lose columns on every operation, so to represent them in types I'd need a zillion types with code duplication or an interface for each field name. Some operations aren't even possible to type like pivot_table.
For pivoting? I don't know about dependent types; sometimes it's possible in theory to know the type of the pivot columns, because you know the possible combinations of column fields that you're pivoting from, but is that dependent-typable?
I think there's a kind of Motte and Bailey that goes on with people who think typesystems are the one true god. They choose the simplest typesystem as their example of the cognitive/learning overhead (say, C#?).
But if you question the limitations, they go to "but it's possible in X obscure typesystem".
But that's exactly to the OP's point: You need to think about this. You need to figure it out beforehand. In Python, you can express this without that mental overhead--particularly while prototyping.
I think the debate is largely going in favor of static type checking. The widespread adoption of type inference in many statically typed languages (var in Java, auto in C++, let in Rust) has greatly eased the ergonomics of using a statically typed language.
Unless there is a massive ecosystem that you need to take advantage of (for example javascript), in 2018 greenfield development should probably be in a statically typed language.
My dream back in 2008 or so was a language with smart enough type inference and probably some kind of gradual typing that allowed you to bridge the gap from static and dynamic typing -- dynamic for the prototyping, REPL, quick scripts and very high level code, but as you go down the layers into lower-level, more performance-sensitive code you could add more and more typing and machine-specific hacks.
Type inference might get one halfway there, but it would work best at the call site -- the type of variables could be inferred from return values and from literals and so on. But you'd still want to define a function as basically returning "Any". This means a caller wouldn't have a type for the return value, and so if you want to do "fileName := funcReturningAny(); openFile(fileName)" and openFile() was defined only for a string argument, the compiler would have to generate a runtime check at the call site to assert the type of fileName as being a string. Thus the call site is penalized for being typeless, but the statically typed code wouldn't be.
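In today's Python you could approximate that inserted check by hand, something like this (hypothetical functions):

from typing import Any

def func_returning_any() -> Any:        # the loose, dynamic layer
    return "data.txt"

def open_file(file_name: str) -> None:  # the tightened, typed layer
    print("would open", file_name)

file_name = func_returning_any()
# the guard a compiler could insert automatically at the untyped call site:
if not isinstance(file_name, str):
    raise TypeError("open_file expects a str")
open_file(file_name)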
I suspect this would work pretty well. You can start out with no types, and you'd be able to add types as you go, slowly tightening the screws on your program. Generics, traits and other modern abstractions would still be available, though some of those features would start to become unavailable or awkward to use with any code not sufficiently tightened with types.
I don't know of any language that does anything like this, although I suppose TypeScript is pretty close.
Sounds like what Perl 6 does :) You can add optional type signatures, and you get compile-time checks disallowing impossible usages of these:
> sub takes-string(Str $foo) { }
sub takes-string (Str $foo) { #`(Sub|94327719210888) ... }
> takes-string(5); say "Hi!"
===SORRY!=== Error while compiling:
Calling takes-string(Int) will never work with declared signature (Str $foo)
------> <BOL>⏏takes-string(5); say "Hi!"
Basically have already lived this dream with Haxe. However the cross-platform tooling side of Haxe has historically been the fly in the ointment since it sits above the "real runtime" of whatever you targeted, creating all sorts of bits of friction. And it doesn't cross over into "systems" stuff - it's always within a GC'd environment.
Pretty much nobody does this, but you certainly could. I'm not sure that all that DLR work that was put in gets a ton of real usage, particularly with how C# has grown language features since then.
Agreed. The majority are beginning to realize that strong type systems are actually helpful instead of a mandatory annoyance.
Most programming I've done in the past has been with python (no type hinting) or languages with weak static type systems (java et al). Having to verbosely specify the types in java was always just painful, while the upfront speed and freedom of python felt much nicer. However after getting a taste of better type systems by dabbling in rust and elm, now every time I write something in python I'm thinking "was argument a string or a foo class?" or "that runtime error would not have happened if the type checker caught it".
Python type hints now are a huge plus for the language. It still seems a bit "hacky" though. I'd love to see something that adds more syntax sugar and feels more built in. Maybe it's just me.
I would like to be able to specify types at integration points (module level, function level, etc.).
Function arguments do benefit from clear inputs and outputs (is this type X? a list of X? A list of anything? Is it returning Y? List of Y? Nothing at all? An error or nothing? Something that is just like X but with an extra field?)
Inside the function? Leave me alone. You are a computer, go figure it out. Which seems to be the direction we are heading in.
> was argument a string or a foo class
Oh, it was supposed to be a string. However, it is a string that came from the UI, so it is untrusted, so your SQL function should refuse to use it. But hey, feel free to lowercase it.
The main problem is that usually type systems are not very rich, so the baby gets thrown out with the bathwater.
Also, runtime errors are ok if they happen in your test cases. You do have them, right? :)
> Also, runtime errors are ok if they happen in your test cases.
Yes, but if some errors can be caught sooner by your linter/IDE/typechecker then it speeds up development. Would you rather be notified of a type mismatch immediately after saving the file or later when you run the test suite?
Yeah, my only experience with static typing early on was C and Java, and it turned me off the whole thing. I could understand the advantage in C for the whole close-to-the-machine thing, but Java's verbosity and poor expressiveness while allowing NPEs everywhere felt really silly.
The lesson is rather that optional static typing is useful. The programmer should be able to choose based on what makes sense for the task. Strictly statically typed languages preclude the option of going dynamic.
>Strictly statically typed languages preclude the option of going dynamic.
No they don't; even in C it's possible to just type everything as void*, and everything as Object in Java. Just nobody does this 'cos it introduces a bunch of unnecessary runtime errors.
To call a method on a void* or a Java Object you first have to explicitly cast it to a type that has that method, which means that you need to declare a common interface and have each object explicitly implement that interface. Duck typing in dynamic languages doesn't require that level of ceremony.
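i.e. something like this just works in Python, with no shared interface declared anywhere (toy classes):

class Duck:
    def quack(self):
        return "quack"

class Robot:
    def quack(self):
        return "beep"

def make_noise(thing):  # anything with a .quack() method will do
    return thing.quack()

print(make_noise(Duck()), make_noise(Robot()))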
This is the lesson I've learned in Clojure. We have varying levels of data shape specification, contracts, and static type checking available a-la-carte when you want it but not getting in your way when you don't.
I agree, and yet dynamic languages such as Python continue to gain popularity because they are supposedly easier for newbies to pick up. I think we (experienced software developers) have to do a better job of educating newcomers about the dangers of dynamic typing.
Why do you think static languages are easier to pick up? I have had the opposite experience. In managing people who do not aspire to be software developers, but do need to write simple programs to perform basic job functions, I’ve seen much more openness to Python than to more structured alternatives. If you have suggestions that set the bar for working code lower than we can get with Python then I would love to hear them.
I think Python is perceived as easy to pick up because it's a scripting language, not because it's dynamically typed. I wish I could recommend a statically-typed scripting language that's easy to pick up, but I'm not familiar with any mainstream ones (and I'm probably too far from the newbie stage myself to be a good judge).
It's not even the dynamic typing that makes Python easy to pick up. It's readable core language, and a well-documented and reasonably thorough standard library that doesn't ignore things that are useful specifically for teaching people (like turtle graphics - I've yet to see a better way to teach loops, recursion etc than getting someone to draw stars and spirals with a turtle).
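e.g. the classic first loop, a five-pointed star, is only a few lines with the stdlib turtle module:

import turtle

t = turtle.Turtle()
for _ in range(5):
    t.forward(100)  # one edge of the star
    t.right(144)    # turn by the star's exterior angle
turtle.done()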
The more TypeScript I write, the less I like Python. In a recent project that involved pulling some data from Hive via Pandas dataframes, merging it with data from a JSON REST API, doing some calculations, and stuffing the result into a different JSON REST API (with a totally different shape to the data), I kept running into little edge cases that Python forces you to handle explicitly vs. Javascript just doing what you want. It also annoys me to an irrational degree that Python has "dicts" and not objects, "lists" and not arrays, KeyError everywhere, errors when looking up dict keys as ints when the keys are strings (there's no implicit type conversion), that you can't say "dict.keyname" and have to do dict['keyname'] or usually dict.get('keyname'), and that handling things like dicts within dicts where you're not sure if keys exist is even more verbose, etc.
example of something annoying that came up:
foo = [1,2,3]
bar = {'1': 1, '2': 2, '3': 3}
for i in foo:
    print(bar.get(i))
What does it output? None x3. In order to get it to do the thing, I did something like this:
foo = [1,2,3]
bar = {'1': 1, '2': 2, '3': 3}
for i in foo:
    str_i = str(i)  # even more fun when this throws
    print(bar.get(str_i))
I use python because most of my coworkers understand it and it has wide library + runtime support where I work, but it's definitely not my favorite thing to work with. Dynamic typing where types still matter and where the language is picky about implicit type conversions kind of sucks.
I also couldn't do:
foo = [1,2,3]
bar = {'1': 1, '2': 2, '3': 3}
for str(i) in foo:
    print(bar.get(i))
Which was also eyeroll. Like, I get it, but...c'mon. Working with Pandas is also more or less learning another language, with the insanity that is df.T.whatever and df.loc[]:
df.loc[df['shield'] > 6, ['max_speed']]
I'm sure that someone will comment about how I'm an idiot and there's a much better way to do it, but given PEP20:
There should be one-- and preferably only one --obvious way to
do it. Although that way may not be obvious at first unless
you're Dutch.
So, in a discussion where most complaints about Python are that it is dynamically typed (vs. statically typed, e.g. Java, Go, Rust), you're complaining that it is strongly typed (vs. weakly typed, e.g. Javascript)?
I don't have a problem with that - it just seemed kinda incongruous in that context. I imagine the other critics would be even more unhappy about implicit type conversions.
I'm not familiar with Pandas but I'm guessing that its API is similar in style to numpy just by looking at the code snippet. I can't rule out that you might be an idiot, but it is not because you find that line of code confusing. The numpy API is horrific. Its abuse of operator overloading and undermining of expectations is shockingly awful. They should have just made a separate query language.
I could write an "obvious" fix for your example, but I expect that it'd be unsatisfactory, since the real problem likely has a different solution.
It looks like you might want a little utility to transform key lookups to a standard. Web frameworks do stuff like that for mapping URLs to functions, to ignore case, etc.
You might be interested in learning about the ``dict.__missing__`` magic method.
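Roughly along these lines (a sketch of a dict subclass that falls back to the str form of a key):

class StrKeyDict(dict):
    def __missing__(self, key):
        # only called by [] lookups when the key isn't present
        if isinstance(key, str):
            raise KeyError(key)
        return self[str(key)]

bar = StrKeyDict({'1': 1, '2': 2, '3': 3})
print(bar[1])    # 1: the int key falls back to '1'
print(bar['2'])  # 2

(Note that plain .get() bypasses __missing__, so you'd reach for [] or write your own get.)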
No, the problem you had was not because of Python but because of JavaScript: JavaScript's objects only let you use strings as object keys, and JSON inherited that. It would have been more sensible for bar to come from the API like this:
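bar = {1: 1, 2: 2, 3: 3}  # presumably: integer keys, matching foo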
> Unless there is a massive ecosystem that you need to take advantage of (for example javascript), in 2018 greenfield development should probably be in a statically typed language.
I find this sort of generalizations off-putting. It is all about trade offs.
Types can be very useful. There are whole categories of errors that are avoided by types.
And yet, in many cases they are a productivity drain. Time is often better spent writing test cases instead of chasing down type issues (more so if a language has a 'generics' or C++ templates concept), so that approach got favored in many fields. These will catch type errors, and static languages also need tests, so the gains are not so easily quantifiable.
It is not just a matter of ergonomics, it is a matter of getting out of the way of the problem domain. Most people are paid to solve actual issues, not to craft convoluted C++ templates, custom types or deep type chains.
Type inference helps, but even Haskellers will often avoid it when defining functions, so that they will be explicit. But Haskell has incredibly powerful constructs. I was amazed that you can actually tell the compiler you are writing a recursive data structure[1], which is something that most languages won't allow you to do – the most they will allow you is to define a reference to the same type of object (or 'struct'), and you have to stitch that together at runtime. Simple cases are also simple. A function that gets a list and returns an element of that list? [a] -> a, I don't care what a is, nor does the compiler. But feed it something that is not a list, and it will complain.
A language should let you easily define new types, and allow these to be used without friction by standard functions. For instance: if I have an integer, is this integer describing a temperature value? If so, functions that work with speed should refuse to operate with this data. Standard mathematical functions _should not care_ and should do whatever operations we request(without any silly type casting, I told the compiler this thing is a number, use it). The compiler optimizer should also use this information to generate appropriate assembly instructions.
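In Python-land the closest sketch I can think of is typing.NewType, which gets part of the way there (made-up units):

from typing import NewType

Celsius = NewType("Celsius", float)
MetersPerSecond = NewType("MetersPerSecond", float)

def braking_distance(speed: MetersPerSecond) -> float:
    return speed * speed / 20.0  # ordinary arithmetic still just works

outside = Celsius(21.5)
braking_distance(outside)  # a checker flags this: a temperature is not a speed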
So yes, if you have an expressive type system, types can be a joy (you'll find yourself thanking the compiler often). If instead your time is spent trying to figure out why IEnumerable<Something> cannot be converted to IEnumerable<SomethingElse>, even though they are subclasses and even though the same operations work with arrays, then types are getting in the way of solving real problems[2][3].
And even with an expressive type system, there is a non trivial cognitive cost. Sometimes you just want to write (map (lambda (x) (do-something x)) lst) and not really care what it is that you are manipulating. On the other hand, I personally do prefer using Golang to talk to AWS apis (instead of python or javascript), precisely because of their deeply nested type hierarchies.
Being able to sprinkle type annotations where it is useful and have the compiler figure out as much as possible otherwise is a good compromise, which is the approach that the Python article talks about.
I would read this as not favoring mandatory static type checking in 2018 but, instead, using a hybrid mechanism that benefits from both approaches.
> Time is often better spent writing test cases instead of chasing down type issues (more so if a language has a 'generics' or C++ templates concept), so that approach got favored in many fields. These will catch type errors, and static languages also need tests, so the gains are not so easily quantifiable.
Assuming an equivalent program written in a statically-checked language and a dynamically-checked language, if your program has type errors at compile time, it will also have type errors at runtime. The difference is when you know about them.
Static type checking is (informally) about statically proving that the domain of your function is actually what you think it is. They catch "obvious" (for a definition of obvious that is defined by the power of the type system) domain errors. They don't replace tests, but they do constrain the domain you need to test and let you know when your tests are sane.
> And yet, in many cases they are a productivity drain. Time is often better spent writing test cases instead of chasing down type issues (more so if a language has a 'generics' or C++ templates concept), so that approach got favored in many fields.
From my experience, I've seen that type hinting and checking provide a better ROI than automated testing. It is not either/or; you will most likely need some automated testing, even if it is just unit tests. But at the very least use types. It provides broader benefits than just catching stupid bugs and typos. It makes your code base much easier to understand and code paths much easier to follow.
With static typing, you usually have this big suite of unit tests that check a whole bunch of basic correctness out of the box, called the compiler... /s
If you actually use the type system, and build less anemic types, you can eliminate huge swathes of potential errors.
I use Numpy NDArray and pandas DataFrames a lot, and oftentimes you want to validate the shape / columns / dtype of these objects. I have yet to find a clean and meaningful way to type hint these, and ended up just putting assertions at the beginning of the code.
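i.e. the checks end up looking something like this (column names made up):

import pandas as pd

def revenue(df: pd.DataFrame) -> pd.Series:
    # the hint only says "DataFrame"; the properties that actually matter are runtime assertions
    assert {"price", "qty"}.issubset(df.columns), "missing expected columns"
    assert df["qty"].dtype.kind == "i", "qty should be an integer column"
    return df["price"] * df["qty"]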
I don't get what's so bad about assertions. It makes it much easier to demand that a given object has a given method than forcing the object to be of a certain class.
If anything adding a list of assertions at the top of a method/function/class is much more readable than the current type hint implementation.
And if you're running out of resources move the assertions to your regression tests.
As someone who's used Python since the 2.1 days in high school, why not create a language with the type system you want instead of killing Python? There is a reason why Python is this good when it specifically chose not to implement the features that are now being added to it.
The point of adding type hinting to Python is to create a smoother path from prototyping to production level code. Contrast with prototyping in Python then rewriting in C++. Soon typed Python code will be automatically transformable to native code. Being able to incrementally evolve a codebase from a prototype to production level quality and performance without wholesale rewrites will revolutionize development practices.
> Soon typed Python code will be automatically transformable to native code.
Do you have a specific project in mind? I know of a few, like Cython, RPython, Nuitka etc. that already do this, but use their own type annotations that are partially incompatible with the Mypy type system.
One hurdle to native code generation is that most large Python projects will contain a bit of metaprogramming that the type checker can't handle. That isn't a problem if you can check it by hand and use assertions to signal the type checker that you know what you're doing, but that code will also be impossible to compile. There are some ways to get around that (e.g. RPython allows arbitrary Python during initialization, which is executed at compile time), but they might require a complete re-architecting of the program.
As an author of enforce.py, I just want to say that working on and with type hints exhausted me. The inner implementation is so full of corner cases. And the indifferent-at-best attitude of the python subreddit wasn't very helpful either (imho). I still don't know if anyone besides the project's awesome contributors ever used it.
To be honest, the biggest issue of python type hinting (from my point of view) is that it is an abstraction of a totally different typing system underneath. Type hints do not represent actual types at all. They are just for convenience sake.
What started as a proof-of-concept for me soon became a huge monster with special handlers for almost every single case. It is so messy now that hardly anyone besides me can do any maintenance. I have been considering a rewrite (using ast modifications to avoid a huge number of potential corner cases and performance issues) for more than a year now, but such an endeavour is too big for me to undertake. It would be a fresh start after all.
I don’t like the added burden of importing types in order to use them. So far, I’ve relied on PyCharm’s excellent static analysis and docstrings to add type hints. One huge pain point about C++ is header files and maintaining two source files for related code. Python stub files really reduce flexibility when refactoring or building new code. If you are adding types and maintaining two source files, the rapid development advantage of using Python is nearly lost, and the type info is hidden while reviewing code unless you have both files open. Not great for code review when the editor isn’t able to report type conflicts.
I started using Python type hints back when you still needed to apply patches. This was years ago. At work we had a database that changed its schema randomly every night. With type hinting in our build process we were able to determine whether we could handle the new schema. I have also been developing Python for about eight years now.
I can't really stand the language anymore. It feels to me that unless you have a large Python code base or a large Python employee base, it's just not worth the hassle. It was game-changing when it first came out, but now, outside of data science, machine learning, and devops, I'm not seeing it.
With projects like mypy/type hinting and pypi, you're one step away from using a static language that is either native or running on a VM (CLR, JVM, BEAM, etc.).
Additionally, almost all teams I've worked on have encountered either maintainability issues where a static type checker would help (i.e. changing a class or function definition and not seeing it updated across the board), or, more commonly, performance issues, specifically around asynchronous/parallel work streams. We have asyncio and others, but it's more complex than in some of the static languages.
Since type hinting first came on the scene it has been a requisite in any project I've worked on. I've led a number of teams who want to move away from Python, to the point of completing each sprint developing the project in both Python and $LanguageOfChoice, begging the business to move on, but no dice. Yes, I make the business case for why we should move on, but the business won't let us.
Type hinting in Python is so ugly that I stripped it all out after a few days of using it on some personal projects. The hints more than double the length of function signatures and create comment clutter (is this a real comment or some MyPy flag?).
I wish there was a way to fully separate the hints into separate files. Such files could act like header files in C. Or, some comment character other than # to start type hint (how about "!"?), so IDEs can hide them if the user wants.