Hacker News
I Accidentally Some Machine Learning – My Story of a Month of Learning Elixir (fredwu.me)
102 points by fredwu on July 23, 2016 | 44 comments



> About a month ago I was in-between jobs - I had two weeks to rest up, recharge and get ready for my new job. So I thought, I should use those two weeks to learn something new.

After this, people are still wondering why developers earn $100,000+ salaries.


I don't think that follows?


I don't get it.


Developers get paid the market price for their labor; right now employers pay them above the median salary. Some commenter on HN wants to point out a correlation between this particular developer's choice to spend their time learning a new skill and their current market valuation, suggesting that the latter follows from the former.


Agreed. Welders, for example, spend a lot of their spare time working on side projects, learning new ways of welding, etc. Carpenters spend a lot of time on family and friends' home projects. There is no correlation between a programmer's willingness to learn and the median salary for the industry.


There is. Programming is a non-linear field, while welding is a linear one, unless you become a successful welding-artist.


If you ask welders, there is definitely an art to it. Check out /r/welding sometime.


I think that most people would have used the interim time to, oh I don't know, travel? Finish a game? Not do more "work"?


I think they mean that as developers, we should be continually learning?


Why did Python get a bunch of machine learning frameworks while Ruby has little or nothing? Was it because scientists just started to use Python more?


It was a combination of:

1) The language was created by a researcher (Guido van Rossum) at a research facility in the Netherlands [1]

2) Network effect - as more machine learning researchers and engineers in academia used it, the need arose for a language that could integrate with FORTRAN at comparable performance, which led to NumPy, SciPy, etc.

3) The language itself is pretty simple to grasp. There are multiple ways in Ruby to do something, while in Python there's at most one way to do something. Anecdotally, a lot of researchers I've worked with care more about the usability and ease of adoption of a language than about its expressivity.

[1] https://en.wikipedia.org/wiki/Python_(programming_language)#...


Not disagreeing with the relative simplicity of Python compared to Ruby, but the degree to which the Python community is adamant that there is always only one way to do something has always baffled me. What about `map` versus a list comprehension, or `glob` vs `os.path`? Obviously Python has pretty clear standards about what the correct way to do something in the language is, but I feel like the persistent claim that there's always exactly one way to do something can range from oversimplified to misleading to just plain wrong.
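To make the parent's `map` vs. list comprehension point concrete, here are two equally idiomatic Python spellings of the same transformation (variable names are mine, for illustration):

```python
words = ["Quick", "Brown", "Fox"]

# Style 1: functional, via the built-in map().
lowered_map = list(map(str.lower, words))

# Style 2: a list comprehension.
lowered_comp = [w.lower() for w in words]

# Both produce identical results; neither is "the" one way.
assert lowered_map == lowered_comp == ["quick", "brown", "fox"]
```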


Extreme pro-Python statements about "one way to do it" are usually made by relative newcomers who are caught up in their enthusiasm for purity. (This exuberance is natural, and is a big part of the language adoption cycle.) Extreme anti-Python statements about "one way to do it" are usually made by outsiders viewing Python through accounts given by enthusiastic newcomers. This exuberant-newcomers-vs-skeptical-outsiders dichotomy isn't representative of the actual Python culture, which is much more sensible and practical than all of this implies. If you go to PyCon and attend talks by long-time Python folks, you won't find them advocating anything like blind adherence to "one way to do it".

To get specific, the wording in PEP 20 is: "there should be one, and preferably only one, obvious way to do it". Note that the dominant idea here is that there should exist some obvious way. It's merely preferable that there be only one way. Again: this is sensible, but it's not good fodder for newcomers craving purity, or for detractors craving straw men.

PEP 20 itself is often elevated to religious levels of importance, as if it were a design document that served as a blueprint for the design of Python. It's absolutely not that; it was written by Tim Peters (not Guido!) fully eight years after Python's first release. Here's the original email that Tim sent to comp.lang.python on 4 June, 1999: https://groups.google.com/d/msg/comp.lang.python/B_VxeTBClM0.... Note how he finishes: "If the answer to any Python design issue isn't obvious after reading those -- well, I just give up <wink>." Tim is very particular about his winks, and that's not even a <0.5 wink> or a <0.9 wink>, but a full, unqualified <wink>.

To put all of those pieces together: (1) we have a short, tongue-in-cheek document written after-the-fact. (2) It's misinterpreted as a design document for an entire language and ecosystem. (3) A secondary portion of one of its points is misinterpreted to be the entirety of the point. (4) This is done by exuberant newcomers and skeptical detractors who don't accurately represent the Python culture as a whole. This is why you get the impression that the community is adamant about "one way to do it". It's a reasonable extrapolation from the visible evidence, but it's not true.


Thanks, that's a very clear and well thought out explanation. I'll make sure not to take the skeptical detracting too far in the future!


Besides, it only says that preferably there should be only one obvious way to do it.


Pure speculation, because I don't know the internal history of Python:

1. perl has always prided itself that "there's more than one way to do it" including repeated, prominent use of an acronym for that. see timtowtdi.

2. there is a theoretical idea kicking around out there that, if a language allowed for only one way to express any idea, then that language would be a better language (all else being equal) because if you and I always write the same code for the same task, we could easily maintain each other's work. see for example "egoless programming" or the ideas that led to Simonyi's "metaprogramming" (whether or not you agree)

3. perl is a mess

therefore, it might, given the history, make sense to say that about python as a way of trying to point out a salient difference from perl


I have always assumed that that statement was supposed to directly contrast with TIMTOWTDI.


I've never seen that sort of response from the Python community. I've seen plenty of discussion and recommendation about what is "Pythonic", but never about there being "one way to do it". If anything, Python is touted as being able to do things in multiple ways, though that is very grey territory due to it being a very dynamic, patchable language.


Because people used Cython to build performant numerical data processing libraries like SciPy, NumPy, and Pandas, which then made building sufficiently performant machine learning libraries like scikit-learn much easier.


Same reason Perl got a bunch of biotech libraries and tools at the turn of the century: a good enough language in the right place at the right time.

Ruby's big disadvantage is that until Rails happened, it was relatively unknown in the English-speaking tech community. And though it has begun to broaden its reach since then, Rails (and, by extension, web dev) so dominates the conversation in that community that many people unconsciously forget it's a general-purpose language and are often unaware that there are libraries out there for doing non-web work in Ruby.

Python's early advantage for scientific stuff was its relative simplicity and its easy integration with extensions written in compiled languages. Interpreted languages will probably never be fast enough on their own for scientific computing, but a high-level interpreted language as an interface to fast compiled C and Fortran is good enough to attract people and introduce network effects which turn a moderate advantage into a big one.
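The "high-level interpreted language as an interface to fast compiled code" point is easy to demonstrate with nothing but the standard library: `ctypes` can load a compiled C library and call into it directly. A minimal sketch, assuming a Unix-like system where the C math library can be found by name:

```python
import ctypes
import ctypes.util

# Locate and load the system C math library (assumes a Unix-like OS).
libm = ctypes.CDLL(ctypes.util.find_library("m"))

# Declare the C signature of cos(3): double cos(double).
libm.cos.argtypes = [ctypes.c_double]
libm.cos.restype = ctypes.c_double

# Python drives the call; the number crunching happens in compiled C.
assert abs(libm.cos(0.0) - 1.0) < 1e-12
```

NumPy and SciPy use heavier machinery (C extension modules, f2py for Fortran), but the division of labor is the same: Python for orchestration, compiled code for speed.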


It is largely accidental. It all started with the invention of Numeric, NumPy's predecessor, around 1996. After its proven success in HPC, Python has since established its role in numeric computation.


I think it's just the people who decided to do it. I was working for Enthought back when SciPy and NumPy were new things. You can fit the people who built much of the scientific and machine learning stuff for Python into a mid-sized conference room (and they'll all already know each other).

Why they chose Python, rather than Ruby or something else, probably has to do with timing. They were already working on this stuff before Ruby made any in-roads in the western world. And, once you've got a few hundred thousand lines of code, customers, and an OSS ecosystem based around what you've built, you don't drop it and move over to another language.

Python also happens to be well-suited for the task. It had SWIG support for hooking into the ATLAS C++ libs, and there were reasonable ways to hook into the FORTRAN libraries as well (and FORTRAN is the gold standard for performance in the scientific computing world, or it was back then). So, the technical side of Python was pretty well-suited to the task, even early on. The fact that Python was beginning to show up in university computer science programs probably helped with adoption, too, though back in the early days the people using Python for scientific computing often had to also spend time and effort training their customers to use Python, so it wasn't an easy path (they were facing off against Java-using competitors who had a bit of a lead in the move away from FORTRAN and C++ in the industries where they work).

That said, before Python, there was a pretty large contingent of scientists using Perl, for a lot of the same technical reasons; and there still are in some fields. Biology and finance still have a pretty big Perl presence in industry. So, again, it wasn't inevitable that Python would be the leader. It just happens that the people who work in those spaces wanted to use Python and built a ton of great tools for doing it. At some point, it becomes a no-brainer because the inertia of the ecosystem does so much of the work for you. So, people without strong opinions on language who need to do this work will flock to the obvious choice, which is now Python.

I guess one could ask, "Why didn't Python take off as a web development language? Why did PHP first, and then Ruby, own so much of that market for so long?" Again, probably because of inertia plus some early movers who built valuable resources for the language, making it the easy choice for people without strong opinions. Scientific computing is a much smaller market, with a lot less room for a bunch of players to be "leaders", I think. So, even though web development has several strong ecosystems in many languages, I doubt scientific computing could be that dispersed and still be as effective.

Edit: Also Google. Google likes Python, and Google likes machine learning. And, with the already excellent scientific computing ecosystem (SciPy, NumPy, Jupyter/IPython), TensorFlow was an easy fit.


And because Python has a really nice history with C. It really helps to be able to link against some classic things that were written in C.


Interestingly, I think most of your speedup on the stemmer comes from using String.replace_suffix, which uses binary pattern matching instead of regexes.

Nice learning :)
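The library under discussion is Elixir, but the idea - replacing a fixed suffix with a plain string comparison instead of a regex - carries over to any language. A rough Python analogue (both function names are mine, for illustration), showing the two approaches produce the same result while only one involves a regex engine:

```python
import re

def replace_suffix(word: str, suffix: str, replacement: str) -> str:
    # Plain string comparison plus slicing: no regex engine involved.
    if suffix and word.endswith(suffix):
        return word[: -len(suffix)] + replacement
    return word

def replace_suffix_re(word: str, suffix: str, replacement: str) -> str:
    # Equivalent regex version; anchoring with $ restricts the match to a suffix.
    return re.sub(re.escape(suffix) + "$", replacement, word, count=1)

assert replace_suffix("stemming", "ming", "") == "stem"
assert replace_suffix_re("stemming", "ming", "") == "stem"
```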


I wonder if the new regexp code in Ruby 2.4 would address some of that.


Author here. When I was benchmarking my library against a ruby one, I did actually try the ruby one on 2.4. Interesting findings are:

- Switching from =~ to Regexp#match? did not yield a significant speed improvement

- Without any changes, though, 2.4.0-preview1 ran the stemming in 19s, compared to 25s on 2.3.1.

The ruby stemmer I used is this one: https://github.com/NeilNjae/porter2stemmer


The headline is missing the word "Did" after "Accidentally", both on HN and on the post itself.


I think he accidentally the whole thing [1].

[1] http://de.urbandictionary.com/define.php?term=Accidentally


It is deliberate; dropping the verb after 'accidentally' is a long-running joke. See http://knowyourmeme.com/memes/i-accidentally


Like numerous other memes, this too originates from 4chan. A poster on /b/ back in 2008 wrote:

"I accidentally 93MB of .rar files” and the rest is history.


wow. so people are this ignorant of memes. you must be incredibly productive.


All your base are belong to us.

Sometimes typos lead to memes.


Sometimes memes lead to people learning incorrect grammar (as I beat down my former benevolent grammar dictator within).


Does anybody know of a good benchmark between Elixir, Go and Scala?


Over a year ago, there was a benchmark of web frameworks. It compares the performance of frameworks written in Elixir, Go, Java, Node and Ruby.

https://gist.github.com/omnibs/e5e72b31e6bd25caf39a


That's really interesting: how come Elixir does so well? I thought that the BEAM VM was good for concurrency and reliable systems, but wasn't so great at raw number crunching compared to the JVM and compiled code?


It's a web framework benchmark; there is no number crunching.


It depends on the number crunching. All things considered, I've usually been impressed working with it.

it's also a monster at processing binary data streams via pattern matching.


It depends on what you're doing. For purely computational workloads, Go and Scala kick Elixir's ass. For highly concurrent workloads where you're edging up against GC, Elixir and Go kick ass.

It all depends.


I'm pretty sure that neither Go nor Elixir have anything to compete against Hotspot/Azul in the GC department.


They don't have to. A typical Elixir workload is IO-bound, which means that most of the time you don't have to do GC.

The BEAM GCs each process in isolation, which means that if a process dies before a GC is needed, the BEAM just reclaims the whole process's memory. No GC pass needed.

That is what happens most of the time.


The TechEmpower benchmarks are commonly mentioned. They show Go as a top performer with some occasional decent results from Scala.

http://www.techempower.com/benchmarks/#section=data-r12&hw=p...

I have no idea how good the implementations are for each of these, or how well each platform works within the constraints of this benchmark.


It should be noted here that the Phoenix framework author (Chris McCord) complained that the Elixir benchmarks here were set up poorly. Quoting from his post:

"They were testing JSON benchmarks through the :browser pipeline, complete with crsf token generation. They had a dev DB pool size of 10, where other frameworks were given of pool size of 100. And they also had heavy IO logging, where other frameworks did no logging. We sent a PR to address these issues, and I was hoping to see true results in the latest runs, but no preview was provided this time and we weren't able to work with them on the errors."


IMO the most effective way to answer this is to write your own benchmarks tailored exactly to your use case(s) in each of these languages.



