Why Every Language Needs Its Underscore (hackflow.com)
88 points by Suor on June 22, 2014 | 76 comments



As a longtime Ruby developer, I've always seen Underscore.js as an attempt to bring some of Ruby's native functionality into a language that had none of it. Internal iterators are the most immediate example.

The problem is that JavaScript lacks a lot of the things programmers need at hand, so they write libraries to make up for that deficiency. I also write Dart, and I can tell you I love it precisely because its core provides better OO capabilities.

That raises a very important question: what should be a part of the language and what shouldn't? The Unix minimalist in me wants a language to be as small as possible. But when you actually start writing code, you mostly want simple things done without having to import new libraries.


It's not a coincidence that Underscore looks like Ruby: its creator has a Ruby background and developed CoffeeScript along similar lines. (Both are perfect for me as someone who uses Rails on the back end.)

Yes, JavaScript never had anything resembling a decent utils library, so an elegant and popular implementation like underscore was heaven when it arrived. It's the usual problem of browsers evolving too slowly and fragmenting too much for something like Java's collections to emerge as a standard.

The situation has gotten better now, with even IE moving towards auto-upgrade, but it's still not feasible for the JS core to ship with large, frequently updated JS libraries. The closest thing is Node's standard library, but that's not going to be ubiquitous in browsers anytime soon.

"what should be a part of the language and what shouldn't"

In the absence of a standard library, you end up with a dozen competing libraries trying to fill the void, and that stifles end-user applications.

We saw that a few years ago with the DOM. For the first ~8 years of JS, there was no significant DOM library, so app developers had to suck up the unfriendly raw DOM API. Then the toolkits started to emerge, giving way to huge fragmentation between jQuery, Prototype, MooTools, Dojo, and YUI. Suddenly libraries were jQuery plugins or MooTools plugins and your app was using the jQuery stack or MooTools stack. So the next 8 years of JS saw vast effort duplicated as developers built parallel libraries for each toolkit.

So I think the answer is to launch a language without the kitchen sink, but watch the community, gradually standardise useful functionality as the need emerges, and adopt open source libraries as they become popular. Java did this well in its heyday. In JS's case, a lot of the DOM conveniences are now becoming core JS, but the nature of JS means libraries move slower than other languages.


>the nature of JS means libraries move slower than other languages.

Prior to Node and npm, maybe. The JS npm ecosystem is actually growing faster than any other language's module/library system; it's on the cusp of becoming #1 in the world. http://modulecounts.com/


You made a perfect case for Dart.


I maintain a utility library for Dart called Quiver, that you could view in the same vein as Underscore, or Guava. Like you mention, the Dart core libraries are pretty rich, so Quiver doesn't have to define as many fundamental concepts. But rather than there being a sprawling library, many of the nice things come about from composing a couple of features.

My favorite example is Iterable. Dart's Iterable interface doesn't just have .iterator, it has .forEach(), .map(), .where(), .reduce(), .take(), .first, .last... 27 methods and getters! This makes using Iterables a consistent pleasure, but it could be a heavy burden on Iterable implementors, except that Dart has mixins and the IterableMixin, so all you need to do to implement the 27 methods is implement .iterator and use the mixin, which uses .iterator to implement everything else.
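Python readers may recognize the same design in the standard library's `collections.abc` module: subclass `Sequence`, implement only `__getitem__` and `__len__`, and the mixin methods (`__contains__`, `__iter__`, `__reversed__`, `index()`, `count()`) are derived from those two. A minimal sketch; `Countdown` is a made-up class for illustration, not anything from Dart or Quiver:

```python
from collections.abc import Sequence


class Countdown(Sequence):
    """Hypothetical read-only sequence n, n-1, ..., 1.

    Only __len__ and __getitem__ are written by hand; the Sequence ABC
    supplies __contains__, __iter__, __reversed__, index() and count().
    """

    def __init__(self, n):
        self._n = n

    def __len__(self):
        return self._n

    def __getitem__(self, i):
        if i < 0:                      # support negative indices like a list
            i += self._n
        if not 0 <= i < self._n:
            raise IndexError(i)        # the inherited __iter__ stops on IndexError
        return self._n - i


c = Countdown(5)
print(list(c))            # [5, 4, 3, 2, 1]
print(3 in c)             # True
print(c.index(2))         # 3
print(list(reversed(c)))  # [1, 2, 3, 4, 5]
```

As with Dart's `IterableMixin`, the trade-off is that the derived methods are generic (for example, `in` does a linear scan); a class can still override any of them with a faster version.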

The ubiquity of Futures and Streams and how dart:html uses them consistently instead of raw callbacks is another example.

There are other Iterable utilities that are possible, which is where one of Quiver's libraries steps in to provide things like zip(), enumerate() and partition(), which does raise the same question you ask: why are the built-ins special, where do you draw the line? That's a judgement call where I tend to fall on a pretty pragmatic side of things.
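To make the "where do you draw the line" question concrete in Python terms: `zip()` and `enumerate()` are built-ins there, while fixed-size chunking only reached the standard library as `itertools.batched()` in Python 3.12. A minimal sketch of such a helper, assuming Quiver's `partition()` means splitting an iterable into fixed-size chunks (the Python name and signature below are illustrative, not Quiver's API):

```python
from itertools import islice


def partition(iterable, size):
    """Yield successive lists of `size` items; the last chunk may be shorter.

    Hypothetical helper: works on any iterable, including one-shot iterators,
    because it pulls items from a single shared iterator.
    """
    if size < 1:
        raise ValueError("size must be at least 1")
    it = iter(iterable)
    while True:
        chunk = list(islice(it, size))  # take up to `size` items
        if not chunk:                   # input exhausted
            return
        yield chunk


print(list(partition(range(7), 3)))  # [[0, 1, 2], [3, 4, 5], [6]]
```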

Programmers benefit from consistency, not just because there's one way to learn how to do things, but also because some powerful idioms, like more functional-style programming, become popular in a virtuous cycle: as they become more common, more people learn them, and they become even more common. Having some of the higher-level bits ship with the platform makes everyone's life easier.

quiver: http://pub.dartlang.org/packages/quiver


If the core is small, then the language should provide a way to extend it from the outside. Like macros.


or libraries... like underscore...


Maybe it's just me, but I find:

  d = {}
  for k, v in request.items():
      try:
          d[k] = int(v)
      except (TypeError, ValueError):
          d[k] = None

much easier to read and understand than:

  walk_values(silent(int), request)

I need to find out what `silent` does, and I'm not sure what types of errors are caught here. Although I'm all for hiding implementation when it's useful, I find it counter-intuitive to do it at this level in Python; it makes things harder to read.
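One way to answer "what does `silent` do?" is to read a definition. The following is a minimal sketch, not the library's actual source: it assumes `silent` swallows every `Exception` and returns `None`, and that `walk_values` applies a function to each value of a dict while keeping the keys. The type hints mirror the Haskell-style signatures discussed in the replies.

```python
import functools
from typing import Callable, Dict, Optional, TypeVar

K = TypeVar("K")
V = TypeVar("V")
R = TypeVar("R")


def silent(func: Callable[[V], R]) -> Callable[[V], Optional[R]]:
    """Assumed semantics: any Exception raised by func becomes a None result."""
    @functools.wraps(func)
    def wrapper(value):
        try:
            return func(value)
        except Exception:
            return None
    return wrapper


def walk_values(func: Callable[[V], R], mapping: Dict[K, V]) -> Dict[K, R]:
    """Return a new dict with func applied to every value; keys are unchanged."""
    return {key: func(value) for key, value in mapping.items()}


request = {"page": "3", "limit": "ten", "debug": None}
print(walk_values(silent(int), request))  # {'page': 3, 'limit': None, 'debug': None}
```

Note that under this assumption the one-liner is broader than the explicit loop, which only catches `TypeError` and `ValueError`.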


I feel like this is why types have such a benefit in Haskell. This would be typed

    walkValues :: (v -> a) -> Map k v -> Map k a
    silent     :: (inp -> Either e a) -> (inp -> Maybe a)
Here, `silent` is the contextualized version of a really standard error-ignoring combinator which simply throws away the additional error details conveyed in `Either`.

    silent'    :: Either e a -> Maybe a
`walkValues` is an extraordinarily common function usually called `fmap` and its existence tells us that `Map` is a `Functor`

    instance Functor (Map k) where
      fmap = walkValues
Ultimately, this pattern is really common and would be written in Haskell as

    fmap (silent' . int)
The types ensure it cannot be used incorrectly. The notion of `Functor` happily compresses all ideas of "mapping", such as what `walk_values` is doing.


I love Haskell, and I totally agree with you, but I don't think you're making much of a case to people who don't already know Haskell. I'm pretty sure everyone who doesn't grok Haskell reads your post and thinks "Ah, there's another Haskell weirdo spouting incomprehensible code in weird syntax."

So for them: What tel is saying is that in Haskell you annotate your functions with type information (and if you don't the compiler will generate the info for you), which gives you an immediate understanding of what moves into and out of a function, often making it easier to understand what a high level function does.


Additionally, the reason we have weird typeclasses is that they cut such a wide swath of abstraction that you can usually recognize common functions across various types and minimize the number of things you must memorize.


It's so simple! Thanks, tel!


Oh, it's certainly not just you! I suspect most folks aren't used to writing and reading code in this kind of functional style.

In fact, these two blocks probably aren't equivalent... my guess is that `silent` catches all exceptions, since it would be an odd name for a function that just catches `TypeError`. Relying on the reader to learn the name of a function that specific would be completely unreasonable. OTOH, transforming a collection is pretty common, and silencing all exceptions is fairly common as well. (And occasionally even correct!) General-purpose functions like that are often worth the time to learn, since the small dividends from each application soon outweigh the up-front learning cost.

IMO, the biggest benefit of small abstractions like this is that they make bigger abstractions easier. The first form is easy for experienced programmers to understand because it's a common pattern, so you don't have to think too much about it. The functional style extracts that pattern and gives it a name, so it appears only as a single token. The resulting code is higher-level, but smaller; and once that extra machinery is out of the way, you have a chance to notice larger-scale patterns that were obscured by all that extra code. As programmers, we already learn to notice repeated calculations and factor them out into a method; the functional style does the same thing with control flow, with the same caveats in terms of extra abstractions and learning cost, and with the same significant benefits.


If you only see it once, sure. But if you use it as an idiom, it will be clearer once you do know what it does.

Similarly, flatten( my_list_of_lists ) is MUCH easier to read than the construct that you end up writing to flatten a list of lists.
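Python has no built-in `flatten`, but one level of flattening is available through `itertools.chain.from_iterable`. A small sketch comparing a hypothetical `flatten` helper with the nested loop you would otherwise write by hand:

```python
from itertools import chain


def flatten(list_of_lists):
    """Concatenate one level of nesting into a single flat list."""
    return list(chain.from_iterable(list_of_lists))


nested = [[1, 2], [3], [], [4, 5]]

# The hand-written construct the one-liner replaces:
flat_loop = []
for sublist in nested:
    for item in sublist:
        flat_loop.append(item)

print(flatten(nested))  # [1, 2, 3, 4, 5]
print(flat_loop)        # [1, 2, 3, 4, 5]
```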


Often all you need is an explanatory variable. How about this version:

    clean_ints = silent(int)
    walk_values(clean_ints, request)
Also it's useful to point out that almost any function call is non-obvious in isolation. If this is part of a larger application that is silently ignoring exceptions elsewhere, you've probably seen a call to `silent` once or twice already today.


I can see at a glance that the one liner is correct, where the explicit version I have to brainparse the whole thing to know if it is correct. (Once you are fluent in the expanded vocabulary)


This! A 1000 times this!


I don't disagree with the main point, but Python already has quite a bit of this. A more pythonic version of the first example:

    @retry(DOWNLOAD_TRIES)
    def download(url):
        ...

    images = [download(u) for u in urls]
To check if a sequence x is ascending, you can use standard library functions:

    all( t[1] > t[0] for t in zip(x[0:-1], x[1:]) )
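`retry` is not part of Python's standard library; presumably it comes from a third-party helper library. A minimal sketch of a decorator factory with that shape, assuming the hypothetical signature `retry(tries, errors=Exception)` so that it fits both `@retry(DOWNLOAD_TRIES)` and the `retry(DOWNLOAD_TRIES, HttpError)` form used elsewhere in this thread:

```python
import functools


def retry(tries, errors=Exception):
    """Hypothetical decorator factory: call the wrapped function up to `tries` times.

    Only exceptions matching `errors` trigger another attempt; other exceptions
    propagate immediately, and the last failed attempt re-raises its exception.
    """
    if tries < 1:
        raise ValueError("tries must be at least 1")

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(tries):
                try:
                    return func(*args, **kwargs)
                except errors:
                    if attempt + 1 == tries:  # out of attempts: give up
                        raise
        return wrapper
    return decorator


calls = []


@retry(3, ValueError)
def flaky(x):
    """Hypothetical flaky operation: fails twice, then succeeds."""
    calls.append(x)
    if len(calls) < 3:
        raise ValueError("transient failure")
    return x * 10


print(flaky(4))  # 40, returned on the third attempt
```

The loop inside `wrapper` is the same retry loop one would otherwise write inline around each call; the decorator just names it once.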


Just

    zip(seq, seq[1:])
Won't work with an iterator, though.


Indeed:

    all(a < b for a,b in zip(seq,seq[1:]))
(Although one might want <= if checking whether the sequence is sorted rather than strictly ascending, that is, if it can contain repeated elements.)


itertools.islice would work.


No it won't, unless you `tee` an iterator first.


FYI, the official documentation of itertools has its very own pairwise example:

    from itertools import izip, tee

    def pairwise(iterable):
        "s -> (s0,s1), (s1,s2), (s2, s3), ..."
        a, b = tee(iterable)
        next(b, None)
        return izip(a, b)
https://docs.python.org/2/library/itertools.html#itertools.i...
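In Python 3, `izip` is gone and the built-in `zip` is already lazy, so the recipe becomes the version below (Python 3.10+ also ships it directly as `itertools.pairwise`). Because `tee` lets it consume its input only once, it also covers the iterator case raised above. The `is_ascending` helper and its `strict` flag are illustrative names, not standard library functions; the flag handles the `<` vs `<=` distinction.

```python
from itertools import tee


def pairwise(iterable):
    """s -> (s0, s1), (s1, s2), (s2, s3), ...  (Python 3 form of the recipe)"""
    a, b = tee(iterable)
    next(b, None)      # advance the second copy by one element
    return zip(a, b)


def is_ascending(iterable, strict=True):
    """True if every element is greater than (or, with strict=False, at least) the previous one."""
    if strict:
        return all(x < y for x, y in pairwise(iterable))
    return all(x <= y for x, y in pairwise(iterable))


print(is_ascending(iter([3, 5, 8])))          # True, works on a one-shot iterator
print(is_ascending([1, 2, 2]))                # False, 2 is not strictly greater than 2
print(is_ascending([1, 2, 2], strict=False))  # True
```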


for the record: the official itertools docs have a lot more examples than this one and they're all really worth knowing about if not studying.


What this article also nicely illustrates is the tradeoff between readability and compactness. As a generally experienced programmer I can easily read the original versions and I can also easily imagine how I would make similar improvements in a language I'm very familiar with. The condensed Python version is only comprehensible to me after reading the rest of the article.

Is there value in code that's readable by non-experts? Sometimes yes, more often no.


Readability is important but it's also important to remember that judging readability by a small fragment in isolation is misleading. Yes, for that small fragment on first encounter, the verbose version was more readable. But by the time you're looking at a hundred lines of code, let alone a hundred thousand, the compact version becomes more readable.


When walking through 100k lines of code I've never seen before, I feel that the denser the code, the harder it is to grok. Code needs to be concise, not short. But that's just my opinion.


Okay, but think about what you're comparing. Other things equal, 100 kloc of dense code is harder to understand than 100 kloc of verbose code, naturally. But that's not the relevant comparison. The relevant comparison is maybe 50 kloc of dense code to 100 kloc of verbose code.


> Is there value in code that's readable by non-experts? Sometimes yes, more often no.

Often, for me, yes. I've been programming a long time, and these days I mostly do back-end development, so I write all front-end code to be readable by a non-expert (me), so that when I come back to it in 6 months I'll understand what the hell is going on.

I'm not confident in the language (javascript) to be sure I will understand my own code so I deliberately lean towards explicitness.


I disagree that explicitly listing every variable involved in a comprehension makes the code more readable by non-experts. I'd argue quite the opposite: it's easy enough to explain `map` to a non-programmer ("we're doing this thing to every item in the list") but it's much harder to explain variables, iterators, comprehensions, etc.

And if the non-expert is yourself in 6 months, the argument remains: `map` is the same in every language. Comprehensions, loops, conditionals - those are the things that you need a language reference to properly use.


> tradeoff between readability and compactness

What do you mean by "readability" exactly? Can you provide some kind of definition for what "readability" is? Can you point me in the direction of any research on "readability" (in programming) and its impact on programmers' productivity?

Sorry for the OT, but this is a pet peeve of mine: we have no way of defining, let alone measuring, "readability" in any meaningful way, yet people, including experienced programmers like you, talk about "readability" as if they knew what it is and what it looks like, and as if it were some universal feature of the physical world like force or speed.


> Can you provide some kind of definition for what "readability" is?

Sure. In this particular case, your parent considers an algorithm "readable" if it can be readily understood by someone who is not familiar with the particular language being used, and is not experienced with Lisp-style languages, but is experienced with ALGOL-style languages.

That's not a universal definition, but at this moment in this path-dependent world it's a reasonable thing to consider.


I think in most cases a working programmer should be able to write code under the assumption that it will be read and maintained by people who are familiar with the programming language being used.


I think you missed GP's point, which I believe was that "you can write Fortran in any language". Or more precisely, that you can know a language very well, but that won't help you much if you are confronted with idioms and patterns you don't know.


> who is not familiar with the particular language being used

> but is experienced with ALGOL-style languages

That's NOT readability, that's merely familiarity. It doesn't help anyone to confuse the two.


No, I don't know of any research although it may exist. My personal definition of readability is very simple: the time it takes to understand a particular piece of code. I think there's a balance between overly dense code and code that requires me to scroll through several pages or jump between files.

It takes me longer to parse dense Ruby or Python code because I'm not very familiar with the languages but I'm aware that those languages give us ways to write more expressive code with less syntax overhead. The trick is to write compact code that is still easy to understand for someone other than the author.


> don't know of any research although it may exist

I couldn't really find any, but I'd be very happy to learn that it exists. We'd be much better off as an industry if it existed.

> the time it takes to understand a particular piece of code

Can you see a problem with this definition? Do you think that any given piece of code has time_needed_to_understand property, which is at least similar - or at least in the same order of magnitude! - for all the people who could read it? Exactly the same few lines of code can take one person half a day to understand while some other person will know their meaning in less than 10 seconds.

It's much worse than that, actually. You can show the same piece of code to several people, and you're almost guaranteed to hear complaints about both "too terse" and "too verbose"; both "too explicit" and "too implicit"; both "too noisy" and "without enough syntax sugar".

And this happens even if we're talking about people who mainly program in the same language - adding other languages makes this far, far worse. You get holy wars on the grounds of readability and syntax, you get people who just won't ever touch X, because it's so much uglier (and less readable) than Y and so on. It's insane.

Personally, I try to write "readable" code in every language I happen to use. I actually read a couple of books on the subject and looked through many pieces of code/projects which are widely considered "readable". That doesn't matter, because every now and then I still hear that the code I wrote is "too terse"/"too much syntax"/"too little syntax"/"too verbose" - the same code, just in the eyes of different people.

Back to the article, so that it's not entirely off-topic, I find this:

    http_retry = retry(DOWNLOAD_TRIES, HttpError)
    harder_download = http_retry(download_image)
    images = map(harder_download, urls)
very "readable" (by your definition of "readability"), as it takes me less than a second to comprehend what it is about. Conversely, this:

    images = []
    for url in urls:
        for attempt in range(DOWNLOAD_TRIES):
            try:
                images.append(download_image(url))
                break
            except HttpError:
                if attempt + 1 == DOWNLOAD_TRIES:
                    raise
I find much less readable, because I need to track a lot of irrelevant data to understand what it is about.

On the other hand, if my task was to understand why the code retries one time less than it should (for example), then the latter code is much more readable, because I don't need to navigate to some other file to see a definition of retry (which takes time) and I just see the potential off-by-one error in front of my eyes.

Anyway, "readability" is really a hard problem for the whole industry. I just wish people would admit that much and start doing some serious research on the topic instead of thinking it's obvious.

Edit: I'd also like to know what people who downvoted my first post here think.


I don't speak Python (guessing that's what this is) but the problem I have with the short version is that it still seems oddly verbose. Why not something like this?

    images = map(retry(DOWNLOAD_TRIES, HttpError,download_image), urls)
(I may have slightly fumbled the syntax here, but with luck my point is clear.)


Interesting that all the examples were monads. There is another level of abstractions waiting behind the abstractions in Underscore.js.


Heh, might as well just use another language at this point. Guido himself seemed to be very pissed off at the Common Lispers for being forced to add some functional language features. You can see it in the crappy implementations of single-expression lambdas, non-lazy map, filter, reduce, etc. The language will never catch up with the times, as evidenced by the lame feature list in Python 3. Most Python users seem blissfully ignorant of modern languages, and Guido doesn't seem to like learning new stuff either.


> Most Python users seem blissfully ignorant of modern languages

It's really not that. The Python users I know, including myself, enjoy learning new languages. The issue is that there's extremely few which are "Pythonic" - a form of beauty in its pragmatism, readability, and usability.

If you look at most Pythonic Python code, it's remarkably readable, and it's easy to understand what's going on even without much domain knowledge about whatever the code does. There's no need to understand abstract branches of maths to understand how a Python program works, no need to understand some program's own DSL, no need for special text editors; it's all just a pragmatic language.

In some ways, it's really the Java of the current generation. That's not a terrible thing: it's a simple, beautiful language which works for developing reasonably complex applications. It's possible to write code in all sorts of architectures without having to bend the language too much or invent sublanguages.

IMHO, Rust is going to be a massive language for current Python users. It's extremely unmagical, readable, and opposed to the idea of "DSLs for everything" that some popular languages have, while having a very nice and usable type system. Code that I've seen in Rust also tends to meet the Zen of Python, even better than code in Go (another language currently popular among Python users).


I'm also out of love with python for similar reasons, but please note that map is lazy in python 3.

    >>> map(lambda x: x*2, range(100000000000000000000))
    <map object at 0x104792240>


It's not lazy; it returns an iterator. There is a difference: you can't slice or index its result anymore. Compare Clojure's or Haskell's map, which are truly lazy.
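The distinction being drawn is between a lazily evaluated *sequence* and a one-shot *iterator*. A short illustration of what Python 3's `map` object does and doesn't allow:

```python
from itertools import islice

doubled = map(lambda x: x * 2, range(10))

try:
    doubled[3]                   # map objects do not support indexing
except TypeError as exc:
    print("no indexing:", exc)

print(list(islice(doubled, 3)))  # [0, 2, 4]; islice consumes what it reads
print(list(doubled))             # [6, 8, 10, 12, 14, 16, 18]; only the rest remains
print(list(doubled))             # []; the iterator is now exhausted
```

Elements are computed on demand, which is the "lazy" part; what is missing is a memoized, re-traversable, indexable structure like a Haskell list or a Clojure lazy seq.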


To be fair, the syntax might not be as clean, but isn't:

    import itertools

    m = map(lambda x: x*2, range(100000000000000000000))
    i = itertools.islice(m, 10**6, None)
    next(i)
Somewhat equivalent? This does seem a little slow computing the first million values, but subsequent calls to next(i) are snappy(ish).

I'm not sure how close you could get to "truly" lazy sequences combining count, zip and slice, though.


This is a nitpick, but it is lazy.

It's not represented in head normal form, but it's still lazy.

The limitations you mentioned are not an artifact of the implementation being eager (because it's not eager), but of the representation used.


Is it true that this is O(1) in Haskell? (Making up function names but you get the idea)

    (fmap a function anArray) `get` n
?

How does fmap construct its value?


How fmap constructs its value depends on how that type implements the Functor typeclass, I believe.


I think the biggest reason Python does NOT need Underscore is that Python has pip, and for almost every problem there is a simple helper library.

Probably _the_ reason for Underscore is that it can be distributed via CDNs and cached by the browser. That reason doesn't exist for other languages. If I want to retry stuff, I can use a retry library, easily downloaded via pip. Same as I use argtools because argparse sucks. Same for database access, etc.


In all of those Python examples, the non-Underscore code was much easier to read. We can save the extreme compactness for the IOCCC.


It's easy cause it's familiar. But it's not simple.

This excellent talk by Rich Hickey explains the difference http://www.infoq.com/presentations/Simple-Made-Easy


It's a pity some people downmod for expressing an opinion. It's not like you were being offensive by making the point that too short can be less readable, not more.


I feel like Underscore just makes your code compact. They're also ignoring that I might want to do input validation while iterating over those lists, which forces me to add a loop right after that very compact code, nullifying its only purpose. Then, on top of that, I'll need to add comments to explain the compact and cryptic code.

I really don't see the benefit.


You don't have to add a loop to do your validation:

    if not all(map(is_valid, the_values)):
        inform_user("You supplied bad data")
Or perhaps

    from itertools import filterfalse

    bad_values = list(filterfalse(is_valid, the_values))
    if bad_values:
        inform_user("The following values were bad", bad_values)
Yes, sometimes you might want to do several things in a single iteration for reasons of efficiency, but in that case you can compose() the functions representing the per-element operations.
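Python's standard library has no `compose()`, so here is a minimal sketch of one (a hypothetical helper, applying functions right to left like mathematical composition). Fusing the per-element steps this way keeps a single pass over the data:

```python
from functools import reduce


def compose(*funcs):
    """compose(f, g, h)(x) == f(g(h(x))): the rightmost function runs first."""
    def composed(value):
        return reduce(lambda acc, f: f(acc), reversed(funcs), value)
    return composed


# Hypothetical per-element steps (strip whitespace, then parse) fused into one.
normalize = compose(int, str.strip)
values = [" 1", "2 ", " 3 "]
print([normalize(v) for v in values])  # [1, 2, 3]
```

With no arguments, `compose()` returns the identity function, which keeps it safe to build pipelines programmatically.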

This code is only "cryptic" to people not used to functional programming. If the intended audience is the set of everyone who knows Python, using these abstractions will reduce the number of people who understand the code. If it's your startup's team of three programmers, there's a good chance the time saved is worth the learning curve.


> I really don't see the benefit.

And you won't while you just pass it off as something that "just makes your code compact."

Regarding your comment about comments: imagine a world where there were no for loops, and then somebody invents one. What would you say to the person who shoots it down because it just "makes code compact" and programmers don't want to have to add a comment above a cryptic 'for loop' explaining what it does?


Strikes me as more of an argument about why Python doesn't need an Underscore library, given its expressive syntax.


It doesn't need a direct port. But it can benefit from borrowing ideas or library design principles.


I won't belabor the point, but I think these concepts in Python (and Ruby) influenced the design of Underscore, not vice versa.


Or that they all took influence from a bunch of languages from the 70s like Lisps and MLs


No idea how, but I'd managed to miss underscore.js up to this point. (This is one of the things that drives me crazy when I have to move into a domain adjacent to mine: I never know if I'm doing things the best way.)

This is going to make things much simpler for me working on the client side.

Nice!


You may want to check out Lo-Dash too (http://lodash.com/). It's a drop-in replacement for Underscore with some extra capabilities and claims to be faster and more reliable. It's become a major project in its own right now.


There is also lazy.js, which is even faster than Lo-Dash, especially when chaining multiple methods. http://danieltao.com/lazy.js/


I'll have a look at that as well, thanks.


Debugging underscore's endlessly created anonymous functions can be quite painful, leading to huge, nondescript stacktraces and sometimes straight crashing the debugger. I have ripped it out of multiple projects and found performance gains and much better readability. It seems like a great idea but in practice native features are usually better. Only in certain browser cases are certain underscore methods useful.


On the flip side, I've never had to suspect _ of being the cause of a problem.

That's a huge benefit. I actually spend less time debugging. I've reduced my program to data manipulation strategies. If I get something wrong, it's because I did my logic wrong, not because I typoed in a nested for loop somewhere.

In fact, when I happen upon code that isn't using underscore, it tends to be really long, loaded with unnecessary state and memos, and is very fragile and not easy to reorganize. One of the first things I do is start utilizing underscore. Once you get used to the extended language, you can do things like convert 200 lines of code down into a few lines. It frees your mind up to think about other things (which, for me, is the single most important thing in programming).

I suggest learning to love the higher order data flow abstractions.

Edit: I also suggest using the debugger's ability to walk backwards through the call stack (at least in DevTools): within the 'Breakpoints' tab, when paused, you will see a trace you can click through, which drops you into the scope of that execution as if you had set a breakpoint anywhere along the chain of function calls. Even if the function is anonymous, you're interacting with it, so it shouldn't matter as much.


IMHO, if the program logic is essentially to loop over some steps, it had better be written in the form of a loop over some steps.


OTOH, if the program logic is mapping over a sequence, then you'd better use map() on the sequence.


Sure, I agree. If the "map" verb is in your vocabulary, and it is the very right verb to use, why not?

Just, if possible, please make sure it's easy to find out how map() is actually implemented (i.e. please don't hide it deep in a language runtime).


Although underscore.js is more widely-used, I think sugarjs.com is an improvement and makes the syntax much nicer.


It would help to give an explanation of what Underscore is besides saying it "makes life better." I am an avid Python user and have never heard of Underscore.js. I thought this was an article about semantic underscores in Python, or underscores as used in variable names (which I dislike) :)


It's a utility library for JS. Consider that the article mentions Python's itertools and functools as equivalents. http://underscorejs.org/ does a good job:

"""

Underscore is a JavaScript library that provides a whole mess of useful functional programming helpers without extending any built-in objects. It’s the answer to the question: “If I sit down in front of a blank HTML page, and want to start being productive immediately, what do I need?” … and the tie to go along with jQuery's tux and Backbone's suspenders.

Underscore provides 80-odd functions that support both the usual functional suspects: map, select, invoke — as well as more specialized helpers: function binding, javascript templating, deep equality testing, and so on. It delegates to built-in functions, if present, so modern browsers will use the native implementations of forEach, map, reduce, filter, every, some and indexOf.

"""


Yeah, I figured that out. I just didn't like how the introduction assumed everyone knows about it.


Shameless plug: https://github.com/serkanyersen/underscore.py

Underscore.py is a direct port of underscore.js to python.


   s/underscore/lodash/g
Underscore, like Backbone, feels like it's in maintenance mode. They didn't get significant new features for years. People would be better off using lodash.


> They didn't get significant new features for years.

http://c2.com/cgi/wiki?FeatureCreep


I use underscore a lot in JS but never felt the need in Python. In the examples given, I'd rather see explicitly what the exceptions being raised are, or the comparisons.


For Java, I think Google Guava or Apache Commons are more of the de facto standard library enhancements than the one linked in the article.



