MrManatee's comments | Hacker News

With anything like this, I would love to look at the raw data to get an intuitive feel for the phenomenon.

For example, the word "surpass" was used 1.47 times per million in the pre-2022 dataset and 3.53 times per million in the post-2022 dataset. That's 16 occurrences in 10.92M words and 41 occurrences in 11.63M words, respectively. That's a low enough number that I could just read through every occurrence and see how it feels. In this case I can't because the authors very understandably couldn't publish the whole dataset for copyright reasons. And replicating the analysis from scratch is a bit too much to do just for curiosity's sake. :)

I often find drilling down to the raw data like this to be useful. It can't prove anything, but it can help me formulate a bunch of alternative explanations, and then I can start to think about how I could possibly tell which of those explanations is the best.

What are the competing explanations here? Perhaps the overall usage rate has increased. Or maybe there were just one or a few guests who really like that word. Or perhaps a topic was discussed where it would naturally come up more. Or maybe some of these podcasts are not quite as unscripted as they appear, and ChatGPT was directly responsible for the increase. These are some alternative explanations I could think of without seeing the raw data, but there could easily be more that would immediately come to mind upon seeing it.


Yeah. I can believe that this will improve in the future, and this is a technology preview and all. But my takeaway from the video is that it is still too vibey for me to trust. If I were coaching a junior data analyst at my company and they wrote this code, I would happily give them some feedback so that they could improve. With AI, I'd probably just do the analysis myself.

In addition to the hardcoded lookup table, here are some other notes on the generated code:

1. Silently assuming that the first page of results contains all of the data feels a bit fragile. If we're not going to bother with paging otherwise, I'd at least assert that the first page contains everything (see the sketch after this list).

2. Contrary to what the code comment claims, 2022 is not the latest year with data available; 2023 is. The reason this is worrisome is not that the difference is massive, but the methodological implication: it looks like the year 2022 came from "remembering" details from some other code that analyzed this same data set, instead of from looking at the current version of the data set we're supposed to be analyzing.

3. The code for removing the world aggregate doesn't actually work (although it doesn't matter for the map). Where it says d.country.id !== "WLD", it should say either d.country.id !== "1W" or d.countryiso3code !== "WLD" instead (also covered in the sketch below). And if filtering this actually mattered, then presumably it would also matter to filter out a bunch of other aggregates as well.

4. The text says "darker colors indicating higher life expectancy", which is pretty much the opposite of how I would describe this color scheme.

5. The analysis given is: "Notice how certain regions tend to cluster together in terms of life expectancy, reflecting similar economic, healthcare, and social conditions". This is such a bland take. It feels like something I could have written before seeing the data. I would try to encourage the analyst to look for something more concrete and more interesting.

6. The actual thing that immediately pops out from the map is that the Central African Republic is a massive outlier with a life expectancy of 18.8 years, whereas every other country is over 50. This doesn't seem plausible. I would do something about that just so that it doesn't mess up the colors of the entire map.
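
To make points 1 and 3 concrete, here is roughly the shape I would aim for. Treat the endpoint, indicator code, and field names as illustrative rather than exact; I'm sketching from memory of the World Bank API, not from the generated code itself:

    // Inside an async function (or a module with top-level await)
    const url =
      "https://api.worldbank.org/v2/country/all/indicator/SP.DYN.LE00.IN" +
      "?format=json&date=2023&per_page=1000";
    const [meta, rows] = await (await fetch(url)).json();

    // Point 1: don't silently assume one page is enough -- at least check.
    if (meta.pages !== 1) {
      throw new Error(`Expected one page of results, got ${meta.pages}`);
    }

    // Point 3: the world aggregate has country.id "1W" and countryiso3code "WLD".
    // (Filtering out *all* aggregates would need the /v2/country list, where
    // they are marked with region.value === "Aggregates".)
    const countries = rows.filter(
      (d) => d.countryiso3code !== "WLD" && d.value != null
    );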


I think there exists a separate skill for classifying problems by difficulty, apart from being able to solve them. This skill can be developed from both directions by learning which problems have been solved and which haven't been.

If someone asked me to find solutions to these example equations, there are three complications that I would immediately notice:

1. We are looking for solutions over integers.

2. There are three variables.

3. The degree of the equation is 3.

Having all three is a deadly combination. If we were looking for solutions over reals or complex numbers? Solvable. Less than three variables? Solvable. Degree less than 3? Solvable. With all three complications, it's still not necessarily hard, but now it might be. We might even be looking at an unsolved problem.

I haven't studied enough number theory to actually solve either of these problems, but I have studied enough to know where to look. And because I know where to look, it only takes me a few seconds to recognize the "this might be very difficult" vibe that both of these have. Maybe LLMs can learn to pick up on similar cues to classify problems as difficult or not so difficult without needing to solve them. (Or maybe they already have?)


For "similar to tree-sitter but js-only", Lezer might be an even closer match:

https://lezer.codemirror.net/

I have used both Ohm and Lezer - for different use cases - and have been happy with both.

If you want a parser that makes it possible for code editors to provide syntax highlighting, code folding, and other such features, Tree-sitter and Lezer work great for that use case. They are incremental, so it's possible to parse the file every time a new character is added. Also, for the editor use case it is essential that they can produce some kind of parse tree even if there are syntax errors.
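
For example, with Lezer's JavaScript grammar you still get a usable tree back even when the input has a syntax error. A tiny sketch (from memory, not tested):

    import { parser } from "@lezer/javascript";

    // "let x = ;" is not valid JavaScript, but the parse still produces a
    // tree containing an error node -- exactly what an editor needs in order
    // to keep highlighting while you type.
    const tree = parser.parse("let x = ;");
    console.log(tree.toString());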

I wouldn't try to build a syntax highlighter on top of Ohm. Ohm is, as the title says, meant for building parsers, interpreters, compilers, etc. And for those use cases, I think Ohm is easier to build upon than Lezer is.


What makes Ohm better for building parsers and compilers?


This is more of an interpreter than a compiler, but if you look at the "Arithmetic example with semantics" [1] linked on Ohm's GitHub page, you can see how Ohm can be simultaneously used for both (1) defining a language for arithmetic expressions and (2) defining how those expressions should be evaluated. I don't think Lezer even tries to help with something like this.
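
To give a flavor of what that looks like, here is a minimal sketch in the spirit of that example (not copied from it, so the details differ):

    const ohm = require("ohm-js");

    // (1) Define the language: a tiny grammar for sums of numbers.
    const grammar = ohm.grammar(`
      Arithmetic {
        Exp = Exp "+" number  -- plus
            | number
        number = digit+
      }
    `);

    // (2) Define how to evaluate it, as an operation over the parse tree.
    const semantics = grammar.createSemantics().addOperation("eval", {
      Exp_plus(left, _plus, right) { return left.eval() + right.eval(); },
      number(_digits) { return parseInt(this.sourceString, 10); },
    });

    const match = grammar.match("1 + 2 + 3");
    if (match.succeeded()) {
      console.log(semantics(match).eval()); // 6
    }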

[1] https://jsfiddle.net/pdubroy/15k63qae/


If the minimum wage is increased by $4, the competing explanations seem to be:

1. Change in unemployment is normally distributed with mean 0% and standard deviation 0.606%.

2. Change in unemployment is uniformly distributed between 1% and 10%.

I don't really agree that "(1) vs (2)" is a particularly good formulation of the original question ("Would raising the minimum wage by $4 lead to greater unemployment?"). But if it were, how would the math work out?

If we observe that unemployment increases by 1%, then yes, that piece of evidence is very slightly in favor of explanation (1). This doesn't feel weird or paradoxical to me. But surely we wouldn't want to decide the matter based on just that one inconclusive data point? Instead we would want to look at another instance of the same situation. In that case, an increase of, say, 6% would (almost) conclusively settle the matter in favor of (2), and an increase of, say, 0.8% would (absolutely) conclusively settle it in favor of (1).
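
To put rough numbers on the first observation (a back-of-the-envelope check, assuming the observed change is exactly +1%):

    // Likelihood of observing a +1% change under each explanation
    const normalPdf = (x, mean, sd) =>
      Math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * Math.sqrt(2 * Math.PI));

    const p1 = normalPdf(1, 0, 0.606); // explanation (1): ~0.169
    const p2 = 1 / (10 - 1);           // explanation (2): uniform on [1%, 10%], ~0.111

    console.log(p1 / p2); // ~1.5 -- a weak nudge toward (1), nothing more

An observation of 6% would make p1 astronomically small, and an observation of 0.8% would make p2 exactly zero, which is the "almost" vs. "absolutely" distinction above.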


> When most mathematicians say something like "X is true", what they mean is "X can be proved from the axioms of set theory".

I'm just one mathematician, but I certainly don't mean that.

Before we can prove anything about sets, we need to pick some axioms. Zermelo set theory (Z) would be enough for most of ordinary mathematics. If we need something stronger, there's Zermelo–Fraenkel set theory with the axiom of choice (ZFC). Or if we need something even stronger, there's, for example, Tarski–Grothendieck set theory (TG).

What I mean by "X is true" is technically difficult to define. The statements

(1) X is provable in Z.

(2) X is provable in ZFC.

(3) X is provable in TG.

are all increasingly accurate characterizations of "X is true", but none of them capture everything about it. And that's kind of the point. There is no proof system P such that "X is provable in P" would work as a faithful definition of "X is true". So the best we can get is this tower of increasingly sophisticated axioms that still always fail to capture the full meaning of "truth".

There is a convention among mathematicians: Anything up to ZFC you can assume without explicitly mentioning it, but if you go beyond it, it's good to state what axioms you have used. ZFC is not a bad choice for this role. It is quite high in the tower. In most cases ZFC is strong enough, or in fact, overkill. But still, it is not at the top of the tower (there is no top!), so sometimes you need stronger axioms. The fact that ZFC has been singled out like this is ultimately a bit arbitrary - a social convention. "X is provable in ZFC" may be the most common justification for "X is true", but that doesn't make it the definition of "X is true".


Not all functional programming idioms work in all languages.

On my computer, the article's imperative range(6000) took 0.05 ms and the "functional" range(6000) took 400 ms. The whole [...cur, cur.length+1] thing turns a linear algorithm into a quadratic one. It wouldn't happen in all languages, but that's how JavaScript's arrays work. My advice is that if you really want to do stuff like this, choose appropriate data structures (i.e., learn about persistent data structures).

Except in this case the imperative version is already totally fine. It is modifying an array that it created itself. You have all of the benefits of avoiding shared mutable state already.

Also, the difference in performance would become even worse, except that for range(7000) I already get "Maximum call stack size exceeded" on the recursive version. The imperative implementation handles 10000000 without breaking a sweat. My second piece of advice is to not use recursion to process arrays in languages (like JavaScript) that have a very limited call stack size.
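
For reference, here is roughly what the two versions look like (my reconstruction from the article's description, not its exact code):

    // Imperative: linear time, and it only mutates an array it created itself
    function rangeImperative(n) {
      const result = [];
      for (let i = 1; i <= n; i++) result.push(i);
      return result;
    }

    // "Functional": [...cur, cur.length + 1] copies the whole array on every
    // step, so it's O(n^2), and the recursion depth is n, which blows the
    // call stack for large n.
    function rangeFunctional(n, cur = []) {
      return cur.length === n ? cur : rangeFunctional(n, [...cur, cur.length + 1]);
    }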


And suddenly you care a lot about the "how", not just the "what". Voiding the whole "but it's declarative!" argument.

At least in imperative code the "how" is explicit. In functional code it's implicit and you need intimate knowledge about compiler and/or runtime to know what's going to happen.


I kind of agree that it's possible to slightly overstate the declarativeness argument. Functional-style JavaScript is not that declarative. Not in the way SQL is.

But also, writing imperative code doesn't guarantee explicit performance characteristics. Whether you mutate references or not, you still need to know which operations are fast and which are slow.

In JavaScript, concatenation, [...array1, ...array2], is non-mutating and slow. Adding an element to the end, array.push(x), is mutating and fast. But adding an element to the beginning, array.unshift(x), is mutating and slow. So even if you're sticking to mutation, you still need to know that push is fast and unshift is slow.

And yeah, sorry, "in JavaScript" is not quite right. I meant in my browser. This is not part of the spec, and it's not mentioned in the documentation. Is it the same in other browsers? Who knows. To me, this is just as much "you need intimate knowledge about compiler and/or runtime to know what's going to happen".
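
If you want to see it for yourself, this is the kind of quick-and-dirty measurement I mean (the absolute numbers will vary by engine and machine):

    const time = (label, fn) => {
      const start = performance.now();
      fn();
      console.log(label, (performance.now() - start).toFixed(1), "ms");
    };

    const N = 10000;
    time("push   ", () => { const a = []; for (let i = 0; i < N; i++) a.push(i); });
    time("unshift", () => { const a = []; for (let i = 0; i < N; i++) a.unshift(i); });
    time("spread ", () => { let a = [];   for (let i = 0; i < N; i++) a = [...a, i]; });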


> array.unshift(x), is mutating and slow

This year it's slow. I wouldn't count on that being true in five years. I mean, it might, it might not. There's a risk in optimizing too much for current conditions. You can easily end up over-fitting for the current set of coincidences, and end up with less readable code that's actually slower in the long run.

But by all means, measure and improve until it's fast enough.


You never know what the compiler is going to produce until you look at what it actually produced, whatever language you are using.

Unless there is clear evidence upfront that the project will be a piece of software where local performance is highly critical, it makes sense to favor code readability and maintainability over optimality.

Of course, you can have different levels of code quality whichever paradigm you retain. Most languages out there will allow you to mix different paradigms anyway.

Given this fact, we would surely be better served by an article like "When to favor expression in each paradigm available in your favorite language".


> Unless there is clear evidence upfront that the project will be a piece of software where local performance is highly critical, it makes sense to favor code readability and maintainability over optimality.

Sure, but let's also not confuse "optimal" with "reasonable". One of the major challenges of modern programs is how slow they are at every level. Very often, this bad performance can be attributed to a style of programming that tries to completely ignore that the program will run on a real machine with real hardware. A little bit of mechanical sympathy (e.g., operations that make good use of the CPU caches or don't confuse the branch predictor) can yield a program that is 10x faster than a naive implementation, with little to no loss of readability or maintainability. (In fact, as noted by Nelson Elhage [1], faster programs enable simpler architectures, which helps make them more readable and maintainable.)
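
To make "mechanical sympathy" concrete, here is a toy example: both loops do the same work, but the second one walks memory with a large stride and will typically run several times slower (exact numbers depend entirely on the machine and the engine):

    const n = 2048;
    const a = new Float64Array(n * n); // one contiguous block of memory
    let sum = 0;

    // Row-major traversal: consecutive memory accesses, cache-friendly
    for (let i = 0; i < n; i++)
      for (let j = 0; j < n; j++) sum += a[i * n + j];

    // Column-major traversal: a stride of n * 8 bytes, lots of cache misses
    for (let j = 0; j < n; j++)
      for (let i = 0; i < n; i++) sum += a[i * n + j];

    console.log(sum);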

In FP languages, programmers face an extra difficulty: the distance between the code they write and the machine code that will be executed is greater than in languages like Rust or Go. They will need to be knowledgeable about the language, its libraries, and its compiler to avoid the pitfalls that could make their programs slower than would be reasonable (e.g., avoiding unnecessary thunking in Haskell).

[1] https://blog.nelhage.com/post/reflections-on-performance/#pe...


> the distance between the code they write and the machine code that will be executed is greater than in languages like Rust or Go.

First of all, one of those languages is not like the other (Go is closer to JS than to Rust). Second, we really can't reasonably guess at the produced assembly; even C compilers do some insane transformations that leave the code nothing like the original, let alone more expressive languages.


I completely agree. That functional style actually favors readability and maintainability is a quite strong claim, one that I read often but that usually lacks evidence.

In my experience, software engineers "think" imperatively. First do this, then do that. That's what we do in everyday life (open a random cooking book) and that's also what the CPU does, modulo some out-of-order and pipelining tricks. A declarative style adds some extra cognitive load upfront. With training you may become oblivious to that, but at the end of the day the machine does one thing after the other, and the software engineer wants to make it do that. So either you express that more "directly" in an imperative style, or you try to come up with a declarative style that may or may not be more elegant; but that this ends up more readable or maintainable is on the functional proponents to prove.


Maybe we have different mental models, and that's what drives this conflict? I certainly wouldn't say that "first do this, then do that" is my primary mental model; in small blocks, yes, but once you get past even one file that breaks down. Once you introduce threads or the network, those assumptions have to go out the window anyway.

It's funny you mention recipes, because I've always been frustrated by traditional recipe descriptions that muddle concurrency and make it difficult to conceptualize the whole process. E.g., the table structure here is superior to step-by-step instructions: http://www.cookingforengineers.com/recipe/158/Dark-Chocolate...


The network is the prime example for forcing serialization of events.

Tbf, I agree with the recipe criticism. It would be neat to have a dependency graph instead of a step-by-step list of things to do when baking a cake. That would have saved me a lot of headaches in the past. (The table in your link expresses a tree, which is probably sufficient for most purposes.)


> In my experience, software engineers "think" imperatively

I hear this often. In the past the claim used to be that they "think" object-oriented. This is a thinly veiled argumentum ad naturam.

> ... on the functional proponents to prove

Prove your own claims before you demand proofs from other people. And by prove I mean really rigorous thinking, not just superficially seeking confirmation for the things you already believe either way.


Not necessarily. If the current "default" is imperative, then the burden of proof is on the functional advocates, because they're the ones advocating for change.


Burden of proof is stupid if the only argument for the other side is that it's the status quo.


Burden of proof is very relevant if neither side gave an argument.

People are doing A. Someone says "Do B instead!". "Why should we do B?" "Well, why should you do A?" At the end of that extremely unproductive exchange, what are people going to do, A or B? They're going to keep doing A, because they were already doing that, and nobody gave them any actual reason to change.

So "burden of proof" isn't meant in the sense of this being a formal debate, with rules. It means that, when ahf8Aithaex7Nai said "Prove your own claims before you demand proofs from other people", that ahf8Aithaex7Nai is wrong. OOP is the current default in terms of the bulk of professional programming; if FP advocates want that to change, it's on the FP advocates to provide reasons, not on the OOP advocates to prove the correctness of the status quo.


Right, the status quo works. Computers are serving us. The only question, of course, is whether rigorous adoption of FP would make them work even better.

The proponents of the status quo only need to prove that the existing approach works and is useful. And I think that is already proven by the very existence of the status quo: it wouldn't be there if it weren't somehow useful.


A: This thing! It's true!

B: Prove it!

A: No, you prove first! With really rigorous thinking, please!


Bad faith


I mean, OOP is still a great model for plenty of programs.


In my experience, people think and explain their ideas in natural language and sometimes pictures. So that is the best way to program computers.


Imperative is good (perhaps even better) on a local, small scope.

It is terrible on a system level with concurrent execution; there you really need all the safeguards.


As weird as it sounds, even when performance is not a deal breaker for individual users, on high-traffic sites the sum of all the energy wasted on unnecessary/unoptimized computations can become so large that it turns into an actual ecological concern. It's a completely new thing that we should keep an eye on.


Is it really, when we literally waste whole countries' worth of energy on Bitcoin and such? Computers are surprisingly low in energy consumption in the grand scheme of things, especially compared to their usefulness.


Really? A single application of computers --- Bitcoin --- wastes whole countries' worth of energy, and yet it doesn't make sense to optimize software for ecological reasons?


No, but there are plenty of much higher-value targets to go after before computers, especially given the insane value they produce. Like, is that single server running all the logistics of a country's train network "inefficiently" really significant compared to, say, the inefficiency of even a single train?


What about those billions of devices running software in gadgets, phones, TVs, routers, dishwashers, etc ?


Crypto uses vast energy as a feature, a side effect of maintaining a competitive environment as part of proof of work (other staking mechanisms are different). You could not optimize this to use less energy.


I wrote a blog post once, https://www.fbrs.io/ramda/, where I compared a functional solution and a vanilla solution. I tried to cover both performance and some vague notion of readability.

In the end the vanilla solution won in both areas. And I say that as an FP and Haskell fan boy.


Ramda is what you get when people try to zealously make Haskell out of JavaScript. The same thing happens in many other languages when people try to force Haskell idioms into them.

Edit: For some reason people equate "functional" with Haskell, even though JavaScript (and most modern languages) is perfectly capable of expressing functional idioms without the descent into madness.

Your vanilla solution is already a functional solution. Literally no need for `transduce pipe map over lens`


> [...] is perfectly capable of expressing functional idioms without the descent into madness.

Except, of course, deep recursion, which needs to be rewritten with a not-so-readable externalized stack or some kind of loop replacement. There are difficulties involved in rewriting some recursions as loops and some loops as recursions.
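
For example, a deep recursive walk typically has to be turned into something like this (assuming a { value, children } node shape):

    // Recursive version: clean, but the call stack depth equals the tree depth
    function sumRec(node) {
      return node.value + node.children.reduce((acc, c) => acc + sumRec(c), 0);
    }

    // "Externalized stack" version: survives arbitrarily deep trees
    function sumIter(root) {
      let total = 0;
      const stack = [root];
      while (stack.length > 0) {
        const node = stack.pop();
        total += node.value;
        for (const child of node.children) stack.push(child);
      }
      return total;
    }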

Of course, if JS engines adhered to the standard, they would all have TCO by now and the problem would not exist.


All you have to do is use a relocatable stack/heap hybrid like Go does.


I would argue that if you want to make an honest comparison, then use an implementation that at least makes an effort to optimize functional code. And JS does jack shit for FP code compared to Haskell.


All code I've seen that uses Ramda looks like the one in the example.

No idea what your comment about jack shit is about. Haskell is often just as bad as Ramda.

Edit. By the way. All the unusable and unreadable jargon that Haskell uses can be much more easily explained with Javascript: https://github.com/hemanth/functional-programming-jargon Make of it what you will


What's wrong with Ramda?


Unusable, unreadable, unnecessarily complex abstractions in a language that doesn't need them. Also, extremely non-performant.

As you can see even in the blog post: straightforward, readable code using built-in functionality is turned into 8-12 imports, a bunch of auxiliary functions, and multiple function calls that obscure the actual functionality. At a 35x speed penalty and a 2x memory penalty.


The library itself is fine. The problem I see is twisting JS/TS into a language it is not.

JS simply does not lend itself to currying, data-last signatures, piping, and pattern matching the way an ML-family language does. And as you can tell from the Ramda typings, neither does the TS type system. [0]

You will forever fight an uphill battle against every single tutorial and piece of documentation and colleague when going down this path. I don't think the effort is worth it.

0 - https://github.com/DefinitelyTyped/DefinitelyTyped/blob/dc9b...



As much as I like these concepts, I'd be hesitant about adding them to the language. At what point does a language support too many paradigms?

In contrast, take the recently stage-3'd Iterator Helpers[0]. These build on top of the language by using methods that already exist in other parts. It feels natural and is more of the same.

0 - https://github.com/tc39/proposal-iterator-helpers#implementa...


You can cherry-pick examples that go both ways, and we've all worked with individuals who have consistent biases towards and against functional idioms. In the end you have to rely on your own judgment for each case.


Great points! I am still trying to figure out how to best use functional programming with JavaScript. I've found writing performance critical parts imperatively while being conscious about shared mutable state is a decent compromise.

> My second piece of advice is to not use recursion to process arrays in languages (like JavaScript) that have a very limited call stack size.

Can also confirm this; here was my process on some of the harder days in Advent of Code:

1. I wonder if I can solve this in a pure functional style with JavaScript

2. Yay, it works on the example input

3. Oh, I hit the call stack limit on the real input

4. Time to rewrite the recursion as a loop lol


Shame, since ECMAScript 2015 was supposed to give us proper tail calls in JavaScript, but Blink chickened out and so did Gecko.


> I am still trying to figure out how to best use functional programming with JavaScript.

IME: avoid recursion, use immutability in the large while avoiding it in local scope beyond a simple .map/.filter chain, and on the backend I allow myself fp-ts to get sane error handling (I assumed TypeScript there, but I'd just never use vanilla JS nowadays).

With that said, I'd really, really rather use a compile-to-JS lang that does the optimizations for you, especially for frontend use cases.


> trying to figure out how to best use functional programming with JavaScript

Use ES6 classes to guard every property read and write behind a method-call.

You can avoid shared mutable state by making methods return only primitive data, including JSON. Then nobody can modify the internal state from the outside; you control, and can easily observe, all state changes.
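
A minimal sketch of the idea (the class and its methods are made up for illustration):

    class Counter {
      #count = 0;                                  // private: unreachable from outside
      increment() { this.#count += 1; }
      toJSON() { return { count: this.#count }; }  // returns a fresh plain object
    }

    const c = new Counter();
    c.increment();
    c.toJSON().count = 999;        // only mutates a copy...
    console.log(c.toJSON().count); // ...internal state is still 1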


5. refactor loop as loop function


The functional version is written in a style that uses tail recursion, which is not optimized in JS. It’s linear if arrays are not copied, which should be the case with TCO.

Both versions are quadratic if arrays are actual arrays and require a single chunk of memory. (push may need to copy the entire array)


And persistent data structures also tend to have a cost. For example, persistent maps usually have O(log n) lookup and insert rather than the O(1) usually seen in mutable maps.

If I remember correctly this is a pretty fundamental limitation of persistent structures, so it's not like there is a better persistent map data structure out there waiting.


Lazy structures can usually get the log(n) back when you amortize over many operations.


Sure, but for some structures it amortizes to O(log n), whereas mutable structures can amortize to O(1).


> On my computer, the article's imperative range(6000) took 0.05 ms and the "functional" range(6000) took 400 ms. The whole [...cur, cur.length+1] thing turns a linear algorithm into a quadratic one. It wouldn't happen in all languages, but that's how JavaScript's arrays work. My advice is that if you really want to do stuff like this, choose appropriate data structures (i.e., learn about persistent data structures).

That is a good point, and it should perhaps have been timed before being put forward as an example. Arrays themselves are usually quite mutable data structures as well, so I think your advice about learning persistent data structures is spot on. Sometimes one can simulate a persistent data structure by copying the whole structure, or parts of it, for some operations, and using those copies only when one actually needs a separate copy.


Pre-allocating the size of the array lowers that number even more.

Persistent data structures are really useful, and are (often) orders of magnitude more efficient than DIY-immutable data structures.

Imperative ones however, like you mentioned, are often (comparatively) near ideal by virtue of being imperative.


“Functional Programming—When and Where”


The way I understood [1] is that even a trivial app (such as a clipboard URL logger) will take longer than 2 hours once you start adding the polish. I don't think that can be refuted by writing a 5-minute non-polished version of the app.

[1] https://austinhenley.com/blog/thisprojectwillonlytake.html


This. I’ve been a Backblaze customer for years. I’m not a backup enthusiast. To me it’s a problem I want to solve and forget about. If I needed to save money on utilities, I’d probably have more to gain by switching my electricity provider, my insurance provider, or my internet provider. The possibility that there might exist a slightly cheaper backup provider isn’t particularly exciting to me, and hasn’t been on my radar.

What has been on my radar is that my laptop has a small SSD, and in addition I have an external hard drive containing files that I want to back up but I rarely use. Backblaze used to require that I regularly plug in that hard drive and then keep it plugged in for a few hours. This was a total chore, and meant that I couldn’t just forget about my backups. I seriously considered switching backup providers because of that. Luckily Backblaze eventually introduced a slightly more expensive plan where I don’t have to do that. So at the moment I’m a satisfied customer, and I’m not actively looking for alternatives.


How about relation chaining, such as with inequalities? Is there some nice S-expression notation for, say, "0 < x ≤ y < 1"? In many programming languages you would have to write "0 < x and x <= y and y < 1", but in mathematics this kind of repetition would quickly become very cumbersome.


If you wanted to avoid this in math notation without being overly verbose you could write x,y \in (0,1) and x <= y.


Yeah, I suppose this would be acceptable in this instance. But it still repeats x and y, and if x and y were complicated expressions instead of just simple variables, I wouldn't want to repeat them.

For example, the proof of the Minkowski inequality in Wikipedia [1] contains a statement of the form "x_1 = x_2 = x_3 ≤ x_4 = x_5 ≤ x_6 = x_7". Except that x_1, ..., x_7 are so complicated that any notation that requires repeating them doesn't feel like a satisfactory notation for "all of mathematics".

[1] https://en.wikipedia.org/wiki/Minkowski_inequality

