How do you distinguish data shuffling from computation? What’s actual computatio...

mrkeen · 2024-07-04T14:09:36

Before I was good at Haskell, I would approach a data-processing job sequentially based on the next thing that needs to be done.

I want to open a file, and I can't read it all at once, so I'll use a FileReader and it should be buffered, so I'll wrap it with a BufferedReader. I'll use try-with-resources and click into the classes because I can't remember if the contract of the outermost reader is that it will close the inner readers too.

Right, now I'll grab the next n bytes from the stream, and start thinking about the algorithm. Swear a bit when I think about crossing the buffer boundaries, and on and on...

The IO concerns are very much interwoven with the algorithm.

In Haskell I just start by writing one function from bytes to bytes. That's the computation. Then when that's done I expose that function as bytes to bytes.

Others can hook it up to files, webservers, pipe it through gzip, whatever!
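
A minimal sketch of that bytes-to-bytes idea, assuming lazy ByteStrings from the bytestring package that ships with GHC. The function name process and the line-reversing job are made up for illustration; any pure function of the same type would slot in the same way.

```haskell
module Main (main) where

import qualified Data.ByteString.Lazy.Char8 as BL

-- Hypothetical stand-in for "the computation": a pure function from
-- bytes to bytes. It reverses each line and knows nothing about files,
-- sockets, or buffer boundaries.
process :: BL.ByteString -> BL.ByteString
process = BL.unlines . map BL.reverse . BL.lines

-- One possible hookup: stdin to stdout. The lazy ByteString is read in
-- chunks behind the scenes, so process never deals with buffering.
main :: IO ()
main = BL.interact process
```

Saved under a hypothetical name like Main.hs, it could be run as `runghc Main.hs < input.txt`, or placed in the middle of a shell pipeline, without touching process.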

lucianbr · 2024-07-04T12:35:41

Reading a row from a database and putting it on the screen, and reading some numbers from the keyboard and putting them in the database. These things I would not call computation. I mean sure, displaying needs to compute coordinates for where to light up pixels, but that's all already written. I just call it. Same with updating B-trees when writing to the db.

I'm guessing if all you do is this kind of db - screen - keyboard and back stuff, Haskell is not very useful, if not actively a hindrance.

benreesman · 2024-07-05T05:34:43

Haskell is actively a hindrance if one is mostly moving bytes from one place to another: the only thing that matters when you need to talk to 7 databases each different is fashion. The language that has bindings to all 7 each with a zillion users is the one you should use.

If you’re moving a lot of undifferentiated bytes the language you should use is historically C, more recently C++ (which is still the standard), or maybe soon Rust (which looks to become the standard).

If IO is a small part of your problem, performance needs to be good but not insane, and you’re mostly thinking about algorithms and mathematics?

Haskell is a very pragmatic choice there. OCaml is strong here too, and TypeScript is a very cool compromise between “mainstream” and “we do math here”.

tossandthrow · 2024-07-04T12:00:27

Philosophically speaking there is no difference.

What parent commenter probably refers to is that you think in terms of computations and not in terms of data units.

And that is just tremendously elegant.

082349872349872 · 2024-07-04T20:01:46

Philosophically speaking there's a great difference.

Data shuffling doesn't —in principle— lose information; computation does. ("evaluation is forgetting")

In https://news.ycombinator.com/item?id=32498382 "glue code" and "parsley code" are data shuffling, while "crunch code" is computation.
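
One way to read the information-loss point, as a toy sketch. The names shuffle, unshuffle and crunch are invented here, and treating "shuffling" as "invertible" is an assumption about what is meant, not something the linked thread defines.

```haskell
module Main (main) where

-- "Shuffling": the data is rearranged, and an inverse exists that
-- gets the original back.
shuffle :: [Int] -> [Int]
shuffle = reverse

unshuffle :: [Int] -> [Int]
unshuffle = reverse

-- "Crunching": many different inputs collapse to the same output,
-- so the input cannot be recovered from the result.
crunch :: [Int] -> Int
crunch = sum

main :: IO ()
main = do
  -- Round trip recovers the input: nothing was forgotten.
  print (unshuffle (shuffle [1, 2, 3]) == [1, 2, 3])
  -- Two distinct inputs give the same result: something was forgotten.
  print (crunch [1, 2, 3] == crunch [2, 4])
```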

tossandthrow · 2024-07-05T08:04:27

Surely someone could find a taxonomy that makes a distinction...

I guess we have to coexist in a world where both views are true.

082349872349872 · 2024-07-05T10:49:54

Luckily we have a whole partial order of equalities from which we all may choose, et de gustibus.

(compare https://news.ycombinator.com/item?id=40714086 )