It's a little long, but this is basically the same sort of thing you see in OO languages with object.chaining.methods(with, arguments).together.forever(), all in the name of succinctness.
Monoids are not complicated. They're the _easiest way to talk about combinable groups of things generically_, and lots of programs use echoes of that concept when abstracting over collection objects. Monoids just finish the job by saying: anything which can be empty and combined associatively is a monoid (think: adding integers, where 0 is the empty element).
These abstractions make your life easier, not harder. But they can be a bit confusing at first because they're so universal, and it can be surprising how much mileage the same code gets just by swapping monoids.
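For a concrete taste, here's a minimal sketch (plain Data.Monoid, nothing article-specific) of that mileage: one fold, two behaviors, chosen entirely by the monoid.

import Data.Monoid (Sum(..), Product(..))

-- One fold written against the Monoid interface; each newtype
-- picks a different "combine" and "empty" for the same code.
totalOf, productOf :: [Int] -> Int
totalOf   = getSum     . foldMap Sum      -- combine = (+), empty = 0
productOf = getProduct . foldMap Product  -- combine = (*), empty = 1

main :: IO ()
main = do
    print (totalOf   [1, 2, 3, 4])  -- 10
    print (productOf [1, 2, 3, 4])  -- 24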
I'm porting some code from Haskell to Clojure right now for various reasons, and man do I miss monoids and the Writer monad. Producing machine-readable concatenated outputs as you go through a computation with ease is so powerful, and a lot of my time is spent writing less-powerful re-implementations of this feature.
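Here's the shape of what I keep rebuilding, as a minimal sketch (plain Control.Monad.Writer from mtl, not my actual codebase):

import Control.Monad.Writer

-- Toy example: each step produces a value and tells a log entry.
-- The [String] log monoid could be swapped for any other Monoid.
step :: Int -> Writer [String] Int
step x = do
    tell ["saw " ++ show x]
    return (x * 2)

main :: IO ()
main = do
    let (results, logs) = runWriter (mapM step [1, 2, 3])
    print results        -- [2,4,6]
    mapM_ putStrLn logs  -- saw 1, saw 2, saw 3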
> What was actually difficult to understand about the code?
Really? How about the CSV reading, from the article:
import Data.Csv
import Control.Lens  -- not shown in the article's snippet, but needed for (^.), _1, _4
import qualified Data.Vector as V
import qualified Data.ByteString.Lazy.Char8 as BS

main = do
    Right rawdata <- fmap (fmap V.toList . decode True) $ BS.readFile "nukes-list.csv"
        :: IO (Either String [(String, String, String, Int)])
    let list_usa    = fmap (\row -> row^._4) $ filter (\row -> (row^._1)=="USA"   ) rawdata
    let list_uk     = fmap (\row -> row^._4) $ filter (\row -> (row^._1)=="UK"    ) rawdata
    let list_france = fmap (\row -> row^._4) $ filter (\row -> (row^._1)=="France") rawdata
    let list_russia = fmap (\row -> row^._4) $ filter (\row -> (row^._1)=="Russia") rawdata
    let list_china  = fmap (\row -> row^._4) $ filter (\row -> (row^._1)=="China" ) rawdata
    putStrLn $ "List of American nuclear weapon sizes = " ++ show list_usa
Here's how I'd do it in Python (note I haven't tried this code, but it'd be pretty close to this):
import csv

countries = dict()
with open("nukes-list.csv", 'rb') as f:
    csvfile = csv.reader(f)
    for row in csvfile:
        tmp = countries.get(row[0], list())
        tmp.append(int(row[3]))  # fourth column (index 3), matching the Haskell _4
        countries[row[0]] = tmp
print("List of American nuclear weapon sizes =", countries['USA'])
How can anybody argue the Haskell version is easier to understand?
The rest of the code is basically calling sum and filter on the data or calling library functions, and it would be almost exactly the same in Python, or even C++.
I'm pretty sure this isn't the idiomatic way to parse a CSV file in Haskell. It's just what I found from a quick Google search. I'm sure someone else could do better.
> The rest of the code is basically calling sum and filter on the data, and it would be almost exactly the same in Python, or even C++.
This is definitely NOT true. The rest of the code is about algebraic manipulations, like group operations on the data structures. This is certainly possible in other languages; it's just never done, and it's not idiomatic.
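To make "algebraic manipulations" concrete, here's a toy sketch of the flavor (the Stats type and train function are mine, not the article's library): a trained summary that is itself a monoid, so models combine the same way their data does.

import Data.Semigroup

-- Toy "model", not the article's library: just enough to recover a mean.
data Stats = Stats { size :: Int, total :: Int } deriving Show

instance Semigroup Stats where
    Stats c1 t1 <> Stats c2 t2 = Stats (c1 + c2) (t1 + t2)

instance Monoid Stats where
    mempty = Stats 0 0

train :: [Int] -> Stats
train xs = Stats (length xs) (sum xs)

-- train is a monoid homomorphism:
--   train (xs ++ ys) == train xs <> train ys
-- so you can train on shards and combine the results.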
FWIW, I really enjoyed the article, and appreciated the example dataset -- usually this literature is quite dry. Do you know if there is any monoid documentation that enumerates the various properties as something like C# interfaces and then explains what the derived types are? I think that would help me and a lot of other people who don't know much abstract algebra or find it hard to connect with their day-to-day work.
> I'm pretty sure this isn't the idiomatic way to parse a CSV file in Haskell. It's just what I found from a quick Google search. I'm sure someone else could do better.
Isn't that the point, though? In Haskell it was so difficult to read a CSV file that you had to Google it. In Python it was immediately obvious how it should be done.
> How can anybody argue the Haskell version is easier to understand?
You're not comparing apples to apples here. This is the part of the code that does the CSV parsing:
Right rawdata <- fmap (fmap V.toList . decode True) $ BS.readFile "nukes-list.csv" -- decode is a cassava function.
The actual CSV parsing invocation and subsequent rendering to list is:
fmap (fmap V.toList . decode True)
The code of yours that does that is:
countries = dict()
with open("nukes-list.csv", 'rb') as f:
    csvfile = csv.reader(f)
    for row in csvfile:
        tmp = countries.get(row[0], list())
        tmp.append(int(row[3]))
        countries[row[0]] = tmp
As for fetching the actual subsets, I am not sure why he didn't write it this way:
let rows_from name = filter (\x -> x ^. _1 == name) rawdata
let list_usa = rows_from "USA"
let list_uk  = rows_from "UK"
-- ...
putStrLn $ "List of American nuclear weapon sizes = " ++
    show (list_usa ^.. traverse . _4)
Particularly since he was going to do the same operation over and over. But this is not Haskell; this is just the author not writing beautiful code and instead copy-pasting code from a Real Thing, probably while playing around with Edward Kmett's new lens library, which everyone loves but which everyone is also still coming to terms with.
Of course, you took a shortcut because you "knew" that you'd only be using the data indexed by country name. We didn't make that assumption in this code, but it'd be trivial to add.
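For what it's worth, the grouped-by-country version is about one line of Data.Map (a sketch against the same row tuples, not code from the thread):

import qualified Data.Map as Map

-- Hypothetical helper: sizes grouped by country, the analogue of
-- the Python dict-building loop.
byCountry :: [(String, String, String, Int)] -> Map.Map String [Int]
byCountry rows = Map.fromListWith (++) [ (c, [n]) | (c, _, _, n) <- rows ]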
> I'm sorry, but it's not at all obvious that code is parsing a CSV file.
What, exactly, is your complaint?
Could this fellow have written cleaner code? Sure, we agree. Are you saying this is some fundamental problem with Haskell? Because I'm trying to tell you it's not. For example, on that line a better practice would be to do exactly what your Python script did: import the CSV library qualified, so the call is labelled CSV.decode the same way yours is labelled csv.reader.
So let's refresh the comparison with the proper style and without the decision to convert to lists from vectors:
-- Haskell, cleaner but longer.
-- Many people prefer this style when dealing in IO.
-- Please note that 'readFile' here sucks nearly as
-- badly as your Python version's 'open' does.
-- They are terrible.
f <- BS.readFile "somefile.csv"
case CSV.decode True f of
    Left err     -> putStrLn err
    Right tuples -> do
        -- ...
# Python
with open("nukes-list.csv", 'rb') as f:
    try:
        csvfile = csv.reader(f)
        # ...
    except csv.Error as e:
        print e
Huh. Similar linecount, similar line density.
You might argue, "Why were all the extra fmaps there?" Well, ostensibly they're there to do what your code doesn't do, namely deal with error conditions (your code just ignores them). But of course the author's poor style throws that benefit away on the same line with the pattern match on "Right _", though we've already conceded the author's code could be cleaner.
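Spelled out, here's what the two fmaps are each doing (a sketch using the same old decode-takes-a-Bool cassava API as the thread's code; newer cassava takes a HasHeader value instead):

import Data.Csv (decode)
import qualified Data.Vector as V
import qualified Data.ByteString.Lazy.Char8 as BS

-- BS.readFile gives IO ByteString; decode True gives
-- Either String (V.Vector row). The outer fmap maps under IO,
-- the inner fmap maps under Either, so a Left error flows
-- through untouched rather than crashing the program.
parsed :: IO (Either String [(String, String, String, Int)])
parsed = fmap (fmap V.toList . decode True) (BS.readFile "nukes-list.csv")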
My point: You're blaming Haskell for the author's decisions. You're also crediting Python with your own personal familiarity. It is not Python that is natural and obvious; your many hours of hard work with Python have made that language's solution easy for you to perceive. Give yourself some credit, and stop pretending Python is somehow "naturally" clearer.
If you'd like to see examples of good, clean, fast Haskell code then they're easy to provide. I have a funny feeling that's not your goal here, though.
I wish I didn't agree with you. I'm still a Haskell novice, and I've enjoyed the time I've spent learning it, but this post was just terrifying. All I could think was how simple all that would be in R (or any statistical package), or even Ruby (or any popular scripting language).
Of course, one can write overly complicated code in any language. I think the purpose of the post was more to show off conceptually advanced techniques rather than to actually analyze the dataset in a straightforward manner.
> I think the purpose of the post was more to show off conceptually advanced techniques rather than to actually analyze the dataset in a straightforward manner.
Correct. I admit it turned out to be a bit too much for one post, but I wanted to use some real-world data (the nukes) to demonstrate the techniques.
Edit: I could be wrong, but I don't think R (or any other stats package) supports "group subtraction" of distributions, which is how we calculate the survivable nuclear weapons.
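The idea, in sketch form (simplified names, not the exact classes from the library): a group is a monoid whose elements have inverses, so "un-training" one distribution from another is just combining with an inverse.

-- Simplified Group class, not the library's actual API.
class Monoid g => Group g where
    inverse :: g -> g

-- e.g. all weapons `minus` destroyed weapons = survivable weapons
minus :: Group g => g -> g -> g
minus x y = x `mappend` inverse y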
The "cool stuff" is in the commutative diagrams at the end of the article. They show that there are many different ways to train a distribution from data, some of which will be more suited to different tasks. In most languages, we would have to program each of those paths separately. In Haskell, we get all of those paths "for free."
Haskell has a way of making the hard things easy and the easy things hard. The high-level, pretty stuff is great. The ugly stuff is, for example, getting what we want out of `rawdata`:
fmap (\row -> row^._4) $ filter (\row -> (row^._1)=="USA") rawdata
which is just saying: column 1 is the country, and we only want the rows that are "USA," and from those rows, we only want the fourth column. It would've been simpler if it were just arrays or lists, but that wouldn't be as powerful or as composable as lenses.
A better, more readable way would've been to not use those lambdas, since he's reusing the accessors over and over anyway:
let getKaboomieColumn row = row^._4
let trueIfUSA row = row^._1 == "USA"
let list_usa = fmap getKaboomieColumn $ filter trueIfUSA rawdata
...
When you read it that way, it looks a bit less intimidating.
Well, it's not exactly a simple task. There are a few simple ways to approach it, but isolating the various general structures and creating the maps between them is the bigger task.
I'm curious why you say that the code would be simpler in Magma or GAP? The post doesn't actually implement any group theory, it just applies it, which is exactly what you'd have to do anywhere.
Of course, it's possible that the internal code of the library would be simpler, but I'm a little skeptical.