Hacker News new | past | comments | ask | show | jobs | submit login
Translating Mathematics into Code: Examples in Java, Python, Haskell and Racket (might.net)
121 points by lelf on Feb 9, 2015 | hide | past | favorite | 48 comments



> Mathematics has no side effects.

I'm aware the author is a professor of CS, but this is a statement I always see people originally trained as programmers claim, and as a mathematician and a programmer I respectfully disagree.

Mathematics does what you define it to do, and if you can communicate your idea to another person it doesn't matter whether you "mutated" a variable or not. Likewise, you could say that there's no such thing as mutation in programs because the value of a variable in a language that supports mutation depends implicitly on time; you just don't express that in the source code. It's a matter of perspective, and more perspectives are often better than fewer. In math if you want to index a variable to have "pretend mutation" to avoid confusion you can, but there's no rule about it. If you feel the best way to communicate that idea is with mutation, then that doesn't stop it from being mathematics.


I don't think anyone, mathematicians or functional programmers, have a problem with mutation [0]. They have a problem with names suddenly changing without being too explicit about that fact.

In particular, mathematics certainly doesn't prohibit you from using whatever kind of notation you like to explain your concept, but it has suggested a certain primacy of the notion of a (pure) function. The reasons are both ones of mathematical "elegance" (whatever that means) and pedagogy. They're a fantastically important fundamental building block.

Mathematicians would never claim that pure functions are the only thing around, but by and large they'd assume that's what you were talking about given no other indication. Furthermore if you introduced something new (say something with a notion of mutation) then you'd merely be obliged to explain exactly what that new thing is with sufficient formality so that the reader can be assured that its behavior is completely reasonable.

All of this holds identically for functional programming. The only additional challenge is that FPers are accidental intuitionists so you can't merely name the behavior of a new kind of notation but must actually exhibit an interpreter of that new notation into more common notation. This is significantly more heavy-handed than most mathematicians would do.

[0] Or any other side effects. The whole argument is parametric in your choice of side effect.


From a mathematicians perspective, functional programming languages do a fairly bad job at being able to express algorithms in the way mathematicians have been writing them down since before computers existed. There is little difference between writing $x_{k+1} = f(x_{k})$ and $x = f(x)$ in an imperative language. In some cases the index k can even be identified as a multiplicator of some discrete timestep, in which case what the hardware is doing is in very close correspondence to the abstract iterative process.

All the operations in abstract algebra are actually destructive by default, if you do a multiplication, then that is a map

m : A \otimes A \to A

If you want a copy of the values you put in you have to pre- compose, with a diagonal map \Delta: A \to A \otimes A, which might even not exist (it doesn't if \otimes is the tensor product of vector spaces for example). Even application of a morphism on its own is treated as irreversible, once you've applied a morphism to an Object you can't get it back unless you are in a special category like Set, or the morphism happens to be invertible. The fact that functional programming languages hide copying behind your back does not make sense physically either, on a hardware level duplication is fairly expensive, because it decreases entropy.

There are countless examples in Computational Physics and Numerics, where pure functional programming languages fall flat on the nose. I'm thinking about finite-element methods with adaptive refinements, all the code for computer algebra and computational group theory, etc., not to mention all the Fortran/C/C++ code that is in existence in physics (At Cern they have written 60 million lines of it). It might be that the experts in functional programming languages simply don't have the expertise necessary to come up with viable solutions, because things like repa and accelerate are not realistic alternatives when it comes to real world applications.


I'm sadly not actually able to follow what you're getting at here.

I think there are a few threads going on that I'd be interested in following up on, whether imperative/mutable semantics are common in mathematics, whether it makes sense to call something in abstract algebra as "destructive", what copying means, whether analyzing computational physics/numerics is a similar task, whether functional programming experts are capable of building real world applied math tools...

Ultimately, though, I'm not sure which ones move the conversation along productively.

I think perhaps the most interesting one is the notion of copying. It is certainly the case that modern FP assumes copying is okay. This comes naturally from the "structural properties" of implication in type theory and there's plenty of work studying type theories which are "substructural" and thus don't assume an ambient copy/destroy comonoid. I don't honestly consider this out of scope of FP, although no languages implement it today. There is a need to pick a "default semantics sweet spot" and linear logic may have not yet proven it pays its way on the weight/power tradeoff spectrum. But maybe it will be shown to in time?

In the mean time you can do plenty of work in abstract algebra without talking about mutation or destruction at all. You mention category theory but ignore the prime dictum of CT: it's all about the arrows. Categories certainly exist which embody destructive internal algebras---CT would be pretty useless if it didn't---but externally we consider arrows and compositions of arrows alone. This is where the real power of CT comes from anyway.


Perhaps a few points of clarification are in order, my original comment had too many different threads mixed together. I will first try to expand on my remarks on copying and then try to respond to some of the things you said.

When you look at the categories in use by mathematicians a surprisingly large number of them are not cartesian, this is true in particular of the categories relevant to physics, where both the category of symplectic manifolds (relevant to classical mechanics) and Hilbert spaces (relevant to Quantum Mechanics) are monoidal and not cartesian. In both cases you therefore can not copy states.

Moreover in physics things are local in the sense that the dynamics of a physical system is usually determined by differential equations. That means there is typically no situation where you have to retain much of the state of the system for a long time in order to simulate its behavior.

Even for complicated things like PDE simulations, what you really do is approximate some infinite dimensional function space locally with finite dimensional vector spaces. The state space in each step of the simulation is some tensor product. So while a computer certainly allows you to do it, it isn't really natural to constantly copy state in such simulations. Abstractly a Fortran 77 program could probably described by a collection of morphisms in some traced monoidal category, so it doesn't seem too far off from the mathematics it was invented to implement, although ultimately it clearly is deficient.

From a certain angle a lot of abstract algebra is actually more or less about creative destruction of information, certainly if you use it as a tool to study number theory, or topology. A whole class of functors there are monoidal functors from cartesian to non-cartesian categories and you carefully adjust how much information the functor loses in order to make computations feasible (Homology vs. Homotopy for example). The reason mutation doesn't seem to be prevalent, is because a lot of operations that are studied are invertible. But as soon as you for example study a group operation on a space, that is not faithful, then you won't be able to recover the original space after you've applied the group operation once, unless you've copied it before.

The connection between entropy and copying is fairly well explained in this paper I stumbled upon a while ago: http://www.pipeline.com/~hbaker1/ThermoGC.html

Of course the origins of copying in type theory is that for example Gentzen's calculus can be properly thought of as living in some cartesian closed category, as you probably know. Personally I find linear logic interesting, especially because it is able to faithfully embed intuitionistic logic and lets you talk precisely about copying. The problem is of course that type inference is undecidable for linear type theory, if I remember correctly.

In terms of expressive power Haskell is probably already fairly close to the ideal, most relevant to the discussions is I believe its inability to express laws its type classes have.

It is fairly good at expressing "free" or "cofree" things, but is not so good at expressing laws statically, not to mention providing a mechanism for proving the laws for instances. The best thing you can do is write embedded interpreters or compilers for those free / cofree things, that sort out things at runtime, or use template haskell.


Broadly, as others have highlighted, you appear to be mixing modes between internal algebra and external algebra. That's fine and in agreement with my argument: "nobody cares about using [categories] with [side effects], they merely want you to be explicit".

The only intensifying bit is that you need some kind of ambient category to build your notion of categories internally to and this one, by long tradition, is typically cartesian.

Differential equations are a great example, but I'm going to use them to again intensify my point instead of yours ;)

Differential equations clearly involve a vital component of state. However, it is benign in its usage. In particular, an enormous amount of work done in diffeq (from designing the models, to analyzing them, to solving them) is done statically either via algebraic manipulations of state free equations or via analyses such as, e.g., Laplace Transforms.

The point is that diffeq equations, by being explicit about state (e.g. external to it) enable many useful interpretations. If diffeq were truly about state alone then you would have no other choices---if I handed you a differential equation then the only thing you could do would be to observe it as it is now or press play and watch.

Clearly that's an important POV, but equally clearly it's not the only one.

So, if your argument boils down to the fact that there are many topics of study which have destructive internal algebras then, yes I agree wholeheartedly!

I just am emphasizing that you need a place to study them externally because it's damn powerful. This external place is, by mathematical convention and the seductive power of cartesian closure, usually side-effect free.

In particular, if you want to dive down into the internal algebra of a side-effective category then you have monads to do that with.

---

Complete inference in any interesting type system stronger than HM is probably undecidable. Personally I don't actually see that as such a big problem. Replacing complete inference is a compiler which will, in real time, tell you everything it knows about relevant names in scope which is pretty powerful. Not being able to ignore types entirely is actually a good thing.

---

And w.r.t. laws and the like, I disagree that TH is much of an answer, would rather suggest that "interpreters and compilers" are an ideal solution and essentially the best we can do while retaining constructivity, and suggest you look closer at the dependently typed languages if you want to learn more about law expression in that kind of tool.


There is nothing destructive about the map M: A x A -> A.


In fact there is, take A to be the space of differential forms on some manifold, and M the wedge product. Given two differential forms eta, mu, there is no way to recover eta and mu from M(eta,mu). Even simpler take two arbitrary numbers multiply them, unless they were prime you have no way of telling from the product what the two original numbers were. In other words multiplication destroys information.

In the case of numbers this is fine because there is in fact a way of copying them beforehand by an operation Delta : A -> A x A, which sends a number a to (a,a), so if given (a,b) you want both their product and the numbers themselves, you need a map A x A -> A x A x A given by (for example)

(id x M x id) . (id x id x Delta) . (Delta x id)

In the case of differential forms, there is no such (natural) map Delta.


No, applying a map does not mutate or consume the parameters, and therefore no information is destroyed. Function application in mathematics is referentially transparent.

To make this point perfectly clear: Whenever you encounter an expression such as "f(x)", you may freely re-use the expression "x" in a different place. This is a matter independent from the category you choose to work with -- for example, the expression "psi \otimes psi" makes sense in the monoidal category of Hilbert spaces.


This largely depends on your viewpoint, the map M : A * A \to A was meant to be a morphism in an arbitrary monoidal category. An element like x, would be interpreted as a morphism x : I -> A (in general the objects A in a monoidal category, or any category for that matter have no notion of elements). The expression f(x) is interpreted as the composition f . x, so no you are not free to use x a second time, once you formed that composition. This is swept under the rug in most cases because monoidal categories like Set are cartesian and allow you to freely copy morphisms like x : I -> A, via the Delta map Delta : A -> A * A. Explicitly you have Delta(x) = Delta . x = x * x. In general it might not be possible to even determine if two morphisms x : I -> A and y : I -> A are the same.

In some cases you might be able to assume from the start that you have a certain number of equal morphisms x : I -> A "in reserve", for example in linear logic you have the operator ! ("of course") which gives you an arbitrary number of identical morphisms to play with.

In any case this distinction is quite subtle and I understand, why you might assume that I'm simply misunderstanding things. In particular I should emphasize that almost no programming languages work this way, although with some effort you would be able to recast typical cryptographic / numerical code in this language.

It is also really easy that to see for example in the case of addition that indeed information is destroyed, clearly the map

(a,b) -> (a+b,a-b)

has an inverse if the characteristic isn't 2, the subsequent projection to the first factor destroys information. Categories in which that is possible have maps d_A : A -> I.

What tel is pointing out above, is that Delta and d together form a comonoid.

While psi \otimes psi certainly makes sense, the map psi \to psi \otimes psi is not linear and therefore not a morphism (Physicists call this the no cloning theorem for some reason).


I understand where you are come from, but you are conflating different meta-levels (external vs. internal language/logic): You are absolutely free to use any formal expression "x" a second time, and - crucially - it will have the same mathematical meaning if you choose to do so, as can be seen by pondering over the tautology "P(x) <-> P(x)". This is not a property of the category that you chose to work with, but rather of the formal language that we use when engaging in mathematics, as you have demonstrated yourself when you had permitted that "psi \otimes \psi" made sense.

The non-existence of a morphism "psi -> psi \otimes psi" and the notion of "destroyed information" that you are discussing in the rest of your post is independent from all of this. If you wish I can elaborate on the "no cloning theorem".


Well ok, then this was a simple misunderstanding, or miscommunication. My knowledge is largely based on reading of http://ncatlab.org/nlab/show/internal+logic and several other pages on that wiki, plus graduate training in math and physics. Based on your last reply I'm confident you have a similar background. Logic and category theory are not my specialty and I probably haven't expressed myself as clearly as I would in professional communication.


To check if I understand the argument: I'm guessing you're talking about the internal logic or language of some category, in contrast to the external one used to talk about that category.


Right I think that was the point of misunderstanding between me and asuidyasiud. If you think of x as an assumption in linear logic for example, then there is a difference if you assume x or x^2 and so on. Since in programming languages such x : I -> A are values of type A, this then ties in with duplication and destruction I was going on about above. I tried to express the sentiment that in many ways it is actually natural to think of those kinds of logic as more fundamental, because it makes the duplication of values (unnatural for things like resources, spaces of state, etc.) explicit or even impossible.


What is your definition of "destructive"? A function with no inverse? Just trying to understand...


Yes, not injective essentially, which is not the same as not having an inverse. The notion can be extended though, for example functors that aren't faithful are certainly destructive .


Java has no functions and is the language which gets harped on the most, so this is obviously not the main issue.


One obvious example of a mutable variable is in math's very own expression of the for loop:

     n
    ∑  ƒ(i)
   i=m
This implies a mutating variable (i) and a hidden mutating accumulator that tracks the sum. It seems like a perfectly clear mathematical expression of the concept, though, in spite of the fact that it is fraught with 'unmathematical' mutation.

I can think of many mathematical algorithms which, when stated most simply, imply the mutation of state. Dijkstra's shortest path, for example, requires you to label graph nodes with distances and states which change with each iteration. You can implement it, obviously, on purely functional data structures, but doing so with mutable graph node structs isn't going to undermine the validity of your implementation of what is at core an iterative algorithm designed to be efficient in terms of the amount of additional data you need to store to run it on a given graph.


Dijkstra's shortest path: good point. There is obviously a purely functional implementation, but it's true that the most natural way of thinking about it involves assigning labels.

I disagree about the sum "loop". It's merely a notation, not an actual step by step operation. Unlike a traditional imperative language's variable, i's value cannot be changed. Whatever the actual expression of f, you know i is always the same within it. You could think of it this way: "for each value v from range m..n, introduce a new variable i at each step with value v, which we will use for the expression".


I disagree completely on both examples. The first I have a technical nitpick to illustrate a deeper point and the second I have merely a stylistic point.

Firstly, I disagree completely that sum notation implies a hidden mutating variable. The most obvious way to attack this point is to note that sum notation is often applied when the summing domain is infinite and therefore if it means anything constructive (and it easily can!) then it certainly cannot be interpreted as a mutable loop.

Instead, it's best to merely have it represent the well-defined notion of a sum and note that it can be interpreted in many ways. This interpretation could involve using an algebraic identity (as one must with infinities) or, in the exact case you have an ordered finite domain, be interpreted either as a mutating iterative summing algorithm or a recursive immutable one.

The recursive bit applies exactly and only when the mutating algorithm one does.

---

As for Djikstra's algorithm, Djikstra wrote it using the popular computer science notation of the time. Nobody denies that this formulation of the algorithm requires mutation to implement, but it's merely a formulation of it used for proof purposes. The idea of Djikstra's algorithm can be expressed without using a mutable language---it's trivial at its basest, merely replace mutation with a bunch of copies of your graph---and there's no particular reason to prefer either one!


As a further intensifying point, this is why Djikstra spent so long making sure he understood and communicating ALGOL and FORTRAN and the like so well. He wanted to explore new notations and needed to be absolutely clear how they worked.

This is similarly why Milner, Tofte, Harper, MacQueen completely specified the behavior of SML. [0]

[0] http://www.amazon.com/The-Definition-Standard-ML-Revised/dp/...


Mathematicians arguably do not think of the first example as a loop with a mutating variable, but rather as a short-hand notation for f(m) + ... + f(n) - as a macro, if you wish.

I agree with the second part.


I actually agree with your example. The point is that nobody should care whether one interprets the looping variable (and underlying sum) as mutation or as shorthand for a mutationless expression. It's pretentious and counterproductive to call one interpretation mathematics and the other not.


Right - my point was more in the context of the idea that to correctly translate that mathematical concept to code you are doing it wrong if you use mutation. If a mathematical algorithm you're implementing contains a summation, you're not fundamentally misrepresenting the underlying mathematical purity if you implement it as a for loop with an accumulator.


> Likewise, you could say that there's no such thing as mutation in programs because the value of a variable in a language that supports mutation depends implicitly on time; you just don't express that in the source code.

But this is precisely the problem with mutation and programming languages:

    ...some code...
    y = f(x)
    ...more code...
    z = g(x)
Is x in the two lines the same? Often it's not. This is what's surprising to a mathematician. It would help if the "time" was made explicit, i.e. if you had x(t1) and x(t2), in which case it's quite obvious you need to take a closer look. But in mainstream imperative languages there is no support for this, which is very much a problem.


I think what's surprising to a mathematician is if the fact that x can change is not stated prior to witnessing it. So you might say that immutability is a reasonable default for mathematics, but again not necessary as the author claims.

While I think programmers might find it surprising, in mathematical proofs context is often the primary tool one uses to figure out what the hell is going on locally in some expression. For example, indices like time and the size of a problem (in combinatorics) are often dropped because the reader understands that the context is asymptotic.


> I think what's surprising to a mathematician is if the fact that x can change is not stated prior to witnessing it

This is precisely what happens in languages without immutability / control of mutation. They effectively say "this may or may not mutate. See for yourself. Oops! It just mutated. Sorry!"

Also, isn't what you talk about in your example just a shorthand to avoid cluttering a proof or whatever? It's not that variables mutate; it's that by convention you elide the notation that shows they mutate. That makes a lot of sense: the writer and the reader assume some things and go on with a compact notation that eliminates noise. It's a useful convention, and if the reader loses track of the notation, the worst that can happen is that he/she won't be able to understand the proof.

The problem with computer programs is that it's about both the reader and the computer interpreting the program. Because variables may or may not mutate, it's harder for the computer to perform some optimizations. Because of the same reason, it's harder for humans to understand the code ("is this going to use the convention that it won't mutate variables? I have to carefully read the code and consider all paths in order to be sure").

In a sense, programming with uncontrolled mutation is like switching the default convention to its most confusing setting :)


I'm not trying to remark about what is or isn't good for program design. I actually like functional programming. I'm criticizing the author's claims about mathematics which are clearly a justification for his views.

If you're reading a paper with theorems and proofs and algorithms and the author says at the beginning that the algorithms in the paper will be described using mutation (or even if the reader has to figure this out on the fly) then I argue you can't use this as a reason to say what you're reading somehow isn't math. You can criticize it if you think it's unclear, sure, but it's still math.


>> the value of a variable in a language that supports mutation depends implicitly on time

Mind blown.


What blows my mind is that variables in C implicitly depend not only on time but also on the entire contents of memory!


And also depend on all the contents of disk and network and all attached input devices.


And if things are multithreaded then shared memory variables depend implicitly upon all possible actions of all possible simultaneous threads.


Welcome to FRP


I always justify immutability by the fact that I like my assumptions not to change. I expect mathematicians to enjoy that property too.

ps : FP expresses changes as new instances with new parameters. Pushing away mutation as far as possible.


>If you feel the best way to communicate that idea is with mutation, then that doesn't stop it from being mathematics.

No, but it does stop it from being mathematics as commonly taught, written and practiced in mathematical notation, which for all intents and purposes its the same.

That is, you're technically correct, which is the worst kind of correct.


This:

    public Node(SortedSet<T> left, T element, SortedSet<T> right) {
      this.left = left ;
      this.right = right ;
      this.element = element ;
Isn't really mathematics. Nodes have a label. Any record attached to the node is business logic. It doesn't matter if the tuple is (first, last, address, department) or (value left, right), these are not part of the mathematics. Sure (value, left, right) looks like math, but it's just a convenient place to cache the edges describing a particular form of graph.

A graph is defined mathematically G = {V, E}. A single vertex with no edges is a graph. A set of vertices with no edges is also a graph. These may or may not be interesting, but edges are dependent on vertices. The converse is not true.

This example with Node(value, left, right) is standard object oriented fair. It's worse than Car(model, manufacturer, color, year). The Car simplifies something complex. Node(value, left, right) makes something simple more complicated. It's an implementation not an abstraction.


The article seems to be not-so-subtly hinting that Java is an anti-math programming language, and I agree. The language encourages each developer to reinvent and reimplement key concepts in terms of "classes" that are much more succinctly described in languages that embrace abstract algebra. The example of using inheritance and dynamic dispatch to represent a union type, is a perfect example.


Union types is not a key concept in abstract algebra, I bet that most mathematicians have never even heard of it. I'd like to hear about its use in abstract algebra, how would you define the union type of the simplest of simple objects without breaking its operation: a group with a group?


The concept is called a coproduct. The coproduct of two sets is the disjoint union. The coproduct of two abelian groups is also called the direct sum [edited], and the category of general groups does not have coproducts. The coproduct of two types in the category of types is exactly the union type. The idea of an Abelian category (basically, a "nice" category) requires the existence of coproducts, which I think is widely used enough in universal algebra to be considered a key concept.

I wrote a long (and unfinished) series about how to interpret category theory with programs on my blog [0], and in [1] I cover universal properties and the coproduct, which describes how you would formulate a "union type" in any category.

[0]: http://jeremykun.com/ [1]: http://jeremykun.com/2013/05/24/universal-properties/


I didn't say that union types are a key concept in abstract algebra, and I'm not a mathematician by any means, although I vaguely understand that a disjoint union is a thing in set theory.

What I'm saying is that languages that do embrace concepts from abstract algebra, are much more expressive than languages that eschew these concepts (like Java).


That would be a groupoid - http://en.wikipedia.org/wiki/Groupoid


No, groupoids breaks the group operation and thus behaves like neither of the groups. Do type systems allow operations between union types? If not then it is just a simple interface in OO. If they do then you are no longer working with pure functions.


Node isn't a graph here. Not sure what it is, at is like a "cut" but is rather underspecified. But it's not a bad model of... Whatever it is modeling.


I think it's supposed to be a set, but as modeled by a red-black tree. So a graph, but not a general graph and with more metadata.


The article doesn't mention matrices, which map to multidimensional arrays. One of the basic features of FORTRAN, all the way back to the original version for the IBM 704 in 1954, was multidimensional arrays. Yet C, C++, Go, and Rust all lack them. They all have "arrays of arrays", or some hack involving macros and templates and a multiply for every array access. (Yes, there's a sort of multidimensional array capability in C99, but it is not used much.)

For Go and Rust, this subject has been discussed.[1][2] In both cases, the participants in the discussion got tangled up with how slicing syntax would interact with multidimensional arrays, and ended up doing nothing.

The number-crunching community is mostly using Matlab/Octave, and Python with NumPy.

[1] https://groups.google.com/forum/#!searchin/golang-nuts/Multi...

[2] http://internals.rust-lang.org/t/difficulty-with-rfc-439-and...


Maybe my brain isn't fully operational yet this morning, but I feel like this is backwards:

> X ⊂ Y.

> This subset relationship could be represented as inheritence in a language like Java or Python, if these sets are meant to be types:

> class X extends Y { ... }

If X extends Y, then isn't X a superset of Y, not a subset? I feel like it should be: class Y extends X { ... }


Mathematical constructions can be difficult to interpret as code or visualize. I have tried quite a few:

http://monsieurcactus.github.io/circles/122514-Lima%C3%A7ons...


I wont believe in functional programming as long as not even mathematicians prefer it.




Consider applying for YC's Spring batch! Applications are open till Feb 11.

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: