Kind of off topic, but I am curious: is this one of the advantages of functional programming - the notion that you can prove or disprove certain things about the code and therefore optimize the compiler, based on such proofs, to your heart's content?
Indeed, the more advanced a type system you have, the more you can optimize. A nice example of this is the zip function.
In Haskell:
zip :: [a] -> [b] -> [(a, b)]
zip [] _ = []
zip _ [] = []
zip (x:xs) (y:ys) = (x,y) : zip xs ys
Note how each iteration needs to check both lists for emptiness.
In a more advanced type system (or using more advanced type hackery in Haskell) you can have length-indexed lists. That is: lists that have two type parameters or "indexes" instead of one. One is, as usual, the type of the elements inside the list; the extra one is a natural number indicating the length of the list.
So the function zip becomes something like:
zip :: List N a -> List N b -> List N (a, b)
zip [] [] = []
zip (x:xs) (y:ys) = (x,y) : zip xs ys
The compiler checks the code and verifies, at compile-time, that both the input lists and the resulting list are all of the same length. This also means that you only need one runtime emptiness check instead of 2. If the N is concretely known at compile-time, you may need 0 (though that case might be caught by inlining/loop unrolling optimizations anyway).
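To make that concrete, here is a minimal sketch of how length-indexed lists can be written in today's Haskell using GADTs and DataKinds (the Vec/vzip names are just illustrative):

{-# LANGUAGE DataKinds, GADTs, KindSignatures #-}

-- Type-level natural numbers.
data Nat = Z | S Nat

-- A list that carries its length in its type.
data Vec (n :: Nat) a where
  VNil  :: Vec 'Z a
  VCons :: a -> Vec n a -> Vec ('S n) a

-- Only the equal-length cases exist; GHC can verify this match is
-- exhaustive, so the mixed empty/non-empty cases never arise.
vzip :: Vec n a -> Vec n b -> Vec n (a, b)
vzip VNil         VNil         = VNil
vzip (VCons x xs) (VCons y ys) = VCons (x, y) (vzip xs ys)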
In an unsafe language like C or C++, you would just assert that the two containers are the same length and use parallel iterators. The type system here lets you optimize away some of the overhead of using a safe language, but it doesn't let you write code that is more optimized than the obvious unsafe code, at least in this example.
Also, in this example you're using a singly linked list, which is usually a performance loss compared to a more appropriate data structure. CPUs are designed to favor contiguous arrays.
> In an unsafe language like C or C++, you would just assert that the two containers are the same length and use parallel iterators. The type system here lets you optimize away some of the overhead of using a safe language, but it doesn't let you write code that is more optimized than the obvious unsafe code, at least in this example.
Well, you can choose any two of: 1. Fast (less runtime overhead), 2. Safe (cannot crash), 3. Simple types (none of the advanced type hackery). Many choose 1+3 or 2+3, but advanced types let you choose 1+2, which in many cases is remarkable; it is somewhat surprising that this is even possible.
Also note that runtime checks, in many cases, do not give you safety. For example, normally, you have an array indexing operation like:
readArray :: Array a -> Int -> IO a
Ignore the "IO" if you are unfamiliar with Haskell's IO type. This is unsafe whether or not you have index bounds checking. A runtime check will only convert one kind of error (corruption/segfault) to another (runtime exception).
In these (common) cases, your only option for getting safety is advanced type hackery. Something like:
index :: (size:N) -> Array size a -> Fin size -> a
where Fin N is the type of integers in the range 0 to (N-1).
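A rough sketch of the same idea in GADT-style Haskell, using a length-indexed vector in place of a primitive array (names are illustrative):

{-# LANGUAGE DataKinds, GADTs, KindSignatures #-}

data Nat = Z | S Nat

-- Fin n: a number that is provably less than n.
data Fin (n :: Nat) where
  FZ :: Fin ('S n)
  FS :: Fin n -> Fin ('S n)

data Vec (n :: Nat) a where
  VNil  :: Vec 'Z a
  VCons :: a -> Vec n a -> Vec ('S n) a

-- Total lookup: the index is in bounds by construction, so there is
-- no bounds check and no possible out-of-bounds failure at runtime.
index :: Vec n a -> Fin n -> a
index (VCons x _)  FZ     = x
index (VCons _ xs) (FS i) = index xs i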
> Also, in this example you're using a singly linked list, which is usually a performance loss compared to a more appropriate data structure. CPUs are designed to favor contiguous arrays
Nothing about this type hackery is specific to singly linked lists, and advanced type hackery is applicable to pretty much any data structure. Also note that the singly linked lists used in zip may not actually be represented by pointer-chasing singly-linked lists. They may be "fused" together into efficient loops that process the input directly.
> Well, you can choose any two of: 1. Fast (less runtime overhead), 2. Safe (cannot crash), 3. Simple types (none of the advanced type hackery). Many choose 1+3 or 2+3, but advanced types let you choose 1+2, which in many cases is remarkable; it is somewhat surprising that this is even possible.
I don't think it's really that surprising that it's possible. Most programs don't rely on any deep mathematical properties for their correctness, so the ability to formalize a proof that they are correct and encode it in a type system doesn't surprise me at all. If it is possible to do this on a large program without maintenance or scalability problems, then I will be more impressed.
You also present a false trichotomy. Haskell is not actually a completely 'safe' language; it still has exceptions that can cause a program to "go wrong" and terminate unexpectedly. You can write 'fast' code with it, but it isn't as fast as what I could write in a lower-level language.
> Nothing about this type hackery is specific to singly linked lists, and advanced type hackery is applicable to pretty much any data structure. Also note that the singly linked lists used in zip may not actually be represented by pointer-chasing singly-linked lists. They may be "fused" together into efficient loops that process the input directly.
As the data structures get further away from algebraic data types, the type hackery gets more and more involved. If you start working with arbitrary mutable data structures that arise in practice, the type hackery becomes a topic for a PhD thesis rather than something usable in practical programming today.
I don't know of any compiler that actually does a reasonable job converting uses of data structures like lists and association lists into arrays and hash tables in general. There are compilers (like GHC) that handle special cases but fall over beyond that. I think this falls into the territory of the "sufficiently smart compiler" fallacy. Also, making data contiguous is just the simplest of a common set of data representation optimizations. Does any compiler automatically convert a list into an intrusive doubly linked list or eliminate the use of a visited stack for DFS by stealing a bit from vertices or reversing pointers?
> Haskell is not actually a completely 'safe' language; it still has exceptions that can cause a program to "go wrong" and terminate unexpectedly. You can write 'fast' code with it, but it isn't as fast as what I could write in a lower-level language.
I never meant to imply Haskell was a "completely safe" language. That is an impossibility; even totality does not imply complete safety. My zip example is actually a point against Haskell (as length-indexed lists are not yet actually in use in the Haskell ecosystem).
Haskell can encode low-level programs that are almost C-level, and in that (ugly) style it can probably reach the performance you can get with lower-level languages (excluding hand-optimized assembly, perhaps).
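For instance, a strict tail-recursive loop such as this sketch is typically compiled by GHC with -O2 into a tight loop over unboxed machine integers:

{-# LANGUAGE BangPatterns #-}

-- With -O2, GHC unboxes the strict accumulator and counter and turns
-- this into a simple counting loop: no lists, no laziness involved.
sumTo :: Int -> Int
sumTo n = go 0 1
  where
    go !acc !i
      | i > n     = acc
      | otherwise = go (acc + i) (i + 1)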
> As the data structures get further away from algebraic data types, the type hackery gets more and more involved. If you start working with arbitrary mutable data structures that arise in practice, the type hackery becomes a topic for a PhD thesis rather than something usable in practical programming today.
Mutable data structures do not have to "arise in practice". You can have pure semantics with mutable performance (e.g. Clean's uniqueness types, or Haskell's ST).
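A small illustration of the ST point (a sketch; sumST is just an example name):

import Control.Monad.ST (runST)
import Data.STRef (newSTRef, readSTRef, modifySTRef')

-- Pure from the outside (no IO or ST in the type), mutable on the
-- inside; runST guarantees the mutation cannot escape.
sumST :: [Int] -> Int
sumST xs = runST $ do
  acc <- newSTRef 0
  mapM_ (\x -> modifySTRef' acc (+ x)) xs
  readSTRef acc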
Lots of people's PhD theses are in actual use in practical programming today. A PhD thesis often discovers a technique and makes it accessible for real-world programs today.
> I don't know of any compiler that actually does a reasonable job converting uses of data structures like lists and association lists into arrays and hash tables in general. I think this falls into the territory of the "sufficiently smart compiler" fallacy.
Converting these is not a good idea, because you would change the complexities of the code (which is, IMO, the heart of the fallacy). A prepend to an a-list or a list is O(1), but to an array or hash table it is (in the worst case) worse.
But what Haskell's primary compiler, GHC, does do is fuse together list processing such that lists can disappear altogether into efficient loops.
As for arrays and hash tables, these are not amenable to efficient "pure" modification (unless you use uniqueness types as in Clean), but they have reasonable alternatives that are pure and persistent: Sequence allows O(logN) indexing, O(1) amortized (O(logN) worst-case) prepend/append. Tries and search trees allow for quick lookups and quick pure modification. These are easy to reason about.
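For example, with Data.Sequence from the containers package (a brief sketch):

import qualified Data.Sequence as Seq
import Data.Sequence (Seq, (<|), (|>))

example :: Seq Int
example = 0 <| (Seq.fromList [1, 2, 3] |> 4)  -- prepend and append, both O(1) amortized

third :: Int
third = Seq.index example 3                   -- O(log n) indexing, here yielding 3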
> Also, making data contiguous is just the simplest of a common set of data representation optimizations.
This conflicts with O(1) prepend. If you want contiguous allocation, you can use different data structures.
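If you do want contiguity and can live with O(n) prepend, something like an unboxed vector from the vector package fits (a sketch):

import qualified Data.Vector.Unboxed as VU

-- A flat, contiguous, cache-friendly array with O(1) indexing.
-- The trade-off: prepending requires copying, i.e. O(n).
squares :: VU.Vector Int
squares = VU.generate 10 (\i -> i * i)

fifth :: Int
fifth = squares VU.! 5   -- 25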
> Does any compiler automatically convert a list into an intrusive doubly linked list or eliminate the use of a visited stack for DFS by stealing a bit from vertices or reversing pointers?
If you need to remove/insert items in the middle of a list, you would probably not use a linked list. So a doubly linked list is not a good optimization to apply automatically.
If your point is that code with simpler mathematical semantics that is easy to reason about might cause a constant hit on performance in some cases -- then I agree. You cannot always have simplicity, verifiability, and speed. But often you can.
Also note that purity affords many optimizations that are unavailable in non-pure languages (e.g. rewrite rules like: map f . map g --> map (f . g)).
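GHC even lets you state such rewrite rules directly with a RULES pragma; a sketch of the map/map rule (the base library achieves the same effect through its list-fusion rules):

{-# RULES
"map/map"  forall f g xs.  map f (map g xs) = map (f . g) xs
  #-}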
O(1) prepend doesn't conflict with good contiguity at all. A deque is a useful data structure with (possibly amortised) O(1) prepend and append, O(1) indexing and excellent contiguity.
This is interesting, but the two versions of zip you present are not equivalent. The standard version can handle lists of different lengths; it effectively truncates the longer list to the shorter one's length. Can you get this behavior in the second version and still have just one emptiness check?
I would guess that the truncation of the ordinary zip is not because it is useful in that context, but because it makes the zip function total.
If you wanted truncation and had no type-level knowledge about a relationship between the list lengths, you would explicitly compose list truncation of the longer list with the zipping.
But if you have some sort of known relationship between the lengths of the lists, not just equality of the lengths (as specified above), you can avoid the check.
For example:
zipTruncate :: List (1+ N) a -> List N b -> List N (a, b)
zipTruncate _ [] = []
zipTruncate (x:xs) (y:ys) = (x, y) : zipTruncate xs ys
This also has just one check, and is verified safe by the compiler, and truncates the single extra element from the first given list.
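A self-contained GADT sketch of the same thing, to show it really does typecheck (Vec and the names are illustrative):

{-# LANGUAGE DataKinds, GADTs, KindSignatures #-}

data Nat = Z | S Nat

data Vec (n :: Nat) a where
  VNil  :: Vec 'Z a
  VCons :: a -> Vec n a -> Vec ('S n) a

-- The first vector is one element longer than the second. The compiler
-- accepts these two clauses as covering all cases; the mismatched
-- empty/non-empty combinations cannot arise.
zipTruncate :: Vec ('S n) a -> Vec n b -> Vec n (a, b)
zipTruncate _            VNil         = VNil
zipTruncate (VCons x xs) (VCons y ys) = VCons (x, y) (zipTruncate xs ys)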
With dependent types, you can even say something like:
zipTruncate :: List (someFunc N) a -> List N b -> List N (a, b)
But I am no expert on dependent types, and do not know what can/cannot be done with application of functions at the type-level.
> I would guess that the truncation of the ordinary zip is not because it is useful in that context, but because it makes the zip function total.
I used to think the same way, but then I ran into a couple of cases where it was actually useful (it was in Python, where zip has the same behavior).
> If you wanted truncation and had no type-level knowledge about a relationship between the list lengths, you would explicitly compose list truncation of the longer list with the zipping.
Yes, but I guess that would be less efficient than the standard version of zip. You would at least have to check which list is longer, no?
> But if you have some sort of known relationship between the lengths of the lists...
Special cases often enable better optimizations, but I'm asking about the general case.
So I guess the answer is that one can't use these types to implement the standard zip. On the other hand, if you assume the lists are of equal lengths then you can avoid the extra check in standard Haskell:
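-- (a sketch of the omitted snippet: only the matching-length cases are handled)
zip' :: [a] -> [b] -> [(a, b)]
zip' []     []     = []
zip' (x:xs) (y:ys) = (x, y) : zip' xs ys

If the lengths are different this will give an error, although it's a runtime error so it's not as good as your version.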
> I used to think the same way, but then I ran into a couple of cases where it was actually useful (it was in Python, where zip has the same behavior).
Then there's little reason not to explicitly truncate...
> Yes, but I guess that would be less efficient than the standard version of zip. You would at least have to check which list is longer, no?
A simple function like:
truncate :: List N a -> List M a -> (List (min N M) a, List (min N M) a)
composed with zip, can be fused into the same function as a truncating zip -- there is no reason for it to be more expensive.
> Special cases often enable better optimizations, but I'm asking about the general case.
Note these same special cases will not yield any optimizations if you cannot encode that special case into the type.
> If the lengths are different this will give an error, although it's a runtime error so it's not as good as your version.
Actually, not having the pattern match for the empty/non-empty and non-empty/empty cases does not mean there is no runtime check.
You pay both in partiality (some inputs will fail at runtime) and in speed (failed pattern matches still get checked at runtime). A dependently-typed zip gives you both of these back.
The relevant element of FP is statelessness. That said, statelessness could easily be considered a leaky abstraction because, at some point, the machine you are working on has state. Your machine has CPU registers and memory and such, so the entire "statelessness" of FP is subject to the underlying system-level implementation.
> The relevant element of FP is statelessness. That said, statelessness could easily be considered a leaky abstraction because, at some point, the machine you are working on has state. Your machine has CPU registers and memory and such, so the entire "statelessness" of FP is subject to the underlying system-level implementation.
It is only leaky if you are exposed to the statefulness of the underlying machine.
Haskell's pure code does enjoy the abstraction of purity without exposing the non-pure primitives from which it is built. Haskell's non-pure code is marked as such, and is not supposed to be abstracted away from the nature of the underlying machine. I'm not sure what abstraction leak you're referring to. Can you give an example?
More generally, the fact that execution time and memory usage (tail recursion!) sometimes depend on how well the compiler can turn your stateless algorithm into a stateful one, in ways that don't make sense from the perspective of the abstract machine.
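For example (a sketch): whether the obvious accumulating loop runs in constant space depends on GHC's strictness analysis, which the pure semantics says nothing about:

import Data.List (foldl')

sumLazy, sumStrict :: [Int] -> Int
sumLazy   = foldl  (+) 0   -- may build a huge chain of thunks without -O
sumStrict = foldl' (+) 0   -- explicitly strict: constant space either way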
Debug.Trace explicitly breaks referential transparency; its entire purpose is to expose the evaluation order that the abstraction hides. It's a tool for penetrating the abstraction, much like a debugger. But the existence of such debugging tools does not mean the abstractions themselves are leaky, because these tools do not leak the abstraction as a side effect; leaking it is their only purpose.
Performance analysis of pure algorithms is not a leak of the abstraction, if you consider that the abstraction relates to the semantics, the result of the algorithm, and not to how many resources it takes to execute.
If you wish to know the resource use of your pure code, then none of the nice abstractions hold anymore, but most of the time, you don't need to.
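For reference, the Debug.Trace escape hatch mentioned above looks like this (a small sketch):

import Debug.Trace (trace)

-- trace prints its message when (and only when) the value is forced,
-- deliberately exposing evaluation order; a debugging tool, not part
-- of the pure semantics.
fib :: Int -> Int
fib n = trace ("fib " ++ show n) $
  if n < 2 then n else fib (n - 1) + fib (n - 2)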
I don't think what you're saying is intrinsic to FP. There are static analysis tools for non-functional languages. At the end of the day, everything is getting turned into an abstract syntax tree, so style should be irrelevant.
Do you mean pure (side-effect free) functional languages?
It's more an advantage of very specifically designed languages like Haskell, some of which happen to be functional because the lack of side effects is part of what lets them be specified in so much detail.
To a large extent, the ability to optimize stems from a static and very detailed type system.
In Haskell you happen to know about statelessness from the types, which is an additional bonus. But other statically typed languages like Eiffel are also pretty well optimizable.
Is it just me, or is LLVM's idea that "If arithmetic on an 'int' type (for example) overflows, the result is undefined... For example, knowing that INT_MAX+1 is undefined allows optimizing "X+1 > X" to "true"." incredibly stupid? Who said that an undefined number is bigger than a defined one??
It is not that the result of the operation is undefined; the operation INT_MAX+1 itself is undefined (actually the entire program becomes undefined when you do it), so the compiler can do whatever it pleases. According to the C standard, INT_MAX+1 might as well lead to formatting of your hard drive. So, it is safe to assume that x != INT_MAX and that x+1 > x for all x, because if x = INT_MAX your program is undefined, so the compiler can do whatever it wants, including setting it to true or outputting an error or formatting the hard drive.
This kind of logic is the exact cause of the optimization problem described. We all know that in practice MAX_INT+1 is negative, and that many "non-standard" programs depend on it, but still, the optimizer sticks to the C standard that no one follows just because it gives it a nice optimization opportunity. Bad.
No, people like you are the cause of the problem described. The whole point of the article is that "we all know that in practice MAX_INT+1 is negative" is wrong.
Apologies for the confrontational tone, but someone making that kind of mistake out of ignorance is understandable. Insisting on wrong thinking after it has been explained in detail why it's wrong, isn't.
If you can't live with the fundamental design principles behind C/C++ then the correct approach is to not use them. Don't ask the compiler writers to go against those principles.
"Bad"? You use C because you want speed above all else. If you don't know the language and need hand-holding, don't use C. Compilers are going to optimize C -- that's why it usually runs so fast.
This is bad because the optimizer ignores the imperfect reality of the big crowd of non-standard programs. It actually punishes non-standard programs (and probably the majority of programs out there are not 100% standard). So it is a bad and patronizing optimization.
How many programs are there that depend on SIGNED overflow wrapping around to negative? I can't think of very many - the only places I've seen potential signed overflow that wasn't a bug were when people were checking for overflow.
In those cases, just use unsigned math to do the checks.
So how about a hypothetical architecture that instead of using two's complement, stores negative numbers with a negative flag bit for some reason? Would MAX_INT+1 still be negative on such an architecture?
"We all know that in practice" type assumptions are exactly what make code difficult to port and/or maintain and developers should really stay away from them. Or at least, if you really want to make such assumptions (for example, for performance reasons), make sure that that (1) you know exactly what you're doing and (b) test for their validity at compile time or runtime and provide fallback code that always does the "right" thing.
the way to write good solid code is to use more than 2 compilers, and optimisation etc. i mostly write code that runs on at least 3 different os, and 2 different hardware platforms. ... and it just works. no prog-lang can be faulted for bad programming using one. learn the lang, and use it. :-)
gcc still gets the pointer dereference and assignment to what a pointer points to, due to deferring assignments. these edge cases and idioms of any lang need to be looked at with academic eyes. :-)
yes, random bits flipped on good input. yes, i run anything that can help me and covers the problem. nowadays, 32 and 64 bit are what i write code for. oh, the days of 16bit ints on win3.11 for workgroups are gone. :-)
Instead, consider it from the point of view of probability.
Even if the probability of any single programmer writing an exceedingly stupid piece of code is vanishingly small, the number of programmers writing new code daily is so big that mistakes of vanishingly small probability end up being quite common indeed.
To give a real example: consider that Apple undoubtedly employs only the finest programmers for their OS group, and yet every iOS release is jailbroken quite quickly due to a piece of code that had a security bug.
Even within the much smaller scope of a single application, all web browsers from Microsoft, Apple, Mozilla (and sometimes Google) are regularly compromised via security bugs despite huge investments from all those companies in security.
Toss in the fact that many programmers actually like doing things in unnecessarily clever or novel ways, and inevitably the clever code is more likely to be buggy than boring code.
Bugs and crappy code are a statistical certainty, especially in languages like C and C++ that give you so many ways to shoot yourself in the foot.
Many of the strange things that happen in C language edge cases come about from macro expansion or code generation.
Unrelated to the issue, but the superfluous parentheses added in the second example are interesting. I think every C coder has a level of comfort with the 15 sets in the operator precedence rules, and beyond that they throw parentheses at it. I'm fine with the rules until I use a <<, >>, &, ^, or |[1]. Then I drag out the parentheses.
[1] I swear the positions of the ones I named are placed alphabetically rather than in relation to their mathematical function.
Working within the "contrived example" framework, it would have made a little more sense if buffer were a pointer rather than an array, in which case you wouldn't be able to use "sizeof".
Precisely. This optimization is possible because, according to ANSI C, a pointer is allowed to point to an address anywhere within an object, or at most one element past its end. Anything else is undefined behavior (what they refer to as an "illegal pointer" in the post).
Therefore, buffer through buffer+1 would be valid pointers, but unless they can prove that i-128 is <=1 they can safely assume that the final pointer can be more than 1 past the buffer.
If it was just a pointer (not an array) they could do the same analysis if the pointer was mallocced right then in the same function with a constant size. More likely though, the pointer would be a function argument and then they would not know where the object boundaries were without doing interprocedural analysis (which I assume is outside of the scope of this post) and some other additional propagation (so technically within the C99 spec, but I doubt you would see an optimization that actually would do that).
My thoughts exactly. I figured maybe there was some other crap in the ... that necessitated the convoluted 2nd part of the || statement, but I couldn't think of anything that would.
The OP has since changed the article but IIRC `buffer` was an array of `int`s, in which case the `sizeof` operator would return 512. Generally you'd want `(i >= sizeof(buffer)/sizeof(buffer[0]))`.