My experience when trying to give such tools to data scientists is that they can't use them effectively and will attempt to circumvent the idea of vector programming whenever they can get away with it.
They will also often write code that doesn't do what they really want but has the appearance of doing so.
Looks like a pretty neat reference to have when dabbling in J as well, which I prefer over Kona for several reasons, including that I can run it on Android.
I have a somewhat negative experience with Kona and its derivatives Kerf and Kerf2.
For example, among Kerf and Kerf2: - there is no explanation about why two versions exist; - the code does not build; - it has memory leaks; - both are abandoned; - there are dubious performance claims, but when I asked the author to clarify, he closed issues without answering.
The same experience with the K community: https://github.com/kparc
They maintain a cult-like impression of their software, and the examples are typically represented by a few code dumps with no clear way of using them.
That's not to dismiss the overall idea of array programming languages, which I admire.
It is a problem with this community. Until recently kdb+ was like this with barely any documentation. They've come a long way in recent years though on the website and there is at least one book published.
With regards to Kerf, I don't know much of the history. I think the Kona developer (Lawler?) and Scott Locklin tried to make a company like kdb+, but it never worked out.
I somehow doubt that is the actual URL, and doubly so since it's 404. I tried URL surgery into http://www.finnapl.fi/texts/Idiom.htm but that was also 404 and searching for "FinnAPL blue book" didn't cough up anythng
This is funny, I copy pasted the URL at the time I was writing and didn't notice that it spelled "Idiot". Anyway, that link is now dead and tosh and moonchild provided good alternates
Note that this is about K, the APL derivative language for the kdb database software.
Not to be confused with K, the rewrite based semantic execution language built by the University of Illinois Urbana-Champaign and now primarily built and maintained by Runtime Verification.
It should also be noted that the k4 implementation has evolved since Arthur left/sold to First Derivatives. Kdb 4.1 release touts some nice additions to q [1](syntactically I'm not sure if this cascades into underlying k... I don't have that depth of knowledge)
I searched in the book for the passage the GP mentioned. It is on page 2, the 6th step in solving an interview problem. The steps are outlined in the book, and the code snippet for each listed. Here they are:
1- Scanning an array x and finding the running maximum is expressed by |\x (read “max scan of x”), the highest altitude to the West.
2- Reversing an array is achieved by inserting a | (“reverse”) before the array. ||\|x (“reverse max scan of reverse x”) will give us the highest altitude to the East.
3- Comparing two vectors element-wise is done with the & (“min”) operation, this is the water level.
4- The element-wise subtraction is -. Adding parentheses to ensure correct sequence of evaluation, and brackets to make it a function that takes an argument, results in {((|\x)&||\|x)-x}. This is a function that returns the height of the column of water at every point.
5- Using this function, we can easily get derived results, for example for the maximum height we can use |/ (“max over”), resulting in |/{((|\x)&||\|x)-x}.
6- For total water we can use +/ (“sum over”), resulting in +/{((|\x)&||\|x)-x}
Maybe you find this satisfying, I find it horrendous.
Your Python solution is very deeply coupled with this specific problem, more than half of the code consists of names taken directly from the problem description. I don't think it is equivalent with the K solution. You could add names in a similar manner to it and they would be more equivalent.
For me a nice thing about the Iverson languages is that idiomatic programs tend to be mathematically general and explain the problem in more general terms than a naive, immediate solution in a more popular scripting language.
Doesn't work on empty arrays, whereas the K does and gives the reasonable result of 0 (for total water) and -inf for maximum height. Maybe it doesn't matter, but sensible defaults are a good thing!
Edit: didn't look at the error closely enough to see that the only thing that breaks is the maximum height. That's not that bad, but still not ideal imo.
The result may be reasonable, but this is just a fluke.
Numpy complains that: "ValueError: zero-size array to reduction operation maximum which has no identity".
This error makes sense. The only meaningful result for the maximum of an empty array would be minus infinity. It should not be zero. If it's zero, it means the programming language makes the silent assumption that the elements are non-negative.
I very much prefer to have the code throw back a meaningful error at me than a reasonably looking result, and then fail silently in a different situation.
Would you agree with the statement, "the maximum of an array should be contained in that array?" That seems like a reasonable constraint that most languages uphold.
No. I think the maximum of (x:xs) should be equal to max x (maximum xs). (Using haskell notation). Logically this implies maximum [] should be -infinity if it is defined. On finite lists, maximum is also the same as the supremum, and on infinite sets the supremum doesn't necessarily have to be contained in the set. So there's at least some precedent for not having the maximum in the list.
Going by your definition, max (x:xs) = max x (max xs) = max x (max (head xs) (tail xs)) = ... ad infinitum. The basic problem with that definition is that it only really defines max for non-empty lists, otherwise the pattern-matching against (x:xs) will throw an exception.
In mathematics maximum and minimum of a set are elements of said set. So the empty set has neither a maximum nor a minimum.
There is another pair of terms that obeys the rules you want max and min to obey: supremum and infimum: <https://en.wikipedia.org/wiki/Infimum_and_supremum>. And indeed supremum of the empty set is -inf. And infimum of the empty set is +inf. That’s because supremum and infimum of a set don’t need to belong to said set. They only need to belong to the set of bounds (upper for supremum; lower for infimum) of said set.
That definition strikes me as intellectually satisfying but less useful for engineering software systems around (for the same reasons as the other commenter). But maybe it's more appropriate in the domains K is used for, I wouldn't know.
Adding to RodgerTheGreat's comment: for me, k's power is its rich set of adverbs that are convenient to write.
Notice that aside from reverse (monadic |), all verb symbols in the above code are in most other languages (max as |, min as &, plus as +, minus as -). Adverbs (such as / for reduce and \ for scan-reduce) make the symbols we already have more generally applicable.
I think a language could go a long way with a small set of basic arithmetic symbols alongside a rich set of adverb symbols.
And perhaps further worth noting, the reason K uses dyadic & and | for minimum and maximum (respectively) is because logical AND and logical OR are identical to minimum and maximum if the arguments happen to be boolean (0/1). The notation and selection of primitives illustrate a useful and memorable symmetry.
Just for reference, and since I work with Trino these days.
The benefit it gives me is that I almost never have to care about RAM or CPU. It just scales in all directions.
The negative compared to vector languages is that order is not guaranteed (as can be seen)
And the other negative of course is that it is not possible to write generalized functions.
I am not saying it is better, but I'll leave it here.
And yeah, it does not work with empty arrays. But this is my lunch break!
WITH measurements AS
(
SELECT index, m
FROM UNNEST (ARRAY[3, 2, 5, 5, 3, 6, 5, 7, 2, 9, 2, 6, 1, 7, 2, 6, 9, 1, 1, 4, 2, 9, 2]) WITH ORDINALITY as t(m, index)
),
tmp AS
(
SELECT index,
m,
LEAST(MAX(m) OVER ( ORDER BY index), MAX(m) OVER( ORDER BY index DESC)) - m as water_level
FROM measurements
)
SELECT MAX(water_level) as max_level,
SUM(water_level) as total_water
FROM tmp
Why dont the symbols just have names instead of stuff like ||\x.
I assume it gets easy to read once you practiced it a bunch but I feel like its such a hindrance to readability/popularity for the sake of appearing simple (look we can reverse a list with a single character!)
If you want a serious answer, as a k programmer ||\|x (which I assume your example is based on) is instantly readable as producing a bitmask of ones up to the last. ie 0 1 0 0 1 0 -> 1 1 1 1 1 0. Or a max scan from the right on integer arrays. I don't know how to even describe that in English succinctly, let alone most mainstream programming languages. (Haskell - reverse.scanl1(||).reverse - ok, not too bad)
You've sort of answered your own question. None of those three would obviously mean the same thing to me. And that's just one possible combination. If you consider all possible combinations of 4 symbols in K, most of them will have a distinct but obvious (and useful) meaning that is extremely hard to summarise in a single word.
The symbols do have names, eg | is commonly called a ‘vertical bar’ or ‘pipe’. I think when people read the code though, they say the name of the operator instead, e.g. your example is ‘reverse max scan x’. A complication is that the meaning of a symbol depends on the surrounding symbols (eg | can be reverse or max) and another is that the meaning can depend on context, eg | is max but when the inputs are arrays of bits (where 1 is true and 0 is false), max is equivalent to logical or, and so people may call the operator ‘or’ in those contexts.
I think advocates would claim that the symbols allow for faster interaction with the computer and for larger programs to fit in one’s working memory.
This always comes up in discussions about APL, J, and K.
Have you ever tried reading Euclid's "The Elements"? It's all prose and appreciably less clear than the modern algebraic formulations. Of course, that's only because we're all already familiar with the algebra, but once you know both notations, the terse symbolic one is an obvious win on readability and clarity.
And it's just not that hard to learn some tens of symbols. In practice, languages like Rust and whatnot require you to learn orders of magnitude more words, which require you to learn new mental models to understand what the mnemonic names mean anyway. The "readability" is really just smoke and mirrors, IMHO.
Once you know APL or K, then clusters of "unreadable" symbols become immediately recognizable and, frankly, stupidly straightforward. And to top it off, instead of some opaque identifier, the "name" in APL is usually the entire implementation! That empowers you to make variations as needed, reason about performance, and observe meta-patterns between names. Those are higher-level cognitive tasks that the symbols make much more legible than is possible with "readable names" everywhere.
In our software development industry, the word "readability" is mostly just code for "familiarity to me".
I agree with you for the most part. Brainfuck certainly would be way more readable if you gave a collection of symbols names and used those instead.
But thats mostly because base level constructs of brainfuck are too low level. Nonetheless pragmatically, what we're already familiar with is part of the equation. If you wrote a language where + actually meant - then surely youd consider this to be harmful to the adoption of the language.
Theres also the fact that the time it takes to learn these symbols is multiplied by ever person learning it, so if youre going to introduce new constructs/symbols/mental mappings, it has to be worth the tradeoff. If the tradeoff is just less characters to type imo its not worth it.
If one language has say "sort(x,y,z)" and another says "!" means sort, I just dont think that's particularly interesting
many of K's symbols are, in fact, familiar. the dyads + - * & | < > = have meanings that should be immediately obvious to most programmers, even if they don't initially appreciate how general the K versions of these primitives are.
some symbols have simple mnemonic associations to operations that are common; the monad # is "count", the dyad @ is "at-index", the dyad ? is "find".
primitives have enlightening symmetries that simplify memorization. the adverb / ("over") has a twin that captures a trace of intermediate steps, \ ("scan"). all valences of "over" have corresponding scans.
overall the set of primitive operations is very carefully chosen such that simple combinations fit together in many useful ways. more than simply saving keystrokes- which has benefits for interactively exploring data and algorithms- k provides an efficient way of thinking about and communicating a large range of algorithmic ideas to other k programmers.
why does Chinese or Japanese have symbols? Why does mathematics have symbols? Symbols represent concepts. Once you know what a symbol means you can grok a lot from it. They are limited by using keyboard characters, if the authors had their way, they would have a custom keyboard with custom symbols, but then most people will not be able to use it with any computer, so they stuck to such characters. The idea is notation as a tool of thought. You don't have to think of chunks of words to think of reversing a list. Yet, that's how we often think when we program, we read keywords or list expressions or loops and run the code in our head which slows comprehension.
I assume it gets easy to read once you practiced it a bunch
Yup, but for me at least, the learning curve is far gentler than a supposedly simple language like C or Go. After using C professionally for 12 years I still have to look up operator precedence or use cdecl from time to time. Meanwhile I feel like I can store all of K in my head with room to spare.
[0] https://github.com/kparc/ksimple [1] https://shakti.com