Hacker Newsnew | past | comments | ask | show | jobs | submitlogin
Learning K programming: idiom by idiom [pdf] (nsl.com)
100 points by tosh on May 12, 2024 | hide | past | favorite | 54 comments


Note that there is [0] but Arthur/Shakti[1] are also working on k/fun/junior which should be open source and about 25kb binary.

[0] https://github.com/kparc/ksimple [1] https://shakti.com


My experience when trying to give such tools to data scientists is that they can't use them effectively and will attempt to circumvent the idea of vector programming whenever they can get away with it.

They will also often write code that doesn't do what they really want but has the appearance of doing so.


Sounds like a problem with the term “data scientist” to me.


Is it? The nature of their work is that it's more about reaching the conclusion about what the data says rather than build nice reusable software.


Sounds like your data scientists are not big on R.


Looks like a pretty neat reference to have when dabbling in J as well, which I prefer over Kona for several reasons, including that I can run it on Android.

https://www.jsoftware.com/


I have a somewhat negative experience with Kona and its derivatives Kerf and Kerf2.

For example, among Kerf and Kerf2: - there is no explanation about why two versions exist; - the code does not build; - it has memory leaks; - both are abandoned; - there are dubious performance claims, but when I asked the author to clarify, he closed issues without answering.

The same experience with the K community: https://github.com/kparc They maintain a cult-like impression of their software, and the examples are typically represented by a few code dumps with no clear way of using them.

That's not to dismiss the overall idea of array programming languages, which I admire.


It is a problem with this community. Until recently kdb+ was like this with barely any documentation. They've come a long way in recent years though on the website and there is at least one book published.

With regards to Kerf, I don't know much of the history. I think the Kona developer (Lawler?) and Scott Locklin tried to make a company like kdb+, but it never worked out.

Dyalog APL and J both have decent doc.



> http://www.finnapl.fi/texts/Idiot.htm

I somehow doubt that is the actual URL, and doubly so since it's 404. I tried URL surgery into http://www.finnapl.fi/texts/Idiom.htm but that was also 404 and searching for "FinnAPL blue book" didn't cough up anythng




This is funny, I copy pasted the URL at the time I was writing and didn't notice that it spelled "Idiot". Anyway, that link is now dead and tosh and moonchild provided good alternates


Note that this is about K, the APL derivative language for the kdb database software.

Not to be confused with K, the rewrite based semantic execution language built by the University of Illinois Urbana-Champaign and now primarily built and maintained by Runtime Verification.

https://kframework.org/


KDB implements k3. By contrast, Arthur's latest endeavor, Shakti, is k9.


KDB+ provides Q, which is implemented in terms of k4.


It should also be noted that the k4 implementation has evolved since Arthur left/sold to First Derivatives. Kdb 4.1 release touts some nice additions to q [1](syntactically I'm not sure if this cascades into underlying k... I don't have that depth of knowledge)

1 https://kx.com/blog/discover-kdb-4-1s-new-features/


> For total water we can use +/ (“sum over”), resulting in +/{((|\x)&||\|x)-x}.

And they say Perl is bad.


Perhaps you could grace us with your clearest alternative in a language you approve of?


I searched in the book for the passage the GP mentioned. It is on page 2, the 6th step in solving an interview problem. The steps are outlined in the book, and the code snippet for each listed. Here they are:

1- Scanning an array x and finding the running maximum is expressed by |\x (read “max scan of x”), the highest altitude to the West.

2- Reversing an array is achieved by inserting a | (“reverse”) before the array. ||\|x (“reverse max scan of reverse x”) will give us the highest altitude to the East. 3- Comparing two vectors element-wise is done with the & (“min”) operation, this is the water level.

4- The element-wise subtraction is -. Adding parentheses to ensure correct sequence of evaluation, and brackets to make it a function that takes an argument, results in {((|\x)&||\|x)-x}. This is a function that returns the height of the column of water at every point.

5- Using this function, we can easily get derived results, for example for the maximum height we can use |/ (“max over”), resulting in |/{((|\x)&||\|x)-x}.

6- For total water we can use +/ (“sum over”), resulting in +/{((|\x)&||\|x)-x}

Maybe you find this satisfying, I find it horrendous.

Here's the implementation in python/numpy:

    west_scan   = np.maximum.accumulate(A)
    east_scan   = np.maximum.accumulate(A[::-1])[::-1]
    water_level = np.minimum(west_scan, east_scan)
    height_water_column = water_level - A
    max_height  = np.max(height_water_column)
    total_water = np.sum(height_water_column)


Your Python solution is very deeply coupled with this specific problem, more than half of the code consists of names taken directly from the problem description. I don't think it is equivalent with the K solution. You could add names in a similar manner to it and they would be more equivalent.

For me a nice thing about the Iverson languages is that idiomatic programs tend to be mathematically general and explain the problem in more general terms than a naive, immediate solution in a more popular scripting language.


Doesn't work on empty arrays, whereas the K does and gives the reasonable result of 0 (for total water) and -inf for maximum height. Maybe it doesn't matter, but sensible defaults are a good thing!

Edit: didn't look at the error closely enough to see that the only thing that breaks is the maximum height. That's not that bad, but still not ideal imo.


I have to emphatically disagree here.

The result may be reasonable, but this is just a fluke.

Numpy complains that: "ValueError: zero-size array to reduction operation maximum which has no identity".

This error makes sense. The only meaningful result for the maximum of an empty array would be minus infinity. It should not be zero. If it's zero, it means the programming language makes the silent assumption that the elements are non-negative.

I very much prefer to have the code throw back a meaningful error at me than a reasonably looking result, and then fail silently in a different situation.


I agree that the maximum of an empty array should be -infinity. As does K! (numpy does not)


Would you agree with the statement, "the maximum of an array should be contained in that array?" That seems like a reasonable constraint that most languages uphold.


No. I think the maximum of (x:xs) should be equal to max x (maximum xs). (Using haskell notation). Logically this implies maximum [] should be -infinity if it is defined. On finite lists, maximum is also the same as the supremum, and on infinite sets the supremum doesn't necessarily have to be contained in the set. So there's at least some precedent for not having the maximum in the list.


Going by your definition, max (x:xs) = max x (max xs) = max x (max (head xs) (tail xs)) = ... ad infinitum. The basic problem with that definition is that it only really defines max for non-empty lists, otherwise the pattern-matching against (x:xs) will throw an exception.

In mathematics maximum and minimum of a set are elements of said set. So the empty set has neither a maximum nor a minimum.

There is another pair of terms that obeys the rules you want max and min to obey: supremum and infimum: <https://en.wikipedia.org/wiki/Infimum_and_supremum>. And indeed supremum of the empty set is -inf. And infimum of the empty set is +inf. That’s because supremum and infimum of a set don’t need to belong to said set. They only need to belong to the set of bounds (upper for supremum; lower for infimum) of said set.


It's not ad infinitum on finite lists though, that's the point. If you want that property to hold, then maximum [] has to be -inf.


That definition strikes me as intellectually satisfying but less useful for engineering software systems around (for the same reasons as the other commenter). But maybe it's more appropriate in the domains K is used for, I wouldn't know.


if you insist on verbose names, they're always an option in K:

    west_scan           : |\   A
    east_scan           : ||\| A
    water_level         : west_scan & east_scan
    height_water_column : water_level - A
    max_height          : |/ height_water_column
    total_water         : +/ height_water_column


Adding to RodgerTheGreat's comment: for me, k's power is its rich set of adverbs that are convenient to write.

Notice that aside from reverse (monadic |), all verb symbols in the above code are in most other languages (max as |, min as &, plus as +, minus as -). Adverbs (such as / for reduce and \ for scan-reduce) make the symbols we already have more generally applicable.

I think a language could go a long way with a small set of basic arithmetic symbols alongside a rich set of adverb symbols.


And perhaps further worth noting, the reason K uses dyadic & and | for minimum and maximum (respectively) is because logical AND and logical OR are identical to minimum and maximum if the arguments happen to be boolean (0/1). The notation and selection of primitives illustrate a useful and memorable symmetry.


Thank you!


Just for reference, and since I work with Trino these days.

The benefit it gives me is that I almost never have to care about RAM or CPU. It just scales in all directions.

The negative compared to vector languages is that order is not guaranteed (as can be seen)

And the other negative of course is that it is not possible to write generalized functions.

I am not saying it is better, but I'll leave it here.

And yeah, it does not work with empty arrays. But this is my lunch break!

  WITH measurements AS
  (
      SELECT index, m
        FROM UNNEST (ARRAY[3, 2, 5, 5, 3, 6, 5, 7, 2, 9, 2, 6, 1, 7, 2, 6, 9, 1, 1, 4, 2, 9, 2]) WITH ORDINALITY as t(m, index)
  ),
  tmp AS
  (
    SELECT index, 
        m,
        LEAST(MAX(m) OVER ( ORDER BY index), MAX(m) OVER( ORDER BY index DESC)) - m as water_level
    FROM measurements
  )
  SELECT MAX(water_level) as max_level, 
          SUM(water_level) as total_water
  FROM tmp


It's one of them "write-only" languages :)


Why dont the symbols just have names instead of stuff like ||\x.

I assume it gets easy to read once you practiced it a bunch but I feel like its such a hindrance to readability/popularity for the sake of appearing simple (look we can reverse a list with a single character!)


If you want a serious answer, as a k programmer ||\|x (which I assume your example is based on) is instantly readable as producing a bitmask of ones up to the last. ie 0 1 0 0 1 0 -> 1 1 1 1 1 0. Or a max scan from the right on integer arrays. I don't know how to even describe that in English succinctly, let alone most mainstream programming languages. (Haskell - reverse.scanl1(||).reverse - ok, not too bad)


I dont see why one cant have both. Give it names and translate them down to the base symbols so it has a readable syntax.

Id just call it something like exceptLast, lastOf, or endOff or something.


You've sort of answered your own question. None of those three would obviously mean the same thing to me. And that's just one possible combination. If you consider all possible combinations of 4 symbols in K, most of them will have a distinct but obvious (and useful) meaning that is extremely hard to summarise in a single word.


You can give them names if you want, and Q does so.

However doing so looses the ease of recognition and the malleablity the symbols provide.


The symbols do have names, eg | is commonly called a ‘vertical bar’ or ‘pipe’. I think when people read the code though, they say the name of the operator instead, e.g. your example is ‘reverse max scan x’. A complication is that the meaning of a symbol depends on the surrounding symbols (eg | can be reverse or max) and another is that the meaning can depend on context, eg | is max but when the inputs are arrays of bits (where 1 is true and 0 is false), max is equivalent to logical or, and so people may call the operator ‘or’ in those contexts.

I think advocates would claim that the symbols allow for faster interaction with the computer and for larger programs to fit in one’s working memory.


This always comes up in discussions about APL, J, and K.

Have you ever tried reading Euclid's "The Elements"? It's all prose and appreciably less clear than the modern algebraic formulations. Of course, that's only because we're all already familiar with the algebra, but once you know both notations, the terse symbolic one is an obvious win on readability and clarity.

And it's just not that hard to learn some tens of symbols. In practice, languages like Rust and whatnot require you to learn orders of magnitude more words, which require you to learn new mental models to understand what the mnemonic names mean anyway. The "readability" is really just smoke and mirrors, IMHO.

Once you know APL or K, then clusters of "unreadable" symbols become immediately recognizable and, frankly, stupidly straightforward. And to top it off, instead of some opaque identifier, the "name" in APL is usually the entire implementation! That empowers you to make variations as needed, reason about performance, and observe meta-patterns between names. Those are higher-level cognitive tasks that the symbols make much more legible than is possible with "readable names" everywhere.

In our software development industry, the word "readability" is mostly just code for "familiarity to me".


I agree with you for the most part. Brainfuck certainly would be way more readable if you gave a collection of symbols names and used those instead.

But thats mostly because base level constructs of brainfuck are too low level. Nonetheless pragmatically, what we're already familiar with is part of the equation. If you wrote a language where + actually meant - then surely youd consider this to be harmful to the adoption of the language.

Theres also the fact that the time it takes to learn these symbols is multiplied by ever person learning it, so if youre going to introduce new constructs/symbols/mental mappings, it has to be worth the tradeoff. If the tradeoff is just less characters to type imo its not worth it.

If one language has say "sort(x,y,z)" and another says "!" means sort, I just dont think that's particularly interesting


many of K's symbols are, in fact, familiar. the dyads + - * & | < > = have meanings that should be immediately obvious to most programmers, even if they don't initially appreciate how general the K versions of these primitives are.

some symbols have simple mnemonic associations to operations that are common; the monad # is "count", the dyad @ is "at-index", the dyad ? is "find".

primitives have enlightening symmetries that simplify memorization. the adverb / ("over") has a twin that captures a trace of intermediate steps, \ ("scan"). all valences of "over" have corresponding scans.

overall the set of primitive operations is very carefully chosen such that simple combinations fit together in many useful ways. more than simply saving keystrokes- which has benefits for interactively exploring data and algorithms- k provides an efficient way of thinking about and communicating a large range of algorithmic ideas to other k programmers.


That makes a lot of sense. I was being overly skeptical, Ill have to spend time on learning it before criticizing it.


why does Chinese or Japanese have symbols? Why does mathematics have symbols? Symbols represent concepts. Once you know what a symbol means you can grok a lot from it. They are limited by using keyboard characters, if the authors had their way, they would have a custom keyboard with custom symbols, but then most people will not be able to use it with any computer, so they stuck to such characters. The idea is notation as a tool of thought. You don't have to think of chunks of words to think of reversing a list. Yet, that's how we often think when we program, we read keywords or list expressions or loops and run the code in our head which slows comprehension.

https://www.eecg.utoronto.ca/~jzhu/csc326/readings/iverson.p... or https://www.jsoftware.com/papers/tot.htm


Probably the same reason in algebra people went from using "plus" to "p." to "+": for readability.


    I assume it gets easy to read once you practiced it a bunch
Yup, but for me at least, the learning curve is far gentler than a supposedly simple language like C or Go. After using C professionally for 12 years I still have to look up operator precedence or use cdecl from time to time. Meanwhile I feel like I can store all of K in my head with room to spare.


That’s roughly where q comes in, as syntactic sugar on top of k


Nice just what I was looking for.


It is like asking why don't all human languages use the Latin alphabet and use drawings instead.


Not to be confused with the language “K” I built for my masters thesis in Auckland in 1985 :-)


How antipodean. "I'll make my own from what I have handy."

Deepest respects


what's your K like?




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: