A story, since in retrospect I think it's worth telling.
Some years ago I was at the SF Clojure meetup. Arthur Whitney's daughter worked at the sponsoring company, and he agreed to come tell us about K.
In retrospect, I don't think we gave him the welcome he deserved. No one was rude or anything, but it seemed there was a disconnect: Arthur was keen to show off how fast K (and Kdb) was, over zillions and zillions of rows.
But the thing that the Clojurists were all searching for (that got them into Clojure in the first place) was expressivity. Is Clojure fast? I don't know, generally the problems I face come down to avoiding balls of mud rather than performance bottlenecks. And I think that was true of most there.
So Arthur got something of an underwhelming reception. I remember someone asking "Does K have the ability to self-modify, a la Lisp macros?" When Arthur said no, you could see most people in the room just sort of mentally shrug and move on.
And this was too bad. Because recently I've been playing around with J (another APL descendant) and been very impressed by some expressivity/readability benefits. Some small things that have very big effects on the codebase you actually end up with.
The first thing is the avoidance of abstraction. To use a Twitterism:
Broke: Just write your code and don't divide it into functions, creating one long main method
Woke: Divide your code up, naming parts that get reused
Bespoke: If your code is made up of really really short things, it ends up being shorter than the names you would use, so you can just write the thing rather than your name for it. An analogy would be: there is no human-comprehensible way to communicate the idea of "picosecond" in less time than an actual picosecond.
The other thing I didn't expect was the benefit of multiple dispatch being baked into e v e r y t h i n g. In Clojure I might write (map + a b) to add each index together; in J I could just write a+b.
This is neat stuff! Best practices for keeping complexity down in APLs tend to be the opposite of what they are in other languages. Aaron Hsu gave a talk about this: https://www.youtube.com/watch?v=v7Mt0GYHU9A
It's too bad! Arthur came to tell us about speed---there's a reason it's used on giant datasets in finance, where performance translates directly into cash---but I wish we'd had the presence of mind to ask more about the experience of writing K.
So, Arthur, if you're reading this: Sorry everyone seemed kinda bored in SF a few years ago when you kindly came to present. We missed out!
really really short things, it ends up being shorter than the names you would use, so you can just write the thing rather than your name for it
I inherited a kdb/q codebase like this. The problem with it is that when you're making a change, you need to find all these little code snippets and change them.
You need to rely on TDD to make it work, and this doesn't really scale to large projects.
A much better strategy, if you don't want to name functions, is to use their hash.
But still, the conventional approach of choosing meaningful (longer than 1-2 letter) identifiers and writing comments works much better.
Also, the same expression in q can represent several different things, e.g. a symbol might be the name of a column in a table, a key in a dictionary, an enum, a string, a constant, a global variable, etc. Without naming it, after a week you will no longer remember the original meaning.
To add insult to injury - you are limited in the number of global constants (really variables) you can use, so it's not always possible to name everything.
I was at that meetup, and I do remember the Clojure crowd not being all that impressed. But it definitely came across that Arthur Whitney was a type of programming wizard. He had some offhand remark that he was running his laptop off of an OS he made with K, and was editing with an editor he made in K, but was somehow still working on getting a sound driver to work.
kOS! Some of the people who helped work on that hang around news.YC!
There's some pretty interesting stuff on it. The Impending kOS paper has probably introduced more people to APL and k than just about anything at this point.
It was directly the reason I wrote oK. I wasn't satisfied with learning k4 when I knew that somewhere out there Arthur was tinkering with k5, and the only way to have it was to build it...
K1 was never available outside of Morgan Stanley AFAIK, and I’ve never seen any docs of it. So ...
APL -> A+ : opinionated (vector indices always start at 0, for example) and “electric” GUI (deserves a whole post to describe)
A+ -> K2 : significantly simplified - only ascii, no multidimensional arrays (uses nested vectors instead), much smaller vocabulary but just as effective in the basics department (e.g. the Whitney where/replicate monadic operator replaces 6 or 7 APL operators), added simple dictionaries, workspaces, dropped a lot of math from the language. Simpler electric GUI. Time and date native types (as floating point).
K2 -> K3 mostly dropped GUI afaik. Added integral time and date.
k3 -> K4/kdb+: simplifies language more, comprehensive dictionary, TSDB rolled deeply into language (was previously a product written in K), native 64 bit support.
K4->K6: still more simplification, I think k6 is the kOS language. Never left dev state.
K6->k7->k9: continuous simplification, now at shakti rather than kx/firstderivatives - and with much more community involvement.
“Replaces” in the sense that you would use it instead, and not feel like you are missing something, not in the sense that it is equivalent char-per-char.
APL has a few expand/compress operators that take a boolean vector; all of these in K use monadic & (where/replicate), either on the at-index, e.g. +/a[&x>3], which is read "sum over a where x is greater than 3", or at the amend-index, e.g. @[&7 9 15;i:&x>3;:;7-x@|i], which is read "in a vector with seven 0s, followed by nine 1s and fifteen 2s, in the indices where x is greater than 3 (call those indices i), place 7 minus those values from the same indices in reverse order."
Obviously contrived examples, but in APL you’d have to compress/expand a couple of times instead. (Sorry, don’t remember the operator names, my APL is rusty, last wrote a real program in 1992....)
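For those who don't read k, here's a rough numpy sketch of what those two expressions do (the sample vectors x and a below are made up purely for illustration):

import numpy as np

x = np.array([1, 5, 2, 9, 4, 8])
a = np.array([10, 20, 30, 40, 50, 60])

# +/a[&x>3]  --  sum over a where x is greater than 3
total = a[x > 3].sum()          # 20 + 40 + 60 = 120

# @[&7 9 15;i:&x>3;:;7-x@|i]
# &7 9 15 builds seven 0s, nine 1s and fifteen 2s; at the indices where
# x > 3 (call them i), place 7 minus the x values at those indices,
# taken in reverse order.
v = np.repeat([0, 1, 2], [7, 9, 15])
i = np.flatnonzero(x > 3)
v[i] = 7 - x[i[::-1]]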
I'm not familiar with J but it also appears to have something similar. However the resulting code seems a bit more convoluted than the K version.
+/ (I. a>3) { a
I see the one letter email is spreading..... I’m actually surprised Arthur can live with such a long and verbose email address - 12 whole letters is almost enough for him to write a music player or something.
In it he mentioned that abstraction is bad, which was also mentioned in the previous K discussion. I still really struggle with that belief.
I find abstractions essential to mentally manage the complexities of the overall solution. For example I just implemented some REST calls at work, and when coding a GET or a POST call I don't want to think about the gritty details of how to conjure up an OAuth token from the multiple supported ways just so I can stuff it into the HTTP headers. I far prefer to know that the passed OAuth token provider abstraction (interface instance) will provide a suitable token if asked nicely.
Though perhaps he means something other than what I mean by abstraction. After all, the APL sample code he showed is abstracting away quite a lot by way of the various operators and whatnot.
I believe our industry overuses the term "abstraction", and incorrectly. You're probably right that he meant something different by it.
In your example, the OAuth token provider is concrete, and not having to know the details of OAuth would be an example of encapsulation.
Abstraction would be deriving general rules (of authentication, perhaps?) by drawing from the OAuth example, and by induction being able to generalize for anything not OAuth specific.
So wouldn't a further abstraction be to introduce an auth provider that returns a set of HTTP header values to be used, or something along those lines?
I could then implement the OAuth auth provider based on the OAuth token providers interface I got now. And this would allow me to not think about how exactly the auth is done at all, just that it is done somehow.
Isn't this the layers of abstractions mentioned? If so, how is this bad?
There are many good points about KDB/Q the product (most of all simplicity) and it certainly was well-engineered. But K is not a modern language and its performance is only impressive as an interpreted language; and these gains come at a significant cost (various limits on expression sizes / variable counts / branch sizes, no closures, poor error messages). It's only really "fast" on bulk operations, much like Python.
If you want to see what a modern K or Q might look like (with static types and an embedded JIT compiler!), check out the excellent Hobbes project from Morgan Stanley:
https://github.com/Morgan-Stanley/hobbes
Well, this is one of the reasons why I left the Clojure community (along with the complete disregard for performance and the excessive personal branding with low effort, 101 blog posts). It really feels like a cult sometimes. They already program with the perfect language that Hickey designed for them, so why bother with anything else?
Great point re: the “Bespoke” style of programming. It takes some unlearning and getting used to tacit programming [0] and trains [1] to appreciate this.
I’ve worked on fairly complex systems using kdb/q that are a joy to debug because the codebase is a handful of files. q is admittedly a more simplified and “mainstream” cousin of J, but allows a fair bit of expressiveness. The powerful data store (kdb) and the q-sql that come along with it
make it one of the most concise data platforms ever. Too bad it’s not free for commercial use!
> so you can just write the thing rather than your name for it
I do not find this convincing. Names, especially good names, can convey additional meaning, which is helpful even if the names are longer than what they stand for in terms of raw character count.
One obvious example is that I would often rather see a seven-letter manifest constant than the two-digit decimal literal it represents.
In a similar vein, I would rather have a structure with named fields, than array indices or offsets. employee.department is objectively better than e[13]; a compiler can generate the equivalent of the latter from the former.
In calculations that have many nodes in the syntax tree, it's helpful to break up the tree and give intermediate values names, which remains true even if the names are longer than the intermediate calculations.
A name may be longer than what it denotes in raw character count, but in spite of that it may be less complex due to its flat structure, which only indicates "same or not". If a 13-character identifier stands for some 9-character syntax that denotes a tree with 7 nodes, the identifier is certainly simpler.
Lastly, different expressions that occur in a program can sometimes be very similar to each other. If we do not use names, then these differences can be hard to find. It might seem that identifiers which are longer than what they denote make the problem worse, but that is not necessarily so. Identifiers which are clearly different from each other can be selected for denoting forms that are very similar to each other. If instead of ((<]):) and ((<[):) I write foolyfoo and barlybar, that will be less error prone.
Do you have trouble discerning "cobble" vs "coddle"? Maybe; how about with context? "We cobbled together some crazy hack," vs. "We coddle our infants to show love and affection."
Arguably this comparison is quite similar to your "((<]):) and ((<[):)" example. Certainly the relationship between "]" and "[" is the same as between "b" and "d". Kids first learning to write often do mix up "d" and "b". The difference certainly seems subtle and arbitrary in the first stages of study.
Do you have much experience in APL/J/K?
As I see it, the point is that these languages offer some really spectacular ergonomics and expressibility _once one has gained fluency_, and in a way not possible elsewhere.
This is really surprising! Perhaps that's why there's a lot of initial skepticism.
In the space of all possible programming languages, it certainly seems like we've explored a pretty darn good chunk. There are thousands of languages trying out myriad different possibilities. We have everything from C to Java, from Haskell to Scheme, from Forth to Bash, and this still doesn't do justice covering all the paradigms and methodologies we programmers have dreamed up. Is it really possible that there's some random corner of language space that offers something more than localized and incremental benefits? Is it really true that there's some wacky scheme for writing code that is way shorter to write, pleasant to read, while also running fast?
The answer is yes! And that's super cool! The catch is that it's a really weird family of languages, but if you spend the time to learn them well, I suspect that you will also see what we are talking about.
That’s an excellent comment. The only thing I have to add is that it’s not some newfangled concept whose downsides we’ll only figure out in a couple of years: APL has been around for over 60 years in various forms - it is about as old as Fortran and Lisp.
In fact, APL was designed for paper, as a well-defined notation for algorithms in math and CS papers (which are still either sloppy with algorithm semantics or spend a lot of space specifying their non standard notation to this day). But since it was well defined, it was actually executable, and when computers were good enough, an interpreter was developed at IBM. That’s why APL has so many weird symbols (it’s a paper notation for math, at heart). K is a practical simplified version/dialect that uses only ascii (as does J, which is essentially full APL with ascii)
One of the main complaints against APL and its relatives is that they are write-only. They take longer and more effort to read per character, granted, but per “semantic information”, for any reasonable definition, APL/J/K are way faster to read for someone proficient (and yes, I take proficiency for granted - the minute details in any language between Boolean short circuit, non-local control transfer, integer/float conversions etc. make proficiency essential)
With context, I can understand "We co___ed together some crazy hack" and "We c___le our infants to show love and affection."
These sentences almost give definitions to the partially obliterated words, which is why we can guess what those words are. We can form plausible definitions, and then work backwards through our vocabulary by pattern matching.
(I would probably guess "cuddle" for the second one; I have negative associations with "coddle", since it's mainly used in describing situations when ineffective adults are supported economically, and protected from the consequences of their bad decisions.)
The strings in your APL/J/K are not existing vocabulary words that you're re-iterating with a context that gives their definitions. (Some of them may be idioms that appear in other programs, including other people's programs, but others are local inventions in the program.)
If you obliterate parts of them with underscores and then hand that to the next programmer, they are screwed.
Cobble and coddle are confusable. If we use cobble as a parameter name in some C function, and later refer to it as coddle elsewhere, only the compiler will notice. (Provided it's not the case that they are both defined, and have compatible type.)
My eyes likely will not spot that the "coddle++" isn't "cobble++", and the compiler has nothing to say. Doubly so if I'm not aware of the existence of the coddle global.
I remember copious instances in the past when my brain decided to hide a difference like that, and then "cached" the decision so that it remained hidden even after staring at the code. Once you believe that coddle++ is incrementing the cobble parameter, your brain may not let your eyes take a fresh look at it.
Very recently, I was tripped up for a while by a sentence in some documentation that talked about PTP (precision time protocol) and P2P (peer-to-peer) in the same sentence. It didn't make sense until a fifth re-reading with a coffee break in the middle.
Not sure what to tell you if you're convinced I'm lying.
The latter half of your argument I find strange. It is arguing against APL/J/K using a C example. Mimicking this tactic, I could argue that C is completely unreadable because it indents things at so many different levels. If my x86 assembly did that it would be a huge mess and hard to follow.
The languages are just so different that the syntax of one offers properties completely alien to the other, "What is this weird x=2 syntax? I just want 2 in my rax register and C doesn't let me do that?!"
Part of the issue with APL/J/K is that to understand some of the coolness, you really do just need to trust the evangelizers and drink the koolaid. Maybe you'll end up thinking we're all just loonies, but if you give it an honest try, I suspect something will click and you'll see the point we're trying to make.
I see; it's okay for you to use the English language with sentences involving cobble and coddle, but not okay for me to bring that back around into a computing context, using a language other than APL, J or K?
I used C because it's a lingua franca of computing, in which it is natural to use identifiers like cobble and coddle.
We can use Python, if you like, or JavaScript, Pascal, Lisp, Haskell, Basic, ... or a Makefile, shell script, HTML, CSS, Sendmail config file, LaTeX, ...
The argument is that coddle and cobble are, in fact, plausibly confusable as identifiers in computing, where we don't have the right kinds of linguistic context to spot a mix-up.
If I have <div class="cobble"> in some HTML, and the CSS has a .coddle { ... } selector, I can see not being able to spot the problem.
Furthermore, I have a point in that a sequence of symbols in your APL program which is particular to that program and not a common idiom is not comparable to a word like "cobble". It's more like "cmzblx" versus "cmzdlx".
Because (at least local) identifiers can usually be freely chosen by the coder, I can avoid this kind of problem. I can easily change "cobble" to "horse" (which doesn't look like "coddle") throughout a source file and be confident that it still means the same thing, because the letters are not operators that encode meaning.
> to understand some of the coolness, you really do just need to trust the evangelizers
They're too late; I might have been fooled as a 16-year-old in 1987.
> What is this weird x=2 syntax? I just want 2 in my rax register and C doesn't let me do that?!
When I first saw a BASIC program with assignments to some variable X, I thought they were a system of equations for the machine to solve. I parsed the syntax right, though; it was just the semantics that was off.
Hrm. Let's step back for a moment. Your responses read to me as defensive and it is not my intent at all to attack your position.
If you do not want to deal with APL/J/K, that's perfectly fine. Your interests and goals are very likely different than mine. All I wish to communicate is my experience: "Hey, this language seems really weird and counterintuitive and completely nonsensical on the surface, but I spent some time actually learning it and it turns out that these first impressions were wrong!"
You know how modern mathematical notation is really compact compared to the "more readable" prose that used to be used in the past? The funny part is that the prose is a lot harder to parse than the modern notation, provided you already know the latter. APL/J/K is a lot like executable math notation in that way.
What about you? Is there some core point you are trying to get across that I/we seem to be missing?
> I see; it's okay for you to use the English language with sentences involving cobble and coddle...
So my answer is that, yes, my analogy with English is relevant, because it attempts to communicate to you what it feels like to read unfamiliar J code in practice.
Your analogy with C is making a different point, that opaque identifiers with similar spellings and similar contexts are confusable. I agree.
I used to think the exact same things, but after using kdb+/q for AoC2019, I felt like I had a revelation. When you get rid of all abstraction, your code size is reduced so dramatically that abstraction is no longer needed. My solutions were maybe 10-100x shorter than everyone else's (except some guy's k code I found) and to this day, I can still mostly remember them, character for character (much like how an equation is easier to remember than a traditional line of code).
Being able to fit all of your code in your head at once is a truly magical feeling.
You should give definitions a chance, because they point to a way to make your code even smaller. Consider that dictionary compression, like LZ77 and LZSS, works by encoding redundancy using symbols that index into a table.
Even APL/J/K programs with repeated patterns in them are further compressible. Occurrences of any repeated string more than two bytes long can be replaced by a 16 bit symbol.
Speaking of which, why don't you just learn the data structure and algorithm of a compression program like gzip inside out? Then you can write compressed code in any language. Others can easily work with your code by uncompressing it and then compressing it again when their changes are applied. Or vice versa; you can work with other people's code by compressing it first, then doing your changes in the more convenient, ergonomic compressed format. Maybe some checkout/commit hooks can do that transparently.
Compression will ferret out gratuitous frivolities like 17 copies of remoteDatabaseConnectionHandle, and turn them into a convenient, tiny code.
Think of how much smaller it can be if you go beyond reducing tokens to single characters. You've now got Huffman and dictionary coding at your disposal, and possibly more.
Mine; progressively porting from k7 to k9. Not as many answers and probably worse coding than ngn's, but perhaps more approachable for newcomers.
https://github.com/chrispsn/aoc2019
This is a program which generates stereograms. It not only does that, but its source code is one; if you stare past it into the distance, you will see the letters IOCCC appear floating in the foreground.
You might be right that good code needs names. But since in a terse language like J or K it may take longer to think up a good name than to write the code, it's handy that one doesn't have to have names for everything when prototyping.
The "write the thing" attitude perhaps comes to some extent from (human) languages that have that as a natural word invention mechanism - such as German, Finnish, Sanskrit. In Sanskrit, etymologies seem to often go down to single syllables.
I quite often miss that in programming languages. To some extent, functional languages let you construct functions that compose like that and we also see that happen with CSS with frameworks like Semantic-ui and Bulma. But it would be fun to have or experiment with it at the identifier level.
Edit: I generally prefer to use "abstraction" to refer specifically to "beta abstraction" and not the "make named functions" kind ... Which is more about encapsulation. Beta abstraction always results in something more general and abstractions are about leaving out detail.
Each method has its own innate complexity, the complexity of "what does this method do? when do I use it? Why would I use this method instead of a different method in this class? Now I have to dive into this method to read its implementation, which pulls me out of the reading context I was in before"
Also, your point about the meaning beyond the code can be served by a comment, which drops the constraint of "don't make it too long" and "write in camel case", and also allows inline code reading!
There's a time and a place for everything. I recommend "A Philosophy of Software Design" for a book that elaborates on the idea of "fewer and longer methods tend to be better than many concise methods".
What you want to make short/simple is the API, rather than the implementation!
A function has a well defined scope, parameters and output. A comment does not. A function name describes an intention. Within the scope of this function, someone can check whether the intention is respected by the implementation. If this function calls other functions, just assume that they are working as their name suggests. If that is not the case, fix those other functions later.
This has been my overall approach to maintaining code. To such an extent that I design things around the possible names I can find. Some call that DDD. I call that respecting my team mates. Without writing and naming functions, this is not possible.
Languages such as J and K are fine for single developers & prototypes but really bad for bigger projects.
About naming: H. Baker said that combinators were the lambda calculus without the need for names. That summarizes a feeling I had when programming: I love sets of combinators to plug together rather than languages.
Not sure I actually understand why K is pushed here almost daily. Is it just the esoterics? Alright, APL and K are expressive for array computation. Do many people here deal with array computation? K is also supposedly fast—not once have I seen an explanation of what makes it fast and why those techniques aren't looted for open-source languages.
I'm told that the interpreter is so small that it fits in the cache. Yeah, so how much time does a non-K program normally spend loading the code? How does this free K from loading and writing data?
Is K in fact a thin layer on top of assembler (specifically SIMD and other extensions), for number crunching in the vein of shaders/CUDA?
Not sure I actually understand why K is pushed here almost daily.
Most programmers find k interesting. k is fast. k is small. k & APL are great as a tool of thought. It has a "lost art" feel. It's an intellectually satisfying endeavor.
Do many people here deal with array computation?
Like three Dyalog employees occasionally contribute to discussion, fintech employees make up like 10% of the people regularly writing comments on the site, and array programming is incredibly popular when in disguise and made less efficient as Python (numpy), pretty much essential for multiple subfields of computer science, so forth.
K is also supposedly fast—not once have I seen an explanation of what makes it fast
People talk about it all the time here, though. 'geocar mentions why in almost every thread, and he should know: he helped write an operating system in the language.
and why those techniques aren't looted for open-source languages.
Because they don't want to believe it to be true! When told you're messing up something simple, you have to question your assumptions a bit. People hate doing this, especially when they've invested time in seeing them through.
Is K in fact a thin layer on top of assembler (specifically SIMD and other extensions), for number crunching in the vein of shaders/CUDA?
Kinda, yeah. It's more interesting--along some axes--than 'how we went from 100 customers to 101 customers' (which somehow required a re-write in Java?)
But also I feel like a lot of people look at their C/Python/SQL/JS/HTML/CSS stacks and their thousand-person company and hundred thousand lines of code that kinda sorta just directs people at a Stripe widget and think: We Clearly Are Doing Something Wrong.
Not that K/APL or Prolog or Smalltalk or whatever would actually magically make people do-the-right-thing or do-a-thing-better but they're sort of different worlds so they inspire a bit more wonder and reflection than 'how we moved from python to go to rust'
> Not sure I actually understand why K is pushed here almost daily. Is it just the esoterics? Alright, APL and K are expressive for array computation. Do many people here deal with array computation?
People like k for different reasons. For me, it's the right blend of fast to develop and runs fast enough. Someone recently made a database engine called okon which is designed to process the HIBP data, and looking at their github it took 20+ days of C++ coding to do. And they managed to get under 49µsec, so they pat themselves on the back for it being so good.
But my k(q) solution was written in five minutes and runs in under 5µsec -- almost 10x faster. I used binary search instead of b-tree (a decision the okon author considered but eventually went the b-tree route instead), but it's otherwise the same basic strategy.
So for me, the decision is, 5 minutes, or 20 days, and that's a no-brainer.
> How does this free K from loading and writing data?
Your CPU has a shockingly small amount of directly-attached memory (around 64kb). That's been true since the 1970s, and is likely to be true for another fifty years.
This used to be called ram, but now we call anything directly addressable by the CPU "ram", and we call this thing cache (or L1 or whatever) because that's basically how most people use it.
Now what your CPU does all day is look at a memory address, and if that's not in the cache, it sends a message to a memory controller and waits for that memory to show up (somewhat of a simplification). Those instructions show up, now your CPU can execute them, and those instructions require more memory, so again the CPU sends a message to the memory controller and waits for that memory to show up. All day long it alternates between these two wait states: we wait a bit for code, we wait a bit for data, and then we repeat.
Because K is small - because the interpreter, the application source code, and all your state variables fit into that 64kb - the waiting-for-code half largely goes away, and you can see how memory waiting drops by half.
> Is K in fact a thin layer on top of assembler (specifically SIMD and other extensions), for number crunching in the vein of shaders/CUDA?
No. It's a very simple interpreted language that (in most implementations) dispatches directly on the source code (or in some cases, on the AST).
> why those techniques aren't looted for open-source languages.
If you really understand the above, the answer to this question is obvious (hint: try to think about values, not features). Until then; until you really understand the question you're asking, you're not going to find the answer very useful.
> But my k(q) solution was written in five minutes and runs in under 5µsec -- almost 10x faster. I used binary search instead of b-tree (a decision the okon author considered but eventually went the b-tree route instead), but it's otherwise the same basic strategy.
Out of curiosity I did the same in python (with numpy) simply memmap-ing the sorted binary hash list and doing a binary search (again using numpy) on that.
I got around 40us lookup time using that solution.
Not too bad for about 5 minutes of coding.
Also tried using leveldb as a datastore and got around 6us when using that (and the solution is also only a couple of lines of code).
The leveldb solution is quite straightforward (running in ipython3 for easy benchmarking with timeit):
import leveldb,hashlib
db = leveldb.LevelDB('db')
def lookup(pw):
    m = hashlib.sha1()
    m.update(pw)
    try:  # the leveldb api is a bit minimalist..
        db.Get(m.digest())
        return True
    except:
        return False
%timeit lookup(b'password')
Creating the database was just calling db.Put(x) for every entry.
The numpy solution is also quite easy (db3 is essentially your ..|cut|xxd .. file, I think):
import hashlib
import numpy as np
db = np.memmap('db3', dtype=np.dtype('S20'), mode='r')
def lookup(pw):
    m = hashlib.sha1()
    m.update(pw)
    idx = np.searchsorted(db, m.digest())
    # guard against idx == len(db) when the digest sorts after every entry
    return idx < len(db) and db[idx] == m.digest()
%timeit lookup(b'password')
And I was running this on a system where not even db3 (which is just the binary hashes concatenated) would fit into my RAM, so for "real world" applications the results would be much worse since it's more limited by the disk IO than what's running on top. I just reran both and got 3us for the leveldb solution and 13us for the binary search/numpy solution.
> Not sure I actually understand why K is pushed here almost daily. Is it just the esoterics?
Kind of - they have best practices that are almost the opposite of the standard software dev best practices - and they still produce high quality maintainable software.
It’s interesting to see that a lot of “truths” in software development are at the very least context dependent if not outright pseudoscience.
I don’t think you can simply “loot” their best practices - although some devs do write c code the way they would write k code.
Edit: To expand on this for example even if you want to use short identifiers, multiple statements on one line, and avoid abstraction in the Java world you will be fighting the entire ecosystem to do it.
although some devs do write c code the way they would write k code.
Including Arthur Whitney himself; this is one famous example of that, and is worth close inspection if you're curious about the whole concept of array languages in general:
Observe how quickly it "climbs the abstraction ladder", making heavy use of macros --- if you try to read it like "normal" C and just jump into the middle it's definitely not going to be readable, but if you start at the top and "consume" the definitions first, it becomes surprisingly easy to understand. One thing that stands out is how it makes almost all the loops implicit, via the DO macro.
I am not a professional programmer; I use it as a tool and most of my stuff is written in high-level languages, but I have some C in there. Having said that, I feel like I am a competent enough programmer that if I look at someone's low-level code I get the gist of it almost immediately. I looked at this and it was in no way clear to me at all. I don't see the reason to shorten stuff like printf and main. Does reducing keystrokes add something?
> if you try to read it like "normal" C and just jump into the middle it's definitely not going to be readable, but if you start at the top and "consume" the definitions first, it becomes surprisingly easy to understand.
I will take your word for this as you might know more than me about programming, but I feel like at least half of my colleagues need to get the gist of what is happening, not exactly what is happening down to the word. I make simulators for very complex machines. If we went around "climbing the abstraction ladder" every time something needed to be modified or added, I think I would personally not get any work done.
> I don't see the reason to shorten stuff like printf and main. Reducing key strokes adds something?
Bugs hide in the extra chars.
On a more serious note, there are several (debatable) benefits to brevity (in programming & in general communication):
- less code to keep in your head
- less chance of stupid typos (e.g. prinft instead of printf)
- more "expressive" in that each char has more meaning
- to someone who is fluent in the "shorthand", it is quicker to both write & read
All these benefits don't seem like benefits to me. I don't want to learn a new shorthand language when I already learned an extra language, English, to learn programming. Also, I don't see expressiveness in terms of keystrokes; you don't store bytes in your head, you store tokens. I can have very different words for a character (char, "", etc.) but in my head they take exactly the same amount of space.
Are simple typos that annoying? Writing a wrong function call will lead to either a compile-time error or wrong behavior, but the latter happens when the function names are not chosen smartly. Also, we have IDEs now; they are pretty good at handling typos.
I don't understand how it is quicker to remember printf("hello world") than P("hello world").
> I can have very different words for a character (char, "", etc.) but in my head they take exactly the same amount of space.
how sure are you? given how little we know about the brain, i'd be cautious with asserting something like that as a fact. while i don't think we store bytes in our head, i'm not certain that the length of a "symbol" doesn't matter either.
Maybe different people do it differently, after all some have photographic memory, I don't. But I certainly don't think about all the letters involved in a variable but rather the concept of the variable, or what that variable represents. Doesn't matter if it's called "numberOfBurgersPerCustomer" or "nb" as such. But in case I forget, "numberOfBurgersPerCustomer" immediately tells me what is going on compared to just "nb".
I guess the point is that I don't want to wade in definitions just to get the gist of a code. I want to jump in there, see immediately what's going on and where the problem can be.
Another thing about that code that I think irks some folks but also reminds me of reading math is that it's much easier if you are aware of some common conventions and notation. For instance, the lines:
#define V1(f) A f(w)A w;
#define V2(f) A f(a,w)A a,w;
Are much more quickly recognizable if you know that in APL the left and right arguments to a function are alpha (α) and omega (ω), so you'd immediately recognize these macros as for defining 1-argument and 2-argument functions.
For folks in the know, conventions like this allow quicker communication and expression, but they also raise the barrier somewhat to the uninitiated.
Side note for folks used to "modern" (post 1989) C code - this is old K&R style function definition. This SO post (https://stackoverflow.com/a/3092074) shows a comparison.
There's definitely been a marketing push; I believe kx systems is aggressively trying to diversify outside of finance. They recently scored a contract with a Canadian electric company for storing electric usage data for 5 million customers. At a data science meetup I went to, they presented, and I thought it'd be about machine learning. It was just a generic presentation on kdb.
They are marketing aggressively. For example they were pitching to a bunch of people at AWS re:Invent, and very definitely marketing along the lines of it being a generic language for data + a database.
When I told their marketing folks that I had used kdb/q a bit, didn't think it a good fit for our use-case, and knew from a brief sojourn in hedge-fund-land how hard it was to hire kdb programmers, their response was along the lines of "It's not hard to program in and if your guys find it hard they can just use our python wrapper".
Honestly it is an interesting thing but the extent to which it's being pushed here on a daily basis at the moment feels "inauthentic" and somewhat tiresome.
Honestly it is an interesting thing but the extent to which it's being pushed here on a daily basis at the moment feels "inauthentic" and somewhat tiresome.
Try posting something else? The front page is populated with links that others find interesting. I started posting on HN mainly because I thought the front page was declining; it's worked out for me, really, APL is on the front page every day now. If you have any good links hanging around, people would be glad to see them, and it will probably tilt the front page into things you like a little more.
There are also 29 other spots on the front page, and some of those spots contain interesting links too! No one has time to read all of the comments on HN every day, so only reading the comments on posts you find interesting will probably be more enjoyable than reading the ones you find tiresome.
Damn, really? Has it been kind of a waste then? Seemed like a good way to handle smart grid data, for an old school utility that can't feasibly roll its own large time series storage solution.
Not 100% sure, but my limited understanding is that they have everything, but don't do anything with it at the moment. However, when they need it, it'll just be there which is much better than needing to build something and get all that data.
Have you tried programming in it? I would call K a general purpose language that can do pretty much anything. I can only speak personally, but learning k and q were a transformative experience that have changed how I think in a fundamental way (in a similar way that lisp and haskell did when I learned them in the past).
> not once have I seen an explanation of what makes it fast and why those techniques aren't looted for open-source languages
The language design is such that the programs you write can make optimal use of cache and SIMD. The interpreter can only take advantage of that if the language offers you that. It's like saying "not once have I seen an explanation of what makes c so fast and why those techniques aren't applied to python and javascript."
(Note also that the reference J interpreter is opensource and very competitive, performance-wise.)
This should apply to any high-level glue language that's wrapping primitives implemented in a "fast" language, right? I'm thinking in particular something like python with numpy. As long as most of the work is done within the primitives I would think they'd be pretty comparable.
Is there something about the primitives that APL-derivatives provide that means more of your program can happen inside the primitives, so the interpreter is doing less work?
I'm also curious how the performance compares against something like Julia, which is JIT compiled so you avoid the distinction between the high-level and low-level code. Seems like a rather different region of the high-performance design space.
edit: also how does this relate to the "the whole interpreter fits in the cache" explanation that's also widely invoked to explain K's speed? It seems like if you're spending most of your time in the primitives then the interpreter time doesn't matter. Maybe it's a combination of both - K and friends encourage you to organize your program in ways that support burning through data quickly, and additionally when the interpreter is operating it's a particularly fast interpreter?
> Is there something about the primitives that APL-derivatives provide that means more of your program can happen inside the primitives, so the interpreter is doing less work?
I'm not an {APL,K,J} expert by any definition, but FWIW I think that's the "array-oriented" part of it: syntactically the operators don't specify the "rank" of their args so the interpreter is free to select the most efficient low-level semantics. Something like that.
Is there something about the primitives that APL-derivatives provide that means more of your program can happen inside the primitives, so the interpreter is doing less work?
Yes. Well, maybe not the primitives themselves, so much as how the language pushes you to use them.
Last I looked, execution of the J interpreter, especially on large data, tends to be a handful of very tight loops inside the built-in verbs' implementations, and those loops are glued together with a small amount of the higher-overhead dispatch code normally associated with interpretation. Implicit looping incurs a once-per-array-op cost, but the cost of the computation done on each array element dominates the total cost. The interpreter is not especially clever, but the language encourages the programmer to write code that's easy for the interpreter to handle.
Performance is a bit fragile though. If the operation you're lifting over a whole array includes array-lifted operations internally, now you're doing the heavy interpretive dispatch inside the loop. As a simple example, {.+{: means "add the first item to the last item." If you apply it to a vector, it adds the first and last scalars. If you apply it to a matrix, it adds the first and last rows (and so on). Applying it to a very wide matrix is cheap because {. and {: just have to select the right rows, and then you have a simple loop over those rows' elements. It's about 166ms on my laptop with ten million row elements:
wide =. i. 2 10000000
6!:2 '({.+{:) wide'
0.166085
With the data oriented the other way, there's no inherent reason computation should be any slower, but it is. ({.+{:)"1 means, "apply {.+{: to each individual vector." So we've changed from selecting the top and bottom from each column to selecting the left and right from each row. The way the J interpreter is structured, we have to figure out the loop for ({.+{:)"1, but the loop body isn't just grabbing two numbers and adding them. The loop body jumps back to the interpreter to figure out how {.+{: should work on a 2-element vector. That takes about as long as figuring out how it should work on a 2-row matrix, but we do that for all ten million vectors.
tall =. i. 10000000 2
6!:2 '({.+{:)"1 tall'
2.99855
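A rough numpy analogue of the same trap, for anyone who doesn't read J (the sizes and the lambda here are just illustrative): adding the first row to the last row of a wide matrix is one tight C loop, while applying a Python-level "first plus last" to each two-element row pays interpreter dispatch once per row.

import time
import numpy as np

wide = np.arange(2 * 1_000_000).reshape(2, 1_000_000)
tall = wide.T.copy()

t0 = time.perf_counter()
fast = wide[0] + wide[-1]        # one dispatch, one tight C loop
t1 = time.perf_counter()
slow = np.apply_along_axis(lambda v: v[0] + v[-1], 1, tall)  # Python call per row
t2 = time.perf_counter()

print(f"whole-array: {t1 - t0:.4f}s   per-row dispatch: {t2 - t1:.2f}s")
assert (fast == slow).all()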
I think it's the dreams of possible job heaven. With an esoteric language that can be used for finance, you can not only grow your mental gonads ("it's even smarter than Haskell!!") but also possibly get a six-figure job that will be very secure.
>not once have I seen an explanation of what makes it fast and why those techniques aren't looted for open-source languages.
It's fast (for an interpreter), because instead of for example a bytecode interpreter where each instruction operates on a single value, APL/j/k interpreters execute operators over whole arrays at once with optimized C code.
And the set of operators has been designed to work with each other and cover the basic concrete needs (i.e. what the CPU actually needs to do rather than how you would describe it in words).
,// is a complete implementation of “flatten”.
|/0(0|+)\ is a complete implementation of an efficient maximum-subarray-sum.
In both cases, each character does one very well defined thing implemented in tight C (sometimes SIMD) code, and K interpreter just orchestrates how they interact.
(Parentheses here group as a pair, but everything else is an independent operation per character)
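For those who don't read k, a hedged Python sketch of what those two expressions compute (just the logic, not how the k interpreter implements it):

from functools import reduce

# ,// : flatten completely
def flatten(x):
    return reduce(lambda acc, v: acc + (flatten(v) if isinstance(v, list) else [v]), x, [])

# |/0(0|+)\ : maximum subarray sum.  (0|+)\ with seed 0 is a running scan
# r = max(0, r + v); |/ then takes the maximum over that scan.
def max_subarray_sum(xs):
    best = run = 0
    for v in xs:
        run = max(0, run + v)    # the 0|+ step
        best = max(best, run)    # |/ over the scan
    return best

print(flatten([1, [2, [3, 4]], 5]))                        # [1, 2, 3, 4, 5]
print(max_subarray_sum([-2, 1, -3, 4, -1, 2, 1, -5, 4]))   # 6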
K/KDB is a pile of shite. It's loved by quants and managers in financial firms. But it's hated by programmers who actually have to write code in it and deal with shitty code written by quants. K is full of arbitrary restrictions (too many global/local variables - what is it, 1970s?) and crazy quirks (the function's closing brace has to be indented, otherwise you'll get a weird error). All in the name of performance, of course. The error reporting is insanely cryptic. It doesn't tell you what caused an error or even where it happened. It just tells you something like "'rank" - and that's it. Once your code grows beyond 10 lines, you can kiss your sanity goodbye.
And let's talk about performance. K is inherently single threaded. If you have a powerful 96-core machine, it's of no use to you because your K code will be using only a single core. I've seen numpy easily beating K when numpy uses MKL library with automatic parallelization.
There is a reason K/KDB is not very popular outside of a certain domain. It's not because it's proprietary and it costs a fortune. It's just bad.
Having worked a bit with KDB, my experience is totally different. The investment bank I worked at had a large and thriving KDB community of developers. In one particular project, we were able to replace a system consisting of tens of thousands of lines of fairly good but heavily abstracted java code with a few hundred lines of Q. The java application took 7 hours to run its worst-case query; the Q code ran the same query in less than a second. The Q code solved the business case better, and it was more readable, mainly because there were no unnecessary abstractions. There was still an OOP middle tier, but it was mainly pass-through. Calling any technology a "pile of shite" should be done with great care. Most solutions solve a use case and KDB/K solves its use case exceptionally well. To be honest, understanding what the 'Rank' error means is not that hard. For the project I referenced earlier,
I personally found it much easier to debug Q than trying to find out what an AbstractCalculationFactoryBuilder was doing. There are very few poor languages, just poor developers.
The restrictions, lack of lexical scoping, and weird error messages are indeed frustrating, especially when starting out.
> And let's talk about performance. K is inherently single threaded. If you have a powerful 96-core machine, it's of no use to you because your K code will be using only a single core. I've seen numpy easily beating K when numpy uses MKL library with automatic parallelization.
I'm not sure where you got that from, but it's not true. You can use peach to quite great effect. For example, to do a matrix multiply that'll utilize all slave threads (which you can set at runtime):
mmu[;x] peach y
For more info, check out https://code.kx.com/q/basics/peach/. And while the parallelization isn't automatic, it's pretty damn easy to just s/each/peach/g (you should profile first, though, of course). Moreover, you can just run multiple interpreters and communicate through IPC. It works really well and is pretty easy to do. Give me some numerical python code and I guarantee k/q can do it at least as fast. If your pipeline is something like "do some number crunching and save a bunch of rows to MySQL", kdb+/q is always going to be faster, hands down. The performance benefits of having the db and language in one process cannot be overstated.
> There is a reason K/KDB is not very popular outside of a certain domain. It's not because it's proprietary and it costs a fortune. It's just bad.
I strongly disagree. If it was up to me, I would throw out all the crap numerical Python code I write everyday and switch to kdb+/q. kdb+ is a pleasure to work with and the query language is so much better than SQL it's not even funny.
The only real problem with kdb+/q is that there's not many libraries for it, in particular, neural net and ML libs would be on my wishlist. Though, I'm sure if you're a paying customer First Derivative will whip up some super high quality shit for you.
Also, your argument doesn't really make sense: if k just sucks, why is it used at all? Because quants and their managers are stupid?
> it's hated by programmers who actually have to write code in it
False. Many people really enjoy programming in it (myself included).
> It doesn't tell you what caused an error or even where it happened. It just tells you something like "'rank" - and that's it.
Since version 3.5 (~2 years ago?) it tells you exactly where the error occurred (prints out the line with a caret showing where in the line the error occurred). And with a little practice you easily learn what the common errors like 'rank & 'type mean. I actually find the error messages pretty helpful.
> If you have a powerful 96-core machine, it's of no use to you because your K code will be using only a single core.
Technically you can run q multi-threaded, but the typical architecture used in kdb+ systems will be using multiple processes, each doing one thing (data ingest, in-memory intraday data storage, data persistence, historical querying, gateway). For example it's generally fairly trivial to add a bunch of extra processes for historical queries so many users can query in parallel.
I work with KDB daily. This comment is misleading and uninformed. I'll agree the language is cryptic but that's what makes it fast. In regards to single threaded, that's just false. There are many options from slave threads to parallel processing you can use to speed up processing if your data sets are large (billions +) or you need to query historical data that is in the size of petabytes. Also once you learn the syntax and error handling it's not bad, just like every other language.
I'm currently working on my own APL dialect. Like the author of K, I decided to not be bound to the restrictions of APL, and want it to be more functional. As I am also doing a lot of Lisp, a lot of ideas come from there as well.
The biggest difference in my language (which as of yet does not have a name) is that it is immutable, and function results are lazy. This means that it can be highly parallel, and the laziness means that there is much less synchronisation needed when parallelising computations.
I'm also making some of the syntax more traditional, in order to make large programs more manageable.
It's implemented as a Kotlin multiplatform project, which means that I can build the interpreter for the JVM, native and Javascript.
Why am I talking about this? Well, it's because my biggest issue with APL is that it's not really suited for large programs (where large means on the order of 100 lines of code or more). Dyalog provides some extensions that make imperative programming easier, but the syntax is really ugly.
Thus, I'm trying to merge the idea of concise APL syntax together with typical functional programming. So you'll be able to write code like this:
Even though the core syntax is still APL, the structure of the program is much easier to understand, compared to the way you'd write it in APL.
I still don't know if this experiment will actually lead to something good, but at least it's a fun project where I can play with Kotlin multiplatform support.
The code is on my Github, but I'm not linking it here since it's quite early, and it's not really usable yet (there are a lot of functions missing). I'm expecting anyone who actually looks it up anyway to understand that it's not ready yet.
> The biggest difference in my language (which as of yet does not have a name) is that it is immutable, and function results are lazy. This means that it can be highly parallel, and the laziness means that there is much less synchronisation needed when parallelising computations.
In my experience, it is the other way around. Laziness as normally implemented is essentially state plus control flow, which is death to parallelism. Parallel Haskell is obscure for this reason, and typical programming patterns use some form of "deepseq" construct whenever thread boundaries are crossed (explicitly or implicitly). Actually taking advantage of laziness in a parallel setting is very tricky. Offhand, I cannot recall anyone who has done so convincingly.
Yes, that is true. However, function evaluation isn't lazy. It's the computation that is lazy.
Consider the case where you have very large arrays a, b and c (let's say each contains a few hundred million elements). You then evaluate the following in APL:
a+b+c
What interpreters such as GNU APL do is first evaluate b+c, resulting in a new array, and then add that result to a.
In my case, b+c returns a deferred computation. This is then added to a which returns yet another deferred computation.
The actual values are not computed until I read the results out of the deferred computation. This means that there does not need to be a synchronisation point between the two addition operations.
There is also the benefit that if not all results from a computation are needed, they might not even have to be computed in the first place.
However, the functions themselves are still evaluated, meaning that side effects are called when you'd expected them to.
Well, there is an exception. Consider the following:
{print ⍵ ◊ ⍵+1}¨foo
This also returns a deferred result, and this will indeed result in print being called at an unexpected time. You can force evaluation in this case, but I haven't decided what to do about this case yet.
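For the curious, a toy Python sketch of the deferred-computation idea described above (this only illustrates the concept, not the actual Kotlin implementation):

class DeferredAdd:
    """Element-wise a+b whose elements are computed only when read."""
    def __init__(self, a, b):
        self.a, self.b = a, b
    def __add__(self, other):
        return DeferredAdd(self, other)   # just builds more deferred structure
    def __len__(self):
        return len(self.a)
    def __getitem__(self, i):
        return self.a[i] + self.b[i]      # the work happens here, on demand

a, b, c = [1, 2, 3], [10, 20, 30], [100, 200, 300]
expr = DeferredAdd(b, c) + a   # b+c first, then +a; nothing materialized yet
print(expr[1])                 # 222, computed for this index only
print(list(expr))              # [111, 222, 333], forces the whole result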
> What interpreters such as GNU APL does is to first evaluate b+c, resulting in a new array, and then add that result to a.
Is this really what most implementations would do? I'd assume it would parse the expression `a+b+c` into a tree like
      +
     / \
    a   +
       / \
      b   c
But then run a pass on the tree to identify expressions like this that could be fused together without creating intermediate results. I know that most APL implementations have more efficient implementations of certain idioms (commonly-used sequences of operators/functions), so I figured they must be doing something like this to be able to identify those idioms.
This is just conjecture though, I'd love to know more about how this actually works in practice.
Some implementations use reference counting. So in your example, for + the output requires as much memory as the inputs, and if at least one of b or c has a ref count of 1, then you can reuse one of the inputs for the output. Otherwise you have to allocate memory to hold the sum of b and c. But that intermediate sum has a ref count of 1 and can be reused for the final output.
The key is to code your primitives in such a way that you can reuse one of the inputs as the output.
You can also do clever tricks that describe the array transformations rather than perform them. It’s similar to what happens during stream fusion in Haskell. See https://news.ycombinator.com/item?id=17506789 for more details.
IIRC numexpr got about 2-5x faster when it switched from numpy's approach (which naively works mostly as you describe: one op at a time, refcounted, though without good buffer reuse) to a VM that reuses temporary result memory and works in 8KB pages (thus getting way, way better cache use).
But I'm not familiar with an APL/J/K implementation that does.
Typical practice seems to be having a large collection of ad hoc local transformations that look for specific combinations of operations. I've only really seen more general static analysis aimed at not materializing intermediate arrays in academic research.
Expression tree IR is maybe the wrong setting to do this sort of fusion for APL. What you want for a+b+c is a 3-argument scalar function applied to a, b, and c, but you need something other than actual APL expressions for that.
The strict top-down, right-to-left evaluation makes parsing unnecessary for interpreters; you just need a tokenizer. Obviously this became a problem when people started to think about compilers.
I love K/q, but there's some trade-offs (probably for speed) that I find extremely aggravating. The biggest sin of all is the lack of lexical scoping. If k/q had lexical scoping and macros, it would probably be my ideal programming language.
Regardless, the philosophy behind the language is so different, and so incredible, I had the most fun I've had in a long time programming when I learned it. kdb+/q is a work of art and probably the best engineered piece of software I've ever come across.
All Ks have basic module level lexical scoping, which does not deliver all the benefits of lexical scoping, but has none of the pitfalls of dynamic scoping either.
K2/K3 did have (a version of nested) lexical scoping - an inner function would close over the values at the time the definition was encountered; you can still simulate this in later Ks by passing what you need to close over and immediately projecting over the existing values, e.g.
make_adder:{[n] {[n;x] n+x}[n;]}
Which defines an inner two-parameter function and then projects it over the outer n parameter (this is how K2 implemented it, IIUC). But that doesn't let you modify names in an outer scope, so it is not really comparable to Lisp's lexical scoping, or Lua's upvalues, or Python's nonlocals, or whatever they are called these days.
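For readers coming from other languages, the projection trick is essentially partial application: the outer value is captured by value at definition time, and nothing in the enclosing scope can be rebound through it. A rough Python analogue (names invented for illustration):

    from functools import partial

    # Rough analogue of make_adder:{[n] {[n;x] n+x}[n;]} - the outer n is
    # captured by value via partial application, not via a mutable closure.
    def make_adder(n):
        return partial(lambda n, x: n + x, n)

    add3 = make_adder(3)
    print(add3(4))   # 7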
Andrey Zholos' kuc[0] dialect has a JIT and real closures, you might find that interesting, though it hasn't seen an update in a while.
> probably the best engineered piece of software I've ever come across.
As you can probably tell, I'm a K fan and evangelist myself. Yet personally I cringe at some of the kludges. I love K. It's awesome, it's amazing, but I wouldn't use the "best engineered" label myself; "best set of tradeoffs" would probably be my description (which is a distinction without a difference to some, I recognize - those who define engineering as the art and craft of the tradeoff).
As for the content of this page, there's a line within that now applies to itself:
> Introductory articles – somewhat dated but nevertheless rather informative:
k4, which it lists as the most recent version, is about 20 years old now, and the page talks about it like it's new. Still informative, though!
EDIT: Giving it another look, it seems the links have been updated over time (for example, the reference to oK), though the main content of the post is about ten years old, and the initial list of links is older than that.
I posted something similar to this on the last k-related thread that came up, but new responses and perspectives are always nice:
Lots of the discussion about the disadvantages of k/q/other APLs seem to revolve around its syntax and mental model. If said barriers were removed (e.g., if the relative obscurity of the syntax and semantics were not an issue and you could assume that you/your team/anyone who looks at your code is familiar), when would I not want to use k/q/APL/etc.?
GC'd languages may not be good for latency/memory-sensitive programs. C/C++ may not be good for reliability/security-sensitive programs. Ruby/Python/Javascript may have performance issues. Haskell may have issues around predicting resource usage. Lisp may have issues around being too expressive.
What issues may I run into that would make k/q/etc. not a good choice?
All APL variants I know of are GC'd, so what you wrote applies (though they are refcounted and cycle-free, so collection is deterministic and resource consumption is well defined).
They are mostly functional, so as usual care must be taken with memory usage when updating data structures.
If your workload does not lend itself to APLization, your program will not enjoy any of the benefits, of course. But in my experience most projects have very significant parts that do.
"Too expressive" is suboptimal wording on my part; the intent was to cover the criticism that Lisp is flexible enough that Lisp programs tend to evolve their own potentially programmer-specific DSLs for the given problem domain.
I guess a potentially more interesting question would be how I could identify workloads that APL-alikes would or would not work well for. In particular, are there resources where I can learn more about ways to translate problems into a form that lends itself to APLization? One example is the substring search problem; I know about naive string search using loops, Boyer-Moore, etc., but I don't think I would have been able to come up with the APLish version of match-char-of-needle-in-haystack, rotate, and AND. Is this just something that gets picked up as one does more work with APL-alikes?
> Is this just something that gets picked up as one does more work with APL-alikes?
I suspect that it is, much like picking up SQLisms - there are many things you'd do in SQL[0], such as SELECT AVG(x>3) FROM T instead of SELECT (SELECT COUNT(*) FROM T WHERE x>3) / (SELECT COUNT(*) FROM T) FROM DUAL; the latter is what you get from translating a Javaism into SQL (though that is still more SQLish than iterating through a cursor, which would ACTUALLY be equivalent to doing it in Java with a for loop).
The APLisms that I've picked up, and carried over to other languages (which usually makes my programs much shorter and faster, though less idiomatic), are:
1) Process things in stages - when processing a list, instead of trying to process each element from beginning to end, do one stage on every element of the list, and only then go on to the next stage. This often requires rethinking e.g. error handling, but on the other hand stages tend to become generic, commutable, cacheable, etc.
2) Not independently of that, grouping is an important stage: have a stage that (solely) decides which element needs to be processed as part of which group, and labels it accordingly - that allows the next stages to run on different groups in parallel, for example, and otherwise tends to make resource use much better and more deterministic.
3) Also not independently, almost everything can be processed using linear memory access patterns almost exclusively. Gather/scatter is often needed, but it can be confined to well-defined points and optimized accordingly (that's probably the main engine of K's speed - L1 cache dominates far more than in most languages, because everything tends to be sequential, and even gather/scatter is delimited enough to prefetch effectively).
I suspect the only way to learn it is to do it.
[0] In this case, the Javaism may be significantly more efficient if the optimizer is stupid and x has an order-capable index; but the idiomatic, shorter SQL is the first one, and in most cases the added conditions and real computation make the Javaistic SQL less efficient.
Guess it's time for me to go looking for my advent of code repo...
Those APLisms you list there sound really useful. Some questions about them:
1. Is a shell pipeline an appropriate analogy for what you mean by processing things in stages? And if so, do you have any tips for deciding how big to make each stage?
2. Is this sort of like submitting jobs to a thread pool in a more controlled manner? Is this something that APL-alikes tend to handle better than other languages?
> 1. Is a shell pipeline an appropriate analogy for what you mean by processing things in stages? And if so, do you have any tips for deciding how big to make each stage?
Yes, it's a good analogy and mental model, but not a good experience to draw on, because the primitives are on such different levels. An example off the top of my head: checking whether an expression has balanced parentheses. The way you would do it in (not $APL and not $PARSER_GENERATOR) is to go character by character, increase a counter for '(', decrease it for ')', and bail out if you ever go negative. In K you'd do:
open: x='(' / vector of 1 when '(', 0 otherwise
close: x=')' / vector of 1 when ')', 0 otherwise
change: open-close / vector of +1 when '(', -1 when ')', 0 otherwise
running: +\change / running sum of change
balance: *|running / last element - "first of reverse of", this is an idiom in K
dipped: &/running / minimum over running sum of changes
good: (balance=0)&~(dipped<0) / we balanced at the end, and no close before open at any point.
You can't really break things down this finely in pipes, and the "one input, one output" ergonomics of pipes tends to restrict the data passed between stages (open and close are two "streams" in the pipe model - you could use additional numbered file descriptors, but then you lose the elegant pipe structure, and people rarely ever do that).
And after you've grokked it, you'd just write it as two expressions:
b:*|r:+\(x='(')-(x=')') / r:running balance; b:balance at end
g:(b=0)&~(&/r)<0 / good if balanced and no close-before-open dip
Which, over time, you'd likely prefer to write as the one-liner function
balanced:{b:*|r:+\(x='(')-x=')';(b=0)&~(&/r)<0}
That almost everyone on HN would claim is an unreadable abomination, but which would still be perfectly readable to you five years down the line after a minute of reflection. And then you'll see that Arthur Whitney's function achieving the same thing is twice as short and five times as fast, but may take you 5 minutes to grok. (Yes, I can write it shorter than this, but I won't.)
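For comparison, here is a rough NumPy rendering of the same staged approach - a sketch of the idea for readers who don't speak K, not a claim about how any K implementation works internally:

    import numpy as np

    def balanced(s: str) -> bool:
        x = np.frombuffer(s.encode(), dtype=np.uint8)
        change = (x == ord('(')).astype(np.int64) - (x == ord(')'))  # +1 / -1 / 0
        running = np.cumsum(change)                                  # +\ in K
        bal = running[-1] if running.size else 0                     # *| last element
        dipped = running.min() if running.size else 0                # &/ minimum
        return bool(bal == 0 and dipped >= 0)

    print(balanced("(a(b)c)"))   # True
    print(balanced(")("))        # False: a close before any open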
> 2. Is this sort of like submitting jobs to a thread pool in a more controlled manner? Is this something that APL-alikes tend to handle better than other languages?
I don't know about Dyalog, but the underlying data model and K implementations do threads and processes quite easily (K threads have more sharing limitations than you are used to from other environments, though - but processes are just as easy)
the = (group) operator returns a dictionary from key to the indices at which that key occurs.
You can now process the letters independently using "each", e.g. "#:'. group", read "count each value of group":
#:'. group
2 4 4 2 2 1
Want to do that in parallel? Switch from each (') to parallel-each aka peach (':):
#:':. group
2 4 4 2 2 1
Except this time it was done on multiple threads (or processes on the same machine, with some setup, or even on multiple machines). I think this qualifies as "better than most languages", and is enabled by (a) being mostly functional, so synchronization is rarely needed, and (b) making all loops implicit - "each" instead of explicit "for". Order of iteration is NOT guaranteed, and unhandled exceptions can only abort the entire computation. In return you get parallelization that actually works well with a single character change.
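The same "group, then each / peach" shape in plain Python, for readers who want a familiar rendering (the group helper mimics K's = operator; nothing here is K code, and the process pool stands in for peach):

    from collections import defaultdict
    from concurrent.futures import ProcessPoolExecutor

    def group(xs):
        # Mimic K's = (group): a dict from each key to the indices where it occurs.
        d = defaultdict(list)
        for i, x in enumerate(xs):
            d[x].append(i)
        return dict(d)

    if __name__ == "__main__":
        g = group("yabba dabba doo")

        # "each": apply a function to every value of the grouping, serially.
        counts = {k: len(ix) for k, ix in g.items()}

        # "peach": the same shape of computation, farmed out to worker processes;
        # only the combinator changes, the per-group function stays the same.
        with ProcessPoolExecutor() as pool:
            par_counts = dict(zip(g.keys(), pool.map(len, g.values())))

        assert par_counts == counts
        print(counts)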
> 3. What do you mean by gather/scatter here?
Let's say we want to implement "uniq" for letters ourselves (even though it is built in; written "?" and called "distinct" these days). Remember our "group" from before? We could take the keys - !group would give that (ordered lexicographically in this case, but that's not always the case). But let's say we want it in order of appearance - we need to take the first of each list, so {x@*:'.=x} - x at the first of each index list of the grouping of x. The "x at" accesses memory non-sequentially (in ascending order in this case, but random in general). This is a "gather" operation, because we gather data from many places, and if the indices read are random, it is the worst case for cache systems. By making it a separate stage, it is much easier to optimize - for both the person and the interpreter/compiler.
The "scatter" is the opposite. Want to do counts of each letter? "c:@[&256;0+"yabba dabba doo";+;1]" read: "in an array of 256 zeros, in each index you get from converting the string to ascii, add one. (From which you can get the lex sorted range with "`c$&c>0" : convert to char those indices where c is greater than 0). But the updates are scattered inside the vector. Do note that grouping allows the "+1" for different groups to happen in parallel without more help (though you'd have to explicitly parallel-each it with the current implementation)
Your explanations really help! I think I can see where k/APL-alike proponents are coming from; the semantics make sense, and it'd come down to recognizing what symbols or group of symbols do. It might take a bit of going back and forth with a reference, but it isn't impossible.
Yet another question: how would performance analysis of this function work? Does the interpreter actually materialize arrays for stuff like (x='(')-x=')', so you'd have several iterations over the entire input array?
Again, thanks for your great explanations! I'll definitely put k on my list of languages to explore in my free time.
Yes, almost all current interpreters and compilers do materialize the results and make whole passes through the array in such a case. In practice, since the primitives (e.g. "=") are implemented in tight, predictable, branch-free, often SIMD code, it ends up faster than the obvious branching C-like routine despite the interpreter overhead - especially since any string under 16K is essentially guaranteed to be in L1 for the second pass (and even if it's not, it's predictable, linear, prefetched access even in the first pass, which is almost as good).
When was the last time you wrote prefetching branchless SIMD code for any task other than your inner loop? In APL/K you pay interpreter and GC overhead, but often get those perks for your whole program “for free”.
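A loose but runnable analogy: vectorized passes that materialize an intermediate versus one branchy interpreted loop. This is NumPy versus plain Python rather than an APL interpreter versus hand-written C, but it illustrates the same trade-off of extra whole-array passes against tight per-element code:

    import numpy as np

    s = "(a(b)c)" * 100_000
    x = np.frombuffer(s.encode(), dtype=np.uint8)

    def final_balance_vectorized(x):
        # Two branch-free passes, each materializing a boolean array.
        return int((x == ord('(')).sum()) - int((x == ord(')')).sum())

    def final_balance_loop(s):
        # One pass, but interpreted and branchy.
        n = 0
        for ch in s:
            if ch == '(':
                n += 1
            elif ch == ')':
                n -= 1
        return n

    assert final_balance_vectorized(x) == final_balance_loop(s)  # time them to compare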
Indeed, k/APL make use of your visual pattern matching in ways that no other language does (or can, with standard practices, due to code density). Some APL compilers identify idioms and replace them with a faster implementation (e.g. sum, which is +/, is handled by essentially all of them; more complicated things like "average" by some); but the cognitive load is independent of that - you don't need a manual, because a semantically equivalent implementation is right in front of your eyes.
The closest thing I can think of to prefetching branchless SIMD code was some numpy scripts I did some work with, but I'm not sure it would be quite as tightly optimized.
It's really quite fascinating, and I'll have to see if I can handle the pattern matching as well as the more experienced APLers here. Thanks for taking the time to answer my questions!
Small correction: I forgot to sort the indices by first appearance to guarantee order-of-appearance, so this should be {x@{x@<x}*:'.=x}, where I've added "{x@<x}" - sort ("x at grade x"). The inner x and outer x are not the same one; this is an implicit argument. You can extract it out into a named function or name the argument, e.g. {[v]v@<v} - and I encourage you to do both when you start. But over time, "{x@<x}" becomes its own object, in the way that the word "while" is its own word and not the 5 letters that make it up.
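In NumPy terms, "x at grade x" ({x@<x}) is just indexing an array by its own argsort:

    import numpy as np

    x = np.array([30, 10, 20])
    print(x[np.argsort(x)])   # [10 20 30] - the grade (<) gives the sort order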
Are there any papers or such that have any details about what allows k/q to be so fast? Or would those be considered implementation secrets?
It's always nice to be able to read about the technology and design underpinning something, but in this case at least it'd be understandable if that were not available.
These languages come up here every now and then; they look very interesting, but it also seems like they must be very heavily optimized to be as fast as they are.
So, sorry if this is a dumb question, but does K run on architectures other than x86/64?
I checked https://kx.com/ but I couldn't find a definite answer. They mention IoT a lot, which makes me think it might run on embedded devices too?
kx also have a version for ARM (available here https://kx.com/download ). Note this is the non-commercial version, not sure if they have a commercial version for ARM.
Their frequent mention of IoT is (I think) more to do with storing data from IoT devices, not directly on the devices themselves.
Shakti (Arthur Whitney's current company) has also mentioned a wider support of architectures I think, but at the moment their k (k9) is in pretty early stages, still x86/64 only I think.
There are various other open source implementations of k (posted elsewhere on this HN thread), some of them might also support other architectures (although they won't have the performance of Kx/Shakti versions).
Ah, that makes sense - thanks. I guess that's just for Cortex-A ARM cores? I was hoping to try a lean language like this on a fast micro like the new Teensy boards, but I suppose those kinds of chips probably aren't used very much in their target applications like HFT.
I do wish that people would stop using "IoT" to mean "sending tons of data to the cloud", but I guess that's how the buzzword crumbles.
That's really interesting! What allows k to be so small compared to interpreters for other languages? Does it come down to the semantics of the language itself?
This is an interesting thread - thx for posting. I used to know a bunch of folks that used both J and K. I still use APL - the old APLWin 3.5 from APL2000 (Manugistics/STSC).
APL was the first language I learned as a tiny child. I write my pseudo code in it. I live in a ranch house out on the tundra, with a 50ft transponder antenna behind the place which links me to the 'net. I've used C, C++, Python with all the numpy stuff, etc., and a few flavours of TensorFlow.
But it's APL that gets it done. I've heard of K, and I knew a guy who worked for Arthur Whitney. Languages like K and J and APL can be fast, if you use them right. And it is almost tragicomic that the genius of the APL glyphic character set was given up just as wonderful devices were created that could paint any character at all on the screen. My girlfriend uses Japanese Kanji and Hiragana on her iPad, no problem. In a GUI environment, the original APL characters are easy and clear - no need for silly ASCII overloading.
See, I have these simple GUI apps, written many years ago in Windows, using APLWin 3.5. They still run, and I use them daily. They keep track of financial time series data, and I can use them in a variety of ways. The APL stuff can convert tables of various formats into conformable vectors, which can be transformed into input for GNUplot or TensorFlow or even (my favourite), Hinton's original Xerion product, which I have running on some Linux boxes.
APL is simply a wonderful, easy, useful language that can allow one single person to support some mission-critical production-grade software.
Example: This AM, I discovered that all the data tables my data-table downloader had fetched last night had a new format, with a zero-valued row of data carrying today's date.
This meant that when I updated the database of time series data, I got tables where every series had a bogus, zero-valued entry at the most recent date.
It was 7:30 AM, and markets would open at 9:30.
A bit of code inspection, and a simple fix:
Update the MERGETABS function with two lines of code:
And this solved the problem. Update the runtime version of the workspace by changing the QuadLX latent expression to start the GUI when the W/S is loaded, and roll out the new W/S to a few different boxes. All done before Mkt open.
The code above just takes the new table PNEW, selects the data columns, finds those that are zero-valued and removes them using a boolean vector reduction along the highest dimension (the table's rows). Easy peasy. Done quick, and done right.
With my data updated and correct, I run some stuff against it, and get the courage to hold my positions - concluding it will be an up day, as the stuff I run helps me make that decision. I close the day up, just under $40,000 to the good. (Crazy day...)
And oh, btw, the APL stuff runs on both Linux and Windows boxes - and the Linux is both 32-bit and modern CentOS 64-bit. I still run the old 32-bit boxes because they are very fast and very reliable. The APL stuff runs on the Linux boxes under WINE, which works amazingly well.
It's old school stuff, but it just works. APL is the magic that lets me do the other magic, that pays the bills. I gather that K can provide something similar for big corporate environments, when the datasets and timeseries get crazy large.
You can write anything in any language - but one must not lose sight of the objective, which is to make the business side of things work well. One can write APL - or any language - in a way that is unclear and confusing. But you don't have to do so. If you are careful, you can build APL solutions that are fast, easy to maintain, and smoothly flexible. Maybe this is possible in K as well?
At Shakti they are working on that; there is quite a lot of experimentation with embedded boards in the (small) community of people who work on the core. They might provide ARM builds (those already work), so it should run on quite a range of devices; however, you do need somewhat more memory than, for instance, I am used to working with (MCUs with 23 KB of usable memory); I think it needs at least a few hundred KB to do anything and (probably) a few MB to do actual work. It'll improve.