I have worked in Pascal, C/C++, Java, Python, R, Nim, Clojure, Rust, Go, and JS, and have created various projects in these languages.
However, I couldn't get used to Q. I understand that it is fast, and I also started to like the functional programming aspect of it. But oh boy, there are no proper error messages (no, "`type" is not helpful). The short forms of the various map and apply operators were handy, but there are no equivalent versions with longer names, so there is no way to write code that is readable to a non-expert in Q. The strange evaluation order made it impossible for me to read other people's code; I often resorted to adding parentheses and checking whether the behavior changed. There is no debugger (I used KdbStudio). Once I used more than 16 local variables, which produced a runtime error with another strange, unhelpful message.
I'm not questioning its usefulness, but I think it could be much more developer-friendly without compromising speed.
I have worked in C (twenty-five years), C++, Python, Perl (twenty years), JavaScript, Common Lisp (for ten years), OCaml, Forth, PostScript, and also q/Kdb.
I think a lot can be done to improve teaching q/Kdb.
The current “best practice” is to let it change you until it makes sense, but this takes time: anywhere from 6 months to a couple of years, depending on your other experience. And you still have to want to “get it”.
I want that to be better.
However, this is alien technology: the terseness is a feature. Limited locals are a feature. The evaluation order is a feature. These things actually help it go fast (as surprising as that is!)
We only just recently got “a debugger” and good error messages because of repeated complaints and wishes for it from newbies, but there is a reason experienced Kdb/q programmers never wanted it. Why doesn’t that reason prompt those newbies to figure out why?
Yes, ideally you would learn the way and the why of how we do things, but there are a number of serious hurdles to overcome: to you, q/Kdb seems merely unpolished even if maybe “fast”. Is it worth it? “Maybe”, you suggest, but then there are crazy people like me telling you something preposterous: that q/Kdb is actually incredibly well crafted, highly readable and an absolute pleasure to use.
I'm also trying to say that everything is worth learning: Nim and Python and JavaScript are all basically the same thing. You learned one of them, you kind of learned them all, so adding another one feels like these things are easy to learn. Alien technology is alien though; you haven't learned any of it. How can we even talk to each other?
I'm hopeful: Lambda was tricky, but it snuck into things. Can we get tables, views, and high code density?
We need to find something better: some better way to talk about it. But the peanut gallery is loud.
Thank you for taking the time to respond. I understand that speed is very important, and I'm happy that Q takes it seriously. I was never questioning that part. I also understand the beauty of functional programming paradigms (working with maps and applys); I wrote many lines in Mathematica without "for" loops. I also understood many code snippets written by developers working only with Q (I guess they are experts). I don't know the full power of Q, but I can imagine what is achievable.
I haven't worked with Q for 18 months, so the debugger sounds cool. Forgive me if my knowledge is not up to date. Also, I wanted to use the language immediately, without studying it for months, which might be the main source of my frustration.
I believe that there could be a better developer environment without making the system slower, and that it could reduce the time required to use the language efficiently from several months to several days or hours.
The evaluation order is not clear to me. I understand this expression:
q){x+2*y} scan 2 3 5 7
(the result is: 2 8 18 32). However, I often had to read more complicated code which contained one- or two-character operators. Without fully understanding their syntax, I was unable to tell whether those operators take values from both sides, which stopped me from understanding the evaluation order. It would be great if a tool could convert this expression to
q)scan[{x+2*y};2 3 5 7]
and that tool would also replace cryptic two-character operators with long, readable names (e.g., MapForTables).
I think that the error messages need to be as verbose as possible. You mentioned that experts don't ask for the debugger and verbose error messages. I think that they simply got used to not having these useful features, but they would use them eventually. I'm okay with the limited number of variables, as long as it gives a proper error message when violated.
People working with Q told me that it was hard for them to get back into Q after a one-month break, because they had forgotten the lexical knowledge needed to work efficiently.
I'm happy to, and by all means reach out -- instructions to find my email are on my profile page and I'm on github, etc.
I'm not speaking tongue-in-cheek when I say the short variable names and terseness are a feature. I wrote briefly on this subject previously[1], but there's something difficult to get across here because I have been changed.
Learning how to read very dense code changed the way I think about programs, and it has completely eliminated several classes of mistakes that I used to make.
I can see repetition where I never noticed it before. I require far less abstraction to get the business problem solved. And yes, my programs are faster.
It's also put me across the table -- really, it might as well be an ocean: q/Kdb is fast because it is written in this way. If you try to write another language/interpreter that isn't all bunched up on one line, that has friendly error messages and long variable names, it won't be as fast. From here it is plain and obvious why, and yet I struggle to show you over there.
> The evaluation order is not clear to me. I understand this expression {x+2*y} scan 2 3 5 7
Single-character operators, and functions in the dot-q namespace (like scan, which is really called .q.scan) take an "argument" on the left-hand side. Everything else doesn't.
> It would be great if a tool could convert this expression to
parse[2] can do this, and it was really useful to me when learning q.
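For example (a sketch; the exact parse-tree display varies a bit across versions):

  q)parse "{x+2*y} scan 2 3 5 7"
  (\;{x+2*y})
  2 3 5 7

The first item of the result is the function being applied (the iterator \ joined with the lambda) and the remaining items are its arguments - which is exactly the bracket form scan[{x+2*y};2 3 5 7] shown upthread.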
Short names and terseness are indeed features - for those who have already internalized the meaning of names and symbols. The problem is that bridging this ocean takes such a long time that for many people it's just not worth it (and for many others it doesn't _seem_ worth it).
It's doubly frustrating because it doesn't have to be that hard. With a projectional editor, descriptive names, explicit evaluation order, and inline expansion of functions and macros (e.g. for the C code mentioned below) could all be just a button press away. And it's not like experienced coders would never benefit from such tools, either.
> Nim and Python and JavaScript are all basically the same thing. You learned one of them, you kind of learned them all, so adding another one feels like these things are easy to learn. Alien technology is alien though; you haven't learned any of it. How can we even talk to each other?
This is an understatement. It helps explain why a person who has never programmed will pick up APL/J/K easier than someone who already knows Python or C.
One of the things that the GP complains about is evaluation order. In k it follows a simple rule very consistently: programs are evaluated top to bottom, and each expression is evaluated right to left. There are no precedence rules. The instance that I see trip people up all the time is the cast operator $: people coming from other languages like C or Java expect it to bind more tightly than other operators, but it doesn't.
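A minimal illustration at the q console (the cast example is mine, not the GP's):

  q)2*3+4          / right to left: 2*(3+4), not (2*3)+4
  14
  q)`long$2.5*2    / the cast applies last: `long$(2.5*2), i.e. `long$5.0
  5

A C or Java programmer tends to read that second expression as (`long$2.5)*2, which would round first and give a different answer.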
> It helps explain why a person who has never programmed will pick up APL/J/K easier than someone who already knows Python or C.
I disagree with this.
It is probably easier to teach someone APL/J/K if they don't think they already know Python or C, but an experienced C+K programmer will be able to teach a willing C programmer K very quickly.
Many q/Kdb guides start with simple things like addition and subtraction, but with vectors - as if a Python programmer really cares about the difference between:
prd 1+til x
and:
numpy.arange(1,1+x).prod()
and heck, that Python programmer probably thinks the latter looks better or "cleaner". They might argue about whether it should be two steps or an extra parenthesis, or whatever, but the relationship between capability and semantics appears roughly the same.
That's why I start with tables, IPC, and views.
It's one of those things that makes Python programmers go "whoah": they quickly get that kdb tables with queries are way better, that IPC is "like pickle times asyncio on steroids", and "just wow" to views.
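For anyone who hasn't seen it, a tiny taste (a sketch with made-up data; v is a view, defined with ::, that recomputes when its source table changes):

  q)t:([]sym:`a`b`a`b;px:10 20 30 40)
  q)select avg px by sym from t
  sym| px
  ---| --
  a  | 20
  b  | 30
  q)v::select from t where px>15
  q)t:update px:px+1 from t
  q)v    / already reflects the updated t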
If you do not see someone quickly and effortlessly do something that you are struggling with, it will be difficult to drop your ego; if you really think "yeah but everything else is easier in python", then that is a handle to hold onto.
q/Kdb makes a lot of really hard things easy, and it gets them extremely right. That's right: Fuck speed. I like programming in q/Kdb because I'm more effective: It's a better language, with a better set of builtins and libraries.
I mostly love K/Q, and recently rearranged my career to do more of it. There are some things to be aware of, though:
* If your code doesn't spend most of its time in primitive verbs operating on large vectors, it's gonna be more or less as slow as any other interpreted language. Q and kdb+ can be fast and beautiful if you can arrange your problem in the right way, but it's not magic (see the first sketch after this list).
* The internals are locked away. If you don't like the way something fundamental works, tough. I've known some folks to go to heroic lengths with debuggers and hacked up shared objects to get Q to do what they want. You could also get Kx to add the stuff you need (they're pretty reasonable and responsive). But, you can't really take it apart and put it back together again like you can with, say, Lua, Ruby, or Python.
* Relating to the above point, one of the weaknesses of the language is that there are a lot of useful (even necessary) features packed into weird corners. There's little room for abstractions beyond the basics, so you get stuff like CSV parsing controlled by the structure of lists passed to a function called "0:" (see the second sketch after this list). It's getting better documented lately, but it's still not pretty.
* Various annoyances (no real module system, no lexical scoping, etc...)
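Two of those points are easy to make concrete. First sketch, on vectors vs. element-at-a-time (illustrative only; \t reports milliseconds and the exact numbers will vary):

  q)x:til 10000000
  q)\t sum x                                        / one primitive over the whole vector
  q)\t {r:0;i:0;while[i<count x;r+:x i;i+:1];r} x   / same sum, one element at a time

The first form spends its time in a primitive verb; the second pays interpreter overhead per element and is orders of magnitude slower. Second sketch, the "0:" CSV corner (file name and column types are made up):

  q)("SFI";enlist",") 0: `:trades.csv   / columns as symbol, float, int; comma-delimited, first row = headers

The left operand is the whole parsing spec: the string gives per-column types, and enlisting the delimiter tells 0: that the first row holds column names.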
In many of those cases, I'm not even sure what could be done without compromising some other aspect of the language. Most of the time (at least for me), it's really a joy to use.
1.) The in-memory DB .exe was around 500 KB. Imagine that.
2.) The Q language syntax, while consistent, is fairly arcane and a throwback to decades past.
3.) The documentation and driver support is abysmal.
4.) It's supposedly extremely fast, but I can't help but wonder if this is a lot of successful PR and hype (like hedge fund bosses insisting on Oracle because it's the only db that 'scales')
I used to use KX/kdb/Q/K daily for several years. I wrote a full implementation of reinforcement learning (15 lines), a lightweight MVC framework (to show reports and tables in an internal webapp), and even a Q syntax checker (abusing a table as a data structure to hold parse trees). Good or bad, for the longest time, Q was my "go-to" programming language.
Based on that experience...
1) Yes, but that's not huge by modern standards.
2) Q is a DSL version of K. As others have commented, K is a pretty clean implementation of APL, and Q makes K more approachable.
3) I have to agree here, but Q for Mortals makes up for it.
4) It is really fast. As we all know, the vast majority of us actually don't have terabytes and terabytes of data, especially after a reasonable cleanup / ETL / applying common sense. I suppose it helped that I worked in finance, which meant my desktop had 16GB of memory in 2009 and 128GB of memory on a server shared by 4-5 traders.
Finally, Q was never intended for general-purpose computing nor widespread adoption. At least when I was an active user, the mailing list had the same 20-30 people asking questions and 3-4 people answering them, including a@kx.com (= Arthur Whitney, the creator). Back then, I'd say there were at most 2-3k active users of Q/K in the world. Now that Kx Systems is part of First Derivatives and has been working on expanding their customer base, perhaps they have more...?
It is worth pointing out that really fast is ... well ... really fast. See [1] for some benchmarks they did for small, medium, large data sets.
The machines that $dayjob-1 used to build dominated the STAC-M3 for a few years (2013-2015) because we paid careful attention to how kdb liked to work, and how users liked to structure their shards. Our IO engine was built to handle that exceptionally well, so not only did in-memory operations roar, but the out-of-memory streaming-from-disk ops positively screamed on our units (and whimpered on others).
I miss those days to some degree. Was kind of fun to have a set of insanely fast boxen to work with.
OP could have phrased it better, but I presume his point was that 500KB is extremely small by modern standards. The whole executable fits comfortably in L3, so you'll probably never have a full cache miss for instructions. On the other hand, while it's cool that it's small, I'm not sure that binary size is a good proxy for performance. Instruction cache misses are rarely going to be a limiting factor.
> Instruction cache misses are rarely going to be a limiting factor.
k's performance is a combination of a lot of small things; each one independently doesn't seem to be that meaningful. And yet, the combination screams.
The main interpreter core, for example, used to be <16K code and fit entirely within the I-cache; that means bytecode dispatch was essentially never re-fetched or re-decoded to micro instructions, and all the speculative execution predictors have a super high hit rate.
When Python switched the interpreter loop from a switch to a threaded one, for example, they got ~20% speedup[0]; I wouldn't be surprised if the fitting entirely within the I-cache (which K did and Python didn't at the time) gives another 20% speedup.
Yes, I presume it's very fast because of a number of smart design decisions. I would guess that the relatively small on-disk size of the executable is a consequence of these decisions, rather than a cause of the high speed. And as you point out, it's really the design of the core interpreter that matters.
> When Python switched the interpreter loop from a switch to a threaded one, for example, they got ~20% speedup[0]; I wouldn't be surprised if the fitting entirely within the I-cache (which K did and Python didn't at the time) gives another 20% speedup.
I'm familiar with this improvement, and talk it up often. Since certain opcodes are more likely to follow other opcodes (even if they are globally rare) threaded dispatch can significantly reduce branch prediction errors. But despite not having measured the number of I-cache misses on the Python benchmarks, I'd be utterly astonished if there were enough of them to allow for a 20% speedup. My guess would be that the potential is something around 1%, but if you can prove that it's more than 10% I'd be excited to help you work on solving it.
I am not involved with k, and things might have changed significantly, but around the 2003-2005 timeframe, Arthur had very conclusive benchmarks that showed I-cache residence makes a huge difference (IIRC I-cache was just 8KB those days ...).
The people who surely know what difference it makes today are Niall Dalton and Arthur Whitney.
> around the 2003-2005 timeframe, Arthur had very conclusive benchmarks that showed I-cache residence makes a huge difference
That sounds quite plausible. The front-end of Intel processors (the parts that deal with making sure there is a queue of instructions ready to execute by the backend) has made some major advances since then. The biggest jumps were probably Nehalem in 2008, and then Sandy Bridge in 2011.
It's not that binary size no longer matters, but you almost have to go out of your way to make instruction cache misses be the tightest bottleneck on a hot path. And when it would be the bottleneck, the branch predictor and prefetch are so good that it's usually only a problem when combined with poor branch prediction, so it really only adds to the delay rather than causing it.
In order for the Q interpreter to fit in that small size, the language has some rather severe limits: for example, on the number of function parameters and local variables, and on the size of conditional branches. Forcing users to structure code around these limits feels a bit archaic to me. This is what compilers are for.
Would be really interesting to read a write-up on your experience. What do you program in now? How do you look at other PLs now? What do you miss, and what are you happy "just works"? What do you think other PLs (especially languages like Lisp, which are very high in terseness) can learn from Q?
I would compare Q (and other APL-related languages) to the Vim editor. There you have some carefully chosen operations which are easy to perform. They don't take much effort. They are also easy to compose in useful ways, because the corresponding properties support that. Since the basis of editing operations is fairly large, you have many operations; but when you know many of them, you can perform powerful edits.
Lisp, on the other hand, is more like Emacs - naturally. Here we have a small, carefully chosen orthogonal basis of abstract operations - not domain-specific, but a "theoretically foundational" small basis. Then you have a library of macros on top of that, and, of course, the ability to extend it.
In other words, the basis for APL is "classical" math, made executable and expanded with the mechanisms required to express programming constructs (logic, control flow, ordering...) in one line. It's harder to expand, but you don't often need that. Lisp is a specific branch of math, lambda calculus, which is provably enough to solve any programming problem. The "inner core" of Lisp is also hard to expand, but what you expand for your task is "the usage" of the language, which is made to be straightforwardly expandable.
> 1) The in-memory DB .exe was around 500 KB. Imagine that
It still is, but the hot path is much smaller than that.
> 2) The Q language syntax, while consistent, is fairly arcane and throwback to decades past.
I can't comment about this. I don't mind the syntax. I prefer k syntax though.
> 3) The documentation and driver support is abysmal.
This is getting a lot better. The Fusion API[1] goes a long way towards better "drivers", and the new documentation site[2] shows a lot of energy being put into organization. There's also Q for Mortals[3], which is linearized for people who like that.
2) I like K better (it fits the ideals of APL); I feel Q was done by Arthur to please some big client, as it doesn't feel like something he would choose himself (that's from reading interviews, seeing the iterations, and his basic code philosophy).
I like K better than Q too, but J[1] clicks with me more.
J has JDB[2] and Jd[3] for things somewhat similar to kdb, with Jd, rather than JDB, being the commercial offering comparable to kdb.
I would probably choose APL over Q if that were a choice. In J you can always make your definitions (verbs, nouns, etc...) plain words if you like the way Q reads.
Dyalog appears to be more popular, with conferences and more products, but it costs $1k-ish for a commercial license, nobody else can run your code without a license, and server licenses aren't cheap. It also pretty much needs a special keyboard and a key mapping. J is free for pretty much everything and uses standard characters (although I really like the APL characters). I think they're both nice.
Dyalog comes with a keyboard layout (on a Mac it just replaces the alt keys). It's quite easy to use. GNU APL's Emacs mode does the same thing, although mapped to super rather than alt (meta, in Emacs) by default.
I'm aware of the licensing costs; I'm more curious about whether Dyalog is gaining popularity versus J, and if so, why. Of course, three new people going to the Dyalog conference would be a 10% increase in popularity, it looks like… so maybe this far out on the long tail it doesn't matter.
Yea, I was just saying the key mappings can be a pain and your favorite keyboard probably doesn't have the APL symbols on it. The Dyalog IDE has a virtual keyboard, but I don't like those too much. If none of that bothers you, then no biggie. I'm guessing Dyalog has more production users and a bit more users than you see at the conferences, as they are typically held in the UK. J is free, so I bet a lot more people try it, even though Dyalog has a free hobby license. J has a nice built-in plotting library, "viewmat", while Dyalog has SharpPlot. Both are nice, but SharpPlot has a GUI like doing charts in Excel. Dyalog can easily hook into .NET, so that is pretty helpful on Windows in the real world. I'd agree it's a wash right now. What are your background and needs?
My background is I know too many languages and don't get enough shit done and my need is probably to stop it and get back to work. :)
Being slightly more serious, I do web dev, mostly backend, for a radio astronomy observatory. I don't know anything about the science, but I wind up executing their routines in the cluster and doing typical database apps. I don't have much time on the side but I have been enjoying trying to learn J and realizing how much applied math is missing in my background!
I also dabble in Dyalog APL. I have an inexplicable bias for the symbols; however, I really like J and the community. I have played with Jd with a trial license, and as said above, J is free and the source is available for scrutiny. I have played with using the J DLLs in my C code. I am always amazed at my takeaway understanding of a mathematics problem after working it out in J. It somehow gels it in my mind, and fits with the equations in normal math symbols.
Probably, but like cuneiform's wedge-shaped marks, it brings an easy familiarity to a line or lines of text or code. It would be cool to make APL in Vulcan or Predator symbols!
Any other downsides? Management has been convinced and we're apparently switching to it at work soon but there is so little information and most of that is marketing, so it's hard to get an idea of what we're walking into.
Aside: has anyone heard any recent news about the K5/K6 rewrites of the underlying interpreter? There was a fair amount of chatter about these a couple of years ago, but all gone rather quiet...
Currently it looks like k7 is going to be an actual commercial product in the near-ish future. Arthur is still hashing out the details, but the language design is substantially similar to k6. If anyone has ideas or strong opinions about how to make k7 better, now would be a pretty good time to email Arthur.
For those not already in the loop with respect to k6, the reference card (http://kparc.com/k.txt) provides a good overview. Note that it is neither exhaustive nor completely representative of the current state of the language.
I have been an on-and-off q user for a few years. KDB+/Q is a great system, but in my opinion it is best used for its correct purpose; and that purpose is primarily time series and data streaming.
The columnar structure of the DB as well as its IPC layer make it very good at creating chains of processes that can be used to stream row updates and branch them out to different processes with different responsibilities. Likewise, its on-disk database is great for running complex (time series) queries.
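The IPC side is about one line per hop (a sketch; the port, table name, and the upd message convention here are typical tick-style examples, not from this comment):

  q)h:hopen `::5001                         / connect to another q process on port 5001
  q)h"select from trades where sym=`IBM"    / synchronous query; the result arrives as a table
  q)neg[h](`upd;`trades;newrows)            / asynchronous push, e.g. streaming rows downstream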
This speed and terseness come at the cost of being fairly "old school" in its approach. It's only recently, for instance, that we got stack traces, and readability is definitely not for the faint of heart, though this depends on how the code is written.
In my view, the biggest thing that is needed right now is better tooling and libraries. Some attempts have been made to do this, and I am hearing that the new initiatives by Kx will be addressing this in the coming months. The lack of a standardized testing library/framework can also be problematic, as every team that I have seen does it slightly differently, and a "best practice" would be beneficial.
SQL's only virtue is that it is well known; it is otherwise not very good compared to other query languages, and every SQL engine extends it somewhat differently, with extensions you can't avoid because the standard is too limited. I agree most query languages would be better off as minor extensions to SQL - but kdb+/Q is different.
Unlike SQL, which pays lip service to Codd's relational model but breaks it with things like TOP, ORDER BY, LIMIT and others, the Q language embraces the order between tuples to great effect, making e.g. "as-of" queries, which are quite common, trivial; whereas in SQL and the relational model, as-of queries are inefficient in execution time or storage space (usually both), and schemas with reasonable execution speed cannot, in fact, guarantee their integrity (which is often quoted as one of the main advantages of the relational model).
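The canonical illustration is q's asof join (the table and column names here are the conventional trades/quotes ones, not from the parent comment):

  q)aj[`sym`time;trades;quotes]   / for each trade, the quote prevailing as of that trade's time

Because both tables are kept in time order, this is an ordered lookup rather than the self-join gymnastics the same question requires in standard SQL.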
As another example, Q implements "foreign key chasing", also called "reference chasing", which is also implemented in the web2py DAL and surely others; compare[0] the equivalent TPC-H query:
in q:
select revenue wavg supplier.nation=`BRAZIL by order.date.year from lineitem
where order.customer.nation.region=`AMERICA, order.date.year in 1995 1996, part.type=`STEEL
in sql:
select o_year,sum(case when nation = 'BRAZIL' then revenue else 0 end)/sum(revenue) as mkt_share
from(select year(o_orderdate) as o_year,revenue,n2.n_name as nation
from part,supplier,lineitem,orders,customer,nation n1,nation n2,region
where p_partkey = l_partkey and s_suppkey = l_suppkey and l_orderkey = o_orderkey
and o_custkey = c_custkey and c_nationkey = n1.n_nationkey
and n1.n_regionkey = r_regionkey and r_name = 'AMERICA' and s_nationkey = n2.n_nationkey
and o_orderdate between date('1995-01-01') and date('1996-12-31') and p_type = 'STEEL')
as all_nations group by o_year order by o_year;
Q is great. I think wider adoption would come from two places:
1. adding more syntactic sugar to the language to improve readability
2. open-sourcing the thing, or building an open-source compatible Q interpreter with all the nice features of KDB (functional programming, vector-based data structures, the "scripting language within a database" approach, web features: HTTP server and websockets, etc.)
The author of Nial, Mike Jenkins, has recently released v7 of Nial. Nial is akin to Q in that many of the operators are keywords rather than symbols. Its computational model is slightly different due to its roots in Trenchard More's array theory.
For a certain set of applications, I believe the difference is speed. K has been said to even outpace C, even though that's the language it's written in. AW did an amazing job with his optimizations.
> what makes kdb so special and why isn't there an open/libre alternative?
Maybe I'm reading too much into this, but it seems like you expect the answer "nothing much" to the first part of your question. I have done nothing more than read lots of articles on KDB's lineage and play around a bit with J, so take my answer with a considerable lump of salt, but my impression is that the answer to the second part of your question is "because there's something 'so special' about KDB"; my understanding is that it provides blazing-fast access to memory-compact databases, from a tiny codebase, building on the APL/J legacy. Why can't this be done in an open sourced way? Well, surely there's no inherent reason, but the fact that it hasn't been is probably evidence that it's not just a problem of trivially cloning existing work (or else someone would have done it).
The real answer is Arthur Whitney is a god-like being with coding powers beyond the mortal realm. Seriously though, there are a lot of stories of his excellent work if you search for them. He's supposedly working on kOS so you can run kdb+ on bare metal. Tiny, fast code is what he does. Sure, it might look incomprehensible, but it is a few pages of code he can store in his brain at once. Aaron Hsu (someone has some links to his HN posts on here) explains this with his compiler which converts APL to GPU code. His entire compiler (which he has worked on for a long time) is a handful of pages. He says something like "there's no need for an abstraction if I can see everything at once". Most programming languages have implementations that are very long... which do you think has fewer bugs? Steve McConnell's Code Complete has a statistic somewhere on bugs per 100 LOC. That doesn't leave a lot of room in Arthur's code base (although you could say his code being so terse makes it an apples-to-oranges comparison, I guess).
Also important to note is that J is the final language from the inventor of APL (Ken Iverson), who also got the Turing Award for APL. J has some advancements over vanilla APL and doesn't require a special keyboard or symbols. Some of these new features were added to Dyalog APL (a modern APL with good support). Roger Hui is a well-known figure in the APL community. He helped write J and works for Dyalog APL now.
.... and the J implementation is inspired by the "A" miniature APL interpreter written by Arthur Whitney, who later went on to create K. See http://code.jsoftware.com/wiki/Essays/Incunabulum for more (and google "J incunabulum" if you want to see other people's commentaries)
This code is basically obfuscated by hand. Absolutely unapproachable. Only the original author(s) can understand it.
Judging by other comments, it seems to work well. So they seem to be good programmers producing working code. It's just not intelligible to other human beings, which is a pretty bad thing, but not the only factor in software quality/health.
The Vim code base is at some parts straight batshit insane, but it's one of the most polished programs I've ever used.
All that being said, I would find it infuriating to work with such code. Nope!
But it's just a foreign language; you could look at Japanese text[0] and make similar statements, and they would be just as valid (or rather, invalid) as your statement.
You expect to be able to read it because you're used to a class of languages which are all similar enough at the surface level -- perhaps you are even familiar with more than one fundamentally different class, say, "lispish and algolish" or "germanic and latin". But that doesn't make APLish or Japanese[0] horrible.
This is just APL using C syntax.
[0] assuming, the proverbial you does not know Japanese
And k/q is an effective method of writing very fast software for domains such as finance, understood by thousands of people.
Just because you find it strange does not mean it's strange.
I'm sure you'd find languages in Papua New Guinea difficult at times, and you'd argue English is "better" because more people speak it, and there is a richer literature in it, but the ideas expressed in those languages - and how they are expressed uniquely in those languages - are valuable to the people who speak them.
No one here said it was strange. The original word was 'unhealthy'.
I'd be very interested to learn more about which semantics of APL-derivatives correspond to which of the 4 or 5 distinct meanings of the keyword 'static' in C.
I don't see where this code is getting a definition for K. (The return type of the ctype function).
It's not in the "k.h" header, though that references it also.
Maybe the build system injects it through the compiler command line or a forced-include header. Though the gcc command line alluded to in the block comment header shows no evidence of that.
Edit: Found it! It's a typedef for a pointer to a struct k0:

  typedef struct k0 *K;
That's the dominant data structure in the whole program. It's a little different from what's used in APL and J, but essentially it represents an array which carries around its length, shape, and a pointer to its items. An explanation of the J version is in An Implementation of J [1].
There are a handful of helper functions to get/set various parts of this structure, and most of the functions in the language (and its implementation) take 1 or 2 of these and return 1.
I can see the argument for terseness. In fact I loathe how verbose Java is, I try to minimize LOC, and I enjoy using ?:. However, I don't think it's a good idea to use this extreme single-character, single-line style in languages not designed for it. If you really want this style, you should use APL or design your own language.
That would be a good step, but I don't think a preprocessor alone is enough to make this a good idea. You want good compiler error messages, static analysis tools, debuggers, code editors. You want a whole language.
They have a whole language there, didn't you notice? Namely, this Q thing that the submission is about.
What we're looking at here is some of its C implementation internals. They are using short identifiers and the C preprocessor to help with terseness. The justification is that this is "like APL". But there is enough C cruft there that it's not really like APL. A possibility would be to generate whatever code those C macros are generating, but with some other preprocessor which polishes the notation a little bit.
Even if those analysis tools and debuggers were developed for Q, they likely wouldn't apply to this code.
The point (mine, that is) is that I basically agree that it's not a good way to code in C, but perhaps the style could work better as a notation that transliterates to C.
what a gap. an abyss. the inventor and users of kdb explicitly and deliberately DO NOT WANT these things (static analysis tools, debuggers, code editors). nor do they need them. the language is so simple, there is no need for static analysis or code editors. (they compromised on error messages somewhat recently, though; now error messages are two words instead of one)
personally, i disagree with them on the debugger. would be nice to have one.
You are right. There is an impassable abyss. I will never agree with anyone who believes that good error messages, analysis tools, or debuggers are not worth having.
I don't think it's a good idea to import the style of a language into its implementation written in a different language; at least, not to this extent. When in Rome...
> You think if I need to write C, that I should do things that make my program larger and slower and less correct?
Yes (marginally), when those things are at odds with the tools and conventions of C, because there are serious downsides you don't list. Compiler error messages will be difficult to understand. Tools for code analysis and debugging will be less useful or even completely unusable. Most importantly, because your personal style is alien to other people you lose the ability to collaborate with other developers.
A language designed for this would mitigate most of those downsides.
Longer identifiers don't make C programs larger. (Well, shared libraries have larger symbol tables, and unstripped executables have bigger debug info.)
Using multiple lines and indentation doesn't change executable code.
> This is the only software I have ever seen that actually got smaller over time.
I agree, this is admirable!
Even if it is under different names (which I think is a far better approach), Sustrik does this too: http://250bpm.com/blog:50 He went from AMQP (not his own creation) -> ZeroMQ -> nanomsg -> Libmill (essentially Go in C/UNIX style)
Also, OpenBSD comes to mind (LibreSSL for example).
Libmill/Libdill (Go coroutines) are not on the AMQP -> ZeroMQ -> Crossroads -> Nanomsg (socket abstraction) path; in fact, they have essentially nothing in common except Sustrik.
I wish you'd actually read the article before shooting your mouth off with a Well Actually. FTA:
> Next one: nanomsg. An alternative to ZeroMQ.
In comments:
> How would you split nanomsg, then?
> [...] 2. coroutine library, e.g. libtask, libmill [...]
Not to mention that this is obvious if you understand what the core of each thing is in terms of capabilities. If you are gonna be the "technically..." guy, at least do it right.
It's about decomposing things, which is the essence of good design, and - supposedly - the UNIX philosophy at its heart.
As an actual user of ZeroMQ and nanomsg and a dabbler in libdill, I assure you I know what I am talking about. Libdill/libmill is NOT on the same arc. It could be used to implement the next generation of zeromq/nanomsg, but it does not, in fact, provide any functionality that these provide today; it would be an internal implementation detail if Sustrik followed through on that comment. But it would not be a nanomsg rewrite, likely something completely new. nanomsg has been rewritten, several times in several ways, by Garrett D'Amore, who has taken over. D'Amore explicitly rejected using e.g. libuv (mentioned in that comment) as a basis for a nanomsg rewrite. There's an "uncertainty" principle at work here: separating concerns generates dependencies, which in the grand scheme of things are not necessarily good.
In fact, one of D'Amore's rewrites used native OS threads in lieu of a coroutine library, which worked perfectly well on FreeBSD and Solaris, and abysmally on all other OSes. The main reason to use a coroutine library in a messaging library is actually that the underlying OS thread implementations scale so badly on most modern systems.
Commenting like this will get you banned on HN regardless of how much you know or how right you are. Please reread the site guidelines (linked at bottom of every HN page) and take them to heart from now on.
Enjoyed the blog post from Sustrik. I also found Hintjens' work in line with my own sensibilities, e.g., his Libero code generator. That is perhaps a good example of "finished" software.
Whenever I have suggested on HN that there is such a thing as "finished" software that is free of serious bugs, I get some resistance. There is a consistent knee-jerk reaction citing the same tired, old memes: "All software has bugs" and "Software is never finished."
Sustrik's post proves I am not the only one perplexed by this strange belief that no software is ever finished.
IMO, it is not a question of being infallible or being available to fix bugs. The point is that there are programs that are not continuously growing in size and complexity. They are not "dead". They are "finished".
As for OpenBSD, certainly some programs I would consider "finished", but overall the size of the kernel and base distribution are in fact growing. Not only new drivers, but new programs and new libraries continue to be added. More code means more probability of bugs and vulnerabilities.
Anyway, less code and more terse syntax mean it can be easier to find problems. Not everyone will agree with this, of course. But I agree with Whitney and others. Less code makes it easier for me.
> There is a consistent knee-jerk reaction citing the same tired, old memes: "All software has bugs" and "Software is never finished."
It's not so much a meme as a fairly good heuristic. Most software has bugs, or depends on other pieces of software that have bugs. Most software can be improved, or could integrate with new technologies that weren't prevalent a few years ago. If software is good, it generally has a solid user base (relative to its target audience), and people will ask for tweaks and features; the author can rightfully reject most of them, but it's a rare thing that absolutely none of them merits being accepted.
So, when you discover a project that might be useful to you but you've never heard of it and it's been inactive for years, I still believe it's quite a good rule of thumb to be leery before committing serious time and effort to using it in anger. Which doesn't mean that counter-examples don't exist, naturally.
I tend to think it's more that people use languages that fail to provide enough assurances about how they function in all cases, so assumptions have to be made in practical use, and those assumptions end up being wrong in odd, minute ways, or on new platforms with slightly different behaviors, or after compiler writers decide to take advantage of some ambiguity for the sake of performance.
When someone tries your software with a newer compiler or a newer CPU or a slightly different architecture than you wrote it on, and it doesn't work right, it's easy to come away thinking programs are never "done".
With earlier versions, e.g. k2.8, there is a `show command to trigger a pop-up window that reminds me of Tcl/Tk, containing the values in editable fields.
The interpreter is terminal-friendly and works without the GUI, but it has no formatted ASCII output of tables like in k4. X11 libraries are a dependency.
Yes, the "electric" GUI was incredibly fast and effective. The K2 GUI was basically the only GUI system I've ever used in which it is easier to write a GUI for a simple system than a batch command line. It looks basic and wouldn't have won any design awards, but it was crazy fast and crazy effective. See e.g. The S- spreadsheet (copied here in its entirety), see near the bottom of http://nsl.com/papers/spreadsheet.htm for screenshots and discussions
(a) it was a lot to maintain compared to the rest of the k environment, as it needed to be implemented independently for every system (X, Win, Mac at the time), and it introduced dependencies that were about 10x larger than the base system.
(b) it wasn't actually a selling point - being a simple, local GUI meant that while it was nice for the programmer users, it wasn't useful for end users, who usually had no K license of their own, and who expected everything to be web-able and importable into Excel.
(c) it wasn't extensible with more widgets. Needed a small HTML control or video player in your GUI? Tough luck.
1010data's web UI isn't actually based on the K2-era GUI framework; it is architected as a reasonably conventional web application that happens to use K3 as a server-side language. The 1010data query language offers facilities for making data-driven UIs for interactive reports and the like, but any semantic similarity to the K2 GUI system is probably coincidental.
>any semantic similarity to the K2 GUI system is probably coincidental.
Hardly coincidental! I very much had the K GUI in mind when I originally architected it (and I'm not sure I'd describe it as a "reasonably conventional web application"...)
So 1010data's purpose is to sell analytics tools to kdb+ users? Does kdb+ not come with a simple way to pipe output to a chart? For the price, I'd hope they had something more than a REPL.
1010data's analytics tools and database are entirely their own product, developed separately from kdb+. While these are implemented in K (K3, which precedes kdb+/q/K4), the end users are not K programmers any more than WordPress users are PHP programmers.
Not directly related, but Microsoft just released a new language for developing quantum programs, Q#, which doesn't seem to be related at all. I hope this doesn't cause too much confusion.
You can purchase a license if you want to be unaffected by future license changes. It might not be the right price for you, but it is for some. They are, in fact, courting only rich customers.
Some languages are worth learning to expand your horizons. Lisp is one of them, even if you never use it, and the APL family (of which K/Q are members) is another. My C code has become faster, simpler, shorter and less buggy after I dabbled in K. YMMV, but using it in a commercial setting is not the only reason to look at it.
1. K (and the entire APL family) eschews many of the layers upon layers of abstraction that modern software engineering uses, whether they are justified or not. It turns out that they are mostly not justified. K gently pushes you toward thinking at a lower level of "what's really happening here?"; I'm not sure I can give a good example here - but the world looks different after taking the red pill. E.g., it is not uncommon in K to represent a tree as two arrays, one of data and one of parent pointers (see the sketch after this list). Once you shake the "but I must abstract this!" feeling, you realize it works better.
2. K gently encourages doing work on batches of data. That is, it is idiomatic (and easier) to write functions that operate on arrays and return arrays of processed data than functions that operate on one element at a time. In turn, this means that the resulting program more often than not works in stages, where each stage processes its entire input before going to the next stage (which uses the output of this stage as its input). In the "old" C/C++/C#/Java/Python world, it is idiomatic to push each element through the whole pipeline before going to the next one.
3. K encourages building a set of orthogonal, non-trivial operations and combining them in various ways, rather than building layer upon layer of abstraction. It leads by example, supplying an extremely useful basis of functions. E.g. http://nsl.com/k/t.k implements a very fast, reasonably capable in-memory database with aggregation, joins, projections and more. It takes all of 14 lines, all quite short. While you can't implement it in 14 lines of C, using the same principles you can probably do it in less than a hundred. Achieving the same in idiomatic C is going to be much harder and longer.
4. Finally, K gently encourages solutions which work well with modern memory and storage hierarchies. E.g., it encourages linear scan operations that touch all elements of an array over random access ones that touch only 1/10 of the elements. Instinct and idiom in other languages will guide you towards the latter, but the former is often much faster.
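To make point 1 concrete, the parent-pointer tree looks something like this (a minimal sketch; the tree itself is made up):

  q)p:0 0 0 1 1 2    / parent of each node; node 0 is the root (its own parent)
  q)p scan 5         / chase parents from node 5 up to the root
  5 2 0
  q)where p=1        / children of node 1
  3 4

No node objects, no pointer chasing in user code: one flat array plus the ordinary primitives gives you traversal.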
This is my experience as well, especially with respect to the "layer upon layer of abstraction". Abstraction isn't bad, but it's notoriously hard to do well.
Also, I had been looking at J (again) recently, and noticed that it has a similar concept of composing operations as K does. I guess it is because both are from the APL family of languages, as shown in the "Influenced" section here:
Yes, APL is the ancestor; Arthur Whitney wrote A, a miniature APL interpreter which inspired Roger Hui’s implementation of J (designed by Ken Iverson who previously designed APL).
J optimized for purity, K for practicality.
I remember reading a commentary from Iverson that, despite J’s beauty and theoretical niceness, at least two practical choices made by K turned out to be better:
1. Doing left-to-right scan and fold; this is inconsistent with parsing, but turns out to be significantly more useful
2. K’s minimalistic currying (juxtaposition) is not as nice theoretically as J’s trains and forks, but turns out to be much more useful in practice.
However, I don’t remember where I read that and cannot find the source now.
Got it, thanks. Agree about the layers of abstraction. In fact, a similar example, though from BASIC: way back I read a very good book on it, in which the author showed the creation and use of many data structures and algorithms using nothing but arrays, with indexes as pointers to other elements (in other arrays). It was not difficult to follow, either.
Don't get me wrong... I love kdb+/q ... I want kOS. I love forth (bare metal forth + tcp? win!). I just don't really do much with them, though, other than play.