
What the actual f?

It's like someone threw up the noise that modems make during initial connection onto an electric typewriter from the 1960s, and then explained their intention using quotes from a Lovecraft novel.




I am going to tell you something fantastic, but first, I want to explain some things about this:

    us:{$[#i:&{(y~*K)&"*"~*x}':x;@[x;i;:[;,"_"]];x]}
The first is that there's a typo in what bidirectional wrote. The above is correct. The second is what it is. Once I have explained that, I can tell you the fantastic thing.

k syntax is very simple. There are just a few forms you need to be aware of:

    f x
which applies f to x.

    a f b
which applies f to the two arguments a and b, and:

    f[a;b;c]
which allows you to do three arguments. You can write the first one as f[x] and the second as f[a;b] if you like even more consistency. f can be an "operator" -- that is, a symbol. The symbols ' / and \ are special and called adverbs. These adverbs have a special form if followed by a colon, so ': is different from ' and has nothing to do with : or '. I think Arthur just ran out of keys on the keyboard. Once you have those, parentheses () and braces {} have some special syntax, just like double-quotes " do.

With the syntax explained, let us try to understand what we are looking at.

us: is how we start an assignment. You can say "us gets" if you like (the colon can be pronounced). {} braces surround a lambda; this one takes a single argument "x" (the first argument). $[a;b;c] is cond, like in Lisp: if a then b, else c. # means count. i: is another assignment.

& means where -- the argument to which is going to be a bitmap like 000100b or 01101b or something like that, and where returns the indices of the set bits; the former example giving the one-element list 3, the latter the three-element list 1 2 4.

Another lambda comes next: We can see it takes two arguments because there's an x and a y in there (y is the second argument). We can get a clue as to what it expects because the following adverb ': means each-prior. This tells us "x" is going to be a list of things, and this lambda is going to consume them pairwise. If we apply {(x;y)}': to the string "iliketacos" we get the result:

     {(x;y)}':"iliketacos"
    i 
    li
    il
    ki
    ek
    te
    at
    ca
    oc
    so
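A rough Python analogue of that each-prior pairing, purely for readers following along (a sketch of the idea, not how k implements ':):

     s = "iliketacos"
     # pair each element (current) with the one before it (prior);
     # the first element has no prior, so pad with an empty string
     pairs = list(zip(s, [""] + list(s)))
     # [('i', ''), ('l', 'i'), ('i', 'l'), ('k', 'i'), ('e', 'k'), ...]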
y is the "previous" value, and "x" is the current value. The "where" before it tells us we want to know the indices where the condition inside is true. Let's try and understand that condition.

y~*K is in parentheses. Parentheses group, so we evaluate them first (just like in other languages). We're looking for a situation where the previous value is the first (that's what asterisk means here) of K. What is K?

    K:("select";"distinct";"partition";"from";"where";"group";"having";"order";"limit")
So we're looking for a value (x) whose previous (y) is the first of K, which is "select". The "&" that follows here is "and" - Arthur likes to overload operators since there aren't many symbols on the keyboard and this is something you get used to.

So you can read {(y~*K)&"*"~*x}':x as simply trying to find the sequences "select star" -- given a list ("select"; "*"; "from"; "potato") you get 0100b and from ("select"; "*"; "from"; "("; "select"; "*"; "from"; "potato"; ")") you get 010001000b. I think the attempt is to disambiguate the asterisks in the sql:

    select * from tacos where cat=4*42
but sql is a strange and irregular language, so this kind of thing is necessary. Back to our query:

    us:{$[#i:&{(y~*K)&"*"~*x}':x;@[x;i;:[;,"_"]];x]}
i is going to be the locations of the asterisks following select. If the count of that is nonzero, we're going to do the @-part, and if not, we're just going to return x.

    @[x;i;f]
is called amend. It returns x, but at indices i we apply f, so it's x[i]:f[x[i]], which is pretty cool. f in this case is a projection of "gets" (the function colon) with the second argument bound to an underscore. That is:

    :[;,"_"]
is just a function. That's how @[x;i;:[;,"_"]] replaces all of the asterisks that follow select with a "_".

Almost. It's actually a list of length one, rather than the scalar "_". I haven't read everything in sql.k but this is probably important elsewhere.
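
If it helps to see the same shape in Python, here is a rough mirror of that where/amend structure -- a sketch of my own, not anything from sql.k, and I hard-code "select" where the k uses *K:

    def us(tokens):
        # "where": indices whose previous token is "select" and whose own token is "*"
        indices = [i for i in range(1, len(tokens))
                   if tokens[i - 1] == "select" and tokens[i] == "*"]
        if not indices:        # the $[#i:...;...;x] cond: count of i is zero
            return tokens
        out = list(tokens)     # "amend" hands back a modified copy
        for i in indices:
            out[i] = "_"       # k uses ,"_", a one-character list, i.e. the string "_"
        return out

    print(us(["select", "*", "from", "potato"]))  # ['select', '_', 'from', 'potato']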

Ok. Now that I have explained what this is and what it does, I am ready to tell you something fantastic. I read this:

    us:{$[#i:&{(y~*K)&"*"~*x}':x;@[x;i;:[;,"_"]];x]}
as:

"us gets a function, that finds the indices of asterisk following the first element of K, and then replaces the things at those indices with underscores"

Literally. From left, to right. Just that fast. And I only program in k part-time. That's not the fantastic thing. The fantastic thing is that by learning to read k, I am almost miraculously able to read other languages faster. This:

    copied=False
    for i in range(1,len(a)):
      if a[i] == "*" and a[i-1] == "select":
        if not copied:
          copied = True
          a = a[:]
        a[i] = "_";
gives me some grief for being so irregular and gross, and I have to look up range/xrange and len and memorise a much more complex set of rules for syntax, and I have to track the order of things carefully and so on, but I have places in my brain, made by k, for those things, and so I am able to absorb code in other languages faster.

If that does not amaze you, I do not think you have considered the ramifications of what I said. I can suggest maybe reading it again (or maybe actually reading what I wrote instead of skipping to the punchline), but if after two or three tries you are still lost, maybe you can ask a question and I can try to answer it.


To me, this is infinitely more readable:

    let mut input = ["select", "*", "from", "potato"];
    for i in 1..input.len() {
        if ["select","*"] == input[i-1..=i] {
            input[i] = "_";
        }
    }
If I could be bothered to dig up a Haskell compiler, I'm sure it's possible to do a one-liner list comprehension that is both terse and readable.

If I was doing this in Rust, it's easy to create an iterator extension that does something like "map_lookback" which explains the intention without requiring comments.
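
Purely as an illustration of the idea (in Python rather than Rust, and "map_lookback" is just the name suggested above, not a real library function), such a helper might look like:

    def map_lookback(f, xs, fill=None):
        """Yield f(prev, cur) for each element, pairing it with its predecessor."""
        prev = fill
        for cur in xs:
            yield f(prev, cur)
            prev = cur

    tokens = ["select", "*", "from", "potato"]
    print(list(map_lookback(
        lambda prev, cur: "_" if prev == "select" and cur == "*" else cur,
        tokens)))  # ['select', '_', 'from', 'potato']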

I'm going to be blunt: Unreadable array languages are popular with quants because they work in a highly competitive, cut-throat industry where "write only" languages provide job security. I've come across developers purposefully obfuscating code by using only one-character identifiers and zero comments. One of them literally blackmailed their employer, demanding their salary be doubled on the grounds that there was no chance their code could be maintained by anyone else. He should have gone to jail for that, but he had his managers by the balls, got what he wanted, and bought a house with cash soon after.

Why are we applauding this?


> To me, this is infinitely more readable:

Yes, to you. Just like Chinese would be infinitely less readable to you, if you don't know Chinese.

Do you think this is a deep observation?

The real challenge is knowing whether it is worth it to learn k so that it becomes readable.

- How many characters is it? This is a useful metric when you realise that bug/defect rate is proportional to the physical size (in rows and columns of source code) of a program, given equivalent processes (Moore 1992; McConnell 1993), but that process matters more than anything else. Not tooling, not "memory safety", and certainly not "readability" by people unfamiliar with the language.

However "readable" you think your rust code is, did you notice the bug in your rust code? (hint: input is not supposed to be mutable)

- How fast is it? On my 2014 i7 macbook air, the supplied us averages 232 msec for 100k cycles. My version

    us:{$[x~(z;y);,"_";y]}[(*K;,"*")]':
is faster: 126msec for 100k cycles.

- How quickly was it written? I can't speak for sa/atw on us since I didn't see them write it. I wrote mine in about 30 seconds including testing. Yes really. How long did your rust program take to write? Did you think simply because you thought you understood the requirements that you didn't need to test it?

- How quickly can someone familiar with the language read it? Again, I can't speak for anyone other than myself, but I'm not a k expert -- I program in it very infrequently, and the new amend-syntax in k9 I had not run across previously. And yet I read it as quickly as I said.

These four values (less code, fast run, fast write, fast read) are the biggest, most important things to me. And anyone who shows me all four of those things will get my attention.

Rust? Simply does not impress.

> Unreadable array languages are popular with quants because they work in a highly competitive, cut-throat industry where "write only" languages provide job security.

That's another interesting opinion. This one might even be true amongst some quants (Most of the ones I know that use k don't particularly like k). But I suggest you try not to expect the worst in people. Yes, some people are assholes, but most people aren't. And for what it's worth, I'm not a quant (I work in Advertising).


> 126msec for 100k cycles.

Or to put it another way: 1,200 nanoseconds. That's about 3,000-5,000 instructions on a modern CPU. Believe it or not, that's actually pretty bad.

After jumping through some hoops to ensure that rustc doesn't just compile the whole thing down to a constant, I benchmarked my version as taking 15-20 nanoseconds per iteration. About 45-80 instructions!

I actually couldn't quite believe it myself, so I jumped through more hoops to ensure that it wasn't being optimised away, wasn't getting inlined too aggressively, etc... No change.

Ran it through Godbolt to inspect the assembly, and then I realised that, yes, modern languages, compilers, and CPUs really are this good!

Think about it: For the specific input example with 4 strings the algorithm boils down to: compare 7 bytes with 7 bytes, replace a pointer with another pointer, then compare 2 bytes with 2 bytes three times. That's about a hundred assembly instructions.

Godbolt output:

    push   rbp
    push   r15
    push   r14
    push   r13
    push   r12
    push   rbx
    push   rax
    mov    r12,rdi
    mov    ebx,0x8
    xor    ebp,ebp
    lea    r14,[rip+0x3cb3c]        # 444e8 <anon.6527da4acb4810bb73692fa85a2e25ef.0.llvm.5506334730328533235+0x20>
    mov    r15,QWORD PTR [rip+0x3f3b5]        # 46d68 <bcmp@GLIBC_2.2.5>
    cs nop WORD PTR [rax+rax*1+0x0]
    nop    DWORD PTR [rax]
    cmp    rbp,0x2
    je     79f2 <example::testabc+0x62>
    mov    r13,rbp
    mov    rdx,QWORD PTR [rbx+r14*1]
    cmp    rdx,QWORD PTR [r12+rbx*1]
    jne    79ec <example::testabc+0x5c>
    lea    rbp,[r13+0x1]
    mov    rsi,QWORD PTR [r12+rbx*1-0x8]
    mov    rdi,QWORD PTR [rbx+r14*1-0x8]
    call   r15
    add    rbx,0x10
    test   eax,eax
    je     79c0 <example::testabc+0x30>
    cmp    r13,0x2
    jb     7a07 <example::testabc+0x77>
    lea    rax,[rip+0x3060e]        # 38007 <_fini+0xd13>
    mov    QWORD PTR [r12+0x10],rax
    mov    QWORD PTR [r12+0x18],0x1
    xor    ebx,ebx
    xor    ebp,ebp
    nop    DWORD PTR [rax+rax*1+0x0]
    cmp    rbp,0x2
    je     7a43 <example::testabc+0xb3>
    mov    r13,rbp
    mov    rdx,QWORD PTR [rbx+r14*1+0x8]
    cmp    rdx,QWORD PTR [r12+rbx*1+0x18]
    jne    7a3d <example::testabc+0xad>
    lea    rbp,[r13+0x1]
    mov    rsi,QWORD PTR [r12+rbx*1+0x10]
    mov    rdi,QWORD PTR [rbx+r14*1]
    call   r15
    add    rbx,0x10
    test   eax,eax
    je     7a10 <example::testabc+0x80>
    cmp    r13,0x2
    jb     7a58 <example::testabc+0xc8>
    lea    rax,[rip+0x305bd]        # 38007 <_fini+0xd13>
    mov    QWORD PTR [r12+0x20],rax
    mov    QWORD PTR [r12+0x28],0x1
    mov    ebx,0x8
    xor    ebp,ebp
    nop
    cmp    rbp,0x2
    je     7a93 <example::testabc+0x103>
    mov    r13,rbp
    mov    rdx,QWORD PTR [rbx+r14*1]
    cmp    rdx,QWORD PTR [r12+rbx*1+0x20]
    jne    7a8d <example::testabc+0xfd>
    lea    rbp,[r13+0x1]
    mov    rsi,QWORD PTR [r12+rbx*1+0x18]
    mov    rdi,QWORD PTR [rbx+r14*1-0x8]
    call   r15
    add    rbx,0x10
    test   eax,eax
    je     7a60 <example::testabc+0xd0>
    cmp    r13,0x2
    jb     7aa8 <example::testabc+0x118>
    lea    rax,[rip+0x3056d]        # 38007 <_fini+0xd13>
    mov    QWORD PTR [r12+0x30],rax
    mov    QWORD PTR [r12+0x38],0x1
    add    rsp,0x8
    pop    rbx
    pop    r12
    pop    r13
    pop    r14
    pop    r15
    pop    rbp
    ret    
    nop    WORD PTR [rax+rax*1+0x0]
Rust? It simply impresses.


I'm not sure getting the wrong answer fast is something to be proud of.

Your rust code has a bug in it, is longer, and you spent more time on it.


Please elaborate on that bug. The rust function I provided replaces "select, *" correctly with "select, _". Is that not what it was supposed to do? I checked that it works with empty arrays, arrays with one entry, etc...

Thinking about it, this version boils down the logic to its barest essence:

    fn test(input: &mut [&str]) {
    
        let mut was_select = false;
        for i in input.iter_mut() {
            if was_select && *i == "*" { *i = "_"; was_select = false; }
            else { was_select = *i == "select"; }
        }
    }
It runs in 5.5 nanoseconds per iteration, which is just absurdly fast. That's about 200x faster than your K version!

What I like about this Rust version is that it reflects the approach that is optimal for the processor, yet it is high-level and readable. I could convert that "test" function easily enough to a generic "replace" function similar to a string search & replace, but one that can operate on any mutable iterator with the maximum possible efficiency, not just arrays.

Then, your K code that requires "careful reading" to slowly tease apart the meaning could be converted to a form that is practically prose:

    let mut input = ["select", "*", "bar", "potato"];
    replace( &mut input, &["select", "*"], &["select", "_"] );


> The rust function I provided replaces "select, *" correctly with "select, _". Is that not what it was supposed to do?

No. It is not supposed to modify its input but return a copy.


Granted, if replacing arbitrary sequences with arbitrary sequences, then copying is necessary, as this could result in the output length increasing.

However, for replacing scalars only, the in-place mutable version is in some sense superior: it doesn't force a memory allocation. Moreover it can be trivially converted into a copying version by simply wrapping it in a function that first copies the input, and then mutates it in-place. The reverse is not true: the copying version cannot be wrapped to create a non-allocating version.
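
A sketch of that wrapping argument in Python (hypothetical names, nobody's actual code):

    def replace_star_inplace(tokens):
        """Mutate tokens in place: "select", "*" becomes "select", "_"."""
        was_select = False
        for i, t in enumerate(tokens):
            if was_select and t == "*":
                tokens[i] = "_"
                was_select = False
            else:
                was_select = (t == "select")

    def replace_star_copy(tokens):
        """Copying version, built by wrapping the in-place one."""
        out = list(tokens)          # the only extra allocation
        replace_star_inplace(out)
        return out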

To be honest, I wish more languages put this kind of effort into their standard libraries, but most have about 10% of what I would like to see. For example, this kind of "string matching" is really "sequence matching" and ought to be fully generic for any underlying comparable and copyable type, not just arrays of characters. String search algorithms like Boyer-Moore ought to be directly applicable to arrays of integers or enums, e.g.: for recognition of code patterns in lists of parsed tokens.
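
As a minimal sketch of that kind of generic "sequence matching" in Python (a naive scan rather than Boyer-Moore, and the names are mine):

    def find_subsequence(haystack, needle):
        """Indices where needle occurs in haystack; works on any sequence
        of comparable items (characters, ints, parsed tokens)."""
        n, m = len(haystack), len(needle)
        return [i for i in range(n - m + 1)
                if list(haystack[i:i + m]) == list(needle)]

    print(find_subsequence(list("abracadabra"), list("bra")))  # [1, 8]
    print(find_subsequence([7, 1, 2, 7, 1, 2, 3], [1, 2]))     # [1, 4]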

At least one person has done something like this for Rust: https://github.com/peterjoel/rust-iter-replace/blob/6c575eeb...

Similarly, there's the new InPlaceIterable, which is interesting but not quite enough to suit my taste: https://doc.rust-lang.org/std/iter/trait.InPlaceIterable.htm...


> However, for replacing scalars only, the in-place mutable version is in some sense superior

I mean, we are talking about someone else's code and someone else's decisions. If we can change the rules, you're absolutely right we can do much much better.

But beware microbenchmarking too much: Giving your entire program the treatment you gave your rust can be more than exhausting; it can actually leave you with a slower program simply because your program gets too big!

k makes a lot of compromises to stay small enough to keep both the interpreter and the application in L1, but whole-program speeds benefit from this treatment sometimes by factors of 1000x or more, and that's hard to show with these microbenchmarks as well.

> To be honest, I wish more languages put this kind of effort into their standard libraries, but most have about 10% of what I would like to see. For example, this kind of "string matching" is really "sequence matching" and ought to be fully generic for any underlying comparable and copyable type, not just arrays of characters

In APL, this is called ⍷ (pronounced "find") and sometimes even APL-ers momentarily forget it exists[1], but in k you always have to make it yourself, usually (as we did today) with ⍸ (where) and ≡ (match), but sometimes some other way[2], and this works on all the different data types k supports (including integers, enums, dates, times, symbols) and across multiple cores as well. You are right to predict it would be useful: It is useful :)

[1]: https://news.ycombinator.com/item?id=27229994

[2]: https://news.ycombinator.com/item?id=16851862


Because you're wrong. Do spend six months doing hobby array programming. Your view will change.


It's on my bucket list. But it seems so niche that I suspect I'll never use it.

My only motivation is that I've been tempted to develop a toy programming language myself that is as high-level as Rust or C++ but explicitly designed for "wide" processing platforms such as GPUs or many-core processors with SIMD instruction sets.

All I'm saying is that the concept of array programming -- like pure functional programming -- may be valuable, but the syntax is not.


I just can't understand how even experienced programmers can fixate on syntax. In my experience - repeated tens of times by this point - the syntax is important for a few months (tops!) at the beginning of using a language, and then stops mattering almost completely. What experiences would make someone convinced otherwise? There's an argument against overly complex syntax, but most general-purpose languages out there are very similar in this regard...


There's empirical evidence, loads of it, that this simply isn't true.

Syntax absolutely does matter, in all walks of life, not just programming.

We're not infinite, error-free computers like some sort of mathematical abstraction. We have squishy meat brains evolved to tell stories around a campfire.

As a random example: I learned Rust just for fun, and something that repeatedly caused hours of frustration is that unlike every other modern language, it allows identifier shadowing. This compiles and runs just fine:

    let a = 5;
    let a = "wat?";
Practically no other language allows this, because it is a recipe for errors.

Internally, compilers typically use static single assignment, so the above would be processed something like this:

    let a_1 = 5;
    let a_2 = "wat?";
For a compiler to track this is no problem. For a human? It's a problem. Maybe not for trivial examples like this, but if dozens of identifiers are littered across a huge function it can be a challenge to keep track of which one is which if the names are the same but they're... not the same. Especially if the change is subtle.

We're not computers; that's why we make them out of silicon and metal and sell them for money.


> There's empirical evidence, loads of it, that this simply isn't true.

Source?

> unlike every other modern language, it allows identifier shadowing

1. All dynamically typed languages allow this, all/most REPLs even for statically typed languages allow this, even if regular code doesn't.

2. This is not syntax, at all, this is semantics of immutable value declaration+initialization.

> but if dozens of identifiers are littered across a huge function it can be a challenge to keep track of which one is which

Littering dozens of identifiers across huge functions is going to be a problem, no matter the syntax. It's a programmer's job to manage complexity by utilizing various kinds of techniques, including syntactic sugar (true), but also factoring the code into manageable chunks, using higher-level or better suited for the task at hand abstractions, and so on. Syntax on its own is, in my experience, the least impactful technique for managing complexity, at least before going into DSL-land (but then it's syntax+semantics).

Another thing worth mentioning is that lots of general-purpose languages give you exactly the same constructs, just spelled differently. Simple examples:

    for (x : list) { ... }
    foreach (x; list) { ... }
    for x <- list do ... end
    for x in list: ...
    for my $x (@list) { ... }
    for x := list { ... }
    list each(x, ...)
    list each: [ :x | ...]
    (loop for x in list do ...)
    (for [x list] ...)
All the above denote exactly the same construct, with very little variation in the number of tokens. Can you tell me if you think one of them is better than the others - and if so, why?


Syntax can certainly be more or less suitable to a particular problem or even to humans in general, but (dis)allowing identifier shadowing is part of the semantics of the language (i.e. what the syntax means). One could say: "there should be syntactic cues if we are to allow variable shadowing". That would be a well-defined criticism of syntactic issues. BTW, most modern languages do allow identifier shadowing. The particularity of Rust and the object of your criticism is allowing it in the same scope as the definition.

But again, the only way you'll be able to apprehend array language notation is by writing enough of it to become somewhat fluent. It doesn't mean you have to like it, but remember it was the subject of a Turing Award and that's usually a good indicator of something relevant.


> the concept of array programming -- like pure functional programming -- may be valuable, but the syntax is not.

You would not be the first person tempted to try and separate the two, but I think it is impossible. Every attempt to do so that I have ever seen has lost so much in the translation that the four values I have for programming are no longer conserved.


https://futhark-lang.org/

Is the closest you'll get to that, I think. It took a whole team of big brains to make it, and the author stated it was highly non-trivial to make something like that.


Interesting, but not quite what I had in mind.

My idea was loosely based on a code search engine Google had for a while before they killed it off.

I'm not sure exactly how they did it, but I very strongly suspected that they implemented regular expressions over database indexes.

A simple b-tree index is just a set of sorted strings. If you have a lot of sorted strings and you squint at it, you can see how it picks out common prefixes, like a tree. (You can literally store it with the prefixes factored out, and then you have a trie.)

For a fairly wide range of regular expressions, and a tree of sorted strings, you can implement a search over the b-tree. E.g.: the search "[bx](a+)foo" can be implemented via finding the range starting with "b", then the range starting with "x". For each range, find the sub-range starting with "ba", "xa", etc...

Each step takes logarithmic time, and can be done in parallel. In theory, it takes only about a hundred times longer to find ten matches in a million strings than it would take to match a single string.
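
A rough Python sketch of that prefix-range step using the standard bisect module (my illustration of the idea, not whatever Google actually did):

    import bisect

    def prefix_range(sorted_strings, prefix):
        """Return the slice of a sorted list whose entries start with prefix."""
        lo = bisect.bisect_left(sorted_strings, prefix)
        # crude upper bound: the prefix followed by a very high code point
        hi = bisect.bisect_right(sorted_strings, prefix + "\uffff")
        return sorted_strings[lo:hi]

    words = sorted(["baafoo", "bafoo", "bar", "cat", "xafoo", "xenon"])
    # narrowing for the regex "[bx](a+)foo": the "ba" range, then the "xa" range
    print(prefix_range(words, "ba"))  # ['baafoo', 'bafoo', 'bar']
    print(prefix_range(words, "xa"))  # ['xafoo']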

Wouldn't it be nice if you could take an algorithm such as "match regex" that normally takes a 'scalar' string and have the compiler automatically create a version that can take an array of sorted values, like the database index?

The idea is that much like how array programming languages eschew scalars and pass arrays all over the place to gain efficiency, my language would pass sets all over the place, and gain a different type of efficiency.


TXR Lisp:

  1> (defun rewrite (fun list)
      (build
        (while* list
          (let ((nlist [fun list]))
            (if (eq list nlist)
              (if list (add (pop list)))
              (set list nlist))))))
  rewrite
  2> (defmacro rewrite-case (sym list . cases)
      ^(rewrite (lambda (,sym)
                  (match-case ,sym
                    ,*cases))
                ,list))
  rewrite-case
  3> (rewrite-case x '(foo bar * select * fox select * bravo)
       ((select * . @rest) ^(select _ . ,rest))
       (@else else))
   (foo bar * select _ fox select _ bravo)
rewrite-case and rewrite appear in the TXR Lisp internals; they are used in the compiler for scanning instruction sequences for patterns and rewriting them.

E.g. a function early-peephole looks for one particular four instruction pattern (six items, when the labels are included). Rewriting it to a different form helps it disappear later on.

  (defun early-peephole (code)
    (rewrite-case insns code
      (((mov (t @t1) (d @d1))
        (jmp @lab2)
        @(symbolp @lab1)
        (mov (t @t1) (t 0))
        @lab2
        (ifq (t @t1) (t 0) @lab3)
        . @rest)
      ^((mov (t ,t1) (d ,d1))
        (jmp ,lab3)
        ,lab1
        (mov (t ,t1) (t 0))
        ,lab2
        ,*rest))
      (@else else)))
This is much more general and powerful than a hack which just looks at successive pairs for an ad-hoc match. rewrite-case can have multiple clauses, of different lengths, and arbitrary matching complexity.


The original requirements should be addressed. The thing being matched is not just select, but actually any one of a set of symbols that appear in K.

We can stick that data into the pattern matching syntax using the or operator:

  3> (rewrite-case x '(foo bar * select * fox where * bravo)
       ((@(or select distinct partition from where
              group having order limit) * . @rest) ^(,(car x) _ . ,rest))
       (@else else))
  (foo bar * select _ fox where _ bravo)
Or put it into a variable:

  4> (defvarl K '(select distinct partition from where group having order limit))
  K
  5> (rewrite-case x '(foo bar * select * fox where * bravo)
       ((@(member @sym K) * . @rest) ^(,sym _ . ,rest))
       (@else else))
  (foo bar * select _ fox where _ bravo)
"If an object sym which is a member of K is followed by * and some remaining material, replace that by sym, underscore and that remaining material."

Hash table:

  6> (set K (hash-list '(select distinct partition from where group having order limit)))
  #H(() (select select) (where where) (limit limit) (order order)
     (having having) (distinct distinct) (partition partition) (group group)
     (from from))
  7> (rewrite-case x '(foo bar * select * fox where * bravo)
       ((@[K @sym] * . @rest) ^(,sym _ . ,rest))
       (@else else))
  (foo bar * select _ fox where _ bravo)
This code golfs moderately well: https://news.ycombinator.com/item?id=27227276


Wow. That's some explanation. Thanks.

Do you use "rainbow brackets"?

This example is the first hit I found: https://kristofferc.github.io/OhMyREPL.jl/latest/features/ra...

Such an obvious idea once you see it. Wish I had syntax coloring and rainbow brackets when I coded LISP for hire.


> Wow. That's some explanation. Thanks.

Happy to help.

> Do you use "rainbow brackets"?

No. I flash brackets, but I tend to turn off other forms of syntax highlighting. I find it extremely distracting when the syntax highlighter "decides" wrong, and I've become convinced that end-of-line comments like //}, added to "fix" how the highlighter deals with complicated stuff, hurt more than they help.


In the J language, single-sided brackets are functions/verbs.


> I have to look up range/xrange and len and memorise

All you're communicating here is that you don't regularly work with Python.

The claim that you have to look up len seems disingenuous; I might believe it if you didn't look like a speaker of English.

> & means where

And that's something any engineer would know, unlike having to look up what len means?


> The claim that you have to look up len seems disingenuous

In what way?

I have to look up "how do I get the indices of a list" to get range(1,len(x)) -- I think I could have also used enumerate() and a bunch of other things, but this seemed the shortest.

> All you're communicating here is that you don't regularly work with Python.

I hope I'm communicating more than that because I put a lot of effort into my comment. I don't regularly work with k either.

What exactly do you think you are communicating?


What I mean by that is that len is an obvious abbreviation for length that crops up in numerous languages. Not to mention code bases. It's simply not plausible you could forget what len means in Python after confirming one time that it is exactly what it looks like. (Unless you're a struggling non-native speaker of English forgetting the word length itself.)


Let's do this using the same approach, "find indices and assign over them in a copy of the sequence":

  This is the TXR Lisp interactive listener of TXR 259.
  Quit with :quit or Ctrl-D on an empty line. Ctrl-X ? for cheatsheet.
  Do not operate heavy equipment or motor vehicles while using TXR.
  1> (defun subst-select-* (list)
        (let ((indices (where (op starts-with '(select *))
                              (cons nil (conses list)))))
          (if indices
            (let ((list (copy list)))
              (set [list indices] (repeat '(_)))
              list)
            list)))
  subst-select-*
  2> (subst-select-* '(foo bar * select * fox select * bravo))
  (foo bar * select _ fox select _ bravo)
(conses list) gives us a list of the list's conses: e.g. in (1 2 3) the conses are (1 2 3), (2 3) and (3), so the list of them is ((1 2 3) (2 3) (3)).

where applies a function to a sequence, and returns the 0-based indices of where the function yields true.

(op starts-with '(select *)) yields a lambda which tests whether its argument starts with (select *). No brainer.

If we naively applied that to the conses, we would get the wrong indices: the indices of the select symbols, not of the asterisks.

The workaround for that is (cons nil (conses list)): we cons an extra dummy nil element to shift the positions, and process the resulting list.

Once we have the indices list, if it isn't empty, we copy the original input, and assign underscore symbols into the indicated positions. To do that we generate an infinite lazy list of underscores; the assignment takes elements from this list and puts them into the specified index positions.


If this were me, and I needed a function like us, I would have written this:

    us:{$[x~(z;y);,"_";y]}[(*K;,"*")]':
I would be interested in seeing anything that was shorter[1] and faster than that in any language, and I would be very curious to learn from anyone who could also do that faster than me.

But I'm not a fetishist: I didn't learn k because it was cute, and I don't wake up every day looking for ways to rewrite other people's code so that it is slower and bigger. Do you? Or is today special?

[1]: Measured in source-code bytes.


> Do you? Or is today special?

Today is the usual. He does that in all threads related to array languages.


> I would be interested in seeing anything that was shorter

One way to do that would be to pose that as a problem on the Code Golf Stackexchange.

My rewrite-case solution condenses (by removal of all non-essential spaces) to 69 bytes if the symbols are in a hash table K, and the input list is in a variable y, rather than a big literal:

  (rewrite-case x y((@[K @sym]* . @rest)^(,sym _ . ,rest))(@else else))
A one-letter name could be chosen for the macro. Furthermore, one-letter names could be chosen for sym, rest and else:

  20> (r x y((@[K @s] * . @r)^(,s _ . ,r))(@e e))
  (foo bar * select _ fox where _ bravo)
Still working! Now down to 43 bytes.

(It's not because I cannot that I do not do this with all my code.)

Doh, why use . ,r in the backquote if we are golfing? That should be ,*r: using the splice operator, like Common Lisp's or Scheme's ,@, getting us to 42 bytes:

  20> (r x y((@[K @s] * . @r)^(,s _ ,*r))(@e e))
  (foo bar * select _ fox where _ bravo)
I suspect Code Golf Stack Exchange regulars could get it way down in one of the dedicated golfing languages like Retina or what have you.

What else can we do? The @[K @s] predicate syntax could be replaced by a custom operator defined by defmatch, looking like @(k s).

  21> (defmatch k (sym) ^@[K (sys:var ,sym)])
  k
  22> (r x y((@(k s) * . @r)^(,s _ ,*r))(@e e)) ;; 41
  (foo bar * select _ fox where _ bravo)
Just noticed the ) * is not fully golfed:

  23> (r x y((@(k s)* . @r)^(,s _ ,*r))(@e e)) ;; 40
  (foo bar * select _ fox where _ bravo)
The k pattern operator macro could be non-hygienic: it could implicitly bind a variable called s:

  24> (defmatch k () ^@[K @s])
  k
  25> (r x y((@(k)* . @r)^(,s _ ,*r))(@e e)) ;; 38
  (foo bar * select _ fox where _ bravo)
These last few feel like cheating because they are too special purpose. Defining anything you can only possibly use just once isn't making the overall program smaller.


Also-rans:

  1> [window-map 1()(do if(and[K @1](eq'* @2))'_ @2)y]
  (foo bar * select _ fox where _ bravo)
  2> (mapcar(do if(and[K @1](eq'* @2))'_ @2)(cons()y)y)
  (foo bar * select _ fox where _ bravo)
  3> (mapcar(lambda(:match)((@[K] *)'_)((@a @b)b))(cons()y)y)
  (foo bar * select _ fox where _ bravo)


You'll never get J/K brevity, though, because in contrast to Lisp, where control flow is explicit in the syntax, most of the control flow in array languages is implicit. I love Lisp too, but for some (rare) use cases it's not the best.


You will never get J/K brevity by writing a condensed form of normal Lisp because spaces are often required to separate tokens.

You can get implicit control flow via macros.

Furthermore, all control flow is implicit in some way.

Take basic old progn: (progn (foo) (bar)) evaluates (foo) and then control implicitly passes to (bar) because that's the next thing.

The only control flow which is not implicit is that of a state machine described by a table or graph, indicating the next state for every input and current state.

(How can you say you can't get implicit control flow in Lisp, when I'm using pattern matching to express if you see this prefix, replace it with that?)

Speaking of macros, you could always resort to a macro like this:

  (j"... line noise in character string ...")
The j macro expands the string to code at compile time. The code could be a bona fide J implementation, or something like it. That's still not down to J, because of the (j"") wrapping chewing up five characters. In a Lisp with a programmable read table, that could be reduced to two character framing or something.


Okay, so it's not really smaller because it uses UTF-8, and it relies on order of operations to get rid of the parens around ⍵≡'*', but

us←{('_'@(1+⍸2{(⍺≡⊃K)∧⍵≡'*'}/⍵))⍵}

is one(1) character shorter.


⍵≡'*' should be ⍵≡,'*' like in the original k

us←⊃,⍥⊆2{(⍺≡⊃K)∧⍵≡,'*':,'_'⋄⍵}/⊢ is shorter anyway


    us←{'_'@{¯1⌽⍵⍷⍨(⊂⊃K),'*'}⍵}


Ah, ⍷, clever, don't even need the outer {...⍵},

  us←'_'@{¯1⌽⍵⍷⍨(⊂⊃K),'*'}


Yes!

You could also do: ⌽'*',⊂⊃K to save another character. I don't know if that's "better" though.


Or alternatively '*',⍨⊂⊃K which I think I prefer over ⌽, but neither seem that 'better' in my mind anyway.


Is this the same language or a different one?


This is APL, k is a descendant of APL.


Why do you pass in (*K;,"*") as x instead of hard-coding it in x~(z;y)? Clarity?


Great question!

     v:("select";,"*";"from";"tacos")

     us:{$[#i:&{(y~*K)&"*"~*x}':x;@[x;i;:[;,"_"]];x]}              
     \t:100000 us v
    233
     us:{$[(*K;,"*")~(y;x);,"_";x]}':
     \t:100000 us v
    206
     w:(*K;,"*");us:{$[w~(y;x);,"_";x]}':
     \t:100000 us v
    179
     us:{$[x~(z;y);,"_";y]}[(*K;,"*")]':
     \t:100000 us v
    129
I think it's important to remember just how simple k is: We the programmer know that (*K;,"*") isn't supposed to change, so we should be explicit so k "knows" this as well. In addition to never changing, we also know the value is only used once, so we really don't want to look anything up in the workspace every time we call us: Again, let us be explicit.

On the other hand, this:

     us:{$[#i:& XXX ;@[x;i;:[; YYY ]];x]}
is an extremely recognisable idiom. It occurs several times in sql.k so the reader is probably used to seeing it at this point. It also has a similar syntactic structure to the APL '@' so it may be more "obvious" to an APL programmer. Maybe the reason I write it this way is that I'm not a very experienced APL programmer :)
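
A loose analogy in Python -- not how k works internally, just the same instinct of binding the constant once at definition time instead of looking it up on every call:

    K0 = "select"

    def us_lookup(prev, cur):
        # K0 is looked up in the enclosing namespace on every call
        return "_" if prev == K0 and cur == "*" else cur

    def us_bound(prev, cur, _k=K0):
        # _k is bound once, when the function is defined
        return "_" if prev == _k and cur == "*" else cur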


Thank you for the extensive answer, that's very interesting that there's such a performance difference. I would have naively assumed if anything keeping it as a constant would be faster.

Another question: Why do

  $[#i:& XXX ; @[x;i;:[; YYY]]; x]
when (if I'm not mistaken, which I could be) you can do

  @[x;i:& XXX;:[; YYY]]
Performance again? A quick test using ngn/k did seem to find the first one faster, but only marginally.


That's a harder question. Why do other people do the things they do? One reason might be because it was convenient for them to write (and debug) things that way.

But it is possible they are doing it for performance reasons, so maybe it's worth considering why it should be faster? That is to say, should the programmer assume the latter is faster than the former?

To do that, I would suggest putting yourself in the shoes of the implementor: You have a choice of how it should be implemented. Knowing this decision will affect every use of @[x;i;f], should it begin with an if statement like this?

    if(y->n == 0)return x;
But before you answer, remember the next thing @[x;i;f] needs to do is consider whether x has any additional references, because if it does, it has to make a copy of x. That means there already needs to be an if statement that looks like this:

    if(x->r > 0) x = copy(x);
So one way to think about this, is to ask if you are implementing @[x;i;f], should you choose one branch or two?

    $[#i:& XXX ; @[x;i;:[; YYY]]; x]
Also: Do you see the part that goes :[;YYY]? That's constructing an object. Do you think it is worth avoiding that allocation if possible?


Ok, that all makes sense, thank you.

And thanks for pointing out the pattern in the first place, it makes sense now how you were able to parse the original expression so quickly.


Is the open source version (Kona) any good? Or do we have to go proprietary to try this out?


I had much fun learning the klong language: https://t3x.org/klong/

I even bought the well written book, which was a pleasure to read.


ngn/k is a different dialect from Kona, but actively maintained and open source: https://codeberg.org/ngn/k

oK (which is also k6) can be tried here: http://johnearnest.github.io/ok/index.html


I bet Chinese and Japanese look like that to someone who knows only English too.


As someone who only knows English: no, that is not what Chinese and Japanese look like to me.

That's why I prefer APL and its special symbols. It's still utterly inscrutable if you don't know it, but at least it looks intentional. Those symbols shift your mindset or reframe what you're looking at.


It's extremely readable when you're used to it, though. I use J for exploratory data analysis. It's really, really good at that kind of thing.


I've lost count of the number of times I've heard some theoretical mathematician say that about impenetrable gibberish.


"Please don't post shallow dismissals, especially of other people's work. A good critical comment teaches us something."

Dismissing what you don't understand because it is unfamiliar is the essence of a shallow dismissal, no? And you did it twice in this thread. The first was a blessing in disguise because of geocar's excellent reply, but now you're just repeating it, with added name-calling. Please don't do that here.

https://news.ycombinator.com/newsguidelines.html


It's the essence of the thing in this case, and isn't a shallow criticism.

Syntax can be good or bad. I don't believe that it's just a matter of "getting used to it", there are objective metrics of readability, developer error rates, and speed of training that some languages do poorly at.

My point was that most array languages look like line noise. That's flippant, but true.

Ask yourself this question: Is it possible to develop an array-based language and use more "normal" syntax?

Of course it is! That was a rhetorical question.

Similarly, there are other branches of science or mathematics that through an accident of history have developed impenetrable syntax.

This is not dismissing something I don't understand. I understand several such branches of mathematics and still think the syntax is unnecessarily obtuse. Coughcategory theorycough.

PS: Geocar wrote about two pages to explain what is a trivial piece of code, and I still have no idea what it even does or how! I can write code in 10 languages and read about 20-30 without too much difficulty. This includes obscure languages like Mathematica, Haskell, and several flavours of assembly language.

Array based languages are the first time I've seen a high-level language that is less readable than the machine code that they are supposed to be abstracting away! It's also the second time that I've failed to understand a simple piece of code even with a detailed explanation. For reference, the only other case is quantum algorithms.

Okay, I tell a lie. Thinking back on it, I've also seen Perl scripts that are unreadable line noise, but that's about it...


> It's the essence of the thing in this case, and isn't a shallow criticism.

Listen: I don't think you're qualified to speak to the essence of a thing that you don't understand, and I think you know that. I am telling you this is not very difficult and it is very useful, and I think you're angry at yourself for not "getting it". You are so smart you know ten programming languages, so you assume there's something wrong with "it" because to think there's something wrong with you is unthinkable.

Relax. There's nothing wrong with you either. You can learn this, but you aren't going to be able to skim it; the knowledge you have gained learning ten other languages is not going to help you very much. What is going on is stranger than you can realise at this point, so many array-people believe it is easiest if you try to forget everything you know about programming when you try to learn an array language. I don't think I agree with that, but the myth that children learn languages faster than adults is pervasive. Maybe that helps.

> Geocar wrote about two pages to explain what is a trivial piece of code, and I still have no idea what it even does or how!

Try reading it again! It is not complicated (nor optimal) but you are likely skimming or skipping important parts. If you can tell me where I lost you, I can probably help you find your way.

You might find it easier to follow along with a k reference guide. These typically fit on a note-card and can be helpful in memorising the k operators. These are all of the operators in the above program:

    $[a;b;c] cond (if a then b else c)
    @[a;i;f] amend i indices in a with operation f

    &x where (are the 1-bits in x)
    #x count (the length of x)
    *x first (element of x)

    x&y min/and
    x~y atom-equivalent (like lisp's equalp)

    f':x call f with each pair of x
A complete reference card isn't much longer and usually ships with the k interpreter. People learn in different ways, and that's ok! Help me help you learn and you will probably learn this very quickly.

> Is it possible to develop an array-based language and use more "normal" syntax?

"Array-based" isn't exactly the same thing as an "array" language - I think people get too hung up on the power of vector-operators (which are indeed powerful!) and miss the fact that there are so few operators which to learn.

I don't expect you to understand exactly what I mean by that, but I hope that if you take the plunge to learn this properly, you'll remember that I said this and when it does make sense, it will be useful.

> Array based languages are the first time I've seen a high-level language that is less readable than the machine code that they are supposed to be abstracting away!

You can't read Arabic either, can you? Why do you think this is different?

Try to stop thinking in terms of "readable": You can only say something is "readable" to you, or perhaps if you are a hiring manager, "to most people". I can read it perfectly fine. Worrying about how "readable" a programming language is at this point speaks only to how easy it is for you to acquire the ability to learn it, not how valuable it will be when you actually learn it.


> I don't believe that it's just a matter of "getting used to it", there are objective metrics of readability, developer error rates, and speed of training that some languages do poorly at.

If we had such objective metrics on APL or K, that would be an interesting discussion. But what I've seen here is just opinion and anecdote.

I also think that "looks like line noise" is a fair characterization of how J and K look on the surface, but that does not mean they are unreadable.

I'm hardly qualified to give any deep insight, but after reading the first half of Aaron Hsu's thesis and playing around with a Dyalog APL interpreter online for a few days, APL starts to look quite readable to me (much more so than J, thanks to the additional symbols) and I found it rather easy to learn (unlike mathematics, but for me the struggle there is not at all in syntax but in the concepts, methods, and meaning).


Geocar put in the effort of trying to make you see. But you are prejudiced, like those who spurn Lisp because of the parentheses.

If you want a "more readable" array language, take a look at Nial.

PS: qualifying the Wolfram language (Mathematica) as obscure is a really, really myopic view of the programming world.


I've never seen anyone use Mathematica outside of academia other than myself. It also occupies a slightly unique corner of the language design landscape. One that I like!

I wish that Mathematica was modernised and made into a proper, compiled (or at least JIT-ed), heavyweight language with all the trimmings. A proper IDE, debugger, tracing tools, etc... The current version is a bit like Visual Basic -- a very old language that had grown past its original design capabilities and now needs a revamp. A bit like how VB.NET replaced VB, and was superseded in practice by C# on dotnet core.


Then you have a different use case. Doesn't mean theoretical mathematicians are wrong for their use case. I do epidemiology, not pure maths though.



