Let's do this using the same approach, "find indices and assign over them in a copy of the sequence":
This is the TXR Lisp interactive listener of TXR 259.
Quit with :quit or Ctrl-D on an empty line. Ctrl-X ? for cheatsheet.
Do not operate heavy equipment or motor vehicles while using TXR.
1> (defun subst-select-* (list)
(let ((indices (where (op starts-with '(select *))
(cons nil (conses list)))))
(if indices
(let ((list (copy list)))
(set [list indices] (repeat '(_)))
list)
list)))
subst-select-*
2> (subst-select-* '(foo bar * select * fox select * bravo))
(foo bar * select _ fox select _ bravo)
(conses list) gives us a list of the list's conses: e.g in (1 2 3) the conses are (1 2 3), (2 3) and (3), so the list of them is ((1 2 3) (2 3) (3)).
where applies a function to a sequence, and returns the 0-based indices of where the function yields true.
(op starts-with '(select *)) yields a lambda which tests whether its argument starts with (select *). No brainer.
If we naively applied that to the conses, we would get thew rong indices: the indices of the select symbols, not of the asterisks.
The workaround for that is (cons nil (conses list)): we cons an extra dummy nil element to shift the positions, and process the resulting list.
Once we have the indices list, if it isn't empty, we copy the original input, and assign underscore symbols into the indicated positions. To do that we generate an infinite lazy list of underscores; the assignment takes elements from this list and puts them into the specified index positions.
If this were me, and I needed a function like us, I would have written this:
us:{$[x~(z;y);,"_";y]}[(*K;,"*")]':
I would be interested in seeing anything that was shorter[1] and faster than that in any language, and I would be very curious to learn from anyone who could also do that faster than me.
But I'm not a fetishist: I didn't learn k because it was cute, and I don't wake up every day looking for ways to rewrite other people's code so that it is slower and bigger. Do you? Or is today special?
> I would be interested in seeing anything that was shorter
One way to do that would be to pose that as a problem on the Code Golf Stackexchange.
My rewrite-case solution condenses (by removal of all non-essential spaces) to 69 bytes if the symbols are in a hash table K, and the input list is in a variable y, rather than a big literal:
(rewrite-case x y((@[K @sym]* . @rest)^(,sym _ . ,rest))(@else else))
A one-letter name could be chosen for the macro. Furthermore, one-letter names could be chosen for sym, rest and else:
20> (r x y((@[K @s] * . @r)^(,s _ . ,r))(@e e))
(foo bar * select _ fox where _ bravo)
Still working! Now down to 43 bytes.
(It's not because I cannot that I do not do this with all my code.)
Doh, why use . ,r in the backquote if we are golfing? That should be ,*r: using the splice operator, like Common Lisp's or Scheme's ,@, getting us to 42 bytes:
20> (r x y((@[K @s] * . @r)^(,s _ ,*r))(@e e))
(foo bar * select _ fox where _ bravo)
I suspect Code Golf Stackechange regulars could get it down to way in one of the dedicated golfing languages like Retina or what have you.*
What else can we do? The @[K @s] predicate syntax could be replaced by a custom operator defined by defmatch, looking like @(K s).
21> (defmatch k (sym) ^@[K (sys:var ,sym)])
k
22> (r x y((@(k s) * . @r)^(,s _ ,*r))(@e e)) ;; 41
(foo bar * select _ fox where _ bravo)
Just noticed the ) * is not fully golfed:
23> (r x y((@(k s)* . @r)^(,s _ ,*r))(@e e)) ;; 40
(foo bar * select _ fox where _ bravo)
The k pattern operator macro could be non-hygienic: it could implicitly bind a variable called s:
24> (defmatch k () ^@[K @s])
k
25> (r x y((@(k)* . @r)^(,s _ ,*r))(@e e)) ;; 38
(foo bar * select _ fox where _ bravo)
These last few feel like cheating because they are too special purpose. Defining anything you can only possibly use just once isn't making the overall program smaller.
1> [window-map 1()(do if(and[K @1](eq'* @2))'_ @2)y]
(foo bar * select _ fox where _ bravo)
2> (mapcar(do if(and[K @1](eq'* @2))'_ @2)(cons()y)y)
(foo bar * select _ fox where _ bravo)
3> (mapcar(lambda(:match)((@[K] *)'_)((@a @b)b))(cons()y)y)
(foo bar * select _ fox where _ bravo)
you'll never get J/K brevity though. Because in opposition to Lisp where control flow is explicit in the syntax, most of the control flow in array languages is implicit. I love lisp too, but for some (rare) use cases it's not the best.
You will never get J/K brevity by writing a condensed for of normal Lisp because spaces are often required to separate tokens.
You can get implicit control flow via macros.
Furthermore, all control flow is implicit is some way.
Take basic old progn: (progn (foo) (bar)) evaluates (foo) and then control implicitly passes to (bar) because that's the next thing.
The only control flow which is not implicit is that of a state machine described by a table or graph, indicating the next state for every input and current state.
(How can you say you can't get implicit control flow in Lisp, when I'm using pattern matching to express if you see this prefix, replace it with that?)
Speaking of macros, you could always resort to a macro like this:
(j"... line noise in character string ...")
The j macro expands the string to code at compile time. The code could be a bona fide J implementation, or something like it. That's still not down to J, because of the (j"") wrapping chewing up five characters. In a Lisp with a programmable read table, that could be reduced to two character framing or something.
v:("select";,"*";"from";"tacos")
us:{$[#i:&{(y~*K)&"*"~*x}':x;@[x;i;:[;,"_"]];x]}
\t:100000 us v
233
us:{$[(*K;,"*")~(y;x);,"_";x]}':
\t:100000 us v
206
w:(*K;,"*");us:{$[w~(y;x);,"_";x]}':
\t:100000 us v
179
us:{$[x~(z;y);,"_";y]}[(*K;,"*")]':
\t:100000 us v
129
I think it's important to remember just how simple k is: We the programmer know that (*K;,"*") isn't supposed to change, so we should be explicit so k "knows" this as well. In addition to never changing, we also know the value is only used once, so we really don't want to look anything up in the workspace every time we call us: Again, let us be explicit.
On the other hand, this:
us:{$[#i:& XXX ;@[x;i;:[; YYY ]];x]}
is an extremely recognisable idiom. It occurs several times in sql.k so the reader is probably used to seeing it at this point. It also has a similar syntactic structure to the APL '@' so it may be more "obvious" to an APL programmer. Maybe the reason I write it this way is that I'm not a very experienced APL programmer :)
Thank you for the extensive answer, that's very interesting that there's such a performance difference. I would have naively assumed if anything keeping it as a constant would be faster.
Another question:
Why do
$[#i:& XXX ; @[x;i;:[; YYY]]; x]
when (if I'm not mistaken, which I could be) you can do
@[x;i:& XXX;:[; YYY]]
Performance again? A quick test using ngn/k did seem to find the first one faster, but only marginally.
That's a harder question. Why do other people do the things they do? One reason might be because it was convenient for them to write (and debug) things that way.
But it is possible they are doing it for performance reasons, so maybe it's worth considering why it should be faster? That is to say, should the programmer assume the latter is faster than the former?
To do that, I would suggest putting yourself in the shoes of the implementor: You have a choice of how it should be implemented. Knowing this decision will affect every use of @[x;i;f], should it begin with an if statement like this?
if(y->n == 0)return x;
But before you answer, remember the next thing @[x;i;f] needs to do, is consider if x has any additional references, because if it does, it has make a copy of x. That means there already needs to be an if statement that looks like this:
if(x->r > 0) x = copy(x);
So one way to think about this, is to ask if you are implementing @[x;i;f], should you choose one branch or two?
$[#i:& XXX ; @[x;i;:[; YYY]]; x]
Also: Do you see the part that goes :[;YYY]? That's constructing an object. Do you think it is worth avoiding that allocation if possible?
where applies a function to a sequence, and returns the 0-based indices of where the function yields true.
(op starts-with '(select *)) yields a lambda which tests whether its argument starts with (select *). No brainer.
If we naively applied that to the conses, we would get thew rong indices: the indices of the select symbols, not of the asterisks.
The workaround for that is (cons nil (conses list)): we cons an extra dummy nil element to shift the positions, and process the resulting list.
Once we have the indices list, if it isn't empty, we copy the original input, and assign underscore symbols into the indicated positions. To do that we generate an infinite lazy list of underscores; the assignment takes elements from this list and puts them into the specified index positions.