If all of the methodFuncs fail, this will attempt to jump to whatever address happens to be in memory after the last element of methodFuncs. You've forgotten to stop iterating.
A terminating null element or logical equivalent in an array is helpful in lots of situations. As one example it simplifies the logic of looking for an element in an array and performing some special operation if it was not found. No bool found variable necessary.
That's not clever, that's language abuse and horrific -- You could do this in so many other ways there is absolutely no point in abusing #includes mid-source-code (without mentioning the code smell around statics.)
Now, as a working curiosity, this is pretty damn cool.
Not tooo uncommon either. It's a "reasonable" way to write generators in C (if you're masochistic Python programmer, I guess.)
Switch on the last return point at the top of the function, and make all variables that need to persist between calls `static`.
(I've written it before, and I swear I didn't invent it... Maybe it's just a short skip from Duff's Device? Still, using it to pick up before the last return instead of after is new trick for me!)
Ah, that's exactly the trick I was thinking of. I think I learned it from [1], but I'm pretty sure Protothreads predates that post. Thanks for the link!
> without mentioning the code smell around statics
Or the code smell around fall through switch statements.
There are legitimate uses for fall through switch statements and/or mid-file includes, and or static function vars, but combining all of them together is not clever, it's a recipe for bugs.
An interesting trick, but it seems like there would be a simpler way, like having set_the_mtime be a function pointer. Initially it would point to an implementation that figures out which method works and sets the function pointer to point to a particular implementation.
> has the potential to be really opaque and confusing
Yes, but.. compared to switch and preprocessor hacks described in the article? Granted it's subjective but I'd find a function pointer that's assigned to once to be way easier to comprehend.
One concrete issue with this technique: it doesn't "loop". Meaning, for example, if initially method #3 doesn't work but method #4 does, it'll settle on method #4. If later, method #3 starts working, it won't switch it. Worse, if method #4 stops working, it'll just always fail.
I imagine this isn't an issue for rsync's use but could be a concern if applying it more generally.
According to the article, a step isn't skipped because it had a transient failure, it's skipped because it returns a special error code ENOSYS that means the syscall hasn't even been implemented, and therefore there's no point in ever trying it again.
AIUI with containers it's now entirely possible that a running program may be migrated from a host where a syscall has not been implemented to one where it has.
Just remove the `static` keyword and there you go.
But it's an "optimization", and avoiding previously attempted methods is the whole point of the trick. Otherwise it would be an easy and uninteresting function.
> Just remove the `static` keyword and there you go.
That‘ll probably try all methods on every single file that is processed, so if method #16 is the one that works, you get 15 unnecessary syscalls for every file.
Lets say that there are only 3 methods and method #2 is the one that works initially. On the first call it'll try #1 and when that doesn't work it'll try #2 which will be successful. For the next however many calls it'll jump straight to #2. If #2 starts returning ENOSYS at some point it'll try #3 and if that works then all future calls will jump straight to #3. However if #3 stops working then it will keep on failing until #3 starts working again even if #1 started working after the first call.
I think (I haven't actually tested it) you can do what the parent commenter wants by wrapping the whole thing in a do-loop but the original implementation is questionable enough, sticking an extra loop in there isn't doing anything to help make it less horrible.
int set_the_mtime(...) {
static int switch_step = 0;
int loop_again = 1;
do {
switch (switch_step) {
case 0:
if (method_0_works(...))
return 0;
switch_step++;
/* FALLTHROUGH */
case 1:
if (method_1_works(...))
return 0;
switch_step++;
/* FALLTHROUGH */
case 2:
if (method_2_works(...))
return 0;
}
switch_step = 0;
} while (loop_again--);
return -1; /* ultimate failure */
}
My understanding of the code from the article is that if method #4 stopped working, it would just carry on down the list. The static variable just means where to skip to in the case list.
Andrew Tridgell’s thesis is also a very interesting read, as rsync has some pretty innovative algorithms that make the file transfer and remote diffing fast:
Thanks, I hate it. I'm a bit bemused about this bizarro case_N.h trickery. I'm also almost positive __COUNTER__ would be adequate and entirely remove the need for __LINE__ as suggested in the addendum which is likely to be quite large in any non-trivial source unit. I mean, something gcc this way comes I guess...
Note that if this were C++ (11 or later) this could be done with the static variable in a thread-safe way, since function-scoped static variables initialized by functions are guaranteed to be initialized exactly once. So this code could be something like:
int set_the_mtime(...) {
static int starting_from = set_mtime_starting_from(0);
set_mtime_starting_from(starting_from);
}
// Try various ways of setting mtime until it works.
int set_mtime_starting_from(int starting_from) {
// horrible switch statement
}
So without actually looking into anything. What id you have a directory structure which is built out of 2 mounts. One local fs and one network mounted.
Then lets say method 5 works on one, and method 1 works on the other, both are the only ones that work. Then it would basically fail, depending on the order of directories enumerated, eventhough there is a supported method.
I haven't looked at the code, but the ENOSYS error is more about the capabilities of the kernel. So when we eventually start swapping entire kernels at runtime then it'll break.
Edit: d'oh - I misread, the addendum is doing exactly this...
I don't understand the need for the macro-fu to generate the case values. The use of switch_step++ in each previous case seems to drive that, but couldn't the same be accomplished by setting switch_step to a constant?
static int switch_step = 0;
switch(switch_step) { // Falls through between cases
default:
#if METHOD_0_AVAILABLE
switch_step = 0;
case 0:
if (method_0_works(...))
break;
#endif METHOD_0_AVAILABLE
#if METHOD_1_AVAILABLE
switch_step = 1;
case 1:
...
}
So, slightly off topic, but one thing from C that I really miss in other languages is subroutine-local static variables. It always annoys me in Java or C# that you can't really make a static variable that's scoped to a single method or function, you have to scope them to the entire class. It's common enough that a private static variable is only needed in a single method or function, so why not do it the way C does it?
rsync is an amazing tool, full of delightful secrets.
One thing it doesn't have natively (unless it's really well hidden) is the ability to look at two directory structures, and generate a delta of the two out to a separate location (say a USB stick).
A very useful feature for backups where you've got good enough (to run an rsync --dry-run) network connectivity, but not good enough to actually do the transfer.
Some compilers turn large switches into lookup tables. Using side effects like this however, makes the compiler shy away from this optimisation, and compile it literally. So it's probably not a good idea anymore.