I see a lot of people taking issue with the idioms presented, and rightfully so in many cases.
Add the ability for people to improve or debate the solutions. Ultimately we should have a large curated cookbook (with additional variant selections and associated recipe variants).
The most important human element of programming is knowing what to build (and what pieces to build to make the bigger thing). How often do I have to look up ways to read a file in Ruby?... Most times I need to read a file, I have to refer to the different approaches. I just don't do that often enough to remember everything. What I do know is, "this will be a lot of data, so I need to read it line by line or in chunks". That should be all you need to know, and then you pull up a recipe.
For the languages that have considerable feature velocity (so let's say I'll include even C++ and Python there but not C) you'd want to have a way to update idioms to match the current language. You might in some cases want to split out a previously idiomatic solution too.
For example, IntoIterator wasn't implemented for Rust arrays, and then one day Rust 1.53 implemented IntoIterator on arrays (and hacked things so that this doesn't cause old code to do something unexpected; Rust 2021 removes that hack, since old code wouldn't claim to be Rust 2021, thereby unlocking the new behaviour for syntax that once meant something different). Anyway, this changes the idiomatic way to do various trivial array operations: previously you'd have created a reference to the array, which does implement IntoIterator, so that you can iterate it. That still works, but it's no longer idiomatic.
for &x in &[2, 4, 6, 8, 10] { /* needed until 1.53 */ }
for x in [2, 4, 6, 8, 10] { /* makes more sense */ }
Am I the only one bothered that many of these implementations of idioms are often not idiomatic according to the language? E.g. in C# the idiomatic way to instantiate something is typically to use the var keyword (unless the type of its value can't be inferred). And the idiomatic way to format a string is to use $-interpolation, not string.Format... I suspect other languages have the same problem.
C# has evolved quite a bit over 20ish years. I suspect those posts predate $.
Also: some people really oppose "var." I don't understand why, especially when newer languages like Swift and Rust really encourage similar idioms. Something something = new Something() is usually a smell to me. I personally find "var" much easier to work with.
The problem is people abuse "var" and use it everywhere, even when the type is not obvious from the right-hand expression/assignment. This is especially bad when reading code outside of an IDE, like in a GitHub PR, git/cli tools, etc...
From msdn/dotnet documentation:
> The use of var helps simplify your code, but its use should be restricted to cases where it is required, or when it makes your code easier to read.
And before C# got $-interpolation, almost every project had a static StringUtils class with an extension method FormatStr for string... seriously, I never quite understood the reasons why Format was made a static method instead of an instance method.
Most of the people I've met who oppose type inference thought 'var' meant dynamic typing. I truly don't understand why this misconception is still so widespread.
The argument that code becomes unreadable without an IDE is only true if identifiers aren't named properly (for example by heavy abuse of abbreviations, or even the usage of single-letter symbols).
The only places where type-inference should be avoided are "public interface" kind of things. It's best practice to be explicit about an "interface"—even in languages with full type inference. This avoids changing an interface unnoticed only by changing an expression somewhere. (I don't mean here C#'s Interfaces verbatim, but for example signatures of "package public" methods and such).
I think it can be useful for initialising properties and fields, where you are not allowed to use `var` (for good reason, if you ask me). But of all the recent features it's the one that I would miss least if it were taken away.
I suspect type inference becomes pretty tricky in the presence of subtyping.
I don't have any F# experience, but I have done some Haskell back in the day, and I found the type inference very convenient. But I also think there is something to be said for limiting type inference to the "inside" of methods, where it is effectively invisible to users of a class. Rust has made the same choice: you have type inference inside functions, but never outside.
> Something something = new Something () is usually a smell to me.
var something = new Something()
... this one in C# is probably using type inference? Or maybe it doesn't matter in C# because it has a dynamic type anyway? Yes, Rust permits type inference, but sometimes it can't infer the type and you have to specify it. And sometimes it's clearer to be explicit about the type. C++ permits it too (auto), but depending on the team it can be pretty unpopular, except maybe for verbose type names/iterator types, etc.
The compiler can almost trivially get the type. As long as the type on the right side is resolved it's a single pass to resolve the type of the left side. Only rarely is the compiler unable to and you will get an error and have to provide it yourself.
> Set boolean b to true if string s contains only characters in range '0'..'9', false otherwise.
char b = 0;
for (int i = 0; i < strlen(s); i++) {
    if (!(b = (s[i] >= '0' && s[i] <= '9'))) break;
}
I appreciate the funny assignment-and-test-and-early-break in one (although I'd hardly say it's idiomatic), but I could do without the quadratic strlen().
That is one of the most confusing pieces of C code I have seen lately. And it fails on an empty string: I would expect it to set b to 1 for an empty string, but it sets it to 0. Of course that could easily be fixed by setting b to 1 at the top.
Also, code like this should always be put inside a function that returns a value, not just written inline. Making it a function allows simpler and more understandable code too.
The funniest part is that it is not necessary to call strlen() at all! The whole thing can be written in a single pass over the string. Here is how I would code it in C:
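A sketch of such a single-pass function (my reconstruction, not the original snippet; it deliberately returns true for an empty string):

```c
#include <stdbool.h>

/* True iff every character of s is a decimal digit.
   Single pass, no strlen() call anywhere. */
bool is_all_digits(const char *s)
{
    for (; *s != '\0'; s++) {
        if (*s < '0' || *s > '9')
            return false;
    }
    return true;  /* an empty string has no non-digits */
}
```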
You do realize it returns 1 for an empty string right?
I mean it doesn't have any digits in it...
What about adding a check of str[0] == 0 -> return 0
Also, giving char str[] will make it char* str.
Which can be null. This may cause reading a random memory location (possibly segfault or use-after-free)
edit: I get the comments but empty string still contains no digits. Given the regex would be ^[0-9]+ (+ instead of *)
What I want to say is: does the string have a numerical value or not? An empty string is NaN.
> You do realize it returns 1 for an empty string right? I mean it doesn't have any digits in it...
Yes, and that was deliberate on my part, as it meets my expectation of what such a function should do in this edge case.
The problem statement was "Set boolean b to true if string s contains only characters in range '0'..'9', false otherwise."
To my mind, the question "is every character in the string a digit" should be equivalent to "are there any non-digits in the string" (with the answer inverted, of course).
Returning 0 (false) for the empty string makes those questions not equivalent. It makes the empty string a special case.
Of course the real problem is that the problem is under-specified. It should call out specifically what should happen for an empty string, because as illustrated here, this is something where reasonable people may disagree.
> Also giving char str[] will make it char* str. Which can be null. This may cause reading a random memory location (possibly segfault or use-after-free)
Well yes, of course. The point of my comment wasn't to write bullet-proof library-ready code, it was only to illustrate two things: code like this should always go in a function, and the entire task can be accomplished in a single pass through the string.
Where did your choice of regex come from? The problem statement does not have any regex, and if you wanted to derive a regex from the problem statement you would probably need to get ^[0-9]*$ if you wanted to be correct.
For me, the strlen call appears directly before the loop's backward jump at -O0, resulting in a call on every iteration as far as I can tell. However, -O1 already seems to optimize it to a single call at the start of the function.
I always find it funny that when using languages that want to give you "total control" over the execution of your code (mostly C/C++), you actually almost never know what code gets executed in the end.
It depends on the compiler, its version, its flags, and likely "the position of the moon".
Of course the compiler is only allowed to do transformations that the spec permits. But it's impossible for a human being to anticipate the exact outcome. It's more like: "Compiler, do something that has the same outcome as this code I show you here". The output can then be something that doesn't resemble the input even slightly!
There's obviously nothing wrong with the compiler being so smart that it spots some patterns and transforms your code into something much more efficient. Only that there's not much difference from what happens when you use a high-level language. In both cases you in fact don't control the exact code that gets executed, and in both cases you rely on the smartness of your compiler to produce efficient code, "whatever" you've written.
That's why I think it's mostly a function of code style how performant or efficient some language can be (to some extent, of course). When you write low-level-style code (even in a high-level language), a smart compiler will (hopefully) create something like what you would get from writing your code in C/C++.
I guess that's true. Effectively calling an expensive function in the loop condition, like in this example, is going to be an issue in every language.
It is difficult to optimise because you need the compiler to evaluate and prove at compile time that both the loop cannot affect the result of the function call and the function call will not affect the loop.
It's a common and easy optimisation to simply move function calls like this out of the loop.
For the cost of 1 line of code I've regularly seen 10%, 100%, 1000% speed ups.
It's actually one of the most common optimisations to do in non-compiled / "slow" languages. If you know how functions are evaluated, you see that the cost of a simple getter function call can be the most expensive part of a loop.
It's great and sometimes even astonishing what GCC and LLVM can do.
Also, it's clear that even the smartest compiler can't magically optimize every piece of code.
My point was more about the fact that compilers for lower-level languages like C/C++, exactly the two named, use the most "magic" possible, and that it's therefore almost impossible to anticipate upfront what their generated code will look like. But C/C++ claim to give you the most possible control over the code. My point was that this is only true to some extent, and that you can get almost equally good generated code from a less low-level language just by writing code in a style matching the usual low-level languages. (Especially since you then need to think about loop invariants and such, like you said!)
So my point was more: the claim that you have "total control" over what happens at runtime when using a language like C/C++ is false.
Seeing this example, and at the same time people spending pages (even reaching for disassemblers) discussing how the generated code for that source snippet may or may not look, reminded me of that, like I said, "funny" fact about the "total control" C/C++ gives you.
It's not an issue, of course. It's just an observation and I was reminded of it.
If you know what a compiler will optimise, might optimise, and can't optimise, you can get an intuition for what the generated code will be executing. But even ASM is not "full control", and often not super useful, because what you think is happening will be shuffled around again by the CPU. The execution time for the same asm will vary significantly depending on the architecture.
If you have a good mental model of modern CPUs, then in a simple loop you can estimate what the bottleneck of the function will be, either by counting the micro-ops or the number of stack/heap memory reads or memory allocations, etc., to work out what is really happening and how you can optimise it. Otherwise you're just shooting in the dark, trying random combinations of flags or code, not understanding why something worked or didn't work.
At least in C/C++ that model works.
In slow languages like python/javascript/etc, doing simple operations doesn't translate down to the very low levels at ALL. Generally if you imagine the worst possible way you can think of for how something will execute in a simple loop and multiply it by 10, it might be close.
It's not unreasonable to assume the compiler will optimize it to a single call. Though I guess people who are capable of making that judgement won't need to look this idiom up on the internet.
It shows what happens if you take away the compiler's ability to reason about 's': it could be a global variable, and 'foo' could be modifying it, so now the compiler has to call strlen on every iteration.
So even though 'strlen' gets optimized out in the original version, it's quite a maintenance hazard since any minor change could inadvertently change the complexity from linear to quadratic. It also doesn't get optimized in debug builds, making those far slower than necessary.
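A minimal sketch of that hazard; 'foo' and the global buffer are hypothetical names standing in for code the optimizer can't see through:

```c
#include <string.h>

char g_line[32] = "12345";  /* a global the compiler must be pessimistic about */

/* Stands in for a function whose body the compiler can't inspect;
   for all it knows, this could write into g_line. */
void foo(int i)
{
    (void)i;
}

int iterate_line(void)
{
    int n = 0;
    /* Because foo() might modify g_line, strlen(g_line) has to be
       re-evaluated on every iteration: the loop is quadratic again. */
    for (size_t i = 0; i < strlen(g_line); i++) {
        foo((int)i);
        n++;
    }
    return n;
}
```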
We don't write code only for compilers, but for human readers as well. Why write code that makes a smart human wonder "Is that going to be quadratic? I'd better make sure the compiler optimizes it out!"
A smart human to you; to me, someone obsessed with micro-optimization but lacking knowledge of compiler optimizations...
Writing it for human readers can be used to argue for the original implementation if you think of for(i=0;i<expr;i++) as the C idiom for "iterate expr times" and here you want expr to equal the length of the string, which is what the (obviously pure) function yields. No unnecessary variables and assignments -- no clutter. It perfectly describes the intent.
This conversation started when another developer was alarmed that the strlen() call in the for loop looked like it could be quadratic. I think we owe that dev the respect to not dismiss their concern as being "obsessed with micro-optimization but lacking knowledge of compiler optimizations".
I try to write code that is as plain and simple as possible, and make it obvious what it does and that it has no gotchas. I want my code to be understandable both for new developers, and for my future self, who will surely be less smart than I think I am today.
Here is the original code, with the loop body elided:
for( int i = 0; i < strlen(s); i++ ) {
}
It is trivial to rewrite this as:
for( int i = 0, n = strlen(s); i < n; i++ ) {
}
This is a very common idiom, and now it is perfectly clear what the code actually does. Obviously, it only calls strlen() once.
Of course, as I pointed out in another comment, if you're writing a loop that iterates over a C string, you never have to call strlen() at all! You can just use the canonical C string loop:
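For instance, a sketch that advances a pointer until the terminating NUL, so the length never needs computing at all:

```c
/* Count decimal digits in a C string using the canonical
   pointer-walk loop: stop at the terminating '\0'. */
int count_digits(const char *s)
{
    int n = 0;
    for (const char *p = s; *p != '\0'; p++) {
        if (*p >= '0' && *p <= '9')
            n++;
    }
    return n;
}
```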
"In the abstract machine, all expressions are evaluated as specified by the semantics. An actual implementation need not evaluate part of an expression if it can deduce that its value is not used and that no needed side effects are produced (including any caused by calling a function or accessing a volatile object)."
As I understand it, the strlen implementation ("calling a function") is typically going to come from another object file (at link time), so it’s not clear that when compiling this file that “calling strlen has no side effects” is information available to the compiler.
> so it’s not clear that when compiling this file that “calling strlen has no side effects” is information available to the compiler.
It is because strlen is a #define to __builtin_strlen
And __builtin_strlen is declared (internally) as __attribute__((pure)).
The fact that it is a compiler builtin isn't that important; the compiler will perform the exact same loop optimisation on a non-builtin function if you declare it as __attribute__((pure)). So, in practice, it is not the object file which tells the compiler it can do this, it is the function declaration in the header file.
__attribute__((pure)) is of course a GCC extension, not standard C or C++ – but Clang supports it and so do many proprietary compilers, MSVC appears to be the main exception. This isn't part of the standard but is a compatible extension. C++11 standardises attributes (with a different syntax) and C202x is going to do the same. Neither is yet including "pure" among the standardised attributes but a future standard revision could always add it. There is a proposal to add pure to the C++ standard [0]. If it successfully makes it into C++, don't be surprised if it makes it into the C standard as well at some point. It is the kind of C++ feature which the C standard is likely to borrow, and is already widely implemented (just under different syntax) anyway.
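A sketch of the non-builtin case (the function names here are made up): declaring your own function with the attribute gives GCC/Clang the same licence to hoist the call out of a loop.

```c
#include <stddef.h>

/* The pure attribute promises: the return value depends only on the
   arguments and on global memory, and the call has no side effects.
   Repeated calls with the same argument may therefore be merged or
   hoisted out of a loop. (GCC/Clang extension, not standard C.) */
__attribute__((pure))
size_t my_strlen(const char *s)
{
    size_t n = 0;
    while (s[n] != '\0')
        n++;
    return n;
}

int count_nonspace(const char *s)
{
    int n = 0;
    /* With the annotation, the compiler is allowed to evaluate
       my_strlen(s) once before the loop rather than per iteration. */
    for (size_t i = 0; i < my_strlen(s); i++) {
        if (s[i] != ' ')
            n++;
    }
    return n;
}
```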
Nitpick: I don’t think the compiler needs that __attribute__((pure)) annotation to move strlen around.
__builtin_strlen is a reserved name, so if the compiler sees it, it may assume it came from an #include <string.h> and that it does what the standard says strlen does.
That is what allows compilers to do much more than move that function call out of the loop. Because they know what it does (or rather: what the programmer promises it will do, once the program get linked) they can inline its code or even, if the length of the string is known up front, replace the function call by a constant.
It doesn't even need to see __builtin_strlen (try modifying your string.h, or use musl, which has neither the attribute nor __builtin_strlen). It just needs to see strlen in a hosted environment (with the right header included?). If you want gcc to treat it otherwise, you also need -fno-builtin-strlen.
strlen is a standard function (in a hosted environment). So it must do exactly what the standard says it does, and the standard doesn't say it has side-effects. The compiler could very well use a built-in implementation of strlen, or even omit the call entirely if it had another way to deduce its would-be return value.
Object files are an implementation detail not known by the C standard.
The relevant question is "does the standard say that it does not have side-effects?" (is a pure function). @skissane's sibling comment to yours provides the explanation of how the compiler can deduce that it's a pure function.
The compiler can deduce that it is a pure function using the same logic by which it can deduce it is free to replace the call with a builtin: the function is defined in the standard. Everything @skissane said is implementation details and not particularly relevant (the compiler can do the optimization without the attribute they mentioned).
I think the standard would fall apart if you read it under the assumption that anything not explicitly forbidden can happen. Instead, you should read it and find out what the side effects are (and the same for undefined behavior, unspecified behavior, implementation-defined behavior, etcetera).
"Accessing a volatile object, modifying an object, modifying a file, or calling a function that does any of those operations are all side effects, which are changes in the state of the execution environment." (There are more details if you care to dig in)
Of course nothing stops you or me from making extensions to the standard, but analysing things from the perspective that some implementation might extend strlen to have visible side effects goes too far into whataboutism for my taste, unless there are real world examples to make it a relevant point.
That is nice and simple, but it makes ten comparisons for each character in s, where only two are needed. Of course it would be a good approach if the set of characters you're testing against is not contiguous, unlike 0..9.
Correct, but we're talking about "idiomatic" not "most efficient" solution. Input validation is usually not time-critical, and for these cases the code clearly showing the intent is much better than optimal one.
Good news then, it’s also linear time like the marginally faster but enormously grotesque for loop provided previously. I would take that clarity of intent a hundred times over squeezing a couple of comparisons out.
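The simple solution being discussed isn't quoted in this thread; my guess is a strspn-style membership test, which would explain the roughly ten comparisons per character:

```c
#include <stdbool.h>
#include <string.h>

/* Assumed shape of the solution under discussion: strspn returns the
   length of the longest prefix made only of the given characters
   (checking each character against up to ten candidates), and the
   string is all digits exactly when that prefix reaches the
   terminating '\0'. */
bool all_digits(const char *s)
{
    return s[strspn(s, "0123456789")] == '\0';
}
```

Like the loop version discussed above, this returns true for the empty string.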
You raise an interesting point. It got me thinking about how big-O notation has failed us in some ways: it teaches us to ignore constant factors.
In big-O, an algorithm that makes 1000 comparisons per element is no different from one that makes a single comparison per element. They are both linear time. But you can't deny that one of these will likely take 1000 times as long as the other.
Of course, like you, I favor simple and readable code over grotesque code that is hard to understand and mentally verify.
Sure, asymptotic time complexity only says anything about the asymptotic behavior. On the other hand, I have had far more cases of things blowing up due to cubic or even quadratic time complexities than I have for linear; linear is what you expect, if it takes a long time for a toy input, it'll take twice as long for a real input at double the size. Not so with the superlinear ones, your toy problem might work fine, but cubed? Ain't happening this millennium. This combined with the constant factor tending to also grow as the problem size grows really is a potent foot-gun.
Yes. You asked for "an integer" from stdin? Here's your integer. Specify constraints better next time. (That's probably what the upcoming AI-assisted code gen tools will look like.)
As far as I saw, they were pretty useless, and the way they were presented wasn't that great (same-language variants could be grouped in one commented section).
The longer I look at this example, the more weirdness I spot:
- There are no standard integer types that take 15 (decimal) digits to represent.
- The array contains ints instead of chars
- Why would you use fgets() instead of just gets()? (Though I don't touch C very often so perhaps that is considered proper style)
- Obviously no conversion of the digits into an actual number, let alone specifying a base or handling a `0x` prefix for hexadecimal or a minus sign for negative numbers.
> There are no standard integer types that take 15 (decimal) digits to represent
Nitpick: that is irrelevant. The code reads in at most 14 characters.
> Why would you use fgets() instead of just gets()?
You don’t use gets because it doesn’t exist anymore. It got removed in C11 (it rightfully was deemed so bad that backwards compatibility was sacrificed). You can check:
man gets
...
SECURITY CONSIDERATIONS
The gets() function cannot be used securely. Because of its lack of
bounds checking, and the inability for the calling program to reliably
determine the length of the next incoming line, the use of this function
enables malicious users to arbitrarily change a running program's
functionality through a buffer overflow attack. It is strongly suggested
that the fgets() function be used in all cases. (See the FSA.)
Hmm. There are more wrong implementations than correct ones by now.
The core of the problem was noted already: Obviously most people can't read.
That's especially "funny" when thinking about all the fuss that is made about teaching children programming in school. They should start with teaching them reading.
I don't even mean this snarkily. The state of affairs is actually depressing, and I would very much welcome it if more people were able to read and understand what's written. It would make a lot of things easier for everybody, I guess.
Also writing. I don’t believe that it’s possible for someone to be a clear and concise programmer unless they’re able to write their native language clearly and concisely.
This would likely require teaching students at least basic logic. This was entirely absent from my school experience.
It’s a hugely undertaught and undervalued skill, IMO. Before we start shoehorning CS into high school curriculums, we should consider laying these foundations first.
I think the point is that the implementation of the program exactly describes the behaviour of the program. If you take the program as a description of what the program is supposed to do, then what it is supposed to do is pretty unambiguous!
(Also debatable, for example if correct operation of the program depends on some property of its environment that can't always be relied upon.)
I found out that a good way to learn idioms for a given language is to do some simple katas on codewars.com and then review the most upvoted solutions.