Hacker News new | past | comments | ask | show | jobs | submit login
`three = 1` in the Linux sourcecode (2014) (github.com/torvalds)
239 points by ddtaylor on March 9, 2021 | hide | past | favorite | 101 comments



Many years ago in an age of klocs and flowcharts, at a large three-letter computer company, there were iron-clad coding rules that must be obeyed, no questions allowed. In general you could see their reasoning, but such bureaucratic reasoning doesn't pay off.

One example of this was that all numeric values used in a program must be factored out as symbolic constants. The reason for doing this is obvious, but it failed to account for the fact that some numbers have intrinsic meaning that should allow their use directly. But to be compliant with the standard, our C code had this boilerplate up near the top:

  #define zero 0
  #define one  1
This, of course, only served to make every page of code harder to read. And it didn't really even solve the problem it was meant to. We once found in some source code:

  #define thirteen 13
which of course did nothing to show us why that particular value was relevant.


An application I touched a few weeks ago has a DB column named 'type' with values 1,2,3. So on to the source code we go:

  enum RecordType {TypeOne(1),TypeTwo(2),TypeThree(3); RecordType(int dbValue) ...}
As it happens, I know an end user of this particular beast, so I show her some record IDs of each type and ask in what way they differ. She looks a few seconds, then says: 'This is clearly a type one record, the next is a type two, and here you have a type three'. After a little back an forth, it turns out even our internal course for new personnel talks about 'records' with different rules to follow for 'type one/two/three'. The end users have no other word to describe these entities than 'records' each with its own numbered type, and think about it as completely natural. The application is based on an older COBOL application, and the terminology just stuck around.

Clearly, the application is well-written: The enum uses the terminology used by the business. Another of my flabbers just got ghasted.


This part looks like those people formed a neural network. They are able to learn to act quasi-intelligently, but when you peek inside, the internal model is just a bunch of numbers.

Edit: scratch that neural network. They just made one neuron so far.


Three categories? That kind of complexity requires at least two artificial neurons.


Sometimes people just can't think of a better term and it sticks. It makes sense if there's some ordering implied but in quite a few of these there's none:

https://en.wikipedia.org/wiki/Type_1

https://en.wikipedia.org/wiki/Type_2

https://en.wikipedia.org/wiki/Type_III

https://en.wikipedia.org/wiki/Class_1

https://en.wikipedia.org/wiki/Class_2

https://en.wikipedia.org/wiki/Class_3

etc.


One benefit there is that you don't get left with only context for what type one records used to be like before some evolving. The only baggage kept is by people aliasing that name in their own memory.

You don't get screwed because you called something apple to start with, but it now represents a race car


https://en.wikipedia.org/wiki/Hierarchical_database_model

This was pretty common on mainframes, before db2 and other SQL databases


I would do this:

    #define I     1
    #define II    2
    #define III   3
    ...
    #define XIII 13
There is no zero in roman numerals, but you can use NULL for extra fun! (yes, I am a bad person)


The Roman numeral for ‘0’ is (nonclassically) ‘N’; there are attestations going back to the early 6th c. It stands for “nulla” or “nihila”.


Or, just program in INTERCAL (https://en.wikipedia.org/wiki/INTERCAL)! Integers are input in English (e.g. THREE FIVE NINE) and output in Roman numerals.


Using NULL for the integer 0 requires a cast in C (because NULL may have a pointer type). That doesn't break the GP's rule, however neither does `!I` or `I-I`.


For extra fun, put it into the code like this, including the `...` instead of the actual values. (Make it compile by commenting that line out.) Then define IV somewhere else but make sure to use III before every use IV so someone unfamiliar with these constants see the “right” definitions first.


    iv = secrets.randbits(IV)
    plaintext = "Text for encryption"
    aes = pyaes.AESModeOfOperationCTR(key, pyaes.Counter(iv))
    ciphertext = aes.encrypt(plaintext)
    print('Encrypted:', binascii.hexlify(ciphertext))


> There is no zero in roman numerals

    #define X_MINUS_V_MINUS_IV_MINUS_I 0


Why is the reason for doing this obvious? I can't see any benefit.

I get why you would prefer to name constants for their use, like `numIterations=3` or whatever, but renaming every integer seems senseless.


When formulating the rule, they didn't understand the point of it. They think "naked constants === bad" is equivalent to "no numbers in source".

Which, of course, is wrong. They're not the same. Same bastardization happened to Hungarian Notation, IIRC. It was meant to add some information to variables, like `iterN` for a counter, or `lenX` for a length, but someone somewhere decided that you were meant to prefix with the most primitive of type info, so you get stupid shit like `intN` and `intX`, and aren't much more informed. (Especially with an IDE showing the type info)


It's "obvious" if you take into account the corporate culture I guess. The rule of 'no magic numbers' is generally good, but then one has to remember that some numbers (like 0 and 1) are not magic.


0 and 1 can be magic if the context depends on them being arbitrarily 0 or 1. eg.

// this byte is unused and always 0

#define UNUSED_CONSTANT_FIELD_VALUE 0

bytes = {HEADER, UNUSED_CONSTANT_FIELD_VALUE, [...]}


Let me tell you about the system where null and 0 are treated as equal because in some programming languages will coerce null to 0. It just so happens that for an important business type in this system:

the range of the type is the non-negative integers up to (2^32)-1

semantically, the value cannot be null

when the value is 0, the code treats it as a special case

null is treated as 0


The use case I'm familiar with is when the fn you're calling requires a pointer. Common for ioctl(..., &one) calls.

If you look later in the code that's exactly what is happening here.


> but renaming every integer seems senseless.

When the integer itself is self-explainatory but you can't just assign the integer as a name.

For example:

   constexpr int 42 = 42; // compiler error
So instead:

   constexpr int fourtytwo = 42; // yay!

However the obviousness of what 42 means is debatable.

   int universe = fourtytwo; // why?
   int sum = fourtytwo; // sum... of what?
   int magic = fourtytwo / 7;


until someone runs a spellcheck on this code and changes all locations to fortytwo


Fourtytwo: a number equaling four amounts of ten plus two singular amounts.

Fortytwo: a defensive fortification with a nickname. See also Boaty McBoatface.


"What do you get if you multiply six by nine?" = 42


  #define zero 0
is just bad programming, barring some philosophical code which cares about "zero-ness". Names need to reflect what they represent, not their precise values. Are we looking up the first index? Then how about

  #define first_index 0
? Are we summing up values which are coerced to zero when empty? Then how about

  #define empty_value_integer 0
or

  #define additive_identity 0
?

Yes, that means there might be more than one variable with a value of zero, and that's completely fine since we now have several megabytes available for both source code and for compilation.

To be clear, I'm not endorsing the rule, just saying that there are more productive ways of dealing with it.


>barring some philosophical code which cares about "zero-ness"

But that is the intrinsic meaning OP was talking about. Plenty of mathmatics deals with 0 or 1 as important constants. If you have a function that mods by a parameter, you need error handling for zero. Calling it value_that_causes_undefined_behaviour isn't going to improve code clarity. Beyond that;

for (i = 0; i < IMPORTANT_QUANTITY; i++)

, or some variation thereof, is very idiomatic C.


My point is, if you had to pull `0` in that `for` loop into a variable, would you call it `zero` or `first_index` (or something else actually relevant to the loop)? I'd certainly prefer something relevant to the loop rather than `zero`.


'zero' and '0' are the same label, for the digit 0. There is no advantage to re-defining it, in fact, it does the opposite. It's a good practice to define labels for magic numbers to promote more maintainable code through the DRY / single source of truth principle, document the magic number through the label name, and improve code archeology by enabling searching by label.

These principles hold when the label given a clear and concise name--which is an art.


Heh, try for

     #define 0 0


  > #define 0 0
That actually works on some implementations (the preprocessor grabs a token without checking that it's a identifier). I've seen:

  #define $ SOME_LINE_NOISE
  #define && HACKY_GARBAGE
  x = &&(y $ z);
as well.


We have something similar in my my organization's Fortran code. We have constants KI1, KI2, KI3, ... etc corresponding to (k)onstant integers 1, 2, 3, ... etc. Someone got the idea that hard-coded numbers were bad. All it does is make code harder to read to anyone unfamiliar with these conventions.


Did you have an automatic static code analysis tool to flag non-compliance? This is the sort of thing I see and use to warn people away from blindly spending money on this sort of thing, installing the tool, and calling it good.


  705 /*
  706  * Iterate through the groups which hold BACKUP superblock/GDT copies in an
  707  * ext4 filesystem.  The counters should be initialized to 1, 5, and 7 before
  708  * calling this for the first time.  In a sparse filesystem it will be the
  709  * sequence of powers of 3, 5, and 7: 1, 3, 5, 7, 9, 25, 27, 49, 81, ...
  710  * For a non-sparse filesystem it will be every group: 1, 2, 3, 4, ...
  711  */
  712 static unsigned ext4_list_backups(struct super_block *sb, unsigned *three,
  713                   unsigned *five, unsigned *seven)


Yeah, maybe calling them pow3, pow5, pow7 would be a bit clearer? But a 10s search for a comment cleared it up, so I don’t really see the problem...


I see a problem:

https://github.com/torvalds/linux/blob/master/fs/ext4/resize...

Upon a quick code review, these lines look buggy:

    unsigned three = 1;
    unsigned five = 5;
    unsigned seven = 7;
Without digging deeper, the reader of this code thinks "The first line surely must be a bug, right???"


    unsigned three = 1;
To me, that is a 20 foot tall neon orange flashing Chesterton's fence. I think, "surely there is a very strong reason for this." Maybe not a good reason, but definitely a compelling one.


If I was code reviewing this, I would almost certainly insist on a different name to make the WTF go away. I’m not going to suggest one here despite having read the other code comment, as doing so would be bikeshedding — not my codebase and I have no skin in this — the only point I will insist on is that I cannot believe that “three” is a sensible choice of name for whatever it does. Not even what the comment implies it does.


Totally agree, it would definitely be better to rename it, but I read in another comment that the kernel is loathe to rename things. So then it makes you wonder, who allowed 'number_noun = N' in the first place?

"Three is the number of the counting, and the number thou shalt count to is three"


Why my kids ask me questions like this, "who invented the number three," I usually tell them it was someone vaguely European of the Enlightenment era with the same last name. "Ah, it was Ulysses Herman Three, inventor the natural numbers and famed bicyclist." Usually buys me enough time to look it up online.


Or at least require a comment explaining it, before someone needs to dig through the code.

Powers of three, five, and seven sounds a little bit like Totvald's very own little FizzBuzz game hidden in ext4's file-system source code.


> Totvald's very own little FizzBuzz game hidden in ext4's file-system source code

What's that?


Write a filesystem that saves all data blocks to disk, but:

- if the data block id is a multiple of 3, replace its contents with 0x03

- if the data block id is a multiple of 5, replace its contents with 0x05

- if the data block id is a multiple of both 3 and 5, replace its contents with 0x15


> To me, that is a 20 foot tall neon orange flashing Chesterton's fence.

It's not a Chesterton's fence, though. The whole point of Chesterton's fence is that whoever put up the fence that you want to take down couldn't be bothered to add signage explaining why the fence was present, thus either it isn't important to keep the fence up long term, or they're a idiot, in which case there's no reason to belive they were correct in deciding that having a fence there was useful in the first place.

With this fence, if you walk a few meters to the side (read, "select 'ext4_list_backups' and hit ^F and ^G a few times); et viola, there's your signs.


Take down the fence. It's in the filesystem code, what could go wrong?


'Without digging deeper' isn't really what matters here. A better question would be 'after some very cursory further digging, are we worried?' The first thing I ask myself when I see something that looks as wrong as this does is "Is it possible for this piece of code to be completely broken, and for no one to have noticed yet?"

This is really a question about two processes - the one where people look at, read, edit and discuss the code, and the extent to which this is discoverable by them. And the one where people run the code and expect certain things to work and would notice if they didn't work, or didn't work correctly.

This fails both of those checks. This file has been edited 8 times in the last year, by 5 different people. It's not so long that several of them wouldn't have read through the whole thing. This bug, if it was a bug, is unlikely to have persisted for 6 years without being noticed. It's not obfuscated or deeply technical in any way.

Secondly, this is code which is fairly critical to millions of systems. Ext4s is in literally hundreds of millions of devices. It's used in data centers in contexts where people are likely to follow up on any unusual behavior by their filesystem. The code in question is likely to be very well exercised in tests. It's likely to be extremely well exercised by people doing research stress testing/comparison testing of filesystem code. It is undoubtedly very well exercised by real-world deployments.

The bar to assume that you know something the person who wrote the code didn't, as opposed to the other way around, should be very high here. Even taking all the benefit of the doubt (which you should always do, never assume that stuff 'probably works' without evidence) it's not hard to dismiss this.


> The first thing I ask myself when I see something that looks as wrong as this does is "Is it possible for this piece of code to be completely broken, and for no one to have noticed yet?"

Good reflex. In my first job out of school (big research lab), we would occasionally find odd bugs in the code base (as would be expected...bugs happen). The head of the project would often wonder, “how could the system ever have functioned with this bug? Someone must have deliberately tampered with the source to discredit us.”

The first few times I thought he was joking, but no: he seriously believed this and even tried to investigate.

This was years before source control, which would have answered the question.

Edit: fix iPad-caused typos


Three years ago, I was sent as a consultant to help a struggling small, local health insurance agency. They had a three man team, the lead had been there for fifteen or twenty years (built their system). He assured me on the first day that they didn't really have any bugs and that the system was pretty stable.

We were called in to help them because their management (and users) were aware they were underwater with the amount of bug fixes, live production data changes they made to keep things working, and impossibility of adding anything new to the system without tipping something else over.

It seems the attitude that "there's no chance we have any bugs" is pretty prevalent anywhere I've ever been, along with the usual mix of "how did this ever work" which I still always find pretty shocking to run across.


The problem with this argument is that this particular snippet of code doesn't get run very often. It's the bit that works out where the superblock (and all its backups) are stored. Now, normally, only the one copy of the superblock is ever read - the backups are only read if the first copy of it is corrupted in some way. So almost all the time, the results of this code are not used. So if it were broken, then yes it could possibly go unnoticed for 6 years.


That's only a problem with one of the arguments, and is exactly why there are two approaches. Some errors are hard to detect by reading or working on the code, but easier to detect from behavior. Some are the other way around.

In this case, I do not agree that this could have gone unnoticed for a long time, even only looking at the behavior of the running system. When multiplied by the number of installations and how much filesystems are used, it is run a lot. It's also the kind of thing people investigate when it happens in certain settings - we lost a block and our whole filesystem was corrupted. And the kind of thing people test for.


I agree. The relevant comment is far away.

It’s a local variable so it seems that there would be little impact in renaming it (and the other two). But if the local (kernel in this case) coding conventions discourage variable renaming, a brief comment referring to the later comment would be worthwhile.

Over commenting is a problem, but this would a good use case if renaming is not appropriate.


Not really. Nobody sane would think other sane developer with any expeirience would call a simple variable according to what it initially contains. You'd expect either name related to what is the meaning of the contents, or meaningless name.


People make typos and simple mistakes all the time, regardless of experience.


True. But the intention of writing

    int three = 3;
is insane.

I would never ascribe to previous programmer (unless he just learned what variables are) the intention of creating variable (not even constant, which wouldn't make it any better) named 'three' and assigning actual value 3 to it.


If that is insane, you may not have issue with this specific line but you surely have issue with the 2 “insane” lines that follow it?


All three lines would be "insane" if the variables weren't mutated. As in a variable named "three" is insane if it solely contains either of the values 1 or 3.

When it contains other values it is still subpar variable naming.


If something looks insane, but probablu sane person made it and it works then that's usually good indicator that you are missing part of the picture and should look around. It still might be bad, but probably not insane.


+1 seen it so many times, often times from some cut and paste heavy refactor, one is more or less going on autopilot just moving substrings around and then bam. Review process hopefully catches it, when it is a bug, and potentially when the semantics are unclear...


When

  unsigned three = 1;
is accompanied by

  unsigned five = 5;
  unsigned seven = 7;
what would be the sane conclusion?


The reasonable conclusion is that you should read the surrounding context and try to understand what these variables are used for. The purpose of code review is to understand the code enough that you can spot real bugs; I try to avoid flagging surface-level stuff unless it's really important. IMO, this is a borderline case due to the very clear comment (which is a bit too far away).


And then, having located that comment, try to rename the variable to something more meaningful. Even gap1, gap2, and gap3 would be more informative.


We have a difference of opinion here. I lean away from bikeshedding in favor of finding more substantive issues. I'm also an old-timer and I'd call this a 3, on a scale from 1 (easily readable code) to 10 (a suitable IOCCC entry)


It's also true that unless I had a specific mandate to touch that code, I wouldn't. If nothing else it confuses the commit history, obscuring what I was actually in there to do.

But if it were my job to touch these lines of code, I would want to leave them better than I found them.


That there are some entities (maybe ordered, maybe related to numbers three, five and seven in some way) and there are some values associated with those entities, two of which (accidentally or not) need to be initially set to corresponding number, but the third one possibly not, but better look around in the comments or code to understand and make sure the code is correct.

Then maybe leave a comment right after `= 1;` why this variable shouldn't be initialized with 3 that would clear up confusion for you if it was there.


That the code was edited more than once on its lifetime, and in one of those times (not the last one), by someone with pretty bad naming habits.


These are variables that will change anyway, so the initial values are probably not that relevant to their name?


In which case the names should be cnt0, cnt1, cnt2 or something along those lines.


Lesson: dig deeper.


The problem is not that it isn't clear.

The problem is that it is extremely clear in a way that makes you confidently think something totally wrong.


That situation is one of the most costly mistakes, in every way i can think of, in software engineering period.


IMHO a better approach would be to make a helper function that initializes those variables, rather than relying on all callers to do it every time.

Then you could put the relevant comment about the weirdness of "three = 1" there, and callers wouldn't have to remember the magic sequence.


The sound of Linus typing in all caps intensifies


Pow3, pow5, pow7 still sounds wrong without explanation why pow3 starts at 3^0 but pow5 and pow7 start at 5^1 and 7^1


And the scope for "three" is quite small, a single short function. It's not like it's using a global variable or #define.


it is IRRATIONAL code (maybe rational within context, but ...)


threes_cursor fives_cursor sevens_cursor



keypusher on Feb 25, 2014:

If only there was a system for fixing the code l yourself instead of having to post about it to a widely read tech site.


Magic numbers are a horrible pattern (and I don't mean elf/obj IDs).

*ANY* time you use a non-obvious constant (like start of a loop that isn't 0), good practice to comment immediately at the source, or pull it into a well-named variable/macro.

Don't agree? Spend 35 years programming and you'll see this pop up again and again and again and again... SO much time wasted figuring out magic #s.



Yikes, that looks error prone.


Another fun one is ONE = 256 (or another power of two).

Intended for fixed point arithmetic where 1 actually means 1/256.


This is nicer example, since the posted one should really have different variable names (e.g. pow3). On the other hand, if this declaration of ONE is global (not contained within some context like a type), then perhaps it should be something like ONE_F8.




I don't know about the ext4 code specifically, but in hardware programming it is common to have stuff like "3 means 5". If for whatever reason a field of a register may only accept values that actually mean 1, 5 and 7 (to program whatever bits of the hardware, like clock divisors or whatnot), the hw register will probably only contain 2 bits, and be encoded in such a way where if you write 00b you mean 1d, if you write 01b you mean 5d and if you write 10d you mean 7. And 11b is reserved. This is a common thing, it's encoding to pack a sparse set of possible values into a few bits as you can.

But then, you should probably do something such as "#define CLOCK_DIVISOR_3 0x1" instead of simply "int three = 1".


It was done that way on purpose to cause hundreds of people a year to freak out and discuss it ad nauseum. But really a social experiment designed to show everyone is all talk and no follow through... That being that no one actually tried to change open source but just talk about it.


that's interesting, is there a source?


What wrong with:

  _3 = 1;
It's shorter. I know, _ is probably reserved, so maybe:

  x3 = 1; // x is not multiply
There. I fixed it for you.


It's actually worse than that, given that all 3, 5 and 7 have 1 as their power-0, and the rest seem to encode powers of the former. Too much WTF and not enough comments.


to me it is a great message that if you have attitude that your predecessors were stupid morons and you don't understand what happens here nor willing to bother to dig enough deep to understand it, you shouldn't be doing changes here. Attempt to correct it is a kind of warning signal for a PR containing the correction.


That was 2014, what does it say now?



That's smart; Imagine user setting vm.swappiness = 100 in /etc/sysctl.conf


For small values of three.


I wonder why a struct to hold the cursor state wasn't used.


surely we need another linter that can catch such issues


It's in the repository of a certain "torvalds". Better check twice next time what this guy's doing :)


I once spent two days debugging confusing behaviour in a serial command interface before discovering the line:

    #define CR_LF "\n"


man, that hurts. When you found the issue were you happy or were you just depressed? I've been there and sometimes finding the root of an issue is not a eureka moment but more just a long, deflating, sigh.


I was grumpy. Sometimes people are inexperienced and don't know how to make simple clean software and so end up with subtle bugs. I can understand that and dealing with it is part of life as an engineer. This was just laziness and there's no excuse for it.

"Write software like the next person to work on it is a psychopath and they know where you live"


Did you then rename the constant to NEWLINE to help the next programmer?




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: