Hacker News
Taming names in software development (simplethread.com)
77 points by jetheredge on Dec 11, 2022 | 43 comments


What this misses, and what every other article I’ve seen about naming misses, is that the truly hard part of good naming is none of the stuff that was mentioned.

The hard part is this: taking the deserved level of care with naming often has to be done in a context with other humans who wrongly think that it’s simply not that big a deal, and their annoyance with having it brought up becomes an often unspoken source of friction. This not only leads to rancor but also has a chilling effect.

Even making the unspoken spoken often doesn’t help. The response will be like: oh yeah, choose some good names for us, knock yourself out, it’s great that somebody cares (but let’s not stop calling the timestamp a counter, or the database temmp_v3_old_Udpate_RstdnewV2, because reasons).

It can depend on the team, but in my experience successfully getting past these issues is usually way harder than any of the other factors mentioned.


This came up at my work Christmas party. One colleague on my team (1) hates my reviews because the naming issues I point out are way too pedantic. Another colleague (2) on the same team loves my reviews because they make them think about how the code is read and understood.

It's difficult to convey why naming is important. (1) also feels that the code needs a lot more comments and berates others for not commenting their code. I have a hard time pointing out the irony to them.


One thing I've found over and over again is that if something is hard to name, there is often a problem with the abstraction itself. The most common issue is that the thing resisting a name is doing multiple things and should be split into smaller parts, each focused on something small enough to be named simply.
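A minimal Python sketch of that kind of split, using a hypothetical `Order` type and made-up tax rates: the first function is hard to name because it validates, computes tax and formats the result all at once, while each piece of the split version does one thing its name can describe.

```python
from dataclasses import dataclass


@dataclass
class Order:
    subtotal: float
    country: str


# Hard to name well: validates, computes tax and formats output in one go.
def process_order(order: Order) -> str:
    if order.subtotal < 0:
        raise ValueError("subtotal must be non-negative")
    tax = order.subtotal * (0.2 if order.country == "UK" else 0.0)
    return f"{order.country}: {order.subtotal + tax:.2f}"


# Split version. TAX_RATES is a hypothetical table for illustration only.
TAX_RATES = {"UK": 0.2}


def validate_order(order: Order) -> None:
    if order.subtotal < 0:
        raise ValueError("subtotal must be non-negative")


def tax_for(order: Order) -> float:
    return order.subtotal * TAX_RATES.get(order.country, 0.0)


def format_total(order: Order, tax: float) -> str:
    return f"{order.country}: {order.subtotal + tax:.2f}"


def order_summary(order: Order) -> str:
    # Composes the small, easily named steps.
    validate_order(order)
    return format_total(order, tax_for(order))
```

The behavior is unchanged; what changes is that every function name now says exactly what it does.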


I've also discovered that the problem the naming difficulty points to might be the mere fact that the abstraction exists in the first place.

We've got such a fetish for reusability that it often overwhelms any care we might have given to comprehensibility.


In every project, I usually find a few terms that are overloaded to mean multiple conflicting things.

On one project it was so bad that every time a particular word was used in conversation I had a mini freak out because it essentially referred to six related but very distinct things. I had to figure out which one people meant, assuming they even knew themselves.

It wasn't like these things were subtypes of that word either. The word had just been bastardized beyond recognition.

I tried to make up six new meaningful names and "ban" anybody from using the original word but it didn't stick particularly well. We never did manage to exorcize the original word from the code base, so it hung around misleading people.

It did lead to bad abstractions but the root of the problem wasn't the abstractions themselves but the fact that this word was used so indiscriminately and inappropriately.


And this is why code reviews, while helpful, can be wasteful. Reviews should be based on existing standards, not on each reviewer's preferred methods. Some people feel they need to comment, so they over-comment. Some feel they need to prove how much they know. Others may mark you down for doing it the way the previous review recommended.


I've found that even with existing standards, some people adapt the meaning to their own ends, either because they think they are ALWAYS right or because they really just don't care.


> but let’s not stop calling ... the database temmp_v3_old_Udpate_RstdnewV2

I worked briefly on a codebase that was otherwise produced by Chinese programmers (in China).

It did feel somewhat surreal seeing variable names that were in English but misspelled, particularly when e.g. several classes all had a field of the same name, except that in one of them, the name was misspelled.

I assume it didn't bother them because (a) they weren't English speakers anyway, and (b) they autocompleted everything. Why would it matter that `update_history` happens to be spelled `udpate_history` in one out of five classes? You just type `u` and pick the right field.


I've seen the same thing many times with English-speaking programmers. Plenty of people just don't spot that 'udpate_history' is misspelt.


And sometimes that mistake propagates and leaves us 25 years later still using referer [0]

[0] https://annaken.github.io/a-brief-history-of-the-referer-hea...


This one always seemed to me like a plausible alternative spelling (cf. traveler for traveller).

Maybe the reviewers who let this through were speakers of British English who knew it was misspelt but conceded it without a fight, having already lost on color, gray, center...


Traveler is stressed on the first syllable; referrer is not.


This is me.

The invention of the symbol name spellchecker a decade or so back has been a wonderful boon for me. It's a little annoying to teach it the new jargon when I'm getting started in a new codebase, but it's easily a net time saver. Catching spelling errors when you first create a new symbol is always cheaper than fixing them after it's been referenced from 15 different files and can't be fixed without another code review.
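The core idea of such a checker can be sketched in a few lines of Python. This is an illustration under assumptions, not how any particular IDE plugin works: it splits identifiers on underscores and camelCase boundaries and reports words missing from a hypothetical `KNOWN_WORDS` set, which is where new jargon would be taught.

```python
import re

# Hypothetical word list; a real checker would load a dictionary plus project jargon.
KNOWN_WORDS = {"update", "history", "get", "user", "id", "http", "server"}


def split_identifier(name: str) -> list[str]:
    # Split snake_case on underscores, then camelCase/PascalCase inside each part.
    # "[A-Z]+(?![a-z])" keeps acronyms such as "HTTP" or "ID" together.
    words: list[str] = []
    for part in name.split("_"):
        words += re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", part)
    return [w.lower() for w in words]


def unknown_words(name: str, known: set[str] = KNOWN_WORDS) -> list[str]:
    # Report every non-numeric word in the identifier that the word list lacks.
    return [w for w in split_identifier(name) if w not in known and not w.isdigit()]
```

For example, `unknown_words("udpate_history")` flags `udpate`, which is exactly the kind of typo that is cheap to fix the moment the symbol is created.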


I used to think descriptive variable names were a waste of time (a, b, c, even things like booya, foo, etc.), until I had to maintain my own code from years ago. I literally had no one to blame but myself for the extra deciphering work I had to do.


Naming is simple when the function or method is simple.

Complex naming is often, but not always, a good indicator of complexity.

Unless of course, it's Java... https://projects.haykranen.nl/java/


How locally a variable is used should matter for selection of a name, too. I don't find i, j, k, o, p, k, v at all offensive if their scope is just a few lines of code. Usage is often idiomatic (e.g., k, v for iterating a map or i for an integer loop variable) and using a longer name would just make it less idiomatic and less obvious.
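A short Python illustration of those idioms, with made-up function names: `k, v` for a dictionary's items and `i, j` for nested matrix indices, each living for only a line or two.

```python
def format_prices(prices: dict[str, float]) -> list[str]:
    # k, v: scope is a single expression, and the (key, value) idiom carries the meaning.
    return [f"{k}={v:.2f}" for k, v in sorted(prices.items())]


def transpose(m: list[list[int]]) -> list[list[int]]:
    rows, cols = len(m), len(m[0])
    out = [[0] * rows for _ in range(cols)]
    # i, j: row and column indices by mathematical convention, local to these loops.
    for i in range(rows):
        for j in range(cols):
            out[j][i] = m[i][j]
    return out
```

Renaming `i` and `j` to `row_index` and `column_index` here would add length without adding information.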


This makes a lot of sense.

I also prefer for-loops that use "i". It is instantly clear that "i" is the current index used by the loop. Even though it is a single character variable name, it has a specific meaning by convention.

If I see a variable named "people_result_list_index", it actually hurts readability. I don't know whether it is local to the for-loop, as it could have been defined anywhere in the code or even passed as an argument to the function. The long name adds complexity instead of removing it.

Using "i", "k", "v" and other single character variables outside for-loops is often not advisable. An exception to this could be "x", "y" and "z" if they refer to positions in 2D / 3D space. Personally I would probably wrap them in a structure, so that you could refer to them as pos.x and pos.y. But I wouldn't hold it against someone if they thought the code was readable without it. It is basically part of the domain knowledge. Other domains may have similar exceptions. The "R" value in terms of growth rate comes to mind.

TL;DR: Single-character variables can make sense in the right context, when they are used as part of a convention or domain terminology.
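A minimal Python sketch of the "wrap x and y in a structure" option; the `Vec2` and `distance` names are hypothetical. Inside the structure, `x` and `y` remain single letters because they are the domain's own terms.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Vec2:
    # x and y are domain terminology: anyone reading 2D code knows what they mean.
    x: float
    y: float


def distance(a: Vec2, b: Vec2) -> float:
    # Euclidean distance between two points.
    return math.hypot(b.x - a.x, b.y - a.y)


pos = Vec2(3.0, 4.0)
```

Callers then read `pos.x` and `pos.y`, which keeps the short names while making it obvious which point they belong to.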


Personally, I found there are two important aspects of things being named that are hard to express in names:

1. How important the thing is in the big picture. Is this function where the meat of the program is, or is it just a technicality? Is this the main data structure or just something temporary? Good names should tell me what to focus on when reading the code.

2. Whether the name describes only the thing as it is, or actually prescribes what its use is. For example, LinkedList is descriptive because it tells only that the thing is a data structure, but it's up to you how to use it. On the other hand, CustomerRecord is prescriptive - it might be just a bunch of strings, but it also tells me what the intended use is, which is not necessarily contained in the code itself - it might be just some boilerplate to manage it in the database.


In JavaScript, you’ll often see i, j and subsequent letters as iteration variables. i is not descriptive, and j is somehow even less so.

Using i, j, etc. for indices dates back to older versions of FORTRAN, where the variable type depended on the first character of the name, with I through N reserved for integers.

Quite why the convention has persisted for so long is one of SW Engs little mysteries.


Surely it predates that as well. I assume FORTRAN chose I through N as the integers because i, j, k were already used as indices by mathematical convention (as were n for sequences and m, n for matrix dimensions).


> Quite why the convention has persisted for so long is one of SW Engs little mysteries.

I like it - it's a convention that has clear context once it's initially understood. It saves the argument of "is it an index, counter, loop iterator, <something else>" - it's the i/j/k'th loop's index.


> I understand exactly what BasicReviewableFlaggedPostSerializer is on my first time seeing it.

Good for the author, I certainly wouldn't. Things I'd have to research in the codebase before understanding the name are:

- what is Post?

- what do Flagged and Reviewable mean here? Are they attributes of the Post or the serializer?

- what does Basic mean? Again, what is this referring to? Is it indicating that the class is some kind of base class for an object hierarchy?


"In software, really good names are meaningful, descriptive, SHORT, consistent, and distinct." (emphasis mine)

I hate this general recommendation that names need to be short. It only made sense in the old days of programming, when you actually had to type them out. The reality is that IDEs bring all forms of IntelliSense and autocomplete, so you almost never type a name out, thus being short brings no benefit if not "habit". You should really try to have understandable, detailed names, and not care about shortness. "timeout" is a good variable name if your language has a great type system and your code works with it. "timeoutInSeconds" is a better one if you are just using an int/long, to distinguish it from "timeoutInMillis" and avoid silly mistakes.


I disagree. The utility of short names is not just that it takes less effort to write them. They're also far easier to read and understand. IDE autocompletion doesn't help with that, nor does any other tooling, really. Since code is read much more frequently than it's written (including but not limited to any time that related code has to be changed), names should be as short as possible without sacrificing clarity. Excessively long names are harder to parse, and can slow you way down when trying to understand code.

(I'm arguing against what I see as your central point, but to be fair, 'without sacrificing clarity' is doing a bit of work in the paragraph above... your example is actually a good case of a bit of additional length being worth it. I would say timeoutMs or timeoutSecs are good shorter alternatives, "ms" and "secs" being widespread and clear abbreviations. You're completely right that 'timeout' is insufficiently clear for a purely numeric type, though I'd disagree that you need all that much type system magic to make it OK. For example, calling a `java.time.Duration` `timeout` seems fine.)
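In Python, the closest standard-library analogue to `java.time.Duration` is `datetime.timedelta`. A sketch of the two styles discussed above, with hypothetical function names: with a bare `int`, the `_ms` suffix is the only thing carrying the unit; with a `timedelta`, the unit is chosen explicitly when the value is built, so a plain `timeout` is enough.

```python
from datetime import datetime, timedelta


def deadline_from_ms(now_ms: int, timeout_ms: int) -> int:
    # Plain int: the "_ms" suffix is the only thing telling the reader the unit.
    return now_ms + timeout_ms


def deadline_from(now: datetime, timeout: timedelta) -> datetime:
    # timedelta carries its own unit, so "timeout" alone is unambiguous.
    return now + timeout


start = datetime(2022, 12, 11, 12, 0, 0)
# The unit is spelled out once, at the call site that builds the value.
deadline = deadline_from(start, timedelta(milliseconds=1500))
```

With the typed version, passing seconds where milliseconds were expected is no longer a silent numeric mistake; the unit is visible wherever the value is created.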


The problem with `timeoutMs` or `timeoutSecs` is that if you have a policy that you shouldn't contract words (possibly founded on a first principle that you value clarity in your coding standards), then you're going to spend time justifying why a pull request gets rejected when someone names a type `SearchCntrlr` or `SubmtBtn`. Before you know it, you'll have spent hours just debating and getting no work done, whereas you wouldn't have the problem if you just spelled out `timeoutInMilliseconds` or `timeoutInSeconds` fully.

I mean, what value do the shortenings `ms` and `secs` provide, anyway? Saving keystrokes? You could still type out `timeoutms` and the IDE's autocomplete would suggest it for you, right?


Also is it timeoutMs, timeoutMS, or timeoutMillis?


> thus being short brings no benefit if not "habit".

IPersonallyFind tonsOfCode likeThisReally hardToParse, especially when the least important parts have the longest names.

As a result code like that can be overwhelming and sometimes make me dread working with it.


I find they also often indicate over-abstraction or over-complicated generic stuff that is often kind of irrelevant to the domain.

Equipment_Maintenance_Criteria isn't super short, but it appears to actually mean something. Definitely tells you more than just "Selection" would. But all too often it's called something like AbstractServiceFactoryBuilderManagerLocatorPlugin which really doesn't tell you anything at all and leaves you at the end knowing less than you did when you started reading it.


Yes, long names can be taxing too if taken to extremes. I use descriptive names that spell out the domain or business logic so code becomes as close to self documenting as it gets. However, locally when I have to reference these multiple times I use a short alias, usually the acronyms of the long names so it’s the best of both worlds: don’t have to carry around the long names everywhere but still have a fallback on them when I forget what they represent.
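A small Python sketch of that alias pattern, borrowing the `Equipment_Maintenance_Criteria` example from upthread; the fields and the `needs_service` function are hypothetical. The descriptive name stays in the signature, and the body uses its acronym.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EquipmentMaintenanceCriteria:
    max_hours_since_service: float
    min_inspection_score: int


def needs_service(
    hours_since_service: float,
    inspection_score: int,
    equipment_maintenance_criteria: EquipmentMaintenanceCriteria,
) -> bool:
    # Short local alias built from the long name's initials; the full name is
    # still visible in the signature for anyone who forgets what "emc" means.
    emc = equipment_maintenance_criteria
    return (
        hours_since_service > emc.max_hours_since_service
        or inspection_score < emc.min_inspection_score
    )
```

The alias only lives for the length of the function, which is the same locality argument made for `i`, `k` and `v` elsewhere in the thread.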


do_you_prefer_underscores? iKindOfDo i_kind_of_do


In Emacs: M-x glasses-mode


Long names should be a signal that you’re breaking away from current context or doing something unusual. Unnecessary length and redundant context makes names more difficult to discern. I wrote about it a bit more in the “what?” section of this post: https://max.engineer/maintainable-code


Perhaps "short" is a shorter way of saying "quick to parse". Even with IDEs, I still want that, lest I end up having to choose between names like this all the time:

- MinimumPriceCalculatorFactory

- MinimumPriceCalculatedPriceFactory

The extra couple of seconds every time becomes distracting and irritating.


In small methods I tend to use shorter names, even very short non-descriptive names, because there is less context and so less chance of confusion, and it makes it easier to see what's going on at a glance (and check that it matches the method name).

On the other hand, if some public method does multiple things that need to be known to decide where it can be called, I put them all in its name.


> I understand exactly what BasicReviewableFlaggedPostSerializer is on my first time seeing it.

I don't. I think I figured it out only after reading it half a dozen times and working out that Post is probably a noun (except for Basic, no clue there). So even this requires context just to read it and know what it does; on my first read, the only part I understood was Serializer.


Or maybe I'm still not getting it, my read is there are posts, they can be flagged, flagged posts can be reviewed, and this is a "basic" serializer for flagged posts that have yet to be reviewed.

But why you'd need such a specialized serializer (let alone a presumably less basic one as well) is beyond me. It seems like such drastic overkill that maybe I still don't get what the name means.


> Or maybe I'm still not getting it, my read is there are posts, they can be flagged, flagged posts can be reviewed, and this is a "basic" serializer for flagged posts that have yet to be reviewed.

I think you are right. The name is pretty clear to me, but that may be because I have worked on similar code bases where this naming is used by convention. Reading code requires knowing the domain and I'm not sure if a shorter name is more clear. You need domain knowledge to know what a post is, and what it means that it has been flagged.

You probably made a very accurate guess based on your knowledge of forums and moderator systems. This may not be apparent to all, and shorter names will probably not help much. In addition, if they shortened the name to "Serializer", "PostSerializer" or even "FlaggedPostSerializer" it could conflict with other serializers in the project.

> But why you'd need such a specialized serializer is beyond me,

I totally agree with your point. They may have their reasons, but it seems to me that a "ReviewableFlaggedPost", "FlaggedPost" and "Post" should have very similar needs and could be solved by structuring them differently (perhaps by using composable classes that can each take care of their own serialization)

Regarding the use of "Basic", it also triggers a "code smell" reaction in me. It may make sense to them, and it's hard for me to make any definitive comments without knowing the rationale behind it. My guess is that they have different types of responses based on the same "post" object. "Basic" may include a subset of the "Full" response, such as id and title only.

In those cases I tend to prefer separate DTOs, like "PostSummaryDTO" and "PostDTO", that can be reused through composition for different responses (flagged for review, etc.). This may of course not be the best choice for all usages, so I would need to know more to say something conclusive about this particular case.
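A minimal Python sketch of that composition idea, using dataclasses as stand-in DTOs. `PostSummaryDTO` and `PostDTO` come from the comment above; `FlaggedForReviewDTO` and all field names are hypothetical. Each response type reuses the summary instead of needing its own specialized serializer.

```python
from dataclasses import asdict, dataclass


@dataclass
class PostSummaryDTO:
    # Minimal view, e.g. id and title only.
    id: int
    title: str


@dataclass
class PostDTO:
    # Full view composes the summary rather than duplicating its fields.
    summary: PostSummaryDTO
    body: str


@dataclass
class FlaggedForReviewDTO:
    # Hypothetical moderation response that reuses the same summary.
    post: PostSummaryDTO
    flag_reason: str


summary = PostSummaryDTO(id=1, title="Hello")
flagged = FlaggedForReviewDTO(post=summary, flag_reason="spam")
# dataclasses.asdict recurses into nested dataclasses, giving a JSON-ready dict.
payload = asdict(flagged)
```

Serialization falls out of the composition, so names like "Basic" or "Reviewable" never have to be stacked onto a single serializer class.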


Yep, I searched GitHub for it and it looks like exactly that, for Discourse. I don't have any problems with the code now that I've seen it (though I still don't know why it's Basic, except that it's a subclass of BasicReviewableSerializer, which my question would extend to).


Once you understand the domain you're working in and have architected your solution in a way that makes sense, only then will the naming in your code come out right, without much additional effort. Forget about naming; it is a side effect of your understanding of the issue at hand.


Exactly. Variable naming is not an isolated skill you can hone, and the fact that people are treating it like one means they are missing the point.

You won't become a great author by perfecting your grammar instead of storytelling.


In previous projects, data dictionaries helped us name things like database tables and columns. In one project, the DBA team used ERwin (data modeling software) to maintain a data dictionary and data model.

Are data dictionaries still in use today? Are there open source examples, books, etc. to learn from?


There are only two hard things in software development:

* naming things

* cache-invalidation

* off-by-one errors





