On object-oriented programming: "That which obscures my code is bad."

osipov · on May 6, 2008

Whenever I hear strong opinions about coding styles I politely nod, agree and then launch into a series of questions designed to understand the background of whoever is articulating the opinion.

Yes, coding style is shaped by personal preferences (where do YOU put your curly braces?), cultural affiliation (Microsofties and Hungarian notation) and a variety of other minor factors. However, these are secondary to the main issue of how programmers build an underlying structure of the code. Structure is something of an elusive concept, but use of object-oriented programming (proper use) certainly leads to structural changes in the code. Along the same lines, use of recursion leads to a different structure than use of procedural style and so on. In short, structure for me is the underlying mathematical solution to the problem addressed by the code, regardless of the specific programming language.

So going back to the issue of the coding style, I find that the background of the programmer is so important because it shapes their thinking about typical code structures. Someone with a background in writing payroll or other corporate financial systems (like many Thoughtworkers) falls naturally into a habit of thinking about code as a close approximation, if not a one-to-one representation, of real-world objects. Others with a systems background, who excel at using Nth order pointers to save 5 KB of RAM in a device driver, care less about real-world objects than about differences in x86 parallel processing instruction semantics.

However, if you put a systems dev guy and a corporate dev guy in a room together and make them talk to each other about programming, they'll happily argue for days about how the corporate dev guy is a loser for not saving memory through pointer arithmetic and the systems guy is an idiot for not "getting it" about object-oriented programming.

So, to make a long story short, conversations about how <insert programming paradigm here> rocks/sucks are pretty much useless unless one first understands the target audience for the paradigm.

gruseom · on May 6, 2008

Interestingly, research results on this come out in favor of longer functions. I read this a long time ago (in the book Code Complete) and remembered it because it went against my assumptions. Can't get at it in Google Books but a summary appears here: http://www.augustana.ab.ca/~mohrj/courses/2003.fall/csc220/l...

Examples:

  Routine size is inversely correlated with errors, up to 200 lines of code
  Larger routines (65 lines of code or more) are cheaper to develop per line of code

There are all sorts of objections one can make to this. Perhaps it indicates as much about the weakness of software research as anything. But I still find it interesting enough to temper my opinions.

pdubroy · on May 6, 2008

I don't find that surprising at all. For a given piece of functionality, if it is spread across more methods, it means:

- more methods to look at, and more things to hold in your short term memory

- more (possible) entry points to any given piece of code, so more things to test, and greater possibility of error

- more lines of code to accomplish the same task

- harder to grok the overall behaviour/capabilities of a class

And most of my experience with big OO libraries is with the supposed best cases: Smalltalk class libraries (Squeak and IBM) and the Eclipse framework.

gruseom · on May 6, 2008

Those are great points.

Here's one issue that I think is lurking in all this. Even agreeing that a large function is easier to read than, say, 10 small ones, it doesn't follow that a system written that way is easier to understand. You still have the problem of how to factor the system into large functions as opposed to small ones - a question that is not addressed by this discussion (or, I'll bet, by the research). And it's a big problem, because complex systems usually have a great deal of intertwining, partly-but-not-completely overlapping functionality. If you could factor all this into a few large functions, you'd probably get something more readable, but in most cases you can't. So in order to build a well-factored system (with minimal duplication, separated concerns and so on) you carve things into smaller pieces that can then be cobbled together to create the various behaviors you need. (At least, that's how you do it in OO.)

So the business of function length plays out quite differently at the system level than it does at the function level. The function level is easier to discuss and to study, but what matters in practice is the comprehensibility of the system as a whole. That's why your caveat, "for a given piece of functionality" is a very well-placed one. You're presupposing that the functionality has already been carved into pieces.

projectileboy · on May 6, 2008

I don't think you've identified badness with OO programming styles; I think you've identified badness in the community of Thoughtworks developers.

pdubroy · on May 6, 2008

As I said in the article, I definitely am not against object-oriented programming. What I am against is this notion that we should strive towards some kind of "pure OO style" where encapsulation and polymorphism become goals in and of themselves.

Object-oriented techniques are a part of a balanced programming breakfast.

bridgetroll · on May 6, 2008

This style strikes me as similar to higher forms of database normalization. The efforts generally outweigh the benefits.

tobinharris · on May 7, 2008

I quite liked the guidelines in that blog post. I don't do exactly that myself, but as rules of thumb they ain't bad! But like a few folk have pointed out, it's about personal style, background and context.

One thing I like to do is use naming of Packages & Namespaces to tell a story about the system. So, you can look at the packages and understand what the important aspects to the system are.

Example:

- Spidering
  - Feeds
  - Pages
  - Control

- Indexing
  - Strategy
  - Scheduling
  - Lucene

Also, when you return to a code base years later, you can scan the namespace story to get a reminder of the key areas of the system.

mironathetin · on May 6, 2008

This is all a matter of personal taste.

With small methods you indeed make code hard to read and harder to debug. Plus you generate method-calling overhead (yes, I know, a smart compiler will inline - but if that's true, you can inline yourself). Combine atomic methods with a wealth of design patterns, and your code may become absolutely unmaintainable (all in the name of pure object orientation and better maintainability).

If-else chains cannot be avoided with small methods (one for if and one for else). You just move the if-else to the point where the decision for one or the other method has to be made. A smart replacement for if-else chains is switch.

But as I said, please no flamewar. I think the best you can do is to leave every programmer with his own style. That will make him most productive. All these stupid rules come from non-programming managers who want one style throughout their whole codebase, because they believe this makes every programmer easily replaceable. This is in no developer's interest (and it's not true anyway).

Pure OO style leads to some very dangerous constructs. In my experience, the true OO scholars often produce slow code that's a memory sink (ever created a new object in a loop?).

Like with patterns, it takes a lot of experience to know when to use them and when to avoid them. These truly simple rules are, in my opinion, made for beginners (and I tend to say also: by beginners).

What matters most is a clear style, comments, well-chosen variable names. Everything that makes your code easier to understand - and that mainly for you, the author, because if someone has to maintain your code, he very likely will rewrite it. And that can be smart, because most of the time it's the faster way.

michael_dorfman · on May 6, 2008

I agree with your comments generally, but I'm not sure I'm willing to say "it is all a matter of personal taste."

Would you be opposed to a guideline that no method should have a cyclomatic complexity above, say, 15?

mironathetin · on May 6, 2008

Personally I write my methods like I write text: one thought, one paragraph, one method.

But: if a method is always used as an entity, meaning you don't need to reuse parts of its code elsewhere, there is no reason to split it only because it exceeds an arbitrary number of lines. There are reasons not to split it in that case.

I know developers who write more than 15 lines of comments into their methods. That can be perfectly fine.

My goal is to fit the code of one method on my screen without having to scroll. If it's one thought, it should be comprehensible at one glance (that's again personal taste, because it depends on the size of my screen ;o).

If it has to be longer, it's not a problem of metrics. It's more a problem that the design is not clear enough and the things to do are too complex.

michael_dorfman · on May 6, 2008

I agree. I'm not suggesting splitting a method that exceeds an arbitrary number of lines. Instead, I'm asking about the cyclomatic complexity (http://en.wikipedia.org/wiki/Cyclomatic_complexity)
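For readers who haven't met the metric: roughly, it is one plus the number of decision points in a routine. Below is a minimal sketch of how that could be approximated for Python source. The name `approx_complexity` and the choice of which syntax nodes count as decision points are assumptions made for illustration; real tools differ in the details.

```python
import ast

# Node types treated as decision points in this sketch (an assumption;
# real tools make somewhat different choices).
_BRANCH_NODES = (ast.If, ast.For, ast.While, ast.IfExp, ast.ExceptHandler)


def approx_complexity(source: str) -> int:
    """Approximate cyclomatic complexity: 1 + number of decision points."""
    tree = ast.parse(source)
    decisions = 0
    for node in ast.walk(tree):
        if isinstance(node, _BRANCH_NODES):
            decisions += 1
        elif isinstance(node, ast.BoolOp):
            # "a and b and c" adds one extra path per boolean operator.
            decisions += len(node.values) - 1
    return 1 + decisions


# Hypothetical sample routine: if, elif, for, inner if, one "and".
SAMPLE = """
def classify(n):
    if n < 0:
        return "negative"
    elif n == 0:
        return "zero"
    for d in (2, 3):
        if n % d == 0 and n > d:
            return "composite"
    return "other"
"""

print(approx_complexity(SAMPLE))
```

The point of a number like this is that it tracks branching rather than line count, so a long but straight-line method scores low.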

My point being that while the original article was quite ridiculously reductionist, it should be possible to have some concrete metrics that serve as good guidelines (and which naturally could be superseded by programmer judgment on occasion.)

sanj · on May 6, 2008

Would you rather have a single date format that switches based on an argument?

Or create 20 little classes?

michael_dorfman · on May 6, 2008

The former, of course.

sanj · on May 6, 2008

Then you've broken your self-imposed limit of a cyclomatic complexity of 15.
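To make the trade-off concrete, here is a minimal sketch of the "single date format that switches based on an argument" option, cut down to four styles. The function name and the style names are invented for illustration. Each branch adds one decision point, so a version with 20 styles would pass a complexity limit of 15 on branching alone.

```python
from datetime import date


def format_date(d: date, style: str) -> str:
    """One formatter that switches on an argument (style names are hypothetical)."""
    if style == "iso":
        return d.strftime("%Y-%m-%d")
    elif style == "us":
        return d.strftime("%m/%d/%Y")
    elif style == "eu":
        return d.strftime("%d.%m.%Y")
    elif style == "compact":
        return d.strftime("%Y%m%d")
    # Anything else is a caller error in this sketch.
    raise ValueError(f"unknown style: {style!r}")


print(format_date(date(2008, 5, 6), "iso"))
```

The alternative being argued against would put each of these branches in its own small class.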

michael_dorfman · on May 7, 2008

Which is why I suggested a "guideline", instead of a hard-and-fast rule. I'm not trying to remove programmer judgment from the equation; I'm just suggesting that there are some guiding principles that can be offered (and can be said to be more than simply matters of personal preference.)

raganwald · on May 6, 2008

I am reminded of the constant fight to explain the difference between "if" and "iff."

We observe that really great OO code has small methods that do one thing well. IF great-oo THEN small-methods.

So is the answer to go out and refactor code so that each method only does one thing? This makes a different assumption: IF_AND_ONLY_IF great-oo THEN small-methods.
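The gap between the two readings can be checked mechanically. This is only an illustrative sketch; the helper names `implies` and `iff` are made up here. It enumerates every combination of (great OO, small methods) and keeps the ones the "if" reading allows but the "iff" reading forbids.

```python
from itertools import product


def implies(p: bool, q: bool) -> bool:
    """Material implication: IF p THEN q."""
    return (not p) or q


def iff(p: bool, q: bool) -> bool:
    """Biconditional: p IF AND ONLY IF q."""
    return p == q


# Cases (great_oo, small_methods) where "if" holds but "iff" does not.
counterexamples = [
    (great_oo, small_methods)
    for great_oo, small_methods in product([False, True], repeat=2)
    if implies(great_oo, small_methods) and not iff(great_oo, small_methods)
]

print(counterexamples)
```

The single surviving case is code that has small methods without being great OO, which is exactly what refactoring for method length alone can produce.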

pdubroy · on May 6, 2008

Do you mean that the Jeff Bay essay is a short method cargo cult? Or are you saying that I missed the point?

The original article contains several suggestions that address things other than method length. Numbers 3, 4, 8, and 9 enforce extreme encapsulation.

raganwald · on May 7, 2008

I mean that the original essay provides an interesting exercise that, IMO, would be helpful to try.

However, I do not suggest that the resulting software--as a whole--will necessarily be better after the exercise than before. I picked method length as one example, but I think it applies to all of the suggestions.

Just because we observe some property of good software, it does not follow that software deliberately written to exhibit this property will be good. That's what I was suggesting.

KiwiNige · on May 7, 2008

Programs with short methods are easy to understand. My program uses short methods. Therefore my program is easy to understand.

so it follows....

Cats have Eyes. I have Eyes. Therefore I am a Cat!

raganwald · on May 7, 2008

Hmmm. Actually, I disagree with your statement as written.

"Programs with short methods are easy to understand" expresses that the set of programs with short methods is a subset of the set of all programs that are easy to understand.

Therefore, every program with short methods is easy to understand, or (easy to understand) if (short methods), or (short methods) -> (easy to understand).

I was expressing that there exists a very large set of programs with short methods that are not easy to understand.

To make these statements less subject to procedural wrangling, given a particular working program, the set of all possible permutations and refactorings that produce the same output contains many permutations with short methods that are not easy to understand.

dhs · on May 7, 2008

I'm sure that object decomposition does not per se, automatically, lead to code obfuscation. If the design is decent, it should be, if not obvious, then at least possible without too much ado to determine where in the file tree methods shaved off larger objects should be placed so that somebody who knows how the file names are chosen, but doesn't know the code itself, can find them by looking at a list or a diagram.

I find a "limit" at indentation depth 1 a bit overly harsh, but I at least casually consider it whenever I reach depth 2, and at depth 3, I really try bailing out, forgoing "else", too. And sometimes I even do it at depth 1, and get a nice fuzzy warm feeling from that. Edit: This was underplaying it too much. "Callisthenics", cleverly used, really do help to organise the code if you have something conceptually larger, e.g. a DSL, to capture the knowledge as it is abstracted from the data.

Then again, I prefer using function composition over inheritance whenever I can, so my whole work method is geared towards managing many small files. Systems of names, essentially. I think a lot in terms of names and naming systems.

Transparency, AFAICT, needs to be built into the design, via an ongoing effort for the separation of concerns (systematic naming is crucial in that). Which has me trending towards smaller objects over the years. That's just my experience; YMMV. But I don't see how moving from, say on average, 50-LOC-objects to 100-LOC-objects now would make my codebase in any way less "obscured".