It’s Faster Because It’s C (atyp.us)
200 points by fogus on July 13, 2010 | 119 comments



No. C programs have fine-grained control over the memory layout of all their data, and thus are far better positioned to exploit caching in general and optimize for locality in particular. It's an attractive fallacy to suggest that most programs are I/O bound and thus are equally performant in Java and in C; while that statement does bode well for async code and poorly for threads, it's not as relevant for language comparisons.


But higher-level languages can write the machine code that "exploits caching in general and optimizes for locality in particular" for you. You teach the computer how to do that once, and then you get it for free from then on. You can program your problem domain instead of the solution domain.

Most people using C for high-performance computing are just using it to glue together the high-performance libraries, anyway.

Remember, most people are writing big applications, not tiny procedures. If you just want to multiply a few numbers together, sure, C is going to be fast. If you want to build a complicated application, then C's advantages are going to be almost unnoticeable and its disadvantages are severe.


I think this comment oversimplifies the situation at both a tactical and a strategic level.

Tactically, even code that simply glues libraries together is still maintaining state, and there's clearly a benefit to having bit-level control over how that state is laid out and looked up. Lack of bookkeeping and overhead, cache-cognizant data structures, and simple compactness of representation all add up. Not to mention the fact that the top of the profile for lots of programs is malloc(), and C programs can swap out allocators --- I've gotten 500% (five hundred percent) speedups from retrofitting arena allocators into other people's code.
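For scale, an arena is tiny. A minimal sketch (alignment policy and refill logic simplified, names mine):

    #include <stdlib.h>

    /* Bump ("arena") allocator in miniature: allocation is a pointer
       bump, and teardown is one free(a->base) for everything at once. */
    typedef struct {
        char  *base;   /* one big malloc()'d block */
        size_t used;   /* bytes handed out so far  */
        size_t cap;    /* total block size         */
    } arena;

    static void *arena_alloc(arena *a, size_t n) {
        n = (n + 15) & ~(size_t)15;             /* keep 16-byte alignment */
        if (a->used + n > a->cap) return NULL;  /* sketch: no refill path */
        void *p = a->base + a->used;
        a->used += n;
        return p;
    }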

Strategically, well, there's a term for the strategy you're advocating, and it's called "Sufficiently Smart Compiler", and it's a punchline for a reason.


In theory garbage collectors could do a good job maintaining locality of data structures initially allocated in a local fashion. In practice, they don't, but compacting GC may be better than fragmentation for the average non-cache-aware program. In the case of Java, you have the additional problem that everything on the heap is boxed (except primitive arrays and primitive members of objects), which definitely strains memory (both capacity and bandwidth).


But higher-level languages can write the machine code that "exploits caching in general and optimizes for locality in particular" for you

Are there any in existence that do this particularly well (i.e., at least as well as, or better than, a reasonably experienced C programmer might)?


Sure, lots. They use the same C libraries that the reasonably experienced C programmer would.

(Also, take a look at some of the ghc on llvm benchmarks, it's very competitive with C, and doesn't require you to jump through any hoops. I'd link you, but Google is blocked at work due to incompetent firewall rules. Sigh.)


You overstate the degree to which large applications written in C have policy discretion over all memory allocations, to take your specific example. Compacting GC, by bringing together memory allocated close together in time, arguably benefits caching more. The bigger problem is avoiding indirection, and that's a place where Java is weak in comparison to e.g. C#, as C# has value types. You can go further with some tricks in C, such as allocating string storage associated with a structure past the end of the structure's allocation itself, but here you're often just trying to regain what you lost by lack of GC.
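(For concreteness, that past-the-end trick is what C99 standardized as the flexible array member; a sketch, with names of my own invention:)

    #include <stdlib.h>
    #include <string.h>

    /* The name's bytes live in the same allocation as the struct:
       one malloc(), one free(), one contiguous cache region. */
    struct symbol {
        int  kind;
        char name[];               /* C99 flexible array member */
    };

    static struct symbol *symbol_new(int kind, const char *name) {
        size_t len = strlen(name) + 1;
        struct symbol *s = malloc(sizeof *s + len);
        if (!s) return NULL;
        s->kind = kind;
        memcpy(s->name, name, len);
        return s;
    }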

OO-itis is something that will definitely sap performance if left unchecked, but that's true whether you're using C or Smalltalk. Much like denormalizing relational databases, careful placement of data members can reduce cache misses, and avoiding long chains of dereferences for data commonly accessed together is important.


I don't understand your argument. In terms of effort vs. reward, the one-line change of swapping malloc() for a pool allocator is probably the most effective optimization available to any software in any language.

The idea that C# (or for that matter any GC'd language) is going to beat an allocator in which alloc is amortized to a single load, constant add, and store seems... I don't know, fill in the blank.
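The entire hot path is roughly this (a sketch; the bounds/refill check is elided and the names are mine):

    #include <stddef.h>

    typedef struct { char *next; } pool;

    static void *pool_alloc(pool *p, size_t n) {
        char *q = p->next;   /* load      */
        p->next = q + n;     /* add+store */
        return q;            /* no headers, no free lists, no locks */
    }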

You seem to think I'm advocating C over GC'd languages. I'm not. I write in Ruby by default, a language that is not only GC'd but badly GC'd. I just do so with my eyes wide open on performance.


A pool allocator is great, so long as the lifetime of the pool meshes well with the allocations out of it. But the larger your application, the less likely that's going to be true. Different libraries will do their own allocations, and it's often hard to pass the right pool through enough layers of abstraction to be sure that e.g. that string got allocated in the right pool for where it's going to get plugged in when it comes back.

I'm not saying you're advocating C over GC'd languages. I'm specifically disagreeing with the idea that in practice, in large applications written in C, the application author actually has discretion over allocation policy. I'm saying that you couldn't in practice use a pool allocator much of the time, even if you wanted to, unless your application is very self-contained.

The Delphi compiler I work on is written in C and uses pool allocators to great effect. There's a heap allocator which can be marked and shrunk, there's a pool for every unit, there's a pool for expressions, there's a pool for constants, etc. You've got to keep track of what got allocated where, make sure you don't pollute one pool with references to another, and sometimes do a bunch of copying to ensure that. Pooled allocation isn't a panacea, even for something as self-contained as a compiler, albeit a compiler which can be hosted by the IDE, so it needs to be long-lived, manage memory across debugging sessions, multiple compiles, etc.


This is like saying "there will always be some code you can't optimize, so this optimization doesn't matter". I know you know that isn't true.

Real library code either owns object lifetimes (and can use pools internally because the library's own _release() function is the only thing that can free its state), or keeps its hands completely off allocation. The few counterexamples, where for instance a library malloc()'s something and expects you to free it, tend to be notorious examples of error-prone and evil interfaces.
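In other words, the healthy pattern looks like this (a sketch with hypothetical names; the point is that _release() is the only exit):

    #include <stdlib.h>
    #include <string.h>

    /* Because widget_release() is the only way to free a widget, the
       library could move widget_new() onto a pool without breaking
       a single caller. */
    typedef struct widget { char *label; } widget;

    widget *widget_new(const char *label) {
        widget *w = malloc(sizeof *w);
        if (!w) return NULL;
        w->label = malloc(strlen(label) + 1);
        if (!w->label) { free(w); return NULL; }
        strcpy(w->label, label);
        return w;
    }

    void widget_release(widget *w) {
        if (!w) return;
        free(w->label);   /* internal policy, invisible to callers */
        free(w);
    }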

Meanwhile, just because everything isn't amenable to pool allocation (or, even better, arena allocation, where there is zero memory management overhead at all ever) doesn't mean you don't win huge on the places that are amenable.

You are raising the boogeyman of a hypothetical library that is going to take a pointer that I pool allocated and call free() on it, blowing up the program. I assert that any real example of such a library is going to be easy to shoot down as "an incredibly crappy library".


That's a complete straw man, and you know it. I'm saying that the specific optimization you mentioned is hard if you don't control all the pieces.

The primary advantage of a pooled allocator isn't in allocation - though that's nice - it's that you don't have the cost of iterating through each object to free it. But if you have external libraries, they'll abstract their allocations into handles (say), and now you have the problem of running what amount to destructors.


Nope, I don't think it is. I'm saying that when libraries track their own state and manage their own lifecycles, you're right, you can't make them use pool allocators (unless it's worth it to you to do surgery)... but that doesn't keep you from using pools on your own state.

I think people also overestimate the extent to which C programs depend on third-party libraries for their own statekeeping.

And again... what are we talking about here? I'm not advocating writing things in C. I'm saying, it's bogus to say that since code tends to be I/O bound, C isn't a performance win for most programs. That is simply a bogus argument. That the level of performance you can get out of C is usually not worth the investment is neither here nor there. Again: Ruby programmer here.


Again, I'm not saying that you're advocating writing things in C.

Where I think the performance advantages of writing things in C come from is rarely being able to take shortcuts by relying on provided libraries and primitives, which turn out to be not quite tweaked for the problem at hand. That is, C forces you to do so much yourself - largely because it has such poor abstraction tools - that you end up with a more specialized codebase. That specialization can include cache-oriented optimization, but I don't think it's the most important aspect, or so unique to C that you can't get 95% of it - to the point where it's no longer a meaningful advantage - in a GC'd language.


Do you think that is true even for modern allocators like Hoard?

In other words, I'm curious which underlying malloc() implementations pool allocation so outperforms.


Good luck trying to carefully align your C# arrays so that the length field (at index -1, and read on basically every access) isn't in the same cache line as the 0th element the CPU across the bus is writing to.
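In C, by contrast, you just ask for the layout you want. A sketch (assuming 64-byte lines; the struct is hypothetical):

    #define CACHE_LINE 64   /* assumption about the target CPU */

    /* Each counter gets its own cache line, so one core's writes
       never invalidate the line another core is reading.
       (GCC/Clang alignment extension.) */
    struct per_cpu_counter {
        long value;
        char pad[CACHE_LINE - sizeof(long)];
    } __attribute__((aligned(CACHE_LINE)));

    struct per_cpu_counter counters[4];   /* one per core */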


The nice thing about using C# is that you don't need to use an array, if this is critical. With unsafe code, you can all but write C.


True, but I think his post isn't a generalized comparison of programming languages, but rather a jab at "It's faster because it's C" claims. It's not often that our software can benefit from exploiting caching, especially in the "release early, release often" world of web startups.


Not sure you're following: C programs can be optimized to take advantage of caching (or to use a low-overhead allocator, or to use more or less compact representations) in ways high-level languages either retard or disallow altogether. And all software benefits from caching.

I think your real observation is that C level performance just isn't that relevant in scale-horizontal network-bound programs, and I agree with you.


A lot of average C programmers write high-performance linked lists in C when they could have used high-performance hash tables in Python/Ruby/Java with the same programming effort.

(And no program is high-performance if it has a pointer bug that makes it crash.)

"The difference between theory and practice is small in theory and large in practice..."


Who writes high-performance code using linked lists? Linked lists are awful for performance. Every --- every --- large C project I've been involved in has a good generic hash table and a vector-style resizeable array.


If that really is true, it sounds like you don't have much experience in C, to be frank. C apps abound in fixed-size statically allocated arrays and linked lists with next pointers (often multiple pointers, if the objects are part of different lists) embedded in the objects themselves. The hash tables often aren't broken out until the poor performance shows up as a bottleneck.


My resume isn't too hard to find. There's even some C code out there with my name on it if you look hard enough.

Performant C code doesn't use linked lists. Linked lists shred caches and allocators. Your go-to data structure for storing resizeable lists is... wait for it... the resizeable array. I call mine "vector.h".

(That performant code often doesn't want a hash table is also why I backported STLport's red-black tree to C specialized on void* in my second-to-last major C project, but I'm just saying that because I'm proud of that hack.)
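The core of such a thing is tiny. A sketch of the idea (not my actual vector.h):

    #include <stdlib.h>
    #include <string.h>

    /* Growable array: contiguous storage, doubling growth, amortized
       O(1) append, and scans that stay inside the cache. */
    typedef struct {
        void  *data;
        size_t len, cap, elem;   /* count, capacity, element size */
    } vector;

    static int vector_push(vector *v, const void *elt) {
        if (v->len == v->cap) {
            size_t ncap = v->cap ? v->cap * 2 : 8;
            void *nd = realloc(v->data, ncap * v->elem);
            if (!nd) return -1;
            v->data = nd;
            v->cap  = ncap;
        }
        memcpy((char *)v->data + v->len * v->elem, elt, v->elem);
        v->len++;
        return 0;
    }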


The kinds of lists that barrkel is talking about are what I call hooked-in lists, as opposed to the container-style lists seen in the STL. That is, objects that already exist are linked together through fields inside the objects themselves. Since you're probably already operating on the object, it's not quite as bad for cache performance as the container-style lists. Further, since the objects already exist, the memory allocation overhead is minimal. All objects of that type need an extra pointer, but there's no allocation required to add something to the list.

This is how the Linux kernel maintains just about everything. It's also how I've implemented lists inside of an allocator: http://github.com/scotts/streamflow/blob/master/streamflow.h...
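The shape of it, in miniature (names are mine; the kernel's list.h is the canonical version):

    /* The links live inside the object, so putting an existing object
       on a list allocates nothing. */
    struct list_node {
        struct list_node *next, *prev;
    };

    struct task {
        int              id;
        struct list_node run_link;   /* this object on the run list      */
        struct list_node all_link;   /* the same object on a second list */
    };

    static void list_init(struct list_node *head) {
        head->next = head->prev = head;   /* empty circular list */
    }

    static void list_add(struct list_node *head, struct list_node *n) {
        n->next = head->next;
        n->prev = head;
        head->next->prev = n;
        head->next = n;
    }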


I get that void* lists are even worse, but randomly malloc'ing a bunch of struct-foo's and linking them with pointers is worse than keeping a pool of memory chunks of struct-foo size and maintaining a length counter, since hopping from foo to foo isn't likely to hop from page to page.


With hooked-in lists, you rarely need to navigate through the whole list. You usually just need to get the next one. When I've implemented hooked-in lists, the backing for them was a fixed-size pool of memory chunks, but that was one level of abstraction lower. The Linux kernel does something similar.

(Felt I should clarify that it's nothing about hooked-in lists that prevents you from navigating the whole list. It's just the algorithms you end up using to solve the problems that pop up at that level of systems programming.)


Err... what does the existence of abundant shitty C apps have to do with how performant C programs are written?


Even I agree with this. You can't just do whatever you want in C and still get good performance, you have to have a clue, just like with higher-level languages.

I think the original article should add, "performance tends to depend on the programmer, not the language".


Yes, and these containers are built into C++, so no need to roll your own.


The side discussion we're having now is also why <list> and <slist> are a pain to use, and why Stroustrup's examples tend to use <vector>. Storing things in contiguous memory and incurring occasional copies is usually a better plan than forcing every read to follow a chain of pointers.


These kinds of varying opinions that seem like wide schisms, or maybe are just ultimately different problem areas, often come up when I'm reading about C, and that makes me think "I'll learn it more when people figure out how to use it" ... is that fair?


> "i'll learn it more when people figure out how to use it" ... is that fair?

No, because you could be learning that today and get a head start on your own future that way. It's not that hard to understand and once you've seen how the clock ticks from the inside it becomes a lot easier to build other clocks.

Lists vs resize-able arrays is a very old debate; both have their places, and which one is the most appropriate depends on the problem you're trying to solve. If you code your stuff in a not-too-stupid way (say, using a macro for your accesses), it's trivial to replace one approach with the other in case you're not sure which one will perform better.

Lists work well when they're short and you use an array of pointers to the beginnings of your lists and maintain a 'tail' pointer for fast insertion in case inserts are frequent. If you use resize-able arrays you probably want to allocate a bunch of entries in one go, roughly doubling the size of your array every time you need to increase the size. That seems to waste a lot of memory, but since it is heap memory (allocated using brk) it (usually) won't be mapped to real memory until you use it, so the overhead is smaller than it seems.

I hope that's correct and clear :)


Only if you're comfortable never learning it.


If you need to scan and remove elements where appropriate? They're optimal for that.


Yeah, I was wondering something. Let's say we are bound by I/O. Wouldn't we reap more benefits by using lower-level languages in the event that I/O bottlenecks improve? Like faster drives or new types of drives. The same applies to memory.

Shouldn't the paradigm be "the fewest bottlenecks possible"?

The points mentioned in other comments are also valid (battery, VM behaviours).

However, his final point, or at least the crux of it, still stands: "Focus on stability and features first, scalability and manageability second, per-unit performance last of all, because if you don’t take care of the first two nobody will care about the third."


Moore's law says that CPUs improve faster than I/O. Therefore any program that is I/O bound today is likely to be I/O bound on every future generation of hardware. And programs that are not I/O bound today may become so down the road.

Incidentally, on the final point, there was an interesting test that was run many years ago. Multiple teams were given the same spec, but each was told to optimize for a different characteristic (speed of development, speed of execution, maintainability, memory use, etc.). Most of the teams managed to come in #1 on what they were trying to optimize for. The team that optimized for maintainability came in #2 on most other characteristics. The team that optimized for speed of execution came in last on most other characteristics.

The lesson from this is that a consistent focus on maintainable code results in more of everything else you need. Yes, there really are times that you're writing throw-away code and can forget all that. But by default, code well and let the other details take care of themselves.


> there was an interesting test that was run many years ago

Source? I'd really like to read about it.


I read it in Code Complete. I don't have a copy handy to track down the page though.


Let's say we are bound by I/O. Wouldn't we reap more benefits by using lower-level languages in the event that I/O bottlenecks improve?

There are an unlimited number of potential future requirements that a program might have to meet. You should only code for them if you have a concrete basis for believing they will become real at some point.

If there's a serious chance your sedan will do stock car racing, then you outfit it accordingly. Otherwise, that super-muffler's just an unnecessary expense and something more to break.

Edit: You'll notice that in an average car, every part has about the same quality, power, and durability. In a sense, engineering is actually about achieving the least cost and the largest number of bottlenecks, since any unneeded quality is wasted time and money.


That's not a great analogy. Using cheaper components to cut costs requires more engineering effort; this is a form of optimization. For a prototype, you might just use more expensive components if it's convenient, since the per-unit cost doesn't matter much.

When writing software, there's no direct analogy because usually the components we build our programs out of have hardly any per-unit cost. We don't save any money by using a crappy third-party library over a high-quality one. Using a library with lots of unnecessary features may be cheaper both for prototyping and for the final product.


When I was younger I would have agreed wholeheartedly with this article, because he seems more knowledgeable than me. After a few years' experience I would have disagreed with him. Now I'm experienced enough to realize I have no idea if he's right or wrong, but he seems to make reasonable points.

I've only worked in I/O-bound, memory-bound, and CPU-bound code before (but never all at the same time). My hat's off to anyone or any group that has to work in those situations combined. Guess that's why I'm not a kernel-level developer.


My experience says that he's exactly right.

I've had to sort out a lot of performance problems. Almost always they were algorithm problems, architecture problems, or some simple bottleneck. Only once have I encountered a performance problem which was best solved by writing in a lower level language. More than that, my experience says that people who brag about how they've designed for scalability have generally made stupid design mistakes that cost them huge amounts of performance.

In fact here is a performance tip. If you want performance, make sure you have verbose logging options written in. Because when you hit performance problems, it is incredibly valuable to flip logging on, take the logs, study them, and identify your performance problems that way. Try it. For most applications that will matter a lot more than what language you write it in.
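Even something crude pays for itself. A sketch (the env var name is arbitrary):

    #include <stdio.h>
    #include <stdlib.h>

    /* Verbose logging that costs one predictable branch when off,
       so it can ship in production and be flipped on per-process. */
    static int log_verbose;

    #define LOGV(...) \
        do { if (log_verbose) fprintf(stderr, __VA_ARGS__); } while (0)

    int main(void) {
        log_verbose = getenv("APP_VERBOSE") != NULL;   /* flip at launch */
        LOGV("starting up\n");
        return 0;
    }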


In all of the performance sensitive systems I've worked on, "flipping logging on" in production is simply not an option for, well, performance reasons (i.e. the reason that it wasn't turned on in the first place). If you're then going to test in a non-production environment, why not just run it through a profiler rather than trying to divine performance problems from log statements?


Depending on your architecture, logging can be turned on selectively. Just log one process/thread/database handle/whatever. Sure that one will be slowed, but the rest of the system still runs at full speed so you're OK. This can be an invaluable tool.

For instance a tool I incorporated into one system would see a parameter to a web request and would issue a command to Oracle telling it to log everything that happened on the database for that connection, and then would turn it off afterwards. So, for instance, we could take a slow web page, add a parameter, and a minute later be studying a log generated by Oracle telling us exactly what that web request did, and where it spent its time.

Having the ability to selectively do this on the live production system against live data with a problematic request while it was being problematic was huge. We were tracking down problems that only showed up in production, under production load, so no amount of profiling in development would have helped. Using the same idea, every day we would just take one random database handle, turn logging on for half an hour, and use it as a canary to look for potential problems. We found a lot of things that way.

Addendum (added later) It is also worth noting that in many horizontally scaled systems you can trivially have a fair amount of logging, even in production, if you're willing to accept a constant factor overhead in inefficiency. This can be utterly invaluable in tracking down latency, bottlenecks, and other larger scalability problems. Every large system that I've seen that was well-run did this to some extent.


Profiling would be far more informative than even the most spammily verbose of loggers, too.


A data point: the first version of memcached was written in perl. It was pretty good perl, with an async epoll-based event loop and everything. It was hopeless, orders of magnitude away from the desired performance.


I'm not saying that there are not good examples where performance matters. In fact I encountered one where I had to move something from stored procedures in a database into C++. Furthermore I can easily believe that memcached is one of them.

But that doesn't change the fact that most of the places that I have seen worry about it have not been among them, even if they thought they were. (I admittedly mostly work in the web world.)


I felt that article had a point, but overstated it.

There are lots of domains -- simulation, computer vision, robotic control, machine learning -- which are CPU bound and will remain so for the next decade. In fact, one of the cutting edges of these domains is always CPU bound, practically by definition (sensor densities also increase quickly with time).


I sincerely doubt that machine learning is CPU bound. Many things related to numerical computing are memory bound.


I wonder if there are any startups out there who hire folks with a "kernel developer" mentality/experience. I love the dynamism and excitement of startup life, but unfortunately it often comes with Ruby/JavaScript, which is fine, but I prefer lower-level hacking: hardware interrupts, malloc-free environments, etc. I do believe startups who need such skills exist; they just seem to be quieter for some reason. :-(


Come to Austin, Texas! When I lived there I met quite a few folks who were hacking on some cool hardware in their garages: embedded video analytics, signal processing, etc. Maybe it's the "extreme proximity" to the giant campuses of IBM/Motorola/Dell that favors a certain flavor of startup ecosystem, not sure...

But things may have changed since then.


Maybe not startups, but I've heard of and seen postings at many hedge funds and their ilk for people who can do low-latency work. They're specifically looking for kernel-level developers.


ksplice.com are hardcore low-level hackers.



mouseover the image


Evolution Robotics (which may or may not qualify as a startup depending on your perspective)


Try VMware and storage kinds of startups. Virtualization is huge.


> When I was younger I would have agreed wholeheartedly with this article, because he seems more knowledgeable than me. After a few years' experience I would have disagreed with him. Now I'm experienced enough to realize I have no idea if he's right or wrong, but he seems to make reasonable points.

Slightly offtopic, but one of the things I love about software is that it's trained me to recognize patterns in everything. Abstract thinking is really useful.

And I've recognized the same tendency you mention in myself, and across a few different topics.

It makes me wonder when I wholeheartedly agree or disagree with something... maybe I'm just still on that journey.


I think along these lines every time someone mentions the Dunning-Kruger effect, which is frequent these days on Hacker News. It reminds me of a funny aphorism I once heard about higher education:

First you get your Bachelor's Degree, and you think you know everything. Then you get your Master's Degree, and you realize you don't know anything. Finally, you get your Doctorate, and you realize that nobody knows anything.

Finding those patterns between disciplines is always a delight--there is a surprising amount of crossover between many fields of study and human behavior. Programming was where I first found that humility is very positively correlated with competence, and the same principle shows up in a lot of other places.


Reading the article, he discusses some places where language choice doesn't matter and some where it does (the unpredictable latency you get with a garbage collector, even if average speed is the same). So it's not refreshingly undogmatic.

One thing the article doesn't mention is that Java once had a slow interpreter and now has a potentially much faster JIT. When Java had a slow interpreter, it was inherently slow for a larger spectrum of problems - but still not all of them.


The Java problem is not that the JVM itself is slow, but that Java is so inconvenient to use that people pile on stacks of libraries to accomplish even the simplest things (think dependency injection), and the JVM becomes bloated loading and calling all those libraries, and in effect is slow.

I've seen the most ridiculous stack traces only in Java.


Slow? That comment is so '90s. Plz upgrade: http://shootout.alioth.debian.org


Yeah, microbenchmarks are cool to stare at when your farm of JBoss instances for another very important project is crawling. Very motivating.

Wake up, this is reality.


They are coming in Perl too ... with Moose leading the way.


Life is full of trade-offs, and I'm happy to compromise if it gets me a powerful self-hosting OO & meta-circular MOP.


There are a couple of mailing list posts discussing the relative speed of JGit and Git which make for excellent reading:

http://article.gmane.org/gmane.comp.version-control.git/1180...

http://article.gmane.org/gmane.comp.version-control.git/1180...

That said, there are probably very few applications that have been this heavily hand-optimised, and probably equally few where you actually need it. Where C really stomps Java is around very low level memory management, I think. With modern processors, code can benefit greatly from colocation of related data that can be very difficult to achieve in an idiomatic way with Java.

Edit: link layout


I find those posts particularly interesting because they are not the typical criticisms levelled at high level languages; for the most part those posts read as a detailed list of design mistakes in Java: no unsigned types, no 'struct'-style types, reliance on boxing for generic containers, no way to 'reinterpret' blocks of memory C-style, etc. When people complain about using java to write software you often hear these individual design decisions come up.

It's also interesting to note that there are usable HLLs that suffer from few of the problems noted in those posts: both D and C# (though the latter required a second version to get some of it right) provide a garbage-collected, object-oriented environment like Java, but also provide a lot of the primitives needed for the kind of optimization discussed: pointers, structures, unboxed types, reinterpretation, and unsigned values.

I think this suggests that the problem is less 'high-level' languages and more 'immature' ones: C is a very mature language, descended from other mature languages, while Java was one of the first mainstream languages to make many of its decisions, and as a result even now some of the larger mistakes have yet to be corrected (lack of function types and checked exceptions being two examples). Younger languages like D get to benefit from those lessons in the same way that C/C++ benefited from the mistakes of their predecessors.


I agree that the criticisms are not typical, but I don't agree that they're necessarily design flaws - just things that make Java less suitable for certain tasks that require fine memory management or low-level bit twiddling. I write Java in some fairly high-performance scenarios and this is rarely a real problem (except maybe memory layout - having every object reference be a pointer does kill us occasionally, but again in specific circumstances). For the vast majority of applications it's a non-issue.


I find the lack of struct types, and the lack of unsigned integers especially frustrating.


Let me see. Here are some of the performance-related cases I've encountered.

1. A critical process slowed down drastically on certain days at certain times. Narrowed down to Oracle. Turned out another group went behind our back and ran expensive reports on our database server. Solution: politics; spent half a year kicking them out.

2. Some distributed processing slowed down steadily over time. Narrowed down to bandwidth throttling on the cross-data-center fiber optic link. Solution: scheduled emergency migration of processes to the same data center.

3. Site-wide page serving time slowed down. Narrowed down to regex and XML parsing on pages; yes, this was CPU bound. Solution: faster libraries, pre-computation, caching results.

4. Lucene indexing took longer as data volume grew. Narrowed down to a database bottleneck. Solution: revamped the indexing architecture to use DFS and Hadoop.

5. Linux process spawning drastically slowed down on 64-bit machines. Narrowed down to OS page table copy-on-write overhead. Solution: worked around the spawning requirement.

6. File system driver slowed down with more cache. Narrowed down to an inefficient sorting algorithm. Solution: replaced bubble sort with heap sort.

In all these cases, language was never the issue.


1. A critical process slowed down drastically on certain days at certain times. Narrowed down to Oracle. Turned out another group went behind our back and ran expensive reports on our database server. Solution: politics; spent half a year kicking them out.

Have also experienced this firsthand. Blame is automatically pinned on us, and it's never our fault.


Oh yeah, it's pure politics. People would demand benchmarks and measurements to prove it's their reports causing the problem, all while pointing fingers your way.


Even when I/O bound, you might want to spend less of that precious battery when you're not waiting. So even if your perceived speed doesn't change, the battery can tell the difference.


This is a great point. Power usage is growing in importance. I think power management APIs will continue to evolve. In the not-too-distant future, power will be another axis of optimization.


At the hardware level, it already is. And it seems that in the mobile space we are seeing it on the software side, too.


"I’d even argue that the main reason kernel code tends to be efficient is not because it’s written in C but because it’s written with parallelism and reentrancy in mind, by people who understand those issues."

By this argument, isn't it reasonable to assume that a project Foo written in C or C++ is faster than an equivalent written in Java, simply because the author writing project Foo in C/C++ likely understands performance by choosing C/C++ in the first place? (I am not saying anything about the performance of any particular language implementation.)

The author also argues from a performance-critical-application perspective. What about desktop applications, where perceived performance acts more like a quality attribute? I know many people who shy away from desktop Java and even .NET applications simply because they feel sluggish and waste memory. I don't care whether the Java application is just as fast in pure algorithmic performance.

If I can choose between two equivalent C/C++ or Java/.NET applications, I will choose the C/C++ application. I still think this is a good assumption.


"With this arguing, isn't it reasonable to assume that a project Foo written in C or C++ is faster than an equivalent written in Java simply because the author writing project Foo in C/C++ likely understands performance by choosing C/C++ in the first place?"

No, not at all. First of all, don't assume that someone knows what they're doing just because they chose C or C++ over Java. There are plenty of dumb C/C++ programmers out there, and a well-written Java program is always going to outperform a poorly written C/C++ one.

Secondly, remember that Java programs may actually be faster than C/C++ programs. Programs written in C/C++ require more time and knowledge to performance tune. Writing something in Java (or other high-level language) allows the author to spend more time focusing on the big picture issues rather than having to deal with a lot of lower-level issues.


"Secondly, remember that Java programs may actually be faster than C/C++ programs. Programs written in C/C++ require more time and knowledge to performance tune. Writing something in Java (or other high-level language) allows the author to spend more time focusing on the big picture issues rather than having to deal with a lot of lower-level issues."

I'm not against Java and I'll even admit that theoretically I could imagine a situation where a Java program ended up being faster, but in reality, that never happens.

In reality, we always end up in situations like uTorrent vs. Azureus (for those that don't know, uTorrent is written in C++ and is pretty much better than Azureus in every way). In fact, I can't really think of one instance where a piece of Java software is better than an equivalent written in C or C++ (outside of developer tools, because those aren't really directly comparable anyway).


> for those that don't know, uTorrent is written in C++ and is pretty much better than Azureus in every way

The Azureus/Vuze DHT is a lot nicer than the Mainline DHT (which uTorrent supports), it's just not documented, there are no other implementations, and this statement probably does not apply to code quality.


Secondly, remember that Java programs may actually be faster than C/C++ programs. Programs written in C/C++ require more time and knowledge to performance tune. Writing something in Java (or other high-level language) allows the author to spend more time focusing on the big picture issues rather than having to deal with a lot of lower-level issues.

No. C++ may permit more extensive performance tuning, but the same level of tuning shouldn't take any longer in C++ than in Java. And really I'd say C++ as a language is at least as high-level as Java (especially considering templates), just more of the libraries you'll want to build on are shipped separately.


Except Java has a GC. In fact I'd say bringing GC into the mainstream may have been Java's major contribution to the art of software development.


"I’d even argue that the main reason kernel code tends to be efficient is not because it’s written in C but because it’s written with parallelism and reentrancy in mind, by people who understand those issues."

It's used to build kernels because it doesn't require multiples of the needed RAM in order for the memory management to be timely.


isn't it reasonable to assume that a project Foo written in C or C++ is faster than an equivalent written in Java, simply because the author writing project Foo in C/C++ likely understands performance by choosing C/C++ in the first place?

I think that's kind of a stretch.


"With this arguing, isn't it reasonable to assume that a project Foo written in C or C++ is faster than an equivalent written in Java simply because the author writing project Foo in C/C++ likely understands performance by choosing C/C++ in the first place? (I am not saying anything about the performance of a certain language implementation)"

The point is that someone who is able to choose C/C++ is probably much smarter than many existing Java programmers.

Sad but true.


Agreed.

Plus, a C/C++ programmer can refactor and optimize their code even at a low level, limited only by their time and skill, whereas in high-level languages like Java/.NET the level of optimization is capped by the VM.


This is the standard performance-tuning discussion, in a different guise.

Until you explain what factor(s) you're optimizing for, "It's faster because it's written in (whatever)" is a canard.

You can take that discussion in most any direction.

Budget? Even free coders and open source have their costs.

Raw speed? Custom hardware? Hand-tweaked assembler? FPGA?

Speed, but without the budget for bumming instructions? Architecture- or machine-dependent C code?

Staffing? Enterprise plug-compatible Java.

Maintainability? Not everybody can hack source code in BLISS or some other obscure or domain-specific language.

I/O? Does removing the rotating rust from the design help?

Memory footprint or ROM space, the available languages, the stinky compiler that's available on (expurgated), or whatever other factors are key to your goals...

To paraphrase that ancient Microsoft slogan, what are you optimizing for today?


Perhaps they are optimizing for getting coders who would rather code C than Java?


C(++) has a lower memory footprint than Java/C#, which is also quite important.


True; Java makes it very easy to become memory-bound needlessly. No JIT in the world can save you if your primary data structure is a TreeMap<Long,Integer> with a billion entries.
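(In C the same mapping wants to be a flat array of pairs; a sketch of the shape, names mine:)

    #include <stdint.h>

    /* Twelve bytes of payload per entry, no boxed keys, no per-node
       pointers; keep it sorted and bsearch() it, or hash on key. */
    struct entry {
        int64_t key;
        int32_t val;
    };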


C++ also makes it perfectly possible to build this kind of mess. I know because I've worked on the exact same situation you describe, except in that case the TreeMaps of pointers were hand-rolled, "tested in production", and yet another problem to maintain.(^)

Java might make it easier, and granted the extra object allocations around Longs and Integers will make it scale poorly more quickly, but bad (or compromised) design and poor use of data structures are always going to lead to problems of some kind or another.

(^) Yes, I know about STL et al. Original programmer clearly did not.


Java leads the programmer into using bloated libraries, which absolutely litter the Java landscape. It's very hard to measure or even predict what effect a Java interface will have on your solution.

I agree with the author: the language has no intrinsic slowness. It's the tendency to use a triply-nested abstraction for every trivial purpose (a hash table of objects containing references to a database API...) instead of, hey, a pointer, that leads the app programmer down the primrose path.


Compared to C that's true, but bloated libraries aren't exactly absent in the C++ world (one of many reasons that "C/C++" is usually a weird generalization). Nested templates of templates are all over the place, and as for pointers, it's common to wrap those too using one of the various smart-pointer classes.

C often leads to bad algorithms, though, for the same reason it often leads to lean code tuned to the specific application at hand. Absent many general library functions, the C world is littered with lots of custom reimplementations of data structures and algorithms, not all of which are the best (and a lot of which are actually buggy). Even when they're good, they tend to have short shelf-lives: much hand-optimized 90s-era C code is now slower than more naive implementations, because the optimizations used to save some instructions often actively harm cache performance.


Good analysis. I would amend: templates usually improve code efficiency, because the compiler can see through abstractions and generate (larger but) much faster code.


I think that's often true (one common example is STL sort versus the C stdlib's qsort(), which is often a big win because of inlining a datatype-specific comparison operator), but I think there are quite a few cases where the object code bloat you get from multiplying the code by the number of types it's instantiated for (vs. using a polymorphic/generic function) kills your cache more than enough to make up for any optimization win.


Sure; but it certainly depends on the frequency of use of each instance, etc. In general, though, shorter code is a good optimization choice.


"much hand-optimized 90s-era C code is now slower than more naive implementations, because the optimizations used to save some instructions often actively harm cache performance."

On the same hardware? That seems unlikely - do you have a specific example in mind?


He means that the hardware has changed. Micro-optimization these days mostly has to do with optimizing cache and memory I/O performance rather than reducing clock cycles as it was in the 90s.


Yes. On the same hardware. And I'm sure that a specific example was thought of.

To give one of several likely causes, CPU pipelines have grown much longer. As a result it is more important to avoid stalls these days. Naive code compiled with a modern compiler benefits because the compiler knows about the importance of this. For instance, the compiler knows it can avoid a stall in certain cases by making sure that a read from memory that happens soon after a write obeys the store-to-load forwarding restrictions. Doing that can mean extra code, which would have been slower on an old computer but is faster on a modern one.


Well, if we're talking about different hardware, this whole discussion is moot - CPU (not C-entral anymore..) architecture changes render "old" optimization techniques only situational today.

At any rate, the same is true for all languages, and _delirium's point is spot on: it's not the language that matters, it's the fact that bad (or slow, or inefficient, call it what you will) code is encountered regardless. It's time we stopped language wars, don't you think?


The discussion is only moot to the extent that the complaint is inaccurate. It is true that code, once optimized, is frequently hard to unoptimize. It is further true that what you optimize for at one point does not match what you optimize for at another. It is also true that there is a lot of C that is now optimized for the wrong thing. And finally it is true that people who write C because they are trying to squeeze performance are more generally prone to create more of it.

As long as those facts remain true, it is fair to complain about this tendency in C code in the wild. Even though the problem clearly lies with some of the programmers the language attracts rather than with the language.


And finally it is true that people who write C because they are trying to squeeze performance are more generally prone to create more of it.

You can't imply "fact" and use "generally" in the same sentence, sorry.


Agreed. It's like my yellow framing hammer: useful for one job, but I have other hammers. (said this before)


Faster is not always what we are after though. I recently wrote a piece of code to run in tightly constrained (but not embedded) environments, and C was the natural choice. The Python or PHP or JavaScript interpreter and VM just wouldn't fit in the RAM.


Sometimes you don't want to suck up all of the RAM on a non-embedded machine, either. Also faster startup times and no collector pauses. A lot of times those are required.


Collector pauses bring up some discussions I've had with friends - in essence, a pause is not something you want your platform to take when you're programming something like a game that's pushing 60 frames a second plus audio.

If you don't know what to look for you could end up looking for a long time.


Yeah, Strings in Java take up much more memory than one would expect.

Minimum String memory usage (bytes) = 8 * (int) ((((no chars) * 2) + 45) / 8)

http://www.javamex.com/tutorials/memory/string_memory_usage....
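Worked through with that formula: a 10-character String comes to 8 * (int)((10*2 + 45) / 8) = 64 bytes, i.e., more than three times the 20 bytes of actual character data.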

Objects in general also have significant overhead.


" I’d even argue that the main reason kernel code tends to be efficient is not because it’s written in C but because it’s written with parallelism and reentrancy in mind, by people who understand those issues. A lot of code is faster not because it’s written in C but for the same reasons that it’s written in C. "

Brilliant.

I'd say that C programs are generally faster because the C world has a near-zero amount of mediocre and copy/paste programmers, so coders just know what they're doing.


the C world has a near-zero amount of mediocre and copy/paste programmers, so coders just know what they're doing

[Citation needed]


Citation is overrated. Today anyone can publish anything and be cited by another anyone.

Live in the industry for 15 years or so and observe for yourself. Carefully.


What I meant was that you made a very bold claim without any justification. Are programmers who code in C categorically better than programmers who don't? I doubt it.


Yes I know my claim, and I showed you a way to grasp it.

"Categorically": that's what you said. Just observe, instead of trying to apriori-tize the world.


I think he missed the point on why kernel code is fast. Kernel code is fast not because it's written in C; it's fast because it doesn't do much. Most system calls into the kernel do very little; they just update some data structure and return. An OS kernel is complicated because of its breadth, dependencies, and side effects. The call path of each call is actually fairly shallow.


It takes more CPU cycles to run a Java program than a C program, given that both are well written and optimized.

However, writing it in C will probably take more time. The question to ask yourself: does it matter whether my program takes a few more cycles to finish?

Most of the time it IS faster in C, but the difference is insignificant (e.g. C takes 0.0001s, Java/Python takes 0.0002s. Who cares at this point? Very few).


A better title (IMHO) to the article would be "Compare results, not approaches.", as stated in the last sentence.


Great writeup!

There is one exception, though: startup speed. It comes from just the fact that you're written in C, the same language the OS is written in, which means that the majority of dynamic libraries you depend on are already loaded. That's what makes piping simple programs like "wc" together possible.


Excellent point. You could toss in assembly as yet another choice for the very highly constrained environment (very cheap microcontrollers - although many now have abstractions so you don't have to code in assembly directly, it is sometimes necessary).


It is funny how people who are bound to an artificial sandbox try to view themselves as equal to (or even better than) those who created the sandbox. ^_^

The JVM is a mere C++ program. Period.


C is a language, not an implementation. You can JIT C just like any other programming language.


Irrelevant, we all know what he's talking about, and whether it's gcc, llvm or icc, his point stands.


I don't see the distinction. C is a language for describing what operations the program will perform. The compiler (gcc) / runtime (llvm) then turns that description of the solution to the problem into something the computer can actually execute. Sometimes it uses JIT compilation... other times, perhaps not.

If you "use C for control", then you must have written your own compiler.



