Web programs written in C++ are no big deal

tptacek · on Jan 10, 2012

Writing C++ code that works reliably in a benign setting is not a big deal. With the right libraries, it is almost as easy and perhaps only a little slower than writing in a high-level language; you can't, for instance, really believe that everyone who writes a popular iOS application is a solid bare-metal C programmer.

But writing reliable C++ code that works under adversarial conditions is very hard; hard enough that the best C programmers in the world have track records like "only one theoretically-exploitable memory corruption vulnerability in the release history", and quite a few people who set out with "no vulnerabilities" in their project charter end up with way worse records.

I've found over the last couple years that one way to get C/C++ code fit into a web application is via "nosql" databases, particularly Redis; let something like Ruby/Rack or Python/WSGI or Java Servlets soak up the hostile HTTP traffic, and use it to drive an async API over Redis.

The less C/C++/ObjC code you have talking to an adversary, the better off you are.

I'm a C/C++ programmer; C is my first language, from when I was a teenager. I respect C programmers. But if I was interviewing a developer and they told me their gut was to write a C program to do something a Rails program would normally do, and it wasn't an exceptionally specific circumstance, I would think less of them.

bri3d · on Jan 10, 2012

In this worldview, wouldn't the C programmers writing your language runtime have the same poor track record when it comes to security? And wouldn't the runtime itself be a substantially higher-value target for attackers?

I, too, would look at someone strangely if they told me they were going to write a C application where I'd use a Rails one, but security certainly wouldn't be the first reason on my mind.

As a postscript, I really like the idea of putting C/C++ apps behind a message bus, as decoupled from the web end as possible. I've had great luck using C++ for performance-critical services behind a Rails frontend talking to Redis (I've also used AMQP via RabbitMQ, but I found that to have a high enterprise brokered pain to value ratio).

tptacek · on Jan 10, 2012

They do have a poor track record with the language runtime.

You should be concerned about the quality of your language runtime.

MRI, for instance, has had many memory corruption flaws that were plausibly exposed to hostile input. When security is a priority, I advise using JRuby (it helps that JRuby is better than MRI anyways).

But either way: language runtimes for mainstream languages are high-priority targets. Your C code is not. You will not learn the worst problems in your C code until long after it's been deployed.

icebraining · on Jan 10, 2012

Linus' Law. The language runtime is shared between thousands or millions of users and has many more contributors than your single project, hence any big security bugs it might have had are probably fixed by now, or at least will way faster than you could fix yours.

And wouldn't the runtime itself be a substantially higher-value target for attackers?

That depends, but relying on security through obscurity isn't usually a very good choice.

16s · on Jan 10, 2012

Parrot alert!

relying on security through obscurity isn't usually a very good choice.

Real-world alert!

Camouflage paint works for tanks.

icebraining · on Jan 11, 2012

In the "real-world", camouflage paint isn't used instead of heavy armor, which is what is being proposed (using a much less tested piece of code instead of a well known runtime).

Sure, if you can afford to throw the same number of man-years (of both developers and white hackers) at your proprietary codebase as are thrown at the runtime of a popular language, then great, you can have the cake and eat it too, just like the tank builders.

Since most people can't afford that, they have to choose between camouflage paint and an armor. I don't know about you, but I'd rather be in the bullet proof tank than on the one built with balsa wood, regardless of its paint.

jerf · on Jan 11, 2012

Relying.

By definition, if we're talking about a tank, that's merely one layer of many. Obscurity can be a fine one layer of many. It had better not be the layer such that you are relying on, though.

numeromancer · on Jan 11, 2012

So do very thick firewalls.

nl · on Jan 10, 2012

In this worldview, wouldn't the C programmers writing your language runtime have the same poor track record when it comes to security?

This is true. But I think it is reasonable to expect a good C/C++ programmer who already understands web security to have the mental model to write secure code in (say) Ruby.

And wouldn't the runtime itself be a substantially higher-value target for attackers?

Yes - popular runtimes are some of the most heavily attacked pieces of code around. This has benefits as well as costs...

pork · on Jan 10, 2012

> "only one theoretically-exploitable memory corruption vulnerability in the release history",

I presume you mean qmail [1]

[1] http://cr.yp.to/qmail/guarantee.html

tptacek · on Jan 10, 2012

Yes; qmail had a (disputed) LP64 integer overflow.

doe88 · on Jan 10, 2012

> I've found over the last couple years that one way to get C/C++ code fit into a web application is via "nosql" databases, particularly Redis; let something like Ruby/Rack or Python/WSGI or Java Servlets soak up the hostile HTTP traffic, and use it to drive an async API over Redis.

I'm sorry forgive my ignorance can you explain me a bit more what you mean by "async api over Redis"? I'm always genuinely interested in understanding good patterns especially given your experience in security. Thanks!

jbester · on Jan 11, 2012

Not the GP, but the general principal is to use Redis as a task queueing system. The front-end puts a task into a redis queue. One or more C++ programs are waiting on the queue(s) and execute the given task (like a large DB insert). If results are needed they can be communicated back to the front-end. The front-end can poll for the result or use pub/sub messaging.

This gets you a number of benefits: separation of the front-end logic and the back-end logic, better scalability - there may be a bunch of workers on distributed among different machines, and security - the C++ programs aren't as worried about unvalidated input since their input comes from the front-end.

tptacek · on Jan 11, 2012

The Redis interface is also so simple that it's very easy to hook up C code to it, and Redis is somewhat "typed", which reduces the amount of parsing you have to do.

mikecsh · on Jan 11, 2012

Can your recommend any particularly good resources/tutorials or further information on this? Thanks!

sausagefeet · on Jan 11, 2012

The web app I have been working on has the same architecture.

1. A request comes in 2. Request handler parses the request 3. Handler determines which Queue the request should go into based off the URL 4. Request handler queues the request as well as how whoever handles it can get back to them 5. Waits for response

There are then multiple workers living possibly on other machines listening on the queue. They handle the request and return a response to the original request handler and pull the next bit of work off the queue.

I like this because I feel like it is rather robust. I use a STOMP message queue which is very trivial to hook up other languages to. It is fast enough for my needs. It lets me do simple things like specify how many queued items a handler can handle concurrently. My web app is then broken into components that each run independently. They can run in the same process or be split into separate processes or even across computers. My web app is not particularly high demand but we run it on fairly light resources so the queuing also keeps our app from becoming overwhelmed if a lot of requests happen at once. They just get queued and a handler will get to it when it can.

akg · on Jan 10, 2012

I'm not sure I understand why one particular language would lend itself to more vulnerability than another. The less skilled someone is at a particular language, the more bugs/vulnerabilities he is likely to produce. It is a function of technical skill rather than a quality of the language.

For example a Ruby interpreter or a Java runtime that you trust to handle all your HTTP requests are prominently written in C/C++.

I think what makes popular packages like Ruby/Java/Rails (etc.) more secure is the sheer number of users they have. Those technologies have been hammered out over several projects and by a plethora of users and developers. Writing a component that rivals that number of interactions is tough, but certainly doable.

tptacek · on Jan 10, 2012

There's one (mainstream) C Ruby that needs to be audited. But every C CGI program needs to be audited.

C programs are susceptible to memory corruption. Programs written in practically every mainstream high level language are not susceptible to those problems (until they start using third-party C extensions). That's the security win of not using C code.

simonw · on Jan 10, 2012

"I'm not sure I understand why one particular language would lend itself to more vulnerability than another."

http://en.wikipedia.org/wiki/Buffer_overflow

From that page: "Programming languages commonly associated with buffer overflows include C and C++, which provide no built-in protection against accessing or overwriting data in any part of memory and do not automatically check that data written to an array (the built-in buffer type) is within the boundaries of that array."

akg · on Jan 11, 2012

I would call that a programmer error. The language certainly does make it harder to write "safe" code, but it is certainly doable.

I guess my point is that the tools/libraries/frameworks on top of the language are what make it useful or not useful, independent of the language itself. For example, writing a web app in Ruby may not help you against SQL injection (http://en.wikipedia.org/wiki/SQL_injection) unless you have a well designed query language on top of that.

tptacek · on Jan 11, 2012

Everyone calls it programmer error. But when you make the same error of copying arbitrary-sized inputs from attackers into a Java program, you do not enable that attacker to upload their own code into the JVM process and run it.

akg · on Jan 11, 2012

But doesn't the use of Java's JNI invalidate any security the JVM offers? As far as I know, any protections the JVM puts up are invalidated once you inject native code, which would potentially enable an attacker to potentially inject malicious code that hijacks the JVM. Then again, one could argue that the JNI is no longer a "Java" program.

tptacek · on Jan 11, 2012

Yes, when you write C code and attach it to JVM processes, that puts the JVM process at risk. More C code, more problems.

dextorious · on Jan 11, 2012

"""I would call that a programmer error. The language certainly does make it harder to write "safe" code, but it is certainly doable."""

Everything is "certainly doable" in a turing-complete way, but that fact has not mattered at all in the evolution of programming languages.

It doesn't matter if it's "certainly doable", what matters is how easy it is.

nl · on Jan 10, 2012

Things like memory protection by default do make some languages safer than others.

akg · on Jan 11, 2012

Yes, however, there are libraries in place that can circumvent some of these issues: e.g., boost::shared_ptr in C++.

tptacek · on Jan 11, 2012

No, Boost shared_ptr exacerbates the issue by creating a second regime of reference counting that, if contravened anywhere in the program (for instance, in any third-party piece of library code, which every C++ program of any real size is replete with) creates use-after-free conditions.

I invite you to continue coming up with examples of ways to reliably mitigate memory corruption flaws in C/C++ programs, because I enjoy this topic very much, but as your attorney in this matter I have to advise you that you're going to lose the argument. :)

akg · on Jan 11, 2012

After doing some research on the topic, I will have to say that I concede my point. ;-)

roel_v · on Jan 11, 2012

I agree with you on the gist of your argument (I think), but there are 'ways to reliably mitigate memory corruption flaws in C/C++ programs'. For example, using std::string rather than malloc()'ing a char* every time you do something that works with strings is certainly a way to reliably mitigate memory corruption flaws in C++.

tptacek · on Jan 11, 2012

True as far as it goes; std::string is safer than libc strings. If all your program does is manipulate strings, and not marshaled binary data structures or protocols, and your data structures are simple and you're very careful with your iterators (which themselves often decompose to pointers) and your object lifecycles are simple enough that you can reliably free things and know you're not going to accidentally touch the memory later, and-and-and, you can write a safe C++ program.

SamReidHughes · on Jan 11, 2012

Every boost::shared_ptr I see is a cringe-inducing experience. It's not just the atomic memory operation that happens whenever you copy it, it's the programmer who thought that he could just put things in a boost::shared_ptr and it would solve his problems. Now the code is less readable, because you don't know what the lifetime of your resources are! The worst thing is when they get shared across threads, and suddenly you don't know what thread your object's going to be destructed on.

One better alternative to a shared_ptr is a noncopyable shared pointer type. You have to explicitly copy it with a call like

    x.copy_from(y);

That this makes the use of reference counted objects more verbose and uncomfortable is not a downside.

Really this should be a noncopyable version of intrusive_ptr, not shared_ptr. Either the object is owned by references on multiple threads and you'll want to be careful about what thread you destroy it from, and perhaps then you'd want to send a release message over some message queue system, or it's a single threaded object and you don't need the overhead of atomic memory operations.

ggchappell · on Jan 11, 2012

> Now the code is less readable, because you don't know what the lifetime of your resources are!

You're certainly making a valid point; however, as far as how important this is, the experience of a lot of people out there points in the other direction.

Consider: the lifetime of a Python object is essentially the same as that of a C++ dynamic object owned by a shared_ptr. But you don't see Python programmers complaining that they can't figure out when their objects are going away. In Java it's even worse; an object's lifetime is pretty much whatever the JVM thinks it ought to be. I have seen complaints about this, but not many, and the JVM's reputation as a platform for serious languages remains pretty strong.

On the other hand, memory leaks in C (and C++) programs have been a major thorn in all our sides for decades.

So, yes, when you get assured destruction by using an object whose lifetime is managed for you, you do lose something. But the experience of programmers all over strongly suggests to me that, for most applications, what you get, is much more valuable than what you lose.

roel_v · on Jan 11, 2012

At the risk of sounding like flamebait, it's because Python and Java developers don't know what they are missing without deterministic destructing. Of course there are way to 'code around' it, but knowing the exact lifetime of objects is very often very useful, and often makes for much easier to understand code and easy ways to avoid resource leaks.

nl · on Jan 11, 2012

try..finally blocks have 80% (or more?) of the advantages of deterministic destructing without the costs, especially for avoiding resource leaks.

lelele · on Jan 11, 2012

What about objects who are referenced by more than one object, and which are linked to resources? try... finally blocks are just a different way of freeing your resources at the end of a block, and won't help with objects which outlive the block they are guarding.

Actually, here is your choice: either you'll have to manage every kind of resource except memory (garbage collected languages), or you'll have to manage only memory (C++).

roel_v · on Jan 11, 2012

Yes, that's true, but at the cost of syntactic noise. It's a preference, and one gets used to it I guess, but to me all the try blocks are harder to read than code where variables have a fixed lifetime, so where you in many cases can avoid the extra indent level.

SamReidHughes · on Jan 11, 2012

> You're certainly making a valid point; however, as far as how important this is, the experience of a lot of people out there points in the other direction.

That's because I'm talking about C++ and you've somehow decided to talk about something unrelated. Python and Java programmers still care about all resources that aren't object lifetimes.

akg · on Jan 11, 2012

After doing some thorough research on the topic, I would like to say that I concede my point about the vulnerabilities of C/C++. Thanks tptacek.

tptacek · on Jan 11, 2012

Happy to help! :)

kragen · on Jan 13, 2012

While I mostly agree with your primary point --- that C and C++ are extremely prone to memory corruption vulnerabilities --- I think there's an important distinction you're glossing over here, between C and C++/ObjC.

Both C++ and ObjC have a string class and containers in the standard library, and support for some automatic memory management in the language. This turns out to make a big difference in practice in reducing those vulnerabilities. There are people in this thread claiming that they do as good a job in reducing those vulnerabilities as using Java or Ruby or Python. I can't really evaluate that claim, but it seems plausible to me. Barely.

tptacek · on Jan 14, 2012

Some things to remember:

* std::string (or NSMutableString) eliminates the stdlibc strxxx() vulnerabilities --- iff you use them exclusively. But lots of C++ code (and, especially, ObjC code) drops to char-star strings routinely.

* Most C++ code still uses u_char-star for binary blobs. ObjC has (to its credit) NSMutableData, but there's still u_char-star handling code there too (I also feel like --- but can't back up with evidence --- ObjC code is more likely to call out to C libraries like zlib).

* Both C++ and ObjC have error-prone "automatic" memory management: shared_ptr and retain/release, respectively. shared_ptr is risky because every place it comes into contact with uncounted pointers has to be accounted for; retain/release because it's "manu-matic" and easy to make mistakes. In both cases, you can end up in situations where memory is released and pointers held to it, which is a situation morally equivalent to heap overflows.

No, I don't think C++ and ObjC do an equivalent job in reducing memory corruption flaws. The MRI Ruby interpreter has had memory corruption issues (it being a big C program itself), but Ruby programs never have memory corruption issues (except in the native C code they call into). C++ and ObjC programs routinely do.

lelele · on Jan 11, 2012

"Why do C++ folks make things so complicated?": http://www.johndcook.com/blog/2011/06/14/why-do-c-folks-make...

tptacek · on Jan 11, 2012

Kind of orthogonal. You'd face the same risks writing in C as in C++.

lelele · on Jan 11, 2012

Well, after starting using memory-management classes exclusively, and never again raw pointers, I have coded lots of C++ and I have never experienced any memory issues (leaks or access violations). I think C++ has issues much much worse than memory management.

tptacek · on Jan 11, 2012

Wow is it ever not my experience that C++ programs that avoid raw pointers don't have memory corruption issues. Note that my job involves looking at other people's C++ code, not just having opinions about my own, so I'm a bit more suspicious than most people.

lelele · on Jan 11, 2012

Therefore your experience is that compilers and/or mainstream libraries are irredeemably broken, right? Because I can't think of any reason why your code should break havoc as long as you are following your language's and your libraries' guidelines and safe-code practices . And yes, I agree that's a PITA: I've ditched C++ for that reason. However, to me, as long as my profiler didn't show any memory leaks, and there were no crashes, then I assumed everything was fine. Maybe I was blessed in having discovered and mastered Design by Contract. And AFAIK, Python and Ruby interpreters are written in C... what makes them safer than average applications, then?

tptacek · on Jan 12, 2012

"Best practices" is a no-true-Scotsman argument. Any example I come up with of something that blows up a C++ program is going to contravene some best practice somewhere. A long time ago someone said buffer overflows were easy to avoid, and so I did a little math, counting up the revenues of companies that sold products that mostly mitigated the impact of buffer overflows. Billions and billions of dollars, lost to "inability to count numbers of bytes".

In any case, my pointier response to you is, "allocation lifecycle and use-after-free", which doesn't care how many layers of abstraction you wrap your pointers in.

"Irredeemably"? No, just very, very expensively. I suppose I should thank them.

pshc · on Jan 10, 2012

* Dismissing newer technologies as "shiny" instead of evaluating their merits

* Language hipsterism

* Being disturbed by modular code

* Dismissing high-level code that might have leaky abstractions

* Plain CGI

* Being turned off by callbacks and reinventing the wheel instead

* The usual silliness about all tools being capable and therefore equal (they're all Turing complete, yes, but we still want to know which ones are more productive for some use case.)

This is grumpy posturing and C++ is now blub. Am I being trolled?

mvgoogler · on Jan 11, 2012

Am I being trolled? It feels more to me like you're the one that's doing the trolling, but maybe I'm just mis-reading your post.

Disclaimer: I worked with the OP and she is a ferociously productive engineer.

Dismissing newer technologies as "shiny" instead of evaluating their merits*

Or, evaluating technologies based on how useful hey are rather than the amount of hype they are generating

Language hipsterism*

Huh? I don't even know what this means.

Being disturbed by modular code*

The OP was disturbed at the thought of building a program by gluing together "modules". In this sense she is using "module" to mean (essentially) black-boxes of unknown origin and quality. If you had ever seen her code you would have a hard time convincing anyone that it wasn't modular.

* Being turned off by callbacks and reinventing the wheel instead

The OP says she hates the design of cgic, not that she's against callbacks. She made a practical decision that cgic didn't do enough for her to warrant the pain of using it, and discovered that creating something that she actually liked using wasn't that much work.

To put it more generally, she is saying that - given the choice between using some existing code that isn't quite right (or outright sucks for some reason or another) and taking some time to roll her own, she has found that it is often worthwhile to spend a little time to create something that she knows works and that she enjoys using.

* The usual silliness about all tools being capable and therefore equal (they're all Turing complete, yes, but we still want to know which ones are more productive for some use case.)

The OP said no such thing. She said that she has found a language she is highly productive in (C++) and she hasn't yet seen a newer "shinier" language that would make enough of a difference to warrant switching to it.

kelnos · on Jan 11, 2012

The OP was disturbed at the thought of building a program by gluing together "modules". In this sense she is using "module" to mean (essentially) black-boxes of unknown origin and quality.

Right, but then she goes on to mention libxml2, which for someone unfamiliar, could be a "black box of unknown origin and quality". In this case, of course, I wouldn't dare criticize libxml2: it's an impressive piece of software with a great track record and excellent test coverage. But I just happen to know that. To someone who doesn't, they'd have to learn. Just as, for someone who doesn't know anything about some Ruby XML library, they'd have to learn.

So basically her argument boils down to "I'm fine using modules written in my language of choice that I already know about, but I don't feel like learning about new modules written for a different framework".

Regardless, C/C++ is just not a safe language when writing code that has to survive adversarial conditions. Maybe the OP always writes perfect, vulnerability-free C code (though I doubt it), but most people do not. Advocating a language that doesn't have some kind of memory protection for web apps just strikes me as irresponsible.

A buffer overflow or use-after-free() in C can let an attacker run arbitrary code on your server. A buffer overflow in Java throws an exception and terminates the program (or is handled gracefully). Not saying that there aren't other classes of vulnerabilities, but using a language/runtime such as Java or Ruby eliminates some of the trickiest sources of security bugs.

pshc · on Jan 11, 2012

> At some point, you just have to shut up and make stuff.

It does sound like she's ferociously productive. It's also possible that she could be ten times as productive and that C++ is holding her back. The programmer credo is to work hard and work smart.

> I guess the current "shiny" thing is still more-or-less Ruby [with] Rails. Oh, I suppose there's also the whole node thing, in which you write things in Javascript and run them server-side with v8. She dismissed node right away (with no argument from me)

> I have no practical experience with them

Casual dismissal without evaluation of merits.

> it's all been linked to frothy people clamoring for the latest shiny thing. I've seen that pattern plenty of times before, and it's always annoying.

This is what I mean by hipsterism: Dismissal based on an excited fanbase.

> that somehow you would need to abdicate almost all of your code to modules where you just hope they work.

I had a problem with this sweeping generalization. If cgic isn't great, that's fine. Certainly you still have to evaluate your dependencies' merits.

> that creates an awful design where you give up main() to that program's code and then just wake up when it decides to call back to you

This did read like an indictment of callbacks as a whole, but I see now that she's not happy about libraries that hijack main() and have some secondary entry point. I concede that.

> It might not be wonderful, but you could build something relatively quickly, at least in theory.

High-level languages have proven productivity gains and known pitfalls. Here, she is hand-waving them away without giving them a chance.

> Better still, since I"m not chasing the shininess in that realm, I can spend those cycles on other things.

Trollish fallacy.

> she hasn't yet seen a newer "shinier" language that would make enough of a difference to warrant switching to it.

As far as I can tell she didn't even give them a fair chance. She dismissed them based on supposed hype and blubbiness.

Rusky · on Jan 10, 2012

Dismissing newer technologies as "shiny" instead of evaluating their merits

I would think it's more about trying not to dismiss older technologies just because of the merits of something newer.

Being disturbed by modular code

The C++ code described in the article sounded pretty modular to me...

tshaddox · on Jan 10, 2012

I think the point about reliance on and obsession with modular code is a valid one, even though it's so entrenched in programming culture now. Donald Knuthe makes a similar (much less inflammatory) argument here (Ctrl-F "black box"):

http://www.informit.com/articles/article.aspx?p=1193856

blangblang · on Jan 11, 2012

It just distills down to "Don't let people tell you that you can't frame with your trusty roofing hammer. Just get to work and switch tools when you must."

uiri · on Jan 11, 2012

Just wondering, what exactly is wrong with plain CGI?

iso8859-1 · on Jan 11, 2012

With CGI, you do your own data persistence. It leads to gigantic blobs. Also, you spawn a process for each request.

http://stackoverflow.com/questions/424839/what-were-the-main...

As I see it, it's the old low-level vs high-level discussion all over again. I'm sure you have seen it before.

Low-level programmers argue that they have control and know how everything is working. High-level programmers argue security, problem isolation, modularity (i.e. not only using your own code!), readability, maintainability.

javadyan · on Jan 11, 2012

I assume he's implying that CGI is slow.

alttag · on Jan 10, 2012

Heh.

I used to work for a company that used C (not C++) for its large web site. They ran the entire site from a monolithic C file. To say again, the code for the entire web site (including product listings, shopping cart, coupons, etc.) was in one .c file. To make things more "interesting," it was, at the time controlled by a single developer who didn't work on site and wouldn't let anyone else touch his code. This barrier was backed up, as I understood, by both tradition and management.

The initial decision for C was, I think, the correct one. The site had been around from the early days of the web, and speed was important. However, the architecting and personnel decisions didn't keep up with the company's growth. Another consequence, though, was switching to CSS (which came along later) from, for example, spacer gifs, took a great deal of developer time and testing, as did adding new features. It's the trade-off I think we all understand well today: fast code or fast developers.

It was interesting to see the changes while I was there—the site began (slowly) to get recoded in Java, and broken into more manageable chunks at the same time. So far as I know, the whole thing is now in Java.

alttag · on Jan 10, 2012

Oh, and to make things even more fun, much of the data passed around between back-end processes and vendor apps was done in XML at the time ... and the XML parsing library they used for C was a bit buggy... No one (w|c)ould fix it, which broke some edge cases.

Fun times.

Also, if I had to guess, I suspect another factor in the transition was that C programmers were getting more expensive and harder to find.

iso8859-1 · on Jan 11, 2012

Using a buggy XML parser has little to do with C...

pnathan · on Jan 10, 2012

I left C++ a few years ago. It's not that it's a "bad" language, for a certain value of bad.

It's just bloody inefficient to get things done in! Coding in it requires girnormous amounts of text shovelling, typing attempts to be strict, template meta-programming requires hand-coding type inference schemes, and on top of that, the library/package management is a nightmare.

Well, I didn't think these harsh things when I 'left' C++, but as I worked with more high-level languages, I realized that I was doing more, with less code and dynamic duck-y typing, and with nice libraries only a `cpan` away.

So when I hacked together a C++ program a year or so ago, I got punched in the face by all these issues. It was a pain. So I said, okay, this is stupid. I need to use C for low-level work like drivers, and use Common Lisp for other things. Like what everyone else does[1].

Fundamentally, C++ has a number of flaws, of course - that's typical for a pragmatic language - but the key flaw in my opinion is that it's a pain to get higher-level stuff done in until you build the libraries that other language constructs/libraries give you out of the box.

[1] For a very small subset of everyone else.

obtu · on Jan 11, 2012

Good point on the relative friction and availability of libraries. With C++ you rarely have a common platform. The STL, with Boost to extend it, is one; but it stays pretty basic. Qt is another. Glib, with Glibmm/Gtkmm &co is a third one, but the core implementation is C and Vala because that makes the platform accessible to a lot more languages. libxml2, cited in the original post, reimplements a lot of basic data structures, because it intends to be portable to C code which doesn't have a common platform. Other languages have CPAN, Hackage, PyPI, RubyGems, Maven, npm, and their libraries reuse and rely on each other a lot more freely.

o2sd · on Jan 11, 2012

I am currently writing part of a web application in C++, the other part of the web application is being developed by another programmer in Ruby On Rails.

In a single request, the RoR part 'does' about a two dozen 'things', which takes it around 2 seconds to complete.

The C++ part 'does' around 1.2 million 'things' which takes it between 0.1 to 0.2 seconds to complete. Building the the component that does 1.2 million 'things' is simply impractical in Ruby.

Admittedly, very few web applications have this type of requirement, but perhaps in the future many more will. The two dozen or so 'things' the RoR does would take a LOT of code in C++. Well, compared to RoR it would be a lot of code. And the 1.2 million 'things' that the C++ application does would be absurdly difficult to program in Ruby, and as mentioned, would take an absurdly long time to execute.

So at the risk of sounding cliché, perhaps it is less a matter of Ruby vs C++, and more one of using the right tool for the job. Once that decision has been made, programmer productivity has more to do with the programmer than the language. I am extremely productive in C++, even though I have to write a lot of code to be productive. If I was a LISP expert, I would be ridiculously productive, but only after 10 years[1] of learning and gaining experience in LISP. Most projects don't have 10 years for you to become productive in the language of choice.

[1] Yes, I am exaggerating. It would only take 9 years to become highly productive in LISP.

dagw · on Jan 11, 2012

I've done basically exactly the same thing. Much of the webapp is written in python, but the core performance critical part is written in C++. There basically wouldn't be any other way to make that particular app work, and this approach worked really well.

That being said, I never felt particularly inclined to write the whole things in C++.

o2sd · on Jan 11, 2012

> That being said, I never felt particularly inclined to write the whole things in C++.

Agreed, however one of the problems we are encountering is passing messages efficiently between RoR and the C++ components. Currently using XML until another method that is more efficient, but no less flexible, presents itself.

Personally I think the debate should be less focused on which language to write your web application in, and more on improving the glue between different languages so in developing web applications we can have the best of all possible worlds.

overgard · on Jan 11, 2012

Here's the thing, writing a web app in C is like using a flame thrower to light a candle. Yeah it's possible, but it's dangerous as hell at best and it doesn't really gain you much.

And this distrust of other's libraries is odd. You're already running on an OS that's providing millions of lines of code to you in the form of api's and services, and using a compiler that's going to do all sorts of modifications to what you've created at the machine code level. You're already well into trusting a lot code that's not yours. But something that's a million times easier (parsing HTTP headers), and now they're worried about other peoples code? Seriously?

joe_the_user · on Jan 11, 2012

Being in the process of writing a c++ web client, I am strongly inclined to agree with your assessment of c and web apps.

The one thing is, if someone does write c web apps - having your own set of libraries can be useful exactly then. The web has all these crazy encodings and protocols and knowing exactly which way your library does them is really useful when you use a language with no safety net or safety switch or anything.

Spent the afternoon figuring out, sort-of, the url encoding of reddit's "api" and how it interact with Qt's QURL class.

udp · on Jan 11, 2012

> using a compiler that's going to do all sorts of modifications to what you've created at the machine code level

You do know that the compiler isn't allowed to change the behaviour of your code, right?

(If it did, it would be a bug in the compiler, and C compilers are pretty damn mature.)

puls · on Jan 10, 2012

Why must the standard of discourse always be "I'm right and you're wrong?" It seems to me that more often than not in this technological day and age, the situation is much closer to "I'm right and you're also right, and between our two positions there's probably something for both of us to learn."

mattgreenrocks · on Jan 10, 2012

Intellectual honesty is not a strong feature of the Internet. I point this out whenever possible, but nobody seems to really care.

The medium selects those who are both loudest and have an adequate factual basis. Peppering statements with "In my limited experience," makes you seem meek, unfortunately. Blame the culture of needing traffic and thriving on controversy. It feels like it is a race to the bottom. :(

_delirium · on Jan 11, 2012

I had an interesting discussion about that with some professors as well. It might not be as bad as on the internet, but it does happen, in both research and instruction, to some extent. One prof I know pretty well, who's a good teacher, was lamenting that his students seem to actively want him to be definitive and un-nuanced. His tendency, which he has to work against a bit, is to qualify statements by noting where his views aren't shared by everyone in the field, where he's speculating past the established results, where he's pretty confident of an approach but there are arguments for alternative approaches, etc.

But students tend to interpret that as weak and muddled, and prefer profs who "tell it like it is", even if that means in a fairly biased and opinionated way. It seems they particularly like it when you make black-and-white statements that would be controversial, roughly equivalent to "there's a debate on this but you don't need to know about the other viewpoint because it's wrong".

sanderjd · on Jan 11, 2012

This is not limited to the internet. We learn as children that persuasive writing is weakened by statements such as "I believe", "in my opinion", or "in my limited experience". (I believe that) it is true that such statements weaken writing, and that it should be the reader's job to determine which content is editorial and subconsciously insert the "in my opinion"s where appropriate.

Having said all that, it does seem hard to find intellectual honesty on the internet, but I had more trouble finding a decent newspaper in London so it's not an isolated problem.

NinjaFerret · on Jan 11, 2012

I think this was summed up by a guy at a conference that I went to "developers, on the whole, pride ourselves on being logical but when it comes to technology discussions we get too emotional".

It is the same reason why there was conflict in the Robber's Cave experiment between the two groups. As a species we group together with people we mentally associate with and have a natural prejudice against any external force.

alexchamberlain · on Jan 10, 2012

THANK YOU! I totally agree that it is not unreasonable to write websites in C/C++.

I've just started looking into writing Nginx modules, which are generally written in C. Now that I've started to understand it, I can write secure content handlers in a reasonable time and reasonable effort, whilst being able to use the exhaustive libraries on offer, as well as my experience with the language. Couple this with great performance, and I can't justify writing apps in PHP anymore. We're moving more and more processing to the client , as we have realised that networks are pretty slow. Well then, how do you justify writing slow code behind the slow network?

I highly encourage the author to look at writing Nginx modules, possibly not on a hackathon, since you get security and fantastic performance for free!

jacques_chester · on Jan 10, 2012

The big reason for avoid C++ is to avoid bugs and security issues. In particular, overflow attacks. That the language requires constant, conscious diligence to achieve a security baseline that comes for free in other languages is wasteful.

Then there's the library/tools situation. I did all the work for my honours dissertation in Ruby in part because the Ruby ecosystem is vast and vibrant (sometimes inconveniently so).

I imagine that the C++-for-the-web ecosystem is going to be a bit more spartan.

daemin · on Jan 11, 2012

Do you actually mean C?

Since in C++ you can use STL classes for buffers where they are automatically resized as appropriate, streams for IO, etc. So you'd effectively have the same resizing and safe objects to work with as in a higher level language.

As far as I see it both C++ and Ruby have a similar amount of libraries that can be used, they both have a major web framework, Rails for Ruby and OKWS for C++, and there's probably a plethora of smaller frameworks and helpers available for each language.

Rails may seem like it would be winning (versus C++) on the ecosystem, but that's because there's so many hip gems for Rails out there, but if you take a look at all libraries it probably won't be so.

jacques_chester · on Jan 11, 2012

I mean both, though OP addressed C++. And you've proved my point.

Proper use of carefully-written classes will reduce the risk of a buffer overflow -- but you will need to exercise additional vigilance ("Make sure you use STL! Why isn't this code using STL?" etc), over and above addressing the problem, to ensure string safeness is assured.

That's a dead loss that programmers in memory-safe languages simply don't have to pay for the same level of confidence.

daemin · on Jan 11, 2012

I consider the STL to be (essentially) part of the language, hence to program without it, isn't really programming C++ anymore. This is especially true since the introduction of C++11, where the STL enables many language features.

Essentially a string is std::string, a resiable buffer is std::vector<unsigned char>, and for a string or a resiable (or unknown size) buffer to be anything else you need to provide a very good reason.

I would also be careful with Ruby since a lot of (performance specific) gems and plugins for Rails are written in C, and hence potentially suffer the same problems that one is trying to avoid by using Ruby. Or at least you end up in the same boat if you were using C or C++ to begin with.

VikingCoder · on Jan 10, 2012

> That the language requires constant, conscious diligence to achieve a security baseline that comes for free in other languages is wasteful.

For extremely expensive values of "free."

jacques_chester · on Jan 10, 2012

Could you elaborate?

VikingCoder · on Jan 10, 2012

No language exists yet that is aware of all security issues. No matter what language, library, framework, toolset you use, you need to be aware of multiple vectors of attack.

Claiming that some languages solve these problems for you, for free, is disingenuous.

jacques_chester · on Jan 10, 2012

That's not what I claimed.

I pointed out that languages with safe string handling and memory management have a higher security baseline. As in, what you start with, for "free".

Achieving comparable string and memory security in C/C++ is an expense. You must take active steps to prevent those exploits; and even then human error means that you will have a higher risk than the "free baseline" languages.

petercooper · on Jan 10, 2012

[..] I haven't had a reason to touch them. The few times I've seen or heard about them, it's all been linked to frothy people clamoring for the latest shiny thing.

Avoiding a technology by applying a frivolous standard like this is a good way to miss out on useful learning experiences.

"Doing" is great, but stretching your boundaries and trying new things outside of your comfort zone is an essential part of being a well rounded programmer.

rachelbythebay · on Jan 10, 2012

Ah, you grabbed onto the second sentence while dismissing the first. I avoid them because I haven't had a reason to bother. The task could be accomplished some other way, and was.

I've used plenty of things used by "foamers" out of necessity. I don't enjoy it, but it does get the job done.

petercooper · on Jan 11, 2012

I've used plenty of things used by "foamers" out of necessity. I don't enjoy it, but it does get the job done.

An unpleasant subset of a technology's userbase has no relationship to the usefulness or learning experiences relating to the technology and is of no relevance to the well-rounded, life-long learner.

Being distracted by the who in technical pursuits is a burdensome impediment against learning the valuable whats and whys.

rachelbythebay · on Jan 11, 2012

You have to work with the 'who' to actually get anywhere with some of this stuff. And some of these communities are less than friendly. Some of them are outright hostile.

Does that help?

kragen · on Jan 13, 2012

Also, the other users of Ruby are the ones who wrote the libraries you get to use in a Ruby program. Which is pretty relevant to the person in a nearby thread who implicitly equates using the Linux virtual memory system to using Rubyful Soup.

sunchild · on Jan 11, 2012

"At some point, you just have to shut up and make stuff."

This should have been the beginning, middle, and end.

_delirium · on Jan 10, 2012

I agree they're not a big deal per se, but I do find them clunkier. Facebook seems to agree, and argued that it was worth expending the considerable engineering effort it took to write a whole new PHP runtime, just to avoid the obvious alternative, writing more of Facebook in C++. Though Google does write a lot of its backend stuff in C++ successfully, so culture and what kind of web software you're writing may be part of it.

alexchamberlain · on Jan 11, 2012

No, Facebook had LOADS of PHP code thst sas failing terribly. It was easier to write a PHP compiler that a rewrite in C++.However, now they have control of the compiler, they can probably slowly replace the PHP with C++.

akkartik · on Jan 10, 2012

Perl and python and ruby aren't just easier because they have more libraries. They're easier because they have hash literals, a more flexible module system, etc. (ahem, implicit/dynamic type checking)

I agree that the reliance of rails apps on gems is unfortunate. They're rediscovering DLL hell for themselves. But the original DLL hell was all C++ apps. You can shoot yourself in the foot with over-reliance on libraries in any language.

NathanWong · on Jan 10, 2012

It's not clear that this comment is addressing any real issues.

Firstly, std::unordered_map is officially part of C++ now, and it's been available in all major compilers for a long time. Secondly, there's no substantial evidence provided to support the claim that the module system in a dynamic language is "more flexible" - anytime you want to include someone else's code in your source tree in any language you can just drop the source in as though it was your own. Whether there's a "package manager" to hide the references for you is beside the point; it's not as if the concept of shared libraries is lost upon C and C++.

And "ahem" does not establish a valid argument as to why dynamic type checking is "easier" (whatever that even means). In fact, I believe a strong argument could be made that dynamic type checking is the worst aspect of dynamic languages, and the interest that Haskell has been brewing up lately would tend to support this. Static type checking brings bugs to the forefront when the developer is in the room instead of the user. One could argue that runtime reflection provides more flexibility and thus makes dynamic languages "easier", but it's a stretch to extend that to dynamic typing.

I'm unsure of how DLL Hell would even apply to a SaaS application, unless we're referring to completely different things; you have complete control of your application's environment in a web app. If there's a package already installed that you don't want or something is missing, you remove it or install it. With the prevalence of virtualization and virtualization-as-a-service, there's no reason to be trying to run two different applications that require different dependencies on the same virtual box.

akkartik · on Jan 10, 2012

It sounds like you're claiming C++ is as convenient as a 'HLL'. If so, I think the divide is too great for us to talk across.

I've worked in C++ for ten years, but I think scripts in perl or python are more convenient for many tasks. I haven't met anybody who disagreed about something so basic, so I'm not sure how to respond. I use unordered_map all the time, but I wouldn't claim it's the same as support for literal hash-tables in the language. Implicit typing is useful if I am happy to allow my tasks to die at runtime because I screwed up. After all, C++ apps often segfault when I screw up.

boneheadmed · on Jan 10, 2012

I have to say that I like the perspective. I think that a good programmer should be able to use virtually any good language to create anything.

In particular, it seems worthwhile, if for example one is well versed in say Ruby, to learn C to get at the root of what is occurring "under the hood". I really enjoy Ruby, but I'm learning C to fully immerse myself in what is occurring with the bits and bytes (or at least the bytes).

gaius · on Jan 10, 2012

Ah, you kids! Back in the '90s, we thought nothing of building websites in NSAPI.

racER · on Jan 11, 2012

Well, I can't see any specific advantage on using C/C++ as backend of a plain old CGI API. Personally I'd use PHP or something like that, which integrates tighter with the web server. Using C on CGI is propably not very much faster compared to an optimized PHP environment with precompiled (cached) p-code.

> "C programs are suspectible to memory corruption." (tptacek)

Yes, they are, because in C you can do memory corruption, in many other languages you can't (even if you'd want to). But where do these corruptions most likely occur, when speaking in context of web applications? Yes, in I/O and string operations. And all these can be mitigated with somehow "safe" classes - by these I mean not a home-brew string-class, but something like STL (which has proven stability).

However, is memory corruption the only security risk? In my opinion, an average C/C++ programmer creates more secure code than an average PHP programmer - just because a C programmer is used to the intrinsic security issues, while the PHP coder won't produce an buffer overflow by not validating input, but leaves eg. XSS or SQL injection holes.

Writing a web application in C without preparing for safe I/O & string operations is as bad as writing dirty script code in PHP/Perl/Ruby/...

At my company we've written a really big web application (a hosting control panel) completely in C/C++, but for other reasons than execution speed: the runtime dependencies of a sellable web application are pure horror. Neverending CPAN-depencencies in Perl, incompatible function changes in PHP, and so on. With a monolithic app (web server & application logic all-in-one) you just need a libc - that's all. Easy to roll out, and thus easy to sell. :)

16s · on Jan 10, 2012

I've used C++ with cgi for years too! Glad to see others do it too. Makes for blazing fast web apps and I'm just more comfortable working with it.

mwd_ · on Jan 11, 2012

Choosing a good language for your project is all about understanding trade-offs and the overall space of languages. It's hard to do this if you only know C++ and view everything else with suspicion. The comment "...back to programming languages, one thing that always bothers me is why new ones keep appearing" sounds like it was written to satirize that sort of perspective.

ComputerGuru · on Jan 11, 2012

Does anyone know a real (not proof of concept) FastCGI C++ library/framework?

sciurus · on Jan 11, 2012

http://news.ycombinator.com/item?id=3449294

obtu · on Jan 11, 2012

Wt (I haven't used it): http://www.webtoolkit.eu/

scscsc · on Jan 11, 2012

I find it ironic that the author says that you should just shut up and make stuff without taking his own advice (the part about making stuff will probably follow soon).

Having written a C(++) web-app (using CGI), I'm ok with writing in C++ (it's not that bad), but the advantages of using a language more suited for web development are not negligible. The author would probably find this out the hard way (if/when (s)he gets to making stuff).

obtu · on Jan 11, 2012

Here's the followup: http://rachelbythebay.com/w/2012/01/10/cpp/ It's a round-up of her recent blog posts about super trunking scanner, which scans airwaves and displays the info it gleans to a web ui.

lrobb · on Jan 11, 2012

The reaction to this piece is interesting... I was sort of "ho-hum" about it, but I've written many a "web-app" in c/c++ - cgic on linux, isapi on windows... not to mention socket servers with thread pools implementing custom protocols...

I don't say that because I think it makes me special -- it's what we ALL used to do because that's the only thing we had.

After you've done that for a few years, you become VERY efficient at banging out code.

krf · on Jan 10, 2012

qDecoder by Seungyoung Kim (www.qdecoder.org) is a terrific open source library for using CGI with C and C++.

A lot of us did what this article mentioned - but probably in the 1990s when desktop programmers were moving to the web. Most probably settled on Perl after trying C and finding development was much faster with Perl.

albertsun · on Jan 11, 2012

OkCupid is written in C++

http://www.okcupid.com/about/technology

reactor · on Jan 11, 2012

http://www.treefrogframework.org/ might help.

john2373 · on Jan 11, 2012

> At some point, you just have to shut up and make stuff.

Amen.

hackermom · on Jan 10, 2012

For those who never dealt with the server end of the web in the mid-to-late 90s I can tell you that native executables for CGI was at that time not the least rare or uncommon.

biasedstudy · on Jan 10, 2012

Having a lightweight, standalone executable with quick start up time is still a major win.

cageface · on Jan 10, 2012

What about caching database connections? Opening a new connection on every request can be very slow, particularly for Oracle.

javadyan · on Jan 11, 2012

One may consider using FastCGI to avoid the costs associated with starting a new process with each request.

rubashov · on Jan 10, 2012

This plus something for disk persistence is totally sufficient for writing web apps in C++:

http://cgi.sourceforge.net/

akkartik · on Jan 10, 2012

Disk persistence was actually the question I came to this thread with after reading the article.

jQueryIsAwesome · on Jan 11, 2012

Some of her points are vague and unclear:

> that somehow you would need to abdicate almost all of your code to modules where you just hope they work

Well, you are using jQuery, jQuery UI, jQuery Player and jQuery Scroll-into-view in your example project, that is actually a big pile of modules if you ask me.

drivebyacct2 · on Jan 10, 2012

Why C++? Go (golang.org) is too much fun!

drivebyacct2 · on Jan 11, 2012

I wish I could edit this comment, I think more explanation would have helped. Go is low level (originally marketed as a systems-level language), but it really is truly fun to use and it has whole swath of really comprehensive libraries for throwing together web apps very quickly. (SPDY, WebSockets, and much more, but I appreciate those features out of the box)

madaraz · on Jan 10, 2012

There is a reason that NO major web application you name is written in C++, Gmail, Facebook, Google Docs, YouTube. There are major advantages to writing web applications in a higher level language and almost no advantages to writing them in C++.

motti · on Jan 11, 2012

Gmail was originally mostly C++ (server-side), see http://www.quora.com/What-programming-languages-is-Gmail-imp...

www.google.com remains C++ to the best of my knowledge, see http://web.archive.org/web/20110708015633/http://panela.blog...

javadyan · on Jan 11, 2012

I doubt that the parts of YouTube responsible for video-related heavy-lifting (transcoding, detecting copyright infringement, etc.) are written in a dynamic language like Python, Ruby or PHP or even a managed language like Java.

radarsat1 · on Jan 11, 2012

I always imagined that they just patch together a bunch of calls to mplayer and ffmpeg using a bit of shell script.. ;)

javadyan · on Jan 11, 2012

Maybe, but this won't do when you actually have to do algorithmic processing of the video (which I think Youtube does after all).

javadyan · on Jan 11, 2012

and by "algorithmic processing" I mean detecting when the video contains a pirated song, for example.

alexchamberlain · on Jan 11, 2012

I'm sure you could script that out to somone else... :oP

hlfcoding · on Jan 12, 2012

YouTube optimizes their Python with psyco: http://www.slideshare.net/didip/super-sizing-youtube-with-py...

outside1234 · on Jan 11, 2012

unless you are google and running at huge scale. then its C++ because they can hire a huge crowd of newbies from an elite university to maintain it and the cost divided across their huge server base is zero.