C++: A language for next generation web apps (stevehanov.ca)
104 points by TomasSedovic on Jan 26, 2010 | 95 comments


To be fair, after coming back to C++ after years in the world of Python and Haskell, it's not as bad as I remembered.

C++ is actually a moderately effective functional programming language. With reasonable knowledge of the standard data structures, and using BOOST, one can actually write fairly expressive and effective C++ code.

Most of the time, I feel like all I'm doing is writing a much more verbose version of Python. The rest of the time, I'm hunting nightmarish segfaults and template errors.


The question isn't "Can C++ work?" or "Will C++ be faster?" because we all know it can work and that it will be faster. The question is "At what point does the extra development time become more expensive than just adding another server to my Ruby/Python/C#/Whatever App"

For me, someone who learned C++ in school but has been in managed environments ever since, I'd be too scared to use C++. I'm not ashamed to admit I get a lot of help from the Internet when I'm stuck on a problem and that help is available because everyone else is using the same tools I am. Using C++ to code web apps puts you out in the wilderness as far as I'm concerned and I just don't think I could do it.


And it's not just the dev time: with C++ (and C, and any notoriously unsafe language..) you've got to be extra diligent about security, which uses even more time, and requires expertise not just in C++ but also in how to write secure software in low-level languages... a skill many C/C++ devs lack, unfortunately.


"because we all know it can work and that it will be faster"

Speed of execution depends more on the programmer's expertise than the language used.


Not really true. I can write the same algorithm in Perl as in Haskell, and the Haskell always runs faster.


But then your Perl expertise is not that different from your Haskell expertise. Your Perl will probably still be faster overall than a beginner's Haskell, not for the same algorithm, but for solutions to the same problem. Knowing which algorithms perform well for a given problem in Perl is the most relevant part of expertise.


So maybe he meant "on the expertise of the programmer" -- and of the language/compiler...


Chances are that the majority of developers won't find the most efficient algorithm to solve the problem.

(vs. the most efficient implementation of that algorithm... which is clearly going to be in a combination of C and assembly anyway).

Anyway, the guy using the good algorithm ends up with the fastest code. Different compilers will produce different quality of code.

Question is, if I am using C++ vs. a guy using Perl, will I ever reach the efficient algorithm? I might call it quits when I finally get something to compile and not segfault.

(Of course even this is kind of a strawman, because the perl guy can just reimplement in C++ when he figures out that he wants more speed out of his good algorithm).


"Chances are that the majority of developers won't find the most efficient algorithm to solve the problem."

If that is so and the code does need to run fast or use little memory then the majority of developers would benefit from a fast language.

Very often, code doesn't need to run super fast. Memory usage and concurrency (the GIL) are greater issues with dynamic languages in my view.


I don't get the constant complaints about the GIL. Letting your Python program run on 2 cores will make it 2x faster at best. Rewriting it in, say, Javascript or Lisp or Haskell or Java will make it run 2-50x faster on one core. After you get your 50x speedup, then you can worry about the 4x you'll get from buying 3 more processor cores.

(And oh yeah; it's only shared-memory concurrency that things like the GIL affects. If you have a job to do that wants to use 8 cores, split the job up into 8 parts and invoke your program 8 times. There's your 8x speedup.)


> 2x faster at best

That's if you're CPU-bound. I don't use Python, but I made an image acquisition program in C++ which could be a relevant example. We wanted to save the images to disk in real-time (30-60FPS). Doing this in the acquisition loop would make the software unusable (the goal is video-rate confocal microscope imaging); it's far too long, and much of it is just due to disk writes being slow, not to the compression time. Using a thread pool was the solution, not because of an actual increase in speed, but because from the loop's POV the write went from blocking to non-blocking so the CPU stopped wasting time waiting for the disk.

We also wanted shared memory since there can be a lot of image data which is shared between the image compression & saving, display, and possibly statistics or filtering modules.


Dunno about Python, but Perl has a library to do all disk writes in a separate (p)thread, so your main control thread never blocks on IO. (This is in addition to the usual event-loop tricks; I know Python can do nonblocking IO that way.)


I can only tell you why _I_ am constantly complaining about the GIL. It's because I would like to use a Python/C combination for in-memory data analysis. C gives me the speed and memory efficiency and Python gives me the ease of use and the web stuff.

There is no 50x speedup to be had as it doesn't get any faster than C. The only significant speedup will come from parallelism. 8 cores this year, 16 next year and probably a 100 cores in a few years. Since I'm holding a lot of data in memory I can only run one process not many unless I implement each and every data structure on top of shared memory, which I'm not going to do because it's unproductive.

I cannot use Java or JavaScript or any language that doesn't have value types (i.e. structs and arrays of structs) with a well defined memory layout. I don't want to use Haskell because my problem doesn't lend itself to functional programming as it's inherently stateful. I feel I would have to fight the nature of Lisp to make it use as little memory as C. It makes no sense to use Lisp when I need to know how lists are laid out in memory.

The only realistic options right now are pure C, pure C++ or C#. Go does have all the right properties as well. It's very immature at this point though.


Python C extension modules can release the GIL while they're running which allows for true concurrency. See this Google code search http://www.google.com/codesearch?q=python.org+lang%3Ac+Py_BE... for example usage.


I know this is possible. But I'm having trouble imagining a web app design (or other network server) based on that idea. In order to use a lot of data in memory I would have to have a single Python process that gets called by nginx or Apache. So that would be a bottleneck even before I get a chance to call my extension. And later there wouldn't be much code left that executes in the Python interpreter, which kind of defeats the purpose.

But I have to admit that I haven't fully thought this possibility through. Maybe you're right that it can be made to work.


I guess my point was that the fast languages are generally not the ones that are good for developing the fast algorithms. It is harder to experiment in a language like C than it is in a language like lisp or perl.

The other thing to point out is that there is no reason for memory usage, concurrency or speed to be problems in dynamic languages. These are all issues with the implementations of compilers/interpreters that we are using.

It just so happens that dynamic languages have only recently come back into vogue, and we have forgotten (at least in ruby and python) all of the work that was done to create efficient implementations of dynamic languages.

Examples being how well Lisp stacked up against C as early as the '80s and '90s, projects like Strongtalk, stack-based languages like Forth... the multitude of papers on efficient Scheme implementations.

Dynamic languages were declared 'slow' and therefore were dumped in favor of C by most programmers. This has caused a gap in the knowledge that we have about implementing dynamic languages. Which is a shame, because there is a lot out there for us to relearn.


There are some fundamental problems with making dynamic languages run fast. Being able to prove that some variable will never contain anything other than a 32 bit int allows the compiler/JIT to do things that it cannot otherwise do.

The only way to make dynamic languages as fast as statically typed languages is to selectively remove dynamic features. A few type hints can make a huge difference.


Your experience with Python and Haskell has most likely made you a far better C++ programmer than you were before.


This is absolutely the case.

I'm now used to first class functions, currying, etc. C++ actually has them, hidden in the stl and boost::function. Whatever I can't do using those, I can usually do by creating a family of classes which only implement operator() (I've only needed to do this once or twice, and it might have been avoidable).

Due to using a language with native dicts (and syntactic sugar), they are now part of my vocabulary. This means I immediately reach for std::map or boost::unordered_map when it makes sense.

Similarly, Python generators and Haskell lazy lists have made me view sequences as reasonable objects to generate and iterate over. Custom iterators are just the same thing with added verbosity.

If I never used higher level languages, I'd still be treating C++ as C with objects.


In C++, I'm more likely to write a struct/class which implements operator() than an actual function. It lets me write code like this, which does mutate state, but I put it off as long as possible:

  xformerlist& lbrace = node.children.front().xformations;
  xformerlist& rbrace = node.children.back().xformations;
  fn_and<shared_variable> row_and_flat(&shared_variable::is_row, &shared_variable::is_flat);
  const sharedset& flat_ins = filter(row_and_flat, set_union_all(in, inout));
  const sharedset& flat_outs = filter(row_and_flat, set_union_all(out, inout));
  const sharedset& flat_all = set_union_all(flat_ins, flat_outs);
  make_conditions<gen_in<row_access> > make_gen_in_row(parcond, conds, local_depths);
  make_conditions<gen_out<row_access> > make_gen_out_row(parcond, conds, local_depths);

  append(lbrace, fmap(make_gen_in_row, flat_ins));
  append(rbrace, fmap(make_gen_out_row, flat_outs));
I imagine that code would give a lot of people fits. In my current project, I ran with a more functional approach to C++. I don't know if I'll keep everything I've used in this programming style, but I certainly will keep some of it.


Oddly enough I'm back to C++ too, after Ruby.

One nice thing is; in a fast language, tests run quickly too.

Also, Python and Ruby don't have the equivalent of an ASSERT (you could roll your own but it's not standard practice).

Also, C++ has standard hash lists and complicated data structures are actually going to run quickly.

And while memory management can be a pain, you are doing it yourself so you can track down memory leaks rather than having them be inherent in the interpreter...

...as in Ruby AND Mono.


Also, Python and Ruby don't have the equivalent of an ASSERT (you could roll your own but it's not standard practice).

Ahem:

http://docs.python.org/reference/simple_stmts.html#the-asser...


C++ doesn't actually have a standard "hash list" (I assume you meant hash map or hash set?). Depending on your implementation, you either have hash_{map,set}, unordered_{map,set}, both, or something entirely different. tr1 specified unordered_{map,set}, but tr1 is only technically a proposal, not a standard.

I think a safe assumption is that tr1::unordered_{map,set} will be available in all implementations by now, but until C++0x is ratified and implemented by major compilers, you will run into different platforms having (possibly) different implementations. And let's be honest - even after C++0x has been implemented in major compilers, you will still have subtly different/buggy implementations.

edit: I had conflated {hash,unordered}_{set,map}. Fixed


Nah, the committee isn't going to change unordered_set or unordered_map; they're struggling with far more urgent things, to finish the std off. Those have already gotten the full treatment, they have the usual std:: collection interface; what's to change?


Or you can use boost::unordered_set right now, in any compiler, whether it supports tr1 or not.


RubyOnRails is very gung-ho about testing, but running a hello world test takes seconds - anyone know why? This seems very odd. Is this one of those "do as I preach, not as I do" things?


I doubt it takes seconds to test a hello world program in Ruby. However, to test a hello world Rails application is a different story, as you have to start up the web server, which takes seconds (even though it eventually just serves a simple page).


Not necessarily the web server.

Standard `rake test:units` (or even `ruby test/units/model_test.rb`) takes the whole Rails stack so it adds up.


You're right and I agree. But let's be careful not to equate Rails with Ruby. Rails tests may have a slow startup time, but Ruby tests do not and other Ruby web framework tests may not.


Because if "performance doesn't matter" is repeated often enough people actually start to believe it, and if enough people believe it, it becomes false. Is there a name for this logic? :-)


Confirmation bias followed by a self-fulfilling prophecy. But you knew that and I think your question was purely rhetorical ;-)


So, what is great, modern C++ code to read?


It's a book but Accelerated C++ by Koenig and Moo has been well reviewed.


It's perfectly sane to write web services (computationally heavy processes accessed via REST) in C++. But for the algorithm-light, marketing-heavy frontend to such a service, use something easier—the optimizations C++ offers aren't worth it there.


I don't have any experience with it, but I'm under the impression that MSVC / C++.net has all the goodies you'd need to do the easy frontend stuff also?


Sort of - I don't think that the web templates and CodeDom are there like they are for VB/C#. It's also kind of a mess going between managed and unmanaged code, which you'd be doing extensively if you want to use the System.Web namespace. Managing references on the managed heap is an extra hassle that has a fair amount of complexity. To top it all off, the support for the managed C++ extensions has been lackluster.


I thought at first that the article was sarcastic :-| I mean, what's taking the time is not whether you use Ruby, Python, or C++, but the HTTP request, requests to the DB, IO, or JavaScript loading..

And anyway, the parts that really need to be optimized can still be done in C even if you use Python.

There was a time I was a C++ guy who wanted to control memory and everything.. but now, I've got other things to do. If I can write 1 line that is more readable and costs less to type, why should I use C++?

And by the way, C++ isn't a verbose Python. And, even if I once thought that Boost was the best thing ever made, I now feel that it's a waste of time. Instead of using meta-programming hacks to use lambdas in a clumsy/ugly way, why not simply use Python or Scheme?


> I thought at first that the article was sarcastic

This article definitely is sarcastic.


IIRC OkCupid uses a webserver and application stack that is all C++

Ref: http://www.okws.org/doku.php?id=okws


What has not been mentioned at all: how C++ webapps will spell the end of XSS and SQL injection as hackers refocus on the much more interesting but almost-forgotten buffer overflow vulnerabilities.

Oh joy!


Such vulnerabilities still exist in PHP, Python and Ruby, because they are written in C. And they are much easier to exploit because almost every web app uses one of these languages.


Nonsense. Just because the Python interpreter is written in C does not mean that you can overrun Python strings and smash the stack like you can w/ C strings.


It's not just strings; all kinds of data structures are vulnerable. See: http://www.hardened-php.net/hphp/zend_hash_del_key_or_index_...


I smack my lips in expectation of the usage of gets() in web apps.


"MySql is GPL'd, so you can't even link to its client library in a closed source app."

AFAIK it's GPLv2 (and not AGPL) so this is only true if you intend to distribute your application itself, not just host it yourself.


I believe it does however mean that your application code has to be GPLv2, and thus can't be linked to code using some popular licenses, for example Apache 2.0.


For two to three years I was running a VPS at VPSLink which had about 64MB of RAM - for everything, and without swap space. The target use of this VPS was probably as an email server or something, as most people recommended the more expensive 128MB and 256MB plans for serving (static) pages. And actually there were only about 40MB left, since the OS needed some memory for itself too.

Personally I thought that with better resource management I could do much more, so I wrote a custom HTTP server that could fit in less than one MB of RAM. Most of my pages were generated offline using a custom program in FreePascal.

The server could also execute CGI scripts, so I also wrote a forum in FreePascal.

According to my logs, the whole system ran out of memory only once :-). Until the day I decided to give myself a few more features (when I got a much better VPS from Linode) I had about 5-6 sites running (different domains), a Subversion server and a few "dynamic" apps.

The forum can be found here. I still run it on my new VPS, although it gets some spam. The a + b = ? anti-spam feature was new when I wrote the forum, but it seems bots have gotten better :-P.

http://www.badsectoracula.com/projects/mforum/


Interesting to see that you used FreePascal! We are also experimenting with FreePascal and (fast)CGI. My colleague put up some sample pages: http://services.cnoc.nl/lazarus/index/fclweb


Let's say C++ code can be executed OVER 9000% faster than Python.

However this does not imply a web app written in C++ will run even 1% faster than one written in Python unless the performance bottleneck is code execution.

If the performance bottleneck is instead the database server (which it almost always is) then choosing C++ for your next webapp would be a _very_ masochistic premature optimization.


Way back in 2000, when we launched Planetarion (http://en.wikipedia.org/wiki/Planetarion), we used C++ with a custom webframework, and CORBA for communicating with the database.

At our peak late in 2002, we served about 320 million dynamic webpages a month using three desktop Pentium 3's for webservers, and a dual CPU P3 for the database. No caching, as that wasn't needed.

Blazing fast, and not all that difficult to work with once the basic framework was solid and in place.


He does have a point about efficiency, or about delivering a single app ... but you also get those advantages with a Java, or a .NET, or even an Erlang or Haskell app which are reasonably efficient ... and still, you won't have to deal with segfaults.

Also, if you want extreme scalability, like being able to serve 10000 requests/sec on a single server ... sorry, but raw performance doesn't cut it ... see this article for instance ... http://www.kegel.com/c10k.html.

Not to mention that the most usual bottleneck is the database (how many apps can you build that don't use one?). So even if you build the fastest web server in the world, if you're using an RDBMS you're going to end up with 100 reqs/sec, unless you're sharding or caching that data.

The bottom line is ... if you want extreme scalability, I don't think C++ is going to cut it, and you're going to invest a whole lot more in optimizations that are already done in more mature web frameworks.

Well, unless you have Google's resources and skill.


"Also, as often overlooked by anyone over 30, you have to handle UTF8 to Unicode conversion, but this is easily achieved in a 10 line function."

lol.


I had to read that a few times. What is he talking about?


    wchar_t *
    utf2wide(const char *utf)
    {
        size_t len;
        wchar_t *wide;
        len = mbstowcs(NULL, utf, 0);
        wide = malloc(sizeof(*wide) * (len + 1));
        mbstowcs(wide, utf, len);
        return wide;
    }


That is definitely the most to-the-point response I've ever seen.


Because the result of malloc is not checked, it can take the whole web server down if it returns NULL. Which can happen.


That's C, not C++. Geesh :P


I hear C++ can call C functions these days.


It's amazing the bridges technology can create these days. cough


Excellent satire.


I thought so too, until I got to the part where he claims it's running rhymebrain! Still can't figure out whether he's joking or not. Naw, on second reading, definitely sarcasm.


The reason why dynamic scripting languages are more appropriate for web applications than C++ is simply that the bottleneck is somewhere else - namely, the Internet is slow enough to make the performance of the server-side code irrelevant. That can very easily change in the future.


Roundtrip latency is certainly going to add up to a substantial chunk of time, but that's not much excuse to discard performance considerations. Requests that take say 50ms of processing will take even longer if the box is busy; it doesn't take much to add up and becomes noticeable.

And if your implementation on the server side is very fast, you can do more.


My point was indeed that performance can matter, and will matter even more in the future, since no apparent technological limit on network speed has been reached so far.


If this guy is really running his web server on an Acer Aspire One 512, as he says at http://stevehanov.ca/blog/index.php?id=71 C++ sounds like an excellent choice.


if you're on an embedded device, this is actually not a satire


I've worked on a lot of embedded devices with web services, and it seems pretty typical to just run a JVM and serve web pages out of it.

The few times I've seen actual web stacks done in C++, the end result has been... comedic.


Current embedded devices are likely to have at least 256MB of RAM, 4GB of flash, and as many MIPS as a Pentium III. Not so "embedded" any more, don't you think? If portability is an issue, then I am likely to use C with something like Lua. But I can't imagine for a second abandoning garbage collection before I see proof I can't afford it.


So, serving web pages from an embedded device? Have we finally got to the "your washing machine is on the Internet" era? :)


It's not exactly Facebook, but I've seen plenty of small network-connected appliances provide web admin interfaces. Routers are a common example, I've also used network console servers that do this.


I can't wait for the RESTful API for my alarm clock, and washer. ;-)


For the alarm clock radio with iPod dock I just got for Christmas, I think the RESTful API would be easier to use than the array of buttons on that thing. My wife got the old, red-LED clock radio back out because she knew how to set the alarm on it.


That almost sounds like a dare. :-)


Which is a very valid point, I have never understood why organizations that need massive scalability for simple, stateless, distributed apps would not pursue such a course.


On that note, does anyone know of an ANSI Forth (or fairly portable at least) server that's reasonably robust?


http://colorforth.com/haypress.htm

I'm not sure if this falls under the robust category or.. server :) but it supports Forth and has a total of 360 computers running at 700 Mips or 250 Gips


ObjC is probably a better candidate because it is so dynamic?


See http://news.ycombinator.com/item?id=1046516

  Web apps in Objective C --- the second-least safe programming language
  on the market. Oh please, oh please, build your next huge application
  in this. College tuition for my kids is freaking me out.


That has got to be the dumbest thing I have seen in quite a while. "Smalltalk is garbage-collected. ObjC deals in raw memory addresses. It's actually less secure than C, as I see it" shows a level of fail that is beyond explaining.


I do sometime wish Apple would bring back the original WebObjects framework with EOF. The Java version is not much fun.


I think he does a good job of humorously pointing out the difficulties involved with writing a C++ Webapp, but I don't doubt that he's serious about the performance characteristics being a good thing.


Anyone else remember ISAPI?

I still use it. Every day.

It really isn't as terrible as you'd think.



Language {x} would be good for next generation web apps if there were web app libraries for it. Typically those are not written in {x}.


Was anyone else let down that this was a sarcastic thing and not real?


Check out OKWS (http://www.okws.org/doku.php?id=okws) if you're interested in developing webapps on C++.


Why believe it's satire? He's got an actual app already deployed and a full history of blog posts demonstrating his knowledge and interest in this field.


Here's a link to the C++ code for his app: http://stevehanov.ca/blog/index.php?id=8


I just drank a couple of Pliny the Elders with my neighbor, and now I read this. My head is definitely spinning now.


Ending up at C++ because of design constraints is understandable... starting there is idiocy.


I missed the <sarcasm> tag.


He forgot to encode the character entities.


From the article:

I gave a tongue-in-cheek talk on how C++ can fit in to a web application.


The truth is, my company maintains a C++ web application. It's implemented as an Apache module and employs some very interesting in-memory shared structures.

And, like I said, I missed that part.

It was not proper markup ;-)


He's lying, C++ apps are not portable.



