The scalability argument doesn't make sense to me. When you're dealing with Google's scale, you need to parallelize horizontally, and you need to design your software in a way that lends itself to horizontal parallelization. For that type of software, how much you squeeze out of a given machine is almost irrelevant to scalability (other than the cost of maintaining the extra machines). Of course, once that cost goes up too much, you can always rewrite in C.
The cost starts out too high to justify releasing something that's not optimized when you're operating at Google's scale, especially since the development costs are largely up front whereas the ongoing compute costs keep going for at least a few years. If they didn't keep an eye on this stuff, those factors of 2-10x would eat them alive.
For some value of M and N, having N million users around for M months means that development costs are actually cheaper than compute costs, and I'd assume that they've now reached a point where most new products expect to see more than that magic number of users. A few engineers for a few extra months is only ~$100K, and I have no trouble believing that many Google products cost them at least that much over their lifetimes due to resource consumption.
The rest of us can safely ignore all this because our products actually need to grow before we pass that threshold, at which point we deal with the issues; Google is in an enviable position where no optimization is premature.
Even for the rest of us, though, given the performance differences between Python and Java (best case 2-4x, worst case more like 40x) relative to the productivity differences (I can't believe these are more than 10x, even for someone very comfortable in Python and merely competent in Java), I'd suspect that many high-use software projects, even outside of behemoths like Google, are actually cheaper in the long run done in Java than they would be in Python.
Prototypes are another issue altogether, but I haven't seen anything that says Google is discouraging people from doing those in whatever language they want; AFAIK it's production code that we're talking about here.
Some concrete numbers on the productivity differences would help.
The chart in Software Estimation by McConnell, adapted from Software Cost Estimation with Cocomo II, says that projects are 2.5x bigger in C than in Java, and 6x bigger in C than they would be in Perl or Smalltalk. Assuming that Python is equivalent to Perl, that would mean that Java requires 6 / 2.5 = 2.4 times as many lines as Python. Those estimates are from 2000. Both languages have improved since then, but I'll go with that estimate because I don't have more recent figures backed up by quantitative data rather than someone's opinion.
Research across many languages suggests that lines of code/developer/day are roughly constant, so Python development should average 2.4 times as fast as Java.
Let's suppose that your programmers are paid $80K/year. On average people cost about double their salary (after you include benefits, tools, office space, etc.), so each programmer costs you $160K/year. Per Python programmer replaced you need 2.4 Java programmers, which means an extra 1.4 × $160K = $224K/year in cost. Let's suppose your computers have an operating cost of $14K/year. (I am throwing out a figure for regular replacement, networking, electricity, sysadmins, etc.) Then the cost of each Python programmer replaced will cover $224K / $14K = 16 more machines.
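To make that arithmetic easy to rerun with your own figures, here is a minimal sketch; the salary, overhead factor, productivity ratio, and machine cost are the assumptions above, not measurements.

```python
# A minimal break-even sketch. All inputs are the assumed figures from this
# thread -- replace them with your own numbers.
def machines_covered(salary=80_000, overhead=2.0,
                     productivity_ratio=2.4, machine_cost=14_000):
    """Extra machines/year that the payroll savings of one scripting-language
    developer (vs. the equivalent Java headcount) would pay for."""
    loaded_cost = salary * overhead              # $160K fully loaded per developer
    extra_devs = productivity_ratio - 1          # 1.4 extra Java devs per Python dev
    extra_payroll = extra_devs * loaded_cost     # $224K/year
    return extra_payroll / machine_cost          # $224K / $14K = 16 machines

print(machines_covered())                        # 16.0 with the 2.4x figure
print(machines_covered(productivity_ratio=1.2))  # ~2.3 with the pessimistic 1.2x case below
```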
And it can get worse. Research into productivity versus group size suggests that productivity peaks at about 5-7 people. When you have more developers than that, the overhead of communication exceeds the productive work done unless you introduce processes to limit direct communication, and those processes themselves reduce productivity. As a result you don't get back to the total productivity of a 5-7 person team until you have a team of around 20 people. If you grow you'll eventually need to make this transition, but it should be put off as long as possible.
Therefore if a Python shop currently has fewer than a dozen webservers per developer, switching to Java does not look like it makes sense. Furthermore, if you've got 3-7 Python developers, the numbers suggest that a switch to Java will push you over the maximum small-team size and leave you with a large team at much higher expense.
For this reason I believe that the vast majority of companies using agile languages like Perl, Ruby and Python would be worse off if they switched to Java. When you serve traffic at the scale of Google this dynamic changes. But most of us aren't Google.
> Research across many languages suggests that lines of code/developer/day are roughly constant, so Python development should average 2.4 times as fast as Java.
Some people, with more Java experience than me, claim that modern IDEs like IntelliJ make them as productive with Java as they would be with Python. Java's verbosity is "mechanical", and if your tools help you with that, it's hard for me to buy a 2.4x productivity difference. You are also assuming that static typing has no effect on how many people can work together. In Python (which I use for prototyping and appreciate), one often has to wonder "does this method take a class, an instance of a class, or the returned value as an argument?", etc.
Debates on relative productivity are endless. I used the only numbers I have available that are backed by actual quantitative data rather than opinion. I'd love to see more up to date numbers.
However even if you assume that the difference is much smaller, say a factor of 1.2, then the savings (0.2 × $160K = $32K per developer) still buy you an extra 2.3 computers for every developer. Oddly enough, in the various companies I know well with small teams of experienced scripting programmers, I've never seen that high a ratio of webservers to developers, so even then switching to Java doesn't make sense.
On the question of how many people can work together, needing to talk about data types is such a small portion of what people talk about that I would be shocked if it changes the cutoff at which small teams break down, or the size at which large teams become as productive as that peak. That said, I fully agree that Java is designed to let large teams cooperate, and that's likely to matter when you have teams of 50+ programmers. However if the productivity difference really is a factor of 2.4, and we assume linear growth in productivity for large teams, then you'll actually need a team of 50 or so Java programmers to match a team of 5-7 programmers working in a modern scripting language. Given that, if you're working on a Java team below that size, you should seriously ask yourself whether having a team size that requires getting that many people working together is a self-inflicted problem.
I write my MapReduces in C++ the first time. Why? Because writing them typically takes only an hour or two. Running them can easily take a couple days. If I take a 10x productivity improvement for a 10x execution slowdown, my development time goes from 3 hours to 20 minutes, but my execution time goes from a couple days to a month. Not really a great tradeoff.
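A quick back-of-the-envelope version of that tradeoff, taking "a couple of days" as 48 hours and applying the same 10x factors (all of these numbers are rough assumptions, not measurements):

```python
# Total wall-clock cost of one MapReduce job under the two options, using the
# rough figures from the comment above.
def total_hours(dev_hours, run_hours):
    return dev_hours + run_hours

cpp    = total_hours(dev_hours=3,      run_hours=48)       # ~51 hours end to end
python = total_hours(dev_hours=3 / 10, run_hours=48 * 10)  # ~480 hours end to end

print(cpp, python)  # the 10x runtime penalty dwarfs the development time saved
```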
I'm not a C++ programmer and don't know the complexity involved in a MapReduce, but surely there is still a greater chance with C++ that you will shoot yourself in the foot (or chainsaw, take your pick).
I'd say the chance that you shoot yourself in the foot is roughly equal with both languages, but with Python, it's far more likely that 'tis but a flesh wound. With C++, you're likely to sever an artery and need extensive vascular repair.
The point is that with small programs that operate on big data, the cost of shooting your foot is less. There's less code to wade through, so you can find and fix your bugs quickly, and it still costs you less time than you'll lose in execution speed.
Other types of programs have different complexity/execution tradeoffs, and other languages may be more appropriate for them. I actually do the majority of my programming in Python - but that's for other things, where I have to iterate rapidly yet am typically the only one hitting my server. That's very different from a program that you'll write once, run once, and never have to maintain again. (Or one that you write once, run many times, and never maintain again...which describes a bunch of other pieces of code.)
Google may spend about $500,000,000 / year on servers [educated guess, probably within a factor of 2]. More than the cost of 1000 engineers. It's worth it for them to put serious effort into efficiency.
Perhaps their bottlenecks are mostly in bandwidth, memory and disk space? Otherwise, I'm very surprised Google doesn't invest A LOT more into languages and VMs.
Bisection bandwidth is likely the most scarce resource in their environment, with memory following. Not sure how disk and compute would rank. Architecture is the most important tool for optimizing cost on such systems, which is why they put effort into things like Bigtable and MapReduce. Language efficiency does matter when you're buying servers by the truckload, but its impact on cost is linear, whereas mistakes in architecture can be much worse.
You answered your own question. At Google's scale, the cost of maintaining the extra machines justifies writing the application in an efficient language.
E.g. when you have an app written in C that's 10x more efficient than a Python implementation (which is quite a realistic assumption), you will need only 100 servers instead of 1000.
...yet the guy from Google explaining the reasoning claimed higher memory usage as one reason Python was discouraged.
I don't know myself, as I've never written a full, non-trivial, scalable business application side-by-side in Java and Python. But I'm definitely hesitant to disregard what a Google engineer says about it: I'd imagine they have more experience with that sort of thing than most of the rest of us, and I can't believe they'd make technology decisions like that without measurements to back them up.
Is it possible that the benchmarks are not giving realistic estimates about how large apps scale in memory usage?
I thought that in any modern OS you would have the libraries loaded only once and the only thing that is multiplied across processes (or threads) is the working data for that specific thread or process. The overhead should be minimal.
If not, it's an OS problem outside the domain of the Java and Python maintainers.
What we are apparently seeing is that the working data is larger in Python.
> I thought that in any modern OS you would have the libraries loaded only once and the only thing that is multiplied across processes (or threads) is the working data for that specific thread or process.
You'd think that, but apparently mmapping bytecode would be too easy for the JVM people, so each process copies all its bytecode into its heap (modulo the class data sharing kludge). I think Ruby also enjoys this misfeature, and I wouldn't be surprised if CPython does the same.
> What we are apparently seeing is that the working data is larger in Python.
Not surprising since a Java object is a struct but a Python object is more like a hash table.
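A rough CPython illustration of that difference (exact numbers vary by interpreter version and platform, and sys.getsizeof only counts the containers, not shared attribute values):

```python
# A plain Python object carries a per-instance __dict__ (a hash table);
# __slots__ gives something closer to a fixed, struct-like layout.
import sys

class Point:
    def __init__(self, x, y):
        self.x, self.y = x, y

class SlottedPoint:
    __slots__ = ("x", "y")
    def __init__(self, x, y):
        self.x, self.y = x, y

p, s = Point(1, 2), SlottedPoint(1, 2)
print(sys.getsizeof(p) + sys.getsizeof(p.__dict__))  # instance + its attribute dict
print(sys.getsizeof(s))                              # no per-instance dict at all
```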
Sorry, but it is very uncommon for Java to use 10-100x as much memory.
Those benchmarks are pretty irrelevant because all the algorithms in there are short-lived.
As far as I know, CPython uses reference counting for GC. And while this has definite advantages, it creates the potential for memory leaks when using cyclic data structures. Also, the heap can get pretty fragmented, which for long-running processes can leave the heap looking like Swiss cheese.
This doesn't happen with Java, but as a side effect a compacting GC usually allocates twice the heap size needed, since it needs space in which to defragment the heap. The JVM's GC is also generational, separating objects into multiple regions by age, so that short-lived objects can be collected faster and new objects can be allocated at speeds comparable to stack allocation.
In Python, reference counting has the advantage of being cache-friendly and pretty deterministic. And when it comes to web servers that fork processes for requests, the fragmentation problem is alleviated by the fact that each Python process has a short life.
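A small CPython-specific sketch of the cycle point above; note that CPython also ships a supplemental cycle collector (the gc module) that normally reclaims such cycles unless it's disabled or can't run:

```python
# Reference counting alone can never free a cycle: each object keeps the
# other's refcount above zero. CPython's cycle detector (gc) reclaims it.
import gc

class Node:
    def __init__(self):
        self.other = None

a, b = Node(), Node()
a.other, b.other = b, a   # a and b now reference each other
del a, b                  # unreachable, but refcounts never hit zero

print(gc.collect())       # > 0: the cycle detector found and freed the pair
print(gc.collect())       # typically 0: nothing left from this cycle
```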
On memory consumption, yes, Java might need double the heap, and its garbage collection is less deterministic. But it depends on your application ... the JVM ends up using memory a lot more efficiently for long-running processes, although the upfront cost is higher.
The Java 6 -server memory use reported on the benchmarks game site, around 12,000KB - 14,000KB, is the base JVM footprint at default settings, so it probably isn't telling you much that's interesting.
Although you might see a couple of examples where CPython memory use is higher because of buffering before output from multiple processes can be synced.