JavaScript performance and a Webkit hotspot

pennig · on May 2, 2011

Looks like WebKit has problems appending to a very long string. The less we can do that the better, so I came up with this solution:

  var dest = 'apple';
  while (dest.length < 100000) {
    var out = '';
    for (var i=dest.length-5, end=i+10; i<end; ++i)
      out += dest[i];
    dest += out;
  }

That brought the time from about 4 seconds to a more respectable 90 msec.

nod · on May 2, 2011

That reduces the number of string appends, sure. But it is well known (see either "The Good Parts" or "High Performance Javascript" that appending strings is quadratic in JavaScript, and that the best way to assemble a large, multipart string is arrayOfStrings.join('');

* And by quadratic I mean "copies the whole string every time" such that an append loop is N^2 while a push+join is linear.

rimantas · on May 2, 2011

  But it is well known (see either "The Good Parts" or "High
  Performance Javascript" that appending strings is quadratic 
  in JavaScript, and that the best way to assemble a large, 
  multipart string is arrayOfStrings.join('');

It's a bit fuzzy now, modern browser optimized that quite a bit, so this mainly applies to older versions of IE.

mraleph · on May 2, 2011

Array is an appropriate datastructure for this algorithm. String is not.

    var dest = 'apple'.split('');
    var j = dest.length;
    var start = new Date();
    while (j < 100000) {
      var offs = j - 5;
      for (var i = offs; i < offs + 10; ++i)
        dest[j++] = dest[i];
    }
    dest = dest.join('');

=> 7-10ms in the latest Chrome.

mjs · on May 2, 2011

Arrays do seem to be the way to make Chrome happy, although unfortunately they're slower for everything else:

http://jsperf.com/backreference-copy

I don't see why you say an array is a more appropriate datastructure for the algorithm, though--that seems very much implementation dependent. And at the JS level, arrays support operations such as splice that strings do not, suggesting that they'd typically be implemented with a more sophisticated datastructure, and hence slower at primitive operations like index access. (Perhaps V8 notices that these expensive operations aren't actually used, and creates a fast, basic version instead.)

mraleph · on May 3, 2011

Array is intended to be a mutable storage with a fast random access and optimistic preallocation of backing store. String is intended to be an immutable storage. At least that's how I see these types.

I also would not expect that VM will use representation optimized for splicing --- you should _always_ expect that splicing will take O(arr.length) and use appropriate data structure when you want better time bounds.

There is a limit to how VM can optimize use of data structures. It can't magically guess that you wanted list or binary tree instead of an Array. Introducing many special cases/heuristics also leads to complexity and as a result to bugs.

masklinn · on May 2, 2011

Not exactly surprised, appending to immutable strings is rarely fast, and rarely considered an interesting hotspot to optimize: you can usually fix it away by using arrays and joining them at the end.

pbiggar · on May 2, 2011

Mozilla JS engineer here. It goes without saying that I don't speak for Mozilla here, or even for the rest of the JS team. And I may misremember stuff, so don't cite me too closely.

Benchmarking is very difficult, and not just in JS. I worked on PHP before, and benchmarking was crap there too. Same in Lua [0]. Same in Python until the Unladen Swallow team packaged up some real-life applications. Even in Java, things were bad. They had SpecJVM98, and it so mischaracterized real Java programs that the Dacapo group went and formed and built benchmarks so they could optimize their research correctly.

What is an ideal web benchmark? Tough question. Parts of Sunspider and V8 tried to look at what the web was and distil it into a useful benchmark - that's a good start [5]. Kraken tried instead to look forward at what the web might be, even though people aren't doing that stuff just yet - that's good too. Perhaps what the web is now would be good, but we don't have anything like that [1] [2].

Add to this that benchmarking is inherently hard, so hard that it's its own research area. How do you weight various benchmarks to produce the really useful single number that we all want to see? That's kinda hard, and back when I read research on this, I didn't see a particularly useful answer.

Let's consider what we'd love to have. Run a web page for a little while, record it somehow, and package that up so that it can be reproduced in a useful way [3]. So what should go in it? I mainly use gmail, google reader, bugzilla, hacker news, and small amounts of other sites. What if you read nothing but Jezebel and Facebook? Will my benchmark help you? Will my benchmark even help me in 6 months after those sites are updated? Whose view of the web is the right one, at which time, and how will we coordinate that into a benchmark?

Which brings us back to the original post. That's benchmarking a decompressor, ported from C, which doesn't sound like a great real-world application to me. Its hot-loop seems to be string appending, which Firefox is great at because of ropes [4]. We actually have a few benchmarks like this. One of the V8 benchmarks uses a Scheme-to-JS compiler to produce code. And Emscripten, an LLVM-to-JS compiler, produces JS apps in the same way to OP did, and we measure these periodically to make sure we don't regress. But the OP's benchmark is hardly representative of real web usage, and which particular implementations did well here is at least partially down to luck.

So to summarize, benchmarking is hard, no-one is doing it right, I'm not even sure we know what right is, but most of us can agree that ported C apps are probably not representative of real browser usage.

And I leave you with a final reminder that I don't speak for Mozilla.

[0] This was true when my mate worked on Lua two years ago.

[1] If you can package Gmail and Facebook into a nice platform-independent package that we can run and get one single number representing its performance, we will happily use it and worship at your feet.

[2] Microbenchmarks (how fast is string appending, for example) aren't the answer either.

[3] On multiple platforms, while doing the same thing on each.

[4] Other posters commented that string-appending is an O(n^2) operation - this isn't quite true. The naive implementation is, but JS vendors haven't been naive since at least 2007.

[5] While not representative of the real web, these benchmarks helped us run our little JS performance war, right from slow-and-sad up to fast-and-awesome.

kqueue · on May 2, 2011

I don't think this is webkit specific. Chrome uses a different JS engine than other webkit based browsers.

Scaevolus · on May 2, 2011

When code runs slower in one JS implementation than another, why do people say that "real-world" code is slow?

Just because the JIT missed an optimization-- in this case, repeatedly appending to a string-- doesn't mean that it's a bottleneck for the JS that browser run into in the wild.

People don't often run jslint or jsunzip frequently, they run jquery and do lots of DOM accesses.

mjs · on May 2, 2011

splice() might be able to help out, though how much depends on whether the copying really needs to overlap or not. There's some benchmarks at:

http://jsperf.com/backreference-copy

I couldn't actually get Safari to complete; FF 4.01 is fastest by a long way.

gorset · on May 2, 2011

Appending to a immutable string in a loop is O(n^2). I'm surprised the author is surprised by this. Python and Java is the same.

Sure, it's possible for the vm to optimize this by either using ropes or convert to a buffer based on some kind of code analysis, but you shouldn't bet on it.

tlrobinson · on May 2, 2011

Another unzip implemenation https://github.com/tlrobinson/zipjs (should be easy to adapt for browser use)