Hacker News
Unusual speed boost: size matters
142 points by othermaciej on Aug 9, 2013 | 72 comments



the updateCachedWidth example probably gives the wrong idea to a lot of folks.

You would be just as well off making computeWidth const/pure/readonly/whatever

The compiler can even detect if it modifies anything and mark it for you. In fact, better compilers will compute mod/ref information and know that m_cachedWidth is not touched over that call.

However, LLVM's basic stateless alias analysis (and LLVM is what Apple is using, at least) is not good enough to do this in a case like this one (there are some exceptions where it might be able to; none apply here).

This is actually a great example of how improving pointer and alias analysis in a compiler buys you gains, and not an example of "how you should modify your code", since you generally should not modify your code to work around temporary single-compiler deficiencies without very good reason.

Especially considering how quickly compiler releases are pushed by Apple/et al.


I agree with the 'make things const' part, but I would expect m_cachedWidth to be mutable. I'm too lazy to check that now, but if so, that would not help here.

Even if it is not, I still think this is an example of "how you should modify your code". Reason? Doing

  temp = foo();
  temp *= bar();
  m_member = temp;
keeps your state consistent in case bar() throws. I would use it even if bar() is known not to throw, because you can't know what the future will bring, and because what you 'know' sometimes isn't true. Defensive programming, when it is as cheap as in this case, is a net benefit.
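As a minimal sketch of that pattern (class and function names invented here, with the two calls stubbed out to return dummy values):

```cpp
// Hypothetical illustration of the commit-at-the-end pattern: the cached
// member is written only after both calls have succeeded, so if either
// throws, m_cachedWidth keeps its old, consistent value.
class Box {
public:
    void updateCachedWidth() {
        double temp = computeWidth();   // may throw
        temp *= deviceScaleFactor();    // may throw; m_cachedWidth untouched
        m_cachedWidth = temp;           // single commit point
    }
    double cachedWidth() const { return m_cachedWidth; }

private:
    // Stubs standing in for the real computations.
    double computeWidth() const { return 100.0; }
    double deviceScaleFactor() const { return 2.0; }
    double m_cachedWidth = 0.0;
};
```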


Most low-level C++ projects (including Webkit and Gecko) don't use C++ exceptions.


True enough, but we strive for consistent practice - coding in one way for project X, another for project Y, is error prone and doesn't scale. That handy dandy defrobolizer you wrote for WebKit? You can't really reuse it in another project without rewriting it to be exception safe. You should vary from good practices only when it makes sense, not use them only if you can make a case for using them in this specific instance. Anyone can come along later, eyeball your code, and understand your intent and see the potential problems. Magically code for a very specific environment? Good luck. Also, good luck when your next project requires exceptions or what have you, because you will be deeply out of practice coding for them.

Obviously the above can be taken too far, but programming for const correctness, mutability, and safety in the face of exceptions should be in everyone's toolkit, and the default way they program in C++ (IMO, naturally).


Keep in mind the example is just there to illustrate a type of problem. The real use cases were obviously more complicated.

Some comments: the compiler does not do magic. If you call a virtual function or call an address from another library, there is simply no way for the compiler to know the side effects at compile time.

The keyword "const" does not help the compiler to optimize this kind of code. Const is mostly a tool for developers, you can ignore it with mutable, const_cast, etc.

In many cases, the kind of code in the example will be optimized properly:

- If the code is simple enough and all the functions are in the same compilation unit, Clang will figure out the dependencies and optimize it properly.

- If the code is simple enough and the code is spread across several compilation units, Clang's link-time optimizer does an amazing job at finding dependencies.

The point of this example was more to illustrate a point about code clarity. We should not hesitate to make code more explicit, use extra temporary variables, etc. Those extra lines help both the other hackers and the compiler.
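To make the temporary-variable point concrete, here is a hedged sketch (all names are invented; opaqueCall() stands in for any function whose body the compiler cannot see):

```cpp
// Copying a member into a local before an opaque call lets the compiler
// keep the value in a register: a local whose address never escapes
// cannot be modified by the callee, while a member reachable through
// 'this' must conservatively be reloaded after every such call.
void opaqueCall();  // imagine this lives in another library

struct Row {
    int m_width = 0;
    int total = 0;

    void accumulate(int n) {
        int width = m_width;    // explicit temporary
        for (int i = 0; i < n; ++i) {
            opaqueCall();       // compiler must assume m_width may change...
            total += width;     // ...but 'width' provably does not
        }
    }
};

void opaqueCall() {}  // stub so the sketch is self-contained
```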


"Some comments: the compiler does not do magic. If you call a virtual function or call an address from another library, there is simply no way for the compiler to know the side effects at compile time."

False. There are annotations on plenty of these libraries in header files, and you can determine virtual call side effects in plenty of cases through various forms of analysis.

"The keyword "const" does not help the compiler to optimize this kind of code. Const is mostly a tool for developers, you can ignore it with mutable, const_cast, etc."

Let's just say I understand what const and the various attributes I mentioned do, where they are useful, and how compilers use them. I disagree with your statement. Note that const where I mentioned it (i.e. computeWidth() const) would make the this pointer const, and const_casting that away would not be legal, AFAIK.

Unless the member variable was marked as mutable, the compiler would know it is readonly.
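A small sketch of what is being claimed about const and mutable (names invented):

```cpp
// Inside a const member function, 'this' points to a const object, so
// writes to ordinary members are rejected at compile time; only members
// declared 'mutable' may still be written.
class Shape {
public:
    double width() const {
        // m_width = 1.0;     // would not compile: *this is const here
        ++m_queryCount;       // fine: m_queryCount is mutable
        return m_width;
    }
    int queries() const { return m_queryCount; }

private:
    double m_width = 4.0;
    mutable int m_queryCount = 0;
};
```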

"The point of this example was more to illustrate a point about code clarity. We should not hesitate to make code more explicit, use extra temporary variables, etc. Those extra lines help both the other hackers and the compiler."

This is not what was written in the article (though I'd agree with your statement about code clarity), and as I showed, those extra lines do not help a good compiler.


What a fantastic catch. This is also one of Meyers's Effective C++ tips: always use const when applicable.


Wrong const. He's talking about the gcc function attribute extension, which most of the time is too strict, so recommending its general use isn't a great idea.

C++ const has no effect on optimization, since it can be cast away.

And being aware of aliasing issues is a good idea in general; nothing the compiler does can fix this in the general case (e.g. if computeWidth() is located in an external shared library it's basically impossible for the compiler to determine that it can't modify m_cachedWidth)


"And being aware of aliasing issues is a good idea in general; nothing the compiler does can fix this in the general case (e.g. if computeWidth() is located in an external shared library it's basically impossible for the compiler to determine that it can't modify m_cachedWidth)"

This is, of course, false; it would be more accurate to say "in most open source compilers, ....". Doing compile/dynamic-link/analyze points-to analysis is at least 15 years old (papers published), and likely much older.

There is nothing that fundamentally stops the compiler from knowing things about external shared libs. At some point, the binary is being linked against those libs. You can store the necessary info in those libs, and then use it at link time

(This is in fact, one of the premises of LLVM and having a good, cheap, complete on-disk representation).

In any case, in this example,

1. you don't need the function attribute, just the normal C++ function modifier (It's too early to remember if that is the right C++ terminology) would do, since m_cachedWidth is clearly a member.

2. It's not in a shared library, clearly :)


Link time for shared libraries is run-time, so unless you're advocating everything be JITed the compiler still cannot know in the general case whether an external function modifies random memory.

1. I'm relatively sure that const member functions are still allowed to write into pointers that could potentially lead back to the object.

2. No way to know that just from the source. No way to know it's a member function either - if it isn't then 1. is irrelevant.


> C++ const has no effect on optimization, since it can be cast away.

Modifying an object that has had its const-ness cast away is undefined behaviour.


Sorry, I should have been more specific.

C/C++ const has no effect on optimization, unless it's on a global/static object and the compiler can see the original declaration.

I'm not going to look it up in the C++ spec, but in C it's only undefined behavior if the original object was const, and it would make sense for C++ to be the same.


It doesn't matter whether it's static or automatic storage duration.


But if you have a const reference to a non-const object, it's legal to use a const_cast to remove the const part. This is the more likely scenario, since most objects aren't const.
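A sketch of the distinction (function names are mine): const_cast itself always compiles; what matters is whether the object behind the reference was originally defined const.

```cpp
// Writing through a const_cast is fine when the referent is a non-const
// object that merely arrived through a const reference; it is undefined
// behaviour when the original object was itself defined const.
void bump(const int& r) {
    const_cast<int&>(r) += 1;  // legal only because callers pass non-const ints
}

int demo() {
    int x = 5;   // not const, so bump(x) is well defined
    bump(x);
    // const int y = 5; bump(y);  // would compile, but is undefined behaviour
    return x;
}
```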


I may have missed it, but were there any stats about the actual performance gains? It often mentioned binary size etc but nothing about the impact it had.


I, too, found it kind of odd. The bulk of the savings they achieved came from dropping support for the Chromium browser (since Chromium no longer relies on WebKit) and C++ 11. This was a really bizarre article because it had a lot of details about how to reduce the size of a C++ binary but not what that actually means to performance.

What did that 6% binary size reduction buy them in startup time? 10ms? 300ms?

What about memory consumption after startup?

Did tests indicate that the memory locality had improved CPU performance in noticeable manner?


The problem is the performance gains were measured on a patch by patch basis over a period of about a year.

Nobody kept all the numbers, and digging them up after the fact is more work than I have time for.

I am sorry I no longer have the actual numbers. To give an idea of the order of magnitude:

- For startup speed, measuring the cold start of a new WebProcess, the size of the WebCore binary seems to have a direct relation to the time it takes to start the process. Cutting 5% of the binary gave about a 5% reduction in startup time.

- The inlining improvements gave a runtime boost on the order of a few (single digit) percent. It was usually an improvement across many benchmarks rather than being specific to one part of WebCore.

- Some changes had surprising results. I don't remember specifics, but some changes (unrelated to initialization) improved startup time without changing runtime performance in any measurable way.


I came here to say this. It's weird to talk about performance without actually showing any performance figures.

I'm all for removing old code, but if you're going to claim performance gains then why not measure those?


Exactly. You really should avoid doing performance optimizations without measurements that show improvement as well as provide some coverage against performance regressions.

Here's an example, Thrift (as used in Hector, a Cassandra client), had someone make a performance improvement:

https://issues.apache.org/jira/browse/THRIFT-959

The discussion has a lot of "shoulds", and one measurement of latency distributions, but no measurement of typical workloads or bulk inserts. Turns out, that caused at least a 30% performance regression:

https://issues.apache.org/jira/browse/THRIFT-1121


Or, even better than the double temporary:

inline void updateCachedWidth() { m_cachedWidth = computeWidth() * deviceScaleFactor(); }

Gee, was that line so hard to read? No, it's easier! (Might have just been an example though.)


I was also wondering why this wasn't done, it must have been an example that was reduced in size.


Came here to ask about this, found your answer. Thanks.


Does anyone else find it crazy that the absolute size of WebCore is 38 MB? That's larger than the Linux kernel which includes a bunch of drivers.

If I understand WebKit's architecture correctly, that doesn't even include chrome (the visible UI, not Google's browser), JavaScriptCore, platform-specific glue, and especially no auxiliary files (certificates, icons, the "broken image" sign, ...).

Sometimes I long for the good old days where a browser used to fit on a floppy disc (Opera).

I wonder if someone has done analysis on what features make browsers so complicated. I could imagine that 20% of the code could handle 80% of the features (as so often). You could have a 'lite' HTML subset that's targeted on rich documents, rather than rich client webapps. Something like that would be great for older computers or mobile computers.

Going a bit further, I know there is a lot of crazy stuff in WebKit... e.g. neural networks try to predict which links you'll click on, based on previous behavior, mouse movements, etc, and then the browser prefetches likely pages. There are runtimes for NaCl, PNaCl, Flash, there's a PDF viewer (some of these are plugins), there is a VNC client, support for a bunch of different rendering models (layered HTML elements, Canvas, 3D), media support (codecs), support for webcams and microphones, peer-to-peer communication, and much more. Phew.

I guess a large chunk of this stuff should be in the OS, so that other apps could benefit from it. And another large part of it should be in plugins, so the browser can benefit from all the codecs on the system, for example.


>I wonder if someone has done analysis on what features make browsers so complicated.

There is a visualisation of the chrome binary here: http://neugierig.org/software/chromium/bloat/ I'm not sure how up to date it is now, but it gives a vague idea.

>I guess a large chunk of this stuff should be in the OS, so that other apps could benefit from it. And another large part of it should be in plugins, so the browser can benefit from all the codecs on the system, for example.

That is the case for Safari for example (using PDFKit for pdfs etc and system codecs for video and audio). Mozilla like to bundle everything with Firefox because they view Firefox as an OS itself, rather than just another application for viewing html documents. Most of the huge size and complexity in modern browsers is due to people trying to turn the browser into an operating system: https://en.wikipedia.org/wiki/Inner-platform_effect


I think a lot of what bulks up the rendering parts of the code is the handling of invalid web code. Not JavaScript, it's a programming language so throwing an error and stopping is expected behavior. But for HTML and CSS, browsers are expected to Do What I Mean, Not What I Wrote. If they only rendered valid code, they'd be a lot smaller.

But there are a lot of new APIs under the HTML5 umbrella. They're all accessed using JavaScript for some, a lot of the code will be in WebCore. Here's a bunch: http://www.netmagazine.com/features/developer-s-guide-html5-...

I'm sure Canvas and WebGL add a lot to WebCore.


> There are runtimes for NaCL, pNaCL, Flash, there's a PDF browser (some of these are plugins)

Really? While Chrome has all of those built in, the other WebKit browsers don't so why would it be in the WebKit source tree (especially after the mutual purges of the Blink/WebKit split.)

Also, all of those (except maybe the PDF viewer) in Chrome are plugins, and they are PPAPI (which was Chromium specific, not used by other WebKit browsers) rather than NPAPI plugins, and Flash and maybe the PDF viewer aren't bundled with Chromium (just Chrome), so it's really weird that anything related to them would remain in WebKit.


You're right, I mixed up WebKit and Chrome there.


NaCl, PNaCl, Flash, and PDF are not part of WebKit (unless you mean the NPAPI support needed to make it work; even if you're just talking about NPAPI, PNaCl needs Pepper, which is part of Chromium, not WebKit).


Remember that the browser game is not just about features, but also about speed. And the fastest way to do something is often not the shortest code.


It is not mentioned in the article that writing "inline" does not automatically make the function inline. It only gives the C++ compiler a hint that the function might be worth inlining. The compiler can inline a function even if it has no inline keyword, and can decline to inline one that has the keyword, if it decides inlining would be inefficient.


In WebKit, we have macros to always inline or never inline for our target compilers, so we can force the issue if the compiler won't take the hint in a place that matters.
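Such macros would look something like this sketch (modelled on the attributes WebKit's Compiler.h wraps; the exact spellings there may differ):

```cpp
// Force or forbid inlining via compiler-specific attributes, falling
// back to plain hints on compilers without the extension.
#if defined(__GNUC__) || defined(__clang__)
#define ALWAYS_INLINE inline __attribute__((always_inline))
#define NEVER_INLINE __attribute__((noinline))
#else
#define ALWAYS_INLINE inline
#define NEVER_INLINE
#endif

ALWAYS_INLINE int fastPath(int x) { return x + 1; }
NEVER_INLINE int slowPath(int x) { return x - 1; }
```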


Why does that keyword even exist then?


The primary effect of the inline keyword is to tell the compiler that a function is not to be exported from the current object. e.g.

foo.h:

int bar(int i) { return i; /* awesome function! */ }

foo.cpp: #include "foo.h" … bar(5) …

At this point everything is fine, but lets say there's also:

wiffle.cpp: #include "foo.h" … bar(6) …

Now both foo.o and wiffle.o will contain a function named bar - because they picked up the definition in foo.h, and C/C++ don't (technically) see any difference between a function that has been written inline and a function that was included from a header.

By slapping the inline keyword on the function bar, the linkage of the function changes so that bar won't be exported from any object file that includes it, and so the name collision will no longer occur at link time.

There are a bunch of other benefits, mostly along the lines of "the function doesn't need to be exported, therefore if it's not used I don't need to include it in the object".


It allows multiple (equivalent) definitions from different translation units to coexist in the same program. That's the only actual meaning attached to it by the standard.


Try putting this in an .h file that you include in several places:

void hi() { int x = 3; }

without the inline and you will get a complaint that hi() is multiply defined at link time. So, one use case.


Because some compilers do make use of that hint in useful ways. They're just not required to, nor are they required to do so in any specific situation, much like the register and volatile keywords.


volatile is not a hint, it's a requirement. You couldn't control hardware without it (compiler: "you're only writing this variable, never reading it. I'll skip those writes to speed things up". Programmer: "why isn't my program writing anything to this I/O port?")
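A sketch of that hardware-control case (a plain variable stands in for the memory-mapped register, which is an assumption of the example):

```cpp
#include <cstdint>

// Through a volatile lvalue, every store is an observable side effect the
// compiler must emit in order, even though the value is never read back;
// without volatile the first two stores below could be removed as dead.
uint32_t fake_io_register = 0;  // stand-in for a memory-mapped I/O address

void pulse(volatile uint32_t* reg) {
    *reg = 1;
    *reg = 0;
    *reg = 1;
}
```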


I suggest reading the C standard, where you'll find that, despite the many words used about volatile, it doesn't really guarantee you anything at all. I especially love this sentence:

> "What constitutes an access to an object that has volatile-qualified type is implementation-defined."

The behavior of I/O ports and the meaning of any attempt to access them is beyond the scope of the standards. The "requirements" that exist for volatile are simply an absurdity outside the (new in C11/C++11) threading context, since the implementation is not actually required to do anything useful.


Without a doubt, WebKit is one of the most interesting parts of Apple. A community of open source developers that accept contributions (I'm assuming) with a developer-focused open blog with tips on writing C++ - a language not even particularly widely used elsewhere in Apple.


C++ is widely used by the Apple OS teams, but they don't expose C++ APIs directly.

A lot of the plain-C APIs with a CoreFoundation style interface are actually C++ underneath. (No insider information necessary, this is easy to see in stack traces and process call stack samples.)


I concur. When I was there, they were pretty clear that everything I made belonged to Apple (which was total bullshit according to California law), that nothing we made would ever be open sourced due to patent issues, and that I was to never say anything about the company in public.

I wonder how their team could swing that culture without tripping over legal at every turn.


Isn't WebKit based on KDE's KHTML?


Back in 2001, Apple forked KHTML and used it to create WebKit. Some parts have been backported to KHTML but the projects diverged a decade ago.


> and that I was to never say anything about the company in public.

This remains the case to a large extent. This makes working with Apple and Microsoft in standards groups a bit challenging at times, since they won't actually comment on whether they're even thinking about implementing a standard, or whether they would be willing to implement it as written, until they suddenly ship it.

And if they're _not_ shipping it you have no way to tell whether that's because they never will due to some fundamental issue they perceive, or whether they're basically fine with the idea but just haven't gotten around to finding resources to implement it yet.


In the case of Web standards in particular, you can usually see the checkins in our public source tree well before we ship it. But per policy we will rarely publicly commit to shipping something or not ahead of time.


Historically, the story with visibility on checkins to iOS Safari was not that great.

But yes, for desktop Safari usually one can get some idea by scouring checkin logs.


Agreed. Apple's involvement with BluRay in particular was pretty hilarious.


If Apple could have developed their own proprietary rendering engine they would have. Jumping on KHTML meant they could get a working, OS X UI compliant browser when IE for OS X was no longer being maintained, even if the guts of that app had to be shared with others.


While that's true for the LGPL components of WebKit, they were under no obligation to open source the BSD components, which they largely wrote.


> C++ - a language not even particularly widely used elsewhere in Apple.

Isn't this why Objective-C++ is a thing?


Interesting that this article doesn't mention profile guided optimization. In my experience, PGO is able to eliminate a lot of the performance problems associated with unnecessary inlines and rarely called functions eating up cache space.

The major downsides are that you can only optimize what the profiler can see and running the thing to make a build takes forever.


The major compilation targets for WebKit for Apple are MacOS and iOS, and compilation is with clang/clang++/llvm.

And clang's PGO support is not very good so far, so there isn't much to talk about...


Looking at example 1, I wonder why don't languages like C++ or D or Go just add a "pure" keyword for functions that don't modify the global environment or their arguments? This would help the optimizers a lot, I imagine. And yeah, I get that there could still be roundabout side effects, it's not Haskell, but the compiler could just trust the programmer that he knows what he's doing when he sticks the "pure" keyword before a function definition.


> Looking at example 1, I wonder why don't languages like C++ or D or Go just add a "pure" keyword for functions that don't modify the global environment or their arguments?

The function in example 1 is modifying a member variable, and there is indeed a keyword that requires functions not to modify the class they operate on: const. It's very powerful, and by a long shot my favourite feature of C++.

That said, the function in question actually has to modify a member variable.


The problem here isn't updateCachedWidth(), which indeed modifies a member variable, but deviceScaleFactor(), which doesn't. Since LLVM doesn't infer that deviceScaleFactor() leaves m_cachedWidth unmodified, it forces a reload of m_cachedWidth after the function call in case the value changed.


It was an example. Replace deviceScaleFactor with some other function that modifies, say, m_foo, and you have the same problem. And no, const+mutable is not an answer (in my example), because the function does in fact modify the class in a way important to the caller.


There is[1] a `pure` attribute in GCC extensions to C and C++:

    int square (int) __attribute__ ((pure));
But I can't say how smart the compiler is in handling those attributes.

[1] http://gcc.gnu.org/onlinedocs/gcc/Function-Attributes.html
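A sketch of how that attribute is typically used (guarded so it also compiles on compilers without the extension; function names are mine):

```cpp
// 'pure' promises the function reads but never writes non-local state,
// so the compiler may fold repeated calls with the same argument into one.
#if defined(__GNUC__) || defined(__clang__)
#define PURE_FN __attribute__((pure))
#else
#define PURE_FN
#endif

PURE_FN int square(int x);
int square(int x) { return x * x; }

int twiceSquared(int x) {
    return square(x) + square(x);  // eligible for a single evaluation
}
```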


"Pure functions are functions which cannot access global or static, mutable state save through their arguments". http://dlang.org/function.html



While I'm very appreciative of the contributions Apple made to the web with WebKit, most of the recent innovations and upkeep has been thanks to Google.

What is the future of WebKit now that Blink has been introduced? Will Apple spend considerable resources keeping an open-source project at the bleeding-edge considering it doesn't really make them any money? Should Safari just be scrapped? It only accounts for < 4% of the market share of non-mobile browsers.

http://en.wikipedia.org/wiki/WebKit

http://en.wikipedia.org/wiki/Blink_(layout_engine)


I think Apple is pretty well set on mobile browsing as the future growth market. They don't seem to mind cannibalizing desktop sales for iPads, iPhones, and iPod Touches. In mobile browsing, Safari has ~60% market share. As long as mobile browsing keeps growing, Safari will remain an important asset for Apple.

http://www.netmarketshare.com/browser-market-share.aspx?qpri...


I'm not sure I trust that site; they also claim that IE had 56% of the desktop market share during 2013, while other reports show very different data: http://www.netmarketshare.com/browser-market-share.aspx?qpri...

http://en.wikipedia.org/wiki/Usage_share_of_web_browsers

Here is StatCounter for Mobile for the last 6 months, the iPhone browser with 23%: http://gs.statcounter.com/#mobile_browser-ww-monthly-201207-...


You just posted the netshare link again instead of a StatCounter one. StatCounter's Mobile OS shows iOS at about 24% last month. http://gs.statcounter.com/#mobile_os-ww-monthly-201207-20130...

But I don't think they count iPads and other tablets as "mobile." Their Mobile Browser chart listed iPhone and iPod touch separately and didn't list iPads at all. Mobile Screen Resolutions doesn't include iPad resolutions. Browser Versions (Partially Combined) has a separate entry just for Safari iPad (which actually exceeds Safari 6.0 and Safari 5.1, combined).


WebKit did fine before Google ever joined the project, and I'm sure it will do fine with them gone.


I'm glad this is being discussed, because for many years inlining functions was considered a performance panacea in C++, with no regard to the size of the object code that was being generated, or the effects inlining was having on instruction cache performance.

The C++ community has brought itself all kinds of complexity and long compile times all in the name of performance which, in my mind, was always pretty suspect.


Sorry, but I will recap his thinking here:

    * Try to be explicit { rather than implicit }
    * Carefully consider inlining { large blocks of code }
    * Do not use static initializers { for infrequency or trivialities }


There's a compiler optimization to automatically improve icache utilization by moving rarely executed code branches far out of line, so that they don't take space in the loaded cache lines when the straight line code executes. (This still leaves the wasted disk space and possibly RAM)

GCC docs sound like the trick would be -fprofile-use, -freorder-functions and -freorder-blocks-and-partition - after a representative profiling run.

A representative profiling run for a shipping binary is a problem of course, JITs win here. DEC had a dynamic binary reoptimization framework in the 90s called DYNAMO that could do it for AOT compiled binaries.
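A sketch of what such a build might look like with those GCC flags (file and workload names are placeholders; the flag names are from the GCC manual):

```shell
# Training build: the instrumented binary writes *.gcda profile data when run.
gcc -O2 -fprofile-generate -o app app.c
./app < representative-workload.txt

# Optimized rebuild: use the profile to separate hot and cold code.
gcc -O2 -fprofile-use -freorder-functions \
    -freorder-blocks-and-partition -o app app.c
```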


I liked this article, but the last 3 graphs were really poor. By not listing the actual binary size in some denomination of bytes for the last 3 graphs (only using %s), I get less information from the charts. A much worse problem is the date formatting, which squishes all the numbers together so that they look like one big string. The first graph was much better.


Others have already covered a lot of things, so I'll just say that I'm sort-of impressed by:

> The second big drop is the result of removing features and cleaning code that came from the Chromium project.

In the graph, the second big drop is ~5% of the initial code size; removing the Chromium code actually reduced binary size more than the inlining fixes did.


Try to be explicit in your coding to help the compiler understand what you are doing - so obvious. Clarity is self-evident.


Is WebKit written in C++ instead of Objective-C?


Yes, WebKit is originally based on KDE's KHTML, and thus written in C++. It was heavily modified by Apple and later by Google. The core is platform-independent C++, and then there are platform-specific parts for the various environments it runs in (C/C++ for GTK, C++ for Qt, ObjC for OS X, ...).

With the clang or gcc compilers, you can easily link ObjC and C++ together. To some extent you can even mix them in one file (ObjC++), though I don't have experience with that.



