24-core CPU and I can’t move my mouse (randomascii.wordpress.com)
1011 points by joebaf on July 10, 2017 | 499 comments



Full disclosure: I work for Google on Chrome.

A Chrome build is truly a computational load to be reckoned with. Without the distributed build, a from-scratch build of Chrome will take at least 30 minutes on a MacBook Pro--maybe an hour(!). TBH I don't remember toughing out a full build without resorting to goma. Even on a hefty workstation, a full build is a go-for-lunch kind of interruption. It will absolutely own a machine.

How did we get here? Well, C++ and its stupid O(n^2) compilation complexity. As an application grows, the number of header files grows because, as any sane and far-thinking programmer would do, we split the complexity up over multiple header files, factor it into modules, and try to create encapsulation with getters/setters. However, to actually have the C++ compiler do inlining at compile time (LTO be damned), we have to put the definitions of inline functions into header files, which greatly increases their size and processing time. Moreover, because the C++ compiler needs to see full class definitions to, e.g., know the size of an object and its inheritance relationships, we have to put the main meat of every class definition into a header file! Don't even get me started on templates. Oh, and at the end of the day, the linker has to clean up the whole mess, discarding the vast majority of the compiler's output due to so many duplicated functions. And this blowup can be huge. A debug build of V8, which is just a small subsystem of Chrome, will generate about 1.4GB of .o files which link to a 75MB .so file and 1.2MB startup executable--that's an 18x blowup.
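
To make the header problem concrete, here is a minimal, hypothetical sketch (made-up names, not Chrome code): as soon as you want a method inlined without LTO, or callers need sizeof(), the definitions have to sit in the header, and every translation unit that includes it parses all of that again.

  // widget.h -- hypothetical example
  #include <string>
  #include <vector>

  class Widget {
   public:
    // To inline this without LTO, the body must be visible to every caller,
    // so it lives in the header...
    int size() const { return static_cast<int>(children_.size()); }

   private:
    // ...and because callers need sizeof(Widget), the member layout (and the
    // headers it drags in) has to be visible too.
    std::string name_;
    std::vector<Widget*> children_;
  };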

Ugh. I've worked with a lot of build systems over the years, including Google's internal build system open sourced as Bazel. While these systems have scaled C++ development far further than ever thought possible, and are remarkable engineering achievements in their own right, we just need to step back once in a while and ask ourselves:

Damn, are we doing this wrong?


Not to take away from your point, but in my experience the vast majority of C++ build pipelines, even at major companies, can still be improved. Few people enjoy 'improving the build'; it often touches everything and requires discipline to keep it working. Most of the projects I've worked on have been larger than Chrome. I've seen the compile time for BioShock Infinite go from 2 hours down to 15 minutes with serious work on header use, precompiled headers, and all the other tricks people use. Epic's build system is a pretty good example. There is even an older book, Large-Scale C++ Software Design, that is specifically about this point.

Starting with a full build that initially takes hours and shrinking it to < 15-20 minutes or better seems pretty par for the course for truly large C++ projects. You don't get a fast build process for free, but if the team makes it a priority, a lot can be done.

EDIT: Times mentioned were for a full build; you rarely do a full build, so incremental builds should be the majority. Places that don't make incremental builds 100% reliable drive me crazy and waste so much developer time. This is common, but it's a lame excuse. Just do the work and fix it.


I worked on exactly this problem for Chrome! I agree with all your major points -- in particular, optimizing incremental builds is the most important thing for developer sanity.

Here's a post about what I did: http://neugierig.org/software/chromium/notes/2011/02/ninja.h...


Doom 3 actually used SCons across all the OSes (~2004). At the time, it was so nice to have a Python build system. I sort of hoped it was the future, but it sort of died as it failed to scale. I've seen a few home-brewed Python build systems work well, but typically we're back to CMake/Make.


Check out meson; it seems to be the future for projects that were using CMake or autotools. It's certainly a joy to work with in comparison.


Were using? I quite enjoy CMake and find it fast and easy to use. What am I missing out on?


Meson is strongly typed; it goes beyond just having a notion of "paths" and tracks what kind of object a path points to, and what kind of resource a string names. This is invaluable, because it means you get feedback when you accidentally pass an object file instead of a library name, or any number of other confusions.

Personally, this meant the error messages I got were helpful enough that my first meson-built project was working within half an hour of my deciding to port it over, despite using several system libraries and doing compile-time code generation.

Meson's language is not Turing-complete, so it's easy to analyze for errors. Unlike CMake and autotools, Meson's language looks like a real (pythonish) programming language, and it isn't string-oriented; dances of escaping, substitution, and unescaping are uncommon.

Compared to autotools or hand-rolled Makefiles, CMake is a step in the right direction; meson is a leap.


How happy have you been with Meson in complicated projects with multiple directories? Especially where things are complex and different options are used in different places. Make, in spite of all its craziness, would be a good tool if it had any sane kind of support for this.

CMake tries hard to do better, but then introduces its own layers of craziness. So it's fine as long as I am not doing anything unusual, but as soon as I need to understand what is going on, I find a dizzying array of barely working moving parts beneath me.


Just as a data point - Chrome has more code than the Linux kernel - would you say BioShock Infinite seriously has a larger code base than Chrome?

I think a lot of people don't estimate correctly just how huge Chrome is.


I would expect kernels to be quite small (#files, line count) compared to major applications like Unreal games and Civilization 5. I've never worked on Chrome, but I can safely say the amount of source code in a few Unreal games and Civilization 5 dwarfs the drivers and OS code I've worked on. Take Unreal, then add a team of developers adding onto it for multiple years through multiple releases. Then add all the middleware (Havok, audio engines, NaturalMotion).

OSes are much larger than kernels; I'd guess all the driver code exceeds the actual kernel.

People always think their code base is large, but having built most of the Call of Duties and many Unreal games, all the OS code I've worked on is trivial in size by comparison. There is probably something bigger, but games seem bigger than many major apps in my experience.


For reference, the kernel has ~15 million LoC, and according to a not exactly reliable or verifiable infographic on reddit, BioShock Infinite contains 631 miles of code, which would be between 3 and 10 million LoC.


Also, the Linux kernel is C, versus C++ for the rest mentioned.


Why is it so big?


It's an operating system (pretending to be a browser).


It is also an interpreter for a big number of convoluted (and a still bigger number of non-convoluted) languages.


A lot of time has been spent on optimizing Chrome's build:
- The Ninja build system will perfectly parallelize the build without overloading resources (modulo this OS bug)
- The meta-build system was recently completely replaced (gyp -> gn) to improve builds
- Lots of work on clang-cl to allow compiling Chrome for Windows without using Microsoft's compiler
- A distributed build system to further increase parallelism

So, lots of work has been done to deal with the build times. And probably not a lot of low hanging fruit to be found. But, still more work is being done. Support is being added for 'jumbo' builds (aka unity builds, where multiple translation units are #included into one) which is helping a bit with compile and link times.
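
For what it's worth, the mechanism behind a jumbo/unity build is simple; here's a hypothetical sketch (file names made up): a generated source file #includes a batch of real translation units, so shared headers get parsed once per batch instead of once per file.

  // jumbo_unit_1.cc -- hypothetical generated file
  // Every header pulled in by these .cc files is now parsed once for the
  // whole batch instead of once per translation unit.
  #include "dom/element.cc"
  #include "dom/node.cc"
  #include "dom/text.cc"
  // Caveat: file-scope statics and anonymous namespaces from the batched
  // files now share one translation unit, so name collisions can show up.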


Unity builds are a big win in Unreal Engine and basically required with Unreal; the actual win is surprisingly large.

EDIT: I'm sure lots of work has been done, not trying to degrade that. Just sharing my experience on my projects, never worked with Chrome.


> Damn, are we doing this wrong?

Yes, and I mean "we" as in "this industry".

I just recently talked to someone whose Swift framework(s) were compiling at roughly 16 lines per second. Spurred on by Jonathan Blow's musings on compile times for Jai[1], I started tinkering with tcc a little. It compiled a generated 200KLOC file (lots of small functions) in around 200 ms.

Then there are Smalltalk and Lisp systems that are always up and pretty much only ever compile the current method.

We also used to have true separate compilation, but that appears to be falling out of favor.

Of course none of these are C++ and they also don't optimize as well etc. Yet, how much of the code really needs to be optimized that well? And how much of that code needs to be recompiled that much?

So we know how to solve this in principle, we just aren't putting the pieces together.

[1] https://www.youtube.com/watch?v=14zlJ98gJKA He mentioned that a decent size game should compile either instantly or in a couple of seconds


Are you serious? You want to make a product: a Web browser. What technology are you going to choose? The one that makes your browser fast but gives you more work, or the one that makes your browser slower but makes compilation less of a nuisance to you? It's mind boggling that some would openly say that, hey, who cares about performance that much, these compilation times bother me, the developer. On a browser of all things! Would you be okay with your browser being 2x or 3x slower?


False dilemma. You can have both. It's okay if a fully optimized release mode binary takes a bit longer to compile, but compiling a few million lines of code for a debug build shouldn't take more than a second or two.

Also consider the leverage factor. Improvements to the compiler benefit all users of the programming language, so it's worthwhile to invest in high quality compilers.


In what language can we have both?

Yes, we can use caching compilers (https://wiki.archlinux.org/index.php/ccache) to speed up builds with few changes. We can lower optimization levels (although that buys you little compile speed relative to the runtime speed you lose, and it makes your program do slightly different things).

There's no slider from "pessimum" to "optimum". You need to do wildly different things to optimize past this point for compile speed. Erlang hot-reload and at-runtime-code-gen from other langs come to mind. But that will almost definitely slow down your program because of the new infrastructure your code has to deal with.

I have observed that there can be a nice balance with Java and the auto-reloading tools that are available for it. But I am unaware of their limitations and how a web browser might trigger those limitations.


D is a language where you can have both. C++ architected correctly can get much closer, though. Much larger compilation units are a start. After that, realizing that modularity comes from data formats and protocols means you can start to think about minimal coupling between pieces. I think dynamic libraries for development are very underutilized.


Besides the ones people already pointed out (and Haskell - yep, the always-slow GHC can do that), there's no reason C++ couldn't have both (except for large templates).


> In what language can we have both?

D has very fast compilation times compared with C++.

Rust is another option.


Rust doesn't compile very quickly at the moment. Helpfully, there's a live thread about the matter on r/rust [1]. Broadly speaking, it's about the same as C++. Some aspects are faster, some slower. Points worth noting (from that thread and elsewhere):

* Everything up to and including typechecking (and borrowchecking) takes a third to a half of the time, with lowering from there to a binary taking the rest of the time; that means (a) you can get a 2-3x speedup if you only need to check the code is compilable, and (b) overall speed isn't likely to improve a lot unless LLVM gets a lot faster.

* Rust doesn't currently do good incremental compilation, so there are potential big wins for day-to-day use there.

* There is a mad plan to do debug builds (unoptimised, fast, for minute-to-minute development) using a different compiler backend, Cretonne [2]. If that ever happens, it could be much, much faster.

[1] https://www.reddit.com/r/rust/comments/6m97hl/how_do_typical...

[2] https://internals.rust-lang.org/t/possible-alternative-compi...


> In what language can we have both?

Have you actively tried to find one?


> You can have both. It's okay if a fully optimized release mode binary takes a bit longer to compile, but compiling a few million lines of code for a debug build shouldn't take more than a second or two.

> You can have both.

Pretty bold claim, without proof.


It's not difficult to spit out machine code at a high pace. TCC is one example given by the grandparent, but it's certainly not the only fast compiler out there. Languages like Turbo Pascal were designed for rapid single-pass compilation, way back in the 80s.

A million lines of code represents an AST with a few million nodes in it, which compiles to a binary of a few megabytes. To do this we have computers with a dozen cores running at 4ghz each, 100GB of memory and blazingly fast SSD drives.

It's easy to forget, but computers themselves aren't slow. The software we write is just inefficient.


What you are saying is that it is possible in theory, but no one has done it yet. So in some future where someone rewrites all C++ compilers to not be so slow, we won't need to compromise.

Most C++ devs have to work with tools that currently exist and so we are stuck with what the compiler devs give us. Believe it or not C++ compiler devs are pretty smart people and have largely optimized it as much as possible without a language redesign.

That language redesign is in the works with modules, but the dust hasn't settled yet, so that is also a discussion for the future. In the meantime no other language delivers the performance C++ does right now. So if I want to ship product right now, the very real dilemma is a fast product with slow builds (and a bunch of tools for dealing with that) with C++, or some other language with a faster compiler and a slower product.

Then there is Rust, but that is another whole can of worms and not in use in most shops yet (Just switching to something has a huge cost).


C++ is a language that's incredibly hard to compile efficiently and incrementally, because it suffers from header file explosion (among other things) and as you mentioned it has no working module system.

C compiles a _lot_ faster than C++, so that's always an option. And as other people have pointed out, you can get C++ code to compile much more quickly by being very disciplined about what features you use and how your code is laid out.

So I agree that if you want to ship something right now all your options have significant downsides. I think software engineers as an industry don't take tooling nearly as seriously as they should. Tools are performance amplifiers and we currently waste a staggering number of manhours working with poorly designed, unreliable, poorly documented and agonizingly slow tools.


Tell me, how do you do mutual recursion in a single-pass define-before-use compiler?


Single pass generally means one crack at each compilation unit. It's ok to keep a list of unresolved forward references and go back and inject (fix up) the address once it's known. I mean, that would still count as a single pass. If they're not in the same file, the linker does it.
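
In C-family languages the programmer supplies the forward reference explicitly, which is what lets a define-before-use compiler handle mutual recursion in one pass; a toy sketch:

  // The declaration alone is enough for the compiler to emit the call; the
  // address is fixed up later in the same pass (or by the linker if the
  // definition lives in another file).
  int is_odd(unsigned n);

  int is_even(unsigned n) { return n == 0 ? 1 : is_odd(n - 1); }
  int is_odd(unsigned n)  { return n == 0 ? 0 : is_even(n - 1); }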


Easy, stack trampoline (or skyscraper, a la Cheney on the M.T.A.)


Are we speaking in a general sense here? Because the root of this is headers needing to be compiled with (almost) every use in C++. We could get rid of that while maintaining the same functionality. It's not a very bold claim unless there's a requirement of "no significant language changes"


Then show me how, please.


Turbo Pascal.


Yes, I am totally serious.

Where did I write "who cares about performance"? And why do you think any of what I said is going to cost 2x-3x performance? Performance has been either a major part of or simply my entire job for most of my career, and I usually make projects I run into at least an order of magnitude faster. For example by switching a project from pure C to Objective-C. Or ditching SQLite despite the fact that it's super-optimized. Or by turning a 12+ machine distributed system into a single JAR running on a single box.

The Web browser and WWW were invented on a NeXT with Objective-C. It wasn't just a browser, but also an editor. In ~5KLOC written in a couple of months by a single person. NCSA Mosaic took a team of 5 a year and was 100KLOC in C++. No editing. So pure code-size is also a problem. And of course these days code size has a significant performance impact all by itself, but also 20x the code in C++ is going to take a significantly longer time to compile.

In terms of performance, the myth that you need to use something like C++ for the entire project is just that: a myth. First, the entire codebase doesn't need to have the same performance levels, a lot of code is pretty cold and won't have measurable impact on performance, especially if you have good/hi-perf components to interact with. See 97:3 and "The End of Optimizing Compilers". Or my "BookLightning" imposition program, which has its core imposition routine written in Objective-Smalltalk, probably one of the slowest languages currently in existence. Yet it beats Apple's CoreGraphics (written in C and heavily optimized) at the similar task of n-up printing by orders of magnitude.

Second, time lost waiting for the compiler is not "convenience", it is productivity. If you get done more quickly, you have more time to spend on optimizing the parts of the program that really matter, and thoughtful optimization tends to have a much larger impact on performance than thoughtless optimization. The idea that this is purely a language thing is naive. See, for example, https://www.youtube.com/watch?v=kHG_zw75SjE

Third, you don't need to have C++ style compilers and features to have a language that has fast code, see for example Turbo Pascal mentioned in other comments. When TP came out, we had a Pascal compiler running on our PDP-11 that used something like 4-5 passes and took ages to compile code. TP was essentially instantaneous, so fast that our CS teacher just kept hitting that compile key just for the joy of watching it do its thing. It also produced really fast code.


Point taken, but my point was more about pragmatism. We know it's possible to have a fast compiler that generates fast code (indeed every time this is discussed someone brings up TP). But it's no use talking about a 30 year old compiler or about how the first WWW browser was superbly written. What I mean is I'm not talking about possible, I'm talking about feasible now. If I want to write a performance critical project right now what tool(s) should I use? The answer is most likely C++.


Furthermore, using a "slow" language for big parts of the code can make the whole project faster: size matters as an input into performance. A compact bytecode executed by a small interpreter thrashes the cache and memory hierarchy a lot less than tons and tons of ahead-of-time-compiled native code.

Use high-level interpreted languages to make your life easier, but also use them to make the page cache's life easier.


Good examples of the architectures you are describing are Android and UWP.

Although the lower levels are written in a mix of C and C++, the OS Frameworks are explicitly designed for Java, C# and VB.NET.

Trying to use C or C++ for anything more than moving pixels or audio around is a world of pain.

The Android team even dropped the idea of using C++ APIs on Brillo and instead brought in the Android stack, with the ability to write user-space drivers in Java (!).


> Would you be okay with your browser being 2x or 3x slower?

No, I will switch to Firefox or something. I'm a user; I don't care how hard it is for developers, I care about my workflow, which is using a browser on various machines, some of which are very slow.


Tired developers make more mistakes. Rushed developers cut corners.

A faster compiler isn't just some frivolity. It's a power tool. A force multiplier.


Of the developers I've met recently, this wouldn't surprise me at all. The world revolves around them. Not the product. Not the user. Not the company. They are a "developer" or worse, an "engineer" and can do no wrong.


Jai, though unreleased and in development, is a language that explicitly aims to have extremely good runtime and compile-time performance.


Why not fast browser and fast compilation at the same time?


This might be theoretically possible, but I don't think it's been done yet. All the fast browsers so far take a while to build.


It's worthwhile to note that Blink (v. the whole of Chromium) has had its build time quintuple in the past four years or so, and its starting point was far slower than Presto (which may or may not qualify as "fast" in people's books, depending on what features you care about).


Show me how, then.


I didn't say I know how to do it. And I didn't say it's easy. Yes, currently we have fast browser and slow compilation. Just let's not assume situation can't be improved at all.


Bjarne made the mistake of having C++ rely on C's linker model, so it meant no modules.

Now we can see it as a big mistake, but on those days probably it was one of the reasons why C++'s adoption took off.

Also while C lacked modules, most Algol and PL/I derived languages supported them since the late 60's.

Swift's case has the issue of mixing type inference with subtyping, so lots of time is spent there.

All in all, I really miss TP compile times, and at least on Java/.NET, even with AOT compilers, compile times are close enough.

EDIT: some typos


Straightforward integration with existing tooling was not a "mistake", it was a design point. There were plenty of competing runtimes even in the 80's that were better than C's linker model. C++ succeeded because it didn't create that friction.


> "but on those days probably it was one of the reasons why C++'s adoption took off."

I know "Design and Evolution of C++" quite well, and have been a C++ user since Turbo C++ 1.0 for MS-DOS.


Sure, but it wasn't a "mistake". Stroustrup absolutely wanted a C with classes, and that meant tight integration with C toolchains. Symbol mangling was the clever idea invented to implement that very deliberate choice, not a fortuitous happenstance.


I'm pretty sure @pjmlp is using the word "mistake" to say "a decision that turned out to be bad". English is not my native language, but judging by the ways I've seen it used and what dictionary definitions I can find, it seems quite acceptable.

Stroustrup made the decision on purpose and consciously, but it turned out to have disastrous effects.


Yep, you got it right.


Actually name mangling predates C++.


OCaml uses C's linker model, and yet still manages to have working Modula-like modules (even with cross-module inlining). So there's an existence proof that it's possible to do it well.


Well, that comes with a few caveats:

1. OCaml generates additional information that it stores in .cmi/.cmx files.

2. OCaml does not allow for mutual dependencies between modules, even in the linking stage. Object files must be provided in topologically sorted order to the linker.

3. OCaml supports shared generics, which cuts down on the amount of code replication (at the expense of requiring additional boxing and tagged integers in order to have a uniform data representation).


All true except #3 (partially).

> 1. OCaml generates additional information that it stores in .cmi/.cmx files.

On this point I'd say that it could probably embed the cmx file as "NOTE" sections in the ELF object files, but likely they didn't do it that way because it's easier to make it work cross-platform. Every "pre-compiled header" system I've seen generates some kind of extra file of compiled data which you have to manage, so I don't think this is a roadblock.

> 2. OCaml does not allow for mutual dependencies between modules, even in the linking stage. Object files must be provided in topologically sorted order to the linker.

I believe this is to do with the language rather than to do with modules? For safety reasons, OCaml doesn't allow uninitialized data to exist.

Although (and I say this as someone who likes OCaml) it does sometimes produce contortions where you have to split a natural module in order to satisfy the dependency requirement. I've long said that OCaml needs a better system for hierarchical modules and hiding submodules (better than functors, which are obscure for most programmers).

> 3. [...] at the expense of requiring additional boxing and tagged integers [...]

I think this is fixed by OCaml GADTs: https://blogs.janestreet.com/why-gadts-matter-for-performanc... However this is a new feature and maybe not everyone is using it so #3 is still a fair point.


> I believe this is to do with the language rather than to do with modules?

Both, sort of. The problem is that mutually recursive modules are tricky. So, it's a limitation of the language, but one that is there for a reason.

> I think this is fixed by OCaml GADTs

No, GADTs solve a different problem. Essentially, normal ADTs lose type information (due to runtime polymorphism). GADTs give you compile time polymorphism, so the compiler can track which variant a given expression uses. Consider this:

  # type t = Int of int | String of string;;
  type t = Int of int | String of string
  # [ Int 1; String "x" ];;
  - : t list = [Int 1; String "x"]
  # type _ t = Int: int -> int t | String: string -> string t;;
  type _ t = Int : int -> int t | String : string -> string t
  # [ Int 1; String "x" ];;
  Error: This expression has type string t
         but an expression was expected of type int t
         Type string is not compatible with type int 
The problem with functors (and also type parameters) is the following. Assume that you have a functor such as:

  module F(S: sig type t val f: t -> t end) = struct ... end
To avoid code duplication, F has to pass arguments to S.f using the same stack layout, regardless of whether it's (say) a float, an int, or a list. This means that floats need to get boxed (so that they use the same memory layout) and integers have to be tagged (because the GC can't tell from the stack frame what the type of the value is).


Where can I learn more about it in a high level way instead of delving into source code?

I am curious how it is done in a portable way across all OSes, especially with crude system linkers and OSes without POSIX semantics.

For example, I imagine this could be done via ELF sections, but not all OSes use ELF.


These links explain the extra files: https://ocaml.org/learn/tutorials/filenames.html https://realworldocaml.org/v1/en/html/the-compiler-frontend-... https://realworldocaml.org/v1/en/html/the-compiler-backend-b...

The cmx data could be converted to ELF note sections, but the whole thing has to work on Windows as well, so I guess they didn't want to depend on ELF.

In most projects, you can add this to your Makefile and forget about it:

    .SUFFIXES: .mli .ml .cmi .cmo .cmx
    .mli.cmi:
            ocamlfind ocamlc $(OCAMLFLAGS) $(OCAMLPACKAGES) \
                -c $< -o $@
    .ml.cmo:
            ocamlfind ocamlc $(OCAMLFLAGS) $(OCAMLPACKAGES) \
                -c $< -o $@
    .ml.cmx:
            ocamlfind ocamlopt $(OCAMLFLAGS) $(OCAMLPACKAGES) \
                -c $< -o $@

    clean:
            rm -f *.cmi *.cmo *.cmx *.cma *.cmxa


Thanks for the hints.


My guess: The trick is not to have "template instantiation" but "module instantiation" (aka functors in OCaml). Now you can instantiate only once. For example, if the compiler encounters "List<Foo>", it would instantiate it into a "List$Foo.o" file, or skip it if that file already exists. Java works similarly, except the files have the extension "class" instead of "o".

More generally speaking: The trick must be to not generate identical instantiations multiple times. So you must have a way to check, if you already generated it. Of course, the devil is in the details (e.g. is equivalence on the syntactic level enough?).
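
C++11's explicit instantiation declarations are a manual version of that idea: you promise the compiler that one designated translation unit owns the instantiation, and every other file that includes the header skips generating it. A sketch with made-up names:

  // foo_list.h
  #include <vector>
  struct Foo { int x; };
  // Tell every includer not to instantiate std::vector<Foo> itself...
  extern template class std::vector<Foo>;

  // foo_list.cc (includes foo_list.h) -- the one place the code is generated
  template class std::vector<Foo>;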


You are focusing only on generics and missing all the module metadata and related module type information.

In module based languages, the symbol table is expected to be stored on the binary, either to be directly used by tools or to generate human readable formats (.mli).

So if one uses the system C linker, it means being constrained to the file format used by such linker.


> Bjarne made the mistake of having C++ rely on C's linker model, so it meant no modules.

> Now we can see it as a big mistake, but on those days probably it was one of the reasons why C++'s adoption took off.

No mistake, just no choice - the original (1986 or so) C++ cfront compiled C++ to C which it fed to the C compiler and linker chain.


> "but on those days probably it was one of the reasons why C++'s adoption took off."

It is a mistake with 2017 eyes, because build times are now insupportable.

Of course it was the right decision in 1986 when trying to get adoption inside AT&T.

Also I have read "Design and Evolution of C++" back when it was published, and know C++ since Turbo C++ 1.0 for MS-DOS, so I grew with the language.

Which is also a reason why I still select it as a member of my Java, .NET and C++ toolbox trio.

However the first ANSI C++ was approved in 1998, and many of us were expecting to get some kind of module support in C++0x.


Modules would be very hard to make work in C++; there's way too much entanglement at all levels. Of course the rot actually started with the ANSI C committee when it introduced typedef and broke context-free parsing. C++ just compounds this kind of problem with template lookup stupidity. It's what you get when languages are designed by people with no understanding of basic computer science.


I used to be a C++ guy for 20 years, but won't go back unless absolutely necessary. I mean, it's tempting -- there are some cool new language features. I find that the abstractions are leaky, though, so you still have to understand all the hairy edge-cases. The language has gone insane. I'm 40 -- I want to get stuff done before I die, not play clever games to get around my language / system.


A bit older here.

C++ has been my next most loved language after Turbo Pascal. Since then I have learned and used countless languages, but C++ was always on the "if you can only pick 5" kind of list.

Since 2006 I am mostly a Java/.NET languages guy, but still keep C++ on that list.

Mostly because I won't use C unless obliged to do so, and all languages intended to be "a better C++" still haven't proved themselves on the type of work we do, thus decreasing our productivity.

Because in spite of Swift, Java, .NET and JavaScript, C++ is the best-supported option in OS vendors' SDKs.

I dream of the day I could have an OpenJDK with AOT compilation to native code with support for value types, or a .NET Native that can target any OS instead of UWP apps.

Until then C++ it is, but only for those little things requiring low level systems code.


> I dream of the day I could have an OpenJDK with AOT compilation to native code with support for value types

Check out http://www.scala-native.org/en/latest/

Maybe it will have value types after Java gets them.


I am aware of it, but when one works in teams at customer sites, we are bound to what IT gives us, on the sanctioned images for externals.

The presentation at Scala days was interesting.


I gave up on C++ 5 years ago, after 10 years of getting paid to develop in it. I found that the extra money that comes from a C++ job doesn't cover the gray hairs of trying to tame the language so you don't shoot yourself in the foot a dozen times every time you call a method.


Do C++ developers earn higher salaries?


Yes, because of the industries where the language is mostly used.

Fintech, HPC, aeronautics, robotics, infotainment,....

The only industry where devs are badly paid is the games industry, but that is common to all languages.


Go is a good example of a language with fast compilation times. Of course, the optimizer needs improvements, but I believe they keep managing to make it faster, rather than slower, as they improve the output.


Go's compilation times are fast, but it also took a significant dive in 1.5, when they rewrote the compiler in Go.

They're slowly improving it to return to pre-1.5 performance, but last I checked, it wasn't there yet. The impact is insignificant on small projects, of course, but easily felt on larger (100Kloc+) ones.

While the recent optimizer improvements are great, my wish is for Go to switch to an architecture that uses LLVM as the backend, in order to leverage that project's considerable optimizer and code generator work. I don't know if this would be possible while at the same time retaining the current compilation speed, however.


The important point about Go in this case is that it's fundamentally more efficient because it has real modules and can do incremental compilation.

Sometimes people don't realize this because they always use `go build` which, as the result of a design flaw, discards the incremental objects. When you use `go install` (or `go build -i`) each subsequent build is super fast.


Huh, really? Why is that? Non-incremental builds shouldn't be needed at all, but besides that, just based on the names I'd expect `go build` to be the cheap one and `go install` to be the expensive one.


It's unfortunately not a well-known feature. The Go extension to VSCode was using "go build" (without "-i") for a long time, and if you're working on something big like Kubernetes, it's almost impossible to work with.

The annoying thing is that "go install" also installs binaries if you run it against a "main" package. I believe the only way to build incrementally for all cases without installing binaries is to use "-o /dev/null" in the main case.


I seriously hope they don't do it.

Go being bootstrapped is a good argument against people who don't believe it is suitable for systems programming.

Depending on C or C++ for the implementation would always give ammunition to arguments that it could not have been done differently.

Also we should not turn our FOSS compilers into a LLVM monoculture.


Wasn't that Wirth's rule for Oberon and/or Pascal? Any optimization introduced has to have a good enough cost/benefit ratio that it makes the compiler faster at compiling itself.


In Go's case, I think they're simply improving a lot of unrelated aspects of the compiler while adding new optimizations. I like the idea of that rule but there are definitely cases where I would want an optimization that could take a long time during compilation but provide an immense benefit later.


Ah yes, I remember that rule, I think it's incredibly clever. I believe he had another rule, language updates can only /strip/ features, so the core language will always get smaller and smaller.

Brilliant


Intuitively, that doesn't seem clever to me. (You're committing yourself to a less efficient, more clumsy language for the benefit of... what exactly?)

How does that law improve the language without falling into the trap mentioned elsewhere in this thread? (Optimizing for a pleasant "compile experience" at the cost of everything else)


Well, the rule is attributed to Niklaus Wirth, whose credits include Modula, Modula-2, Oberon, Oberon-2, and Oberon-7. The -* languages are extensions of their originals, so it seems his rule only applies within a single edition of the language. They can add new things, because they are new languages.

The justification, then, seems to be that if you legitimately need new features, then the language has failed and you should start over anyway. I think Python 3 is sort of an offshoot of this idea, except that many of the new features keep getting backported to 2.7 anyway.


Given that Oberon-7 is a subset of Oberon, reducing it to the essential of a type safe systems programming language, I wouldn't consider it an extension. :)

There is also Active Oberon and Component Pascal, but he wasn't directly involved.


My mistake, I'm not intimately familiar with it. From what I can tell, Modula-2 and Oberon-2 were both extensions, though, to be used as successors to the previous language.


This whole sub-thread has tangented into "C++ is bad" because its compilation causes this problem. Problems with C++ builds are well understood...

This specific issue probably impacts other batch workloads with lots of small tasks (processes). There's no reason this should be happening on a 24-core machine.


I'm not sure if the approach of Lisp and Jai (and also D) is the best one. They all have practically arbitrary code execution at compile time, so they can be arbitrarily slow to compile. In C++, template programming is so hard that few people do it, but in those languages it is just as easy as normal code.

With mainstream languages, code generation is done by the build system which can avoid repetition. Caching generated code feels like a good idea to me. Doing it with compile time execution is (unnecessarily?) hard.


It seems that, in Jai, arbitrary code execution at compile time is exactly the point. The build file itself is just another Jai program. It's easy to wring one's hands about a novice programmer making the computer do unnecessary work, but Blow's philosophy seems to be to trust the programmer to understand the code they're writing. And if they don't, they should probably be using another language.


Compiling Common Lisp code isn't too slow. There are some quick compilers like Clozure CL.

What slows some Common Lisp native code compilers down is more advanced optimization: type inference, type propagation, lots of optimization rules, style checking, code inlining, etc.


Could it be that we are doing the web browser wrong?

I think large parts of Chrome actually belong in the OS. The network parts, the drawing library (skia), the crypto implementation, the window and tab management, and so on.

The javascript engine could be factored out, too, so more apps could benefit from it (without bundling a whole frickin Chromium).

Video and audio would be deferred to DirectShow, Quartz, VLC, Mplayer, ...

Ideally, what remains is just a layout engine and some glue code for the UI. It's the monolithic kernel vs microkernel debate all over again.

Plugins have a bad rep in the context of browsers, but I think this "microkernel browser" where everything is a plugin or OS library can be potentially more secure than the current state, since we can wall off the components between interfaces much better.

I also think it would be much "freer". Browsers like Firefox and Chrome are open-source, but they are free in license only. I can't realistically go ahead and make my own browser. The whole thing is so complex that you have to be Google or Apple or Microsoft to do that. The best I could achieve is a reskin of WebKit. I think that would be different with a more modular browser.


> I think large parts of Chrome actually belong in the OS. The network parts, the drawing library (skia), the crypto implementation, the window and tab management, and so on.

The problem is cross-platform support. Depending on the OS would be the obvious choice if every OS supported the required features.


But all of that is already cross platform.

When I say e.g. skia should be part of the OS, I don't necessarily mean MS should ship it and update it yearly. I mean Google should still ship and auto-update it, but also go through the effort of documenting it, maintaining strictly backwards-compatible APIs, and letting other programs consume it. I don't care who the actual vendor is. I know that's a lot to ask for, but OTOH it is insane to statically link that kind of code. Especially if you have multiple Electron apps that would work fine with a shared runtime.


So create more work for themselves that they don't actually have to do. I'm sure some developers might be willing to do some of that in their spare time, but I think "a lot to ask for" is a serious understatement.


I'd be wary of moving the crypto into the OS, because OS upgrades are few and far between. Browsers are easier to upgrade, as we know from the rather aggressive auto-upgrade cycles of Chrome and Firefox, whereas if you have bad and/or now-known-to-be-insecure crypto in the OS, well, you're stuck with it for the foreseeable future. People are still running Windows XP.


Why can't a library be part of the OS and updated frequently? If it is critical to update, why shouldn't all apps benefit from it? Why can MS update code in Edge frequently, but shouldn't be able to update a .dll as frequently?

Ideally, I'd want critical code (encryption, code signing, bootloaders, kernels, runtimes) to be from a trusted vendor, and preferably simple and open source. I trust the MS, Apple, Google of 2017 not to completely fuck it up. (We already trust them as browser vendors.)

I don't care if they keep calc.exe stable for 10 years, but I expect them to patch crypto.dll immediately. You could do that stealthily, outside of major updates, as it has no user-facing changes.

The benefit of this model is that it allows third-party apps from small vendors to profit from the up-to-date security that only the tech giants can provide.

The downside is of course that it is quite hard to maintain perfect backwards compatibility while pushing updates, but if the components and APIs are small enough I think it is possible.


No, the web browser is all-righty. It has become the universal VM, so it's only natural that it is as big and slow to compile as an OS.


When elinks is compiled with javascript support you must provide an external javascript library.

I really like elinks, it's a shame I can't use it to view blackboard and other sites I have to use...


You are going backwards with the 'Internet Explorer'-like buy-in. That approach would only be faster in one segment; everything else would be slower.

There is no point in speeding up the raw compile time by a couple of minutes if you are increasing the development and testing time by a couple of weeks.


What does IE have to do with this? And I don't think this will increase development time. If anything, it will allow people to innovate faster, since it is easier to contribute.


Well there's the NetSurf project: http://www.netsurf-browser.org


> Damn, are we doing this wrong?

Yes: Not isolating different modules sufficiently to allow you to avoid including most headers when compiling most modules.

Patterns to do this in C++ have been well understood for two decades:

Strict separation of concerns coupled with facades at the boundaries that let all the implementation details of the modules remain hidden.

Yes, it has a cost: you incur extra call overhead across module boundaries and lose inlining across them, so you need to choose how you separate your code carefully. But the end result is so much more pleasant to work with.
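
One common shape for such a facade, sketched with made-up names: the header exposes only an abstract interface plus a factory, so the implementation's own headers never leak into clients, and touching them never recompiles the callers.

  // renderer.h -- the only header clients ever include
  #include <memory>

  class Renderer {
   public:
    virtual ~Renderer() = default;
    virtual void drawFrame() = 0;
  };

  // Defined in renderer.cc, the only file that includes the heavy
  // implementation headers (GPU, fonts, ...).
  std::unique_ptr<Renderer> makeRenderer();

The cost is exactly the one mentioned above: a virtual call and no cross-module inlining at the boundary.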


Unfortunately that's not feasible for highly performance-sensitive projects like browsers or games.


It absolutely is. If most of your time is spent on calls traversing large portions of a code-base that size, then you have a far bigger problem in that you'll be blowing your cache all the time. Fix that problem, and you're halfway there to creating better separated modules that can be encapsulated the way I described.


Can you point to an example where this has been done?


A 21-year-old book is dedicated to this subject:

https://www.amazon.com/Large-Scale-Software-Design-John-Lako...


> Unfortunately that's not feasible for highly performance-sensitive projects like browsers or games.

Bullshit. Code needs to be compiled, but it isn't required to build everything from scratch whenever someone touches a source file.

Additionally, not all code is located in any hot path.


Code needs to be compiled, but it isn't required to build everything from scratch whenever someone touches a source file.

Except for C++, where a tiny change in a single object will require recompiling every file that transitively includes that object's header.


Depends on the change and the thought given to header file dependencies.

PIMPL, forward declarations, pre-compiled headers, binary libraries are all tools to reduce such dependencies.
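
For instance (hypothetical names), a forward declaration is often all a header needs, which keeps edits to logger.h from recompiling everything that merely includes session.h:

  // session.h
  class Logger;  // forward declaration: no #include "logger.h" needed here

  class Session {
   public:
    explicit Session(Logger& log);
    void close();
   private:
    Logger* log_;  // pointers/references don't need the full definition
  };
  // Only session.cc includes "logger.h"; session.h's includers don't.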


I think that only re-linking should be required if you only change source files and not headers. Headers implicitly convey the sizes, inheritance relationships, and other information that dependents need for compilation.

I suppose you could have some extra aggressive optimizations that force inlining, but I haven't seen a need for this, even in game dev.


The stated numbers are for full rebuilds.


This would be the end of the discussion if it weren't for stupidity like this: https://groups.google.com/a/chromium.org/forum/#!msg/chromiu...

Generally, I find when people crow about performance, the product they're talking about usually has some questionable architectural/design/implementation decisions that dominate the performance issues, so I have to do my best not to roll my eyes.

Yes, you can write performant C++ using well-understood compiler firewalls, interfaces, etc that reduce your compile time.


I once cut 30% of page generation times for a commercial CMS in half a day by just skimming through their output generation code and changing std::string method invocations to get rid of unnecessary temporaries.

People very rarely have any clue about this at all.


Agreed - though I'd say that performance-sensitivity is more a function of the number of users than the application domain.

If a few hundred or few thousand people each have to build Chrome from scratch a couple times, and making their compilation process much slower makes each of a trillion pageviews a millisecond faster...the break-even point seems to be about a 27 hour build time sacrifice.


Chrome has a staggering amount of C++ code. It's not all heavily hand-optimized. Probably very little of it is optimized at all, or needs to be.

They're relying on the compiler working its magic to make non-hand-optimized code run pretty fast. That's fine, but it requires you to expose a lot of stuff in headers and that slows down compilation.

I'm fairly sure, like other commenters, that they could speed up compilation a lot and impact performance very little by carefully modularizing their header files. But that's a really big job.


The 90/10 rule still holds. Sometimes it's even more skewed.


Even Knuth talked about it being a 97:3 rule[1] and according to some it's gotten more skewed since then[2]

[1] http://sbel.wisc.edu/Courses/ME964/Literature/knuthProgrammi...

[2] http://blog.cr.yp.to/20150314-optimizing.html


While what you're saying might be true, I can't help but think about Pascal units and say: "It didn't have to be this hard! We solved this problem 40 years ago!"


The heir to the Pascal/Delphi kingdom seems to be Nim[0], though it takes its syntax from Python.

Compilation is impressively quick, even though it goes through C.

[0] https://nim-lang.org/


Which is a language I absolutely adore, and am building a homomorphic encryption based product leveraging Hyperledger on a rather obscure 160 hardware thread POWER8 server; it can definitely work for real production tasks today, even if some parts are rough (and hell, C++ can be rough too). Parallel compilation on this machine is stupidly quick :)


Any pointers to what power hardware you are running?


You don't need pointers - just follow the fan noise from wherever you are.


Sorry, it's the same in Delphi (Pascal's successor). The compilation time goes up exponentially as the number of units goes up. Compilation of 2,000 files (750,000 lines) takes about 20 minutes.


With the caveat that one doesn't need to do "make world" every time a few files change.


Yes there are plenty of other solutions if you use other languages, but I decided to constrain myself to how you'd address it in C++


And then Modula-2 came along and... well, it was mostly the same. But the compactness of Pascal output left fond memories.


Wasn't this one of the reasons why the Go project started at Google? Not necessarily for Chrome, but because C++ compile speeds were horrendous for some internal projects.

Not saying Chrome could and should just switch to Go; it definitely would not be the right fit! But it's interesting that these sorts of builds still occur and consume a lot of developers' time.


Yes, C++ build times were a significant factor in Go's creation. Rob Pike describes it here:

https://talks.golang.org/2012/splash.article

He covers all the same points about header files and how Go addresses those issues.


> Damn, are we doing this wrong?

Give me a tool that's: (i) as fast, (ii) as mature and well supported, (iii) as powerful as C++ and I will switch in a heartbeat. But until there is such an alternative it's futile to complain about the shortcomings of C++, because if you want the powerful, zero-cost abstractions, the mountains of support, and access to billions of existing lines of code, you pretty much have nowhere else to go.


So: a C++ competitor must beat it in every dimension you've chosen; until then it's futile to complain about the shortcomings of C++. This doesn't sound very logical, only like a case of sampling bias.


Unfortunately some projects have an "everything" requirement. That is to say, the software must be fast and written in a way that interfaces close to the metal. We need to do a lot of parallel processing. Now it's C++ or Rust. Then we need a GUI and CUDA, so we're down to C++. That's why the project uses C++.


It's the same old C++ rhetoric: "only C++ can do it". Before that, only C could do it, and before that, only assembly could do it.

Reasoning starting from conclusions to lead to initial constraints is backwards reasoning. For example you don't talk about maintenance or productivity, and yet you end up making a choice without factoring this. Chances are, the choice in most codebases is made because of existing code and culture, not because of rational reasons.


No, it's in there. For example, a similar OSS package called MicroManager is a veritable cluster duck, with half the code base dedicated to interfacing between C++ and Java. It doesn't hit the performance spec. The real problem I've had with C++ is finding devs: typically a senior C++ software engineer at $130k vs a junior Python dev at $70k.

But from the engineering side it's the only "everything" language. (There aren't any good GUI kits for C, and NVCC is C++)


Well, it's true that there aren't good UI toolkits in D either (let's say, as good as Qt). For me it works as the "everything" language; I also wrote CUDA bindings once (obviously that wouldn't work with mixed host/GPU code, which I hope no one really uses).


> But until there is such an alternative it's futile to complain about the shortcomings of C++

Then how does C++ improve?


Slowly but rather steadily like it has been doing so far.

We are likely getting modules (and reflection) with the next iteration (C++20), which -- if it moves like the last two versions -- will be almost completed and already supported by GCC, VS and Clang in two years. Clang and VS2015 even support modules experimentally already.


> Then how does C++ improve?

It keeps adopting D features.


These numbers don't seem abnormal; I recall building Safari many years ago, and having multiple GB of intermediate products shrink down to a 30MB executable (plus a few hundred MB of debug symbols).

So, I have a thought: if we're spending all this time to compile functions (particularly template functions) that are just thrown away later, why are we performing all our optimization passes up-front? Surely, optimization passes in a project like Chrome must eat up a lot of compilation cycles, and if that's literally wasted, why do it in the first place? Can we have a prelink step where we figure out which symbols will eventually make it, and feed that backwards into the compiler?

Maybe a more efficient general approach might be to simply have the optimizer be a thing that runs after the linker, so that the front-end compiler just tries to translate C++ into some intermediate representation as fast as possible. The linker can do the LTO thing, then split the output into multiple chunks for parallelization, and finally merge the optimized chunks back together. With LLVM, it feels like the bitcode makes this a possible compilation approach...


> These numbers don't seem abnormal;

Hmm...not "abnormal" in the sense that we've gotten used to it: yes. Heck, last I heard building OneNote for Mac takes about half a day on a powerful MacPro.

But I'd say definitely abnormal in terms of how things should be.


Doesn't link-time code generation already exist (at least on MSVC, I think)? It makes the linking step so much more expensive, though, which sucks for incremental builds.


Ah, yes, I knew I had to be missing something! Needing to support incremental builds is what makes C++ compilation so frustrating. I wonder why this is so difficult though - dependency tracking should be a thing that can carry through the link stage. Just track what files a given function depends on, and only recompile/recodegen functions that have changed. Of course, it still sucks massively if you change a header file, but that's why you separate the header from the implementation of the methods :)

/LTCG also doesn't seem to parallelize well - last I checked it still ran all the codegen on one core. Maybe that's different now?


LTCG as of VS2015 is an incremental process when it can be, which did wonders for build times.


I think that, especially in templated code, you only know which part of code can be thrown away because of the optimization.


Yes, you're doing it wrong.

The trick is you've got to reduce your "saturation level" of #includes in header files, by preferring forward declarations over #includes, and using the PIMPL pattern to move your classes' implementations into isolated files, so that transitive dependencies of dependencies don't all get recursively #included in.
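
A minimal PIMPL sketch (hypothetical class), where the header carries no implementation details at all:

  // parser.h
  #include <memory>

  class Parser {
   public:
    Parser();               // ctor and dtor are defined in parser.cpp,
    ~Parser();              // where Impl is a complete type
    void parse(const char* text);
   private:
    struct Impl;            // all data members and heavy #includes live in
    std::unique_ptr<Impl> impl_;  // parser.cpp, invisible to includers
  };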

When it comes to templates, one has to be very aggressive in asking "Does this (sub-part) really have to be in template code, or can we factor this code out?" Any time I write my own template classes, I separate things between a base class that is not a template, and make the template class derived from it. Any computation which does not explicitly depend on the type parameter, or which can be implemented by the non-template code if the template just overrides a few protected virtual functions to carry out the details, gets moved to the non-template base class.
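
That split looks roughly like this (a made-up example): the type-independent bookkeeping compiles once in a .cpp file, and the header-only template layer stays thin.

  #include <cstddef>  // size_t
  #include <new>      // placement new

  // ring_buffer_base.h / .cpp -- not a template; compiled exactly once
  class RingBufferBase {
   protected:
    RingBufferBase(void* storage, size_t elem_size, size_t capacity);
    void* slotForPush();   // index math, wrap-around, bookkeeping...
    void* slotForPop();    // ...bodies live in the .cpp file
  };

  // ring_buffer.h -- the template is only a thin, type-aware veneer
  template <typename T, size_t N>
  class RingBuffer : RingBufferBase {
   public:
    RingBuffer() : RingBufferBase(storage_, sizeof(T), N) {}
    void push(const T& v) { new (slotForPush()) T(v); }
    T pop() { T* p = static_cast<T*>(slotForPop()); T v = *p; p->~T(); return v; }
   private:
    alignas(T) unsigned char storage_[sizeof(T) * N];
  };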

If your problem is not with template classes which you have written, but with templates from a library, consider that in most (all?) cases there is still some "root" location (in your code) which is binding these templates to these user-types. This root location will either itself be a (user-written) template class, or it is an ordinary class which "knows" both the template and the bound-type(s). Both of these cases can be dealt with either by separating it into non-template base and derived template, or using the PIMPL idiom, or both.

The general principle is that what you allow in your headers should be the lower bound of the information needed to specify the system. Unfortunately this takes active work and vigilance to maintain, and a C++ programmer is not going to understand the need for it until they reach the point of 30-minute builds and 1.4GB of .o files.


I've found that I generally regret doing this kind of thing to the extent that you need to do it to make a meaningful difference. The problem is that all this stuff comes at a cost -- my source code is no longer structured in a semantically meaningful way.

The SICP quote comes to mind here: "Programs must be written for people to read, and only incidentally for machines to execute." I greatly prefer to have my code organized in a sensible way. I want to know that "here is where the FooWidget code is".

It's not the end of the world, and people can adjust, but part of what I hate about working on just about anyone's Java code is this constant mental assault of "no, you need to be in the FooWidgetFactoryImpl file to find that code". Just let me have "customer.cpp" or whatever, and I'll live with grabbing coffee during the build.

Admittedly, I don't work on truly large applications. I can imagine priorities change when builds take two hours instead of the 15 minutes I might have to live with.


> Yes, you're doing it wrong.

Your comment is great but I have spent enough time working on Chromium to know that they have people working on the build who know all of this stuff and much more. They understand the build from the top to the bottom of the toolchain stack. (@evmar used to be one of these people and he actually commented in this thread at https://news.ycombinator.com/item?id=14736611.) I am sure your parent commenter is a great developer but I get the impression he/she is not one of the Chromium build people.


It'd be crazy to think about the energy spent over time on building just Chrome - turn that into a carbon footprint and it'd be shocking! I always try to write code for humans and not over-engineer or prematurely optimize, but sometimes I wonder about a section of code over the long term - that bit of JavaScript that's going to run on lots of phones and other devices over years or decades - and how much more electricity it's going to cost just because I used `map` instead of a `for` loop... I can't imagine working with a build process that takes that long to run.


Aspects of the Unix philosophy where "you're done when there's nothing left to remove" need some revival.

http://www.catb.org/esr/writings/taoup/html/ch04s02.html#com...

http://suckless.org


> a from-scratch build of Chrome will take at least 30 minutes on a

That's all?

I once worked for a rather big accounting software company, and the full build of their accounting product took about 4 to 5 hours to complete on the build server.

We ran the build at the close of business every day, and the build engineer had to log in remotely just to make sure the build worked; otherwise the QA team would have nothing to test in the morning.

It too was written using the C++ language.


C was ok. Not good, but ok.

Then we tried to attach the OO paradigm to it and we got the monstrosity that is C++ (and, as a consequence of that, Java - which has fixed some issues but still suffers).

And don't get me started on templates

I'm so glad that paradigm is starting to die out, and hopefully Rust, Go and others will take over (I still haven't got my head around their object models, but I will eventually).


C has the same problem because it relies on includes and does not have modules.


Yes; however, that was acceptable at the scale C was used at, and given its origins. Not good, but acceptable.

Pascal could have been a better choice (sigh)

C without typedefs also compiles faster

C++ is like bolting an engine onto a skateboard to make it go faster.


Sure, but it still compiles much faster than C++, and that's due to language differences.


And luckily Java is now also trying to fix modules. Maybe, someday, those concepts will arrive over in C++ land, too.


They've been working on modules for C++ for a long time. It's coming.


The modules proposal for C++ has been around a long time. We need it now more than ever. Projects are getting bigger and the C++ culture is evolving from runtime polymorphism to compile time polymorphism. Text inclusion is just not good enough.
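
For reference, the syntax in the proposal looks roughly like this (names are placeholders; file extensions vary by compiler):

  // math.cppm -- a module interface unit
  export module math;

  export int square(int x) { return x * x; }

  // main.cpp -- consumers import the module instead of textually including a header
  import math;

  int main() { return square(7); }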


There are a million responses to your comment, but unless I missed something, no one seems to be redirecting you back to the actual problem at hand and are recommending all sorts of generic ways to reduce the time required to compile a large C++ project... but the issue here was not the generic one: the entire thing that made this article interesting is that his CPU was not being used at all, so this has nothing to do with "a Chrome build is truly a computational load to be reckoned with". He even goes out of his way to show his CPU load graphs so we can see multiple seconds in a row where his computer is 98% idle and yet he still can't even move his mouse.

Your comment, and essentially every other single one on this entire thread thereby makes me wonder if anyone on this subthread read the article :/.

OK, I decided to search for the word "process", and found one person responding to you who did read the article, and depressingly only a handful of people even responding to the top-level article who apparently read the article. This entire post is such a great example of "the problem with this kind of discussion forum" :/.

https://news.ycombinator.com/item?id=14735977

REGARDLESS...

What was described in this article wasn't "Chrome's build is too slow", it was "there is a weird issue in Windows 10 (which apparently wasn't even a problem with Windows 7, and so we could easily argue is a regression) where process destruction takes a global lock on something that is seemingly shared with basic things like UI message passing". The fact that he was running a Chrome build to demonstrate how this manages to occasionally more than decimate the processing power of his computer was just a random example that this user ran into: it could have been any task doing anything that involved spawning a lot of processes, and the story would have been exactly the same.

Now, that said, if you want to redirect this to "what can the Chrome team do to mitigate this issue", and you want the answer to not be "please please lean on Microsoft to do something about this weird lock regression in Windows 10 so as to improve the process spawn parallelism for every project, not just compiling Chrome"... well, "sure", we can say you are "doing this wrong", and it is arguably even a "trivial fix"!

Right now, the C++ compiler pipeline tends to spawn at least one (if not more than one) process per translation unit. If gcc or clang (I'm not sure which one would be easier for this; I'm going to be honest and say "probably clang" even though it feels like a punch in the gut) were to be refactored into a build server and then the build manager (make or cmake or ninja or whatever it is Google decided to write this week) connected to a pool of those to queue builds, you would work around this issue and apparently get a noticeable speed up in Chrome compiles on Windows 10, due to the existence of this process destruction lock.

One could even imagine ninja just building clang into itself and then running clang directly in-process on a large number of threads (rather than a large number of processes), and so there would only be a single giant process that did the entire build, end-to-end. That would probably bring a bunch of other performance advantages to bear as well, and is probably a weekend-long project to get to a proof-of-concept stage for an intern, come to think of it... you should get on it! ;P
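
Very roughly, the shape of that idea: one long-lived process draining a queue of compile jobs on worker threads, instead of spawning (and destroying) a process per translation unit. compile_translation_unit() below is a hypothetical stand-in for an in-process compiler entry point:

  #include <mutex>
  #include <queue>
  #include <string>
  #include <thread>
  #include <vector>

  // Hypothetical: would wrap an in-process clang driver for one source file.
  bool compile_translation_unit(const std::string& source_file);

  void run_build(std::queue<std::string> jobs, unsigned workers) {
    std::mutex m;
    std::vector<std::thread> pool;
    for (unsigned i = 0; i < workers; ++i) {
      pool.emplace_back([&] {
        for (;;) {
          std::string job;
          {
            std::lock_guard<std::mutex> lock(m);
            if (jobs.empty()) return;      // no more work: thread exits, no process teardown
            job = std::move(jobs.front());
            jobs.pop();
          }
          compile_translation_unit(job);   // all codegen stays inside this one process
        }
      });
    }
    for (auto& t : pool) t.join();
  }

Whether that's actually practical for a real compiler (global state, crash isolation, etc.) is another question, but it shows how the process-destruction lock drops out of the picture entirely.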


You're right. I didn't read the whole article. I skimmed it. But I did actually pick up on the locking problem and see the graphs with huge amounts of idle time. I knew this comment was only marginally related to the actual article. So, :-/

However, I suspect that as soon as that lock regression in Windows is fixed, that monster CPU load is coming back home to roost, and the workstation is going to be just as dead as my 64-core Linux workstation has been when I've actually run '-j 500' without gomacc up and running correctly.

So, by all means, Microsoft should fix this lock regression.

But there's this...elephant...in...this room here.


> Your comment, and essentially every other single one on this entire thread thereby makes me wonder if anyone on this subthread read the article :/.

Just let people talk about what they want to talk about, maybe? The main problem in the article is interesting but far less actionable than the overall situation of slow compilation.

Do you want things like rampant speculation and insulting windows 10? Do you expect everyone to pull out kernel debuggers to be able to make directly relevant comments? It's okay to talk about a related issue. Concluding that they didn't read the article is kind of insulting.


However, they have just merged a feature called jumbo which squashes lots of compilation units together. The guy who developed it (Daniel Bratell from Opera) reckons an improvement of 70%-95% in per-file compilation time[1]. But, it's only for the core of Blink right now.

[1] https://groups.google.com/a/chromium.org/forum/#!searchin/ch...


> TBH I don't remember toughing out a full build without resorting to goma.

I have! Every time you guys bump a snapshot, my Gentoo boxes whirl away and heat my house, compiling a new version from scratch. On an octocore Skylake Xeon laptop, this takes 2 hours 48 minutes.


Makes me wonder about a distributed compilation. If everyone had a monster workstation but not everyone compiles at the same time, a theoretical networked compiler (which may already exist as I haven't really checked) could spread the files out among available workstations and bring things back together near the end.

As the largest issue is the throwing away of duplicate work, I'd see it as a kind of reverse binary tree: machines working on files that depend on others talk together, then when finished send the condensed work up the chain (and signal their availability for the next workload chunk or phase) until everything collapses down back to the original machine.


https://wiki.archlinux.org/index.php/Distcc

I used that 10+ years ago on Gentoo, and never saw anyone using it since. Don't know how often is used now days.


I used it ~3 years ago on Funtoo; there's definitely still people that use it!



The best solution today for distributed build is: http://www.fastbuild.org/docs/home.html

It is faster than IncrediBuild, even faster than SN-DBS and has multi-platform support, the only problem is that it requires its own build script.


OK, I see distcc helping people with computer labs and server farms, but I thought mysterydip wanted to help the Gentoo community and others like it - i.e. peer-to-peer build sharing.

So what kind of cryptographic guarantees would you need for that? And if you can only verify the build results by trusting signatures from on high, then what is the point? Perhaps those builds could be turned into work in a proof-of-work blockchain. Do compilers contain any hard-to-do but easy-to-verify steps?

Whole shelves full of useless PhD theses are just waiting to be written on this topic.


Isn't this what IncrediBuild does on Windows?


It is. Although, as far as I'm informed (having last used it in 2013 or so), even with that approach, you'll likely have problems linking all the output in the end if your project is large enough.

Of course, that's another separate problem to begin with. I still remember dabbling with D and vibe.d and replacing the default GNU linker with ld.gold because over 90% of the build time was due to the linker...


"Concatenated" builds seem to be the best band-aid for this. Concatenate as many source files as possible before compiling, #include a bunch of cpp files into one big file. It makes tracking down errors slightly harder and macros a bit more risky, but greatly improves the overall build efficiency.


And trashes incremental compilation time, which is what really matters.


It's a balance, but on many projects the overhead of the headers themselves is so large that concatenating a few .cpp files together doesn't increase incremental compilation time significantly over simply building each .cpp file in isolation.


Incremental compilation opportunities are rarer than we'd like on C++: as soon as you add a new class member or function, that's potentially a huge recompile.


You can leave the files you're actively working on out of the concatenation.


That's painfully manual.


It doesn't need to be. I think it shouldn't be hard to make a build script that re-makes the bundles except for files modified in the last hour.


Also called unity build sometimes.


> How did we get here? Well, C++ and its stupid O(n^2) compilation complexity [...]

Wasn't large compilation time a driving force behind coming up with Go? Is a garbage-collected language not suitable for a web browser? I am just curious because I absolutely love writing Go


GC performance is a trade-off between throughput and pause time. There are tricks you could play, with each tab in its own heap which is entirely discarded on page change, but I think it would take something expensive like Azul's collector to really work.


> GC performance is a trade between throughput and pause time.

Go has sub-millisecond GC pauses, and even then it minimizes the need for stop-the-world pauses (previous HN discussion: https://news.ycombinator.com/item?id=12821586). I think it would be a very interesting exercise to take a crack at it. If anyone is interested, let me know.


> Go has sub-millisecond GC pauses

At the cost of throughput:

> Go optimises for pause times as the expense of throughput to such an extent that it seems willing to slow down your program by almost any amount in order to get even just slightly faster pauses. - https://blog.plan99.net/modern-garbage-collection-911ef4f8bd...


> Wasn't large compilation time driving forces behind writing Go

Yes, that was one of the tenets.

> Is a garbage-collected language not suitable for a web browser?

In theory it's fine, but there's a lot of historical baggage that comes along with garbage collection and the majority of languages that support it (e.g. almost no value types).

Golang fares pretty well latency-wise for a GC'd language. I'd be curious for someone with more experience than me to do a deep dive into instances where Go's latency/throughput characteristics are and are not good enough for specific applications.


We have Java to blame for it.

With the exception of Smalltalk and its derivatives, all the GC enabled languages that came before Java had value types, even Lisp.


> Damn, are we doing this wrong?

Not using modules? Yeah, I know C++ made the mistake of not having them from the beginning, and it is a long road until they are here (202x?).

However Google was showing their modules work at CppCon 2016, so I guess Chrome does not make use of clang modules.


Yes, we are doing this wrong! But we don't have any right options yet either. I think another way of thinking about this is that if we can't imagine a better system, then the field is so mature that it has become solved and completely boring. I think systems programming is still a field for which there are new and novel problems to solve, or at least better designs to implement.


Wasn't this issue basically a driving force for Google creating Go? Or at least, a major design goal for Go was to get rid of O(n^2) compilation?


Yes, as per Rob Pike’s comments on the subject: https://commandcenter.blogspot.com/2012/06/less-is-exponenti...


However, to actually have the C++ compiler do inlining at compile time (LTO be damned), we have to put the definitions of inline functions into header files, which greatly increases their size and processing time.

I can't think of a component in a browser that would require inlined function calls in order to be performant. To really matter it would have to be many millions of calls per second.

Moreover, because the C++ compiler needs to see full class definitions to, e.g., know the size of an object and its inheritance relationships, we have to put the main meat of every class definition into a header file!

So let me suggest that you revisit those inlined function calls again. Once you start putting them into proper .cpp files and make use of forward declarations where possible the whole header dependency graph will probably simplify quite a bit.
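
e.g., a tiny before/after sketch of what that buys you (types invented for illustration):

  // Before: document.h drags the whole renderer in just to declare a parameter.
  #include "renderer.h"
  class Document {
   public:
    void Paint(Renderer& r) { r.Fill(); /* inline body forces recompiles */ }
  };

  // After: forward-declare the type and move the body into document.cpp.
  class Renderer;   // forward declaration -- no #include needed in the header
  class Document {
   public:
    void Paint(Renderer& r);   // definition (and the #include) live in document.cpp
  };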

Don't even get me started on templates.

Right. Don't use them unless there's no other way. Especially avoid them for large-ish classes that do complicated stuff. If you can spare a couple of CPU cycles (and most of the time you can), determine things at run time instead of compile time.

Of course all of this is theory, not taking into account practical matters like deadlines, code readability or developer turnover.

Full disclosure: I worked at Google, but not on the Chrome team :)


Hi, could you guys please fix the white flashing bug Chrome has had for almost 10 years? I think it might be the longest-open unsolved bug I know of in any IT project.

https://support.google.com/chrome/forum/AAAAP1KN0B0Rmd8IyUjG...


The important question is: how long does an incremental build take? The time to perform a full clean build is important but clean builds are rarely a part of my development flow. In my experience, efforts to reduce build time focus first on the incremental case, and rightly so.


I built Mozilla on my PowerBook G4 and it was still going 24 hours later :D Ah, good memories.


I had that experience with KDE, back in SuSE 6.3.


Have you given dlang a look? Or even Rust?


Rust's compile times are even worse.


That's only because they just started working on incremental compilation, it will get faster soon enough.


I've heard the same thing a year ago. It didn't happen. See this 2016 roadmap from August 2015 for example: https://blog.rust-lang.org/2015/08/14/Next-year.html


Incremental compilation is something that no major compiler for any ahead-of-time compiled language anywhere does. It's one of the most advanced features in any compiler, and as such it's taking time to implement. No C++ compiler I know of is even thinking of it.

As the first post here in this thread mentions, going down the C++ road of header files might have gotten us some short term wins, but ultimately it hits a brick wall. Incremental compilation is inescapable.


I surely do consider Visual C++ a major compiler for any ahead-of-time compiled language.

It does incremental compilation and incremental linking.

I would be quite happy if cargo was half as fast as my UWP projects.


A lot of things did happen. One of them is having the option to check the source, which is much faster than compiling. 99% of the time, you're compiling code in Rust just to run the borrow checker and fix those errors.


That's indeed something where Rust is a lot better than C++. But for many projects (like games or GUI apps) you'll need to iterate with changes which can only be seen in the final product. So you'll need a complete build to run the binary.


Fair enough. However games and GUI apps are still a small percentage of applications developed in rust. I want that to change, of course. I just wanted to point that out.

I'm personally writing a game in Rust but the main logic is written in a compile-to-JS language and uses V8, so the issue doesn't affect me.


The basics have happened, and you can use it right now. It's not on by default yet though. Still a ton of work being done.


Yeah, the last time I tried it (a few months ago), with incremental compilation the full rebuild was a lot slower, and touching a single .rs file was only about 10% faster. I've already found a few issues and it seems you guys are working on it :)

But as of now: C++ incremental build times (with the right build system) are a lot better than Rust's.


They seem to be worse, but does anyone have information on how it scales?


Heh. I was reading about the origins of Golang and how Rob Pike and Robert Griesemer were waiting 3 hours or something for a C++ build to finish. And with that much time available - might as well make a brand new language that has compilation speed as a first-class feature.

Are we doing it wrong? In the case of C++ - yes, absolutely, 100% certainly, wrong. I'm not suggesting that Chrome would be better off under Golang - I'm just saddened that 30 years after the C++ abomination was born in Stroustrup's head, nothing else has come out to challenge it.

Maybe Rust?


I prefer that C++ abomination to the unsafe-by-design C.

Any C++ replacement needs the love of Apple, Microsoft, IBM, HP, Google and everyone else that sells operating systems.


With crap and hassle like this in the year 2017 for something as "simple" as a web browser, be sure to remind your boss Ray Kurzweil at Google, "the singularity is near" ROFLOL


At the risk of looking like I take the "Singularity" at all seriously... (and putting aside the non-sequitur about Kurzweil having anything to do with Google's shipping software)...

The implication you're making is that because computers have weird little glitches that pop up to cause havoc every once in a while, then it must be laughable to imagine they could rival the marvels of human intelligence. What that tells me for certain is that you haven't paid much attention to human intelligence.

There are flat-earthers, anti-vaccine nuts, and people convinced we faked the moon landings. We'll happily argue for thousands of years over whether the kid a virgin had is or is not the same person as his dad, but not that dad, the other dad. Show me someone who intuitively understands probabilities, and I'll show you someone who incorrectly assesses how people understand probabilities. I'm pretty sure the odd bug here and there doesn't disprove the ability of machines to outthink us.


> Damn, are we doing this wrong?

SUPER-wrong.

With all the time loss, money loss, and opportunity loss, C/C++/JS should have been ruled out as the worst tools for the job decades ago by ANY RATIONAL ENGINEER.

And instead of band-aiding them, the sane path is to freeze them, only apply critical patches, and use something else. Other options are already known, have been proven, and could have been a better foundation, except that developers are irrational as hell and don't want true progress at all.


And why should any other programming language be better?


Because there are some obvious flaws in these languages that could be solved for good.

For example, Pascal is way faster to compile than C. Most Pascal variants are like this, yet still provide the ability to do low-level stuff.

---

The main thing is that, for some reason, we pretend it is not possible to improve a language, or that it is sacrilege to remove problematic features or leave some things behind.

Programming languages are like any other software. They have UX problems, and those can be solved. So why not?

The mind-limiting answer is that breaking the language is costly, and programmers are so invested that expecting them to never again check for null across millions of lines of code is too traumatic. Or maybe adding namespaces to C++ is too hard... but templates are OK.


> 30 minutes on a Macbook Pro--maybe an hour

Not bad. FreeBSD takes about the same, and it's plain C… oh wait, no, a large part of the compile time is LLVM/clang which is… not C.

Though there is now a clever incremental sort of thing called meta mode https://www.bsdcan.org/2014/schedule/attachments/267_freebsd...


> A Chrome build is truly a computational load to be reckoned with. Without the distributed build, a from-scratch build of Chrome will take at least 30 minutes on a Macbook Pro--maybe an hour

As someone who regularly works with multi-hour builds (on build servers), this IMO sounds like the kind of light build I wish I worked with ;)

That said, this article was about a weird behaviour in Windows when destroying processes, not about heavy builds.


Heh, good to hear Google feels this pain, too. I work on the chromium codebase, and a 1 hour clean compile on a macbook pro seems optimistic.

Luckily, ninja and ccache do a good job of ensuring you only ever do that once (and rsync solved that problem for us). Not that a 20-second compile for a one-line change is something I should be content with, but it's certainly workable.


I would like to see a more systematic solution to this problem. Most computers don't have a way to choose between throughput and real-time. So, I would love to see this available and preferably exposed as a run-time toggle.


Have you worked with Firefox? How does the build compare?


AFAIR, the full build was also around 30 minutes on a desktop i7. Unfortunately, the incremental build (even for the tiniest changes) was around 2-3 minutes, which was a real killer for me when I tried to do some open source work on it.


There are tricks to make it faster. If you find yourself working on the project again, ask for advice on IRC. 2-3 minutes is longer than it should take in most cases. (As long as you aren't touching widely included header files, that is.)


yup. 30ish for a full compile, less with ccache and other tricks. Rebuilds 2-3 minutes.


Firefox is C and Rust, so I’m not sure there’s such a direct comparison.


C and Rust? Firefox looks like a ton of C++ to me: https://github.com/mozilla/gecko-dev


Primarily C++, but Rust is there as well. However, it's mostly libraries which were extracted from Gecko itself, so it's dependencies, not the main codebase.


I stand completely corrected, don’t know why I had that in my head.


The vast, vast majority of the Firefox codebase is C++.


You're thinking of Servo - https://servo.org/


Firefox now includes many rust components.


I understand that very well; the project I currently work on has 6.2GB of object files (per platform) - imagine that. Chrome is not that huge. :)


What project is that? I am very curious.


> . . . and try to create encapsulation with getter/setters.

That violates encapsulation. Getters should be rare and setters almost non-existent.


Move to a language with a real module system?


The obvious solution is to change the c++ specification so that you can modify an inline function without requiring a recompile of every file where it's included


Try Dlang.


Just use zapcc, and compile times will go down. It's basically clang 4.0, so it's safe to use.


If it's basically clang 4.0, why is it not part of clang? Projects like zapcc remind me of why the GPL is a good thing.

(I predict that in 10 years all the various soft cores will be variants of RISC-V each with its own old, unmaintained and proprietary fork of LLVM).


If the times claimed by the developers are true, then it just tells me that BSD was the correct choice of license, as otherwise zapcc might not have happened. A world with a lot of high quality software, some of it open source and some of it closed source, is better than that same world minus the closed source.


I believe that if GNU/Linux hadn't happened, we would still be working on AIX, Solaris, HP-UX, SGI, Tru64, ....

And eventually some guys would be playing with a BSD variant.


Are you sure C++ is the only thing to blame, and not how you use it?


Is there any project that compiles significantly faster than Chrome while being in the same league in terms of LOC? I'd like to know if you have seen such an accomplishment even if its source is closed.


Is this the C++ version of "you're holding it wrong"?


That's just C++. Undefined behavior is no joke, and it's exceptionally hard to avoid all of it.


it's the c++ version of "why are you still using c++?"


In my case, because sometimes Java or .NET still needs a little help from it.

Ideally, with the features planned for Java 10 or C# 8, that will no longer be the case, but until then we have quite a few years.


what do you think would be a better choice than c++ at this point? clearly it hasn't helped with memory consumption. is mozilla getting it right-ish with rust?

also, would it kill you to like, coordinate something with the gmail team so their page doesn't kill my machine after being open for a couple of hours?

come on guys.


I grew up on the Commodore 64 (1 Core, 1 hyper-thread :-), almost 1 MHz clock freq, almost 64 K usable RAM).

The machine was usually pretty responsive, but when I typed too quickly in my word processor it sometimes got stuck and ate a few characters. I used to think: "If computers were only fast enough so I could type without interruption...". If you'd asked me back then for a top ten list what I wished computers could do, this would certainly have been on the list.

Now, 30 years later, whenever my cursor gets stuck I like to think:

"If computers were only fast enough so I could type without interruption..."


Similar in sentiment to "I have always wished for my computer to be as easy to use as my telephone; my wish has come true because I can no longer figure out how to use my telephone."


I'll definitely steal this!


Bjarne Stroustrup said that[1].

[1]: https://en.wikiquote.org/wiki/Bjarne_Stroustrup


The best part is he said it in around 1990.


The worst is when this happens when you're not doing anything that should be computationally intensive, just entering text in a web app that abuses JavaScript.


Even just entering text into the search bar uses loads of resources. Every character entered does a full search of your history and bookmarks, sends a request to Google to do autocomplete, prefetches the web page you are typing, applies spellcheck, etc.


Sure, but I for one have yet to experience input lag typing into a URL bar when my computer wasn't otherwise under extremely heavy load.


I have almost the same background (VIC-20 - if you thought programming was hard, try having only 3.5K of RAM) and feel the same irritation... but I don't wonder why computers aren't faster, I wonder why operating systems and applications are so utterly shit.

I have a Moto X phone (not too old) and watching it slow down is almost comical. Sometimes I'll decide to close a few apps and hitting the selector button brings up screenshots of all the open apps. Then I have to wait a few seconds for the X to appear on the app frames so I can close them.

If I had the inclination and temperament to be one of those YouTube personalities who make a schtick out of complaining about things, I would have no shortage of material for a 'designed by idiots' channel.


I think most of the shit is concentrated on a couple of notable platforms: Windows and Android. Outside of that space I've noticed less productivity hampering nightmare tools and OS features. Really I get perhaps one or two bits of stupid from Linux a year on the server-side, usually when integrating it with Windows ironically, and on the macOS/iOS front I haven't had a notable issue since I switched about a year ago.


I've been using Linux for my daily work for more than ten years now and have developed for macOS since around 2000, and I honestly cannot confirm this. If you have a fast and well-tuned machine, the sluggishness of modern applications might not be so noticeable, but it surely is there, and then there are also many usability issues with desktop software on Linux. Not to speak of browser-based applications, which mostly have unusable user interfaces anyway. For macOS, usability is still high, but the multithreading and API layering in Cocoa have always felt sluggish to me. I no longer use Mail.app but remember it as a particularly bad example.

I agree with the OP: for actual use, computers can do more powerful things than they used to be able to, no doubt about that, but programs and operating systems continue to feel slow and clumsy. Android in particular, but in the end all operating systems.

Bloated GUI frameworks, use of unoptimized images in GUIs, and non-optimal multicore programming are to blame, I guess.


Good luck on improving the situation as long as you have people running around who consider this all fine and normal.

https://ptrthomas.wordpress.com/2006/06/06/java-call-stack-f...


Idiomatic Java is idiotic.

Most of the hate for Java that I see is really hate for the idioms. The nice thing about idioms is that you don't have to follow them. But for whatever reason, Java devs stick to them. And that's how you get monstrosities like that stack trace and FizzBuzz Enterprise Edition.


Looking through that graph like "you know what we need? More abstractions!"


This actually annoys me badly. One of the problems I see regularly is applications that fail to log enough of the stack to see the entry point of the thing that actually went wrong, because the syslog packet size is set to 512 bytes. The problem is clearly syslog then, not the 12KiB of stack your app throws when something goes pop!?!?


Um, yeah, the problem is someone setting an arbitrary limit because it was easier to implement. To argue otherwise is basically to claim that there is no possible justification for a deep stack, which, well, good luck proving that. Anyway, even if I don't like all that abstraction (I don't), I may not have a choice in platform (or logging system), so blaming useless syslog messages on the app developer is adding insult to injury. Not that blame is terribly useful when the best course of action is to just burn the whole thing down and start over. :)


The arbitrary limit is a performance thing. Fits neatly in one UDP datagram and passes through everything unhampered by MTU etc.

Agree that burning the whole thing down is the right thing when you get to this place :)


Logging the stack in what format? 500 bytes could fit as many as two hundred numeric entries, or as few as five fully detailed text entries.


If you think Java or Ruby is bad, don’t even try looking at the JS ecosystem.


I still can't face looking at the JS ecosystem after dealing with Netscape 4 back in the day. It did me some psychological damage which will never go away.


The nice thing about GNU/Linux is that you can almost completely avoid all of the "desktop applications." I only ever start X when I need Firefox or mupdf. Everything else is a nice lightweight TTY app that I run in tmux. My 1.3 GHz Celeron netbook is incredibly responsive set up this way.


Similar but with XFCE, it rarely feels sluggish and I nearly always know why.


I had a laptop with a 600MHz Celeron running XFCE and AbiWord (I think it was Xubuntu 14.04 or something) with no networking (only a dial-up modem was available). It was a great, responsive typewriter with formatting and backspace!


Aye, for me XFCE is the sweet spot of does what I want without getting in the way and speed.

It's funny really because I mostly use it on an 8 core 32GB Ryzen 1700 desktop.


Expect Firefox to demand a potent GPU just to load soon enough, thanks to GTK3...


Gtk3 literally uses the exact same rendering API as Gtk2 - Cairo - which is accelerated by your Xorg driver the same as it has been for decades.

None of this even matters, because Firefox isn't a traditional application and renders most things itself using Skia.


Isn't the current version of Firefox GTK3? I run it on my laptop with just the EFI framebuffer, and while it's /the/ most sluggish app installed, it's still usable.


It is, and it may work for now. But as I become familiar with the mentality of the GTK devs, I worry how long that will remain an option.


Same, but I don't want to play text adventures for the rest of my life either. Graphics are a Good Thing.


> I no longer use Mail.app but remember it as a particularly bad example.

Huh. I used to think that my Mac Mini was just too puny for my huge mailboxes.

I now use Evolution on openSUSE running on a Ryzen 1700 with an NVMe SSD, and it still feels kind of slow-ish. So maybe that program is in need of some loving optimization, too (would not surprise me), or my mailboxes are just unreasonably big (would not surprise me, either). Probably a bit of both.


That's just Evolution. It's a big, complicated turd that uses a thread pool with no priorities. So you can sit there waiting to read a message while it is blocked checking your other folders. Also, its keyboard shortcuts are stupid.

Thunderbird is a lot better.


Unfortunately, Thunderbird does not like to talk to Microsoft Exchange Servers in their native tongue. If it weren't for that, or if my employer did not run on Exchange, I would be using Thunderbird already.


Let me just clarify that I don't use Linux on the desktop. I find it quite horrible to work with, particularly since the demise of Gnome 2. On the server and for development, it is fast and efficient. Most of the development work I do is from the Mac desktop to Linux machines.

I find macOS the "least bad" desktop experience. I think that's the true assertion from my initial comment. But I have a very high end MBP.

As a point for comparison which holds true across different applications, if I take the Photos application on macOS and port the data to the Photos app on Windows 10 and on Android, both of the latter are unusable with a dataset of 30GiB. I have tried this (in reverse!)

I haven't had any problems with Mail.app but I only use that for trivial home email. I use Outlook. Now that's a turd.


the op was building chrome. building on osx exhibits similar issues. there are instructions in the readme on how to configure macos not to chug and lose all responsiveness when building chrome.


You don't have to wait for the X. Just swipe them away.

Or just don't close activities ever, it's pretty pointless. It doesn't actually kill the app if it's still open.


The point I'm making is that the core UI shouldn't be limping along like this. Bad performance for an individual app is understandable, but when the basic UI is no longer efficient or responsive it makes everything worse.


Way more naive, but my first own desktop was a P75. A few years later a friend of a friend got his hands on a batch of P233MMX chips, and I started to fantasize that these would be so fast the menus might appear before you even finished clicking. I didn't know how non-linear computers were at the time.

A few other things: I'm often surprised how fixed the responsiveness of systems is. Hardware grew 3-4 orders of magnitude, but the cruft and chaos, plus the increase in resolutions, keep latency around the same value. It's sometimes even worse. Psychologically, when I boot a 64MB Windows machine and can enjoy Word / IE / Winamp, I feel very weird (video could be offloaded to a Chromecast, for that matter).


Except without interrupts, your computer would miss all the keystrokes. :-)


That's so deep, it hit me in the IDT


Why I refuse to indulge in the latest eye candy and bling from the desktop world if possible.

If I didn't need a GPU to do window switching back then, why do I need it now?


FVWM and VTWM switch windows and workspaces instantaneously on my crappy netbook with just the EFI framebuffer.

I always laugh a little inside watching people try to do the same on Windows 10 and OS X, with all the hardware acceleration, and waiting for multiple seconds.


To save battery power.


Then stop drawing all the eyecandy in the first place.


It's all just trade offs. However fast or powerful your machine is, software will use as much of that resource as possible, up to the point where it occasionally interferes with input (but not too much or you'll switch to something else).


But why? What good is an operating system on a multi-core device that allows anything to get that close to the performance envelope? This is a fine example of competition driving change for change's sake rather than real innovation and everything ending up worse as a result. I like new features as much as the next person, but not when they compromise core functionality. Not being able to type is inexcusable.


I agree. However at this point, I don't see anything an OS can do to help.

I see plenty of typing slowdowns every other day now. But I'm not sure just how many of them are the OS's fault. When your typing seems to lag, there are two places that can be slowing it down - the input side (reacting to hardware events) and the output side (drawing and updating the UI).

I suppose keyboard buffers are pretty well isolated, and native UI controls tend to work fine too. The problem is, everyone now goes for non-native controls. You type in e.g. Firefox, and it is slow not because of your OS, but because Firefox does all its UI drawing by itself. And God help you if the application you want to use is done in Electron. There are so many layers of non-nativeness on top of that that the OS has close to zero say in what's being done. There's no way to help that - resource quotas will only make the problem worse, and giving such a program free rein will only take everything else down.

All in all, it's just - again - problem of people writing shitty software, because of laziness and time-to-market reasons. Blame "Worse is Better".


Agreed - abstractions seem to be exploding these days, and I'm not even sure we are at the end of the road yet! Linux and Windows never had any trouble with essentially real-time keyboard feedback in their terminal windows. It's not the OS.


IntelliJ still freezes while indexing on my work desktop, which has 8 cores and 16 threads. At least they finally allowed pause/resume for that, so that's a win.


It's hard to believe the OS couldn't help when Windows 10 has the problem but Windows 7 doesn't.


In this particular case, yes. But this subthread was about a more general principle of letting an app exhaust system's performance. What I'm saying is that, when facing a crappily coded app, the OS can at best choose between letting it suck or letting the performance of everything suck.


You have alternatives though. I've been using basically the same linux environment for about a decade now.

I don't have a proper desktop environment like Gnome or KDE, just Xorg and StumpWM as a window manager. Then I have Firefox, Emacs and urxvt running tmux. I use a handful of GTK applications when the need arises like Gimp, Inkscape, Evince and maybe a couple others. Done.

It boots up in a few seconds from a SSD, it's always snappy. It worked fine on a core2duo and HDD 10 years ago, it works even better on an i5 and SSD now.


Yeah I have a Linux environment on a keychain USB device that I now use often enough to think seriously about abandoning Windows (though I don't hate Win as such, but kept using it because of some applications I relied upon).

Linux has (sometimes) had sort of the opposite problem, insufficient innovation in user interface design. I'm looking forward to Gnome 3 now; I felt that when CSS took over the web a lot of UI innovation moved to the server end and stalled at the client (think the really long hiatus in the development of Enlightenment, which was at one time the cutting edge of UI design/customizability while still being fast and responsive).

If you want ideas for where Linux capabilities should be going, please go check out Flowstone, which I think is criminally under-appreciated. The current version uses Ruby, but previous incarnations allowed you to deploy code in C or assembler(!) within the visual programming environment. It's doin' me a heckin' confuse that this isn't a standard development environment option for everything from shell scripts to large-scale applications. Once you go to flow-based programming, text-only IDEs look masochistic and pointless, and text-only is a terrible way to teach programming to people because discovery and syntax are inaccessible and really better done by computers. I like NoFlo for JS development, but the Linux desktop is crying out to be brought into a flow-based paradigm.

Sorry about going a bit off-topic, but when I see exhortations to go with extremely simple solutions like StumpWM, I have the opposite-but-similar reaction to the OP: why am I running some beast of a computer (at least by historical standards) so I can have a 20-year-old user interface? Surely there is some middle ground between cancerous levels of abstraction/feature creep and monk-like asceticism.


That's better rewritten as "however fast or powerful your machine is, software will waste as much of that resource as possible".


"...if you let it". Some of us have decided that the value of marginal eye candy et al isn't worth interrupting our UI flow. I suspect that many people would have decided the same if most modern OSes weren't built around removing so much control from the user.


There's a talk by Guy Steele, 'Growing a Language', where he says that ideally a language should shrink as it gets smarter semantics.


I remember writing C code on my Commodore 64 in my early teens using Power C from Spinnaker Software. If memory serves, I had to put in 3 different 5 1/4" disks to compile/link. There were compiler 1, compiler 2, and a linker diskette.


Plus ça change, plus c'est la même chose...


"The more it changes, the more it’s the same thing." [0]

[0]: https://en.wiktionary.org/wiki/plus_%C3%A7a_change,_plus_c%2...


The common idiomatic translation is "the more things change, the more they stay the same."


It's the same meaning, but the tone seems completely different to me.


It does?


For English the equivalent meaning would be, "the more things change, the more they remain the same".

I don't think I've ever heard the literal translation in English.


Weird that no-one here is mentioning how Windows is the only platform that still has this problem.

UI threads on a graphical desktop should always be the most privileged processes.


Your Commodore 64 had one hyper-thread? Wow, that's forward thinking.


This is great work. I hope MS can improve Windows 10 by fixing this. I just added a new Win10 laptop with much better specs than my 3-year-old rMBP, and I'm shocked by how much apparently random latency I experience with the UI in Windows 10 compared to the Mac. That's not to mention the sloppier trackpad (which constantly detects my left hand while I type) or the ungodly slow unzip (via 7z).

If only Apple would give us more than 16GB of RAM (in a laptop)... what a frustrating world for developers.


I mean the trackpad issue is generally down to the manufacturer and the drivers they provide, but I take your point otherwise.

A pro tip with 7z is to use the two-pane interface to extract rather than dragging files into Explorer. The latter extracts the files to a temp directory before copying them, whereas the former extracts directly to the destination.

The Windows file systems are pretty cruddy in general though so decompressing tonnes of small files in general will take longer than on OS X or Linux.


> I mean the trackpad issue is generally down to the manufacturer and the drivers they provide...

Was just wondering this myself. I tried a Surface Laptop this week, and its trackpad has everything I love from the 2012-era Mac trackpads. A satisfyingly snappy physical mouse click, gestures & acceleration, and a perfect size. (I can't stand Force Touch on the new MacBook Pro, and I don't like their new giant trackpads.)


MS is trying to fix the driver situation with their "precision touchpad" initiative: https://arstechnica.com/gadgets/2016/10/pc-oems-ditch-the-cu...


No complaints on the sp4 trackpad here


> The Windows file systems are pretty cruddy in general though so decompressing tonnes of small files in general will take longer than on OS X or Linux.

I haven't checked this in a while, so it could be out of date... but what you claim is Windows being cruddy might be a symptom of a slightly different mindset around data protection. Here is a test: grab a big tar.gz/zip file and uncompress it on Linux and on Windows from the command prompt. As soon as you see it complete, pull the power plug/battery from the machine. Plug it back in and check the result.

In the past, Windows seemed to be more aggressive about flushing things to disk, and applications were far more generous with calls to FlushFileBuffers() on Windows than similar apps on Linux were about calling fsync() - seemingly because fsync() is a machine deathblow on Linux when under IO load, whereas Windows did a slightly better job of only flushing the requested file. Also, I've seen a number of "streaming" apps on Windows play nice and use FILE_FLAG_WRITE_THROUGH, which makes them run slower but doesn't trash the whole system.

Bottom line is that I wouldn't call NTFS cruddy; it does a lot of things that still aren't mainstream in Linux filesystems. If you're seeing some huge performance anomaly between Windows and Linux, look closer. NTFS may not be sexy, but it's a solid piece of engineering (OK, some things are cool, like VSS. TxF is also cool but might get deprecated because no one knows about it).
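
For anyone curious, a minimal Win32 sketch of the two behaviours mentioned above (the file name is made up):

  #include <windows.h>

  int main() {
    // FILE_FLAG_WRITE_THROUGH: each WriteFile reaches stable storage before returning.
    HANDLE h = CreateFileW(L"stream.dat", GENERIC_WRITE, 0, nullptr, CREATE_ALWAYS,
                           FILE_ATTRIBUTE_NORMAL | FILE_FLAG_WRITE_THROUGH, nullptr);
    if (h == INVALID_HANDLE_VALUE) return 1;

    const char record[] = "record";
    DWORD written = 0;
    WriteFile(h, record, sizeof(record), &written, nullptr);

    // The alternative, without the flag, is to flush just this file explicitly:
    // FlushFileBuffers(h);

    CloseHandle(h);
    return 0;
  }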


No, NTFS is cruddy. It is eager to flush metadata but not data, and there is no way to switch it to "journal data as well" mode (at least not as of Win 8.1; maybe 10 added this mode).

This is the default behavior on ext3 and ext4 as well, of course, and if you pull the plug on NTFS, ext3 or ext4, you WILL end up with files whose length reflects recent writes but whose contents do not (and I am saying this from analyzing hundreds of cases of file contents that are impossible given program flow - about 10:1 on NTFS versus non-data-journalled ext4). But at least on Linux you can tune2fs -o journal_data, still get very decent performance, and get much, much better integrity.

Also, I think you are conflating sync(), fsync(), fdatasync() and FlushFileBuffers. calling fsync() or fdatasync() on linux is no worse than FlushFileBuffers - the IO killing behaviour you describe is associated with sync(), which, when you need it, is perfectly justified in killing IO.


> A pro tip with 7z is to use the two pane interface to extract and not dragging files to Explorer. The latter option extracts the files to a temp directory before copying them whereas the former extracts directly to the destination.

That's insane. I had no idea. Thanks for the tip!


It's because the 7-Zip file manager first extracts to %temp% and then copies (not moves) to the drop location. If you use the regular extract function it extracts to the destination directly. Using the Explorer context menu does the same thing, and thus is also fast.


I use the right click context menu on the archive file to 'Extract Here' which seems to accomplish the same thing.


I have just bought a Windows 10 laptop for my wife.

I don't know if it is random updates or junkware or whatever, but sometimes the system will lock up for minutes - Task Manager shows <50% load on any of disk, RAM and CPU, but even waiting for Task Manager to launch can take a minute! I've witnessed a simple program (FAR Manager) taking a minute to show its UI!


Try turning off all of the anti-malware and search features. Windows Defender will run on every file, folder, or executable you touch. Play a movie with VLC? Here, let us scan VLC, every DLL it depends on, every file you have opened with it, and the entire file you are about to open. They also bundled Cortana and Bing with search, which means you can get ads in your search! But it also means it makes like twenty web queries every time you search. Also, search indexes non-stop for no reason I can fathom, which is a shame because I actually liked the Windows 7 search features (which are now bundled with this ad-serving, resource-hogging shit).

Also, don't trust task manager, it lies about utilization if you aren't running as admin and don't have it configured correctly.


Windows Defender is by far the "best" anti-malware software when it comes to having a light footprint. Scanning every byte of IO for malware signatures, plus a ton of other heavyweight operations, is a sure way to eat a ton of processing power. I have an underpowered Win10 tablet, and I have to remind myself to "disable" Windows Defender every time it goes to install an update. Otherwise what should be a 2-3 minute operation can run for 10-15 minutes.


Yeah, I don't have a problem with it per se, I just wish it implemented some sort of improvement to reduce the hard-drive usage. Say, integrating with the OS to read the same memory as the program will (rather than reading it 2+ times: this kills the drive). Or perhaps storing a list - with hashes - of already-scanned files, so it doesn't scan the entire 30+ GB Windows dev kit every time I open a large project in Visual Studio (especially egregious because it forgets it already scanned it when I open a different IDE or, say... a compiler). The thing is out of control.


"integrating with the OS to read the same memory as the program will (rather than reading it 2+ times: this kills the drive)."

Is it even possible to read a file twice while ignoring the system buffer cache?


Yeah, I've been out of the Windows game for some years now, but my wife has a few machines (desktop and laptop) and occasionally comes to me and says "it's slow. fix it." And indeed it is slow, even under moderate browser use - things just hang. The specs are more than adequate.

I have to say (after halfheartedly looking into it a few times): "Um, sorry honey, I have no idea. Guess it's a Windows thing." She nods, and goes back to patiently waiting minutes for the browser to stop hanging! Seems that people using Windows just accept this kind of thing as a given, to the point they don't even notice it. (I've installed Linux for her a few times, but of course, it's too different, doesn't run Photoshop, etc...)


The latency and unzipping problems can be solved with Linux, though the trackpad could get worse. The random latency and unexplained HD activity always offended me in Windows - it is _my_ laptop, after all!


I was wondering what ETW is, and found in one of his other posts: ETW (Event Tracing for Windows).

He seems to be one of the main contributors of https://github.com/google/UIforETW

Seems to be quite useful.

---

About the post - tl;dr: NtGdiCloseProcess has a system-wide global lock which is used quite often, e.g. during a build of Chrome, which spawns a lot of processes. This problem seems to have been introduced between Windows 7 and Windows 10.

I thought that there would be a solution or a fix but it seems this is not yet fixed. "This problem has been reported to Microsoft and they are investigating."


> NtGdiCloseProcess has a system-wide global lock which is used quite often

"Holds" rather than "has", and importantly that system-wide lock must be held by things like SendMessageW (which sends a message and waits for its processing before returning) which is pretty critical for UI updates.

This is compounded by parallelised builds, since they spawn lots of processes and therefore cause lots of process destruction: it both fucks up the UI and completely serialises process destruction, so your nice flights of 24 processes for faster builds end up taking multiple seconds to all shut down - and the more cores (and thus processes) you have, the worse it is, and the worse your stutters.


> "Holds" rather than "has", and importantly that system-wide lock must be held by things like SendMessageW (which sends a message and waits for its processing before returning) which is pretty critical for UI updates.

Do you happen to know anything about it? I'm scratching my head how it's possible that process termination serializes with GUI... Maybe it hogs some lock on process descriptors which SendMessage also needs to grab for a moment to find the target process? I hope you didn't mean to say that every SendMessage call is completely serialized with each other.


> Do you happen to know anything about it? I'm scratching my head how it's possible that process termination serializes with GUI…

Not anything more than what's in the essay. But possibly some (pair of) utility function calls - e.g. a cleanup or a notification to the OS - were added between W7 and W10, and this was not noticed at the time.

> I hope you didn't mean to say that every SendMessage call is completely serialized with each other.

It's my understanding that at least a subset of SendMessage is serialised in the kernel yes, and from the essay:

> functions like SendMessageW, apparently waiting on a kernel critical region[…], deep in the call stack in win32kbase.sys!EnterCrit (not shown)

[0] https://msdn.microsoft.com/en-us/library/windows/desktop/dd7...


As a Windows outsider, I'm puzzled why programs used as part of the Chrome build system (which I'd expect to only use console I/O) are using APIs that cause interactions with the GUI? By analogy, is this not like gcc redundantly setting up a connection to the Xserver each time it is run?

(I'm making an assumption that NtGdiCloseProcess is part of the GUI API (GDI == Graphics Device Interface) which is why it may interact with the GUI message passing.)


In the very early days of NT the GDI subsystem was in userland and you wouldn't have this problem. Unfortunately it was too slow for machines of the 90s and so GDI+user32 is very tightly integrated with the kernel.

Even to the point where it does neat things like callback user mode code from the kernel. Unwinding this without breaking things is nigh impossible at this point.


More precisely, I believe the kernel tracks what threads are "GUI" threads that use win32k.


I wonder if this is the same lock most win32k calls take.


Why is everyone talking about how C++ compilation is slow or something instead of talking about the real problem, which is that GDI is doing resource cleanup on process exit, under lock, for console-mode processes that have probably never used any GDI resources?


mxatone mentions that "Win32k locking design is just bad" in https://twitter.com/mxatone/status/884436870955913216


It's not even bad. It's neglecting any kind of "has this process ever touched win32k.sys" test.
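
For illustration only: user mode can already ask how many GDI/USER objects a given process owns via GetGuiResources, which is roughly the kind of signal such a test could be based on. A minimal sketch (assuming a handle with query rights; this is not what Windows does internally, just a way to see that the information exists):

  #include <windows.h>
  #include <cstdio>

  // Count the GDI and USER objects a process owns. A console-only build tool
  // (compiler, linker) would normally report 0 for both, which is exactly the
  // "has this process ever touched win32k" information referred to above.
  int main() {
      HANDLE process = GetCurrentProcess();  // or OpenProcess(PROCESS_QUERY_INFORMATION, ...)
      DWORD gdi  = GetGuiResources(process, GR_GDIOBJECTS);
      DWORD user = GetGuiResources(process, GR_USEROBJECTS);
      printf("GDI objects: %lu, USER objects: %lu\n", gdi, user);
      return 0;
  }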


I feel like the (brilliant) post is missing the context that, if you were debugging this latency on Linux, you would have the source code to continue the investigation until you found and fixed the problem, as opposed to just teeing it up for Microsoft.


Most of the problems are solvable, if proper tools are used. Microsoft provides "PDB" files, which contain symbols for ease of debugging. You can get them from Microsoft's symbol server. Load the symbols and the binary in IDA, and the generated pseudocode is enough for most scenarios.

In theory, debugging programs on Linux should be easier. However, for some distributions (like Arch Linux) debug symbols are not provided. You have to compile the program on your own if you want to debug. It's especially painful if the target program has a large codebase!


I remember every other book on Windows programming saying "Process creation/destruction is expensive, use thread pools (or at least process pools) instead, that's the way to go on Windows". Perhaps this mindset is ingrained for Windows QA team too - they don't have [enough] test cases for such scenarios.
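
To make "expensive" concrete, here is a minimal sketch (not from the article; the child command "cmd.exe /c exit" is just an assumption for a cheap, short-lived process) that times create/wait/close cycles with CreateProcessW:

  #include <windows.h>
  #include <cstdio>

  int main() {
      const int kIterations = 100;
      LARGE_INTEGER freq, start, end;
      QueryPerformanceFrequency(&freq);
      QueryPerformanceCounter(&start);

      for (int i = 0; i < kIterations; ++i) {
          wchar_t cmd[] = L"cmd.exe /c exit";   // CreateProcessW wants a writable buffer
          STARTUPINFOW si = { sizeof(si) };
          PROCESS_INFORMATION pi = {};
          if (CreateProcessW(nullptr, cmd, nullptr, nullptr, FALSE,
                             CREATE_NO_WINDOW, nullptr, nullptr, &si, &pi)) {
              WaitForSingleObject(pi.hProcess, INFINITE);  // child exit == process destruction
              CloseHandle(pi.hThread);
              CloseHandle(pi.hProcess);
          }
      }

      QueryPerformanceCounter(&end);
      double secs = double(end.QuadPart - start.QuadPart) / double(freq.QuadPart);
      printf("%d spawn/exit cycles: %.2f s total, %.2f ms each\n",
             kIterations, secs, secs * 1000.0 / kIterations);
      return 0;
  }

Comparing the per-cycle number with an equivalent fork/exec loop on Linux is a quick way to see why the books give that advice.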


Seems like a feedback loop. It's expensive, so most apps avoid doing that, meaning there's less need to check for performance regressions. Then if there is a regression, it further increases the incentive for developers to avoid it, so in the future even fewer apps will do that, making it even more of an unusual use case, so even less need to test for it..


This is probably also why Cygwin, and even the WSL subsystem in general, are a lot slower when running more complex shell scripts, which typically spawn tons of processes.

I wrote a pretty simple shell script to test WSL process spawn speed; it loops over a simple echo piped to a grep, adding 1 to a counter until it reaches 1000.

On my windows machine, in a Linux VM, I consistently get times like this:

  real    0m1.381s
  user    0m0.073s
  sys     0m1.472s

On the same machine in WSL, I get results like this consistently:

  real    0m14.878s
  user    0m0.469s
  sys     0m12.109s

That is 10 times slower... I don't have Cygwin installed anymore, but when I tested it initially when trying out WSL, it was even slower...


The Amiga prioritized user-input interrupts before all other interrupts, so if there ever was a time you couldn't move the mouse on the Amiga, it meant that the system was well and truly crashed.

30 years on and the peecee industry still doesn't know how to design a fucking system.


I was just thinking that, my A1200 never skipped a beat.


For my own amusement, whenever I get a new OS build on my machine, I open up a task manager and watch CPU load just by wiggling the mouse a lot or maybe simply pressing page up and down. I'm pretty sure it's always pretty easy to generate 25% CPU doing very little. Another thing is just opening a local file from within a running application and wondering why multiple seconds, and hence billions of CPU cycles, seem to be consumed by what one expects should be a fairly menial task. (I am pretty sure it was quicker in DOS 3.3 with Norton Commander.)


> I'm pretty sure it's always pretty easy to generate 25% CPU doing very little

Not for me. I just tried mouse wiggling, scrolling, and page up/down in browsers and file managers on Windows 7, 10 and Ubuntu 16, and none of those show such behaviour. Likewise for opening a file dialog. And I didn't expect otherwise, actually; what OS are you talking about?


That used to be true back when the CPU did all of the GUI rendering. But now most all of it is offloaded to the GPU. Any GPU that can render Quake 3 Arena at 120 FPS (and that's ALL of them even Intel IGPs) can wiggle a window around very easily.

Not sure about file opens. Simple applications like GVIM don't seem to have seconds of delay for me, but I know what you mean with things like spreadsheet or word processor files. I guess it is all of the unzipping and XML processing.


Actually, it's not as accelerated as it was back in the WinXP days. Here is a random link about it: https://www.youtube.com/watch?v=ay-gqx18UTM.

Basically, the GPU card vendors could hook any part of the Win32 GDI pre-Windows Vista, and they did. Post-Vista, only a tiny portion of the GDI is accelerated. In theory you can avoid this by writing your application using a more modern API, but the vast majority of native Windows applications continue to be basically GDI based, either due to age or due to various GUI toolkits still being GDI based. Worse, there are a number of toolkits (or browsers) which implement their own drawing routines rather than calling the system-supplied ones.

The final composited results with Aero are of course accelerated, but that only really tends to add additional latency. Switching to one of the basic themes makes Win7 noticeably more responsive, but it also tends to tear a lot. I've got an incredibly high-end desktop machine, carefully tweaked/optimized, and I can frequently see the ~1/10 of a second lags while windows update after being maximized, etc. Compared to the 10-year-old, pretty high-end XP machine (with upgraded SSD, etc., and also carefully tuned) it doesn't seem to be faster on basic desktop-type operations. Fire up a recent game, or do builds, and it's massively faster, but running Word/Firefox/whatever, the old machine "feels" faster.

(Tuned, as in I have a dozen or so tweaks I've been collecting/researching for the past decade+ on ways to make the machine feel more responsive. It all started with MenuShowDelay in Win95 and has grown from there; it now includes all the usual stuff plus tweaking power profiles, and a bunch of less obvious "feel" things like high-DPI mice with fast base speeds.)


> tweaks I've been collecting/researching for the past decade

Do you happen to have this available to the public somewhere?


He said wiggling the mouse, not a window.


I wonder if this accurately captures the time spent on the task. I am surprised opening a file takes that long. I primarily live in a Linux environment, but I also run some Java code for R&D data collection on Windows also. It opens 10-20 files on startup which does not lead to a perceptible delay. I have not measured exact performance, but if opening (all 20) took more than half a second or so I would have surely noticed this.


macOS has always had really smooth window wiggling because each window is a separate layer rendered on the GPU. Windows has had problems in the past because wiggling windows causes repaints in the background ones, which had no such buffer, but Windows Aero uses a model a lot like the macOS one.

What OS are you running?


Speaking of W10, there was another annoying W10 bug where if you started typing immediately after using the touchpad there was a random delay. If you care about latency and responsiveness, it makes you want to scream at the people who implement these features.


That's not a bug. It's some kind of feature by Synaptics that can be disabled in the settings.


Haha, it never crossed my mind that someone would implement this purposely. I don't have a W10 laptop, so I've only noticed this on other people's machines.


I use this feature on my laptop. Without it I would accidentally touch the touchpad and insert characters where they aren't supposed to be.


My 16 MHz 68000-based Amiga 500 had smoother mouse movement than my 3.2 GHz 8-core desktop.


IIRC, on the Amigas the CPU was barely involved in the process of reading the mouse events and moving the cursor sprite on the screen.


Yep, hardware sprites.


Modern GPUs still have hardware sprites for the cursor, and OSes still use them! However, the path between the mouse and updating that sprite has gotten more complex, alas.


I imagine a modern day Amiga would run all mouse and window server code on the GPU, only telling the CPU when the app needs to update its bitmap.


I think the best summary of how to do this is to take a look at Herb Sutter's three-article set on "Minimizing Compile-Time Dependencies": https://herbsutter.com/gotw/

"The Compilation Firewall", or pimpl idiom, is a clever mechanism for getting code out of the header: https://herbsutter.com/gotw/_100/


I use this liberally. Generally, it is a runtime performance issue only for repeated allocations, which you can optimize if needed. Once-only allocations can be ignored when using compilation firewalls.


And the second cache miss: one for the pointer, one for the actual object.


Right. Usually the entire application is not performance critical. Only certain sections are. For that part of the application, profile and optimize.


I had the same exact problem. A machine that is overkill spec wise but the mouse and keyboard would freeze up every minute on the minute.

I tracked it down to my desktop wallpaper being on a rotation. Seriously... how badly does that have to be implemented in Windows to actually hang the mouse and keyboard?


For all the hate Windows gets, UI lockup hasn't really been a pervasive issue for me in years, whereas on my Linux desktop that's just a typical day.


My Linux desktop never locks up. I'd suggest messing around with a different kernel, double checking graphic drivers, etc.


I was probably a bit imprecise; I do sometimes get full lock ups where I have to reboot, mostly when running TensorFlow locally, but usually I just get lock ups that last < 10 seconds when compiling things in the background.

It's at the level of "annoying, but doesn't impact my work", so I just live with it.


Ah, I see. Which scheduler are you using?

  $ cat /sys/block/sda/queue/scheduler
  noop deadline cfq [bfq]

I've been very pleased with the responsiveness of BFQ when multitasking.

And actually, I shouldn't say I never get lock-ups. There's been a couple times where the DE just freezes, and I'll have to hit Ctrl+Alt+F2 to switch to another tty and restart the display manager. But I attribute that to running bleeding-edge version of things and enabling experimental features, so that's fair.

Lastly, my MacBook Pro (2010 6,2) would also experience random freezes on Ubuntu (would have to power off/on) and upgrading the kernel from the Ubuntu default to latest mainline solved that problem completely.


  $ cat /sys/block/sda/queue/scheduler
  cat: /sys/block/sda/queue/scheduler: No such file or directory

I guess my work does something weird with our desktop linux installs, laptop says this though:

  $ cat /sys/block/sda/queue/scheduler
  noop [deadline] cfq


Last time I was running Ubuntu (a month or two ago), trying to use the wifi network management menu would easily lock up the interface for a while - I'm guessing it manages to do slow work on the main window manager thread. Maybe that's a common problem.


I've seen lengthy (10 seconds to an hour) freezes primarily when the system starts thrashing.


I've seen this too. I have two very similarly specced workstations; one has a consumer-level SSD and the other has an older Intel S3500. Whenever I do heavy IO, the system with the consumer-level SSD will start freezing.


Not sure why the downvotes, but okay. Tough crowd.


The first time I had to compile a Linux kernel for Android, it took only a few seconds (on the ridiculously overpowered "build machine" my employer supplied). I was sure I must have done something wrong, but no, that was the entire build. It takes longer for Android to reboot than it does to build the kernel.

It does feel like there's something seriously wrong with the massive C++ codebases we use these days for key infrastructure like browsers, and the massive compilation times we put up with.


What Knuth said: "Premature optimization is the root of all evil"

What most developers hear: "Optimization is the root of all evil"


Probably not really feasible, but I'd be interested if something comparable happens when your build process uses many threads instead of many processes. I still think using processes instead of threads is a hack, though I know the mainstream opinion says nowadays processes are the way to go and threads are a hack.


We've come to the point where building the browser from scratch takes more than building the OS itself.


That's because we've come to the point where the browser is now more akin to an OS than a networked document reader.


That's because the browser had to do things the sneaky way.

Everybody wants to control the app platform. Every time a cross-platform solution appears, everyone tries to shut it down. See: Java, Flash, etc.

The only one that platform owners couldn't really shut down (though they are getting better at it) is the web. It was considered too "dumb" to shut down and too useful to completely avoid. And bit by bit, just like the boiling frog story, people added interactivity features until we arrived where we are now: a bloated, perhaps crippled, cross-platform solution, but the best we have for running things on everything from embedded to mobile to desktop.


Sounds a lot like WeChat. Messaging app that's added more and more features to now become almost an OS on top of the smartphone OS in China.


I'm reading this thread on a Chromebook and really getting a kick...


Interesting metric actually. Building Windows takes 12 hours [1]. It's a bit harder to find metrics on Linux. You can build the kernel in 60 seconds apparently [2] but that is not a complete operating system.

[1] https://stackoverflow.com/questions/226377/operating-system-... [2] http://www.phoronix.com/scan.php?page=news_item&px=MTAyNjU


Someone's old data from 2012 to build a core-image-sato using Yocto is just over an hour - http://www.burtonini.com/blog/2012/11/15/yocto-build-times/.

That said, core-image-sato used to be just a simple demo image. In my old build system, we would build the Arago Project for a TI SoC every night and it would take a few hours but that included a lot of the DSP code as well which was really slow too. So a couple of hours on average is my guess.


Building my Linux distribution from scratch takes about an hour on a modern system from a top-level ./configure && make -j5 (which produces the installer ISO image); a great deal of the time, however, is spent first building the cross-compiler toolchain (to ensure that no matter where you build my Linux distribution you get the same results).

This is the cross-compiler toolchain, the kernel, and about 250 external packages.


You can build base+X11 BSD systems in a handful of minutes on modern hardware with one command. I don't have modern hardware, and I build over NFS, so I can't give a precise figure.

But yes, Windows includes lots of APIs and frameworks and GUI apps and such. Probably something comparable complexity-wise would be base+X11+(one of KDE or GNOME).


How do you define the OS? If it's just the kernel, sure, but then is it really surprising? If it's also all the userspace necessary to e.g. run said browser, then I don't think this will be true anymore.


Somebody said Chrome takes 30 minutes to build on one workstation and I don't seriously think you would build Windows 10 that fast.


"We've come to the point where building the browser from scratch takes more than building the OS itself."

It doesn't. It took ~5h to build a tiny (~150 MB) complete system (Linux kernel + Yocto-based "OS") from sources on a 4-core PC a few years ago. With a modern CPU, I guess, it might be 3h or so. As a side note, the build process generated >50GB of files.


I admit I have not done so in a while, but I vaguely remember building Net/OpenBSD from source being faster than that on moderately powerful hardware (Core2 Quad 2.4 GHz, 8GB RAM, no SSD).

I think NetBSD took ~2.5 hours, including building its toolchain.


Honestly, Yocto is a complete mess and a terrible embedded OS system IMO. Buildroot is far better, and takes far less time to build.


If "OS" isn't restricted to *nix then we have ReactOS that builds in around 10 minutes!


I am pretty sure building a common set of packages for, say, Fedora, or any version of Windows takes a couple orders of magnitude more resources than building Chrome.

On my Macs I've been doing "port install -s" most of the time and I can tell you some relatively small installs can take a very long time.


Try running a Carthage update for an iOS app with 80 frameworks. Ever seen a Mac hit 55GB of RAM? macOS kills it. It takes many of these runs, and all this does is fetch prebuilt binaries. Yes, everything I said here is beyond stupid.


Is it necessary to update _all_ of them at once? I think that's the same problem as OP... rebuilding the whole world is slow, so why not do it incrementally?

Also, 80 frameworks ⊙⊙ I'm assuming some of these are internal, and written in Swift, meaning they can't be compiled into static libs (easily).


You can take the alternate approach:

Run your builds in a hugely underpowered VM and wait much longer. Your regular usage will be largely unimpacted, although the builds take longer.

Source: currently running a ~1000 package dpb(1)[1] build of my needed OpenBSD ports on a dual-core KVM machine hosted on an 8-9 year old amd64 X2 2.2 GHz. 3 days and counting; it will probably be done around next weekend.

From there, incremental updates are mostly slight, and can complete overnight from a cron job.

.. [1] https://man.openbsd.org/dpb


Out of curiosity why do you keep such an old machine around? I have found that newer machines can be had for free, and 7-year-old Xeon servers can be had for <$200 from IT recyclers.


The electricity consumption numbers are also relevant.


My workstation was recently "upgraded" from a Windows 8.1 workstation with a 4th-gen i5 to a Windows 10 laptop with a 5th-gen i7. The extra RAM and SSD over HDD is great, but whether it's because it went from a quad-core to a dual-core hyperthreaded CPU, or because of the jump to Windows 10, the mouse lag is considerably more noticeable now. I've convinced them to upgrade my laptop again, but now this article doesn't give me much hope for my new work toy.


While the issue of closing processes slowly is unique to Windows 10, I've found similar situations of not being able to control my OS on RHEL, Fedora, Ubuntu and macOS.

At this point I genuinely believe latency will be the death of general purpose computing.

iOS and Android very rarely get out of control. Apple is already pushing iOS devices as laptop replacements. But we lose a lot on these devices with their locked down OSs and inability to install the software we want.


All the midrange Android devices I own get out of control all the time if you switch between heavy apps too quickly. And every once in a while, while I wait for the system to become responsive again, some low-level process like "Android System" will crash and restart itself. I shudder to think what will happen once we start throwing desktop-class loads at Android using its current application stack.


Android 7 on a Pixel device, and just browsing with Chrome will cause it to completely freeze (no panning, can't focus the URL, can't open the task switcher) for seconds when first landing on a heavy page. This has been a problem since day one and with every single Android device I've ever used; there's still a whole bunch of stuff on the main UI thread, as any Android developer will intimately know, and it absolutely will freeze under load. It's just fundamentally architected wrong: since you can block the UI thread, it does get blocked.


Android isn't locked down. But it also isn't exactly smooth. My Android device frequently shows me white screens or startup splashes when doing something as simple as switching tasks. It clearly can't keep multiple apps in memory at once. Somehow iOS manages to produce a much smoother experience. I don't think it's a fundamental issue though, just a matter of priorities.


I've seen Android get "out of control" on lower-end devices. If you stick with higher-end and/or first-party devices, the experience is usually very slick. I don't think I've ever had a OnePlus device freeze up or stutter on me, for instance.


My OnePlus X certainly does. When charging or after 10/15 minutes of intense graphical use it starts getting sluggish and unresponsive.


Probably the CPU/GPU scaling back because the phone is getting too hot?


I just wanted to say good work. I'm impressed you dug so deep. A fix here could really impact the entire Windows 10 user base.


Saw the headline and thought "must be Windows".

I'm not a Windows hater, but one of my long standing gripes about Windows is that it just seems to have terrible multitasking compared to OSX.

I'm sure there are reasons but it just seems utterly symbolic of Microsoft that they never managed to get Windows to multitask in a rock solid, smooth and reliable way like OSX.


I am absolutely a Windows hater, but I've experienced the same problem in Linux, MacOS, Windows, BSD, you name it.

See for example Con Kolivas' famous rant about how Linux schedulers were ignoring the interactivity requirements of desktop usage, and resulted in a terrible experience with constant tiny freezes.


It seems to have improved these past years but beach balls used to be comically widespread on OS X. I'm not at all convinced OS X is in any better shape than Windows.

And OS X certainly doesn't have anywhere near all the kick-ass monitoring tools that Windows has, such as the ones shown in this article.


You do know that OS X has DTrace?



not just "it has DTrace" it has one of the best implementations/supports of DTrace in any operating system.

It's quite impressive.


> And OS X certainly doesn't have anywhere near all the kick-ass monitoring tools that Windows has, such as the ones shown in this article

It certainly helps that Intel really likes Windows, but I haven't seen Mac users complaining as much as Windows users do, so it may be for lack of demand (and the bundling of DTrace, for ages).


Uhm, I have a 40-core machine with 256GB of RAM at work, and I can make it completely unresponsive without taxing either the RAM or the CPU to 100%. We have a certain computational load that just destroys the CPU<->RAM bandwidth, so CPU usage is about 30-40%, RAM sits at 50%, and yet the computer is completely unresponsive. It's exactly the same on both Windows Server and Linux; we're just running into hardware limits.


Interesting. I've been CPU mining on my old Ivy Bridge and I don't even notice it causing much performance drop. Free coins for me!


I'm reading your comment while upgrading brew. This prevents me from working because typing in Sublime Text slows down to a crawl during brew's compilations ;)


> my long standing gripes about Windows is that it just seems to have terrible multitasking compared to OSX.

Using both Macs and Linux laptops, I'm sometimes shocked at how the Mac sometimes locks up when the Linux machines degrade much more gracefully under heavy loads. I never dug too deep into it, but it feels like it's something with HFS+ under heavy IO. I hope APFS fixes that.


I'm far from an expert on the topic, but I'm pretty sure APFS does a lot of global locking, which would definitely explain at least some of that.


I just noticed now, 4 days late, that I wrote APFS but meant HFS+. Sorry.


Interesting. Back when I switched my Firefox builds from Core2 Duo T8300 @ 2.40 GHz + spinning disk + OS X to i7-950 @ 3.07GHz + SSD + Ubuntu, GUI responsiveness during Firefox build went from OK to bad. Canonical's support suggested getting a second computer as a build machine.


It may be the machine too. I "feel" that my Xeon desktop and my i3 laptop degrade more smoothly than the i7 laptop. It's entirely subjective, of course, but when I torture the machines the i3 goes from usable to uncomfortable while the i7 goes from very usable to unbearable. The Xeon goes from very usable to usable. Could be the i7 throttling more heavily from thermal issues.


Ever since I purged Dropbox from my life I haven't had intermittent UI lockups on macOS.


Interesting... One Mac here has Dropbox and the other has Microsoft's OneDrive. Maybe it's their fault.


This has little to nothing to do with multitasking.


OS X just slows everything down so you won't notice when something gets stuck.

I've been a Mac user for 7 years now, and my latest, greatest MacBook Pro halts for a few seconds when I connect an external monitor; everything is frozen and unresponsive, like I'm looking at a screenshot of the system.

OS multitasking is still usually shit...


That's at least excusable, unlike GUI freeze caused by termination of a number of processes which don't even use the GUI.


My ten year old crappy MacBook Pro doesn't do that. Weird.


I have a new Ryzen 7 CPU (you know, 8 cores, 16 threads), with 64GB RAM, and a Samsung 960 PRO M.2 drive for storage.

But when I went and plugged in an external Seagate 4TB drive and tried to "dd zero" the s#it out of it, my whole system became unresponsive after a while. Obviously I had to reset the machine, as it wouldn't let me "kill -9" the process that made the system unresponsive.

Trying to type was a no-go as keys would sometimes become "stuck". Moving the mouse around was an exercise in predictability too.

All this happened in the latest Ubuntu 17.04 64bit... #sadstory


Seems like a good way to deal with this (outside of trying to convince Microsoft to fix it) is to spin up a VM and just give it a few cores, and do the build inside the VM to isolate this behavior.


Why doesn't the OS prioritize UI threads on one or two cores?


How does it know which threads are UI threads?


I think it should at least know which thread handles mouse movement.


Won't help if the problem is as the author describes (contention for a system-wide lock).


I think the windows scheduler might already do this, but the issue is that UI message queues take a highly contended lock.


When I converted from Windows 7 to 10 I noticed that I started getting audio latency/glitches/squelches from my external audio interface- a Focusrite Scarlett 2i4. The mouse would also sometimes hang. Doing some basic tests, the problem seemed to be coming from network drivers... but I could never resolve it. I wonder if the author's discovery has anything to do with the issues.


That's why my machine gets so slow after running for weeks... explains everything!


I wonder if the "privacy" features in Win10 play a role here. Seems like some extra process accounting could cause delays not present in previous versions.


Thank you for bringing attention to this. Experiencing this on our W10 workstations.

I hope MS does something about this immediately. It's maddening.


Does this also occur on other OSes, like Linux, or MacOS? Can you move your Chrome build to another OS and not experience this problem?


Use the gold linker if your setup permits it.

This issue went away for me when I switched.


Wasn't this the title of a Bruce Springsteen song?


I can't imagine that moving the Win32k stuff back to CSRSS would help much in this case, right? Though it is still a good thing, especially for terminal servers, where hopefully one CSRSS process crashing just terminates that session.


> moving the Win32k stuff back to CSRSS

Out of curiosity, was there some big move of functionality from CSRSS into Win32k earlier? When and what?


In NT4.


[flagged]


The post identified a problem in Windows locking behavior, and showed the process that was used to reach the conclusion.

Your comment on the other hand tells us more about you than about the person you are criticizing.


Then why are you commenting? Kind of a crappy attitude you have there.


Not really; I was kinda expecting the post would also contain interactions with Microsoft, leading to the resolution of the bug in update XX. The post kinda left me unsatisfied.


While I too am anxious to hear the outcome after Microsoft investigates, the author still shared something they were interested in, which holds value for the HN community, as many of us love digging into a system and finding issues. That was the main purpose of the article.

Regardless, the author may not know how critical Microsoft has classified this issue as. It could be something they don't resolve for years. Since the Microsoft investigation and possible patch have no bearing on the content of this article, I wouldn't expect it to be held until such a time.

I'm sure we'll get a follow-up on HN when or if it gets patched. Even better, maybe posting this article helps get it fixed sooner :)


You can fix those problems by writing a post that would contain said interactions, resolution and update.


Sure, and you're entitled to feel that way, but that doesn't make his attitude any less crappy.


Microsoft aren't great at resolving reported bugs even if you pay them for it.


There's too many spinning wheels. Stop it! People are going to just stop using the internet. If I could, I would. But I can't. So please just build simple websites!


So what now?

Should/does the author just wait, getting some traction on Hacker News and hopefully being noticed by a dev at Microsoft, or is there some way to provide this data to Microsoft directly, skipping Tier-1 support?

The author seems to be working at Google, so he might get some leverage from that, but what about Random Joe? Keep enjoying the bug "forever", I guess?


There are many, many MS bugs that last for years. Pretty much any time you search for a bug in MS software you will find someone in a support forum giving generic advice ("known issue that we will work on", "not a bug", "please reinstall Windows and applications") and users complaining.

Typical example: https://excel.uservoice.com/forums/304921-excel-for-windows-... a basic problem that has been there for many, many years. For this one there are workarounds, but for many others there are not.

The only real long-term solution is supporting competitive products.


>The only real long-term solution is supporting competitive products.

Not really viable when the only OS that has software parity is OS X and the only OS with hardware parity is Linux.

If we drew a Venn diagram, it would be a straight line of circles with Windows sitting in the middle. To switch away you have to choose between less software or less processing/GPU power.


Yes. That's why Microsoft gets away with not being responsive to customers. However using alternative systems (e.g. Mac, Linux, Google docs, LibreOffice) where possible does help even if only a little.

There are also advantages to the free solutions: no requirements to manage licenses and some improved functionality (e.g. seamless sharing with Google docs).

The fact that MS is including bash on Windows shows that the pressure is working.


Did you read the article? The bug was already submitted to Microsoft and they are investigating.

So the answer to your question is: wait until they're done investigating and possibly release a patch.


I skipped the conclusion. Still, I'd be interested in the channel that could be used to report such issues.


From the article:

> This problem has been reported to Microsoft and they are investigating.


The author also used to work at Microsoft in the performance team...


The terrifying result of having ABI compatibility with the first commercially successful system of its kind from the '80s.

Granted, they aren't doing themselves any favours with their new straitjacket-style application ABI.


You are trotting out the same tired line without reading the article carefully? ;)

This is a regression. The article shows it worked fine up to Windows 7.


7 is the best


Regardless of how many Windows (or other OS) versions I use later on, I will remember Win7 fondly. It's just so much less cumbersome. Maybe the drastic changes that went into its preceding and succeeding versions have a role to play in this.


There is this old quote that Algol 60 was an improvement not only on its predecessors but also its successors.

The same could be said about Windows 7, I guess. ;-)


How is ABI compatibility with early systems the cause of this bug? You'll note that it wasn't present in Windows 7.


64-bit versions of Windows 10 (and 8/7 maybe?) are no longer compatible with 16-bit code.


Yep, no 64-bit Windows has 16-bit compatibility.


I read mouse as house and was confused for the longest time.


Same happened to me... initially I thought it must be some 1st world problem.


Just use linux


Staying responsive (under load) is also still a work in progress on Linux:

https://www.phoronix.com/scan.php?page=news_item&px=BFQ-Queu...


That is a disk scheduler, and mostly for spinning disks. I haven't seen a stuck mouse cursor in Linux for a long time.


I always found this peculiar about linux; even when the system is swapping hard and application windows/window managers completely freeze, the cursor always remains responsive and movement rendered without so much as a hitch.

I wonder if xorg has some sort of kernel support to enable this.


Not on my computer. Something like a WM will freeze while it waits in the queue to read the 0.2kB it needs from the disk (poor software design, if you ask me). But the mouse will freeze when IO buffers get filled. Just writing a huge file to a (slow-ish) USB stick will make my whole computer freeze, including the mouse, because the kernel USB code doesn't limit the buffer to some sane size (there was a kernel patch, and there is an option, and even with it turned on the problem is still there) (note that the probable reasoning for that is the fact that USB sucks).

A mouse cursor, AFAIK, is a GPU thing, not that it matters (Wayland, I remember something like that, will use normal GPU rendering to render the mouse). In the UNIX-HATERS Handbook there is a section about X where it is written that displays used to have 2-3 planes (IIRC, 2 planes + a cursor plane). I do recommend reading that book, as it is funny.

I remember some talk about better kernel buffer management just for things like this. Memory management in the kernel is one of those actually hard things.


Xorg used to get SIGIO on input; it's now less responsive and uses threads: http://who-t.blogspot.com/2016/09/input-threads-in-x-server....


My only disk is an M2 SSD, and when I run an OpenStreetMap import in the background, my Ubuntu desktop becomes unusable.

I don't remember if the mouse cursor is affected, though, and I don't know if the new IO scheduler really fixes this.


> I haven't seen a stuck mouse cursor for a long time in linux.

That's because the only cursor you have in Linux is the terminal one.


That's asking for downvotes; I voted it up nonetheless.


so basically a fork-bomb. I think Linux can still buckle under one, nothing that obscene...


No, the article says this is a regression in how processes close.


A fork bomb exhausts all system resources; this is not that. It doesn't actually use any resources; rather, it locks the system out of them by serialising both process termination and UI updates.


Well, a fork bomb doesn't really use anything either, except that it makes process table management extremely time-consuming. Not quite the same, but still conceptually similar.


When I saw his workstation specs, I thought, that's the exact same one I have at work! Then I checked the bottom: yep, he's at Google too.


Does yours run Windows? Do you get to choose? If you could choose, which OS would you use? Any clear favourites with your peers for development? Just collecting anecdotes ;)


Lucky bastards


> the C++ compiler needs to see full class definitions to, e.g., know the size of an object and its inheritance relationships, we have to put the main meat of every class definition into a header file!

Without knowing anything about modern C++ and its compilers, this seems fixable.

I'm thinking of a compiler "hint" in the header indicating the size of an object. When compiling the full class, you'd get an error if the number is wrong.
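
The "error if the number is wrong" half is already expressible today with a static_assert next to the full definition; a rough sketch with made-up names (it only verifies the promised size, it doesn't let the compiler use the hint for layout, which is the part that would need language support):

  // thing.h -- hypothetical: publish the promised size next to the declaration.
  #include <cstddef>

  class Thing;                               // definition hidden from clients
  constexpr std::size_t kThingSize = 24;     // the proposed "hint"

  // thing.cpp -- where the full definition lives, verify the promise.
  class Thing {
    double a, b, c;                          // 3 * 8 bytes on common ABIs
  };

  static_assert(sizeof(Thing) == kThingSize,
                "kThingSize in thing.h no longer matches the real layout");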


He's also written some stuff regarding Visual Studio perf.

https://randomascii.wordpress.com/2014/04/15/self-inflicted-...

Close to my heart as I use Visual Studio all day. Horrid piece of software.

I'll probably get downvoted for this.


I live in two worlds at the moment - supporting and maintaining a large PHP application (for which I use phpStorm) and developing .NET applications (for which I, obviously, use VS).

I've been a VS user for nigh on 10 years now and have always found the experience really fantastic. I think it's certainly more coherent than the IntelliJ-based IDEs (which, to be fair, are also very good).

Anyway, I'm genuinely surprised to see VS described as "horrid" - what specifically do you dislike about it?


"what specifically do you dislike about it?"

It's not the tooling I dislike, it's the performance. No matter what I throw at it I still get the same experience. It feels like 95% of everything it does is blocking the UI.


Improving perf in VS is hard without massive rewrites. The fundamental problem is that this is originally a COM app (as in, heavily using COM to componentize itself) designed back in the mid-90s. Consequently, you get all the wonders of things such as STA apartments, and code that insists on running thereon.

As it gets rewritten, new managed bits don't care about all that stuff. But so long as there's one bit of legacy code anywhere in the particular flow that needs to run on STA (usually it's UI thread), you get this whole "you have 20 cores and 60 logical threads, but all those threads need to sync on STA, so everything is serialized and slow" thing.

Even for the new code, the problem is that all those old COM APIs that it needs to interact with (not just for VS itself, but also for the sake of backwards compatibility with third party extensions) are usually synchronous. So if you want background processing, you need to spawn a thread - but, of course, threads aren't free, either.


I have the same experience, on Visual Studio for Mac. I was debugging some C# code with an infinite loop the other day, and it locked up the debugger too! Every time. No way to pause and step through the program, so I had to rely on the OS crash log to figure out what was going wrong.


In my experience, IntelliJ has even worse performance. Xcode is better, at the cost of very limited features.


VS for .NET is a different beast than VS for C++, the latter being a far less tool-friendly environment. There's also the point that newer versions sometimes improve a lot over older ones, but licensing costs may keep people on older versions for too long (that, and the fact that with large C++ projects an upgrade is rarely straightforward). This then ends up with people hating VS when all they've used is VS2008 on C++.

Not that I don't have any complaints (although those may very well be due to certain extensions), but I've always found VS to be much more responsive, stable, and featureful on the .NET side than C++; much of that certainly due to complexities of the respective languages.


FWIW I agree with you. Visual Studio looks like it was designed by Fisher-Price, the whole thing takes up ~60 GB of space, and if you're not running it on a high-end workstation it's likely going to lag when editing large codebases. Not to mention it's closed-source software.


It's against HN policy to downvote-bait.


> I'll probably get downvoted for this.

Obliged :)

No, seriously, why are people doing that?



