Interestingly enough, Microsoft arrived at the same number. I think it really stresses how hard it is to reason about memory manually. I'm still surprised how much stuff is being written in non-safe languages even when people could get away with a managed language.
One thing that I just recently noticed when scrolling through some GitHub repos is how much new software in the Linux ecosystem is still written in C despite that probably being avoidable, like flatpak.
You know, I totally buy that 70% of the vulnerabilities in complex C++ code relate to memory safety, especially in something like Chromium, which is incredibly complex and includes a lot of third-party code that wasn't originally designed to be robust to untrusted input, and which is also fast-moving, with a ton of code added or churning every week. But I'm not sure I buy that memory-safe languages will consequently be as beneficial for safety (especially in other kinds of software) as that fact would suggest.
For one, not all software is like Chromium. If you look at something like OpenSSH, the vast majority of their security holes have nothing to do with memory safety and are just logic bugs (often caused by features that somebody added that aren't core to the basic SSH experience, e.g. code that interfaces with X11 or something) or protocol weaknesses. (http://cve.mitre.org/cgi-bin/cvekey.cgi?keyword=openssh)
The other effect is that in practice, memory-safe languages can come with security baggage of their own. If you look at the zillions of security holes in something like Rails or Wordpress or Django, a fair portion of them relate to an attacker's ability to invoke sophisticated-but-unintended behaviors that are more likely to be hiding in these managed languages (and their support libraries) than in something like C++. E.g. CVE-2013-0156 or CVE-2013-0277 or apparently any current Python or Ruby program that, even today, calls yaml.load on untrusted input. That kind of "security hole from unwanted latent functionality" is less likely to exist in C++. (I realize this is a contrarian view not shared by the vast majority of PL/security experts, but the ones I hang out with seem to interpret "memory-safe language" to mean "expert-written Haskell or, if you want to slum it, Rust" and are not thinking "random Ruby/Python/JavaScript/enterprise Java".)
Not to mention the countless high-profile security holes that have nothing to do with memory safety, e.g. Shellshock, "goto fail", Lucky Thirteen, BEAST, CRIME, POODLE, FREAK, Logjam, etc. Or bugs that are very relevant to Chromium but aren't really Chromium's fault and probably don't appear on their own list, e.g. Meltdown/Spectre etc.
I don't understand your argument. You're basically saying "it's possible to write safe code in unsafe languages and vice-versa". Obviously that's true, but I'm not really sure what that tells us.
I mean, supposing that Chromium was rewritten in a safer language, have we any reason to believe that these memory issues would be replaced by a similar number of non-memory-safety issues?
>That kind of "security hole from unwanted latent functionality" is less likely to exist in C++.
Why? That seems like a non sequitur to me. For me the real difference between the projects you list is that Chromium and OpenSSH are not, as you put it, "random Ruby/Python/JavaScript/enterprise Java"; they are heavily audited and have a lot of resources allocated to preventing the very issues you mention. Comparing OpenSSH to Wordpress and attributing their respective security track records mainly to the languages they use is fairly absurd IMO. If you're clueless enough not to properly sanitize untrusted input, can I really believe that you'd manage to write safe C code?
This thread is oddly reminiscent of discussions around gun control. Because something is not 100% effective doesn't mean that it's not valuable. Although I guess we do need these unsafe languages in case we ever need to overthrow a tyrannical government... no wait, I'm getting confused.
> Although I guess we do need these unsafe languages in case we ever need to overthrow a tyrannical government... no wait, I'm getting confused.
I have heard people say "we rely on exploits to be able to exercise software freedom on locked-down platforms" and "if games were written in Rust, a lot of the glitches speedrunners use would go away and we'd lose our hobby" so... you're not always off.
Wow. To equate logic errors with being clueless? You do not need to be clueless to not sanitize input. You just need to forget about it, or incorrectly assume the input is safe, or, basically have a bad day. I make bad assumptions and stupid logic mistakes all the time. I guess everybody does because I see logic bugs everywhere.
Also, you assert it is absurd to compare WordPress and OpenSSH in the context of this argument. Why? They are both widely used software with very high stakes on their reliability and security. They show very well the contrast: even though one is memory-managed and the other is not, that does not stop both from having serious bugs. Actually, it shows memory management is a don't-care variable.
On the other hand, if Chromium finds their project has many more memory-management issues than other issues, then yes: memory management, for the functionality of Chromium, seems to be a relevant factor where improvements can be made.
I think the previous poster was referring to SQL injection vulnerabilities from not sanitizing DB query parameters. And that's something you only f* up if you're clueless (like I was back when I used to do that).
I meant that input sanitization is generally an easier problem to solve than writing safe C. If you can't do the former reliably, I'm certain you wouldn't be able to do the latter.
Input sanitization should generally be handled very close to the interface; once you've gone through that step you should expect to only deal with safe data. Memory safety covers the entire application: if some non-critical debugging function 10 levels deep in the call stack messes up when it computes the size of a message buffer, you can have a remote code execution vulnerability.
>Also, you assert it is absurd to compare WordPress and OpenSSH in the context of this argument. Why? They are both widely used software with very high stakes on their reliability and security.
WordPress is a complicated ecosystem with multiple plugins that can each introduce security problems. Its attack surface is also incredibly large, since it's generally public facing and anybody can interact with a lot of the features by default. That, coupled with the fact that it's extremely popular, makes it a very good target for attackers. It's also a relatively fast-moving target, since the web changes fairly fast and new features have to be implemented regularly.
OpenSSH does mostly one thing and does it well. Its attack surface is a lot smaller and it's vastly easier to audit and test. Its development is also overseen by the OpenBSD developers, who are famously uncompromising regarding safety. The feature set is very stable and changes very slowly.
>Actually, it shows memory management is a don't-care variable.
The absence of evidence is not the evidence of absence. How much development effort went into making sure that these issues didn't exist? How many thousands of hours auditing the code and making sure it did what people wanted? In a safer language this time could've been spent auditing other parts of the codebase or implementing other features.
I'd also like to point out that one of the biggest OpenSSH vulnerabilities ever found (in the Debian version of OpenSSH where the maintainer heavily reduced the generated key entropy by mistake) was indirectly caused by a memory issue since the reason for the patch was a false-positive returned by Valgrind regarding use of uninitialized memory.
I think the point you're replying to is a fairly good one, so don't shoot the messenger.
There are tons of comments, including here on HN on a regular basis, which one could use to conclude that WordPress should be more secure than anything written in C no matter the quality. And yet, it isn't true in that instance. So it's fine to advocate rust or whatever, but let's not make that other, more dangerous conclusion.
Yes, it is important to note that switching to a memory safe language does not magically solve all your bugs. My own personal anecdote: when I found multiple bypasses in the sandboxing mechanism on macOS, it was not because the relevant code was written in C++, but because the dynamic linker is a really complex system; the things I found clearly flew under the engineer’s radar when they were designing it and they didn’t think beforehand about how those components interacted together. Still, being able to reduce the attack surface area to solely be logic errors is much better than having to deal with memory safety thrown in as well. (Similarly, languages with stricter type systems magically get rid of issues like “you gave me a string and I wanted an int”, but the problems of “you forgot to check the password in this one case” still remain.)
I think reduction in the number of errors is a worthwhile goal. Switching from C++ to a language without memory safety issues while being just as fast would reduce the number of things the developer and code reviewer have to worry about. They could spend more time searching for bugs in the logic, potentially eliminating further issues.
In practice what I’ve found is that people prefer to deal in absolutes. A large reduction in a category of bugs isn’t enough, it needs to be eliminated altogether, they say. If it’s still possible in a contrived example, what’s the point of investing in switching?
It's a miracle that people who think like that use computers at all - after all, computers only reduce the time it takes to perform a clerical task or a calculation, they do not eliminate it.
No, it's a reflection of a very important heuristic used in programming (that's arguably much more fundamental on a philosophical level, but the correct words to categorize it escape me): the zero-one-many principle. If there's more than one of something, you have to treat it as if there may be an unbounded number of them. You can only get it out of your head if you can put reliable bounds on it; best of all if you can prove there's less than two of something.
A thing can happen zero times (never). An example might be 1 + 1 = 0. You could get really unlucky with cosmic rays or something, but when adding two registers that both contain 1, the result isn't going to be zero unless there's some sort of hardware failure.
A thing can happen once. An example might be: you can delete a file once. There are ways to get unlucky, of course, but once it's unlinked it's gone.
If it happens more than once, you really should think about an unbounded number of times. Nowadays, that sorta means 2^64 times. There will be bugs when something overflows int64, but I hope you get the gist.
The parent comment is talking about invariants you can use in an algorithm.
I think you might be worried about Python vs. C or something along those lines. Really, it should all work with a pencil and paper, which is obviously going to be slower than pushing around electrons. But if you can find those invariants, 0-1-many, you can make a better algorithm. If you're stuck with a pencil, it'll still make that faster.
It's not a category error if you're doing an initial guess. Not to mention, particularly in information security, most of "risk calculation" is intuition and guesses, maaaybe sometimes plugged into a probabilistic framework with something resembling rigor.
I'd say binary considerations are incompatible with risk calculations because no person and no procedure is perfect nor perfectly followed, nothing is completely bug free, etc. Some small part of a calculation might appear binary, but other terms usually dominate.
Risk calculations are far broader than infosec and don't deserve the dismissiveness you seem to be casting towards them. Risk calculations are the core of business. Almost every decision a business makes is a risk calculation; every action has an opportunity cost if it isn't intrinsically risky, and actions with certainty are very rare.
(For the avoidance of doubt, I believe that use of memory-unsafe languages should be avoided if reasonably possible, but there are still plenty of reasonable reasons to use C, C++ etc. instead.)
I'm not trying to dismiss risk calculations. I appreciate them and the challenges involved, having worked on tools supporting risk calculations in the corporate space.
I feel this thread is getting out of hand. I initially replied to explain why, in general, the kind of thinking that leaves you unsatisfied with reductions (but not eliminations) of concerns is common among programmers: because it's a sound heuristic. Reducing is good, but eliminating is better.
In many ways yes but not always. After writing lots of C++ and then going back to managed world I often find cases where the opposite is true. Questions like "will this be mutated?" or "will the library shallow copy, deep copy, or hold a reference to this object?" are very hard to answer in most GC languages.
You're exceedingly lucky if the developers you work with these days are aware of the differences between shallow and deep copy, or how object references work.
This is where I make the case for having knowledge of asm/C/C++. The JavaScript developers that started on JavaScript or, maybe, Ruby have no clue what the hell is going on in the language they are working in. I see this all the time. From senior and lead developers, too.
They have no mental model for how pointers work. Which means they have no concept of what an object is, on a memory-organization level. They aren't always aware of when mutation will occur, and they are entirely too reliant on operations that are inefficient. In Lisp, this would be the developer that uses CONS everywhere. There is no awareness of the allocations behind the mechanisms. There is premature optimization and then there is not driving into the damn pothole in the first place! One of these is F1 racing and the other is basic driving skills.
I believe this may also explain why so many developers are so bad at using git. Git is entirely based on pointers. There is an elegance and simplicity to git that is lost on many people.
I'm not sure the lack of knowledge about pointers is the underlying problem. I think it's more that programming is one of the most common jobs today, and that alone lowers the general skill level.
In the old days people usually chose to become a programmer for a reason, today it is like any other job.
Most devs learn what they need to do their jobs. And you can be a very effective Ruby developer without needing to know much of this information.
Just like you can be a very effective C or C++ developer without knowing anything about CSS (which in my opinion is far more difficult to understand than when and how stack vs heap allocations are made).
I've never found programming so relaxing as when working in languages where almost everything is immutable. I think we're living beneath our privileges here.
I can agree with that when comparing with plain C code. But reference-counting in C++ and Objective-C leads to exactly the same problem as with managed code. What saves C is that reference-counting there leads to a lot of boilerplate and bugs. To avoid that one needs a very clear ownership model.
I saw an essay by someone discussing going back to C and eschewing leet pointer tricks and decentralized memory allocation in favor of centralized memory allocation and passing handles[1] instead of pointers, which appeared to make most of the memory-safety problems go away.
[1] Handles could be unboxed to expose a pointer, but unboxing had machinery to detect and trap on bad handles.
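For concreteness, here's a minimal Rust sketch of the handle idea (my own illustration, not the essay's exact scheme): a centralized store hands out index-plus-generation handles, and "unboxing" a stale handle is detected instead of turning into a dangling pointer. A real arena would also keep a free list and reuse empty slots, which is omitted here.

    // Hypothetical handle/arena sketch; names are illustrative only.
    struct Handle { index: usize, gen: u32 }

    struct Slot<T> { gen: u32, value: Option<T> }

    struct Arena<T> { slots: Vec<Slot<T>> }

    impl<T> Arena<T> {
        fn new() -> Self { Arena { slots: Vec::new() } }

        fn alloc(&mut self, value: T) -> Handle {
            self.slots.push(Slot { gen: 0, value: Some(value) });
            Handle { index: self.slots.len() - 1, gen: 0 }
        }

        // "Unboxing": returns None (i.e. traps gracefully) when the handle
        // is stale, instead of handing back a pointer into freed storage.
        fn get(&self, h: &Handle) -> Option<&T> {
            self.slots
                .get(h.index)
                .filter(|s| s.gen == h.gen)
                .and_then(|s| s.value.as_ref())
        }

        fn free(&mut self, h: &Handle) {
            if let Some(slot) = self.slots.get_mut(h.index) {
                if slot.gen == h.gen {
                    slot.value = None;
                    slot.gen += 1; // invalidates all outstanding handles to this slot
                }
            }
        }
    }

    fn main() {
        let mut arena = Arena::new();
        let h = arena.alloc("hello");
        assert_eq!(arena.get(&h), Some(&"hello"));
        arena.free(&h);
        assert_eq!(arena.get(&h), None); // stale handle is caught, not dereferenced
    }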
That reminds me of early Fortran, where everything was statically allocated, including local variables and the function return address. To simulate data structures one used arrays of various dimensions; effectively, array indexes were handles. This was not memory safe, as there were no bounds checks. But add those where the compiler cannot eliminate them, and one gets a fast and memory-safe language.
Edit: that static allocation was essentially a centralized store. Everything repeats itself.
That's the essay. I think it's interesting because he's mimicking the way I've been writing firmware. I've found most of the issues with memory safety disappear when you use static and centralized resource allocation, avoid monkeying with pointer arithmetic, and avoid making temporary copies of pointers 'for later'.
Sure, you can shared_ptr everything and pretend you are writing C#, which as you say circles back into the same problems of spaghetti ownership. The difference with C++ or Rust is that you have options: plain &-references for non-persistent pointers, const &-references for read-only non-persistent pointers, copies for true copies, move/unique_ptr for transfer of ownership, and shared_ptr for shared ownership. All this information is readily accessible to the caller through the function signature.
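To make the "readable from the signature" point concrete, here's a hedged Rust sketch (the `Config` type and function names are hypothetical, my own illustration) of the same menu of options:

    use std::sync::Arc;

    struct Config { name: String }

    // Read-only, non-persistent borrow: the callee cannot keep or mutate it.
    fn inspect(cfg: &Config) -> usize { cfg.name.len() }

    // Exclusive mutable borrow: the callee may mutate, but still cannot keep it.
    fn rename(cfg: &mut Config, new_name: &str) { cfg.name = new_name.to_string(); }

    // Transfer of ownership (move): the callee now owns the value.
    fn consume(cfg: Config) -> String { cfg.name }

    // Shared, reference-counted ownership: the closest analogue to shared_ptr.
    fn share(cfg: Arc<Config>) -> usize { cfg.name.len() }

    fn main() {
        let mut cfg = Config { name: "prod".to_string() };
        inspect(&cfg);
        rename(&mut cfg, "staging");
        let shared = Arc::new(Config { name: "shared".to_string() });
        share(Arc::clone(&shared));
        consume(cfg); // cfg is moved here and can no longer be used
    }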
> I can agree with that when comparing with plain C code. But reference-counting in C++ and Objective-C leads to exactly the same problem as with managed code. What saves C is that reference-counting there leads to a lot of boilerplate and bugs.
You've already been greyed out, but I agree 100% with what you said.
That's my main beef with this kind of bi-weekly discussion where people always put C and C++ in the same bag.
I rarely, if ever, make memory management errors in C. The model couldn't be simpler. If you don't explicitly allocate, you don't have to care about it (in the general case it is on the stack and will disappear when exiting the function; anyway, you also don't have to care about those implementation details, just treat it as if it disappeared with its scope). If you allocate something (on the heap), then you free it later. malloc(), free(), end of story. Okay, don't keep pointers to areas you will free or realloc, of course. Any function which returns pointers to memory chunks is always to be treated as returning malloc'ed memory; those objects can never be on the stack and never have any fancy automatic management of any sort.
Now almost every piece of code I wrote in Objective-C contained memory errors/leaks. Because (1) I didn't grasp the more complicated, less deterministic (so to speak) models and the jargon well; (2) you (well, I) never know right away which type of management was used for the object that some function returned; you have to check the doc and pray it is clearly written; (2b) you have to keep that information in mind until the moment when you'd like to release the said object, in order to release what should be released and not release what is/will be/has been automatically released now/later/sooner/who knows.
And I don't even talk about the abomination that is Objective-C++, where I drowned in muddy waters.
Correct, that is true for any language where you deal with raw threads, managed or unmanaged. That tells us that we should avoid using raw threads and instead use some other API to achieve parallelism.
It's not true in Rust. In Rust, objects cannot be concurrently accessed from multiple threads unless they are explicitly marked as thread-safe (with the Sync trait), or wrapped in a Mutex, which ensures that the safety invariants are upheld.
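A small illustration of that (a minimal sketch, not tied to any particular codebase): the shared counter below only compiles because it's wrapped in Arc<Mutex<_>>; handing a bare &mut across threads would be a compile error rather than a data race discovered at runtime.

    use std::sync::{Arc, Mutex};
    use std::thread;

    fn main() {
        // Shared mutable state has to be wrapped before it may cross threads.
        let counter = Arc::new(Mutex::new(0u32));

        let handles: Vec<_> = (0..4)
            .map(|_| {
                let counter = Arc::clone(&counter);
                thread::spawn(move || {
                    // The lock guarantees exclusive access while incrementing.
                    *counter.lock().unwrap() += 1;
                })
            })
            .collect();

        for h in handles {
            h.join().unwrap();
        }
        assert_eq!(*counter.lock().unwrap(), 4);
    }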
Of course, you can bypass this by using 'unsafe' code, where you are explicitly telling the compiler that you have manually verified Rust's safety guarantees.
This is why a significant portion of Rust's community gets agitated with (possibly unnecessary) use of 'unsafe' in popular libraries.
Then I stand corrected and applaud the creators of Rust for having taken these measures for writing thread-safe code, because it is super easy to get threading wrong, even for the most experienced developer.
Interestingly enough, this was the raison d'être of the Rust project at the beginning, while zero-cost abstraction and memory safety without GC (and the borrow checker) weren't part of the initial goal. When Rust was first released as a research project it had green threads and a GC (interestingly close to Go, actually), but it was already designed with data-race freedom in mind.
And in the last quadrant, you can use non-raw threads like the Task or Parallel APIs and still have threading issues in C#.
I'm trying to learn Android and some libraries give zero indication of what thread your callbacks will be called on.
Maybe there's a convention somewhere that says "On Android, you can't assume anything about callbacks, so always assume you're on some anonymous worker thread and lock everything or dispatch to a thread you own"
You cannot assume anything about process or thread lifetimes on Android, as the simple act of rotating the screen will restart your application, and it can be killed at any moment and restarted later, due to memory pressure or because the user has switched applications.
So whatever was the state of your application can be completely different when the callback is supposed to be invoked later on.
Memory safety and performance are not a contradiction; they are largely independent. Yes, some strategies like GC induce an overhead. But in the age of multiprocessor machines, this overhead gets very much reduced by parallel collection. And practice has shown that in programs with a lot of dynamic memory allocation, garbage-collected languages often perform even better.
There are different approaches, like memory safety by the compiler as Rust does it.
And with "performance" in mind, you shouldn't forget, that most large C/C++ programs tend to use libraries to provide some amount of safety, but those libraries are of course adding some overhead too.
Finally, the less time you spend on questions of memory management, the more time you have to write an actually fast program.
Sorry, this is hand wavy nonsense. One counter example, we switched from Cassandra (JVM) to Scylla (C++). It was a win in terms of both query latency and infrastructure costs as we required fewer machines to handle the same load.
As for having more time to write a fast program... that's funny. If you want a fast program on something JVM based you're pretty much going to be spending the majority of your time writing things in a way where the GC plays as little role as possible.
> Sorry, this is hand wavy nonsense. One counter example, we switched from Cassandra (JVM) to Scylla (C++). It was a win in terms of both query latency and infrastructure costs as we required fewer machines to handle the same load.
Sorry, this is not hand wavy nonsense. And what you are providing is called anecdotal evidence.
Also, your universe seems to consist only of the JVM as a memory-safe alternative to C++. Yes, there are a lot of bloated, badly performing programs implemented in Java. However, this isn't a given. Yes, some design decisions in Java introduce the risk of bloat, but you can avoid them with much less effort (and risk) than memory corruptions, and new features like value classes are reducing the bloat quite a bit. But still, the JVM is extremely high performance, so in surprisingly many cases it beats C++ for speed. Virtual method calls are simply easier to optimize at run time, the Java JIT creates excellent code, and Java has some of the best garbage collectors, so at really dynamic memory loads it beats any manual management by a wide margin.
And of course, there is a whole world beyond Java as alternatives. Rust has been explicitly designed to excel at the tasks C++ traditionally shines at, while giving you full safety.
There is an open-source project by LMAX (a forex trading company) called Disruptor[1] that squeezes as much as possible out of the JVM. It's awesome. I ported it to C++ years ago when I wanted to learn about low-latency techniques. However, if you look at the code they need to actually break out of the JVM's safety net to get the performance they need[2]. I couldn't help but ask myself why they didn't just use C++, and when asked one of the devs did admit that their own C++ ports had an approx 10% performance increase (although this was ~7 years ago maybe)
Rust is certainly interesting and it's on my radar. I wonder, though, when it comes to using it in anger, whether its guarantees turn out to be oversold, just like the JVM's safety claims were. Time will tell.
Edit: I tend to focus on comparing against the JVM because pretty much any framework you use on The Cloud is JVM based. I'm of the opinion that there are cost savings to be had if these were ported to more appropriate languages, hence the Cassandra vs Scylla comparison. The money saved was 'noticeable'.
Which is why such projects use the JVM: they save money on developer salaries, the developer pool, bug fixes due to security exploits, and the available set of tooling and libraries, while taking care to hand-optimize a tiny set of libraries for specialized use cases.
Java 15 just accepted the JEP for native memory management, yet another stepping stone for having value types support.
If the cadence continues, Java will eventually have all the features that it should have had in 1996, had Sun properly taken into consideration languages like Modula-3 and Eiffel.
Which you can get today in a language like Swift, C#, Nim or D: the productivity of a GC and type safety, while having the language features to do C++-like resource management.
There are notable examples where garbage collection has both better throughput and latency than manual memory management: persistent data structures come to mind, because in the absence of garbage collection, you have giant awful cascades of reference-count manipulation any time a structure would get freed.
You'll never actually see anyone tout the benefits of using GC for this, though, because the performance characteristics of persistent data structures are so horrendous compared to mutable ones, no one actually uses them in C++.
You can use GC to implement the persistent structure in C++/Rust just fine. But then you pay the GC cost for only that structure, not for all the other things.
I wouldn't be so sure. There is no easy answer. It typically wins in trivial benchmarks, where the total heap size is small. However once you have a big heap of other long lived stuff, there is a significant indirect cost caused by scanning the heap, memory barriers and evicting good data from caches. This cost is negligible only if your total heap size is much larger than the live memory set. Also modern manual memory allocators are not as slow as they used to be a few decades ago and they actually allocate in low tens of nanoseconds.
If your problem demands many heap allocations, GC also wins vs. manual memory management. Heap allocation in a generational GC is as fast as stack allocation, and you don't get fragmented heaps.
It is nowhere near stack allocation, please stop this nonsense. There was a paper claiming that, but the requirement was to set heap size 7x bigger than needed.
Stack is also very hot in cache. Memory that GC is handing allocations from is not.
I recently ported some of Cassandra code from Java to stack-only Rust and I got ~25x performance improvement, most from avoiding heap allocations and GC.
That depends on the allocator used. The default libc one works but isn't great performance-wise. It is possible to intercept calls to malloc/free by using LD_PRELOAD on Linux. That will allow you to use allocators such as jemalloc or tcmalloc instead.
Of course, repeatedly allocating and freeing is poor for performance. Cache/pre-allocate when you can. This goes for managed languages too.
It's usually a trade-off, since C++ may be more time-consuming and therefore more costly to write. It's also a lot harder to get bug-free because of the manual memory management.
Both Java and C# may be somewhat slower, but the maintainability and freedom from memory management issues more than makes up for this.
Any engineer worth his salt will take this into account.
> Yes, some strategies like GC induce an overhead. But in the age of multiprocessor machines, this overhead gets very much reduced by parallel collection.
If you're unlucky and the GC is not well optimized for a given workload, the memory-usage overhead can be huge. Just a few days ago I had such a problem with Go (HeapIdle grew until OOM).
Go's GC is less mature than Java's, but Java's GC is not free either - sometimes you need to spend a lot of time tuning GC settings or optimizing code to avoid GC problems.
In my case it would be faster to use malloc/free than to spend time fighting with the GC (looking for workarounds).
On average GC saves development time and allows you to avoid memory-management bugs, but in some cases the overhead is big and developers have to spend more time, not less.
I have not claimed that a GC is always faster. Indeed, having a GC enables some people to write a badly performing program. However, this doesn't contradict the fact that in many cases a GC not only doesn't mean a slower program, but sometimes means a faster one. Usually, you don't have to "fight" the GC. In situations where manual management is vastly better, you use object pools and preallocated arrays even in a language with a GC.
One of the Java vendors acquired by PTC, Aonix, used to sell real time Java implementations for military deployments, including weapons controls and targeting systems.
You don't want a couple of ms pause when playing with such "toys".
Both C and C++ have zero built in knowledge about parallelism, so in a modern world where most things are parallel and asynchronous I don’t agree at all with that statement.
> Both C and C++ have zero built in knowledge about parallelism, so in a modern world where most things are parallel and asynchronous I don’t agree at all with that statement.
And?
C (and even Fortran) had threads and was used to create high-performance programs with a high degree of parallelism before any "concurrent" modern language was even born.
You can disagree if you want. But the facts are there: 98% of the programs running on the biggest "parallel" machines nowadays (supercomputers) are C, C++ or Fortran.
You don't need to be "designed" concurrent to be efficient at it. The same way you do not need to be designed "Cloud-native (bullshit)" to run on a virtual machine.
I think that it sets a different mentality if it is part of the tools you use; you solve problems differently. It is of course possible to write highly concurrent code in traditional sequential languages, and your examples prove that, but supercomputers are a special case with budgets for that. I'm talking about the general case, and I think the post I replied to also assumed that.
We have to give programmers, with different backgrounds and training, tools to write highly performant code in their everyday job. Many of the tools we use today are not designed for that. We are stuck in a mental model 50 years old that is no longer true.
Here are some interesting stackoverflow answers. You are of course free to dismiss these answers as anecdotal.
Both C and C++ have had a concurrent memory model and threads in the standard library for a decade now. And POSIX threads which are pretty much the same thing have been there for a quarter century on *NIX.
What do you mean by language constructs? As in, a `synchronized` keyword? Or are you thinking along the lines of `async` `await`?
If you mean the `synchronized` keyword, then correct, they don't have that. Most languages do not have that concept. C++ does have mutexes and has had them since C++11 (nearly a decade). C++ also has as part of the language spec the concept of threads, again, there since C++11.
If you meant co-routines, then C++ just added them with C++20.
Or do you mean something different like green threads (ala go)?
C++ certainly has concurrency constructs and has been expanding them since C++11.
Like Go's goroutines or Erlang's processes. And more message passing. C++ has added many of these things later on in the standard library, bolted on afterwards.
If you think that using libraries to do things in C++ means they are "bolted on", then you don't understand C++.
The entire design of C++ is to enable efficient libraries for these kinds of things to be built. And it does, and has, and will continue to for decades more.
The real point of using C and (less so, but still) C++ is not to have the ultimate performance, but to have ultimate control over the program's execution. Garbage collection and real-time code are not compatible, because GC introduces random, unexpected stalls in the program flow; even simple reference counting can be harmful, if the last reference gets eliminated at the wrong time and creates a cascade of deallocations.
My personal experience from reading code that uses Chromium's C++ garbage collector is that that's often not true. While there might no longer be use-after-free errors, it's also no longer possible to make assertions like "object X must outlive Y" because object Y could be referenced arbitrarily and kept alive longer than expected. To get around that, objects might have an explicit shutdown step. But shutting down a large graph of objects is often fraught with peril, especially when that work can span multiple processes.
Some things, namely array indexing and RefCell borrowing, have unavoidable extra runtime overhead in the default safe usage, but it's unlikely that it is significant (often array bounds checks are either essential and thus needed in C++ as well or optimized out by LLVM in Rust, and RefCell usage is generally rare), and you can use unsafe unchecked operations as well.
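A tiny sketch of what that trade-off looks like in practice (illustrative only, my own example): the safe indexing below is bounds-checked, but in loops like this LLVM usually eliminates the check, and the unchecked form stays available behind `unsafe` if a profile shows it matters.

    fn sum(values: &[u64]) -> u64 {
        let mut total = 0;
        for i in 0..values.len() {
            // Safe: bounds-checked indexing; since `i < values.len()` is known,
            // the check is typically optimized away.
            total += values[i];
        }
        total
    }

    fn sum_unchecked(values: &[u64]) -> u64 {
        let mut total = 0;
        for i in 0..values.len() {
            // Opt-out: no bounds check, at the cost of an `unsafe` block the
            // author must justify (the loop bound makes it sound here).
            total += unsafe { *values.get_unchecked(i) };
        }
        total
    }

    fn main() {
        let v = [1u64, 2, 3, 4];
        assert_eq!(sum(&v), 10);
        assert_eq!(sum_unchecked(&v), 10);
    }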
Yes, they do exist. The safety of Rust comes from the compiler (which takes its time), not from run-time checks. For certain styles of code, the JVM is actually faster than C++. You also have to consider that a lot of the things you do in a C++ program to make it safer add performance costs.
It depends on the nature of the code. Static code is very fast in C++, but dynamic code is less so. Virtual method dispatch in C++ is comparatively slow. Hotspot can eliminate much of the cost by using runtime type information. Using runtime information is generally a way for Hotspot to perform optimizations a static compiler cannot do, because they would be a bad tradeoff. Heap allocation in the JVM is as fast as stack allocation, which gives another boost vs. heap allocation in C++. And even GCing the youngest generation is basically cost free, as long as you don't have many surviving objects.
I don't claim that Java is always faster than C++, that would be silly, and there are plenty of Java programs in the wild which prove that it isn't. But there are quite some tasks at which Java is indeed faster.
In the general case, virtual dispatch has an overhead in both C++ and Java (which btw is only <5 machine instructions), but it's true that Java can sometimes eliminate it with runtime information, which C++ can not use.
However, in my experience, using virtual dispatch is a relatively rare occurrence in C++ (compared to the vast majority of method calls). On the other hand, in Java, most of _everything else_ is indirect and has overhead: all objects are allocated on the heap, primitives (int) are often objects (Integer) where they ought not to be, all objects have 16 bytes of overhead, etc.
But the JVM will convert those heap allocations to stack allocations! And it will realize those Integers are used as int and remove the overhead! And it will realize you're not using the information on the header of every object!
Perhaps in synthetic benchmarks, but in real programs, where there is an almost infinite number of code paths, dumb 'data transfer' objects are common and things need to be modularized, the JVM is forced to assume the worst case can happen (even if you as a human can prove that it won't) and inhibit those optimizations. And now you have indirect accesses everywhere, memory overhead (= cache thrashing) everywhere, the runtime can't vectorize that tightly due to Integers, etc.
In fact, I can't think of any domain where there is heavy competition and where high performance is a determining factor where Java has beaten C/C++. In browsers, it certainly has not.
The number of instructions is much less important than what they are doing. In the case of virtual dispatch, it's doing a memory lookup. If that memory is in cache it could be relatively inexpensive but not guaranteed. However, if you have to hit main memory then things are much slower.
> But the JVM will convert those heap allocations to stack allocations!
I was surprised recently to learn this is not the case (at least, not with HotSpot). The JVM will try to "scalarize" things (pull the fields out of the object, which may push them onto the stack) but it won't actually allocate a full object on the stack (OpenJ9 will, but I don't see people using that very often).
It is also somewhat bad at doing the Integer-to-int conversion. That is mainly because the Integer::valueOf method will break things (escape analysis has a hard time realizing this is a non-escaped value). Simple code like
Integer a = 1;
a++;
can screw up the current analysis and end up in heap allocations.
There is current work to try and make these things better, but it hasn't landed yet (AFAIK).
> In fact, I can't think of any domain where there is heavy competition and where high performance is a determining factor where Java has beaten C/C++. In browsers, it certainly has not.
I think the realm where the JVM can potentially beat C++ is, funnily, work that requires a lot of memory. The thing that the JVM memory model has going for it is that heap allocations are relatively cheap compared to heap allocations in C++. If you have a ton of tiny short lived object allocations then the JVM will do a great job at managing them for you.
> I think the realm where the JVM can potentially beat C++ is, funnily, work that requires a lot of memory. The thing that the JVM memory model has going for it is that heap allocations are relatively cheap compared to heap allocations in C++. If you have a ton of tiny short lived object allocations then the JVM will do a great job at managing them for you.
Except C++ also has many other options besides malloc()ing each of these individual objects.
Allocation-heavy C++ can be pretty slow. It's also true that unoptimizable (pure) virtual methods performing little work have quite a bit of overhead that the JVM can avoid.
Of course you normally try to avoid writing C++ like that.
The claim is probably that idiomatic code in one case is faster than idiomatic code in the other, not that you can't write C++ code that is equally fast.
For example, if you use `std::vector` or `std::unique_ptr` you would be deallocating memory when those go out of scope. The JVM might actually never do that, e.g., if the program terminates before sufficient memory pressure arises.
Writing Rust code that leaks a `Vec` is trivial, but doing the same with a C++ `std::vector` actually requires some skill.
I disagree in that it requires no skill: it requires the user to avoid doing the obvious thing:
vector<int> foo {...};
and instead heap-allocate the vector with `new` (without using a smart pointer), and then avoid your linter's warnings about this (e.g. clang-tidy).
If this is common in your place of work, I truly pity you.
Also, trading a call to `free` for a second call to `malloc`, and a second pointer indirection to the vector elements, isn't a very effective way of improving performance to beat the JVM. The whole idea behind leaking memory is doing fewer operations, not more :D
In Rust, leaking "the right way" is trivial (mem::forget is safe), but in C++, leaking the stack-allocated vector probably requires putting it behind a union or aligned_storage or similar.
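For the record, the "trivial" Rust side looks roughly like this (a minimal sketch): `mem::forget` is a safe function, so deliberately leaking the whole vector needs no `unsafe` at all.

    fn main() {
        let v: Vec<u64> = (0..1_000_000).collect();
        // Safe API: ownership is taken and the destructor never runs, so the
        // heap allocation is intentionally leaked. Box::leak is another safe option.
        std::mem::forget(v);
    }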
> the problems of “you forgot to check the password in this one case” still remain.
Maybe type systems could, in some cases, prevent this type of problem. You could construct a system where performing a password authentication returns an object of type Authorized, and performing operations requires passing in objects of type Authorized. I don't know how often this is useful in practice though.
If your type system can handle the case where I can divert code flow along an unsandboxed path by replacing functions from the system libraries even before you can achieve code execution (and certainly long before you thought I had code execution), I would like to hear it. But expressed in a less brusque way: yes, encoding state in your type system is generally great, and there is a lot of work being done in languages where you can formally “prove” guarantees about your code. The issue is that your proofs only hold true if the system they run on is correct, you chose exactly the right thing to check, you didn’t just leave out part of the program because you couldn’t fit it into the type system…it’s a nontrivial problem.
Yes, I replied to the one I meant to. The reason we use type systems is that they let us prove things about our code: simple things like “this is a string, not an int”, more complicated things like “this variable cannot be null”, and advanced things like “I have written up a proof in a theorem-proving language that this program will not deviate from the published specifications in these ways”. My point is that the thing people really want to prove, which is obviously “does this program work and do so correctly”, is fairly impossible in practice due to some of the problems I mentioned. You can only approximate it along certain domains.
I do this all the time in Scala. Define a marker for "required security level", slap it in a free monad or add it to an existing effect stack, then it just naturally gets propagated right the way up, and you can then put the checking where it makes sense (e.g. right at the outermost request handling level).
A language with a decent type system can easily take care of issues like forgetting to check the password along one code path, or at least significantly reduce the effort involved.
It doesn't necessarily solve this completely, but it can be used to significantly reduce the amount of code you have to manually audit. For example, you could use tokens to represent permissions, and arrange your code so that only a small portion of your code can create them. Then you can rely on the presence of that token indicating that a check has been performed in the rest of your code.
The type system is a tool to reduce the space of invalid inputs.
This is also easy. There are different ways to do it.
Here's an example: create a type, such as "RoleAuthorization". Make it a required parameter for access-controlled methods, either through the constructor of the parent object or explicitly in the method.
In order to create a RoleAuthorization, you must first pass the user's Role object to an authorization method.
If a method requires authorization, the compiler will complain if you don't first check the current user's role through the authorization method.
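Sketching that in Rust (hypothetical names; only RoleAuthorization comes from the comment above, the rest is my own illustration): the token type's only constructor lives behind the authorization check, so privileged functions can demand proof in their signature.

    mod auth {
        pub struct Role { pub name: String }

        // The private field means this can only be constructed inside this
        // module, i.e. only via `authorize` below.
        pub struct RoleAuthorization { _private: () }

        pub fn authorize(role: &Role, required: &str) -> Option<RoleAuthorization> {
            if role.name == required {
                Some(RoleAuthorization { _private: () })
            } else {
                None
            }
        }
    }

    // The compiler forces callers to obtain a token before calling this.
    fn delete_everything(_proof: &auth::RoleAuthorization) {
        // ... privileged work ...
    }

    fn main() {
        let admin = auth::Role { name: "admin".into() };
        if let Some(token) = auth::authorize(&admin, "admin") {
            delete_everything(&token);
        }
    }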
> I'll wait.
If you're sincerely interested in discussing something, antagonizing someone this way (and implying that you are infallible) really doesn't help.
If you just wanted to make what you believe is a statement of fact, just make it.
If you don't want to discuss the issue at all and would instead prefer to be unchallenged on this topic, why write about it in a comments section?
That assumes you know all the roles ahead of time. In many systems they will consist of a large number of variants, both in how they authenticate and in what they're allowed to do. It may even depend on environmental state which cannot be represented in types (like: user X can issue commands Y, Z during times when they're on call, where "on-call" is a custom operator's rule, not something known earlier).
> Not to mention the countless high-profile security holes that have nothing to do with memory safety
I'm not sure this is a sensible way to compare - Heartbleed is a memory safety bug and is a bigger bug than the rest of the ones you've listed combined. The various terrifying nameless iOS exploits that have had actual real-world, documented usage are not clever crypto protocol bugs with catchy names, either.
OpenSSH focuses on security and correctness, not performance. That allows it to use simple, idiomatic C. Still, as with Chromium, it does use process isolation to mitigate memory-safety bugs.
Meltdown & Spectre are actually a consequence of C: Intel adapted their CPUs to emulate the architecture that C assumes, so that you as a developer can still believe you are working close to the metal. However, modern CPUs are very complex, so the abstraction was leaky, thus the bugs.
This is why we have to abandon C: not only is the language unsafe, it has also driven hardware manufacturing to a bad state with a reinforcing feedback loop. The more C we write, the more hardware manufacturers want to make us believe that we are still working on a PDP-11.
A very odd article. Everything it says applies to any other compiled programming language, not only C. Even if Pascal or Ada had become more popular than C/C++, it would still have led to the same architectural decisions we see in x86. In fact the biggest offender, SMT, is actually the opposite of the instruction-level parallelism the authors are blaming the x86 for.
Chromium, Mozilla, and Microsoft have posted studies that attribute ~70% of their security vulnerabilities to memory safety.
I'm not sure which point you are trying to make, but yes, 100%-70% = 30%, i.e., there are many security vulnerabilities in Firefox, Chrome and Windows that are not attributed to memory safety by these projects, and preventing memory unsafety wouldn't remove all of them (at most "only" 70% of them).
Nobody is claiming here that fixing memory unsafety in software fixes hardware bugs, nor anything about OpenSSH or other projects, and your anecdotes do not show how many security vulnerabilities are caused in those projects due to memory unsafety.
The "Rule of 2" link in the article actually says this : "A recent study by Matt Miller from Microsoft Security states that “~70% of the vulnerabilities addressed through a security update each year continue to be memory safety issues”.
C on Unix-like systems isn't so surprising. The K&R book devotes a large section to the Unix system interface for example, the Linux kernel is one of the biggest C projects in the world, and even when using a higher level language to write your programs you're ultimately calling into libc or wrapping other C libraries or syscalls.
It's a case of "When in Rome, do as the Romans do".
As Rust shows, it’s not really impossible from a technical standpoint in many areas. The issues are more to do with convention and legacy and familiarity, for the most part. (Technically, C does have a couple of advantages: it’s stable, familiar, and extremely well supported in most places, and has basically had the world revolve around it for a couple decades at least. But there are a number of domains where these are surmountable.)
C is also much better at dynamic linking, and results in much smaller binaries. Still not a good justification for flatpak, or for crun (the reimplementation of runc in C) for that matter. The Red Hat crew has a problem with their reliance on C, Python, and Go. They need something like Rust (or even Vala, which does ARC from what I can tell) in their arsenal.
What exactly is the issue with Rust and dynamic linking? Is it that it can't dynamically load Rust interfaces? There's nothing wrong with linking plain old symbols from shared objects in Rust.
Rust has no stable ABI, contrary to C, and also supports much higher-level concepts like polymorphism or higher-order functions, which are much harder to represent in a simple ABI.
Real dynamic linking with those (dlopen-style) can be a real nightmare, especially between compiler versions.
You can also add that Rust does not have a runtime, which is in many cases an advantage, but also tends to make libraries bigger.
Rust can create dynamic libraries that import and export symbols using the C ABI, so it is not worse than C in such a comparison, since it can do exactly the same thing as C.
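For example (a minimal sketch; the crate-type line is an assumption about how you'd build it, and `add` is just an arbitrary function), exporting a C-ABI symbol from Rust looks like this:

    // In Cargo.toml (assumed build setup):
    //   [lib]
    //   crate-type = ["cdylib"]

    // `extern "C"` fixes the calling convention and `#[no_mangle]` keeps the
    // symbol name predictable, so C, Python (ctypes), Ruby (FFI), etc. can
    // load this shared object and call the function.
    #[no_mangle]
    pub extern "C" fn add(a: i32, b: i32) -> i32 {
        a + b
    }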
Similar to C++ though, e.g. explosions of template expansions, you don't have to use those higher-level and harder-to-keep-concise features. Other fancy languages have stable ABIs; it's not impossible, nor does a stable ABI have to be complex to use.
Though the average one is probably harder, yes. I'd wager that it's mostly lack of care or motivation though, made a bit worse by common language features.
> Similar to C++ though, e.g. explosions of template expansions, you don't have to use those higher-level and harder-to-keep-concise features.
C++ has a stable ABI. Fragile, but stable. Or doing things like updating libQt on your Linux distribution without recompiling half of the world would be close to impossible :)
This is a relatively new feature and there have been one or two ABI breaks since they thought it was done that required recompiling half the world. I think it's been solid for at least most of the last decade though so... stable enough? I'm not sure if there is an actual spec somewhere though, certainly not a proper one from a standards body. It's just "whatever GCC or MSVC do on that OS/CPU combo" and those two happen to have stopped changing the ABI every compiler release at some point. They're both at least based on the relatively well specified Itanium C++ ABI though, so that helps.
C++ is also older than Rust by, oh... a few years.
I'm not aware of any plan to not stabilize Rust's ABI, it just hasn't happened yet. It's completely fair to label that a deal-breaker for using Rust, but trying to draw a hypothetical box around it with the label "its ABI will be difficult to create and/or use" seems a bit unwarranted.
It's not, people are just incapable of making realistic assessments of what level of performance they require.
If you're using C or C++ "for performance" and not using a profiler in day-to-day development, you don't need to be using C or C++ and should be using a memory-safe language instead.
"Wirth's law is an adage on computer performance which states that software is getting slower more rapidly than hardware is becoming faster.
The adage is named after Niklaus Wirth, who discussed it in his 1995 article "A Plea for Lean Software".[1][2]"
We could make better languages. But it just takes insane resources to compete with the other ecosystems: the IDEs, the Pandas, the format-on-save.
I've been a professional C++ developer in the past, and one of the great things is the tooling and the sheer knowledge of my past C++ colleagues. They know what happens in the kernel, they know the performance-optimisation tricks, they know a lot, because the knowledge they gather has a longer lifespan.
Ask a JS-developer and they will tell you all the web frameworks that came before React and their quirks...
Software is getting slower due to poor algorithm choice, not due to the one-off factor of maybe 2 that you pay by switching to a compiled but memory-safe language. If C++ was a sensible choice in 1995, 18 months later you would have got the same performance out of a memory-safe language; we're now 10 cycles of Moore's law down the road, our hardware is 1024x faster and our languages are certainly not 1024x slower (unless you're using a scripting language). It's more about stuff that's accidentally quadratic, and that kind of error is if anything easier to make in C++ than in a more concise language where it's easier to see what you're doing.
C++ developers may be smart people because you have to be smart to do anything in C++ - imagine if the same brainpower that's being spent tracking memory usage and pointer/reference distinctions could be put into your actual application logic instead.
I'd argue that the focus on performance is what drove the developers to C++ in the first place. So they will know both the algorithms and the low-level details.
If performance is on your mind constantly, why wouldn't you choose the one with the least restrictions on what you can achieve?
It's not like those people would find joy in being locked into the JVM instruction-set.
I agree with you that algorithm choice is what's most relevant.
However: I'd argue that accidentally quadratic algorithms are easier to hide in a concise language. Writing out a quadratic loop explicitly takes space, and that space alone might make people pay more attention than some subtle implicit language construct. Either way, the most common source of unintended quadratic (or higher) behavior is helper functions and library calls.
The other thing to keep in mind when it comes to algorithms is that cache behavior and therefore memory layout matters a lot for performance on modern hardware. Managed languages really stand in the way of optimizing memory layout, which can be a systematic performance disadvantage compared to C++.
I do hope we get some more innovation in the design space occupied by Rust, where you get fairly explicit control over memory layout, but still have statically checked memory safety guarantees.
> I'd argue that accidentally quadratic algorithms are easier to hide in a concise language. Writing out a quadratic loop explicitly takes space, and that space alone might make people pay more attention than some subtle implicit language construct. Either way, the most common source of unintended quadratic (or higher) behavior is helper functions and library calls.
I disagree. When every loop is full of cruft around setting up the iterators, it's easy to drift past what's actually happening. In a language where looping over a list takes a single syntactic token, it's a lot more obvious when you've nested several such loops.
> The other thing to keep in mind when it comes to algorithms is that cache behavior and therefore memory layout matters a lot for performance on modern hardware. Managed languages really stand in the way of optimizing memory layout, which can be a systematic performance disadvantage compared to C++.
C++ doesn't really make cache behaviour clear either though. I agree that we need better tooling for handling those aspects of high-performance code, but they actually need to come from somewhere lower-level than C++.
Nested loops are obvious in most languages, including C++ -- unless you happen to work with people who don't indent their code properly, but then you have bigger problems than the choice of language.
The real problems tend to come from where the quadratic behaviour doesn't come from nested loops, but from library calls. The canonical example of this is building up a string with successive string concatenation in C.
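To make the string-concatenation example concrete (sketched here in Rust rather than C, with hypothetical helper names): the slow version copies the whole accumulated string on every iteration, while the fast one appends in place.

    // Accidentally quadratic: each format! allocates a fresh String and copies
    // everything accumulated so far, so n parts cost O(n^2) bytes copied.
    fn join_slow(parts: &[&str]) -> String {
        let mut s = String::new();
        for p in parts {
            s = format!("{}{}", s, p);
        }
        s
    }

    // Linear: push_str appends into the existing buffer (amortized reallocation).
    fn join_fast(parts: &[&str]) -> String {
        let mut s = String::new();
        for p in parts {
            s.push_str(p);
        }
        s
    }

    fn main() {
        let parts = ["a", "b", "c"];
        assert_eq!(join_slow(&parts), join_fast(&parts));
    }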
As for cache behaviour, C++ allows you to control memory layout, which is really what's required there, while most managed languages don't give you that control at all.
> Nested loops are obvious in most languages, including C++ -- unless you happen to work with people who don't indent their code properly, but then you have bigger problems than the choice of language.
We live in a fallen world. In a large enterprise codebase there will almost certainly be parts that aren't indented correctly. And even if everything is indented perfectly, the sheer amount of stuff in a C++ codebase makes everything far, far less obvious.
Drivel. You can't have it both ways. It's easy to see what you're doing in C++ because you have to do it! It's the whole point of the language, and apparently the source of bugs.
Correct me if I'm wrong, but I doubt you've ever written a program in modern C++?
"Modern" C++ is the No True Scotsman of programming languages, so you define it clearly and then I'll tell you whether I've written any. But I've written C++, including professionally. I expect to write some at work tomorrow, in fact.
It's not easy to see what algorithms you're using in a C++ codebase, because most of the lines of code are taken up micromanaging details that are broadly irrelevant. Yes, C++ makes it easy to tell whether you're using 8 bytes or 16 in this one datastructure. But you drown in a sea of those details and lose track of whether you're creating 10 or 10,000 instances of it.
I'd define modern as using RAII extensively and using C++11 at least.
As for algorithms, I honestly don't know what you mean. They're all documented online with respective big-O running times[1]. If you're talking about making unintended copies of things, then yes, C++ does expect you to know what's going on... it's the whole point of the language. If that's too much for you then don't use it, but that doesn't make it a bad language (I'm not denying it has some hair-pulling moments). Use std::move() when appropriate.
In modern C++ the reference count is always 1 or 0 for 80% of the code, so there is no need to actually maintain a count. Another 15% needs a count, which I agree is slower than GC done right. The final 5% has cycles and cannot be handled by reference counting.
Not really. Only shared_ptr uses ref counting; unique_ptr doesn't, and looking at our code base (highly networking oriented) we only use shared_ptr once. You could, in theory, use shared_ptr everywhere, but then you're not using the language properly and may as well resort to Java or similar.
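A minimal sketch of the distinction (the Connection type is just a placeholder): unique_ptr is a plain pointer plus a destructor with no count anywhere, while shared_ptr keeps an atomic count in a separate control block.

#include <memory>

struct Connection { /* placeholder */ };

void unique_owner() {
    auto c = std::make_unique<Connection>();  // no reference count at all;
                                              // sizeof(c) == sizeof(Connection*)
}                                             // destroyed deterministically here

void shared_owner() {
    auto c = std::make_shared<Connection>();  // object + control block with an atomic count
    auto alias = c;                           // atomic increment
}                                             // two atomic decrements, then delete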
Handling large numbers of connections is more easily done in Java, in my experience, since Java's async support is relatively straightforward.
I would lay money that the language isn't your real bottleneck. Switching languages might save you a factor of 2, using better algorithms or datastructures can save you a factor of 1000 or more. How much profiling do you do?
> If you're using C or C++ "for performance" and not using a profiler in day-to-day development, you don't need to be using C or C++ and should be using a memory-safe language instead.
That's in my mind wrong for at least two reasons:
- If you are making building blocks (libraries) for other languages, you have to use C or C++ (maybe Rust soon).
They are currently the only languages that can be bridged to the rest of the world (Python, Ruby, JS) without losing your mind.
The main reason is that they have no GC, which means deterministic destruction of objects, which in turn makes them easy to interface with languages that do have a GC.
- When you aim for high performance you don't profile on day 1; that's a complete waste of time. You will never turn every one of your functions into a critical kernel. You profile when you hit a performance bottleneck and actually need it.
> If you are making building blocks (libraries) for other languages, you have to use C or C++ (maybe Rust soon).
Sure (though there are often other ways to achieve the thing you actually want to do). But in that case you're not doing it "for performance".
> When you aim for high performance you don't profile on day 1; that's a complete waste of time. You will never turn every one of your functions into a critical kernel. You profile when you hit a performance bottleneck and actually need it.
In that case your performance requirements are not extreme enough to justify using C/C++ for your whole application. Write it in a safer language, when you hit performance issues profile and optimise, and maybe drop into C/C++ for those few "critical kernels" in the unlikely event that it turns out you actually need to.
> In that case your performance requirements are not extreme enough to justify using C/C++ for your whole application.
Again: no. That is, in my experience, both wrong and over-idealistic.
For many applications, the overhead in dev time of writing bindings for every one of your compute kernels, plus the pleasure of debugging the problems associated with them and with heterogeneous build chains, is generally orders of magnitude higher in man-hours than just writing your program entirely in C/C++/Rust.
There are other aspects that are generally ignored:
- Theory says that 80% of the compute time is often consumed by 20% of the code. That's often wrong: many HPC simulators have no kernel taking more than 3-7% of total run time. Consequently, everything might one day need to be optimised.
- Many performance-critical algorithms are state of the art and keep evolving, meaning your innocent little function in a "memory managed" language might become tomorrow's new bottleneck. And you do not want to have to rewrite it all the time.
A lot of new devs are wrongly scared of manual memory management, when it has become a non-problem with RAII in C++11 and later, or with the borrow checker in Rust.
And generally the ones who are scared are the ones who do not use it.
The mental overhead of memory in C++ does not come so much from object lifetimes; it comes mainly from HOW to use YOUR memory efficiently: object alignment, cache effects, indirection, the cost of polymorphism, allocation, etc.
All these aspects, you do not think about them in a memory-managed language because you cannot: you have no control over them.
And that's also why they bite you in the face in terms of performance, generally much more than the 2x you quoted before. Just a reminder: one cache miss and that's ~200 cycles lost.
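A minimal C++ sketch of the kind of control meant here, assuming 64 bytes as the cache-line size: hot data is kept contiguous for a linear, prefetch-friendly walk, and a shared counter is padded to its own cache line to avoid false sharing.

#include <vector>

struct Particle {        // array-of-structs: each particle's fields sit together in memory
    float x, y, z;
    float vx, vy, vz;
};

struct alignas(64) PaddedCounter {  // 64 is a typical cache-line size (assumption)
    long value = 0;
};

void step(std::vector<Particle>& particles, float dt) {
    for (auto& p : particles) {      // linear walk over contiguous memory
        p.x += p.vx * dt;
        p.y += p.vy * dt;
        p.z += p.vz * dt;
    }
}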
> - Theory says that 80% of the compute time is often consumed by 20% of the code. That's often wrong: many HPC simulators have no kernel taking more than 3-7% of total run time. Consequently, everything might one day need to be optimised.
> - Many performance-critical algorithms are state of the art and keep evolving, meaning your innocent little function in a "memory managed" language might become tomorrow's new bottleneck. And you do not want to have to rewrite it all the time.
You can't have it both ways. If it's really common for everything to become a performance bottleneck, it's worth profiling from the start so that you avoid having major pitfalls anywhere. If it's rare and exceptional, FFI for those cases is fine.
> The mental overhead of memory in C++ does not come so much from object lifetimes; it comes mainly from HOW to use YOUR memory efficiently: object alignment, cache effects, indirection, the cost of polymorphism, allocation, etc.
> All these aspects, you do not think about them in a memory-managed language because you cannot: you have no control over them.
> And that's also why they bite you in the face in terms of performance, generally much more than the 2x you quoted before. Just a reminder: one cache miss and that's ~200 cycles lost.
On the contrary. Plenty of people do those kind of things in, say, Java. They require knowing about compiler internals and using unsupported hints, or even bypassing parts of the compiler. But so does controlling these things in C++.
> If it's really common for everything to become a performance bottleneck, it's worth profiling from the start so that you avoid having major pitfalls anywhere. If it's rare and exceptional, FFI for those cases is fine.
You can have it both ways; code evolves. It is pretty common in performance-critical code that a minor, almost-never-called function in one scenario becomes a performance-critical bottleneck in another. If you have ever played with large-scale simulation software, this happens almost every week, depending on your inputs and what you are interested in simulating.
> On the contrary. Plenty of people do those kind of things in, say, Java. They require knowing about compiler internals and using unsupported hints, or even bypassing parts of the compiler
No, again, it's not. I have been developing in C++ for 15 years, including in the HPC world, and I have (almost) never had to touch a compiler internal.
The language gives you what you need for performance; you do not need to play with that.
On the other hand, JIT compilers like V8 or the JVM are monsters of complexity, very sensitive to side effects [1], and controlling things like "does this data fit in my L2 cache?" in them is close to impossible, because even "where is my data and what is its size?" is a hard question.
Once again, there is theory and there is practice.
Theory is what you say. Practice is that 98% of performance-critical software in HPC, the game industry, physics and high-frequency trading is written in C++/C (maybe Rust soon). And this is why.
> You can have it both ways; code evolves. It is pretty common in performance-critical code that a minor, almost-never-called function in one scenario becomes a performance-critical bottleneck in another. If you have ever played with large-scale simulation software, this happens almost every week, depending on your inputs and what you are interested in simulating.
In which case you're in the "worth profiling from day 1" world. It's much easier to work on the performance of code when you're already working on it and have it in your head - particularly in a verbose language like C++ where it takes a relatively long time to comprehend existing code - so if there's a decent chance that the performance of this code is going to be important in the future, profiling as you write saves you time overall.
> No, again, it's not. I have been developing in C++ for 15 years, including in the HPC world, and I have (almost) never had to touch a compiler internal. The language gives you what you need for performance; you do not need to play with that.
I said be aware of, not touch. If you weren't doing things like memory alignment pragmas then I guess your performance requirements were never so stringent. Fact is that a Java program that's fitting its data into L2 or avoiding cache line aliasing will blow a C++ program that isn't out of the water.
> Theory is what you say. Practice is that 98% of performance-critical software in HPC, the game industry, physics and high-frequency trading is written in C++/C (maybe Rust soon). And this is why.
HPC/physics follow questionable development practices in a lot of areas, and the games industry follows questionable everything practices. HFT uses a lot of Java and even higher level languages. C++ survives because people are rewarded for being seen to put a lot of effort into performance, and are not rewarded for avoiding bugs.
> C++ survives because people are rewarded for being seen to put a lot of effort into performance, and are not rewarded for avoiding bugs.
That's pure bullshit.
C++ survives because, even in 2020, it does the job.
Most people criticizing C++ are still stuck in their minds on C++98 and its quirks.
C++ evolved and modern C++ is at least as productive as Java or C# when used correctly.
That's why it is still actively used and continues to grow.
This message just translates, at best, your feelings (as a Java/Scala developer?). And you allow yourself to insult both the HPC industry and the game industry based on your "feelings", without even providing metrics.
> C++ evolved and modern C++ is at least as productive as Java or C# when used correctly.
You're claiming this in a thread about how a flagship project from one of the biggest names in the industry found that 70% of their security bugs were things that wouldn't have happened in Java or C#. Your statement may be true, but only for a kind of "correctly" that doesn't actually exist in practice.
> That's why it is still actively used and continues to grow.
Where are you getting those stats?
> This message just translates, at best, your feelings (as a Java/Scala developer?). And you allow yourself to insult both the HPC industry and the game industry based on your "feelings", without even providing metrics.
Would you defend either of those industries as a haven of good coding practices? Do you believe that they have fewer bugs, make better use of up-to-date tools, make more data-driven decisions, than other parts of the industry? I'm repeating a reputation rather than a specific metric, sure, but does anyone actually dispute that reputation?
> You're claiming this in a thread about how a flagship project from one of the biggest names in the industry found that 70% of their security bugs were things that wouldn't have happened in Java or C#.
Which is a project that was born in 2008 and still ships code from the 90s. It also includes one of the most optimized (meaning complex) pieces of code worldwide: V8.
You have nothing that comes even close to the complexity, usability and popularity of Chrome in either the Java or the C# world. Ironically, even Microsoft uses a C++ core in its software, including MS Office and Edge. Maybe you should reflect on that.
> Would you defend either of those industries as a haven of good coding practices?
Every industry has domain-driven standards in terms of coding practice. They all have their reasons, based on deadlines, usage, iteration cycles, developer backgrounds and safety.
Pretending that one culture is superior to the others is both pretentious and reveals a bad misunderstanding of the world we are in.
Now this is my last comment on this thread. I do not think you are open to any discussion.
> Which is a project that was born in 2008 and still ships code from the 90s.
Back in 2008 C++ advocates were saying the same thing: all those errors are only in old codebases, modern C++ doesn't have those problems. At what point should we stop believing it?
> You have nothing that comes even close to the complexity, usability and popularity of Chrome in either the Java or the C# world.
Nonsense. There are dozens of more complex, more usable, and more popular systems written in Java and C#.
> Ironically, even Microsoft uses a C++ core in its software, including MS Office and Edge.
In the older projects that they're most conservative about, yes. Large companies change slowly. Doesn't mean what they're doing today is wise.
> Every industry has domain-driven standards in terms of coding practice. They all have their reasons, based on deadlines, usage, iteration cycles, developer backgrounds and safety.
Which is to say that good development practices will be a lesser or greater priority level in different industries.
There's performance-critical software in HFT written in Java and Erlang too. People like to think C++ has a monopoly, but it doesn't.
And game programming isn't normally "performance critical" so much as OS-less and embedded. Sometimes there are real-time constraints, but very few parts of a AAA game are in the inner rendering loop that has real-time requirements. Most of it is boring stuff that could be (and often is!) written in Python or, more often, Lua.
I will agree in part, if you aren’t using C or C++ in performance critical software you probably could use another language. I think your profiler-every-day requirement is, maybe, a tiny bit too restrictive, but not by much.
There is certainly a need for the industry (and hobbyists) to take stock of both what is necessary and what is just desired. I just don’t want to see the bar for using a language that doesn’t handle all aspects of memory usage and performance to be restricted to soft real-time applications and infrastructure projects. Digging into low level programming can be fun and rewarding.
My problem is essentially that even if I am not absolutely concerned with maximum performance across memory usage, binary size, CPU efficiency, bandwidth, etc. I still truly enjoy the options for control (or the illusion of it) that is provided by C. I enjoy memory layout design, allocator design, cache-efficiency considerations and the like; while at the same time, I don’t have a huge love of malloc and free or having to track down segfaults. I think ‘performance-by-default’ is a viable language design goal and want more of it in my tools.
I keep posting comments mentioning my pet language project, mostly in hopes that when I see it on my threads display I can continue to shame myself into getting it released on time, and this is one of those. These kinds of concerns motivated me to design a language for personal use that gives me what I want, but doesn’t require (but can) allow me to deal with other things. I enjoy using Coq, Haskell, Idris, Lisp, ATS, and Clean. But those language deprive me of things I really do enjoy. So I’m going for a low level language (in the Perlis sense) that has an experimental type theory. This will certainly allow for a memory safe subset and is the kind of thing I want to see more of from others.
The problem with memory management is a problem with lifetime management, which Rust reasons about in terms of ownership management, which it attempts to reason about statically, with help from the programmer. GC attempts to do the same thing, dynamically, with less help from the programmer.
Both of those methods still allow leaks, which is why Rust encourages RAII. [1] Are there other structured lifetimes we can get compilers to enforce for us, like they enforce certain invariants about flow control using control structures?
> I'm still surprised how much stuff is being written in non-safe languages even if people could get away with a managed language.
The distinction between unsafe and managed languages does not make any sense. You can for example have a safe C implementation. No need to move to another language.
Either way, safety comes from the implementation. It just so happens that so-called "safe" languages tend to have a single safe implementation, while "unsafe" ones tend to have multiple unsafe implementations and one or two safe ones.
Just consider that SQLite is written in C and the majority of its bugs are logic/optimization ones, not related to memory management.
And when talking about C++, since 2011 there have been smart pointers to help developers manage memory more or less automatically.
I believe most of the problems are due to the people at those keyboards typing bad code, especially when it comes to "smart" code.
In 2020 we have plenty of tools supporting the developer's job (memory sanitizers, static analysis tools, linting, profilers, etc.): the big problem is failing or refusing to use such tools.
Ah, the "bad code" fallacy. If we hadn't had enough experiences already, the article we are disucssing here clearly shows, how difficult it is, to get C/C++ code reasonably error free. Yes, there exist a free applications, which are pretty good in that respect, but if your goal is to be error-free, the development time balloons. And you are often limited to very simple memory allocation schemas.
If you think that "smart pointers" should be used, why not just use a memory-safe language?
I would argue that most of the memory errors actually pertain to C programs, not C++ ones (provided they use proper encapsulation and libraries such as the STL).
Syntactically, C is (more or less) a subset of C++, and I've seen many programs which claim to be C++ but are actually written using C methodologies.
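As a minimal sketch of the difference being claimed (the names are made up for illustration): the C-style version leaves the size arithmetic, bounds, and free() to the caller, while the C++ version delegates all of that to std::vector.

#include <cstdlib>
#include <vector>

// C methodology: the caller must remember the length, stay within it,
// check for NULL, and eventually call std::free().
int* make_buffer_c(std::size_t n) {
    return static_cast<int*>(std::malloc(n * sizeof(int)));
}

// C++ methodology: size and lifetime are encapsulated; memory is released
// automatically when the vector goes out of scope.
std::vector<int> make_buffer_cpp(std::size_t n) {
    return std::vector<int>(n);
}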
Actually, sqlite has its share of memory safety issues. https://bugs.chromium.org/p/chromium/issues/list?q=Type%3DBu... (and to be clear, this is just a coarse search, and the bugs there aren't necessarily all bugs in sqlite, but there are definitely some)
No, the issue is that the incentives are wrong (features, features, features for promotion) and indeed the engineers (on average) are not that talented.
This closely matches the number of memory-related security issues Firefox had in their CSS parser before the rewrite in Rust [1]
> Over the course of its lifetime, there have been 69 security bugs in Firefox’s style component. If we’d had a time machine and could have written this component in Rust from the start, 51 (73.9%) of these bugs would not have been possible
In this context I believe that the rewrite itself benefited Firefox more than Rust did.
At Firefox they already knew the problem they were trying to solve with Rust: they had already written that software and had already discovered many of the overlooked complications involved in writing a modern browser, so, in conclusion, even a rewrite in plain C would have fixed many of the bugs.
The simple act of rewriting the same software with prior knowledge of how it works usually leads to simpler code (at the cost of developer time).
Firefox is also the entity that invented Rust, so it's in their best interest to publicize it as "the final weapon" against bugs, but "if we had used Rust from the beginning these bugs would not have been possible" is just wishful thinking.
Rust itself would not exist without the browser wars and the pressure that the contemporary web puts on the software that runs it.
> At Firefox they already knew the problem they were trying to solve with Rust: they had already written that software and had already discovered many of the overlooked complications involved in writing a modern browser, so, in conclusion, even a rewrite in plain C would have fixed many of the bugs.
The Rust re-write was the third attempt; the first two were in C++ and failed.
The posts linked at the beginning of the thread talk about 51 memory safety bugs over the course of many years, from 2002 to 2018.
Stylo has been the default CSS parser since the beginning of 2018.
It's good that Rust could have avoided them, but is it a fair comparison?
I think that when Firefox started thinking about a new architecture to better enable parallelism, things began improving considerably; Rust is only a part of that.
and ported them to Elixir and still use them in my programming lessons
> Rust was key to the success.
For them
It's important to specify that Rust, built by Firefox, led to a Firefox success.
Just like Dart, created by Google, is the language of choice for Flutter, also created by Google.
I know you've been working at Mozilla on Rust, and I believe Rust is very good, but I also think Mozilla could have used other languages; there were a few that could have led them to success. But they understood that these are times where the "means of production" aren't the machines but the engineers' tools, and creating a programming language is the best way to control part of that world.
Which ones? If they tried 2 or 3 times with C++ and the Rust one succeeded, what other information would you need to have to convince you that Rust was the differentiating factor? It seems like you just don't want to admit that Rust was the key to their success in the project, even when you have someone who was there telling you that it was.
We aren't going to get research study levels of replication on large projects like this, so I don't know what standard you're looking for here.
> what other information would you need to have to convince you that Rust was the differentiating factor?
The fact that Chrome is doing just fine without it?
> It seems like you just don't want to admit that Rust was the key to their success in the project
It seems like you are attempting a classic argumentum ad personam. I agree that Rust was one of the contributing factors, as I already wrote, but just for Firefox, not in general.
Which is the original point of this sub-thread.
> We aren't going to get research study levels of replication on large projects like this
I don't think Firefox is the only large project out there, nor the largest.
Anyway, I wasn't implying anything bad, just that you worked for years at Mozilla on Rust, and it's like asking Anders Hejlsberg if C# enabled Microsoft to do things that had failed before with C++, or if TypeScript is better than vanilla JavaScript.
Mozilla developed Rust specifically for this kind of rewrite. That was the entire point of it. (As relayed to me by one of the designers in 2011 or so.)
They didn't think just a rewrite in C would be enough, they didn't think any other existing language would be sufficient, and then they went off to design Rust. So the statement "Firefox is also the entity that invented Rust" kind of misses the point.
Some people seem to think Mozilla gets royalties every time you invoke rustc.
And that if I just took off my rose-tinted glasses, I'd realize my Rust code is buggy, unsafe, slow, and hard to maintain, and the only reason I'm using Rust is because of hype.
It was pretty hyped when I started using it, in 2015.
> And that if I just took off my rose-tinted glasses, I'd realize my Rust code is buggy, unsafe, slow, and hard to maintain
I didn't get that impression.
My understanding is that Firefox talks about the success of its rewrite in Rust because it's their language: they control it and are its major sponsor and user.
I don't think Google or MS or any other company heavily involved in crafting programming languages for their own purposes will ever go that route for some of their core software, because they can't control the language, and if they tried they would get the blame for trying.
There is a branch in the repository right now trying it out. Rust is also used in ChromeOS.
> Mozilla controls Rust because it's the largest Rust user.
Mozilla is not the largest Rust user, nor does the largest user control the language. Governance is consensus-based, and anyone is eligible to join.
> You said it yourself "the only real way to get a job working on Rust was to work at Mozilla"
I may have said that a long, long time ago, but it's not true today. The Rust team at Mozilla has been shrinking, and other companies have been letting folks work on Rust as part of their job.
And volunteers are like, 10x-25x more numerous than people who are paid to do so.
You're not even acknowledging the fact that there could be different opinions on the matter; if I were you I wouldn't play the "you're dismissing evidence" card.
You work on Rust, that's a fact, I'm giving you credit for it.
Can you say it doesn't affect your judgement at all?
Rust is now an official UWP / WinUI binding, part of Project Reunion, core Windows platform, and is shipping on Visual Studio Code and Azure IoT products.
Microsoft is now an OpenJDK contributor and has bought jClarity; Java has had several talks at Build and has parity with .NET on Azure SDKs; Office doesn't dictate all business lines.
And I bet you weren't reverse engineering Office to discover which ActiveX controls were implemented in J++ instead of VB 6.
> Java has had several talks at Build and has parity with .NET on Azure SDKs; Office doesn't dictate all business lines.
And Linux is the most installed OS on Azure...
I can buy milk from my butcher, but his core product is still meat.
You still fail to see the difference between what they offer to potential clients and what they use internally.
They are expanding their offering, but they are still a software house in the end.
It also means that MS is throwing its weight around in free (as in free speech) technologies, like Google has done before with other OSS projects, and we all know how that ended.
> Office doesn't dictate all business lines
Obviously, it doesn't.
It only generates 33% of the revenue and 39% of the operating margin.
It is the second-largest segment by revenue, behind computing (mainly hardware), and the first by margin.
Cloud comes third - and last - in revenue and second in profit, with pretty strong growth - less than in 2018, but still strong - but keep in mind that they include the Office 365 online offering and cloud gaming in that segment.
> And I bet you weren't reverse engineering Office to discover which ActiveX controls were implemented in J++ instead of VB 6.
I think Rust is pretty much a meta-rewrite. The same way you describe a team learning from an existing product (CSS parser) what needed to be addressed, they applied the same logic but one level higher: on the very tool they were using.
I think it's clever, it was risky but it seems to be paying off.
Not to be snarky, but my initial response to this was "Large project that uses language known for memory safety issues discovers that most of their security bugs stem from memory safety issues". On a more serious note, I hope this motivates the Chromium team to invest more in Rust. While the other options sound like good "in the meantime" solutions, switching to a language that, at its core, is designed to prevent these sorts of issues would be a huge benefit to society as a whole considering that Chromium is the most popular browser engine in the world.
But "large project that uses language known for memory safety issues" describes a lot of important software today. It's not like there are tons of practical language options for memory-safe high-performance code, and there were essentially none when many such projects started.
So while this may be unsurprising to those paying attention to Rust and memory safety, it's still relevant to a lot of software and a great confirmation of Rust's importance.
The previous posters were talking about languages available when chromium was started. So while Rust probably would be the language of choice today (see Firefox), it wasn't an option, but Java was already around and mature.
I don’t think Java is an option either for writing a web browser specifically for reasons of overheard. Rust is probably an option now but bringing up Java in this context was weird.
Do you really think at any time in Chrome's lifespan that its developers would accept a decision that is so foundational (aka hard to reverse) and slows everything down by a factor of 2? I don't. If they had, I don't think it would be the most popular browser today, and arguably it wouldn't even still be around. IIRC, speed was a major selling point when it was released.
(I also don't really think Java is that much slower but substitute a more precise guess and my statement stands. And I think Java and other GCed languages really do use ~2X as much RAM which is also unacceptable.)
The speed of Chrome's from-scratch JavaScript engine was a major selling point when released, yes. But that speed came from a lot of places - after all, competing JavaScript engines were also written in C++, and in particular areas V8 was thousands of times faster than them.
Using Java over C++ would have meant a factor of 2 up-front performance cost. But it would also have meant significantly less time debugging, easier testing, faster iteration, better automated refactoring support... all of which would have added up to being able to spend more development effort finding the kind of algorithmic improvements that give you those 1000x speedups. I'm not at all convinced that the end result would have been slower.
In particular areas (short timespans, specific tasks), Java can be thousands of times slower than C++/Rust. (Warmup/pre-JIT function evaluation, GC pauses, etc.) I have these kinds of performance rough spots, but I'm surprised to hear you do. It doesn't seem consistent with an argument for Java.
I don't think you'll find a realistic case where Java is thousands of times slower than C++/Rust. You're even less likely to find a case where it's a worse order of asymptotic complexity. Whereas you'll find plenty of cases where (at least 2008-era) SpiderMonkey is in a worse complexity order than V8.
You don't have to look far at all to find such cases. Startup takes forever. It's not that there's a specific small operation that programs in both languages do and is O(log n) in C++ and O(n^2) in Java. It's more that the Java programs do work that is unthinkable with C++. They run an optimizing compiler (once they decide its worthwhile on a given function). That's not something C++ programs do. [1] Until that happens, they may be running completely unoptimized interpreted code. They might go through the whole thing again if they fill their perm gen (or whatever it's called) and evict the optimized version of the code. They run a garbage collector which sometimes stops the world. Your tiny function might have to wait for gigabytes of heap to be scanned.
Over sufficiently long runs, throughput is easily within the factor of 2 you mentioned. But over short timescales, thousands of times slower is completely plausible. Some C++ CLI application might run in 5 ms where the equivalent Java program takes 5 seconds.
IIRC there was some article recently challenging the assumption that Java programs commonly reach the optimized steady state at all. I can't find it though.
[1] Except V8 of course on the user-supplied JavaScript. I'd call that quite different than optimizing all Chrome's own code.
Sorry, I had a longer reply to your previous message that I lost somehow.
In the context of something like Chrome that's already willing to implement a custom process model, I don't think that there are business requirements where that kind of large overhead is unavoidable. Running an individual unix-style JVM process can indeed perform very poorly on the default settings. But that's not the only way to build a web browser.
It’s probably doable but I don’t know with what kind of performance envelope. Java famously has very high runtime costs in pathological cases, and it’s very easy to shoot oneself in the foot. I completely agree that memory safety is more important, but it’s not like they wrote a new rendering engine from whole cloth, they used WebKit.
Client warmup times are not great. Java is incredibly fast but attempts to write an HTML engine in it in the past didn’t really pan out. Also, WebKit was already written in C++.
Java when written like C is within a factor of 2 at best. Java when written like Java isn't anywhere close to C/C++.
And the gap is getting _wider_ as Java is an increasingly bad fit for modern CPUs due to the heavy pointer chasing nature of it. It desperately needs value types to stay competitive in a performance battle.
To do high-throughput transaction processing involving a mix of strings and numerical operations I'm pretty sure nothing actually beats Java, not even C++.
Numerical computing is a different thing. Low-latency audio as well, etc.
It's really, really easy to beat Java at string processing. UTF-16 + immutable strings is not a recipe for performance. The UTF-16 in particular is quite killer - it basically means you have ~half the memory bandwidth in many, many workloads, and the JIT & GC can't do anything about it (other than use up even more of your already limited memory bandwidth, that is)
As for "numerical operations" - what about Java makes you think it's at all particularly good at it? Especially once boxing enters the picture because you're using an ArrayList or whatever, and simple numbers become crazy heap allocations that triple their size.
Yes, if you want to write a loop doing a single type of processing, numerical or string, you can typically write better code in C++, or a shader language.
Most business applications don't follow that pattern at all. It's a mix of many types of operations. Look at the Techempower benchmarks, for example; the Java frameworks tend to do better than the C++ ones, at least when it comes to throughput (C++ is more competitive when it comes to latency).
I believe it's due to the HotSpot compiler doing aggressive, profile-guided optimization (you can do AOT profile-guided optimization but few people actually do it; in Java you get that automatically) as well as the garbage collector. Memory allocation is as fast as incrementing a pointer. Whereas C++ will tend to do way more housekeeping; in Java that housekeeping is amortized when you call the GC. That hurts latency but helps throughput.
If your workload is heavy memory allocation churn then yes Java will absolutely be a better fit, or really any language with a good GC. But that's not "string & numerical" processing.
As for Techempower, the top framework by a landslide is in Rust, not Java. That's a matter of whether the framework is optimized to win at Techempower or not; wasn't that the lesson actix taught everyone?
Judging from your lol, you must surely remember the Debian alioth comparison pages, or various coding competition websites, ranking the times for Java solutions very close to C++ solutions back in the 2000s.
The C and C++ versions of those benchmarks are manually vectorized to death using vector intrinsics. You don't have those in Java, nor in the standard versions of C and C++. So yes, those speedups are real, if you invest a lot of work. But if you don't, there is no magical 5x speedup of C++ over Java.
And the Java ones don't look anything at all like typical Java, either. But no, the C/C++ ones have not been all manually vectorized to death. The binary-trees one, for example, is a fairly clean C++ implementation, and runs in less than half the time using less than half the memory of the Java version. It looks like 'only' 4 of the C++ ones use any vectorization intrinsics.
> But if you don't, there is no magical 5x speedup of C++ over Java.
Nobody said anything about "magic." AOTs are really good. Value types are really good. Return types consistently being on the stack without requiring escape analysis is really good.
Java's performance is impressive for how crippled it is by the language, but HotSpot is definitely far from magic. It can't recover from the limitations of the language. You're fully paying for the "high level" & simple nature of Java.
> And the Java ones don't look anything at all like typical Java, either.
Which ones do you mean?
> But no, the C/C++ ones have not been all manually vectorized to death.
True.
> The binary-trees one, for example, is a fairly clean C++ implementation, and runs in less than half the time using less than half the memory of the Java version.
Unfortunately I can't benchmark this myself because it uses some library I've never heard of. The numbers on the Debian site look pretty outdated, Java binary-trees takes about 1800 ms on my machine. And about 1450 ms after letting it warm up.
> AOTs are really good.
So are JITs, but we are both really deep in hand-waving territory here.
> Return types consistently being on the stack without requiring escape analysis is really good.
The fast path of object allocation is bumping a pointer into an allocation buffer and checking it against the buffer's limit. Objects that are short-lived enough that you would want to return them by value in C++ will usually never leave the allocation buffer. So it's not the exact same thing as allocating on the call stack, but it's not far.
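A rough C++ sketch of that fast path, purely for illustration (the names are made up, and align is assumed to be a power of two):

#include <cstddef>
#include <cstdint>

struct BumpBuffer {
    std::byte* next;   // next free byte in the thread-local buffer
    std::byte* limit;  // end of the buffer

    // Fast path: round up for alignment, bump the pointer, check the limit.
    void* allocate(std::size_t size, std::size_t align) {
        auto p = (reinterpret_cast<std::uintptr_t>(next) + align - 1) & ~(align - 1);
        std::byte* end = reinterpret_cast<std::byte*>(p) + size;
        if (end > limit)
            return nullptr;               // slow path: refill the buffer or collect
        next = end;
        return reinterpret_cast<void*>(p);
    }
};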
> So it's not the exact same thing as allocating on the call stack, but it's not far.
Unfortunately it is far. Creating an address is quick, yes. But that address consistently won't be in L1/L2. Likely won't even be in L3. And if it's a return value containing more than one allocation that's easily more than one cache miss along with dependent reads.
That is, if you write your program for the benchmark, i.e. write C in Java using procedural code with packed primitives, or juggle all the magic hacks just to make the code fall onto the right JIT path.
Otherwise, if you write normal OOP Java, the performance gap is more like 10x.
Are you saying that the Java benchmarks linked above are written using "magic hacks"? None of them are even close to a 10x gap, despite the fact that some of the C++ ones do use magic hacks (vector intrinsics).
10x gaps occur whenever you have to interface Java with the real world—disks, memory, CPUs, virtual memory subsystems, networking stacks, etc. to get high-performance.
That's why Cassandra has so much C++ in it, and why ScyllaDB is so much faster still.
It's not that C++ per se is "faster" than Java; it's what C++ lets you easily do that Java doesn't.
Other comments have said—well, Java is more maintainable. That's also highly-dependent on the context. ScyllaDB has a much better developer velocity than Cassandra, too. (Anyone can easily verify this.) I use the Cassandra/ScyllaDB example frequently because they implement the same spec, and do so in a compatible way.
It's also really easy to put C++ on the fast path, by adding Python to the mix. For "business logic"-like situations (supposedly the bread and butter of Java), that's what actual companies do, here on Earth, with people: use Python for the easy stuff.
I like Java, and there are certainly some very high-performance Java projects (LMAX, Aeron) and it's very productive, has great tooling, and tons of libraries. There's nothing wrong with it. You can even layer more productive languages on top of the JVM. Win.
What I have a problem with are claims that "all this C++ code can be replaced with Java" at some hand-wavey minor cost. That's…not true.
People are not stupid, they use C++ today because it can do the job when nothing else really can—Java included.
P.s. It's not even true that Java allows you to "forget about memory management". I don't know why people keep saying that, but it's objectively false. If you care about performance, you have to be aware of memory allocations. The GC is not some magic "make my code run fast" card.
Furthermore, there are so many kinds of resources beyond memory! And the GC is an impediment in many cases to using those kinds of resources effectively. C++ has an extremely good story when it comes to managing every kind of computing resource in a large, maintainable codebase.
> 10x gaps occur whenever you have to interface Java with the real world—disks, memory, CPUs, virtual memory subsystems, networking stacks, etc. to get high-performance.
As noted before, the CPU- and memory-intensive benchmarks upthread don't even show a factor of 10x. Despite the fact that the C++ is heavily hand-optimized in ways that are not accessible to Java programs, and the benchmarks being very short running, so heavily penalizing Java's JIT compilation. Please come off the 10x horse, it makes you look like you are arguing from prejudice, not from data. Even for hyperbole 10x is way too much.
I have experience working on high-performance compilers for both Java and C++, and I can promise you that Java compilers don't generate so very different code for using the CPU or memory. Yes, Java has some overheads, but it also has some tricks up its sleeve.
If you give gcc the appropriate 'march' parameter it can use vectorisation automatically if it figures out it can do it.
The main argument here is that native languages let you get the most out of your CPU if you want, whereas with the JVM you're probably stuck unless you use JNI, in which case why not just go straight to native?
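For example, a loop like the following is a typical candidate for auto-vectorisation by g++ at -O3 with an appropriate -march; whether it actually vectorises depends on the compiler version and target, and __restrict is a common compiler extension rather than standard C++.

// Compiled with e.g. `g++ -O3 -march=native`, this loop is usually
// auto-vectorised, no intrinsics required.
void saxpy(float a, const float* __restrict x, float* __restrict y, int n) {
    for (int i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}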
> If you give gcc the appropriate 'march' parameter it can use vectorisation automatically if it figures out it can do it.
Yes, I'm aware of loop vectorization in C compilers. I'm also aware of loop vectorization in Java JIT compilers. Do you think there is anything specific to C or C++ that makes this easier than in Java? There isn't.
> The main argument here is that native languages let you get the most out of your CPU if you want
Most applications, including probably more than 99% of Chromium, are not the kind of code that would benefit from manually written vector intrinsics. Does the "main argument" claim that these 99% would also be 5x or 10x slower if they were written in Java?
> It's not like there are tons of practical language options for memory-safe high-performance code, and there were essentially none when many such projects started.
Chromium started in 2008, at which point there were plenty of mature cross-platform high-performance memory-safe languages that would have been suitable (e.g. OCaml or Haskell).
I am genuinely curious about this, do you really think Haskell has performance characteristics that are suitable for implementing Chromium? Haskell is wonderful for many uses, but it is difficult for me to imagine it as the basis for a modern web browser.
I will concede that a branch of GHC being utilized specifically for such a task could have been modified enough in the intervening years to enable a viable browser by 2020. But I really don’t know if it could be done with standard GHC.
> I am genuinely curious about this, do you really think Haskell has performance characteristics that are suitable for implementing Chromium? Haskell is wonderful for many uses, but it is difficult for me to imagine it as the basis for a modern web browser.
Short answer: yes. Long answer: profiling and diagnosing performance issues is still kind of a black art, but for large codebases I've seen Haskell rewrites outperform C++ significantly. You need talented Haskell developers, but surely Google should be able to find those. Most of the tasks that I can think of in a web browser seem like things that Haskell is ideally suited to - parsing, data transformation, rule-based logic - what is it that makes you think it would be unsuitable?
The memory consumption, especially the semi-non-determinism relating to lazy evaluation, was the first thing that popped into my mind. But, as I said in a comment below this, I think that Haskell as the common case language, with some FFI accessible code would probably fit the bill. I can not think of any large Haskell projects in the application space where browsers sit, but it doesn’t mean it couldn’t be done.
Chromium took parts from WebKit, but e.g. the whole JavaScript engine was completely new. Even if they needed to depend on some libraries written in C++, that doesn't justify writing the whole application in that language. And once they forked WebCore as Blink they could have gradually pushed the borderline down.
I wish I had read this comment before I replied to your parent; if I had thought about a primarily Haskell or OCaml implementation with FFI, gradually shifting over time, I might have just kept my mouth/keyboard shut.
The "large project" part pretty much explains why the don't just rewrite it in Rust. It might be easier to audit the source code and rewrite the offending code, write C++ libraries to mitigate the issue or both. Also see:
> It might be easier to audit the source code and rewrite the offending code
What do you think Chromium developers have been doing for the last 10 years, sitting on their hands? My understanding is that the main reason why Google OSS-Fuzz exists is Chromium. The problem with this approach is that it evidently (see OP) doesn't work to find all the bugs before release.
Has there been any work on how to leverage Rust's strengths for a GC language implementation? For example, Rust assumes unique ownership for mutable values; how do you express that the value is also reachable by the GC?
If by GC you’re referring to its reference counting, either by Rc or Arc, there is no GC reachability beyond the reference counting. The responsibility for counting the contained object and dropping (cleanup before dealloc) is up to the type, either Rc or Arc (a stands for atomic, not automatic, as both are “automatic” in the sense that you never need to manually increment or decrement a reference). the owned, mutable object would contain the type information necessary to generate the right Rc calls ay the right time, but a distinct type is necessary to encapsulate the behavior.
There are also escape hatches, between unsafe leaking and the RefCell type, to provide “interior mutability”, i.e. mutable aliasing via an without the memory safety issues which come with this. Note I have only had to use the unsafe bits when passing memory up to a c stack.
I don’t believe there is a tracing GC implementation in the standard library, although I’ve heard plenty of discussion and you can find some simple implementations on crates.io. However, I would assume there would need to be some language changes to properly account for assumptions on how the Drop (free) functionality fires, not to mention significant backend work for tracing to work.
Let's put it this way: if you have a &mut to a heap allocated object (Box, whatever), you pretend it's unique, BUT its contents are also reachable by the global allocator, say, malloc. So it's not really a unique reference.
If you allocate something new, Rust pretends it is independent, assuming that malloc won't muck with the other objects it knows about.
What if we relax that assumption? Allocations may perturb other allocations - that's how moving GC heaps work! Can that be modeled usefully in Rust?
&mut T means that there is only one pointer (the &mut T) that will be used to access some memory. You can have as many other pointers to that memory as you want, as long as you don't invalidate the first one.
This code is safe:
let mut foo = 13_i32;
let x: &mut i32 = &mut foo; // First &mut
{
    let y: &mut i32 = unsafe { std::mem::transmute(x as *mut i32) }; // Second &mut T (copy)
    *y = 42; // Write through second
}
dbg!(*x); // Read through first
The allocator API hands out `*mut u8` so... not really. If you changed the allocator API and everything that uses it and makes assumptions about how it works then sure, you could make it hand out a pointer to a pointer so you could come in later and update the inner pointer after you've done a GC pass. You couldn't do something like handing out a normal `Box` and keeping track of where it lives so you can patch it later though. Rust doesn't have move or copy constructors so as it gets passed around it'll just be `memcpy`'d and you'll lose track of it.
Ah, I see what you’re saying, and I am afraid I am out of my depth because this touches on a lot of maintainer intentions I can’t speak to. You are correct in implementation assumptions at least.
Edit: I left out a phrase apparently; it should be “...mutable aliasing via what is, apparently to the lifetime checker in each function, an immutable reference”.
This isn't really directly related to your question, but I recall hearing that in Servo and Firefox, the usage of Rust is limited as it gets close to the DOM specifically because of interactions with the JS engine and its separate garbage collector, which Rust does not understand.
Without any more specific knowledge, I bet the way a generational garbage collector moves data around really messes with Rust's view of the world...
That's an interesting question, and there has been some work in that area.
Getting shared mutability as a user of GC is basically the same as getting shared mutability in any other scenario in Rust. You use `UnsafeCell<T>` or a safe wrapper around it, like `Mutex` or `Cell`. You can get a feel for this even in idiomatic Rust using `Rc` instead of a tracing GC.
When you start looking at other methods of GC things get trickier. Even with `UnsafeCell<T>` the GC can't, in general, know whether something is borrowed (mutably or immutably) by the mutator, so the problem to be solved is finding ways to enforce that there are no references into the heap at all, when collection needs to happen.
But when you think about it, this is actually not so different from the usual problem of GC safepoints. The problem is the same- enforce that there are no un-rooted pointers into the heap, when collection needs to happen.
And it turns out there are some tricks you can do with lifetimes to get borrowck to enforce safepoints for you, and it could even be made pretty ergonomic in the future.
I think gc-arena is fairly similar to josephine- both treat the GC heap itself as a container-like object, requiring `&` or `&mut` access to dereference a GC pointer. Gc-arena also suggests that we could use generator yield points as GC safepoints- they have exactly the required lifetime properties. So you might (to borrow Python syntax) `yield from fn_that_allocates_gc_memory()`, with the compiler ensuring you don't hold any references into the heap across that call.
Both gc-arena and josephine appear to take the "unidirectional" approach, where there is no recursive re-entrance into the GC heap. For example gc-arena will not collect while the heap is being mutated, which is Rust-friendly but also impractical.
GC'd languages typically require reentrancy into managed code (managed -> native -> managed), and this has historically been the source of type confusion and other security vulnerabilities.
> gc-arena will not collect while the heap is being mutated, which is Rust-friendly but also impractical.
This is where the generator stuff I mentioned comes in- I think the approach is much more practical than it seems.
At the end of the day, any GC language needs to ensure there are no live un-rooted GC pointers at points where collection takes place. So any realistic Rust/GC integration will need to solve the same problem, if it is to retain memory safety.
And the best way (so far) to ensure all references into the heap have disappeared is to make sure they are all derived from a function parameter with an unconstrained lifetime, because then the mutator knows nothing except that it might go away when it returns. (This is "generative lifetimes" from the links.)
The real trick then is to make it look like the mutator is returning, without actually forcing it to exit. A yield point in a generator can do this, even without ever actually yielding. So you can manually insert collection points in the middle of a Rust mutator, and you can have a "managed -> native -> managed" stack where the native frame thinks its managed callee can yield across it, and borrowck will ensure anything derived from that reference parameter is gone.
Of course this is a long way off from Rust today, where generators aren't available on stable (though perhaps you could hack something together with async/await), but does look like it should eventually be possible to have reentrant heap access and memory safety across Rust and a GC language.
Surely the way most GCs achieve this is by asking you to tell the GC when you acquire a reference to an object, and to tell it again when you no longer need it. You could do that with a custom implementation of Drop in Rust.
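Roughly that idea, as a C++ analogue purely for illustration (GcHeap, Object and Root are made-up names; a real collector would also need thread safety and actual tracing): an RAII handle registers itself with the root set when created and unregisters when destroyed.

#include <unordered_set>

struct Object;  // some garbage-collected object type (placeholder)

struct GcHeap {
    std::unordered_set<Object**> roots;  // slots the collector treats as reachable
};

class Root {
public:
    Root(GcHeap& heap, Object* obj) : heap_(heap), obj_(obj) {
        heap_.roots.insert(&obj_);         // "tell the GC when you acquire a reference"
    }
    ~Root() { heap_.roots.erase(&obj_); }  // "...and tell it again when you no longer need it"
    Root(const Root&) = delete;
    Root& operator=(const Root&) = delete;

    Object* get() const { return obj_; }

private:
    GcHeap& heap_;
    Object* obj_;  // a moving collector could update this slot during collection
};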
Ah. I am not a Rust expert but I think the language might actually support this sort of construct (though it might be inefficient to implement), as IIRC if you want something to stay at a fixed place in memory you need to “pin” it there.
>> 70% of security defects in memory-safe languages are probably also memory safety problems.
> That doesn't make any sense at all
I'm sure it's not true—70% is way too high—but the real number isn't 0% as you might expect.
In particular, (most? all?) non-Rust languages that claim or appear to be memory-safe can have data races. In Go for example, those can be exploitable. [1]
And Rust is memory-safe...in safe code, with a bug-free compiler. Real programs have some unsafe code in their transitive dependencies and are compiled with the real, buggy compiler. [2] The percentage of security bugs in Rust code that are due to memory safety problems is more than 0%.
Not to say they don’t exist or haven’t been reported, but many of the unsound issues on github for Rust don’t seem to hold water.
They all seem to misbehave only on a single nightly build, affect only a very narrow and unsupported target group, be related to a bug in an unsupported release of LLVM, or simply aren’t reproducible. I think the percentage of security bugs in Rust related to memory safety, based off that list, is _effectively_ 0%. I can’t find an issue in the list that seems like it would impact Rust programs that people write today, or the targets people deploy Rust code to where memory safety matters.
That probably reflects the higher standards in the Rust ecosystem. I remember when there was a big issue about “Pin is unsound”. Judging by how seriously people were taking it, it seemed like it was the next Heartbleed. Turned out that it was unsound only in a very contrived example.
I don't think soundness/security flaws due to compiler bugs are common, but they qualitatively can happen. I think that might be getting forgotten when folks are puzzled about the idea of memory safety bugs in memory-safe languages.
And there are memory-safety bugs at the CPU level, as we had to learn. However, "safe" languages cut the risk of a memory-safety-related bug down by orders of magnitude, in most cases to practically zero, as the Go example requires quite an effort to create the situation. You don't end up in such a situation just through a programming or logic error, and that's what the memory bugs are about.
I would not even say that Rust (or any other language) is or can be perfectly safe in "safe" code even with a bug-free compiler.
It is possible to access unallocated memory in 100% "safe" Rust, because what memory is allocated and what isn't depends on the programmer's intent.
Allocated memory is memory that you intend to use.
If I make an array of 3 integers that are not supposed to be used yet but I accidentally access one of them, then that is a memory-safety bug because I accessed unallocated memory.
The functions that languages provide to "allocate" memory don't define what allocating memory means.
They just provide a tool that you can use to keep track of what memory you are using.
What memory is allocated or not is relative and there are multiple levels of allocation.
If we zoom out a bit we can even call C a memory-safe language because all memory accesses in C must be to allocated memory and the ones that are not will kill the process
(allocated here as in allocated to the process by the operating system).
No, I'm just saying that what memory is allocated (as in the language) is not necessarily the same as the memory that is allocated as in your intentions.
It's possible to allocate (as in language) an array that is longer than what you really need,
and when you are done using some of the memory in that array you might not want to call any free/realloc functions in order to make the program faster.
So instead you might want to just remember that the memory is unallocated (as in intentions) even if it's still allocated (as in language).
And now there is a disagreement between you and the language on what memory is allocated.
No "memory-safe" language can make sure you don't accidently use memory unintentionally.
With that said, languages like Rust really give you helpful tools to minimize the unintentional memory accesses.
My guess is that the commenter is trying to say that security issues in safe languages are probably a result of calling into unsafe code, or of bugs in a runtime written in an unsafe language.
Chromium’s initial design was a very simple main process and a few untrusted, relatively fat processes for web pages, each holding a set of tabs. That could have worked with C++. However, the need for site isolation, performance considerations and hardware vulnerabilities later put an end to it.
For example, initially Chromium had no support for isolating iframes from the main document. Everything was in Blink. But isolating iframes required extremely complex code in the main process to properly forward all DOM events between processes. Then that code had to be made asynchronous so web pages could not block the main UI.
Then the OS interfaces to the GPU and its architecture rule out having GPU-accelerated graphics and decoders in a per-site process. So that must be put in a single GPU process. Sure, it is heavily sandboxed. But, as with the recent Networking process, it is still a shared component with very complex C++ code and fat platform libraries in C/Objective-C for interfacing with the GPU.
So in the end, the original idea of writing everything in C++ and using sandboxed processes to make memory-safety bugs harmless just does not work. One needs a memory-safe language.
The article mentioned various library-based approaches. But those, with their foreign C++ usage patterns, essentially become domain-specific languages that one needs to learn. That leads to boilerplate, as the host language is not well suited for it.
The idea of writing a constantly changing application and feeding it only untrusted content over the network while exposing more and more of the OS to that application does not work.
It doesn't work in Firefox with its small block of Rust code, and it doesn't work in Chrome or Safari. Because it's the dumbest fucking idea ever.
But sure, at this point we don't have anything to lose by rewriting it in Rust/Swift/whatever.
And the alternative is what? You run no software on your computer? Native apps have way more access to your system than a webpage, so I'm glad there are 100s or 1000s of web apps I can run rather than being forced to install native apps to get the same functionality.
Web apps are nonsense anyway, both from a privacy PoV (zero privacy) and a security PoV (single high-value target, foreign jurisdiction, crappy browsers, byzantine tools, hopelessly overcomplicated architectures, etc).
Every time someone tells me something along the lines of "you don't need Rust, just be a better programmer", I look up CVEs for the projects they've worked on, and sure enough there are memory safety issues.
Just because you're a good driver doesn't mean you should forgo a seat belt!
> What if with "dangerous languages" like C you could turn off pointers
You can't. C is a fundamentally defective language because it doesn't have the "memory slice"/"array" construct.
Of course you can work around it, but if you eliminate pointers then you can't do much with C. Heck you can't even print "Hello World" with it (well, ok, technically you can but not in the usual way).
Yes, the wiki has some recommendations (and I remember the C++ JSF standard) but if you follow the link to the actual document it seems to be behind a paywall
C# does that with unsafe blocks. Honestly, C# covers the gamut of needs, from classes and closures at the top down to marshaling and pointers at the bottom. If someone were to make a GC-less C# (cough Vala), I could see it replacing C++ in a matter of single-digit years.
Since dotnet core 2.1 the garbage collector has been easily pluggable, just by setting an env var, `COMPlus_GCName`, to point to your implementation.
That doesn't mean it will operate without a GC mechanism per se, but it can operate with a "zero" GC that never collects garbage. I know this isn't what you meant, but thought it was an interesting point.
GC-less C# wouldn't even replace the Visual C++ codebase in 10 years, let alone the entire C++ codebase. And if you are going to use unsafe C#, you might as well just stick to Visual C++. Microsoft really shouldn't have included the unsafe code/context feature in C#. It's the one glaring flaw in an otherwise decent language.
The problem is that C without pointers is extremely limiting. At least in C++ you'd have references left.
More generally I don't think you can really "patch" a language to make it safe. It's going to leak unsafety all over the place. Consider a simple `printf("%s", 12);` which is obviously broken but doesn't really trigger any kind of memory unsafety at the call site. You might argue that "%s" is a pointer in disguise, but if you ban C strings and arrays you won't have a lot left to work with (besides in this case the string pointer is perfectly valid, it's the format string that's incorrect).
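To illustrate where the type information lives, here is a hedged C++ sketch assuming a C++20 compiler with <format> support: printf trusts the format string and will happily treat 12 as a pointer, while std::format derives formatting from the argument's actual type and rejects a mismatched format specification at compile time.

#include <format>    // C++20
#include <iostream>

int main() {
    // printf("%s", 12);  // compiles (perhaps with a -Wformat warning), then
    //                    // dereferences the integer 12 as a char* at run time
    // With std::format the argument type drives the formatting, so there is
    // no way to claim that 12 is a string; a format spec that is invalid for
    // the argument type is a compile-time error on recent compilers.
    std::cout << std::format("{}", 12) << '\n';
}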
You have to be very careful that the looser language doesn't break the rules that the stricter language relies on. Have a look at the Noether design for a serious effort at doing this kind of stratified language.
C++ also has an ownership/borrowing system (i.e. smart pointers) now too, right? I'm a fan of D, but isn't the problem more about 1) defaults (or even required behavior) and 2) the existence of libraries following good paradigms, rather than just support for static memory management?
To me it seems likely the percentage is even higher in other programs - a web browser has a pretty complicated privacy/isolation model compared with most other programs so that increases the 23.9% "Other" here. Normal "logic" bugs would be more likely to have security implications.
Chromium's base library. Includes basic utilities useful across the codebase, e.g. hash maps, containers, pointer wrappers, logging, metaprogramming utilities, etc.
Inside Google there's a C++ library of many common utilities called "base". The base in Chromium is an ancient fork of it that's evolved within the Chromium codebase over many years. Abseil is another descendant of the Google internal base library.
WTF (Web Template Framework) is the similar library from WebKit, that also exists in Blink, though the two are quite different these days.
> Question to C++ pros: are there not idioms that can be used and ruthlessly enforced to avoid such problems entirely?
"In theory, yes - in practice, no."
You can use static analysis, vendor-specific annotations, runtime tools like clang's various sanitizers or valgrind, thorough code audits, use smart pointers instead of raw, have clear ownership semantics, follow MISRA C guidelines, NASA guidelines, etc etc etc... but eventually your project will grow to the point where you'll botch an edge case, and the botched edge cases you and all your coworkers manage to add will pile up into a statistic.
Even if I and all my coworkers were godlings incapable of making mistakes, we'd still end up debugging plenty of memory safety issues - in second/third party code, sometimes without source code - and writing bug reports / workarounds as a result.
Also, you can assume Google to be using all of the tools at their disposal. The stakes are high for them and they've been dealing with security issues for a long time. It helps, but it's obviously not enough. This seems to be true for most widely used C/C++ code bases; including those with decades of scrutiny. At best some projects have a good track record when it comes to dealing with these issues in a diligent way. Chromium seems to be one of those projects.
There are idioms that you can use to help. Bounds checking solves the classic indexing past arrays problem and you can use tools to prove that you never use e.g. the basic operator[]s on types for which it is not safe.
Some of these idioms are not zero-cost though. My understanding is that to prevent use-after-free you basically can't use bare pointers or references, at all, ever. You need shared ownership everywhere (std::shared_ptr, not 0 cost), a garbage collector (In c++ that's probably another smart pointer type, not 0 cost), or additional metadata like the lifetimes fed into Rust's borrow checker.
Based on my reading here on HN, I think Chrome has a reputation of using modern C++ features extensively to try to improve its memory safety, but it's really hard to do in C++.
Edit: I realized this answer sounds like "yes idiom will fix it". To directly answer the question of "can you use idioms to fix this", no you can't. Evidence shows we screw it up. It's broken and you can't bolt on fixes to make it work.
Humans don't do "ruthless" enforcement; machines are good at it, which is what Rust is about. Humans will say stuff like "well, I'm pretty sure this is safe and I need every CPU cycle I can get". This is a security consideration, so the bad guys only have to be lucky once; it's like the IRA told Thatcher (the British Prime Minister, after a failed assassination attempt):
"Today we were unlucky, but remember we only have to be lucky once. You will have to be lucky always."
If you try to strictly enforce memory safety in C++, you end up with code that looks a lot like Rust. But the Rust compiler will fail to compile code that has memory-safety issues (outside of unsafe blocks), while your C++ compiler will happily ignore them.
The challenge is that the Rust compiler will fail to compile a lot of code that is strictly and provably memory safe but falls outside Rust’s narrow model of “memory safe”, and which is required for performance reasons. You’d have to wrap most of that code in unsafe blocks. In C++17 you can directly write type infrastructure that transparently enforces memory safety models that Rust doesn’t grok, which is a pretty nice feature.
Many people underestimate what is possible using C++ these days. Complex for sure, but also very powerful.
> Many people underestimate what is possible using C++ these days. Complex for sure, but also very powerful.
And that's the problem. There are very few true C++ experts, and most people will run into these issues as a matter of course.
This just feels like a reduction to the usual: if you are perfect, you will write perfect code, and never have memory-safety bugs. Thanks, but I'm not perfect, and I'd rather write in a language with a compiler that rejects programs with memory-safety bugs.
And yes, that does mean sometimes it'll reject some programs that are perfectly ok, but the Rust borrow checker is getting better all the time, and sometimes you just have to accept being hamstrung a little for the greater good.
I have long imagined a much weaker form of your desire: a C++ runtime that detects all undefined behavior in the code that executes. Note that this is nowhere near the goal we want, which is that the program has no memory safety issues at all. Even though there are many static and dynamic analysis tools that exist today, with runtime penalties ranging from almost zero to extremely high overhead, I have yet to see a system that can do this. So I would guess that no, this is not possible. There might be a magic set of idioms, but my guess is that finding that set is going to be so impractical that we will never do it, or someone is going to write a proof that the set is empty and we will just go on living our lives slightly more depressed at the state of computing.
Containers and smart pointers will pretty much do the trick. There's really no need for manual memory management anymore, can't remember the last time I used new/delete. Only problem is most large C++ apps, Chromium included, are older than a lot of the language features that have made C++ much safer these days.
Even though Chromium was started before C++11 was standardized, it still used a smart pointer type with move semantics that was very similar to std::unique_ptr for lifetime management.
However, while lifetime management with smart pointers is certainly important, it doesn't help with non-owning pointers between objects.
The object graph in Chromium is extremely complex. Even if an object's lifetime is managed with a smart pointer, there are often raw pointer back references from other objects. And if some of these objects also have lifetimes coupled to IPC or work happening on other threads… well, you can imagine the result :)
Of course, one could make all the back references weak, but that turns out to also significantly hinder understanding of the code. It ends up being really easy to lose all assertions about when any given weakly referenced object is actually supposed to be live, and you end up with code that's littered with checks everywhere "just in case".
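As a rough illustration of that trade-off (names invented, std::weak_ptr standing in for whatever Chromium actually uses for weak references), note how every use of the back reference has to re-check liveness:

#include <iostream>
#include <memory>

struct Document;

struct Child {
    std::weak_ptr<Document> owner;  // weak back reference instead of a raw Document*
    void doWork();
};

struct Document {
    std::shared_ptr<Child> child = std::make_shared<Child>();
};

void Child::doWork() {
    // The "just in case" check: every use of the back reference has to
    // re-establish that the owner is still alive.
    if (auto doc = owner.lock()) {
        std::cout << "owner still alive\n";
    } else {
        std::cout << "owner already gone\n";
    }
}

int main() {
    std::shared_ptr<Child> child;
    {
        auto doc = std::make_shared<Document>();
        doc->child->owner = doc;
        doc->child->doWork();   // prints "owner still alive"
        child = doc->child;
    }                           // the Document is destroyed here
    child->doWork();            // prints "owner already gone" -- a check, not a use-after-free
}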
I believe Chrome makes heavy use of both. The issue is that they are not precise and they are inherently coverage-guided, so you may still never hit certain issues.
The biggest problem is we are using systems programming languages to develop applications. We need to use application programming languages to develop applications. C and C++ are systems programming languages, to be used for developing operating systems, device drivers and other low level stuff.
The problem is, we haven't had a real application programming language since the demise of Pascal / Delphi. And Java / C# have been able to pick-up some of the slack, but not nearly enough as there are still tons of software being written in unsafe languages.
Does Chromium test with valgrind, various sanitizers and static checkers? Since when? This is 5 years of bugs. How many would be caught by modern tools? Could those tools be improved to catch those bugs? I'm surprised this isn't mentioned in what they're trying, but maybe they're at the practical limit.
Sure, all we need to do is rewrite billions of lines of code in some of the most popular software on the planet.
And then replace it with... a language that is exactly as performant, is available on every known platform and has replacements for all the existing libraries (including all the ones written in C!).
I'll pay 15% performance for safety[1]. It's just that we have the wrong priorities, and you betrayed that default attitude again with the requirement that everything be "exactly as fast" as before. I still do not understand this focus on performance above everything else. It is absolutely unquestioned. We can't even talk about it. And it's constantly an argument that is undone by time. Machines are still getting faster. In the heydays of Moore's law, when performance was doubling literally every 12 months, if we were arguing about 15% of CPU performance, that's 2--TWO--months of performance. Now, it might be 1-2 years to get 15% single-core performance. But almost no one is limited by single-core performance anymore.
We got stuck with the worst of all possible worlds--long compile times, long debugging times, long bugtails, horrible security, rickety and hard to refactor, ugly code. Oh, but it's fast. Fast and broken. Wonderful. Wonderful. Grandma is so much happier with her 15% faster something or other that spends 95% of its time idle waiting for user input. Oh, but that 15% of 5%...man that 0.8% is keeping me up at night! We forgot to zoom out to see that people need and depend on reliable software, and we utterly failed at that. It's as if we removed seatbelts, airbags, windshields, and mounted knives in random places in cars because we thought everyone wanted to go fast. No other discipline in all of engineering is so wrong in its priorities. God, society should be hopping mad at us for being so hostile to them.
[1] Obviously, performance differences between "fast" languages and "safe" languages are so variable that we might as well be talking about comparing Fords to Ferraris--without specifying whether we are talking racecars (Ford makes some fast ones)! But 15% is a number we see often in the JVM world. After years in this field, I think the fastest safe language is not more than 15% off the fastest unsafe language, across the wide spectrum. Sometimes you can get absolutely the same performance for the innermost hot loop out of a safe language vs. an unsafe language; in other situations, you can end up 2x slower. But people get terrified that they'll never figure out why, that there is some kind of hidden, unknowable "language cost" they'll never get rid of. Which is of course hogwash. Every program can be tuned and improved. Offer tools to find and remove bottlenecks and stop being sloppy with allocating memory.
There’s one other big question I’d ask: how much performance has been left on the floor because people are afraid to substantially refactor old C/C++ code or, especially, make it concurrent? I would bet that the algorithmic and parallelism improvements would in many cases be far more impactful, especially on user-visible latency, than what’s hypothetically lost by switching to a safe language with modern features.
I think you missed the main point, which was the "billions of lines of code" part.
It's not a question of simply retiring C++.. it has to be replaced with something. Even safe languages frequently rely on libraries implemented in C or C++ as part of their runtime.
Even if you were willing to pay a 100% performance penalty (and there are plenty of places where I'd be fine with that), it's still a massive undertaking.
And usage of C++ is still growing exponentially, so the number of lines of code is growing faster than our ability to rewrite/replace them. I know, it's a losing battle. We'll never do it. We are stuck with C++ until the end of time because of the sunk cost fallacy. If everyone who currently works on C++ switched to rewriting instead of adding, we'd be able to replace everything in 5 years.
If C++ is only kept around because of a few libraries needed to implement the runtime of safe languages, then I would consider that some kind of victory.
And that's new code, not legacy. Embarrassing. I would have expected better from something written in this century. C++ does sort of have a safe subset. If you're passing raw pointers all over the place today, you're doing it wrong.
Anecdote: I had to track down a bug in one of my programs once because a vexing parse caused the compiler to emit a call to the wrong specialization of a constructor, leading to certain fields remaining uninitialized. This was in a small, greenfield project that I wrote using the full force of modern C++17, using the safest, most idiomatic constructs I could find. The fact that no compiler or sanitizers made so much as a peep makes me feel that memory unsafety is embedded so deep into the core of the language that extracting out “the good parts” is probably of similar complexity to just detecting all the problems ahead of time. (That is: impossible.)
Not claiming that C++ can be made safe easily, but this particular error should be caught by clang-tidy. That is, assuming you use it and have configured it properly; picking the right analyzers is as much witchcraft as writing memory-safe code in the first place. https://clang.llvm.org/extra/clang-tidy/checks/cppcoreguidel...
I will have to admit that I haven’t tried clang-tidy on it (I will once I get back to a computer), but I am unsure that it would be able to diagnose the issue. I linked to the code in another comment (https://news.ycombinator.com/item?id=23289614), but the issue was that I have multiple template specializations of a class, each of which has a constructor that fully initializes that specific specialization’s members. The specific problem was that I used parentheses in a context where I was letting the language deduce which constructor it needed to call based on template arguments, but (as far as I understand) this made it deduce the type incorrectly and always call the unspecialized version that had none of the members; the type system then let me coerce that into one of the specialized ones (also not sure how, but presumably some magic along the lines of “they were the same class”), and the non-shared members at that point were not initialized. Oh, and this was in a constexpr context which essentially generated a switch statement on an instruction opcode at compile time. So I’m not confident that an automated tool could pick this up, although if, when I try it, it finds the issue, I will be extremely impressed.
This can be avoided to some degree by the practice of giving all primitive members a dummy in-class initialiser. E.g. int64_t mSomeIndex=-1;
Sadly I don't think the tooling exists to enforce something like that automatically, because C++ is so hard to parse (a separate issue to memory safety). But when/if metaclasses finally land, it should be possible to write a "fixed" class/struct keyword that default initialises all fields.
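A minimal sketch of that practice (everything except mSomeIndex is invented): every primitive member gets an in-class initialiser, so a missed or mis-selected constructor leaves obviously-invalid sentinels instead of indeterminate bits.

#include <cstdint>

struct Cursor {
    std::int64_t mSomeIndex = -1;   // "not pointing anywhere yet"
    std::uint32_t mGeneration = 0;
    bool mValid = false;
};

int main() {
    Cursor c;                 // no constructor written, yet members are deterministic
    return c.mValid ? 1 : 0;  // always 0 here
}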
Ah, that sucks. I don't think there's a solution to that apart from just adding an "Invalid" field to every enum class and initialising to that, but checking for that (asserting not invalid) adds ugly boilerplate to every usage of every enum.
All the usual warnings in both GCC and Clang, if I remember correctly, although you are free to try it yourself: https://github.com/regular-vm/libencoding/blob/1e12adf04d1d4.... On that line, if you replace the brackets with parentheses it will call the wrong constructor and leave parts of the object uninitialized; if you can get something to warn about it I would be very glad to hear more details on what I should be doing :)
(Oh, some background before I share that monstrosity with you: the project is a virtual machine I designed for a class I taught recently, and I experimented in implementing it in modern C++ and had a bit too much fun trying to see how much of it I could encode in the type system, which is why it will probably take a really long time to compile. Here is some code that actually uses that header, for reference: https://github.com/regular-vm/emulator, and I should note that I have internally changed some of the architecture slightly do accommodate an assembler that I have put aside for now.)
Ah, I don't think that's a vexing parse issue; it's something else. It's a really great example though, because it illustrates a lot of different issues at once, and I can tell you exactly how you could've avoided it (though some of the blame does lie with C++). In fact, it's possible you haven't realized what really went wrong yet. What you have (or had, until you used braces) is a heavily-templated version of what simplifies to the following:
struct Instruction { explicit Instruction(int encoding) { } };
using T = Instruction &&;
int encoding = 0;
auto &&result = T(encoding);
There are a few things that went wrong here, but the final red flag to notice is that you should never call a constructor directly with a single argument, because it's just a different syntax for a C-style cast, which we know is dangerous due to its bypassing of safety checks—and this is true regardless of whether we're dealing with C++ constructs (like classes and constructors and such), which I think might be what you're realizing now. (This is poor C++ design, but it's old and people know to avoid it syntactically just like C-style casts. It might be nice to have a warning for it too.) Rather, when you're passing a single argument, you want to write one of these syntaxes:
T result(encoding); // option 1
auto &&result = static_cast<T>(encoding); // option 2
With these, you receive an error, e.g.:
error: invalid static_cast from type 'int' to type 'T' {aka 'Instruction&&'}
I believe this is because the code requires 2 conversions to occur at once, which is an error because (surprise!) it's a generally unsafe thing to do: (a) conversion of int to Instruction, and (b) conversion of Instruction to Instruction&&.
Now you bypassed this by using the uniform initialization syntax (i.e. braces). I'm going to go out on a limb here and say that was another mistake, even though it "solved" your problem here: despite the widespread use, brace initializers are, in my experience, not a good thing, and it's unfortunate that people embraced them (ha) with open arms; the same goes for emplace_back() and some other things which I'll address below. The syntactic convenience they provide is just too minor compared to the issues they introduce or obscure. And in this case, they actually introduce a new issue if you use them like 'T result{encoding}': they allow 2 casts to occur at once. I find that incredibly dangerous, and I think it should be at least a warning if not an outright error like before. It seems like a C++ design flaw to me, but in any case, maybe someone should get compiler writers to add a warning for this.
Anyway, let's move on. If you're reading this, you're probably noticing that you ended up with Instruction&& in the first place—that's probably not what you wanted, or at least not what you should've wanted.
And that's where we get to the heart of the issue: your real problem is that you used decltype. If I saw that during code review, I would force you to change it—and using it on declval is just adding more fuel to the ember.
The reality—which unfortunately you do not see people acknowledging—is that decltype, auto, uniform initialization syntax, emplace_back, etc. are all dangerous, and harder to reason about than they look. I think it's unfortunate that the C++ committee encouraged people to use them so much, and I think they're overused to an insane degree. People who love them for their nice syntax don't go out of their way to figure out their pitfalls, but I almost never use any of them unless I absolutely need to. Most problems that they solve (one notable exception being 'auto' with lambdas) were quite elegantly solved in C++03 using typedefs. The only caveat was that you had to give up on the idea of minimizing keystrokes. It's quite a realization when you see how many of your problems that trade-off solves.
Anyway, I'm not trying to blame these on you. Obviously these are blamable on C++, and we could (and should) have more warnings for them. Rather, my main message is that you can avoid these problems (even if they're other people's faults) syntactically—and locally—if you don't try to embrace the absolute "latest and greatest" in C++. IMHO you should only use the newer features if they solve an actual semantic problem for you, not merely because they minimize your typing.
In fact, I think this is a huge mistake people make with software in general, and here, with C++ in particular. They feel that if something is old then it must be bad and you have to do everything in a new way. But if you stick with what works and start caring less about people looking down on you for using "old" syntax just because it's old, you'll find a lot of the old C++03 patterns (typename Pair::first_type, etc.) are actually robust to the problems that the newer ones introduce. People just don't realize this because those patterns are more verbose than they would like, and we're in an era where doing things old-style looks bad for no good reason.
Oh also, one last thing: aside from avoiding decltype and using the equivalent of a C-style cast, one more thing that helps you avoid this is to avoid overusing templates. They also obscure what's going on, like here. Not to mention the slow compilation speed and lack of independent compilability. Those are also overused (and I see their appeal) but they're often unnecessary and make code statically difficult to reason about.
> In fact, it's possible you haven't realized what really went wrong yet.
Not only is it possible, I think it is very likely, although your explanation is something I can follow along with and very much appreciated.
> There are a few things that went wrong here, but the final red flag to notice is that you should never call a constructor directly with a single argument, because it's just a different syntax for a C-style cast, which we know is dangerous due to its bypassing of safety checks—and this is true regardless of whether we're dealing with C++ constructs (like classes and constructors and such), which I think might be what you're realizing now.
Huh, interesting, I actually did not realize this. Is there a way to do this safely without creating an extra lvalue? I take it that there is no “extra explicit” keyword I can add to prevent this kind of accidental call, is there?
> I'm going to go out on a limb here and say that was another mistake, even though it "solved" your problem here: despite the widespread use, brace initializers are, in my experience, not a good thing
Yeah, I am not really a fan of them either :( Even I know of a bunch of caveats about them and C++ initialization is an extremely complicated topic…
> If you're reading this, you're probably noticing that you ended up with Instruction&& in the first place—that's probably not what you wanted, or at least not what you should've wanted.
No, but as you observed, it “works out” at some point in the pipeline, so I obviously did not care to really figure out whether this was what I wanted or not.
> And that's where we get to the heart of the issue: your real problem is that you used decltype. If I saw that during code review, I would force you to change it—and using it on declval is just adding more fuel to the ember.
Somewhat strangely, C++ seems like the only language where I would even consider to use such a construct. I think every other language just erases their types or simplifies them so you can be comfortable writing something like “Iterator i = collection.start” or “int size = collection.count” whereas in C++ you have some generic distance_type and it feels dirty to just work with a size_t or whatever you know the thing to be.
> IMHO you should only use the newer features if they solve an actual semantic problem for you, not merely because they minimize your typing.
A good point, but I would like to just mention that this was clearly an experiment in trying out the “latest and greatest” ;)
> Not to mention the slow compilation speed and lack of independent compilability.
Wait, you’re telling me my 100 line program shouldn’t take a dozen seconds to compile?!
> Is there a way to do this safely without creating an extra lvalue?
The static_cast<T>(arg) syntax I used does exactly this! It's what you should use pretty much everywhere instead of T(arg). If it's too much typing, yeah unfortunately it is, though life is a lot easier if you can e.g. bind 'sc' to expand to it in your editor.
> No, but as you observed that it “works out” at some point in the pipeline so I obviously did not care to really figure out if this was what I wanted or not.
Yeah... sadly C++ is just about the second-to-last language you should deal with like that. The last probably being C. :-) Pro tip that might make it easier to avoid this: use typedefs very liberally. They help you avoid auto/decltype/etc. and are quite robust. (At least if your reviewers let you. If they don't, they probably haven't learned it the hard way yet.)
> Somewhat strangely, C++ seems like the only language where I would even consider to use such a construct. I think every other language just erases their types or simplifies them so you can be comfortable writing something like “Iterator i = collection.start” or “int size = collection.count” whereas in C++ you have some generic distance_type and it feels dirty to just work with a size_t or whatever you know the thing to be.
Those languages break too actually. Go Google "binary search bug" (with quotes). For example in C# there's Length and LongLength, which is dirty. When what they really need is just a native int. Another C++ tip: almost every 'int' or 'unsigned int' you ever deal with should be size_t or ptrdiff_t, because at some point or another it's probably an array index. It's very rare for that not to be the case; the only case I can think of off the top of my head is a logarithm (i.e. the shift amount in a bit-shift expression) or a timestamp (long long). Unless you're writing a generic STL-like container or allocator type (in which case, best of luck...), you won't need to care about difference_type or size_type.
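For reference, here is the classic bug being alluded to, sketched in C++ (the function is invented for illustration): with int indices, (lo + hi) can overflow on very large arrays, which is undefined behavior for signed integers; size_t indices and the subtraction form avoid both problems.

#include <cstddef>
#include <vector>

// int mid = (lo + hi) / 2;   // the classic buggy line: can overflow

std::size_t lower_bound_index(const std::vector<int>& v, int key) {
    std::size_t lo = 0, hi = v.size();
    while (lo < hi) {
        std::size_t mid = lo + (hi - lo) / 2;  // cannot overflow
        if (v[mid] < key) lo = mid + 1; else hi = mid;
    }
    return lo;  // first index whose element is >= key, or v.size()
}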
> A good point, but I would like to just mention that this was clearly an experiment in trying out the “latest and greatest” ;)
At some point, the idea that a tool is fine, it's just that everyone is using it wrong starts to become a reason to look for alternatives to the tool. Chromium is a large cpp project, with lots of resources behind it, but people still fail to write safe code for it. There are small bore efforts that can be done to tangibly improve things, but in the end it seems like the real solution is to start writing components in a language that offers better memory safety and maintains high performance. That is a huge, risky effort as well, of course.
Every C++ fan claims that this safe subset exists, but none of them can tell you what it is, or how to tell whether any given library is written in it or not.
That's a really good point. Is it possible yet to define a set of enforceable restrictions for new C++ code that makes it memory-safe?
There are some non-lint type things that would help in a safe mode. These all need type information; they're not just syntax.
- Can't keep a raw pointer. If you create one, it has to have local scope and cannot be copied to an outer scope. This is like a borrow in Rust, and limits the lifetime of the pointer. Most uses of raw pointers involve calling legacy code, and don't need much lifetime. Most trouble with pointers involves them outliving the thing to which they point.
- Can't read into or memcopy into any type that is not fully mapped. That is, all bit values have to be valid. Char OK, int OK, enum not OK, pointer not OK. This is better than prohibiting binary reads or memcopy, because programmers will not be tempted to bypass it.
- Casts into non fully mapped types are prohibited. If you need to convert something to a non fully mapped type, it requires a constructor, with checking.
That gives a sense of the general idea. Do enough analysis to see if something iffy is safe, and prohibit the cases which are not easy to show safe.
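As a rough sketch of the second restriction, here is one way a codebase might gate raw reads on an opt-in "fully mapped" trait (the trait and wrapper are invented; the standard has no trait for "every bit pattern is a valid value"):

#include <cstring>
#include <type_traits>

template <typename T>
struct is_fully_mapped : std::false_type {};

template <> struct is_fully_mapped<char>          : std::true_type {};
template <> struct is_fully_mapped<unsigned char> : std::true_type {};
template <> struct is_fully_mapped<int>           : std::true_type {};
// enums and pointers deliberately stay false: not every bit pattern is valid

template <typename T>
void read_raw(T& out, const void* src) {
    static_assert(std::is_trivially_copyable_v<T> && is_fully_mapped<T>::value,
                  "raw reads are only allowed into fully mapped types");
    std::memcpy(&out, src, sizeof(T));
}

int main() {
    int buffer = 42, value = 0;
    read_raw(value, &buffer);   // OK: int is fully mapped
    // enum class Color { Red }; Color c;
    // read_raw(c, &buffer);    // would fail the static_assert
    return value == 42 ? 0 : 1;
}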
So-called "modern C++", using std::string, RAII, unique_ptr (and shared_ptr).
It's not perfect but it's certainly a giant leap from old-school C.
It's something I don't see Rust proponents address. It's easy to build a straw man argument of Rust vs. old-school C, but it's a more natural path to go from C to modern C++ than to go from C to Rust. You get to keep your compiler, build system, tools, libraries and indeed existing code.
> So-called "modern C++", using std::string, RAII, unique_ptr (and shared_ptr).
And yet we see the same memory-safety bugs come up time and time again in supposedly-modern C++ programs. So either this modern way is not enough to avoid these classes of bugs, or people very quickly fall to the temptation to use unsafe constructs due to performance or just because it's easier to write.
We're humans. If there's an easier way to do something, even if it's less safe, we'll invariably do it sometimes. I like that Rust makes it harder to do so, and makes you explicitly say that you want to do something unsafe, which I imagine deters a lot of people from going down those paths. And when someone writes a memory-safety bug in unsafe Rust, they get much more egg on their face than if they were to write the same bug in C++.
The memory management issues in my C++ programs (recently, real-time audio stuff) are where I explicitly decide not to use modern C++ / automated memory management, and write my own allocators that rely on malloc/free or some variant (aligned_alloc, etc.)
In Rust I suppose I would just use an unsafe block and have the exact same issues.
Note that good unit testing, assertions and sanitizers generally take care of the issue.
Allocators are not the only source of memory-safety bugs, and RAII can't fix all problems. In particular, aliasing probably caused most of the safety problems Chrome has had to deal with (I admit that I haven't gone and counted them). This is especially true for use-after-free bugs, where some part of the code has a pointer to some memory that has been deallocated; there's not much way for that to happen without aliasing. In Rust, memory can only be deallocated if there are no outstanding references to it. That can be determined at compile time (specifically by the borrow checker), or at run time using a reference count.
Unit tests, assertions, and sanitizers are nice, but they demonstrably don't work; Chrome and Firefox use all three. They have hundreds of thousands or millions of tests, assertions on every other line, and all kinds of compile-time sanitizers but they still have hundreds of memory safety problems a year.
We built a new project in all "modern C++". It is 100% shared_ptr, unique_ptr, std::string, RAII, etc. It initially targeted C++17 specifically to get all the "modern C++" goodness.
It segfaults. It segfaults all the time. It is entirely routine for us to run a new build through the CI process and find segfaults. We fuzz it and find dozens of segfaults. Segfaults because of uninitialized memory. Segfaults because of dereferencing bad pointers. Segfaults because of running off the end of arrays. Segfaults because of trusting input from the outside world ("the length of this payload is X bytes").
This is where the "modern C++" people tell me we must be doing it wrong. But the reality is that "modern C++" isn't as safe or as foolproof as the advocates say it is. But don't take my word for it - this whole thread is about Google people coming to the same conclusion.
Meanwhile I can throw a new dev at Rust and watch them go from zero to works in a week or so, and their code doesn't segfault, doesn't panic, and actually does what it is supposed to do the first time. Code reviews are easy because I don't have to ponder the memory safety and correctness of every line of code. Reasoning about unwrap() is trivial. Finding unsafe {} is trivial (and removing it is also usually easy).
I too used to program in C++. Every Monday morning, it was the same routine: as I enter the office, the stench of decaying bodies is overwhelming. Yet I must gather my strength to collect and identify the people killed over the week-end by C++ memory safety issues gone unchecked. Once that's done, I start my daily Scrum standup at 11 and I start coding a bit. First build at 11.30, first segfault at 11.35. Then it's pretty much the same routine, after lunch I read the Valgrind and ASan reports, spotting which one of the hundreds of new safety issues it identified might be an easy fix. I go back home riding my bicycle around 7pm, making sure to avoid the cars trying to crash against me due to segfaults. Sometimes I cry at night thinking about all that.
And then one day I found Rust, and all those problems went away. I can now write fearless code, and I don't have to endure the stench of rotting bodies anymore.
I have to say, Rust is looking more mature lately. I wrote a little RSS reader in Rust two years ago, and it was a pain to get all the library version dependencies lined up. Yesterday I recompiled it. No more need for version pinning or Github references; it just worked with a default cargo.toml file. Two years ago there was too much "only works in nightly" or "you need to use this version of that library". Progress.
Any progress on a C++ to Rust converter? Not a "transpiler". Something with enough smarts to figure out when to use native Rust arrays, not "offsets" to imitate pointer arithmetic. I'm surprised that one of the big C++ users, like Google, doesn't have a group doing that.
Something like the cxx crate[1]? You specify your shared objects between C++ and Rust, and it spits out code for both sides.
The guy who maintains it said in the reddit thread[2] about this same topic that the Google people have been sending him good PRs, which is presumably related to integrating Rust into Chrome.
I see this claim all the time, but can you give some examples of large C++ projects that don't constantly struggle with memory safety issues? (And are looking for them, of course.)
I would argue that the closer you stay to C while using the good features of C++, the safer the code is.
And obviously well written and debugged C code is nearly always more robust than C++ code. But for ideological reasons most people here are unable to acknowledge that, perhaps because they cannot do it.
Manageable in C. Integer conversions are only a tiny fraction of all the other implicitness in C++.
> decays from enums to integers
Compiler warns. Recent real compilers like gcc even have exhaustiveness checks like OCaml.
> implicit conversions from integers to unexisting enums.
Compiler warns.
> no bounds checking
Reason about that and implement your own scheme. Or prove.
> implicit conversions between pointers and arrays
Have not seen a single bug due to that in more than 1000000 lines of C.
> null terminated strings, that occasionally aren't terminated
Have not seen a bug due to that, this is the canonical example of an overblown hypothetical threat.
> abusing null terminated strings with clever algorithms, e.g. strtok()
Prove the algorithm or don't use it. Hint: As far as proofs are concerned, NUL terminated strings are like Lisp lists terminated with NIL, hence a well-founded data structure that is easily amenable to proofs (unlike C++ constructs).
> the preprocessor
Rarely introduces any problems, and is still required for C++ anyway, especially in sane test suites.
I don't think all that came from C, things like typedef being an alias rather than a separate type seem much older.
You are again just throwing dirt at C, mocking all people who write actually robust buzzword free software.
You are ignoring that C code is much easier to use for formal proofs than C++ code (the kind that you advocate).
You are ignoring 32 years of proven C exploits since the Morris worm in 1988: UNIX kernel exploits, including those that Linux collects every year, which led to Microsoft (Azure Sphere), Google (Android 11), Apple (iPhone X) and Oracle (Solaris SPARC) enforcing hardware memory tagging.
Which happen to be written in C with several layers of code review and static analysis.
So by your reasoning those 32 years have not happened, in spite of being so easy to prevent exploits in C code.
Here is a little tip for you, Microsoft has been acknowledging security issues with C and C++ since the XP SP2 days.
Which is why Windows happens to have plenty of mitigations that FOSS UNIX clones are only recently catching up to.
Yet they have publicly stated that this hasn't been enough, hence the migration effort away from C, enforcing programming guidelines for C++, and coming up with plans to migrate to safer systems programming languages.
Guess which OS vendor is now having first party support for writing GUIs in Rust?
But I can also rephrase what Oracle, Apple and Google have stated in the same vein regarding OS security.
Or maybe you prefer the statements of a UNIX hero instead?
> It only works when everyone plays balls and doesn't do C style coding, ever.
If that’s the problem, and solving it would solve all of C++’s memory issues, why has no one made a compiler option to simply make that code illegal? A -EUNSAFE or whatever?
Because what got C++ adopted in first place was its copy-paste compatibility with C.
C++ was also born at Bell Labs, and due to that, all major C compiler vendors quickly started shipping C++ on their boxes as well.
If you take away copy-paste compatibility you might be better off doing D, C# or whatever safe variant already exists.
Which is what many of us have done, to move to type safe languages, and only use C and C++ at the boundaries, in small pockets of unsafe code.
In fact if you look at mobile OSes, that is the reality for app developers, C and C++ are no longer the full stack languages they were 20 years ago, rather used for the kernel, drivers, compositor and shading languages, but everything else happens in safer languages.
And the SDKs only allow you to write libraries, not full applications.
Naturally there are clever developers who subvert the workflow and turn the libraries into the actual application.
Because this would refuse to compile 100% of real-world C++ code, so nobody would use it.
One problem is that C++ can directly include C headers of the operating system, while languages that aren't copy-paste compatible with C have to create some kind of wrapper library for them... this is a major reason for the success of C++, but it also makes improvements of this kind far harder to deploy in practice.
I think you'd have to do the same thing that you'd do in non-C++ languages: write an abstraction layer that encapsulates the low-level OS access (or hardware access, in embedded software) and whose implementation is unsafe, but whose interface is "safe", whatever that means.
Once you have that to build on top of, such a compiler flag could make sense... if it were possible in C++, which I'm not sure about.
So in about half a decade the C++ camp has made no improvements wrt tooling and statically verified safe code; it’s been all talk and no show. And in the meantime Rust has improved massively.
It’s obviously still too early to declare a winner, but to me this sounds like a turtle slowly but surely overtaking a rabbit.
I design high-scale database kernels, large code bases, written in modern C++. I can’t remember the last time we had a memory safety issue. It isn’t as though we don’t have plenty of bugs during development, just not those kinds of bugs. Competent idiomatic code simply doesn’t leave much room for those kinds of bugs to occur. If you “constantly struggle with memory safety issues”, you are doing something fundamentally wrong. That’s not something you can blame on the language.
I am always baffled by the people that supposedly write modern C++ professionally and constantly have memory safety issues. Most serious projects won’t hire you if you aren’t capable of writing memory safe code in your sleep, it is a basic skill.
Is it possible that you haven't found memory safety issues because you haven't been looking for them, haven't been fuzz testing, or haven't had adversarial users of your codebases?
These are database kernels used for mission-critical things with extreme workload profiles. It has to survive longevity testing at saturating workloads with continuous fault injection. If there were memory safety issues, I think someone would have noticed by now.
The reality is that there isn’t much opportunity for memory safety issues to occur anyway, the type system and scheduler do most of the heavy lifting. Similarly, concurrency safety isn’t much of an issue because threads barely interact. Most high-performance server software looks this way these days.
Bugs tend to be of the boring logic variety that can happen in any programming language.
> These are database kernels used for mission-critical things with extreme workload profiles.
> Similarly, concurrency safety isn’t much of an issue because threads barely interact.
I think the domain you're working in isn't as susceptible to memory safety issues, but that doesn't mean they're not present. It also sounds like the domain you're working in is trivially parallelized if threads "barely interact", which limits your exposure to those memory and data race issues.
If you gave an adversary access to the API of your kernel however, how long do you think before they found a use-after-free, double-free, stack or heap overflow, etc? Days, weeks, or hours?
If you haven't run a fuzzer yet, I wouldn't be so confident.
If you are able to structure your code so there is only ever one construction order for all objects your program uses, and destruction is the reverse order, you're pretty safe for most memory safety bugs.
Sadly as codebases get more complex, that vision gets further and further from reality.
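A small sketch of how plain C++ already gives you that discipline when dependencies are expressed as members or stack objects (types invented): non-static members are constructed in declaration order and destroyed in the reverse order, so "outlives" relationships hold by construction.

#include <iostream>

struct Logger {
    Logger()  { std::cout << "logger up\n"; }
    ~Logger() { std::cout << "logger down\n"; }
};

struct Network {
    Logger& log;  // non-owning, but guaranteed to outlive this object below
    explicit Network(Logger& l) : log(l) { std::cout << "network up\n"; }
    ~Network() { std::cout << "network down\n"; }
};

struct App {
    // Constructed in declaration order, destroyed in reverse, so `net` can
    // safely hold a reference to `log` for its whole lifetime.
    Logger  log;
    Network net{log};
};

int main() { App app; }  // up: logger, network; down: network, logger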
Do you fuzz your code or use something like asan/msan (https://github.com/google/sanitizers)? If not, how do you know that your code is really bug free?
Far more exhaustive testing than that, these are database kernels. Formal verification of the design and code where it makes sense. Extensive use of the C++ type infrastructure to verify several aspects of correctness at compile-time that would be impractical in any other systems language. The Linux kernel is bypassed for almost everything, most of the schedulers, I/O memory management, etc is in user space like most modern high-performance databases, which actually reduces the bug surface and increases verifiability. It doesn’t have to just be functionally correct, the performance has to be consistent under diverse and adversarial workloads.
No one can know if it is bug free, that is impractical. But it also isn’t like this is a weekend hobby project either. Most of the bugs that get out are in unimportant peripheral code and integrations.
Ok then a followup question: do you use pointers in your code and allocate objects on the heap?
I'm quite curious what tools you're using to formally verify your C++ code if you are. My understanding is that in general you can't, which is why msan/asan exist, to get a first approximation of verifying things that can't be formally verified for most C++ code.
(In general, I'm dubious of these claims that "Most serious projects won’t hire you if you aren’t capable of writing memory safe code in your sleep", because I think if you asked the majority of the members of the C++ committee if they could do that, they'd say no).
Database engines typically operate within a giant block of memory allocated at bootstrap which is used quite differently than a heap. Mutable references to the address space necessarily exist outside the address space, which presents problems for conventional memory safety models. There are also no pointers per se because almost all memory is directly paged -- objects have no fixed memory address over their lifetime. Most shops use a thin wrapper that implements a pointer-like interface (in the style of std::unique_ptr) that hides the paging and scheduling mechanics. The scheduler design, which is where most of the safety and guarantees that resources can't leak happens, is often verified with a model checker. It is perfectly safe to hold an arbitrary number of concurrent mutable references to an object because safe execution will be resolved dynamically at runtime (which is cheaper than it sounds). There are container libraries designed to work seamlessly in this type of memory model. To a developer writing code against this abstraction, it feels similar to a garbage-collected concurrency-safe language. No locking, no resource management. The abstraction isn't general purpose but it works very well for the kinds of data infrastructure applications where people use C++.
In many of these systems it is standard practice to generate arithmetically limited types pervasively. This is almost transparent in C++17. While it is possible to verify much of this at compile-time in theory, it almost never is because it isn't worth the effort (C++20 may start to change this) and testing at runtime has proven to be nearly as good. People underestimate what is possible with the C++ type infrastructure in this regard.
This type of software design was originally done because it allows for exceptional performance but has become popular for safety reasons. It uniquely allows you to make guarantees about runtime behavior under diverse adversarial workloads that would otherwise be difficult to make.
Bugs in practice tend to occur at the interface with third-party code, which requires dropping out of any internal type system, or in the form of performance anomalies due to unexpected hardware behaviors interacting with the scheduler design. Logic bugs in the core bits tend to be found in testing.
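A guess at what a minimal "arithmetically limited type" might look like in C++17 (the Bounded template and PageId alias are invented; a real database kernel's version would be far more elaborate): the valid range is part of the type and is re-checked on construction and arithmetic.

#include <cstdint>
#include <stdexcept>

template <std::int64_t Min, std::int64_t Max>
class Bounded {
public:
    constexpr explicit Bounded(std::int64_t v) : value_(check(v)) {}
    constexpr std::int64_t get() const { return value_; }

    friend constexpr Bounded operator+(Bounded a, Bounded b) {
        return Bounded(a.value_ + b.value_);  // result re-checked against [Min, Max]
    }

private:
    static constexpr std::int64_t check(std::int64_t v) {
        return (v < Min || v > Max) ? throw std::out_of_range("Bounded") : v;
    }
    std::int64_t value_;
};

using PageId = Bounded<0, 1'000'000>;  // hypothetical: valid page ids only

int main() {
    PageId a(10), b(20);
    PageId c = a + b;        // fine, 30 is in range
    // PageId d(2'000'000);  // would throw at run time
    return c.get() == 30 ? 0 : 1;
}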
Right, so you're working in a very particular style of application. The style of programming you're using doesn't work for most software and dodges the source of some memory safety issues.
Presuming that this style works equally well for all software seems presumptuous and perhaps naive, does it not?
Do you know of any open-source code architected in the way you explain above? Have you looked at the ScyllaDB codebase? Do you know if it is architected this way?
"sort of" is the key. If the language doesn't prevent you from shooting yourself in the foot, on any large project bad things will happen.
Safer languages are better tools because they make obvious when you're stepping out of the safe zone, which is not the case with C++, even modern releases.
Non-owning pointers are absolutely a problem if the object graph is large and complex enough. Many objects in Chromium have lifetimes managed by smart pointers, but unfortunately, that doesn't do anything to protect against code that mistakenly violates an invariant like "object X must always outlive Y due to these non-owning pointers". The problem is C++ turns a violation of lifetime invariants into undefined behavior rather than safely crashing.
Chromium's object graph, for better or worse, has a lot of nodes and edges. Operations like tearing down a document that's navigating away are full of complexity. Executing JS is fraught with peril, since it's possible that objects currently on the C++ stack will get destroyed by an operation triggered by user JS. The modern web is just really complicated.
There's actually been quite a bit of work to bounds check accesses for containers implemented inside Chromium, such as span and optional, but it's harder to get these checks into upstream libc++.
GC is one way to reduce the number of memory safety issues, but there are often tricky interactions between GC and non-GC code. Another issue with GC is it's often much harder to reason about object lifetimes.
Do you know of any fully-featured Web browsers that have been implemented with "safe" language, such as C# or Java? Would be a really interesting project to see!
Very early on, Sun had a web browser implemented in Java. Unfortunately, this was dropped before Java got Hotspot and with that the decisive speed up.
(They also had a web server implemented in Java, which sounds like an excellent idea too)
Neither automatic bounds checking nor GC really adds slowness. Any reasonably safe program will check the bounds explicitly, so there is no speed gained vs. guaranteed bounds checking via the compiler. Actually, any good compiler can remove most bounds checks if it can prove that the access is in bounds. Also, GC can be quite a bit faster than manual dynamic memory management.
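Roughly, the claim is that the two loops below (a sketch; function names invented) do the same bounds-checking work, so an optimizer that can prove the index is in range is free to drop the check in either one; how often that actually happens in practice varies by compiler and code shape.

#include <cstddef>
#include <vector>

long sum_checked_by_hand(const std::vector<int>& v) {
    long total = 0;
    for (std::size_t i = 0; i < v.size(); ++i) {
        if (i < v.size()) total += v[i];  // explicit check, written by the programmer
    }
    return total;
}

long sum_checked_by_library(const std::vector<int>& v) {
    long total = 0;
    for (std::size_t i = 0; i < v.size(); ++i) {
        total += v.at(i);  // same check, inserted by the library/language
    }
    return total;
}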
Switching to a safe language is, obviously, not an option.
However, it is possible, especially with the resources of Google and other C++ shops, to develop an advanced lint, which could be just a ripoff of compile-time borrow checking from Rust (taking into account the complexity of the C++ memory model).
Flagging simple things like use after free would already be a big deal.
Nowadays it probably should be a plugin for clang or something similar instead of lint.
> Switching to a safe language is, obviously, not an option.
Forever, never, ever, ever?
Switching right now is not a good idea, but NEVER switching is the COBOL curse.
And don't forget:
C/C++ have caused BILLIONS in damage and costs in this industry. It has been WELL KNOWN for DECADES that they are the wrong tool for the job.
Yet the persistent myth of "let's not rewrite to something better" holds everything back.
Imagine how stupid any of us would look if a customer asked us to build a new version of their software (or port it to the web, or to mobile, or to the cloud) and we said "nope, sorry. It's not good. No rewrite, sir!"
Or if the software WE build were caught, OFTEN, with severe security and reliability bugs and we answered the same way: "we can't do anything better, it's not an option".
Dumb, right?
Why is a rewrite OK, normal and good for others, but not for us?
And why the excuse that the MOST PROFITABLE COMPANIES IN THE WORLD (Apple, Google, MS, ...) can't do a rewrite because it's too costly... while the rest of us PAY the cost anyway, with no solution in sight...
> We should have more courage and attempt big things.
> Like writing the program we already wrote....again.
That's not called courage, that's called stupidity. Especially when the program you're talking about is several million lines of code and the result of years of engineering by entire teams.
> We take this option off the table. Cannot rewrite. Definitely not in a safe language! Surely we have ceased to dream?
There are people who dream, and there are people who code.
Evolution is (almost) always preferable to Revolution.
In this case, Evolution can mean:
- Progressively rewrite the security-sensitive parts in Rust, mixed with the legacy code (aka the Mozilla way)
- Develop the tooling to make C++ safer, or even better, safe. That has already been done for a subset in the aeronautics industry
- Work with the committee to make C++ itself safer, which Google is already doing.
Indeed, however Google's usage of C and C++ sometimes is quite interesting.
For example, Android source code is also not a very good example of modern usage of C++ security features.
After 10 years, the NDK still has a C-only API surface, with all the memory corruption issues that entails, even though the actual implementations are in C++ or Java (via JNI).
So their workaround, starting with Android 11, is requiring hardware memory tagging in all ARM devices, while having kernel fuzzing support for other platforms lacking such hardware capabilities.
The C-only interface is an absolute necessity. Only C++ can properly interface with C++, while C can interface with the rest of the world. And system libraries are there for the whole world to use: if the system interfaces were converted to C++, all applications would need to be written in C++ as well. You can wrap some features in other languages, but definitely not all of them.
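To make the shape of that constraint concrete (a generic sketch with invented names, not actual NDK code): the usual pattern is to hide the C++ implementation behind a flat C facade, an opaque handle plus free functions, which any language with a C FFI can call.

    #include <string>

    class WidgetImpl {                        // the real C++ implementation
    public:
        int process(const char* input) {
            return input ? static_cast<int>(std::string(input).size()) : -1;
        }
    };

    extern "C" {
        typedef struct Widget Widget;         // opaque handle for non-C++ callers

        Widget* widget_create() {
            return reinterpret_cast<Widget*>(new WidgetImpl());
        }
        int widget_process(Widget* w, const char* input) {
            return reinterpret_cast<WidgetImpl*>(w)->process(input);
        }
        void widget_destroy(Widget* w) {      // manual lifetime management: the usual C-ABI cost
            delete reinterpret_cast<WidgetImpl*>(w);
        }
    }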
Right, but in 10 years they surely would have had the time to also offer a type-safe C++ API, just like other platforms do.
Instead, they have been postponing it since the NDK was introduced in Android 2.0 until after the introduction of native packages, which will only arrive with Android 11.
That's why, in the early '90s, people stepped back and created Java. Lisp was in its AI winter, but memory safety was still a top priority for the industry.
Not at all surprising, but it's not like Rust (and others like it) are going to be a silver bullet against this.
Any sufficiently complex program running on a stored-program computer (where code is data and data is code) will be vulnerable.
You can reduce the attack surface, but never eliminate it.
I gave up C++ 10 years ago and switched to more modern languages. I figured the language would die off, so I'm a bit shocked it's still so prevalent. I really don't think there is any reason to use C or C++ anymore, except perhaps for some low-level layer that interacts with the hardware.
Even if there are zero lines of C written from now on, the existing codebase (which is huge) still has to be maintained. C won't be dead in our lifetime.
Rewriting everything is also not an option. Imagine rebuilding your house with modern materials because the old ones may have some shortcomings. Who will pay for that?
Even if C++ is used less than before, it won't die off, and C++ experts can demand more money, especially because there are fewer of them as most programmers move to more modern languages.
Yep, same situation. Only I've read that those old COBOL programs are a huge mess of spaghetti code, so you're well paid, but you have to put up with ugly, hard-to-maintain code.
I'd rather work on something nicer for less money.
C++ can be used safely, without memory safety issues, but it requires expertise with the language, which is a very steep barrier for most programmers, even ones on the Chrome team, to clear. In a project as large as Chrome, the vast majority of programmers will not have the necessary expertise. And sadly, in this case, what you don't know will hurt you. Is that a reason for not using C++? Maybe. But you would be hard pressed to find a more performant language that is as mature and stable.
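For what it's worth, here is roughly the discipline that claim presumes (a generic sketch, not Chrome's actual style guide): ownership expressed through RAII types instead of raw new/delete, and checked container access instead of raw pointer arithmetic.

    #include <memory>
    #include <string>
    #include <vector>

    struct Session {
        std::string user;
    };

    // Ownership is explicit: the vector owns the unique_ptrs, each unique_ptr
    // owns its Session, and nothing here needs a manual delete.
    std::unique_ptr<Session> make_session(std::string user) {
        return std::make_unique<Session>(Session{std::move(user)});
    }

    int main() {
        std::vector<std::unique_ptr<Session>> sessions;
        sessions.push_back(make_session("alice"));
        sessions.push_back(make_session("bob"));
        return static_cast<int>(sessions.at(1)->user.size());  // bounds-checked access
    }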
We need to let go of this idea that there are godlike engineers out there who can write perfect code. Everyone makes mistakes. As long as the compiler allows it, such mistakes will continue to happen.
> in a project as large as Chrome the vast majority of programmers will not have the necessary expertise.
Then how do we staff such projects? Hire only people with a track record of never writing such bugs? It’s not possible.
No, even world-class C++ experts can and will write subtle memory safety bugs across abstraction boundaries. Think about millions of lines of C++ code, each module with its own assumptions about invariants and contracts. Experts' time is a limited resource, and in many cases memory safety is not the most valuable thing for them to spend it on.
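A classic example of the kind of subtle bug that bites even experts at an abstraction boundary (a generic illustration, not a Chrome bug): a std::string_view that silently outlives the temporary std::string it refers to.

    #include <iostream>
    #include <string>
    #include <string_view>

    std::string make_greeting(const std::string& name) {
        return "hello, " + name;
    }

    int main() {
        // Looks innocent: the caller just wants a cheap, non-owning view.
        // But the temporary string returned by make_greeting() is destroyed
        // at the end of this full expression, so `view` dangles immediately.
        std::string_view view = make_greeting("world");
        std::cout << view << '\n';  // use-after-free (undefined behavior)
        return 0;
    }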
> In a project as large as Chrome the vast majority of programmers will not have the necessary expertise.
So … you’re talking about Google not being able to pay enough or have a good enough reputation to hire developers who want to work on one of the highest-impact codebases in the world, not to mention contributing to the standards process and popular tools used to improve security for the entire community. Who realistically should look at that and say “no problem, we’ll do better!”?