This article is wonderful, and I hope it serves as a setup for a second article: "How to Rewrite it in Rust"!
> However at best, the temptation to RiiR is unproductive
> A much better alternative is to reuse the original library and just publish a safe interface to it.
Just as models are a lower-dimensional representation of a more complex problem, [1] I have to reiterate that there are no truth(ism)s in software; as in all engineering, there are trade-offs. RiiR can be, and often is, a valid choice. The author talks about "introducing bugs", but under an engineering approach to a rewrite (more like a language port), rewrites can _find_ a lot of bugs. One such way is as follows.
1. Keep the same interface in the new system; client code should work against either system.
2. Have a body of integration tests; capture these from the field, OR write a small collection of orthogonal tests somewhere between unit and integration.
3. Use the tooling (bindgen, etc.) and generate as much of the interface programmatically as possible.
4. Iterate on the port, doing differential testing against both systems (see the sketch after this list).
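To make step 4 concrete, here is a minimal differential-testing sketch. `legacy_parse` and `ported_parse` are hypothetical stand-ins: in a real port the former would be a bindgen-generated FFI call into the original C library, and the corpus would come from inputs captured in the field.

```rust
/// Hypothetical stand-in for the original implementation; in practice this
/// would call into the C library through bindgen-generated FFI.
fn legacy_parse(input: &[u8]) -> Result<u32, ()> {
    input.iter().try_fold(0u32, |acc, &b| acc.checked_add(b as u32).ok_or(()))
}

/// The Rust port under test, exposing the same interface.
fn ported_parse(input: &[u8]) -> Result<u32, ()> {
    input.iter().try_fold(0u32, |acc, &b| acc.checked_add(b as u32).ok_or(()))
}

#[test]
fn differential() {
    // Inputs captured from the field plus a few orthogonal hand-written cases.
    let corpus: &[&[u8]] = &[b"", b"hello", &[0xff; 16]];
    for input in corpus {
        assert_eq!(
            legacy_parse(input),
            ported_parse(input),
            "port diverges from legacy on input {:?}",
            input
        );
    }
}
```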
Doing a language port is comparable in work to the long-term maintenance and refactoring of an existing codebase. As the tooling gets better, RiiR will become a smoother oxidation.
If you own the C/C++ that could plausibly be rewritten, I'd argue one needs to rationalize NOT doing RiiR: better tooling (IDE, build), improving perf tooling, a low bug count, increased team velocity, etc.
But the biggest reason to RiiR is safety. Integrating a body of C/C++ code into your Rust codebase introduces a huge amount of unsafe code, much worse than surrounding your entire Rust program with unsafe { }.
At least do what is outlined in the article, but ALSO compile the native code into Wasm and run it from within a sandbox.
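As a rough illustration of that sandboxing idea, here is a minimal sketch using the wasmtime crate. It assumes the C library has already been compiled to a `legacy_lib.wasm` module that needs no imports and exports a hypothetical `parse(ptr, len) -> status` function.

```rust
use wasmtime::{Engine, Instance, Module, Store};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Load the C library compiled to Wasm; memory corruption inside the
    // module cannot escape its sandboxed linear memory.
    let engine = Engine::default();
    let module = Module::from_file(&engine, "legacy_lib.wasm")?;
    let mut store = Store::new(&engine, ());
    // Empty import list: we assume the module does not need WASI.
    let instance = Instance::new(&mut store, &module, &[])?;

    // `parse` is a hypothetical export; a real library would also need its
    // input copied into the module's linear memory first.
    let parse = instance.get_typed_func::<(i32, i32), i32>(&mut store, "parse")?;
    let status = parse.call(&mut store, (0, 0))?;
    println!("parse returned {status}");
    Ok(())
}
```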
Edit: something like a library for reading (parsing) a file format should absolutely be RiiR'd or run from a sandbox. Data parsing and memory corruption vulnerabilities are excellent dance partners.

[1] see PCA https://en.wikipedia.org/wiki/Principal_component_analysis
100% agree. We had a C++ service that made heavy use of libcurl. A particular release of libcurl introduced some memory safety problems that caused frequent segfaults for us. These memory safety bugs were eventually fixed in another release, but it scared us enough that we investigated rewriting the service in Rust.
After successful experimentation with some prototypes, we eventually rewrote the service in Rust, auditing the unsafe code in our dependencies (of which there was very little). No segfaults ever since.
Side benefit: since Rust networking libraries tend to have strong async support, and since some of our C++ libraries performed synchronous networking operations, we saw a big improvement in performance. The number of threads needed dropped by a factor of five.
> But the biggest reason to rewrite it in Rust is safety.
This is not a good reason if there are cheaper ways to get safety (at least for existing codebases). Indeed, there are sound static analysis tools (like TrustInSoft) that guarantee no undefined behaviour in C code. Using them is not completely free, and it may even require adding annotations to the code or even changing the code, but it does seem significantly cheaper than a rewrite (in any language). Such sound static analysis tools are already being used for safety-critical systems in C, and I believe they are more popular than a rewrite in Rust in that domain (where there are reasons not to rewrite in Rust other than just cost, though).
Undefined behavior is not the only kind of bug in C programs, and it's far from clear that fixing all the bugs, or even all the undefined behavior, in an existing C library will be less effort than rewriting it. Consider that one of the worst security bugs in history was a result of Kurt Roeckx eliminating an undefined-behavior bug from OpenSSL.
It's pretty much the only kind of bug that Rust can prevent, too.
> and it's far from clear that fixing all the bugs, or even all the undefined behavior, in an existing C library will be less effort than rewriting it
It's pretty clear to me.
> Consider that one of the worst security bugs in history
I'm not sure what bug you're referring to, but if it's Heartbleed, then that was an undefined-behavior bug. Of course, a functional bug can be introduced at any time, including during a rewrite in another language.
No, I'm talking about the Debian OpenSSL bug Luciano Bello discovered. It was a lot worse than Heartbleed. Kurt Roeckx didn't introduce Heartbleed, and Heartbleed wasn't introduced by removing undefined behavior, so there is no plausible reason for you to infer that I was talking about Heartbleed.
As for the cost of rewrites, there's a lot of evidence from software project metrics that the cost of modifying software can easily exceed the cost of rewriting it; see Glass's Facts and Fallacies of Software Engineering for details and references. Also, though, it should be intuitively apparent (though perhaps nonobvious) that this is a consequence of the undecidability of the Halting Problem and Rice's Theorem — it's impossible to tell what a given piece of software will do, which means that the cost of reproducing its existing behavior in well-understood code is unbounded.
> so there is no plausible reason for you to infer that I was talking about Heartbleed.
Except that Heartbleed is the only OpenSSL bug I've heard of :) Also, I don't know who Kurt Roeckx is.
> there's a lot of evidence from software project metrics that the cost of modifying software can easily exceed the cost of rewriting it
But we're not talking about arbitrary modification, but about, at worst, fixing undefined behavior, which requires only local modifications (or Rust wouldn't be able to prevent that either). As an ultimate reduction, you could choose to rewrite the software in C and still use sound static analysis to show lack of undefined behavior.
> which means that the cost of reproducing its existing behavior in well-understood code is unbounded.
Yes, but that still doesn't mean that a rewrite is cheaper. Also, while your conclusion is correct, your statement of Rice's theorem is inaccurate: it's impossible to always tell what every piece of software would do. It's certainly possible to tell what some software would do, at least in some cases, or writing software would be impossible to begin with.
I appreciate your clarification! Indeed, I didn't mean it was impossible to tell what any software would do in any situation, only some software (in practice, nearly all) in some situations. The contrary would imply that not only writing software but also running it would be impossible.
Your blog post looks very interesting indeed! I will read it with care.
I do think there's a subtle point about modifying software. Not just any modification of the software that lacks undefined behavior will do; we want a modification that preserves the important aspects of the original software’s behavior. Not only is this easy to get wrong—as shown spectacularly by the OpenSSL bug (which you've presumably looked up by now), but also, for example, by the destruction of the first Ariane 5—but there is no guarantee that it can be done with purely local modifications, even if the final safety property you wanted to establish can be established with chains of local reasoning.
I do agree that sound static analysis of C that is written to make that analysis tractable is just as effective as rewriting in Rust. Not only can such analysis show the absence of undefined behavior, it can show arbitrary correctness properties, including those beyond the reach of Rust’s type system. Probably the strongest example of this kind of analysis is seL4, although now its proofs verify not only the C but also the machine code, thus eliminating the compiler from the TCB.
Yes, I looked up the OpenSSL bug you referred to, and I think it's quite unusual. I'm not sure what those lines were exactly, but from the description it seems like it was intended to read uninitialized memory, something that (safe) Rust won't let you do, either. Also, it's probably wrong even in C, but it worked. So yeah, touching code in any way is not always 100% safe, but my point was just that sound static analysis is still cheaper than a rewrite, as it requires far less modification.
As to seL4, it isn't exactly similar to sound static analysis, as the work was extremely costly. All of seL4 is 1/5 the size of jQuery, and it's taken years of work. But it also includes functional verification, not just memory safety. In fact, it is among the largest programs ever functionally verified to that extent, and yet it was about 3 orders of magnitude smaller than ordinary business software, roughly the same verification gap we've had for decades. We don't yet know how to functionally verify software (end-to-end, like seL4) of any size that's not very small.
Anyway, Rust offers a much more limited form of assurance, and sound static analysis tools offer the same, and at a lower cost for existing codebases.
It's not that significant. We can tell what the vast majority of existing software will do in an automated way. Compiling a program is the equivalent of encoding its semantics in another language, which implies knowing what it will do; at least, that's one way of "knowing what it will do".
You can write down the physical laws that apply to a given system, but we don't usually call that "knowing what it will do", unless you can actually predict the state, or in the case of a program, the output. The mere fact that compilers exist is a meaningless form of "knowing what the program will do", only superficially relevant. You can't solve the halting problem with compilers in the same way that Newton's laws don't solve the three-body problem.
Also, did you catch the part where the point is about how expensive it is?
> but we don't usually call that "knowing what it will do", unless you can actually predict the state, or in the case of a program, the output.
Who is 'we'? And yes, we can predict exactly what the output of a given program for a given input is, for the vast majority of cases. All you have to do is run the program.
> The mere fact that compilers exist is a meaningless form of "knowing what the program will do", only superficially relevant.
You think static analysis, type checking, intermediate representation, optimization, the translation of the program with exact semantics into another language, etc. - is 'superficially relevant' to understanding a program?
> You can't solve the halting problem with compilers
Now that's pretty irrelevant.
> Also, did you catch the part where the point is about how expensive it is?
Did you catch the part where I was only commenting on a specific part of the comment? But tell me, how expensive is it?
It would be hard to overstate how incorrect this statement is, if it is read with the implicit qualifier "for all possible inputs", without which my comment above would be obvious nonsense. Of course we can tell what most programs will do for some inputs—we can just run them!
Yup, we can tell what most existing software will do for all inputs. Rice's theorem states that we can't tell what all software will do, not that it's impossible to tell what a given piece of software will do.
The fact that we can determine what a piece of software will do, doesn't mean we always do that kind of analysis, or that the programmer fully understands his own code. That's why we have type systems, constraints, verification tools, etc.
In practice, how much does “not completely free” cost? TrustInSoft doesn’t even publish prices, which pretty strongly suggests I couldn’t afford it nor persuade a manager to expense it.
That's not always true. While I have no idea of what this particular solution costs, I have seen licensing costs that would easily pay for a 20+ person team.
The size of the software is important to consider when you are looking at the cost of rewriting.
While not as mature as Rust, static enforcement of C++'s (memory and data race) safe subset is coming along [1].
IIUC, tools like TrustInSoft are for situations where you need not just safety but "reliability" (i.e. no crashes, no exceptions). That doesn't really scale so well to larger applications.
TBH, I don't see a future where RiiR-ing every piece of software would make sense, and I think you sort of put it in there somewhere in your comments.
But yes, I do see that for many it would, and for those, RiiR would IMHO be an incremental process of oxidizing your project to the point that there's nothing left but Rust.
Writing something from scratch and waiting for it to finish is the biggest problem we face, because eventually the will to continue with the effort just dies.
But creating meaningful groundwork, not just some "safer" bindings, does help in inviting others to share the effort as well.
So hopefully people will be smart and identify what they should do for their projects, should it be a rewrite or should it be a new feature that you write in Rust. :)
And no, I don't think bindings are the solution to anything; they serve no purpose in the Rust community as a long-term measure. You shouldn't have to sacrifice safety and/or performance in the case of a high-level language.
> TBH I don't see a future where RIIR every piece of software would make sense
It doesn't necessarily have to be Rust, and it's a process that may take decades. But I really do think that absolutely everything should be rewritten in a memory-safe language (i.e. not C/C++).
The vast majority of security issues are either stupid misconfigurations or memory-safety issues. And currently most security initiatives are undermined by the fact that they're resting on insecure foundations. Imagine if our computing platforms were truly secure. It would be a revolution, and IMO it's a revolution that's coming.
Exactly. When we talk about RiiR, we are saying, "I would like to have the guarantees that Rust provides". If the Rust ecosystem depends heavily on C/C++ libraries, then the system inherits the union of all their flaws, and the properties of the language do not translate into a quality of the ecosystem.
That said, I think containing C/C++ to a Wasm sandbox that can be integrated transparently with Rust would be a gigantic win for security, correctness and the Rust ecosystem.
>"I would like to have the guarantees that Rust provides". If the Rust ecosystem depends heavily on C/C++ libraries, then the system has the properties of the union of all the flaws and the properties of the language do not translate into a quality of the ecosystem.
Of course, ideally you get rid of all C/C++, but that may not be feasible. Combining Rust with some C/C++ still adds the benefit of Rust's safety to the new code you write. The contained C/C++ is like an unsafe block in Rust code: a place where you have to be more vigilant, but you're slowly constraining the space where these issues can arise. And you can iteratively replace only those parts that cause a lot of issues.
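In code, that containment looks something like the following sketch; `legacy_checksum` is a hypothetical symbol standing in for whatever the linked C library actually exports.

```rust
extern "C" {
    // Hypothetical function from the linked C library.
    fn legacy_checksum(data: *const u8, len: usize) -> u32;
}

/// Safe wrapper: the whole `unsafe` surface is this one small, auditable
/// block, and the invariant the C side needs (a valid ptr/len pair) is
/// guaranteed by taking a Rust slice.
pub fn checksum(data: &[u8]) -> u32 {
    unsafe { legacy_checksum(data.as_ptr(), data.len()) }
}
```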
TBH, I still don't see it happening or being the case, at least in the foreseeable future...
Why, you may ask? The answer is simple: there is always a cost.
So the question is what is easier, or when one can say that the cost of using a "safer" language outweighs that of a "less safe" one.
Because otherwise Rust is also too damn unsafe; just move to something completely safe. Idris could be a good starting point. /s
Putting the sarcasm aside: Rust does not guarantee memory safety, it tries to help with it. (Check reference cycles.)
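For reference, a minimal sketch of the reference-cycle case mentioned above: two `Rc` values pointing at each other are never freed, entirely in safe Rust (strictly speaking a leak rather than memory unsafety, but a hole the compiler does not catch).

```rust
use std::cell::RefCell;
use std::rc::Rc;

struct Node {
    next: RefCell<Option<Rc<Node>>>,
}

fn main() {
    let a = Rc::new(Node { next: RefCell::new(None) });
    let b = Rc::new(Node { next: RefCell::new(Some(Rc::clone(&a))) });
    // Close the cycle: a -> b -> a. Both refcounts stay above zero forever,
    // so neither node is ever dropped; this compiles and runs as 100% safe Rust.
    *a.next.borrow_mut() = Some(Rc::clone(&b));
}
```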
The safest feature of Rust would be its protection against data races.
So in the end, it's all about the trade-offs and deciding what you are willing to sacrifice and what you aren't...
Over the medium term (years to decades), there is no question that rewriting in a safe language is worth the cost. It's a one-time project, and security issues cost our economy billions on an ongoing basis.
Rust isn't perfect, and perfect automatically-checked safety probably isn't possible, but it's dramatically safer than C/C++, and it cuts down the amount that would need to be manually audited to the point where it would be feasible to do it comprehensively.
I agree that everything should be made in a memory safe-by-default language like Rust and I really need to see the Rust ecosystem really commit to this. Too many times will I come across a library with some very questionable uses of `unsafe`, which in my opinion has no place in things like HTTP request libraries or web frameworks.
When was the last time a Java or .NET codebase had an RCE from writing past the end of a list? RCEs in safe languages generally happen only in uncommon explicit code-loading calls or explicitly unsafe memory-access calls.
More than Rust in multithreaded scenarios, actually.
Not only do they have automatic memory management, they have an industry-standard memory model for hardware access, adopted by the C and C++ standards and used as inspiration for std::atomic<> on CUDA.
While you keep being baffled, where is Rust's memory model specification?
There's a good argument to be made that Rust is safer in multithreaded scenarios than the JVM or .NET.
Rust statically prevents inappropriate unsynchronized accesses for arbitrary APIs (for instance, the compiler will emit an error when attempting to mutate a non-concurrent object/data structure from multiple threads). Those VMs make unsynchronized mutations of individual memory locations work just enough to not be unsafe, but still allow arbitrary unsynchronized operations. These will likely end up with an incorrect result, even if there's no memory unsafety, and may completely violate any internal invariants of the object or data structure. This latter point means one cannot write a correct abstraction that relies on its invariants without explicitly considering threadsafety (it is opt-in safety), whereas Rust has this by default (opt-out safety).
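As a minimal sketch of that compile-time enforcement (the `HashMap` and the thread bodies here are arbitrary examples):

```rust
use std::collections::HashMap;
use std::sync::Mutex;
use std::thread;

fn main() {
    let map: HashMap<u32, u32> = HashMap::new();

    // Rejected at compile time: both closures would need `&mut map`, and
    // Rust forbids two simultaneous mutable borrows (error E0499):
    //
    //     thread::scope(|s| {
    //         s.spawn(|| { map.insert(1, 1); });
    //         s.spawn(|| { map.insert(2, 2); });
    //     });

    // The accepted version makes the synchronization explicit:
    let map = Mutex::new(map);
    thread::scope(|s| {
        s.spawn(|| { map.lock().unwrap().insert(1, 1); });
        s.spawn(|| { map.lock().unwrap().insert(2, 2); });
    });
    assert_eq!(map.lock().unwrap().len(), 2);
}
```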
Rust has essentially adopted the C/C++11 concurrency model.
Yes, it is one less failure point to worry about, but in my experience using the respective task libraries in distributed applications, most of the multithreaded access bugs lie in external resource usage, and those Rust does not prevent.
Ownership/affine typing does allow modelling that, within a single program.
For instance, the type representing a handle to the external resource can have limited constructors and limited operations, that will enforce single threaded mutations or access (including things like "this object can only be destroyed by the thread that created it").
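A minimal sketch of that idea, with a hypothetical `ResourceHandle` type: the raw-pointer `PhantomData` makes the type `!Send`/`!Sync`, so the compiler guarantees that all access, including the drop, happens on the thread that created it.

```rust
use std::marker::PhantomData;

/// Hypothetical handle to an external resource.
pub struct ResourceHandle {
    id: u64,
    // `*mut ()` is neither Send nor Sync, so neither is this struct:
    // it cannot be moved to, or shared with, another thread.
    _not_thread_safe: PhantomData<*mut ()>,
}

impl ResourceHandle {
    pub fn open(id: u64) -> Self {
        Self { id, _not_thread_safe: PhantomData }
    }

    pub fn query(&self) -> u64 {
        self.id
    }
}

fn main() {
    let handle = ResourceHandle::open(7);
    println!("{}", handle.query());
    // Rejected at compile time: `*mut ()` cannot be sent between threads.
    // std::thread::spawn(move || handle.query());
}
```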
Assuming you are accessing the distributed resource from multiple threads instead of multiple distributed processes, which is quite common in distributed architectures.
The borrow checker doesn't help at all with inter-process communication.
Plus, all ML-derived languages have good enough type systems to model network states, while enjoying the productivity of automatic memory management.
At least where C++ is concerned, there are several ways to improve its safety instead of bearing the cost of a full rewrite.
Use standard library types, actually integrate sanitizers into CI, enable bounds checking (even in release builds), and above all avoid C-style coding.
Naturally this doesn't work out for third party libraries that one doesn't have control over.
So from a business point of view it boils down to how much one is willing to spend rewriting the world vs. improving parts of it.
You can write safe code in C++, but it requires you to go way out of your way and exercise constant vigilance, and it's not always obvious when you mess it up. This is the approach we've been trying for 20 years, and it generally hasn't worked well, because people are both flawed and lazy. This is the thing that's fundamentally different about Rust — it provides strong guarantees by default and requires you to specifically call it out when you're doing something potentially unsafe, so to some degree it turns our laziness into a force for good.
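To illustrate the "call it out" point: even something as small as dereferencing a raw pointer will not compile in Rust until you opt in explicitly, which makes the risky spots greppable.

```rust
fn main() {
    let x = 42u32;
    let p = &x as *const u32;
    // println!("{}", *p); // error[E0133]: dereference of raw pointer requires unsafe
    println!("{}", unsafe { *p }); // the opt-out of the guarantees is explicit
}
```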
C++/CLI is basically a set of language extensions, just like clang and gcc have theirs.
Right now it supports up to C++14, if I am not mistaken.
Many don't seem to realise that the CLR started as the next evolution of COM, with the required machinery to support VB, C#, J# and C++, alongside any other language that could fit the same kind of semantics.
I agree, and I see a lot of analogies between this situation and the ObjC -> Swift transition which many code bases have gone through. When there are significant semantic benefits in the new language, a slow war of attrition against the old language is a good move.
Alternatively, you could have C/C++ as a third, discrete category somewhere in the middle of the C <---> C++ spectrum. For people who like namespaces, or overloading arithmetic operators for mathy vector/matrix/etc stuff in order to make the code more readable.