A Usable C++ Dialect That Is Safe Against Memory Corruption

devit · on March 15, 2018

This works, although the downside compared to Rust is that soft pointer validity is checked at runtime, meaning that a program that compiles can still randomly fail at runtime and that performance is worse due to the checks.

The key idea and massive difference from standard C++ is that object destruction is delayed until a "quiescient state" happens in what is a reframing of RCU [https://en.wikipedia.org/wiki/Read-copy-update], allowing to freely use raw pointers as long as none survive across a quiescient state.

[note however that this system allows to take pointers to stack variables, so they have to restrict raw pointers to function arguments only - it would be better to also introduce a "heap-only" pointer that can be freely returned/stored on the heap/etc. but can't be stored in types that live across a quiescient state, from which stack-or-heap raw pointers can be derived]

This also results in the downside that things like mutexes can only be safe if they are kept locked until a quiescient state happens, since that's the only lifetime that the system understands.

Likewise, you can't do this like prevent updating a collection while iterating unless you are fine with freezing the collection until a quiescient state happens.

In general, you are much better off using Rust (or an equivalently expressive language, if it existed), since that allows to statically check for correctness, not have to delay freeing memory, and allows to use lifetimes and linear types to secure mutex locking, collection iteration, and other things where lifetimes are essential.

hedora · on March 16, 2018

I don’t understand your comment about how rust pointers are safer than soft pointers. The article explains how to implement a wide variety of pointer semantics, all of which are memory safe (throw an exception on explicit use after free, use the type system to have the compiler statically check the pointers are live, use dynamic cast, etc). Looking online, I see that people implement all the same primitives in rust, with exactly the same safety caveats.

Also, the container and mutex tricks you mention sound interesting, but I don’t see why they can’t also be used in C++ (which has a turing complete type system / checker).

masklinn · on March 16, 2018

> use the type system to have the compiler statically check the pointers are live

It doesn't explain how it would statically ensure that a moved-from unique_ptr (or equivalent) can not be used. In fact the only mentions of moves are that owning pointers can only be moved and soft pointers can be moved or copied, but C++'s move does not remove any access, it just moves the content leaving the moved-from object in a "valid but unspecified state".

Note that valid != safe. Dereferencing a moved-from unique_ptr is unsafe for instance.

Rust's affine types solve this issue, a moved-from type (Box included) simply can't be used, its scope ends when it's moved.

> Looking online, I see that people implement all the same primitives in rust, with exactly the same safety caveats.

Rust's (safe) pointers and references don't throw exceptions on explicit use after free because such code doesn't compile at all, and its equivalent to dynamic_cast has to be very specifically opted in: https://doc.rust-lang.org/1.19.0/std/any/trait.Any.html#meth...

nobugs · on March 16, 2018

> Rust's (safe) pointers and references don't throw exceptions on explicit use

Which essentially goes at the cost of having Java-style semantic memory leaks (very generally, _any_ kind of keeping-an-object-as-long-as-at-least-one-reference-exists suffers from it) => we still have to pick our poison (personally, I _strongly_ prefer to avoid refcounting, and it does work like a charm in a few very serious million-LoC/billions-transactions projects, but I do agree that opinions may differ).

masklinn · on March 16, 2018

> Which essentially goes at the cost of having Java-style semantic memory leaks (very generally, _any_ kind of keeping-an-object-as-long-as-at-least-one-reference-exists suffers from it)

Rust references work the opposite way. References don't extend the lifetime of their source, and a reference outliving its referent is a compile-time error.

> we still have to pick our poison (personally, I _strongly_ prefer to avoid refcounting, and it does work like a charm in a few very serious million-LoC/billions-transactions projects, but I do agree that opinions may differ).

I have no idea what the hell you're talking about, but you seem to suffer from pretty significant misunderstandings.

nobugs · on March 16, 2018

I'm still speaking about reference-counted RC<T>, which inevitably suffers from memory leaks. And moreover - _any_ implementation which avoids throwing an exception, in quite a few use cases has no other choice than to resort to keeping the stuff until the last reference to it is killed, inevitably causing Java-style semantic memory leaks.

P.S. FWIW, Rust's references ~= OP's "naked pointers" (NOT 'soft pointers'), and SaferCPP's 'scoped pointers'. A useful tool, but is not sufficient in quite a few real-world use cases.

catnaroek · on March 16, 2018

“Throw an exception on explicit use after free” is “memory safe” in exactly the same sense “throw an exception on explicit use of an operation on an argument of the wrong type” is “type safe”. In other words, not at all.

BeeOnRope · on March 16, 2018

That's not a strong analogy. Memory safety almost always involves some type of runtime checks: even memory safe languages usually (always?) have runtime checking of array bounds, for example.

So it's understood that a memory safe language will generally be composed of syntactic and semantic features which help at compile time, and runtime checks to close any remaining holes. If some particular implementation happens to have a few more things in the latter category compared than is usual, that doesn't prevent it from being "memory safe", it just makes it more awkward to use and prone to bugs.

In principle, the same distinction applies even to type safety.

catnaroek · on March 17, 2018

> Memory safety almost always involves some type of runtime checks: even memory safe languages usually (always?) have runtime checking of array bounds, for example.

Indeed, array manipulation is completely unsafe in most languages, as array indices are effectively unityped. (This may be a unitype of everything, as in Python, or a unitype of indices for all arrays, as in Java or ML.) There are several ways to fix this issue, with various tradeoffs between convenience and expressive power. None of them has become mainstream, but it's good to remember that they do exist.

> So it's understood that a memory safe language will generally be composed of syntactic and semantic features which help at compile time, and runtime checks to close any remaining holes.

Safety is any means by which you can establish that every operation a program may perform is meaningful. Now, I don't know about you, but at least to me, it is never meaningful to dereference an invalid pointer or use an invalid array index. Whether the error is trapped at runtime is neither here nor there.

nobugs · on March 16, 2018

> that performance is worse due to the checks.

I'd argue that use cases for 'soft pointers' are about the same as that of Rust's RC<T>, which also incurs runtime costs (very briefly - there is no magic here, neither with Rust).

> The key idea and massive difference from standard C++ is that object destruction is delayed until a "quiescient state" happens

If you're speaking about OP - clarification: it is not "object destruction" which is delayed (destructor is still called synchronously when the variable goes out of scope, so all the crazy finalize()-like problems don't occur), it is memory deallocation which is delayed (and this is generally ok as deallocation is not observable, or at least garbage-collected languages tell us so <wink />).

> things like mutexes can only be safe

Whether C++ or Rust or whatever-else, mutexes at app-level are evil ;-) (it can lead to a very long discussion, but long story short - finally, by 2017, most of the opinion leaders started to converge to this IMO-very-obvious observation: ASYNC RULEZZ! <wink />).

> since that allows to statically check for correctness,

The idea behind OP is to have a tool which will do the same thing (where possible, see above re. 'soft pointers' and RC<T>). Whether such a tool materializes - is a different story, but well - first we have to agree that such a tool is a Good Thing(tm).

> have to delay freeing memory

In practice, it is never an observable problem in (Re)Actor-like contexts ((Re)Actor use cases are about highly interactive systems ranging from games to stock exchanges, where typical input is processed in milliseconds, and amount of allocated memory until the 'quiescient state' is reached, is single-digit kilos; in extreme cases, it goes up to single-digit megabytes, still nothing by modern standards).

> you are much better off using Rust

Really really depends. It is still C++, and being C++ has its own virtues (alongside with its own quirks); just two things to illustrate this point - (i) recently it was revealed that modern GPUs are designed with ISO C++ standard in mind (specifically C++, not Rust or anything else); (ii) developer availability is also a major factor for real-world projects, and so on, and so forth. In an ideal world - well, probably Rust does look as a more to-the-point language (though even with Rust I'd create an own dialect, in particular, outlawing thread sync to simplify things), but given real-world considerations - the choice is certainly not that black-and-white.

masklinn · on March 16, 2018

> I'd argue that use cases for 'soft pointers' are about the same as that of Rust's RC<T>, which also incurs runtime costs (very briefly - there is no magic here, neither with Rust).

The article's "soft" pointer does not own its contents and depends on an owning pointer, so the semantics are much closer to Rust's references. In fact, the article draws an analogy between soft pointers and weak_ptr. And of course dereferencing "soft" pointers can fault.

> Whether C++ or Rust or whatever-else, mutexes at app-level are evil ;-) (it can lead to a very long discussion, but long story short - finally, by 2017, most of the opinion leaders started to converge to this IMO-very-obvious observation: ASYNC RULEZZ! <wink />).

Async does jack shit for concurrent safety. You can have either shared-memory concurrency or isolated concurrency.

And explicit asynchronous API (à la JS or C#) are dreadful.

nobugs · on March 16, 2018

> so the semantics are much closer to Rust's references.

Not really; I see Rust references (enforced in compile-time) ~= OP's "naked pointers" with limitations on scope. 'soft pointer' is an alternative to long-living pointers (can be replaced with a refcounted ptr a-la RC<T>, but TBH I don't like refcounted stuff for several reasons).

> You can have either shared-memory concurrency

You can, but it doesn't work, suffering from all kinds of problems (from being _crazily_ error-prone, via being _fundamentally untestable_!, and all the way to being fundamentally unscalable, and sucking Big Time performance-wise - the last one unless we're speaking about RCU etc., but probably we aren't).

> And explicit asynchronous API (à la JS or C#) are dreadful.

Did you see C#'s await? And BTW, as practice shows, synchronous alternatives are MUCH more dreadful than even OO-style async handling (fundamental non-testability of shared-memory stuff, even if taken alone, is already enough to rule shared-memory stuff out for good - which BTW is already happening; there is a Really Good Reason for Go's philosophy of "share memory by communicating, not communicating by sharing memory").

masklinn · on March 16, 2018

> You can, but it doesn't work

Sure doesn't, I mean it's only the vast majority of concurrent systems which are built on shared-memory concurrency.

> Did you see C#'s await?

Yes, it sucks. Also, it's shared-memory concurrency.

> synchronous alternatives are MUCH more dreadful than even OO-style async handling (fundamental non-testability of shared-memory stuff, even if taken alone, is already enough to rule shared-memory stuff out for good

Async does not make things more testable.

> BTW is already happening; there is a Really Good Reason for Go's philosophy of "share memory by communicating, not communicating by sharing memory").

Go only pays lip service to that, Go is shared-memory concurrency to and through, it barely allows and doesn't enforce shared-nothing concurrency.

Shared-nothing concurrency definitely is already happening. In unices (where processes are historically the standard unit of concurrency) or Erlang (where you can't share memory) or Clojure or Haskell (where you do share memory but almost everything is immutable).

nobugs · on March 16, 2018

> I mean it's only the vast majority of concurrent systems which are built on shared-memory concurrency.

...which doesn't mean they work (~="they pretend to work, but happen to fail much more often than they should"). Just one example - one system which has 50% of one multi-billion-dollar industry and was built as shared-nothing concurrency (not Clojure or Erlang FWIW), has 5x lower downtimes that industry average; and I don't even remember how many times I've run into all kinds of shared-memory bugs in standard! libraries (my first article on such bugs was back in 1998, the last one was about a standard proposal made in 2015 IIRC); and so on and so forth.

>> Did you see C#'s await? > Also, it's shared-memory concurrency.

Nope; await (and C++ co_await BTW too) is single-threaded concurrency (nothing is shared between threads, and overall the concept is close to fibers with a bit more intuitive semantics), which makes it an extremely good building block to build a shared-nothing concurrency (the one which doesn't suffer from all the troubles of shared-memory one).

> Go only pays lip service to that, Go is shared-memory concurrency

You're somewhat right, but actually there are two different things there - one is language as such (and indeed, goroutines are shared-memory :-( ); however, Go "best practices" effectively say "don't use it", and LOTS of ppl from different projects were telling me that they do work in shared-nothing way (enforcing it is a different story though <sigh />).

> Shared-nothing concurrency definitely is already happening.

...in particular, with Node.js (which is ugly but still better than synchronized nightmare), C# await, and C++ co_await. As I told in one of my presentations (I think it was the CPPCON one on "8 different ways to do non-blocking"): you don't need to use Erlang to have reasonably-good concurrency :-). Overall, this sync-vs-async question is rather orthogonal to the programming language (heck, out of those 8 ways - which are largely equivalent to each other differing only in syntactic sugar - at least 2 will work even in asm).

> Async does not make things more testable.

It does - and very strictly too. Very briefly, async can be made reproducible, and whatever-is-reproducible, can be made testable; OTOH, making a non-trivial shared-memory program reproducible is next to impossible in any realistic environment (reproducibility ~= determinism, and thread context switches are non-deterministic at least for our purposes; heck, even VM guys weren't able to make them deterministic - which got lots of interesting implications which won't fit here). For discussion on determinism and testability - see my other presentation (ACCU 2017 one, on deterministic distributed systems or something).

steveklabnik · on March 16, 2018

Do you happen to have a citation for (i)? That's very interesting!

pjmlp · on March 16, 2018

Yes, NVidia specially designs their new GPGPUs for C++ execution.

"Volta and Cuda C++" - http://cppcast.com/2017/09/olivier-giroux/

CppCon 2017: Olivier Giroux "Designing (New) C++ Hardware”

https://www.youtube.com/watch?v=86seb-iZCnI

nobugs · on March 16, 2018

Yep, this one. On CPPCON17, Olivier Giroux has said: "[when designing Volta,] we were literally quoting C++ standard to each other".

steveklabnik · on March 16, 2018

Neat. Thanks to both of you!

btilly · on March 15, 2018

I have a question about this.

Articles like http://blog.llvm.org/2011/05/what-every-c-programmer-should-... have convinced me that even if C or C++ reads logically like it is safe, there is a possibility that the compiler can rewrite your code in an acceptable way according to the standards such that the checks that are clearly visible in your code disappear, opening up the very problems that you thought you were protected against.

Is there any possibility that after an aggressive compiler gets done with inlining and optimization that that could happen here in some way? Can it be proven that if the compiler works according to the standard that this won't happen..even if the programmer accidentally trips on undefined behavior?

kbenson · on March 15, 2018

The best thing I've ever encountered that encapsulated this was a Cap'N'Proto vulnerability, and the discussion here about it was enlightening as well.[1]

The highly condensed version is that the compiler optimized away an if block that was responsible for throwing an error as impossible to reach in correctly functioning code, when the whole purpose of that if block was to check that condition and error so that the program did not continue in an invalid state.

Specifically:

  word* target = segmentStart + farPointer.offset;
  if (target < segmentStart || target >= segmentEnd) {
    throwBoundsError();
  }
  doSomething(*target);

(a simplified version of the actual code) was used to detect if target had overflowed (and thus target < segmentStart). The bug report goes on to explain:

However, as it turns out, pointer arithmetic that overflows is undefined behavior under the C standard. As a result, the compiler is allowed to assume that the addition on the first line never overflows. Since farPointer.offset is an unsigned number, the compiler is able to conclude that target < segmentStart always evaluates false. Thus, the compiler removes this part of the check. Unfortunately, in the case of overflow, this is exactly the part of the check that we need.

The post that's from (the sumbmitted article to the HN discussion I linked) is fairly accessible in my view. I highly recommend reading it.

Another discussion that probably has good info is this one.[2]

1: https://news.ycombinator.com/item?id=14163111

2: https://news.ycombinator.com/item?id=14785867

nobugs · on March 16, 2018

-fwrapv should fix it (no warranties of any kind, batteries not included).

kbenson · on March 16, 2018

IIRC this particular case was only seen on a specific version of one compiler for MacOS that was provided with extra patches, so it's at first glance less of a problem than it originally looks. That said, I don't think it was doing anything illegal according to the standard (no an expert on this), and as compiled software and a library, the reach of that case may be larger than we might otherwise assume.

There are undoubtedly ways to tell the compiler to be careful, but that isn't always in control of the person writing the code, and even if some level of control exists, compilers change.

nobugs · on March 16, 2018

To the best of my knowledge, the only compilers to exploit overflows, are GCC/Clang (and those commercial compilers I know about, explicitly said that they are NOT going to exploit signed-overflow UB, IIRC I heard it from MSVC and xlC). And for GCC/Clang, -frapw achieves the same thing. Still, I agree that things change, but this kind of behaviour won't be easy to change (at all); OTOH, I am going to campaign to remove this UB from the standard altogether (there is no real reason for this UB, at least for the platforms 99.9% of developers are working on).

EpicEng · on March 15, 2018

Perhaps someone better versed in those compilers can add to/correct me here, but I'm pretty sure that can only happen if you're invoking UB somewhere along the line.

saagarjha · on March 15, 2018

Once you've entered the realm of undefined behavior, the compiler can really do whatever it likes. Before then all it can do is assume you're not doing anything undefined.

masklinn · on March 16, 2018

> Once you've entered the realm of undefined behavior, the compiler can really do whatever it likes. Before then all it can do is assume you're not doing anything undefined.

In practice there is very little difference and you're effectively entering the realm of UB as soon as the program starts, because through inlining and propagation a possible UB may be leveraged before the UB is sequentially hit.

kolpa · on March 15, 2018

That's exactly the problem. C/C++ has undefined behavior as part of the language spec, so it can never be safe unless you use a compiler that promises to reject programs that invoke undefined behavior.

staticassertion · on March 15, 2018

It's more complicated than this. C++ compilers can't just "reject" UB code - they don't have enough information at all times to prove that code invokes UB, the languages is not powerful enough to represent this in all cases.

When people say "just fix this in compilers, don't compile when they run into UB" it's a fundamental misunderstanding of the problem. Yes, in some cases the compiler can prove UB, and uses it to write optimiations, but it doesn't prove that all code is not invoking UB.

gpderetta · on March 15, 2018

> a fundamental misunderstanding of the problem

No kidding! There is one right here:

> Yes, in some cases the compiler can prove UB, and uses it to write optimiation

Compilers do not prove UB , on the contrary they postulate no-UB and then use this axiom to prove other properties of the code.

staticassertion · on March 15, 2018

I suppose that's true, yes. My point was more that your compiler is not hitting a line of code and saying "I know this is or is not UB" because it can not do that in the general case - it is definitely incorrect to say it is 'proving' this. Would you consider that an accurate representation?

masklinn · on March 16, 2018

> My point was more that your compiler is not hitting a line of code and saying "I know this is or is not UB" because it can not do that in the general case

I think it's still a (cause of) misunderstanding. For the compiler, UBs are situations which axiomatically can not occur. "This is UB" is not a concept, because the compiler assumes at all point that UBs can not occur.

The compiler doesn't go "oh you're dereferencing a pointer which may be null, fuck you", it goes "you're dereferencing a pointer so it can't be null, and thus I can remove anything assuming possible nullability".

staticassertion · on March 16, 2018

> The compiler doesn't go "oh you're dereferencing a pointer which may be null, fuck you", it goes "you're dereferencing a pointer so it can't be null, and thus I can remove anything assuming possible nullability".

:) This is what I'm trying to say.

kllrnohj · on March 15, 2018

int get_value(int* pointer) { return *pointer; }

How is a compiler supposed to reject this for undefined behavior? It'd be absurd to demand the compiler someone knows all possible usages for get_value to find if any of them pass a nullptr, so what is it supposed to do?

And this is why the spec says things like that deref'ing a null pointer is undefined behavior, so that the compiler can take that function and do the thing you'd expect it to do and transform it into a simple memory read.

fjsolwmv · on March 15, 2018

The compiler shouldn't rejected UB, it should define all behavior, and reject all code constructs that can't have behavior fully defined (like accessing raw pointers as arrays, so bounds- checking is impossible). It can throw an exception on null de-reference at runtime. That's slow, but safe.

adrianN · on March 16, 2018

Slow but safe is not why people choose C++ for their software.

wruza · on March 16, 2018

There is a compiler option for that, just run javac alias instead of g++

kllrnohj · on March 15, 2018

The generally recommended thing for catching & fixing this is: https://clang.llvm.org/docs/UndefinedBehaviorSanitizer.html

CppCon has also had a lot of great talks on UB and why it is what it is, such as https://www.youtube.com/watch?v=yG1OZ69H_-o

jcoffland · on March 16, 2018

This is only a problem if you are using threads or shared memory and making up your own misguided locking mechanisms or in embed code on a processor with interrupts but no locks. Normally, if you use the tools correctly you will never have to worry about compiler reordering messing anything up.

The article you linked discusses what happens when you do things you should not, like fail to initialize a variable before using it. Modern compilers, when used correctly will let you know when you do this. With the right compiler options, it won't allow you to make such mistakes.

btilly · on March 16, 2018

The example in the case that I linked was single threaded code having a check for a null pointer unexpectedly removed.

So no threads, no shared memory, no locking mechanisms, no interrupts. Just a compiler making valid optimization and removing a necessary guard condition.

Try again.

saagarjha · on March 15, 2018

As far as I'm aware, if you stay within the confines of smart pointers (and don't drop down to the raw pointer it owns) you will never encounter undefined behavior. You may have crashes if you try to double free something, but these are defined to crash rather than letting the compiler optimize out checks.

masklinn · on March 15, 2018

> As far as I'm aware, if you stay within the confines of smart pointers (and don't drop down to the raw pointer it owns) you will never encounter undefined behavior.

Deref'ing an empty (e.g. moved-from) unique_ptr is still UB.

saagarjha · on March 15, 2018

Yeah, there's that. It would be nice if C++ did a emptiness check for you, but I guess that this wasn't in the cards…

EpicEng · on March 15, 2018

Then you have a check on every deref, which means you incur a relatively large performance penalty for using a smart pointer. If you want stuff like this there are other languages out there.

mcguire · on March 15, 2018

Would it be possible to write the check so that the null test overlaps with the rest of the instructions? If the test is anyways assumed to pass, you should only get a 1-instruction overhead, right?

(That'll only work for null.)

kllrnohj · on March 16, 2018

1 instruction != 1 clock cycle. In particular that would utterly kill the usage of unique_ptr on things like small embedded processors or microcontrollers that either lack speculative execution entirely or do not have the level of branch predictor & speculative execution capabilities of a high-end x86 or ARMv8 CPU.

nobugs · on March 16, 2018

FWIW: in general, simpler controllers (especially those in-order ones) tend to be much more friendly to branching (exactly because they're not out-of-order). NB: I am not arguing whether unique_ptr<> should check or not: if I want checked version, I will write my own wrapper, it is not a rocket science.

RossBencina · on March 16, 2018

While we're on "nice to haves," it would be great if there was a way to explicitly terminate the scope of objects. Then you could have an atomic move-from-and-remove-from-scope operation. Which removes any possibility of accessing the moved-from object again.

staticassertion · on March 15, 2018

> As far as I'm aware, if you stay within the confines of smart pointers (and don't drop down to the raw pointer it owns) you will never encounter undefined behavior.

No. Even with smart pointers it's possible (move out of unique_ptr, deref). But even with no pointers it's possible - index into an array without checking the bound, signed int overflow, etc.

yoklov · on March 15, 2018

UAF is still possible. Iterator invalidation, and such.

zach43 · on March 15, 2018

Wouldn't you have a problem with reference cycles in C++ smart pointers? Not sure if they do anything special to prevent this.

nobugs · on March 16, 2018

std::unique_ptr<> won't allow you to have 'owning' reference cycles; neither 'owning' reference cycles are really necessary in real-world programs.

saagarjha · on March 15, 2018

std::weak_ptr is there for resolving reference cycles.

pdpi · on March 16, 2018

Which means you have to design around it. You can definitely run into problems with cyclical shared_ptr dependencies

saagarjha · on March 16, 2018

Well, you have to do this for every language that does reference counting.

pjmlp · on March 16, 2018

Either that, or have a cycle collector in addition to the normal reference counting optimizations.

pcwalton · on March 16, 2018

This technique is mostly a garbage collector, as I see it. Postponing memory destruction until the stack is empty is a special case of deferred reference counting [1], where sweep can only happen with an empty stack. If the "soft pointers" are implemented with reference counting, that's also a type of GC.

On the other hand, the tagged pointer implementation strategy for "soft pointers" isn't really garbage collection, but it does have much of the same overhead. Pointer reads must check the tag ID and throw, which is like a read barrier [2]. Writes through a pointer must do the same, similar to a write barrier [3]. And that's not getting into the overhead of multithreading; I see no reasonable way to implement this scheme in a multithreaded world. I expect that a fast GC without read barriers will significantly outperform this scheme. As much as everyone complains about the speed of GC, garbage collection is hard to beat!

[1]: http://www.memorymanagement.org/glossary/d.html#term-deferre...

[2]: http://www.memorymanagement.org/glossary/r.html#term-read-ba...

[3]: http://www.memorymanagement.org/glossary/w.html#term-write-b...

nobugs · on March 16, 2018

> Pointer reads must check the tag ID and throw, which is like a read barrier

Usually, "read barrier" is understood as a multithreaded stuff - and OP has nothing to do with MT. In other words, no "read fence" is necessary (simply because it lives in a perfect single-threaded world). And from this POV, it is extremely difficult to beat this schema with any popular-multithreaded-GC. As a side note, proposed schema DOES allow 'naked' pointers, so relatively-expensive (costing ~4CPU cycles, which is not much to start with) conversion from 'soft' into 'naked' has to be done only _very_ occasionally, and after the conversion, we're working with good old plain pointers, which just happen to be safe due to the way they're used.

duneroadrunner · on March 15, 2018

If anyone is really interested in this sort of thing, I suggest you take a look at SaferCPlusPlus[1]. It is "A Usable C++ Dialect That Is Safe Against Memory Corruption" (including data races). And it already exists.

And I think it's better than this proposed dialect in that most of the (safety) restrictions are enforced without requiring extra tooling, and it's much less restrictive. Most existing C++ code can be converted directly. And the run-time overhead is kept to a minimum. Btw these advantages apply versus the Core Guidelines[2] as well.

[1] shameless plug: https://github.com/duneroadrunner/SaferCPlusPlus

[2] https://github.com/duneroadrunner/SaferCPlusPlus#safercplusp...

nobugs · on March 16, 2018

I happen to like quite a few things from it, but... there is a Big Fat Hairy Difference(tm) between "safe" and merely "safer". Make it "guaranteed to be safe" (which will most likely require tooling) rather than merely "safer" - and I will be the first one to promote it myself :-). Also - it would be gr8 to reduce the number of different concepts developer needs to remember about while programming. In OP (assuming that tooling does exist) it is quite simple: there are only 3 concepts, with 2 of them ('naked' and 'owning'=unique_ptr<>) being already very familiar; OTOH, current implementation of SaferCPlusPlus reminds me of ALGOL68 - where it was possible to specify _everything_, but choosing the right thing was so time-consuming that it never really flew.

blub · on March 16, 2018

SaferCPlusPlus has two big issues which prevent me from using it: confusing class naming and too many concepts.

I do like the ideas it builds on, and I will probably implement a simplified version for my needs...

duneroadrunner · on March 16, 2018

Yes, documentation and class names are not SaferCPlusPlus' strong points at the moment. Perhaps the easy way to get started is just to use the elements in the "mse::mstd" namespace, like vector, array, string, string_view, etc. which are just safe, compatible implementations of their namesakes in the "std" namespace.

As for the pointers, there's a slightly out-of-date article[1] that tries to explain them with examples. But a simple option is to just replace all your raw pointers with "registered" pointers. It's not performance optimal, but it's safe and simple.

But yes, better introductory documentation and examples are needed. There is not yet a forum for those picking up SaferCPlusPlus, but for now you can post any questions or suggestions in the issues section[2].

[1] https://www.codeproject.com/Articles/1093894/How-To-Safely-P...

[2] https://github.com/duneroadrunner/SaferCPlusPlus/issues

duneroadrunner · on March 16, 2018

> "guaranteed to be safe" (which will most likely require tooling) rather than merely "safer" - and I will be the first one to promote it myself :-)

Well I hope so, as SaferCPlusPlus could use a little promotion :) At this point, despite the name, I think the dialect itself is "safe", not just "safer". Potentially unsafe features are relegated to the "mse::us" namespace and are generally only needed for interop with legacy code.

Of course there's no way to enforce that a programmer stick to the "dialect"/subset without some tooling. But I think the "rule" for adhering to the SaferCPlusPlus subset is pretty simple and generally intuitive. That is, "any potentially unsafe element of C++ should be avoided". And I think most of us have a pretty good sense of which C++ elements are potentially unsafe. The (few) unintuitive ones might include things like the implicit "this" pointer (which is a native pointer, so it should be avoided). A tool to enforce adherence would be fairly straightforward to write. When there is sufficient demand it will be provided. At the moment an "auto-translation"[1] tool has higher priority.

> quite simple: there are only 3 concepts, with 2 of them ('naked' and 'owning'=unique_ptr<>) being already very familiar

It's the same with SaferCPlusPlus, but the 3 "concepts" (pointer types) are "reference counting" pointer (=std::shared_ptr<>), "registered pointer" (= safe, unrestricted naked pointer), and "scope" pointer (basically a naked pointer that is restricted to stack allocation (i.e. a "local variable") and can only point to objects guaranteed to outlive it). The Core Guidelines also has three "concepts"/pointer types (shared, unique, and naked). I suspect that some experience will reveal that SaferCPlusPlus has the better set of pointer types.

...

Oh wait, you're the author of the article? In which case, I think it would be worth your time to get more familiar with SaferCPlusPlus. Even if just as research for your own dialect. But with some experience, I think ultimately you might be convinced by the SaferCPlusPlus solution. In fact, there's no reason why your (Re)Actors couldn't be implemented on top of SaferCPlusPlus, which would automatically provide the safe "collections" that you need.

Don't hesitate to post any questions in the "issues" section[2].

[1] https://github.com/duneroadrunner/SaferCPlusPlus-AutoTransla...

[2] https://github.com/duneroadrunner/SaferCPlusPlus/issues

Jeaye · on March 15, 2018

Even when you use RAII, const-by-default, shared pointers, type-rich APIs, and the like, you're still using C++. That means you're still tied to C's legacy defaults (of UB) and that also means you're still using C++ value categories. If your "safe" C++ subset uses references, it can't be guaranteed to be safe (since plenty of valid code will lead to UB).

More info in the value category cheat sheet: https://github.com/jeaye/value-category-cheatsheet/blob/mast...

alacombe · on March 15, 2018

TL;DR; use smart pointers, RAII semantic and STL containers/iterators. Though, I'd have a few criticism...

> Rules to ensure memory safety

These "rules" only protect you against object's lifetime issues, not overflow / underflows, and other kind of memory issues.

> ‘owning’ pointers are obtained only from operator new

no, you shall be using std::make_{unique,shared}(...) which will protect you against leaking memory if exceptions are raised.

> Calling a function passing the pointer as a parameter, is ok.

Correct, but you can still shoot yourself in the foot. Best is to pass a [const] reference to the function called.

> This only leaves us with functions such as strchr()

Don't use C API. The STL should provide you with enough API to use the proper C++ types, either std::string or std::string_view in C++17 if possible.

> and also prohibits C-style cast and static_cast with respect to pointers

IIRC, you can't static_cast<> a pointer, you'd have to reinterpret_cast<> it, which the document does mention.

> For arrays, we can always store the size of the array within our array collection, and check the validity of our ‘safe iterator’ before dereferencing/indexing

use std::array.

saagarjha · on March 15, 2018

Mostly agree, but:

> Don't use C API. The STL should provide you with enough API to use the proper C++ types, either std::string or std::string_view in C++17 if possible.

Sometimes you're working with C API that gives you back a char * that they've already allocated. AFAIK there isn't a way to create an std::string out of that without a copy.

> you can't static_cast<> a pointer

You can static_cast a void * into other kinds of pointers.

> use std::array

Do you mean std::vector and at()?

scott_s · on March 15, 2018

> Do you mean std::vector and at()?

No, std::array was added in C++11: http://en.cppreference.com/w/cpp/container/array

saagarjha · on March 16, 2018

std::array has its size fixed at compile time, so there's no reason for you to do have to do bounds checking…

blub · on March 16, 2018

char buffer[100] also has a fixed size and is responsible for most buffer overflow bugs :)

array::at is just as mandatory as vector::at IMO.

nobugs · on March 16, 2018

> You can static_cast a void * into other kinds of pointers.

Moreover, you can use static_cast for downcasts (from the parent class to child class) - without runtime costs of dynamic_cast. static_cast is not safe (it doesn't perform runtime checks), but reinterpret_cast is even worse (it doesn't perform even compile-time checks).

fooker · on March 15, 2018

>AFAIK there isn't a way to create an std::string out of that without a copy.

You can use std::string_view, Google's StringPiece or llvm::StringRef.

nobugs · on March 16, 2018

Well, the point of the OP goes further than that. Two Big Questions are (a) what to do with the non-owning back references (such as backref going up the owning tree) - for this 'soft' pointers are proposed (I _hate_ shared_ptr-like ref-counted stuff, in large projects they tend to cause much more trouble then they're worth, especially memory leaks due to shared_ptr loops are troublesome, causing both syntactic and semantic memory leaks, ouch!), and (b) how to formalize the use of those non-owning ('naked') pointers/references and how to prevent them from being dereferenced when they're pointing to already-deallocated memory locations (and saying "don't use naked pointers/refs, ever" is not really practical IMNSHO).

beached_whale · on March 15, 2018

Using iterators and ranges(think a generalized string_view) handles the overflow/underflow issue in conjunction with algorithms(std ones).

wilun · on March 16, 2018

Dialects are of limited uses, because they are dialects... New dialects are arguably of even more limited uses, because better languages now exist where the desirable characteristics are enforced not by using a dialect, but by the core languages, and safety checking is not optional. (Also, I'm somewhat curious about why the proposed dialect tells about think "similar to unique_ptr, and so over: just use the real think -- at least it would be less a dialect and more of modern standard C++). Dialects enforced by wishful thinking or at beast ad-hoc tools maintained by a too small community will perish in front of well architectured languages maintained by a real community.

They have even been used to ship some important code in big project made of tons of legacy code -- so I'm not even sure an interop argument could be made.

hedora · on March 16, 2018

One point of a dialect (aka “coding standards”) is that you can evolve legacy code bases toward them with a series of simple refactorings instead of by rewriting from scratch.

For me this is the big advantage of C++: it is possible to backport virtually any language feature you want to it, thanks to the combination of modern template programming and low-level C-style bit twiddling.

mcguire · on March 15, 2018

Interesting article. I hope the author has a chance to take a look at the Pony language, which he's described the core of. Now all it needs is a capability system to statically ensure that the data in sent messages is safe without copying. (And to move those runtime checks into the type system.)

sriku · on March 17, 2018

This is pretty close to the "autorelease pool" concept in objective C - an idea which I copied in C++ for a product around 10yrs ago to good effect.

You wrap an auto release pool around every turn of the event loop, which is the deferred memory release mentioned in the article (I admit I only scanned it). Within the "react" part, this gives you much cleaner way to code even if your code involves raw pointers to objects, as long as they are allocated on the pool and aren't being transferred to another thread or caught in an RC loop - both of which we managed using custom smart pointers.

jasonhanley · on March 15, 2018

I feel like stuff like this is why golang and rust were created.

(edit: Forgot about Rust, sorry!)

ansible · on March 15, 2018

I feel like stuff like this is why golang was created.

Or more properly, why Rust was created.

I've grudgingly used C++ on some projects because of other constraints such as the target platform. Due to the compiler version, we're stuck on C++11, which is... OK. But keeping straight what we can use, and what we can't, and which kinds of pointers we should be using when is a considerable burden.

Still working through "Effective Modern C++" while learning the ins and outs of it in general.

pjmlp · on March 16, 2018

Well, they were mostly created, because the alternatives to C and C++ ended up loosing their market share, so current generations aren't usually aware of what came before.

Go is anything hardly new versus what Algol 68, Pascal or Oberon derivative would offer.

Likewise the best part of Rust is their work on how to make affine types from Cyclone, ATS and others into more developer friendly and productive language features, while following the traditional rules of other safe systems languages.

Since it is easier to introduce new languages than bring back old ones, here we are.

Animats · on March 15, 2018

"Now, we can extend our allocation model with a few additional guidelines, and as long as we’re following these rules/ guidelines, our C++ programs WILL become perfectly safe against memory corruptions."

What could possibly go wrong?

saagarjha · on March 15, 2018

If you're not being facetious, then really, not much. You really can't go wrong with smart pointers unless you explicitly try to access the memory it handles rather than going through its normal interface (e.g., not using get()). Shared pointers are basically reference counted just like many other language handle memory management.

kolpa · on March 15, 2018

> > as long as we’re following these rules/ guidelines

If that's an assumption, it's not worth much. A safe language is one that enforces the rules, not one that hopes the program authors self-enforce.

nobugs · on March 16, 2018

Of course. The point is that (IF there is enough interest in the idea) these rules are simple enough (in particular, they're inherently local, i.e. don't require analysis to go beyond one single function) to be enforced by a tool (say, built on top of Clang-tidy).

jcelerier · on March 16, 2018

By that measure Rust would be unsafe since it hopes that the authors don't just put everything in unsafe blocks.

dcookie · on March 15, 2018

I don't think anyone claimed the _language_ was safe.

Animats · on March 15, 2018

Title: "A Usable C++ Dialect That Is Safe Against Memory Corruption".

Any questions?

alacombe · on March 15, 2018

The author claims that the rules described, "extending" the standard C++, are enforcing memory corruption, and it is this author described subset that is still unsafe, not C++ in general. Think about english vs. americanized english, largely the same, but two distinct entities.

In C++, you are free to shoot yourself in the foot. In Rust, you have a Government inspector ensure that you always point the gun not just in a "safe" direction, but only toward a crosshair target at a designated gun range.

Kenji · on March 15, 2018

This is basically how I program C++. Except that I try to avoid the 'new' keyword too by std::make_unique and std::make_shared. This way there are literally zero 'new' and 'delete' or 'malloc' or 'free' calls in your program.

beached_whale · on March 15, 2018

I would try to go further and wrap them in a class to hide the heap usage and expose the valid uses in the interface. Then it reads like a value and walks like one too. Unless I need virtual inheritance I guess.

ape4 · on March 15, 2018

Likewise, I avoid -> if possible.

saagarjha · on March 15, 2018

Like, the operator ->? Why, and what do you use instead?

kolpa · on March 15, 2018

You could use ".get()." instead of "->", but I don't see how that helps.

fjsolwmv · on March 15, 2018

Correction: "(foo)." or smart pointer (foo.get())." instead of "foo->".

But I still don't see how that would help.

ZephyrP · on March 15, 2018

Well I guess you're safe against UAF! (har har)

jeffreyrogers · on March 15, 2018

There's a pretty good comment on this post from a shadow banned user named Kenji, which I'm reproducing below:

This is basically how I program C++. Except that I try to avoid the 'new' keyword too by std::make_unique and std::make_shared. This way there are literally zero 'new' and 'delete' or 'malloc' or 'free' calls in your program.

saagarjha · on March 15, 2018

I just vouched for that comment, so it should show up now.

jeffreyrogers · on March 15, 2018

Oh cool, I didn't know you could do that.

tuna · on March 15, 2018

[flagged]

sctb · on March 15, 2018

Please stop posting this and comment civilly and substantively instead.

https://news.ycombinator.com/newsguidelines.html