Rust has some very desirable properties to me. Writing Rust programs from scratch is not as scary as the internet made it sound, either. The documentation is excellent, the compiler's diagnostic messages are very helpful, and the notorious borrow checker didn't stand in my way that much. And I love Cargo and crates.io. I have some projects where Rust is the saner choice than Go or other GC-based languages.
That said, there are real drawbacks to Rust compared with Go, IMHO. When facing a moderately large project written by others, the ergonomics of diving into the codebase are not as smooth as Go's. There is no good full-source indexer like cscope/GNU Global/Guru for symbol navigation across multiple dependent projects. Full-text searching with grep/ack doesn't fill the gap well either, since many symbols in different scopes/paths are allowed to share the same identifier, and call sites need not spell out the full path. That makes troubleshooting/tracing a large, unfamiliar codebase quite daunting compared with Go.
Hmm, I've had a very nice experience using Rusty Code in VS Code. Some useful refactoring functionality is missing for sure, but a lot of that will become possible quite shortly via the RLS (Rust Language Server, a la how TypeScript works in VS Code), and if your preferred editor supports the language server spec (it's an open, common spec, not specific to Rust), it will get the same features at parity, too.
Can anybody make a strong case to me as to why buffer overflows are considered an issue in C, when it takes like 10 minutes to write and test an array implementation that prevents that from ever happening? I do agree that C has issues (though in my opinion neither Rust nor Go addresses almost any of them); I just don't understand why buffer overflows are such a huge problem in C when the same thing is going to come up when trying to work with memory in Rust.
> Can anybody make a strong case to me as to why buffer overflows are considered an issue in C, when it takes like 10 minutes to write and test an array implementation that prevents that from ever happening?
The CVE database. Just because you 'can' write such an array implementation doesn't mean you will, doesn't mean your third-party libs will, doesn't mean any of your legacy code uses it, and certainly doesn't mean you will test said array implementation correctly.
The number of mitigations added to C compilers and OSes dealing mostly with C and C++ code. ASLR, W^X, /GS, -fstack-protector-all, AddressSanitizer, ... - note the lack of similar tools, or demand for them, for, say, JavaScript - despite it enjoying a similar ubiquity.
I ask this in bad faith: I encourage you to share a single nontrivial codebase which actually creates the abstraction you've described and religiously adheres to using it throughout. As to why this is in bad faith: I'm defining "nontrivial" here to mean using 3rd party APIs - which will operate on C-style arrays, not your project-specific safe wrappers - and thus by definition won't be "religiously" sticking to said abstractions when using said APIs. By these definitions, the codebase I'm asking for doesn't exist - by definition. Even relaxing the "third party" rule, I haven't actually worked on a nontrivial C or C++ codebase without buffer overflow problems.
Now, e.g. Rust will have the same problems when interacting with C APIs - and nontrivial programs will end up doing so eventually. However, by virtue of the language itself embracing safe-by-default, you're less likely to run into the same problems when consuming Rust APIs.
You can also use third-party static analysis tools to ensure you're using a "safe C subset" (such as MISRA C), but "nobody" does that.
> I ask this in bad faith: I encourage you to share a single nontrivial codebase which actually creates the abstraction you've described and religiously adheres to using it throughout. As to why this is in bad faith: I'm defining "nontrivial" here to mean using 3rd party APIs - which will operate on C-style arrays, not your project-specific safe wrappers - and thus by definition won't be "religiously" sticking to said abstractions when using said APIs. By these definitions, the codebase I'm asking for doesn't exist - by definition. Even relaxing the "third party" rule, I haven't actually worked on a nontrivial C or C++ codebase without buffer overflow problems.
I work on a C codebase that does this, although in the slightly weaker sense that it does drop the abstractions at a few isolated interaction points with external APIs (think OpenSSL, Linux system calls, and not a whole lot else). Yes, there is quite a lot of NIH. With essentially-uniform use of checked data structures, and an extremely comprehensive suite of automated tests getting run under ASAN (originally Valgrind), memory safety errors almost never get so far as being committed to the main branch. This is a complex, >1M SLOC distributed system that has seen several years of production use at this point, and as far as I can recall we have not seen a single memory-safety-related issue in production (a few have managed to get as far as certification testing). General resource-leak-class issues have struck a few times, but are also pretty rare.
Proprietary, naturally, so I can't actually show you (sorry), but it absolutely can be done in practice. It isn't even really all that difficult, it just needs to be done from the start, and then you just need a bit of discipline to keep it up.
And more power to you. Note the beginning of the parent comment, however:
> Just because you 'can' write such an array implementation doesn't mean you will
So yes, even if the codebase you work on does have these 'mythical', hard-to-achieve properties, that doesn't mean that most or even many C codebases will.
Good engineering entails observing what problems actually occur and working to fix those. Memory safety issues do commonly occur in C codebases. Regardless of whether the fix in C is simple or even trivial, programmers aren't doing it. So, Rust has some value because it forces the programmer to produce code that is largely free from this type of issue.
Enforcing norms like 'be more disciplined when writing C' or 'stop using external libraries' is much harder than simply using a different language.
> I work on a C codebase that does this […]. Yes, there is quite a lot of NIH. With essentially-uniform use of checked data structures, and an extremely comprehensive suite of automated tests getting run under ASAN (originally Valgrind) […]. This is a complex, >1M SLOC distributed system that has seen several years of production use at this point […].
>
> […] it just needs to be done from the start, and then you just need a bit of discipline to keep it up.
Here's a neat idea: wouldn't it be cool and save a lot of time if the compiler did this for you automatically, from the start?
Of course, you say it's easy to do it manually, but something tells me your company might have paid less for development if the compiler had done it automatically, with no human intervention required.
> Here's a neat idea: wouldn't it be cool and save a lot of time if the compiler did this for you automatically, from the start?
Yes, but when the project started, the only existing language whose compilers met all our requirements was C (also C++, although that was not chosen, for reasons I disagree with). We are in a domain where we derive material benefits from the low-level control C gives us (we have a bunch of highly specialized memory management and I/O), and we are not willing to accept GC pauses. There's a common sentiment that we would have used Rust if it had existed when we started, but it didn't, so we didn't, and so it goes.
But Rust still won't warn about an out of bounds access (when accessing using a variable) at compile time, and your code will panic at runtime. This isn't the "safety" anyone ought to be expecting from a language billed incessantly as safe.
Rust, at least in this regard and probably others too, is no better than C, and for me it isn't enough to justify the horrible and complex syntax.
What do you expect a language to do if you index an array using a runtime-calculated value?
Rust checks at runtime and panics if your program exceeds the bounds. You can opt-in to asking if the bounds are exceeded and fail gracefully if you like, or if you want to promise the compiler you know for sure your bounds are tight, you can use unsafe blocks and act like C. Opt-in to danger.
C lets you do it with no checks. You have to opt in to the safe path of checking and failing. You don't automatically segfault if you exceed the bounds; instead you read arbitrary memory. Welcome to the land of undefined behavior. You may crash, but more likely, you will read some value from an unexpected place, and carry on executing incorrectly for who knows how long. Opt-in to safety.
That's what people mean by Rust is safe by default. And that's just bounds checks. Carry that notion over to pointers, references, threads, lifetimes, ...
What ever made you think "Safe Rust" meant "compile time checks of runtime values are possible" or "C is just as safe because it lets you index outside an array"?
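A minimal sketch of the three access styles described above - panicking index, checked `get`, and `unsafe` unchecked access:

```rust
fn main() {
    let v = vec![10, 20, 30];

    // Default: indexing is bounds-checked; v[5] would panic at runtime.
    assert_eq!(v[1], 20);

    // Opt-in graceful handling: get() returns an Option instead of panicking.
    assert_eq!(v.get(1), Some(&20));
    assert_eq!(v.get(5), None);

    // Opt-in danger: unchecked access, C-style; requires the unsafe keyword.
    let x = unsafe { *v.get_unchecked(2) };
    assert_eq!(x, 30);
}
```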
> What do you expect a language to do if you index an array using a runtime-calculated value?
Optimizers already try to prove index bounds to eliminate unnecessary checks, and static analysis tools try to demand necessary ones. Turning the latter into compile-time errors is a reasonable approach if your language can provide sufficient information to deal with false positives - likely by forcing you to add your own bounds checking to explicitly handle out-of-bounds cases.
A language John Carmack was using or researching at one point comes to mind, which had this kind of thing going on IIRC. I'm afraid I can't find it off hand, so I might recall in error.
Sure, but this can be the same in Rust or C, which was the context of this discussion. New languages may provide more tools to eliminate more checks, but in general, if you have a runtime value and you index an array with it, someone has to check somewhere.
And the point of this thread is that Rust's default (check index bounds and fail at runtime unless the check can be proven unnecessary by the compiler or optimizer) is safer than C's default (don't check anything by default and hit UB if the index is out of bounds).
>You don't automatically segfault if you exceed the bounds; instead you read arbitrary memory.
GCC and clang both have sanitizers either built in or available for them. Sure, it's not default, but let's not act like there is no choice in C but to account for every OOB access while programming or to read memory you don't want to.
Furthermore, I never said that compile time checks of variables are possible, but rather we could move to using dependent typing, or at least a way to judge whether a variable would work as a subscript based on the type of the array and variable.
The Rust designers didn't do that. Instead they put in a feature common in the two most popular C compilers to "panic" at runtime instead of accessing memory. That's nothing. It's rubbish. And if you know to "catch" the panic, why don't you check the value of what you're subscripting with? Saying you can catch the panic is missing the point of unintentional OOB accesses, which is that they're unintentional.
C is just as safe with regard to OOB accessing, and to be honest that's pretty poor in 2017.
If you are using gcc or clang, you have more options. True. But not all C compilers give you those options. However, the point is moot, since I never said you can't catch these things in C; I said it wasn't the default. Which you agree with.
> I never said that compile time checks of variables are possible, but rather we could move to using dependent typing
You didn't say anything about dependent typing. You said "Rust is no better than C". And I'm pointing out that it is. Dependent typing may be even better in some cases; I'm not arguing otherwise.
> Saying you can catch the panic is missing the point of unintentional OOB accesses, which is that they're unintentional.
No one said you should catch the panic. You can use Vec::get() for example if you are using runtime-derived indices and want bounds checking in an ergonomic fashion.
And saying a panic for unintentional OOB is the same as in C is not true, since you get a panic by default in Rust, and to get one in C, not only must you be using a specific compiler or two, you must have the sanitizers enabled for every source file in your program. Not "by default" by any stretch.
> C is just as safe with regard to OOB accessing, and to be honest that's pretty poor in 2017.
It is nowhere close, and saying it is is pretty poor in 2017 as well.
And you are still ignoring pointers, references, lifetimes, threads, ... you know, the other things that also help Rust make "Safe by default" and C "dangerous by default".
If you are not sure whether the index is in bounds or not, you could use `slice::get`, which returns an `Option<&T>`. Sure, it's a little more typing compared to `[i]`, but how would one solve this in any language?
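For example, handling a runtime-derived index that may be out of range with `get` looks like this (a small sketch; the index here is just a placeholder for user input):

```rust
fn main() {
    let data = [1, 2, 3];
    let i = 7; // pretend this index arrived at runtime, e.g. from user input

    // get() returns Option<&T>: no panic, you decide what out-of-bounds means.
    match data.get(i) {
        Some(x) => println!("data[{}] = {}", i, x),
        None => println!("index {} is out of bounds, falling back", i),
    }

    // Or collapse the Option to a default value:
    let value = data.get(i).copied().unwrap_or(0);
    assert_eq!(value, 0);
}
```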
C will actually read arbitrary memory, Rust won't, that's the difference.
We're talking about a situation where the length of the array is not statically known and you access it out of bounds at runtime. Rust checks first if it's out of bounds, and if it is, DOES NOT blindly read the memory anyway (as C would) but exits.
I don't understand how you could think the two situations are at all equivalent.
Panics can be a vector for denial-of-service attacks. OTOH, that's still better than remote code execution, and unwinding panics can be caught. Then again, I see that failing to catch them across an FFI boundary is UB, which could get back into RCE territory through sufficiently convoluted shenanigans...
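A sketch of catching an unwinding panic at a boundary with `std::panic::catch_unwind` (note the default panic hook still prints a message to stderr; this only prevents the unwind from crossing the boundary):

```rust
use std::panic;

fn main() {
    let v = vec![1, 2, 3];

    // catch_unwind turns an unwinding panic (here, an out-of-bounds index)
    // into an Err value instead of letting it propagate further.
    let result = panic::catch_unwind(|| v[10]);
    assert!(result.is_err());

    println!("caught the out-of-bounds panic instead of crashing");
}
```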
Would you care to point to or write a blog about the principles of this strategy? It sounds immensely useful to the many C developers out there as a potential set of best practices to adopt.
I was thinking about mentioning MISRAble C. Outside the infotainment system, you should never have a buffer overrun on any micro in your car. Part of that comes from a ban on the use of malloc(). There is still a risk of array bounds problems, but those seem easier to avoid and more likely to be caught in testing.
Why no malloc? It was originally due to the small memory sizes for code and data. The less standard library the better, and dynamic allocation may lead to heap fragmentation and a subsequent crash when malloc fails.
I note that Firefox is using some Rust code now - so perhaps that will change at some point, for at least one of the common JavaScript implementations, in the not too distant future. I don't imagine we'll see it for the majority within the decade - but who knows, maybe I'll be pleasantly surprised.
I have less hope for the widespread adoption of OS kernels written in safer languages - given the general unwillingness to even use C++ there (although plenty of toy/'research' kernels in safer languages do exist.) Although maybe we'll see one within the next century? Perhaps a microkernel for use in containers?
Of course, that still leaves bugs in the JITs, compilers, hardware, 'legacy' native interop, unsafe{} blocks, ...
> I note that Firefox is using some Rust code now - so perhaps that will change at some point, for at least one of the common JavaScript implementations, in the not too distant future. I don't imagine we'll see it for the majority within the decade - but who knows, maybe I'll be pleasantly surprised.
Maybe. I think there's currently no plan for it. Maybe after they finish Servo to the degree where it supports all modern HTML features.
Historical accident, given that the better alternatives lost, for various reasons, their market share to C and C++, while Sun and Microsoft had the dumb idea of not supporting AOT compilation to native code in their new languages from day one.
It's crazy that this isn't solved above the language level. If people really want zero-cost abstractions and architecture friendliness, tooling should at least check buffer logic and flag the binary in case warnings have been ignored.
It is by definition a "zero-cost abstraction." Let's ask Stroustrup, who coined the term:
> C++ implementations obey the zero-overhead principle: What you don’t use, you don’t pay for. And further: What you do use, you couldn’t hand code any better.
Two points:
What you don't use, you don't pay for: if you don't use array indexing, you won't get a bounds check. In addition, you can call an access method without a bounds check as well, so it truly is only if you use the checked version.
What you do use, you couldn't hand-code any better: that bounds check is written the exact same way you'd write it in C.
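That equivalence can be sketched directly: the bounds check behind `v[i]` is conceptually the same `if` you would write by hand (the real standard-library implementation differs in details, but not in cost):

```rust
// Conceptually what a checked index v[i] expands to: a branch, then an
// unchecked load. Hand-written C would contain the same branch.
fn checked_get(v: &[i32], i: usize) -> i32 {
    if i < v.len() {
        // Safe: we just verified the index is in bounds.
        unsafe { *v.get_unchecked(i) }
    } else {
        panic!("index out of bounds: the len is {} but the index is {}", v.len(), i);
    }
}

fn main() {
    let v = [1, 2, 3];
    assert_eq!(checked_get(&v, 2), 3);
    assert_eq!(checked_get(&v, 2), v[2]); // same result as built-in indexing
}
```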
This is actually a very helpful comment. I used to think "zero-cost" meant "at compile-time", as in `newtype` in Haskell, etc. I'm guessing that's what the parent commenter thought as well, and I'd guess is what most people think when they hear the phrase.
I think that's why Stroustrup says "zero overhead" instead of "zero cost". There are costs to many of these abstractions; some at compile time and some at run time. For me, "zero overhead" conveys this a little better.
Well, you can still sort of view it that way. You can imagine the bounds check being a "compile time generation of the C code you'd write to check the bounds anyway".
The difference is that it's not dealt with entirely in the compile phase. i.e. language features that are checked at compilation and known to be true that are not needed at runtime.
The Haskell `newtype` example I gave was meant to illustrate this, as newtype's are respected by the type system and then are treated as the underlying type at runtime.
This is a misrepresentation of his comment. By your interpretation, you could call GC zero-cost!
Most code doesn't use bounds checking, because the branch is a safety net you should never hit, even in theory. Any code that does hit it is already broken. Correct programs using bounds checked indexing will in general be slower than but equivalent to a program where indexing instead results in undefined behaviour.
Most GCs would violate the "What you don't use, you don't pay for". That is, they add runtime cost (and "the size of the runtime" size) to code, even code that doesn't allocate.
"You couldn't hand-code any better", well, I won't argue on that point, as it sounds contentious. ;)
_Should_ never hit is very different from _will_ never hit...
If you never allocate, there's nothing stopping the compiler from optimizing the GC out. Then you get your first property back, in the sense you originally gave.
My point is that Bjarne Stroustrup wasn't comparing against writing the exact same program the exact same way. He was comparing against what you'd get if you dropped down to ye olde C or Assembly and wrote the same algorithm there, without redundant work or waste.
The comparison shouldn't be the language's GC versus SteveGC, it should be the language's GC versus an ideal, manually implemented allocator. Equally it shouldn't be built-in bounds checked indexing versus manual bounds checked indexing, it should be built-in bounds checked indexing versus an ideal, manually implemented indexing scheme. If you want safety against out-of-bounds, it seems to me the ideal method would be a proof, not runtime overhead.
> there's nothing stopping the compiler from optimizing the GC out.
I don't know of a single language that comes with a GC that does this, do you?
> He was comparing against what you'd get if you dropped down to ye olde C or Assembly and wrote the same algorithm there, without redundant work or waste.
Right. I agree with this.
But basically, we are arguing over an extremely fine semantic, which is "should you even want bounds checks in the first place." If you don't, then don't use a method that has bounds checks. The one that does will have them. They'll both cost the exact same as writing it in C or assembly.
I'm not arguing about whether bounds checks are good, but about whether it's correct to call them zero-cost in the canonical sense. Doing so just devalues the term, and IMO feels like misdirection.
To put it another way, let's say I was a C++ developer on the fence about Rust. If I read this conversation, I'd see that indexing gets called "zero-cost" despite the overhead. Since tons of things in Rust are "zero-cost", like traits, closures, borrowing, etc., all of those things now have doubt cast on them. How can I really trust that these things are actually getting compiled efficiently?
If instead the conversation pointed out that this was one of a few cases where safety took priority over truly being zero cost, but that there were tools in place to mitigate the cost (iterators, unsafe indexing, LLVM), I'd have a much more positive outlook that focussed on what Rust did right.
I would suspect that a C++ programmer would be more familiar with that actual definition I mentioned originally, and so would understand the subtleties here.
That said, I can appreciate focusing on other things when talking about the principle; I only brought them up here because we were literally discussing them. I think there are much better examples when actually attempting to convince someone.
I think you have a good point. It makes more sense to think and talk about Rust's implementation of bounds-checked indexing syntax as a zero-cost abstraction over writing `if` statements everywhere. I.e., assuming you want bounds checking, you can't do better than what Rust does, despite having nice syntax, etc.
I think something like this actually happened with one of the first Scheme implementations, but that was because they wrote the GC in Scheme, so they had to do it that way for a very small subset of the system (i.e. the GC code). But yeah, other than that I can't think of any examples (and my example is kind of the exception that proves the rule).
Honestly, we need a few AI coders to replace most of the developers in the world and then this won't be an issue. Bounds checking arrays and calloc instead of malloc isn't rocket science. It's a simple formula.
The problem isn't the language it's the developers.
A change in language does not completely solve the problem. Heartbleed was caused by buffer re-use without zeroing in between uses. A high performance network application could very easily do the same in another language.
That's a different "problem", not to mention a flaw in the application code's design rather than an example of one of the language's building blocks being fundamentally able to allow the entire execution path to be subverted.
OK, so in C you can smash the stack and that's bad. True. But I think you fail to see the larger point I'm attempting to evoke: that array bounds are artificial in the first place, and this doesn't just surface in C.
The "Heartbleed in Rust" example is a great one, and it arises in real life in many high-level language APIs for file I/O and sockets. You have an allocation, and you have a count of available bytes coming back from a read() function which may be lower than the allocation size. So you are creating a "virtual" array bound from nothingness. Fail to respect it (without bounds checks) and you will see bugs.
If you reject that this is a valid way to write code, maybe in your API every read() style function will always return the correct size enforced by your JVM or whatever, but you will do too many allocations and over-tax the GC.
If you accept that this makes sense, then you must embrace a more C style way of thinking, where array bounds are created and destroyed at will and must be enforced through your own actions... And suddenly you see the other side of this coin, which reflects valid and true things about the universe, that you may want to chop up a buffer into multiple pieces - and that's OK.
(Now, I wouldn't be surprised if Rust has mechanisms to chop up arrays in the way I describe and enforce the bounds you provide it... Which would be handy. But frankly does not completely destroy the validity of the C approach or substitute for a proper understanding of it. Without that understanding, you will code more heartbleeds.)
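For what it's worth, Rust does have exactly this: a slice carries its length with it, so a "virtual" bound like the read() count becomes an enforced one, and chopping a buffer into pieces keeps each piece bounds-checked. A small sketch:

```rust
fn main() {
    let buf = [0u8; 16];

    // A read() might only fill the first n bytes of the allocation.
    let n = 10;
    let valid = &buf[..n]; // the "virtual" bound, now carried by the slice

    // Chop the valid region into pieces; each piece keeps its own length.
    let (header, body) = valid.split_at(4);
    assert_eq!(header.len(), 4);
    assert_eq!(body.len(), 6);

    // Access past a piece's bound fails instead of reading the rest of the
    // underlying allocation: body[8] would panic, body.get(8) is None.
    assert!(body.get(8).is_none());
}
```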
His point is that you're not forced to do that. And anyhow, that doesn't solve the issue since you can bungle the creation of the slice with the wrong offset or length.
Not bungle in the sense of overflowing the underlying buffer, but overflowing the logical buffer that is contained within it, i.e. getting the wrong slice.
My point is, a lot of people are spending time on this when it doesn't matter. In the limit where AI starts replacing human developers, these subtle differences between languages approach zero.
New languages here and there every day. Replace this replace that. When, in the end everyone is simply reinventing the "wheel" over-and-over.
This sounds a lot like "why clean up my room when the heat death of the universe is coming anyway?", but if you do think that AI is going to supplant all of programming then be the change you want to see in the world! Get building and we'll see which one happens first.
I honestly don't know who I'd put my money on between "AI takes over the world" and "programmers stop writing buffer overflows"
My guess is that if they make a general purpose programming AI then all other jobs will also be nonexistent besides being famous and doing YouTube reviews of movies. My thought is that the problems in the way of AI programming are more difficult and can be generalised to enough other jobs that programming is going to be the last job automated.
People are spending time on it because such a capable AI doesn't exist and language design problems do. If you changed either of those things, you'd be making a strong argument.
The lack of generics means your array implementation is going to either:
- be implemented with macros and token pasting, and result in a ton of mental overhead because you'll have a pile of types like array_foo for an array of `foo`s, and array_bar for an array of `bar`s, along with a pile of corresponding `foo * array_foo_get(array_foo, size_t)` and `bar * array_bar_get(array_bar, size_t)` functions.
- or, have a runtime cost and lose type safety by storing void* and casting when accessing.
The first case is even worse than it sounds: e.g. I don't know how you handle arrays of types with spaces in them (like `unsigned char`, or `struct bar`) with a macro. And, we haven't even thought about const correctness yet, which would probably require having const_array_foo, const_array_bar (etc.) types defined too.
(And, of course, these only solve one facet of the problems with C's pointers: there's no way to defend against use-after-free or dangling pointers.)
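For contrast, a language with generics needs only one such definition for all element types. In Rust the built-in slice already is that abstraction, but even a hand-rolled version is a single generic function, with no macros, token pasting, or void* casts (a sketch):

```rust
// One generic definition covers ints, strings, structs, etc., and const
// correctness falls out of &T vs &mut T rather than needing parallel types.
fn safe_get<T>(arr: &[T], i: usize) -> Option<&T> {
    if i < arr.len() { Some(&arr[i]) } else { None }
}

fn main() {
    let ints = [1, 2, 3];
    let strs = ["a", "b"];
    assert_eq!(safe_get(&ints, 1), Some(&2));
    assert_eq!(safe_get(&strs, 5), None);
}
```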
C macros-faking-generics really aren't that bad (your "unsigned char" case is really easy to solve - use another macro). It's a bit goofy having different types floating around like "array_of_int", "array_of_float", but you can create them in a single line when needed, and once you create them, they work, and efficiently.
They're not an ideal solution by any stretch, but it's not the nightmare scenario you envision wrt generic data structures in C.
I would be interested in the specifics of the spaces thing (knowing more about ones tools is good) but "10 minutes" to handle all the various issues and bugs that will inevitably pop up from wrangling macros?
Yeah 10 minutes is quite a bit off, at least for me. The various issues and bugs that arose all came about when developing the data structures themselves - they didn't crop up in actual usage. Though my approach was to use a separate header and source file for each new data type I parameterised them by. So I had an "int_array.c" and an "int_array.h". And inside the header would just be function declarations generated by the macro. I basically didn't like the idea of dumping in a whole implementation of a data structure and all its operations with a macro every single time I wanted to use it. YMMV but I found it worked well.
That approach of a single invocation was what I would personally assume would be done (or put it in the header that defines the data type, for a custom one), but I'm not sure how that solves spaces?
Having worked in a code base like this (prior to migration to C++11), I'd agree that they're not necessarily a "nightmare", but they lead to code that's so verbose that it tends to really obscure the actual logic of a piece of code. (Especially anything more complicated than a simple array, e.g. associative maps or multi-dimensional arrays, and iteration constructs can be hellish.)
I actually did this¹⋅² some time ago. Since then I've moved on from being a hardcore C developer to C++ and Ada³, but it wasn't necessarily a bad solution. I think, given how little C gives you to work with, it's the best you can do.
It does make debugging a chore. I ended up with a Makefile rule to run the test suite through the preprocessor (and did some hackery to exclude #include of system headers), format it with clang-format, and build that. Not exactly pretty or easy, but it got the job done.
No, you just allocate enough space to store an extra int at the start for the length, and return a typed pointer to the actual data. Then you need an accessor that checks bounds, if you want safe access. Both of these problems are solved by simple macros.
So you want the array to have type foo * ? Ignoring that this doesn't let the compiler help the programmer with arrays (you still have to manually remember to use the accessor, not []), you also have to manually remember which pointers are pointers and which are arrays, and this representation doesn't work for pointing into subsections of an array (a similar problem to C-style strings), nor does it work well for putting arrays on the stack, which means one is forced to allocate every array (both of which mean the safe C is likely slower than the equivalent in Rust or even C++).
I agree that having to remember is a problem, it's one of the many shortcomings of C that it doesn't let you differentiate between types at compile time.
Pointing into subsections works fine. You just have to create a type for it. This solution doesn't have the same problems as strings because you don't rely on a terminating entry, and it's what languages like Rust or Java do as well.
You can allocate dynamic arrays on the stack in C just fine with alloca(). The only performance cost is when checking bounds, but since it's a dynamic array, it's the same cost you'd pay in Rust.
Creating a type for it, for each type of array, will require exactly the macro array thing I was talking about. And see the sibling comment for how dynamic arrays/alloca isn't relevant, I'm just talking about static arrays. (Dynamic arrays on the stack do have a performance cost, as they get in the way of the compiler's optimiser/code generator: having non-fixed stack frames makes accessing locals annoying.)
I'm not even talking about variably sized arrays, just creating a statically sized one and passing it into functions that take dynamically-sized one. For instance, a read function that fills an existing buffer doesn't care if the buffer is on the heap or on the stack, it only cares that it doesn't overrun the bounds.
alloca-style variable arrays is a whole other can of worms of danger and complexity.
Arrays and pointers in C already have that int. That's why sizeof() works. The issue is an extra if statement on every single array and pointer access.
They don't, sizeof is a compile-time constant. On a pointer, sizeof() just reports the size of the pointer itself (i.e. 4 or 8 bytes on most modern platforms), not the size of the data to which it points (and sizeof(*pointer) reports the size of the type to which pointer points, it doesn't know anything about how many values of that type are stored). For an array, the length is known statically (i.e. it's in the type), and so the computation can be done at compile time.
While "unnecessary"/extra generated code is a trade-off one has to consider when choosing to use the specialize/monomorphise-everything implementation of parametric polymorphism (it isn't the only one), it isn't a problem in this case: the types themselves are a compile-time abstraction and don't exist at runtime, and the functions are all tiny (a branch, a memory access and a function call/abort).
Additionally, all the functions should be inlined anyway because the function call overhead will likely be as much or more than the actual code, and, more importantly, inlining enables other optimisations (removing the branch, vectorising the memory access, etc.). Once inlined, the code will be the same as the manual/macro-based approach of writing `if` statements around each array[index] access.
Because your 'safe' implementation will certainly have a performance cost, and won't be the default. This is why, despite C++ providing std::array, you'll still find buffer overflows in C++ code. C++'s std::array provides the safe 'at' function but you're opting into a performance penalty and it's not the more familiar [] syntax.
Rust arrays/vectors are safe by default. To use the unchecked, unsafe version requires using the 'unsafe' keyword.
    let v = vec![0, 1, 2];
    unsafe {
        let x = v.get_unchecked(5);
    }
This means you can basically grep audit for vulnerabilities, and the above code should be very rare.
In addition, the Rust compiler can also remove the built-in indexing checks if it can prove the code is safe. So, say, iterator loops over an array won't have any index checking.
As others have said, this is a fairly standard optimisation: compilers (Rust, C, C++, Java, PHP, whatever) will remove branches if they can see that it can never be taken. Iterators are slightly different in that they don't have any indexing checks at all: an array iterator yields a plain reference, and these are always dereferenced like plain pointers.
Iterator loops are different. They don't elide bounds checks automatically, as is frequently claimed. Instead they merge the check with the loop branch, so there's no redundant checking.
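A sketch of the two loop styles under discussion (function names are mine, for illustration):

```rust
// Indexed loop: each v[i] is a checked access, but since i < v.len() is
// provable from the loop bound, the compiler can typically fold the check
// into the loop branch rather than checking redundantly per element.
fn sum_indexed(v: &[u64]) -> u64 {
    let mut total = 0;
    for i in 0..v.len() {
        total += v[i];
    }
    total
}

// Iterator loop: elements are yielded as plain references, so there is
// no index and no per-element bounds check to begin with.
fn sum_iter(v: &[u64]) -> u64 {
    v.iter().sum()
}
```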
Neither C nor C++ knows, at the language level, the size of an array, unless that size is fixed. The subscript checking variants of C and C++ have to use "fat pointers" which carry along size information. The overhead for this is large and nobody uses that. Fat pointers used to be a feature you could turn on in gcc, but it's somewhat abandoned now.
> Neither C nor C++ knows, at the language level, the size of an array, unless that size is fixed.
Neither does Rust.
> The subscript checking variants of C and C++ have to use "fat pointers" which carry along size information.
So do Rust's Slices.
> The overhead for this is large and nobody uses that.
People use std::vector all the time for this purpose in C++. It has about the performance you'd expect, with very little overhead except where you want it in bounds-checking.
I don't think there's actually a performance difference here. Rust's default is safer because it requires dropping to unsafe code to do something dangerous, but the same optimizations are available in both.
One thing that I've heard might be a difference, but haven't confirmed yet: Rust's lack of move constructors. So you have a vector, it's full, you push one more. It has to reallocate. How do you copy all of the elements over to the new allocation? In Rust, it's a straight memcpy of T * n bytes. But due to move constructors in C++, IIRC they must be moved one at a time.
Again, I haven't actually dug into this; maybe someone more knowledgeable about this can point me in the right direction here?
This is correct. I suppose if you could get folks to mark all memcpy move ctors explicitly with a macro instead of relying on the default you could specialize std::vector's move with sfinae. Bit hacky. It already specializes for pod types though.
Lack of move and copy ctors in rust greatly simplifies things like this, and makes it very explicit when code is running, but the trade-off is that intrusive data structures are hard to do on the stack in rust.
Well, for trivially copyable types[1] the reallocation can be a straight memcpy. For the rest, I don't know that having or not having a move constructor is the important distinction; it will be preferred over the copy constructor if it is declared as not throwing exceptions, but either way some constructor of the object must be called if it exists (though it might be inlined and optimized away).
I imagine Rust does something similar, copying bytes if the underlying type has the `Copy` trait and calling some actual code if not, but I'm not familiar with the details.
No, all moves in rust are memcpy if not optimized out entirely. Rust has affine types so moves don't need to "invalidate" the source value at runtime, the compiler just doesn't allow you to use the source variable after a move.
Rust's answer to copy ctors is Clone, which is always explicitly called. Variable use in rust is a move. Trivially copyable types (Copy) will be copied without invalidating the old value at compile time.
> copying bytes if the underlying type has the `Copy` trait and calling some actual code if not,
It does not. Moves and copies are both "memcopy these bytes", the only difference is if you can use the previous copy or not. (This is also, of course, subject to the optimizer, which may elide the copy.)
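A minimal sketch of that distinction: both a move and a Copy are plain byte copies; the only difference is whether the compiler still lets you touch the source afterwards.

```rust
fn main() {
    let s = String::from("hello");
    let t = s; // move: the String's (ptr, len, cap) bytes are copied
    // println!("{}", s); // compile error: `s` was moved; the
    // "invalidation" is purely static, there is no runtime flag
    assert_eq!(t, "hello");

    let a = 5u32;
    let b = a; // u32 is Copy: the same byte copy, but `a` remains usable
    assert_eq!(a + b, 10);

    let u = t.clone(); // deep copies are explicit via Clone, never implicit
    assert_eq!(u, t);
}
```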
> either way some constructor of the object must be called if it exists (though it might be inlined and optimized away).
Yeah, this is what I was getting at; this has to happen in C++, but not in Rust. You are right to point out that this only matters for things that aren't trivially copyable.
Interesting. How does Rust handle types that want to do interesting things when copied, like bookkeeping or updating internal pointers? Maybe you just can't, which would preclude some kinds of intrusive data structures, or owning non-copyable mutexes for example. Does `drop()` get called on objects that have been copied from?
You pretty much just can't; that's what Manish was referencing above about this kind of thing being awkward.
It would be cool to have those things, but it also means that there's less "magic" stuff going on, which is nice. And it makes the semantics of stuff like this a lot simpler.
> Does `drop()` get called on objects that have been copied from?
Nope. In fact, Copy types can't have a Drop at all, but types that move don't have their Drop impl called when they move.
Thanks for all the responses, I'm learning a lot. This explains to me why iterators and other references to the internals of a data type have to take ownership of the whole data type, which is something I ran into several times during my (brief) explorations with Rust.
Totally. There's one other subtlety you might find interesting here, and that's self-referencing structs. So for example,
    struct Foo {
        s1: String,
        s2: &str,
    }
where s2 is always intended to point at s1's backing storage. What's unfortunate here is that Rust will disallow this, as it doesn't understand that s2 is pointing to some data on the heap, with a stable address, not the parts of the String struct in s1 that are part of the struct itself. So what this means is that, in plain Rust, this type isn't movable; Rust is concerned about the invalidation.
However, you can get around this restriction with some unsafe code to teach Rust about it; this is the premise of the "owning-ref" crate.
This isn't the case. They only need to borrow it. A borrowed value isn't allowed to move (the borrow itself can be copied around and shared within the scope of the borrow), so that works out.
Most Rust iterators only borrow the container or iterator they operate on. It's only explicitly moving iterators like .into_iter() (which extracts elements by-move) that don't.
This is everyone's favorite excuse to trot out. But in reality the vast majority of projects I've seen never actually measure WHAT the performance cost would be and whether or not it's acceptable. So in the usual case the answer tends to be a mix of laziness and "that's how it's always been done."
How do you suppose runtime bounds checks are done in Rust? They certainly also incur a performance penalty in not-trivial cases.
Also, "safe by grep audit" means "safe according to a human." The argument of course is that it lowers the surface area of what a human must be trusted to verify. I'm still not convinced by that argument, because human error is a thing. And for actual systems programming, "very rare" may not be true.
> How do you suppose runtime bounds checks are done in Rust? They certainly also incur a performance penalty in not-trivial cases.
Certainly. I didn't intend to imply otherwise.
> Also, "safe by grep audit" means "safe according to a human."
Again, totally correct here.
> The argument of course is that it lowers the surface area of what a human must be trusted to verify. I'm still not convinced by that argument, because human error is a thing. And for actual systems programming, "very rare" may not be true.
Well, given a codebase where both safe and unsafe code exists, the amount of unsafe code is strictly less than the amount of both safe and unsafe code. So it does reduce the amount of code needed to audit, even in a very atypical case where a ton of the code is unsafe.
It's true that a project may use egregious amounts of unsafe. That would be unfortunate. Rust is still safer than C in that case, since it just defines more behavior (like arithmetic overflow), but I certainly wouldn't pretend that the rust code should be trusted.
When writing rust one should certainly strive to write less unsafe code, and to always document the invariants required for unsafe code to be safe.
Rust is not 100% safe 100% of the time, I'm only arguing that safe defaults are critical, and that grep auditing is a powerful tool.
I wrote a little tool to help check Rust crates on GitHub. It's been really interesting seeing how different libs use unsafety. https://github.com/alexkehayias/harbor
Rust is safe by default in terms of memory usage. It is not, however, strictly memory safe. It is trivial to overflow a buffer in Rust, for example. I haven't discovered a trivial way to hide it, though.
If you're relying on any random third-party Rust crates you haven't audited yourself, don't you lose the safety guarantee? A given crate might turn out to have implemented operations on some data-structure using unsafe blocks, and then to have failed to mark its own API functions as unsafe in turn (like the Rust stdlib does, but without the "extensive manual auditing" that the stdlib gets).
AFAIK, cargo doesn't have any feature to point out when a crate contains unsafe code—so you pretty much need to grep the source of every crate you consume for "unsafe".
There's a lot more unsafe code in Rust crates than there should be. That's a fixable problem. Some stuff from the early days predates the optimizer getting smart enough that unsafe code isn't needed. I wrote on this a few days ago in a Rust topic.
While I now mostly agree with you that there is more unsafe code than there should be, I still maintain that the frequency of unsafe in a deptree is usually still small enough to be practically auditable, ignoring FFI. It could/should be much less, but it's not too bad. I've done such audits a few times and it's not been too hard and taken very little time.
Auditing FFI is a whole other challenge, however :(
Well, yeah, but you don't really download Rust binary libraries yet :)
You do have C libraries which you access through FFI. This is inevitably unsafe. We should be auditing more there. Though IMO it's still manageable, for most crates.
Right, I was just trying to put a finer point on your use of "using any unsafe" here. It sounds like you mean "using" in the lexical sense (writing the token "unsafe" in your code), but you mean it in the dynamic sense (having an unsafe block in your control flow graph.)
Let me clarify: in Rust it is trivial to use memory unsafely. It is not, so far as I have found, trivial to hide that fact, because it is required to use "unsafe" syntax decoration to do so.
The issue here is that that definition of "safe" language basically excludes all practical languages, including languages like Python, because FFI is possible.
In general when talking about safety in a language it's about the level of explicitness required to trigger unsafety.
I like the distinction made in the nomicon (https://doc.rust-lang.org/stable/nomicon/meet-safe-and-unsaf...) -- Rust comprises two distinct languages. You have everyday Rust, which is completely memory safe, and "unsafe Rust", which looks similar to everyday Rust but is not safe. `unsafe {}` blocks are your FFI between the two. Looking at unsafe blocks as FFI is IMO a very useful mental model especially for understanding the changes to invariants involved.
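A small sketch of that mental model: a safe public function whose body crosses the "FFI boundary" into unsafe Rust, with the invariant documented at the border (the function name is mine, not a std API):

```rust
// Callers of this function never see the unsafe block; the safe/unsafe
// boundary is entirely contained within it.
fn first_unchecked_demo(v: &[u8]) -> Option<u8> {
    if v.is_empty() {
        None
    } else {
        // SAFETY: the emptiness check above guarantees index 0 is in bounds.
        Some(unsafe { *v.get_unchecked(0) })
    }
}
```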
"Actual systems programming" mostly does not involve unsafe code.
For example, most OS code is _not_ interrupt hooks or malloc but the rest of the OS. Most of Postgres is not reading data quickly, but higher-level abstractions.
Large-scale systems programming will always be mostly higher-level abstractions, because that's the only way to write large programs. Name any "systems programming" OSS project, choose a random C file, and you'll see that most code does not require pointer arithmetic except because that's how you do things in C.
You don't bit-twiddle for 200k lines of code, so being able to limit dangerous stuff like that to the 5k lines that actually need it makes work an order of magnitude easier.
> "Actual systems programming" mostly does not involve unsafe code.
True. Or slightly rephrased: Most of the unsafe code is centralized to some core pieces.
From the POV of a PG dev: The big problem using something like Rust for something like PostgreSQL is its portability, stability and uncertainty about where things are going. We do five years of back-branch releases (and for many that's not even enough!). Language and tooling around the new crop of languages simply aren't mature enough for that yet.
To be fair, the [] syntax is an overridable operator and you could just point it to use the "at" method. Not sure why they've implemented it as an unsafe function.
If people start to consider bounds-checking branches as overhead (which, in some extremely rare, limited cases, they are right to do), they should also understand what happens with e.g. some largely used calling conventions, such as the one of Windows. Even in optimised builds, a unique_ptr<> for example can have an overhead compared to a raw pointer, IIRC because it forces going through the stack. If I remember my typical latencies correctly, for modern Intel CPUs and probably a lot of high-perf CPUs, that can actually in some cases have a greater impact than an extra bounds check...
So if you are in a so performance critical section that you start to care about the "zero overhead" kind of stuff and the cost of your (predicted) bound checks, you might be impacted by this non-zero overhead... (that is falsely widely believed to be zero!)
And don't get me started about debug builds with all mainstream compilers. The performance is then complete utter shit. This is made worse by the fact that such an unsafe language needs debug builds more...
C++ was an interesting experiment in its domain, and has been and still is a success in some aspects, but honestly, given that core language modifications are needed to significantly extend the behaviour of core types (like vectors, unique_ptr, strings, etc. -- e.g. with the introduction of rvalue references and all the associated machinery, default constructors, and so on), I'm starting to believe there is little advantage of this approach over integrating such fundamental types/concepts in the core language (I'm not advocating doing that for C++, I'm thinking about other/future languages). Then you can have more truly "zero overhead" stuff, whatever that means.
> if you are in a so performance critical section that you start to care about the "zero overhead" kind of stuff
If I don't have this kind of performance needs, I have no reason to use C++ to begin with. I mean, why use such a monster if I don't need the crazy performance it may offer when used properly? And even then, C, D, or Rust may be viable alternatives.
If a call is statically dispatched, calling convention doesn't matter because static dispatch gets inlined away in hot functions. If a call is dynamically dispatched, you have much bigger things to worry about. Bounds checking is a much bigger problem for hot code.
Not 100% true when using the standard library and compilers like VC++, which do provide such checks in debug builds, while allowing you to selectively turn them on in release builds.
Would you care to provide a short rust implementation of reading an arbitrary length string from standard input?
I have rust installed. I would be curious to benchmark it and see if it is indeed faster than the same thing in C, using a user-defined bounds-checked array.
I'm not going to make any performance claims, but in the interest of sample code, https://doc.rust-lang.org/stable/std/io/struct.Stdin.html#me... has a very small program that does this in the most straightforward way. (You'll have to wrap it in `fn main()`, we don't show that in examples in the docs)
If you want a true speed comparison, you'd have to define more than just "read an arbitrary string". For example, encoding requirements. Your C is going to treat it as just a bag of bytes, and you can do that in Rust too, but it's not the most natural way; you'd convert to UTF-8, which isn't free. Stuff like that.
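In that spirit, a minimal sketch of the straightforward version (the helper name is mine; it treats the input as UTF-8 text): the `String` grows as needed, so there is no fixed-size buffer to overflow.

```rust
use std::io::{self, BufRead};

// Read one line of arbitrary length from any buffered reader; the String
// reallocates as needed, so no fixed-size buffer can be overrun.
fn read_arbitrary_line<R: BufRead>(reader: &mut R) -> io::Result<String> {
    let mut line = String::new();
    reader.read_line(&mut line)?;
    Ok(line)
}

fn main() {
    let stdin = io::stdin();
    match read_arbitrary_line(&mut stdin.lock()) {
        Ok(line) => println!("{}", line.trim_end()),
        Err(e) => eprintln!("read error: {}", e),
    }
}
```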
It's not so much a question of benchmarks, it's that one of the standard C tools for reading an arbitrary string from standard input is gets(). And if you reach for that from the standard toolbox, you've failed before you've started.
gets() isn't part of the C standard anymore (it was entirely removed in C11, and prior to that was marked "obsolescent and deprecated" 18 years ago in C99).
Many rust advocates talk about the performance cost of doing things safely in C, and inform me that rust has "Zero-cost abstractions". So it's fairly natural I ask for something I can benchmark.
> Many rust advocates talk about the performance cost of doing things safely in C
Who's saying this? It doesn't really make sense, taken out of context as it is here.
> inform me that rust has "Zero-cost abstractions"
IMO at best the array bounds checking thing is an extremely poor example of a "zero-cost abstraction"; at worst it's not an example of it at all. As I understand it, the term refers to things like static dispatch on closures and trait methods, iterator fusion, and generally any place where the compiler can transform high level abstractions into really efficient low-level code where in other languages/implementations you might incur a performance penalty, for instance by dynamic dispatch to heap-allocated closures, or allocation of an intermediate vector at every "link" in an iterator method chain.
If Rust solves the kinds of problems you have, then by all means use it. I don't think there exists a language that's in all aspects better than any other. For me personally, C is the language that gets in my way the least without forcing me to write ASM, and that happens to be highest on my list. If there was a language with the same idea of minimalism behind it as C, but with all the quirks and bad syntax choices out, I'd switch in a heartbeat.
Rust (which I have thus far only tinkered with) looks pretty appealing to me, though. Do you suggest using C instead of Rust? If so, why?
I'd probably suggest a higher level language if you can get away with it (portability, performance, etc), but I am not really advocating for using one language over another here.. What I am advocating is writing a thin layer of abstraction to avoid buffer overflows in the already extant code base. I suspect it would be a better choice in terms of cost/benefit than re-writing the code.
I just keep hearing about how rust handles safe bounds checked arrays with (and I quote) "Zero-cost abstractions", which to me implied it would be faster than C, since C pays some performance penalty by branching.
"Zero-cost abstraction" really means zero additional cost beyond what is required. So Rust still pays a cost for bounds checking, but it's the same cost as if you optimally hand-coded the bounds check in C.
Adding the cost of bounds checking is not actually "zero cost" relative to a C default of not bounds checking. There's really no reason to speak in this misleading way except marketing.
Of course, bounds checking is worth paying a cost! As are many other common foot-guns like remembering to free memory.
Which starts to raise the issue of when it would actually be necessary to reach for Rust instead of (arbitrary example) Java. Because a language with enforced bounds checking is not really the same kind of thing as C, and we've already had languages that are safer than C for decades.
"Zero cost abstraction" means that the abstraction doesn't impose an additional cost. It is the same cost of branching that C will have, modulo optimizations.
(Languages with dependent types can help move bounds checks to compile time, but Rust does not have those)
Thanks, I do. The only thing left is to explain it to everybody else, to avoid panicking in third-party crates. I really hate when people use panicking just because it's easier than error handling. So it was a "surprise" to see such behavior from the "[x]" construction.
Like in C, you are not supposed to pass out-of-bounds indexes to the [] operator. If anyone does so, it's a bug. Rust converts it to a panic, which safely terminates the thread/program, instead of C's undefined behavior.
You can also catch this panic at the thread boundary, so other threads in your program can keep running, unless the program was compiled to call abort() on panic.
And then people would simply call .unwrap() on the return of the [] operator, leading to the same situation. It's extremely common for an array dereference to always be within the bounds, by code construction (I'm iterating over the indexes of the array, or I have an index into the array saved inside some other structure, and so on). The programmer knows it will never be out of bounds. The assert!() within the [] operator is only to protect you when the programmer gets it wrong, and in a correctly written program, will never trigger.
(The alternative instead of .unwrap() would be to propagate the error, polluting the whole program with code to handle errors which never will happen. And since they will never happen, many programmers would simply start ignoring them - and in the process, they would by accident end up ignoring errors that can happen. Not a good situation.)
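For reference, the fallible alternative the parent describes looks like this: `get` returns an `Option`, so the caller decides what out-of-bounds means.

```rust
fn main() {
    let v = vec![10, 20, 30];
    // Fallible access: no panic, the caller handles the None case.
    match v.get(5) {
        Some(x) => println!("{}", x),
        None => println!("out of bounds, handled without panicking"),
    }
    // v[5] would panic; v.get(5).unwrap() panics identically, which is
    // exactly the pattern being warned about above.
}
```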
There is a HUGE difference between "unwrap" (which can be found by grep) and silent dangerous behavior. A programmer can only assume, not know, without additional checks. "Polluting with errors" is much better than panicking.
Memory safety aside, what is different about panicking? I'm not trying to argue; I don't know. For me the result is the same: the process failed. What is the difference in consequences for memory?
Not all violations of memory safety result in immediate program termination. There's an intervening period of execution during which your program simply performs unintended operations such as overwriting unrelated memory (which could be a memory-mapped file), terminating "correctly" (and performing whatever operations that entails) with corrupt state, and generally an unbounded set of other potential consequences. An immediate segfault is the absolute best hope for a program without memory safety, since you both see no unintended effects performed on the program's behalf and you learn that your program needs fixing.
There are tools you can use with C compilers that don't access OOB memory on OOB access. What's your point?
I think I'd rather not have it panic at all. They were making a language from scratch and still didn't solve the crash-at-runtime problem that C has. Instead they just made a common C compiler option default and called it 'safe'.
It's worse than the fact that the Go creators ignored years of research after the year 1970 and didn't implement generics, for no good reason.
Rust's safety is a joke, and so is the idea that a lot of it is anything 'new'. They had the chance to fix it and they didn't. Boo.
Which common C compiler has a production quality implementation of array bounds checking?
Before you say Address Sanitizer you should be aware that its authors intended it as a debugging tool and recommend against using it in production as it introduces new attack surface, as described in the "Address Sanitizer local root" mail.
Something crashing, so an error can be reported and a watchdog can restart it (etc.) is far better than just forging on with corrupt memory and wrong answers.
People do not use the sanitisers nearly as much as they should, nor are they designed to be used in production. There have been CVEs issued for them, caused by the testing mindset in which they are written.
In any case, how much a programming language helps the programmer get a correct program (i.e. the most general form of safety) is a matter of degree, not all or nothing. Rust does a lot more than C, pushing many, many errors to compile time, meaning they're fixed before the code even runs. Sanitisers only catch these errors when they actually happen, and will presumably result in the runtime crash you're so concerned about, if used in production. The fact that Rust isn't dependently typed and so can't catch OOB at compile time is unfortunate in this respect, but don't throw the baby out with the bath water.
> A panic isn't safe, and if you are going to claim that it is, you can get the same safety in C using gcc and clang features.
There's one important difference. As far as I know, the bounds checking of gcc or clang can either print a warning and continue, or terminate the process.
A panic in Rust, however, will safely unwind the stack (similar to a C++ exception, in fact it's the same mechanism) and terminate only the thread. The rest of the program can continue running, and even start a new thread to replace the terminated one.
You can see this in action when running "cargo test". A panic in a test (the assert!() and assert_eq!() macros, often used in tests, do a panic in case of failure) will not terminate the whole process; the rest of the tests still run.
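A small sketch of that per-thread isolation (assuming the default unwind-on-panic setting, not abort-on-panic):

```rust
use std::thread;

fn main() {
    let handle = thread::spawn(|| {
        let v = vec![1, 2, 3];
        v[10] // out of bounds: this thread panics and unwinds
    });
    // The parent observes the panic as an Err from join() and keeps running.
    assert!(handle.join().is_err());
    println!("main thread continues after the child's panic");
}
```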
(panics are for "non-recoverable" errors, and so don't _have_ to unwind the stack: you can abort on panic as well. The important difference is that you get well-defined behavior in all cases.)
This sounds like the "fail fast" philosophy in Erlang. Crashing is the standard behavior for any unexpected data, but the process model makes it straightforward to manage those crashes.
> i just don't understand why are buffer overflows such a huge problem in C when the same thing is going to come up when trying to work with memory in Rust.
False. Buffer overflows in C can overwrite the program's memory, so it can be hijacked and supplanted with the attacker's code. This cannot happen in Rust (unless unsafe code has the vulnerability), or any memory safe language.
Sure you can implement a safe array/buffer abstraction and use it in your C programs that abort on invalid indexing. Now how many actually do this? Very few given the prevalence of C programs on vulnerability disclosure lists.
It doesn't help as much as you'd think. With a buffer overflow on the stack, you can overwrite the stack's return address to point wherever you want, and overwrite a bit further to set the arguments to whatever you want, as well as any local variables on the stack. That depends, obviously, on what code is already there, but there's a lot you can do. For instance, if the system() libc function is linked, you can overwrite the return address and arguments to call system("some arbitrary shell code").
You don't need to execute the stack. Just use ROP to fill the stack with return addresses that point to "gadgets". Gadgets are basically a single assembly instruction at the end of a function like XOR EAX,EAX and then a RET. Every time a RET is executed the CPU will jump to the next gadget. There are usually enough gadgets inside a program to basically do anything you want.
DEP only makes attacks harder, not impossible. It's not a magic bullet for a couple reasons: it's opt-out on non-*BSD operating systems so software might not be using it, and it can be circumvented with things like return-to-libc attacks.
I would like to add that there are starting to be mitigations against attacks modifying control flow, for example CFguard under Windows (which will probably be extended to cover ROP also) but like DEP an ASLR those are only mitigations, not something that would make it useless to care about buffer overflow anymore.
In order for DEP to render such attacks impossible it would need to have permission bits at the byte level. DEP on most OSes is page-level, so the stack page is marked as non-executable, which turns an attack from executing code on the stack to executing code from the text segment - either through ROP gadgets or return-to-libc.
I think he means working with raw memory, i.e. using the unsafe keyword, and in that case he is right. And you can't implement certain things in Rust, if you are on the quest for maximum efficiency, without using unsafe.
> And you can't implement certain things in Rust if you are on the quest for maximum efficiency without using unsafe.
Trading efficiency for safety for unaudited code should be difficult. C does not make this difficult. Rust makes this difficult.
Auditing should be supported for this kind of code. C does not support auditing in any meaningful way. Auditing is much easier in Rust, since it explicitly identifies code that may be unsafe.
In conclusion, I disagree emphatically that C and Rust are somehow equivalent even when you are dropping down to unsafe code.
Working with `unsafe` code in Rust has the same potential issue to be sure, but very little (if any) of your Rust code should be `unsafe`. You can usually accomplish what you want without having to drop down into raw C pointers. And everywhere that you do have to do this is explicitly called out with the `unsafe` keyword so it's very easy to audit.
This is throwing the baby out with the bathwater though.
You need unsafe to do some things in Rust, sure. But usually this code is isolated and auditable, and nowhere near the linecount of the rest of the application. Even the operating systems written in Rust have pretty conservative use of unsafe (redox, Phil's OS, etc). Most of your code won't be working with raw memory operations, most of it will work with zero-cost abstractions on top of raw memory.
> You need unsafe to do some things in Rust, sure.
And that's what I wrote.
> Most of your code won't be working with raw memory operations, most of it will work with zero-cost abstractions on top of raw memory.
For some web applications, sure, but industry is not constrained only to writing web applications, especially if you want to get into C's market space. In HPC or time/space-constrained environments this matters.
I'm not talking about web applications. I'm talking about systems programming. Even the operating systems written in Rust use unsafe pretty conservatively. Servo uses unsafe mostly to talk to native libraries. Rust is used more for lower level programming than it is for webapps as far as I can tell.
You might want to define further what you mean by "raw memory here". Rust lets you work with arrays and vectors and the heap safely just fine. These are designed as zero-cost abstractions over raw memory (slices, Vec<T>, Box<T>). The equivalent C (with relevant bounds checks) wouldn't be any faster. Rust does not let you do things like call out directly to malloc/free safely. But that's okay. The existence of these abstractions means that you rarely have to do this.
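To make "zero-cost abstraction over raw memory" concrete, a small sketch:

```rust
fn main() {
    // Heap allocation without touching malloc/free directly.
    let boxed: Box<[u64]> = vec![1, 2, 3, 4].into_boxed_slice();
    // A borrowed, bounds-checked view into the same memory.
    let slice: &[u64] = &boxed;
    assert_eq!(slice.len(), 4);
    assert_eq!(slice[2], 3);
    // The allocation is freed automatically when `boxed` goes out of scope.
}
```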
> And that's what I wrote.
.... huh? that's exactly why I put "sure" there, that means I agree with that statement, what followed was why I disagree with the conclusions.
To add to this, I wrote a high-performance packet generator in 100% safe Rust code for a past internship. With the right backend, it could generate line-rate traffic for a 10 Gbit/s NIC.
You can't implement certain things in C if you are on the quest for maximum efficiency. Thankfully, one rarely is, because efficiency isn't binary. It's about trade-offs.
I could say exactly the same for Rust: that even the unsafe keyword is not enough and I need to drop into assembly. Someone could even say assembly is not enough and we need an FPGA, and then an ASIC, etc. But that's not the point here. The point is that, for example, you can't get safety from out-of-bounds access without bounds checking, no matter which language you use. And bounds checking on certain hot paths is not acceptable.
> And bounds checking on certain hot paths is not acceptable.
I think the history of exploits proves that it's always acceptable for server code. The only possibly justifiable place to omit bounds checking is isolated, high performance numerical code, as used in scientific simulations. But this code isn't exposed to the public, so its vulnerabilities aren't important.
It won't work in all cases :) IIRC Idris will move type info to runtime if it can't prove it at compile time.
Basically, in Idris I can have a function concat which takes a Vector<N, T>, a Vector<M,T>, and produces a Vector<N+M, T>. You can have arbitrary expressions there, and even things like a function which returns a different type based on its boolean argument.
So most signatures will just carry dependencies through, but some things are dependent on runtime input and that's when the "N" part of the type will be tracked at runtime. At least, that's how I think it works.
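As a rough point of comparison (this is const generics, not the dependent types being described): Rust can also carry array lengths in types, so a function like the one below needs no runtime length bookkeeping at all. The `N + M` result length from the Idris example is not expressible on stable Rust, so this sketch fixes concrete sizes instead.

```rust
// Lengths live in the type, so the compiler statically knows the output
// has exactly 5 elements -- no runtime length checks needed here.
fn concat2_3(a: [i32; 2], b: [i32; 3]) -> [i32; 5] {
    let mut out = [0; 5];
    out[..2].copy_from_slice(&a);
    out[2..].copy_from_slice(&b);
    out
}

fn main() {
    assert_eq!(concat2_3([1, 2], [3, 4, 5]), [1, 2, 3, 4, 5]);
}
```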
Dependent types help eliminate large swaths of bounds checks. Not all of them, and I'm not sure how much better it does than a good optimizing compiler.
> I'm not sure how much better it does than a good optimizing compiler.
It can be done by static analysis in compilers for other languages too. Knowing the size at compile time is easy most of the time, and that's not the case we're talking about at all. Your answer that Idris solves this is wrong in the context in which this discussion started (dynamically allocated memory, where the size isn't known at compile time). You still trade time/space compared with a solution without bounds checking, no matter which language you use.
I'm getting a notice that I'm submitting too fast, so the answer to your other comment will have to wait.
Yes, it can be done by static analysis, but by making size dependencies part of the API it makes the static analysis much easier and able to cross API boundaries without needing to inline, on the contrary most optimizers will not handle such invariants across a few layers of function calls. Like I said, Idris doesn't solve this problem completely, but it is able to eliminate bounds checks in way more situations because the default is effectively no bounds checking.
The nice thing about dependent types is that in all the cases where you don't need a bounds check -- where a program invariant enforces that you are within bounds -- your code will not contain a bounds check regardless of optimization. Only in the cases where there are runtime dependencies that can't be resolved will there be issues, and in these cases you need a bounds check anyway.
It's not perfect/complete, and like I said optimizers can get most of the way there anyway. I just wanted to note that there are better solutions for bounds checking.
Obviously, one can program C to do anything, and write all the provably safe abstractions wished. But, that's not really the point. The point is that doing such is not the default. It requires engagement and knowledge of the programmer, especially on distributed projects with loose communication, such as many open source projects. And it only takes one programmer mistake to bring the whole house of cards down.
Why allow programmers to make mistakes? That was fine in the 70's when resources for compiler execution were limited. I don't see any reason for it today.
I mean, just look at the underhanded C contestants and especially winners for ways in which your program can completely blow up for extremely subtle reasons.
> Isn't this also true of Rust, with its unsafe keyword?
> None of these languages are completely safe
No, but the first time I encountered an unexplained segfault in Rust was the first time I've ever found & fixed the cause of a weird segfault message in just a few minutes, because the mistake was in the 12 lines I had written inside a clearly marked unsafe block rather than the other ~3000.
Without something like unsafe blocks, finding known bugs and auditing code for other "weird" memory issues means looking much wider (and often, screaming "what the hell are you talking about?!" at the screen on a routine basis).
You are equating the unsafe keyword in Rust with entire C codebase.
Yes, carelessness in either could amount to trouble. No, they are not close to the same amount of risk because the relative amount of code in each is orders of magnitude different, and unsafe blocks can be heavily reviewed.
This is true of every language: e.g. Python has ctypes, Haskell has Foreign, Java has JNI, etc. It's a matter of defaults/conventions rather than something being literally impossible to screw up.
Totally. But in the case of rust your vulnerabilities are grep'able. For all of the code you have in a project you only have to search for the unsafe keyword when you want to audit it.
That's less of a mistake, and more of turning off the footgun's safety. Yes, it still only takes one programmer. But they have to purposefully enable such actions, as opposed to neglect to perform actions through the "proper" abstractions.
Yes. It's trivial in Rust to write unsafe code. It's less trivial to mask the unsafe code and the (unproven, in my opinion) argument is that the explicit "unsafe" keyword makes it prohibitively difficult.
Is the use of the unsafe keyword a default? I don't know Rust, but from a user interface perspective, it sounds like it has an affordance of "Hey! Pay particular attention to this bit because it is risky!"
Oh, yes, as a UI it is "be extremely wary of this code" Often folks write long comments around unsafe code explaining why it is safe. Not always, sadly.
For a philosophical counterpoint: Why allow anyone to do anything that might possibly be incorrect, harmful, or otherwise perceived by some to be negative?
I've looked at a lot of the talk surrounding "safe/secure languages", "safe/secure programming", etc., and yet every time I've heard people preach about the benefits, I feel like I just vehemently disagree. At a very deep and fundamental level, I feel like somehow we are sacrificing something more important in the pursuit of this "safety", this seemingly overpowering desire to make everything completely safe, mindlessly constricted, and stifling. It's not just software; the whole "war on terrorism" irks me in the same way. I imagine a "completely safe" world, the ones these "safe software" proponents appear to be striving for, would be rather dystopian.
A quote that immediately comes to mind is: "Freedom is not worth having if it does not include the freedom to make mistakes."
People rely heavily on software in many aspects of their lives. They entrust it with their personal information, their money, and in many cases their physical safety. Engineers building software and companies selling it are ethically obliged to make a good faith effort to prevent defects that might betray their users' trust and cause harm. Languages designed to enhance the safety and security of software written in them are one tool that can be used in this effort.
Are engineering standards for public buildings evil because they stifle architects' freedom to design whatever crazy structures tickle their fancy? Should power tools not include safety features like blade guards because their users' freedom to accidentally kill or maim themselves must be held sacrosanct? Probably to both questions, the reasonable answer is "no".
If you're building something for yourself, and offer it to others only with clear warnings, then go nuts and make all the mistakes you want. Nobody's saying you can't do that; that sounds rather dystopian, a society where pointer arithmetic is illegal! But you can't treat a project people are meant to trust and rely on as your personal art project.
> Are engineering standards for public buildings evil because they stifle architects' freedom to design whatever crazy structures tickle their fancy?
The physical analogy is good because even there one can see that there are different standards --- and, unlike what the "safe software" community seems to promote, engineers are not doing the equivalent of making every building strong enough to withstand a nuclear war and calling anything less "unsafe".
This also brings up another difference with software: the "absoluteness". In the physical world, no security is perfect. Locksmiths exist, and with enough determination, essentially anything can be broken into. But in the software world, with good encryption, that can never occur. Provably correct software can be employed for effectively unbreakable DRM and un-rootable/jailbreakable user-hostile devices, of which there is (fortunately) no similar real-world analogy I can think of.
> Nobody's saying you can't do that; that sounds rather dystopian, a society where pointer arithmetic is illegal!
Given that there are attempts at even prohibiting Turing-completeness[1][2], I would not be surprised if that eventually happens. As it is, I'm sure there are already people who would consider you suspicious if you write software in an "unsafe" language, and from there it is not far to complete prohibition.
So... you're saying it's unethical to invent tools that can be used to write software that works as intended and contains no exploitable bugs, because that software could then be used as a tool of oppression?
You can extend that absolutist argument to say that inventing anything that could be used as a tool of oppression is unethical. Like, inventing plumbing may have done great things for human society, but it was ultimately unethical because when the police came to take away your general-purpose computer they used a pipe to hit your kneecaps until you told them where it was. Or, how about this: developing any society beyond the level of the most primitive hunter-gatherer tribe is unethical, because what are governments if not the agents of oppression themselves?
This is the sort of abstract position that can't really be argued with in a vacuum, so I'm not going to try. But it's also not a useful ethic for building a modern society free of large-scale oppression, because it completely ignores the practical realities of doing so.
In the mean time, buggy, exploitable software is out there in the real world hurting real people every day.
> ... making every building strong enough to withstand a nuclear war and calling anything less "unsafe".
Yes. Except the equivalent in software is using something like Coq, where your intention is proven to be correct. (You even mention provably correct software in the next paragraph...) And yes, it would be awesome if we could prove every program, because all we would be doing is proving that the programmer's intent is correctly coded into the software, which doesn't limit freedom of expression at all. (I would argue that such is actually an even stronger form of expression, as it's afterwards impossible to misconstrue your intent.)
> Given that there are attempts at even prohibiting Turing-completeness[1][2], I would not be surprised if that eventually happen...
Rust is not one of these, and no one but you seems to have this in mind. So it's a bit of a strawman to be throwing it out here, no? But yes, obviously such a language would limit your freedom of expression.
Except we're not talking about limiting your freedom of expression. I'd challenge you to give me an example of a program that you can't express in Rust, with the obvious exception of silly and obviously unsafe things like "read from a random memory address". Which, by the way, was deemed so unsafe that the operating system itself will probably kill that program dead. (I can think of one excellent answer off the top of my head that requires `unsafe`.)
We're talking about eliminating a class of programmer error through static analysis. (If you're writing a large C program, you arguably should be using static and runtime analysis tools on it anyway.) This is no different than using a strong type system. Yes, it limits your "freedom", but sending a "banana" to a `sine` function is nonsense anyway. Well, it also ends up that reading 10 bytes into a five byte array is also nonsense, and we now have tools to detect that nonsense and tell you that it is, in fact, nonsense.
You can still be clever in Rust. You just have to try harder and explicitly say "I'm trying to be clever" here. You might make a mistake, of course. But if you need it, the option is there.
There is no philosophical beauty in typos or off by one errors. You are not being stifled when your compiler points out to you that, no, "coutn" is not a variable currently in scope. If we have a static proof that what you just did will never work, why would you still want to do it?
And if you had a reason, you still have a way to do it.
If you're trying to be clever, you need to prove that you know what you're doing.
That's a great quote, and it's one that Rust---as a memory-safe language---completely embraces. You have the freedom to do anything you want. Some of those things simply require you to type "unsafe."
Forgive me for being cheeky, but just as Rust requires you to type "unsafe," C also requires you to opt in by typing "cc".
My serious point is that in practice, the example of C shows that if it is available and people understand that it is "performant" then you will see it all the time, including in libraries you are forced to use.
> Forgive me for being cheeky, but just as Rust requires you to type "unsafe," C also requires you to opt in by typing "cc".
This isn't an accurate analogy at all because I can compile thousands of lines of Rust code without even thinking about needing to use `unsafe`. How many lines of C code can you compile without `cc`?
> My serious point is that in practice, the example of C shows that if it is available and people understand that it is "performant" then you will see it all the time, including in libraries you are forced to use.
Except this is already demonstrably false in the Rust ecosystem.
Freedom is not worth having if you squander it by making easily avoidable mistakes, either. The point is to make mistakes that lead somewhere. I.e., mistakes that we can't yet write programs to easily prevent.
And there is no shortage of mistakes available for you to make, so don't worry about that.
There is no need for another unsafe fast language. There is also no need for another safe slow language. We already have enough of those. Rust's purpose is to find a middle ground between those extremes.
And once you have written a five line safewrite() function, a hundred-line saferecvfrom() function, a 1500-line safeioctl() function, and all of that, and you have a third-party static analysis tool to prove that you're never calling the unsafe read() or ioctl() functions except from the wrappers, what was the advantage of staying in C in the first place?
I can answer: safe read/write from a simple system call can be "easy." Safe read from the network, possible but no longer easy. Safe call to ioctl, which can literally do anything with any device driver... is a "safe" version of that even possible?
ioctl is a marshaling mechanism that can represent a library API of almost unlimited proportions and variety. A safety wrapper basically has to target the specific functions, rather than ioctl itself. That has happened within Unix, with some library functions having originated as wrappers around ioctls.
safeioctl may have to contain a redundant switch (or more OOP-style dispatch across multiple functions) on the command code in order to convert the safe-style arguments to each specific low-level ioctl call. That sort of thing can easily explode in line count. I can see exactly what geofft is talking about.
Ok - it's true that ioctl has a generic interface (hadn't remembered that). We can think of it as hundreds of functions, not just one. So wrapping ioctl "completely" is correspondingly a lot of work.
However the point was of course to make a safe-land layer which contains the needed functionality. The point was never to have a function "safeioctl" in the first place.
The advantage is that you're depending only on yourself and C. That third party static analysis tool can just be "grep". Or some mild text processing on the output of "nm" to validate that no object files (other than the allowed ones) have external refs to those symbols.
All the way through the 80's and up to the early 90's, C toolchains only mattered to those working on UNIX workstations and servers. On home computers it was just yet another language, with compilers generating slower code than junior Assembly programmers.
So of course Rust has to catch up a few decades of market use.
Again, if you're disallowing platform headers and writing tooling to make sure you're never calling libc, what's the advantage of writing in C? You have all the headaches of switching to a new language, with none of the features.
And, honestly, one of the features is "vibrant community of developers." Even if Go and Rust were bad languages, which they're not, they'd still be better choices than C-with-custom-in-house-restrictions-and-libc-wrappers, simply because of the communities around them. If you're writing in Rust, someone else has already written the safe C wrappers, and if they haven't, there's a community of people who will code-review your wrappers for safety and merge them into a centrally-maintained project, which is extremely useful.
I'm not advocating use of C, just pointing out a simpler way to check dependencies.
To answer your question though, the advantage of using C is certainly not the notorious bad-habits standard library. C is fun and productive exactly where there's just you and some bits and bytes to bang around. Coding in the small. Not platforms and architectures.
You can easily access external symbols in C without including a header. The external refs in an object file are the accurate info about what is referenced.
1. Buffer overflows aren't considered the most insidious issue in C nowadays. That award would probably go to use after free, which is not so easy to fix.
2. In C, it is easier and faster to do the wrong thing. Compare "char buf[256]; strcpy(buf, foo); ..." to "array_t buf = array_create(strlen(foo) + 1); strcpy(buf.ptr, foo); ... array_destroy(buf);"
3. Buffer overflows do not in fact come up routinely in Rust the way they do in C.
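To illustrate point 3 with a small sketch: the Rust analogue of the `strcpy` example above turns silent memory corruption into either a deterministic panic or a recoverable `Option`, and growable strings remove the fixed-size buffer entirely.

```rust
fn main() {
    let buf = [0u8; 4];
    let i = 10;
    // buf[i] would panic with "index out of bounds" rather than reading past
    // the array; .get() turns the failure into a value you can handle.
    assert!(buf.get(i).is_none());

    // Growable strings need no fixed-size destination buffer at all:
    let mut s = String::from("hello");
    s.push_str(", world");
    assert_eq!(s, "hello, world");
}
```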
I'm no C expert, but those do two different things (fixed vs. dynamic length, stack vs. heap alloc). And the first one is safe with strncpy, right? Though I don't know why you'd ever have code like this unless you like undefined behavior and want to return the array. The second example seems fine and not particularly difficult or verbose.
> Can anybody make a strong case to me as to why are buffer overflows considered an issue in C when it takes like 10 minutes to write and test an array implementation that prevents that from ever happening?
That it's not a by-default and forced language-feature and that most developer aren't going to spend those 10 minutes when they need an array.
They'll just use the language-provided array-implementation instead. Which in C is very, very unsafe.
Compiler vendors have been resistant towards putting in such features. Bounds checking slows things down, and the performance race is very much a thing in C compiler implementations -- a compiler that can deliver a few percentage points better code can be a big win to teams working on compute-heavy problems. C11 has Annex K, which has a lot of safety features, like memory-safe arrays. Unfortunately, none of the vendors have implemented it even as an option. Which is a shame, because it would solve a lot of problems while requiring minimal rewrites for a lot of code.
I think it breaks down when that array has to interact with system libraries or the C stdlib in any way.
A lot of C string functions have weird gotchas related to terminators and sizes, and any IO you're doing will involve raw buffers being passed into or out of a system IO function that doesn't understand custom array types.
It's useful to be able to remove safety checks for speed. I have C++ code where all data is in array objects. Bounds checking is a compile-time option, and it makes the overall code 2x slower. I can do testing with bounds checking on, but once it gets to a supercomputer that needs to be removed. Address sanitizing by compilers is an even more effective tool for this, especially for C. Bounds checking is critical for security, but if you're only concerned with correct execution, then a segfault is not much different from an exception.
Technically you could probably limit yourself to a "safe" subset of C - basically no pointer arithmetic, no strcpy(), etc. - but that would defeat the purpose of using C in the first place.
Or perhaps just some helper functions in C that wrap array and pointer allocation/access to provide sanity checks. Seems like moving to a new language is rather extreme....
If you want to do this well, as other commenters pointed out, your entire C library interface is going to have to change to accept bounded arrays instead of pointers and lengths as separate arguments, or worse, just pointers and implicit lengths by looking for the NUL character. But then you've given up the ability to directly call into existing code; everything is at best using a little translation layer to check bounds before/after passing buffers to outside functions. So the advantage of staying in the language is minimal (note that you've given up on "C strings" entirely), and if you're going to have all of this, you might as well use a language with compile-time checks (types, etc.) that you're doing all of this right.
Rust and Go are not your only choices here. C++ with aggressive use of modern features and aggressive non-use of everything inherited from C is also a fine choice, but enforcing that discipline is hard, and the compiler won't help you a lot. Honestly, a dynamic language like Python or Ruby is also a fine choice in many cases, although NTP might be too latency-sensitive for that to work. (But it might not be! Premature optimization and all that.)
Sure, glib for example contains all the wrapped network / file / string handling you need. But then you need to make sure you use it correctly. And that no code goes around that API. And that you didn't make mistakes that invalidate the safety of your abstraction.
Moving to an abstraction would potentially solve some overflow issues, but it requires constant attention. Moving to a safe language likely solves all of them, and you get checked by the compiler instead.
Why? Rust includes index checks by default, so you have zero chance of programmer error. And, to boot, it removes the checks when the compiler decides that the code is provably safe.
I never stated whether the checks were static or runtime. In this case, they are runtime checks.
What you are talking about is value range analysis. Rust does a limited amount of it, and if it can prove safe indexing it will remove the runtime checks.
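A hedged sketch of both styles: with explicit indexing the check is usually eliminated because `i < len` is provable from the loop bound, and with iterators there is no index at all, so there is nothing to check.

```rust
fn sum_indexed(v: &[u64]) -> u64 {
    let mut total = 0;
    for i in 0..v.len() {
        total += v[i]; // check typically optimized away: i < v.len() is provable
    }
    total
}

fn sum_iter(v: &[u64]) -> u64 {
    v.iter().sum() // no indexing, so no bounds checks to begin with
}

fn main() {
    let v = [1, 2, 3, 4];
    assert_eq!(sum_indexed(&v), 10);
    assert_eq!(sum_iter(&v), 10);
}
```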
A properly-designed Rust API will not allow code without error handling to compile, so 1) should be much less relevant in Rust.
2) I can't remember if a panic in Rust calls destructors, which would clean up that memory. Can someone answer that please?
3) is only relevant in FFI scenarios in Rust, and everywhere else is irrelevant because Rust does not require the use of such footguns for basic string manipulation.
A panic _may_ call destructors, but it cannot be relied upon. For example, aborting is a perfectly reasonable panic implementation, and your destructors won't get called. If you have unwinding panics, they will call destructors though.
You'd generally use Result<T> for recoverable errors (which won't have any destructor issues), but in some cases you might put a panic::recover at the event loop level (never seen this being done before, but I can imagine it happening) which catches the panic (only calling destructors up to the catch point -- this is done by the unwind mechanism) and moving on to the next event in the loop.
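A sketch of the unwinding behavior described above, using the stable `std::panic::catch_unwind` (the API that replaced the old unstable `panic::recover`). With the default unwinding panic runtime, destructors run during unwind; with `panic = "abort"` they would not.

```rust
use std::panic;

struct Noisy;
impl Drop for Noisy {
    fn drop(&mut self) {
        println!("destructor ran during unwind");
    }
}

fn main() {
    let result = panic::catch_unwind(|| {
        let _guard = Noisy; // dropped while the panic unwinds
        panic!("boom");
    });
    assert!(result.is_err()); // the panic was caught at this boundary
}
```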
Final word on the subject - what we have is people who are trying to make Open Software Reputation Points by finding a problem and fixing it, rather than waiting to find a real problem and fixing that.
When the figure of "ten minutes" was used - that's really what it should be as a mean or median figure, with some long-tail outliers for knotty cases.
While I am (somewhat) sympathetic, I don't miss what it really is: make-work. There's no body of material on language, O/S, and software design available today that wasn't available 40 years ago. People knew better back then, and didn't do better.
But the endless, circular, Utopian arguments just don't cut it. It ain't the crate, it's the pilot. I am sorry that oh so many are put upon to actually - gasp - test their code but that's what this is.
The economics of a Rust or Go or whatever make perfect sense - in the long run. Just recall what Keynes said about the long run.
I see where you're going with this. I normally would agree. In this case, we have a large C program that evolved over time for a complicated feature set. That they trimmed that much fat out might already say something about how much risk might be in that code. It's also been around a long time in important infrastructure, and will continue to be. Converting it to something that's safer in most ways, especially around dynamic memory and types, to reduce the number of vulnerabilities in the near and long term makes sense.
So long as one can convert it in a straight-forward way that preserves the current meaning of the program's statements. If not, the port might introduce new problems.
Edit: Btw, send me an email (see profile). Got something you might be interested in. One or two things.
Yep. It's always about "the space between the buttons" in code bases. That's not pretty; I know that. It's a huge source of human suffering. But I wonder how reducible it truly can be. As they say, it's (still) early days yet.
That (email thing) would be awesome, Nick. Thanks.
This is how I feel. You cut the code to 27%. Great. You C99'd the code. Grand. Now you're going to rewrite the whole danged thing in a new language. Wonderful. This is the sort of stuff I cared about so much as a junior dev. But a user asks, are we there yet?
But adding functionality grows in effort as more code is added. At some point it makes sense to improve a code base so you improve the rate at which you are getting "there". In fact there are a lot of code bases, perhaps most, which simply cannot get there in any sane way and fall short of achieving their goal. You might not care how goals are achieved as a user but as an engineer it's worth exploring and discussing different approaches.
Just to be clear - I don't suspect that the things that pretend to replace C are bad, or no good. I've just seen multiple "pretenders to the throne"[1] and what would appear to have happened is that we just moved the pathology around.
[1] please excuse the horrible metaphor.
After a few iterations of that, one begins to think that perhaps this is a distinctly human problem that is not necessarily addressable by improved language systems.
My greatest concern is that I keep seeing people learn the same things, over and over, on the job. There's no real repository of literature to actually address any of this - each engineer appears to have to learn it mostly from scratch.
There's a famous bon mot from public choice economics - "something must be done; this is something; this must be done." I just hope all the new crop of languages are not that.
Finally: If you can, and if you can learn the right patterns ( isn't that true of all languages, though?), the thing I have gone to, again and again that is better than test equipment in terms of reliability continues to be Tcl.
A 62 KLOC secure NTP server seems like an ideal project for this kind of experiment. I imagine it would be self-contained enough to actually use Rust or Golang instead of just treating them like FFI scripters.
> One such cleanup: we’ve made a strong start on banishing unions and type punning from the code. These are not going to translate into any language with the correctness properties we want.
Really? This sounds like idiomatic rust to me (heavy with enums).
C unions have no discriminant. But yeah it's a pity—I'd try to hack up discriminated unions then. Or convert to Rust unions and then enums and remove unsafety.
C union declarations should correspond pretty closely to rust enum declarations.
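Concretely, a tagged C union like `struct { int kind; union { long i; double d; } u; }` becomes a Rust enum, and the compiler then forces every use site to check the discriminant (a sketch, not code from the NTPsec translation):

```rust
enum Value {
    Int(i64),
    Float(f64),
}

fn describe(v: &Value) -> String {
    match v {
        // match must cover every variant -- there is no way to read
        // the "wrong field" the way an unchecked C union allows
        Value::Int(i) => format!("int: {}", i),
        Value::Float(f) => format!("float: {}", f),
    }
}

fn main() {
    assert_eq!(describe(&Value::Int(42)), "int: 42");
    assert_eq!(describe(&Value::Float(1.5)), "float: 1.5");
}
```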
It's a surprisingly important feature. Every large codebase I've worked on has clunky workarounds for storing heterogeneous types in collections.
Haven't seen a silver bullet; dynamic languages are great at this until you have to scale (either in lines of code, number of types or dataset size) and then they become unmanageable. Static languages force you to type a lot and build in-memory ETL-style transformations but scale better. Using SQL is another solution, but that separates you from the language's type-checker and can be another source of error.
Type safety at the serialization boundary was the promise of CORBA and protobuf but we need better tools. I hope we see more focus on ser/des types in the next generation of industry languages.
I think they meant that the use case corresponds closely, not the representation. Generally C unions get used the same place Rust unions would in a rust program (not vice versa though), unless they're being used for type punning, which is pretty rare anyway.
If the goal is to translate automatically to a rust enum declaration, that will be (a) possible to do with an automatic tool and (b) will give enhanced safety because it will force the type to be checked everywhere these values are used.
Outside of the language war bubble it's really great to see a post like this. Practical concerns, reasonable advantages/disadvantages of each language, a real project dealing with real timelines. Thanks!
Was going to say the same. In the past ESR has come across as a patently arrogant gun maniac, but the first part of this post is great for all the reasons you mentioned.
(What irritated me though was the switch to first-person narrative at the end).
I'm excited to see where this goes because it could go a long way towards providing concrete data for the large "work to replace old infrastucture C code with (Rust||Go||Modern C++)" discussion that has been taking place.
More data points will help to inform discussion, or at the very least add structure to the flame wars.
This is literally the only thing I can think of that "NTPsec" can do that would result in the project having any relevance. I understand why some very specific sites are chained to the ntpd codebase, but the vast, overwhelming majority of the ntpd deployed base not only isn't tied to ntpd, but also doesn't need 99% of what ntpd does. Trying to "secure" that codebase always seemed to me like a very silly windmill to tilt at.
I wish more mention of D would happen. It is compatible with C and C++ libraries and features GC without sacrificing the good things of C and C++. I always loved the idea of Rust and Go but they are nowhere near C or C++ where it matters to me. D fits the bill, otherwise I just use Python. I like being able to design software in my own way as opposed to being told how to do it.
The truth is the Rust leadership did the hard work of building a community around the language, making high-quality tutorials and introductions to the language all before it stabilized for 1.0. Many developers claim to dislike "marketing", but Rust did its marketing/evangelism and D didn't.
From a purely technical standpoint, Rust's borrow checker is able to catch data races, while D has no such functionality (unless I'm not caught up), which is a huge advantage in today's world of multithreaded applications.
Same thing with Ada. If you're looking for a safer low-level language with a long track record of successful use by some major players, Ada is the natural first candidate.
But as someone else said, Rust has had way better marketing (and is newer, which probably catches some people as well.)
I personally don't know any, but I guess avionics, trains and things like power plants might be candidates where one could look.
Automotive has unfortunately settled mostly on C and built a whole ecosystem around it (Autosar). With all the talk around self-driving cars, safer programming languages and tools would make a lot of sense.
I very much enjoy using D and want it to succeed but this statement along with finding "everyday uses" for things (e.g. using zeromq) are huge drawbacks to buying into it. I'll still advocate but it's not easy sometimes
I think three things contribute to a programming language's success or failure: tooling, documentation, and the standard library. Look at Go: great out-of-the-box library, same with Python, but the tooling for Go isn't the greatest (debugger? a real IDE? etc.). Then you look at C#: tooling (Visual Studio, VS Code, debugger), documentation, and even out of the box the .NET Framework is a decent "standard library" with plenty included. If only D had, at the very least, a dedicated, well-supported IDE. I hope more people notice D so that the community around it builds solutions; it has a lot of great potential yet to be tapped, and plenty of groundwork has already been done with D.
What do you mean by "they are nowhere near C or C++ where it matters to [you]"? Specifically with Rust (because I already know why Go isn't necessarily a good replacement for C or C++). I know very little about D, one of the few things I know is that it is compatible with C++, which is neat, but that's surely not a concern when talking about C.
Mostly it's about being able to write code in a paradigm I'm comfortable in, as opposed to being forced into one, when the purpose of a language is to replace C++ (which is what Rust and Go were both aimed at). There's also being forced where to put the opening brace, like in Go, which at first didn't bother me, but after a while it became a code smell in itself. There's plenty of talk about Go programmers copying and pasting code, which violates the DRY principle. I like having the option to use GC in D and still being able to compile my code. I even converted some C# code that a project of mine heavily relied on over to D, and it compiles perfectly fine regardless of the distro or OS I compile it from (granted, it was very small / minimal code).
Me too, but although the community is great, lack of focus on how to go forward in memory management, runtime and deprecated packages, seems to keep hurting the language's appeal.
How much concern would non-standard architecture support matter for ntp? Given how many architectures Linux supports, I would think that C would still be the best choice, until these other languages gain support for those missing architectures.
Or perhaps it's a good opportunity for a language which offers transpilation with ANSI C as the target?
In addition to squiguy7's point, I'd add that A: it may not be clear if this is your only contact with ntpsec.org [1], but this is actually a fork of the classic ntp, so they may be more willing to abandon some older architectures to produce a more secure product going forward than the core NTP project would and B: the older versions will still be around even so.
Also, some experience suggests that older architectures often end up getting supported, out of sheer inertia, past the point when support really should be dropped, though I'm not having luck digging up the articles that prompt me to say this. Are there that many systems out there running ntp that can't run Go and/or Rust, and if there are, are there enough to be worth bending the project around? If the people running those things care, perhaps they should fork the project and maintain it themselves. Which is less harsh than it may sound, because they can still pull from upstream, and they probably just need to tread water rather than stay up with the latest & greatest.
(I should emphasize that the operative question is are there enough to be worth bending the project around, rather than whether there are any. Because there certainly are non-zero numbers of systems running ntpd that can't run Rust or Go. But who's going to pay to maintain them? Especially if classic ntp is still available?)
Well, it's not just older architectures which are not currently supported; it's architectures used in IoT devices, mainframes, and other non-commodity hardware. Specific to LLVM-based languages, if it's not a priority for Apple (or the other big LLVM players), it infrequently gets done.
Specific to IoT devices, I would personally love to see secure everything: NTP, TLS, SSH, etc.
There is a lot of community interest in Rust with regards to running on embedded devices, so I wouldn't be so quick to point to IOT devices as a reason for not using Rust. That said, I don't know what the current state of support for these less popular architectures is. The official page for platform support is https://forge.rust-lang.org/platform-support.html.
Yeah, it should. AIUI Rust uses a fork of LLVM, but they should be regularly pulling in upstream changes (though I don't know how often this happens) as well as submitting their own changes back upstream.
I expect that most of these older devices don't receive much in the way of software updates anyway, so the point is moot. For things that do receive updates, I would much rather see a secure NTP implementation that the majority of the world can use, even if a few parts of the long tail get excluded... and they still have the old versions of the C-based NTPsec they can use; I would also expect there to be a niche market arising for releasing security updates to it after the transition.
(Specifically addressing IoT devices, IoT security is so abysmal in general that there are much easier ways to compromise one than targeting its NTP daemon, if it's even running one, and if it's even "bothering" to run a secure one like NTPsec.)
Dropping support for things is certainly not a decision you make lightly, but sometimes the greater good demands it.
If it's a business requirement, they should hire engineers and contribute. Expecting free support seems unreasonable. There's a high cost for supporting and maintaining features, so somebody ends up having to pay for it.
To be fair, I imagine most of those arch manufacturers simply think "embedded developers use C, we support C, job done."
LLVM support would benefit the developers more, but they have their own budgets and constraints, which are probably not conducive to submitting and maintaining an LLVM plugin.
We are getting some piece of it from companies that still sell safe systems programming languages (e.g. MikroElektronika, Astrobe), or expose them mostly via Java and its variants (e.g. Oracle, Aicas, Aonix, IBM, IS2T, Google) or .NET (Microsoft).
No, LLVM support does not automatically mean Rust support, though it's a prerequisite for it. There are several small things which have to be defined in the Rust compiler. For instance, which registers are used for stack unwinding, and other ABI details.
LLVM is generally usable on sparc, although a significant amount of optimisation work still needs to be done, and there are some specific variants/platforms that sparc is used by that need better support.
That includes one line per architecture / platform (i.e. OS) combination. I only count about 14 architectures, though my rustc is old (GCC has 23 major, 24 minor, and 30 legacy as a point of comparison).
Last I checked, some of the popular IOT architectures (atmega, etc) were not included without some 3rd party plugins.
What arduino-class (that is, low power microcontrollers) devices are moving towards ARM? Seems like a very different use case from portable computers (like the Raspberry Pi) - the power consumption differences are pretty major.
> Nobody is going to be running NTPsec on it
Never say never. :) There are wifi, ethernet, and clock shields for Arduino, all of which are running microcontrollers, and many applications which would benefit from using NTP.
Microcontrollers aren't moving towards application level processors like the Raspberry Pi uses, they're moving towards low power ARM cores like the Cortex M0/M3/M4.
It's not clear to me why you'd choose an AVR for a new product unless you really, really needed a specific feature. Current generation ARM Cortex M0 devices are available at a similar cost and with significant performance gains, with the benefit that if you realise you need a more powerful core you can scale up to hundreds of MHz with effectively the same code base.
I don't know of many Cortex M0+ devices that can source/sink 40 mA like an AVR while sipping only 200 µA like the AtMega328pb. That's active mode, btw; the AtMega328pb drops down to 1.3 µA in sleep mode.
Cortex M0+ devices are "larger" than AVRs. You have more RAM, CPU power and unfortunately... complexity and power usage.
But it's still got an order of magnitude more power usage and complexity than AVR's AtMega328pb. The ATMega's "active" mode is comparable to the LPC811's "sleep" mode.
Most of the LPC811's pins can only sink/source ~4 mA, while the "biggest" pins can only sink/source 20 mA. In contrast, all of the AtMega328pb pins can sink/source 40 mA, easily driving LEDs with only a resistor (instead of having to hook up a transistor or external buffer of some kind).
The LPC811 also shows what happens to cheap ARM chips: they are missing ADC converters, PWM, Real-time Counters (or a deep-sleep mode that still keeps the 32kHz clock active). Sure, you can buy these externally... but the AVRs and PICs have superior integration.
> It's not clear to me why you'd choose an AVR for a new product unless you really, really needed a specific feature.
Running an LED with just a resistor on any pin is a pretty nifty feature IMO.
------------
The AtMega328pb is a "bigger" and more expensive part, though. Perhaps it's more "fair" to compare the LPC811 against the AtTiny44A (which is also $1.50ish). The ATTiny44A has similar power specifications to the 328pb.
I don't have a single source for this, but some hints are: 10 years ago every µC manufacturer had its own core and instruction set and basically sold them exclusively (maybe apart from some 8051 stuff). They were AVRs, PICs, V850, PPC, etc. Now, when you look at the portfolios of those manufacturers, nearly all of them have at least one product with an ARM core. Some might have already stopped releasing new chips with their proprietary cores. Others haven't done that yet but are planning to in the future (I've heard of at least one).
Another hint is when you actually get in touch with a lot of complete product designs. The number of ARM cores that you see in there seems to be steadily climbing.
I dunno if it's a 'major' migration, but NXP is basically doubling down on ARM-only architectures.
Microchip / Atmel (same company now btw) are doing an "everything and anything" strategy. PIC, ATTiny, ATMega, UC3, AND ARM chips are available from them.
It seems like the smaller 8-bit microcontrollers use less power and have better features for embedded engineers (ie: ADC converters, PWM, Real-time clocks, deeper sleep modes).
ARM has the general benefit of being much faster from a computational perspective. But if you want to read the voltage from a simple thermistor and then output it on an I2C bus... I'm thinking a classic 8-bit AVR is going to be superior to any ARM.
--------
I'm still seeing 8051 stuff pop up everywhere, and that thing was supposedly dead years ago.
> Never say never. :) There are wifi, ethernet, and clock shields for Arduino, all of which are running microcontrollers, and many applications which would benefit from using NTP.
FYI: Microcontrollers really ought to just be interpreting the WWVB radio signal (aka: the 60,000 Hz Atomic Clock radio signal throughout the entire continental USA). Alternatively, Microcontrollers easily connect up to GPS modules for an alternative radio signal / alternative time source.
If anything, microcontrollers are a great interface to the 60 kHz atomic clock signal and would therefore probably make the best NTP server.
In any case, the "real" best architecture is probably a microcontroller doing radio logic / digital signal processing for WWVB, connected through a simple interface (e.g. I2C reporting the last announced time from the WWVB signal) to a "bigger" Raspberry Pi, which can handle the Ethernet / server stuff.
I would start by getting the code to compile with g++, then begin migrating the dangerous C constructs to safe C++ constructs. IMO, that would be a safe, reasonable thing to do.
Project leadership, coding standards, automated analysis and code reviews.
But realistically speaking it might be easier to change tools for a significant number of teams, because the software development community is in general poor at leadership and process. A rewrite seems more approachable for the average (not necessarily average skills-wise) dev compared to a change in attitude, self-reflection and incremental improvement at a department or company level.
Then the Rust compiler will pummel everyone into submission. This can be a successful strategy. :)
Which I seldom see in enterprise projects, especially in companies whose business is unrelated to software development and where IT is nothing more than a cost center.
After reading this post, the idea of a C-to-C translator that injects bounds checking, etc. comes to mind. Such a translator could be used by OS distributions to provide safety in the least intrusive way, and possibly completely automatically, for many C codebases they have in their repositories. Translating into Go or Rust, on the other hand, cannot scale beyond individual projects that decide to undertake such efforts. Mainstream C compilers could implement safety features too, but realistically that won't happen, as it's not something most people care about. So a C-to-C translator might be the best bet with the most impact.
As we've discussed, one issue is the big performance cost of these solutions. Anyone interested in working on an automatic C to SaferCPlusPlus[1] translator? It should address the performance issue[2]. And should be much more straightforward than these C to Rust/Go translators.
I thought you had an interesting solution for improving the safety of C++ code. So, your techniques work with C code written in both old and new C styles with no work? Just fire-and-forget translation of C code written in arbitrary styles into completely memory-safe code?
Yeah, the idea is that C/C++ has a finite set of "dangerous" elements. SaferCPlusPlus attempts to provide safe compatible substitutes for those elements. In C, basically the only dangerous elements are pointers and arrays (if you consider them separate things), right? SaferCPlusPlus provides safe compatible replacements for pointers (and new/malloc and delete/free). There is a "general" safe pointer type that can be used as a direct substitute for native pointers in most situations, but can sometimes have a noticeable performance cost, depending on usage. Faster safe pointers are also provided, but cannot be used in all situations.
Replacing all the pointers and arrays in your C code with the safer substitutes will eliminate the possibility of invalid memory access, in your code. Of course this doesn't prevent them from occurring in any unsafe libraries you use, including the standard library.
Also, there are certain behaviors in C that may not translate well to SaferCPlusPlus. Like exotic pointer arithmetic. Or, for example, you could imagine some C code that compares two pointers that point to items that have already been deallocated to see if they were previously pointing to the same item. Is that valid in C? Anyway, that kind of thing is not supported by the safe pointers.
There is not yet an automatic translator from C to SaferCPlusPlus, but the translation for most code is straightforward and direct. No new paradigms or "Rust borrow checker" type restrictions. You can check out the benchmark code I linked to in my comment to see examples of C++ code before and after conversion to SaferCPlusPlus. A little code reorganization sometimes helps to achieve optimal performance. But isn't that always the case? :) And of course, SaferCPlusPlus requires a modern C++ compiler and has dependencies on the standard library.
It's not yet ready for primetime, but Scala Native (http://scala-native.readthedocs.io/en/latest/) might just make a splash in the systems space. I don't think it has anything like ownership yet, but I wouldn't be surprised if it eventually develops that capability. I think you can get it to run without GC, too, but using C Stdlib memory management. Although, that largely defeats the memory-safety.
Just throwing it out there as something to keep an eye on!
Looking at the current new and coming languages, I would take a hard look at Nim. It may not be there yet, but it looks highly appealing, is as fast as the oft-mentioned Rust, and compiles significantly faster. http://nim-lang.org/
I didn't understand why Rust and Go are natural alternatives to C. Wouldn't C++ be a more natural option? (Despite the fact that both Go and Rust are developed by third-party companies.)
I did a lot of C++ years ago, so maybe things have changed since then, but I think Rust and Go addressed a lot of the design flubs of C++.
My experiences getting things to compile across gcc and visual c++, dealing with strings (especially Microsoft's WCHAR), reliable integer sizes (pre stdint.h), and debugging templates were not things I would wish on anyone.
Re-doing some of my side projects in Go and Rust was a lot more enjoyable. I could focus on what I was doing instead of trying to work around deficiencies in the language and its libraries.
I still don't get it. Can't it be done in C++ via appropriate data structures? I mean, it's not like Go is magic; in the end I suppose Go would be doing just that. As far as I know, the main reason people opt for Go over C++ is the compile times...
You can write safe C++, if you're careful, and everyone you work with is careful, all the time. Judicious use of features can make your code more safe, but they can never make it actually safe.
Rust and Go, in various different ways (and to various degrees), make it actually safe.
If C++ is "(nearly) a superset of C", why is the C++ standard twice the size of the C standard? Of course, if we take 0.505... ≈ 1, then it is indeed a (nearly) superset.
My experience is that using C for main(argc,argv)-style programs is rarely a problem. Trouble comes when using long-running single-address-space containers for service-like abstractions with pthreads etc.; in that kind of environment, malloc() and co. don't cut it, because even if you get memory allocation right, memory fragmentation becomes a serious (i.e. insurmountable) problem unless you use pooled memory allocators.
It's been said over and over since at least the Java times that creating OS processes for individual service invocations is bad for performance, but I've never seen proof for this statement in the form of a benchmark.
Even the OpenBSD developers (who know a thing or two wrt. security of memory allocation schemes) diss process-per-service-invocation architectures in their httpd implementation (eg. calling their CGI bridge "slowcgi" and favouring fcgi over it).
Isn't that inconsequential? I mean if there's a performance problem with CGI-like process-per-service invocations, why not target these problems at the OS level (or via pooling of network connections or whatever the bottleneck is)?
Rust is surely fine, and an improvement over C, but its main advantage is that all the rust code is written now, when everyone takes more care about security.
It doesn't have to deal with 40 years of bad legacy code written by sloppy developers.
You can obtain similar quality in a modern C codebase using tools like static and dynamic analyzers. In fact, today the hardest issues come from multi-threading. I won't even dare to write multi-threaded apps without helgrind/TSAN.
And Rust doesn't help in this regard. From: https://doc.rust-lang.org/nomicon/races.html
'So it's perfectly "fine" for a Safe Rust program to get deadlocked or do something incredibly stupid with incorrect synchronization.'
Which higher-level abstraction to use is a question and depends on the problem at hand; for throughput computing in C, Cilk is perhaps the best today (I guess you could call my own project checkedthreads a bit of a Cilk knock-off.) AFAIK, Rust developers are more interested in higher-level abstractions than Go developers (Go's built-in goroutines being on par with threads in terms of ability of automated debugging tools to flag bugs; perhaps there's work in Go on higher-level abstractions but the vibe I felt, perhaps mistakenly, is that these are non-problems, which I disagree with: http://yosefk.com/blog/parallelism-and-concurrency-need-diff...)
Data races are mostly prevented through Rust's ownership system: it's impossible to alias a mutable reference, so it's impossible to perform a data race. Interior mutability makes this more complicated, which is largely why we have the Send and Sync traits (see https://doc.rust-lang.org/nomicon/races.html).
Similar quality in a modern C codebase? No. Much better than in the past, yes, but not similar. Even if the quality _were_ similar, it'd be disproportionally more expensive to write.
I just left a job in which C code was being written from scratch, in 2016. The code was awful. Coverity didn't prevent it from being awful.
Some people take more care about writing safe & secure code today. Most only care just enough to switch to Java or Python in order not to care any more.
The big question for me is what will happen when those same developers writing insecure C and C++ code start writing rust? Will they find ways to subvert the safety checks? Will they even want to subject themselves to them in the first place?
I like Go; I'd love to write libraries in it, but as far as I can tell you can't really create a C-compatible shared library from it.
That is still the common denominator if you want to call into it from other languages.
I'd love to write Python programs with the performance-critical stuff in Go.
I do not contest the opinion that Rust is a good language, but it slightly hurts me when people club C and C++ together. One can easily write correct-by-construction code using modern C++. Use of meta-programs allows you to create typesafe constructs. It provides you with zero-cost abstractions to specify ownership of resources and ..... <I can go on> One has to just strive to not use the C baggage that comes with it.
> One has to just strive to not use the C baggage that comes with it.
This is why they get clubbed together. Regardless of how much I like C++, I have yet to see the use of C baggage successfully forbidden in enterprise teams, let alone when there are third-party dependencies (which is always the case).
So far I have only seen modern, safe C++ being used successfully on a big project I was part of at CERN, where everyone on the team actually cared to write proper C++.
C++ is much better, but still not safe. Null still exists. Moves are runtime moves, so the compiler will still let you access a moved-from value. Iterator invalidation is still an issue.
Of course, C++ might be "safe enough" for your use cases.
> Moves are runtime moves, so the compiler will still let you access a moved-from value.
What do you mean by that? Are you saying the standard decided to make moves non-destructive? Then yes, it is a possible source of error, but also a performance advantage in some cases.
> What do you mean by that? Are you saying the standard decided to make moves non-destructive?
You can still access a value after it has been moved out. use-after-move is allowed by the compiler. It places (stdlib) types in an unspecified but valid state (I've seen C++ code reusing these types and assuming that the state after move is something in particular -- it's not).
Most optimizations you can do by reusing a moved value could be done automatically by the compiler (by using the same stack space since it statically knows that it's been moved out).
Linear types are a pretty well known pattern and could have been used as a part of the design of moves. This would have the additional benefit of removing explicit move constructors from 99% of all types out there. This has not been done (and can't be now).
Edit: The pretty rare optimization potential of runtime moves is pretty much negated by the fact that common things like reallocing a vector can't be made into a memcpy for most types, just because they have nontrivial move ctors which wouldn't exist in the compile-time move scenario.
Use-after-move is allowed because of the existence of value categories. If you use a return value from a function, its creation is usually elided or it is moved, but you can't access that temporary without directly referencing it. When you move an existing object (by specifically saying std::move), what should happen with it? Destroying the object is not a solution, because its variable might still be accessible in the existing scope (something like a dangling reference in the middle of a scope), which implies use-after-move should be prohibited by the language and all variables should have a "not-a-value" state and throw exceptions on use.
> When you move an existing object (by specifically saying std::move), what should happen with it?
You don't destroy it; you treat it as an actual "move", where the compiler will not consider the original variable accessible after this point, and will not allow accesses after the move. The memory it took up is free for reuse, and no destruction code is run on the side of the code doing the moving (in the case of a conditional move this gets a tiny bit more complex, but not too much). "Its variable will still be accessible in local scope" is exactly what I'm getting at; you can enforce at compile time that this isn't the case by simply disallowing access.
You shouldn't have local references active, just like how you shouldn't have local references to the contents of a vector when you push to it. You can already invalidate local references to a part of a struct when something gets moved in modern C++. Being wary of invalidating local references is an established concept in C++; this doesn't exacerbate that problem.
It's even better if you track scopes in references which gets rid of the dangling reference problem entirely but at this point you've reinvented Rust :p
To be clear, I'm talking of a completely different model that could have been used in place of the rvalue reference and move model. Making C++ have linear types now would be tough, but it could have been done before. There are different tradeoffs there, but I suspect it would have been safer.
> all variables should have a "not-a-value" state and throw exceptions on use.
You can make this a compile time error. Like I said, linear typing is a pretty well established pattern.
> Under Linux, some SECCOMP initialization and capability dances having to do with dropping root and closing off privilege-escalation attacks as soon as possible after startup.
I was under the impression that these specific things were actually quite hard to do in Go. I believe that both setuid/setgid and seccomp_load change the current OS thread (only), and since Go multiplexes across multiple threads and gives programmers very little control over which ones are used for what goroutines, I'm not sure how you would, for example, apply a seccomp context across all threads in a Go program. setuid/setgid are currently unsupported for this reason, with the best method being "start a subprocess and pass it file descriptors" (https://github.com/golang/go/issues/1435).
I'd be interested to hear if others have found ways to actually do this reliably for all OS threads underlying a running Go process.
I did this once, by writing a small C program which sets up the seccomp context before exec'ing the Go binary. Unfortunately Go's runtime makes a huge number of system calls in the background, and the whitelist kept growing.
Switched to Rust, and there was only one hidden system call left: getrandom, used to initialize the hashmap.
The leap second thing is a big issue for a server. OpenNTPD serves the wrong time on the day of the leap second. They don't propagate the leap second flag properly either. It's not correct and doesn't belong in a public server pool IMHO. This issue isn't just theoretical: it was responsible for a lot of the bad time being served by the NTP Pool last week.
You're right that the article I linked argues the leap second isn't a big problem to client machines. I guess I sort of believe it, if you don't care if your client is a second off from true time once every couple of years.
Why do they have to use a low-level language? Why not use NodeJS? One argument would be time management, but even with C you cannot be sure that no interrupt happens in between...
Honestly, why not do it once and for all in a strongly typed pure functional language, validate it, and then tweak the GC parameters to get the performance? Use the safest and most powerful language that you can, if you can.
From my understanding by reading the OP, it's not really performance that's the problem, it's just that there are some (fortunately very small) time-critical sections in the NTP code that cannot tolerate a GC pause. If the language runtime supports disabling GC in critical sections (like Go does), then it's doable. If not, the language is likely automatically disqualified.
A more relevant point that would disqualify Haskell is the fact that its runtime (or rather, the GHC's runtime) is nowhere near predictable enough for the realtime guarantees that time synchronisation software would require.
A Haskell DSL that outputs some safer low-level code would be a more likely choice (for example http://hackage.haskell.org/package/atom), but Rust is both more popular and has more commercial support.
When opting for a DSL that compiles to low-level the "Haskell" part is less important.
I've heard the idea of DSLs over and over again, but who actually does that? I know of course of sed, awk, regex, etc but what part of NTPsec is narrow enough in scope and large enough in volume to justify creating a DSL? (just asking -- I'm not familiar with NTPsec).
I'm not an expert in this field, but it's my understanding that this is often the case for C programs executing nowadays. The AMD64 architecture even has the NX bit. There are some other ways that this can be enabled too.
But surprisingly, this doesn't solve all code execution security problems from buffer overflows. Sometimes exploiters can find non-executable memory to change that allows them to change the program behavior to do what they want, such as changing the command string that gets passed to a normal execve call later in the program.
But even if you solve the security issues, buffer overflows are still a huge headache. A one byte buffer overflow can cause your program to crash an hour later, with almost no hope of figuring out why it happened. A developer can easily spend weeks tracking down a single buffer overflow crash.
> Sometimes exploiters can find non-executable memory to change that allows them to change the program behavior to do what they want, such as changing the command string that gets passed to a normal execve call later in the program.
Changing function pointers or the return address on the stack are the normal things to do. In particular, if you corrupt the stack you can simply return into system() with whatever arguments you want.
This wouldn't protect against bugs where you can maliciously trick a program into reading from other places in memory so that it leaks passwords, keys, etc. Think Heartbleed.
Why does 90% of this huge comments section revolve around Rust? Haven't we had enough of the same already? Yes, we know it provides memory safety guarantees. Yes, we know you hate everything written in C with great passion. This has been made apparent and banged into our heads for the past several dog-years. Did anyone bother to look at the sheer amount of refactoring the author & team did? Did anyone realize how difficult this is and what they might learn from this tarrasquesque ordeal they went through? Oh no, apparently it is C, and logic mandates that Rust supersedes it, so why.the.hell.bother.
The huge discussion about Rust is because that's literally what this post is about. It's titled "Getting Past C" and most of the post is about considering Go or Rust as the future language for the project. And, not surprisingly, a lot of people think Rust is a great choice for this (and I agree).
The title is Rust-bait, so let's all chant rust! Rust! RUST! Oh, but there are only three tentative mentions of Rust in it, and this entire comment section is corroded. Just compare the two pages in a browser with Ctrl-F "rust".
I'm not sure why you're under the impression that the number of times something is explicitly mentioned corresponds directly to the number of times it's being discussed, which is what your suggestion to search for occurrences of the name implies. The whole article is about future plans for NTPsec, and the work going into making it so it can be converted to a new language later. The article is about the relative merits of different languages for this task, and Rust is one of those languages. Of course Rust is going to be talked about in the comments. The only odd thing here is that Go isn't talked about more, not that Rust is talked about a lot.
>> But NTPsec is a lot smaller and cleaner now at 62KLOC of C (that’s just 27% of the original size). It’s been brought up to pretty tight C99/ANSI standards conformance, and the few remaining platform dependencies are either already well isolated or can easily be made so.
Then they have a section about future plans and a short comparison of two possible languages. I'm surprised that you're surprised Go is virtually non-existent in this thread.
Hackernews comment sections are strictly for endlessly typing about Rust's memory safety features to seemingly no end. You will find this behavior under any article that mentions C or C++.
At this point I don't care about downvotes, because the whole finding-valuable-comments-to-learn-from experience I've had in the past is getting harder and harder to attain. So instead of bitching about that, I will just go upvote someone who is downvoted.
>Under Linux, some SECCOMP initialization and capability dances having to do with dropping root and closing off privilege-escalation attacks as soon as possible after startup.
Thanks, Eric. Now, FFS, learn how the Cathedral model WORKS, get OpenBSD, learn pledge(2) and stop spreading old FUD against C.
> One of the medium-term possibilities we’re seriously considering for NTPsec is moving the entire codebase out of C into a language with no buffer overruns
For a small one-time fee of 1000 USD I can copy-paste you 50-100 lines of C that provide an implementation of an array without buffer overflows. Cheaper than switching to Rust or, you know, learning C properly.
Can you pass this buffer to e.g. read(), and will it still guarantee no buffer overflow even if the programmer miscalculates the size argument to read()?
Sure, if the array index is limited to 32 or 16 or 8 bits, I can also give it to you at no performance cost for heap-allocated arrays (i.e. no bounds check happens at runtime).
There is some memory overhead per array with this method though.
The fact that it doesn't occur to people to build safe abstractions in C to deal with things like buffer overflows still shocks me. C is fairly low-level by today's standards. You 100% have to build abstractions using the standard library and then use those abstractions, rather than just using standard library functions everywhere.
This isn't an argument against safer languages, or higher level languages, or languages with built in bounds checking or anything else. This is more an argument against programming in a "close to the metal" language like C so thoughtlessly that buffer overflows are actually a serious issue.
Maybe you just need a pedantic mind. When first learning C, as soon as I figured out that writing "char buffer[BUFFER_LEN];" could cause your program to crash, I immediately set out to write a safe dynamic array implementation so I could push chars onto it without having to worry. And I am by no means an expert C programmer.
Because the fact that you have to write that secure abstraction means the majority of people won't do it. And even if they do, the vast amount of C code they'll interface with that doesn't expect it, and will happily index out of bounds if called incorrectly, means they'll always be fighting an uphill battle. I imagine many people do what you did and write abstractions, and then, the more they deal with external libraries, the more they realize they aren't buying themselves as much security as they think. It isn't just a one-time up-front cost: any time they need to interface with code that doesn't use their implementation, they may have to make sure everything is still sane afterwards.
> I could push chars onto it and I didn't have to worry
And how did you implement this? Are you stitching together chunks of memory, or are you reallocating and copying? If you are reallocating and copying, what do you do when a pointer to the old address space still exists? Now, instead of manually updating memory and knowing you have to take care of pointers, your dynamic implementation might change things underneath you without you noticing.
> Because the fact that you have to write that secure abstraction means the majority of people won't do it
What's your point? My only argument has been that it's very easy and achievable to avoid buffer overruns in C. The fact that you assert most people won't do it is completely orthogonal to that.
> And how did you implement this? Are you stitching together chunks of memory, or are you reallocating and copying? If you are reallocating and copying, what do you do when a pointer to the old address space still exists? Now instead of manually updating memory and knowing you have to take care of pointers, your dynamic implementation might change stuff from underneath you without you noticing.
I use the standard library function realloc. The dynamic array is written in a fairly standard way, i.e. I am not returning pointers to the allocated memory of the internal array. I access values inside the dynamic array by value (i.e. they are copied), not as a raw pointer to a slice of memory that could be realloc'd. It would never even occur to me to do that, so your example seems strange and far-fetched.
Realloc may automatically copy the range to a new memory block if the old block cannot be expanded, and when that happens the old range is freed. Any other pointers you had to items in that original array may become invalid every time realloc is called, and if it's automated by your dynamic array code, that could conceivably be any time you push an item onto that array.
This is the waterbed theory of complexity. You can push complexity down in one part, but that just causes it to pop up somewhere else. You can make array size management easier, but the cost is that when it needs to deal with array sizing it's abstracted away to the point where you can't be sure when it happens and when you need to fix problems it might cause, or you can deal with it up front and manually when needed and then when you are manually dealing with the size changes you should remember that the memory might be freed and you may need to deal with that. GC languages deal with this by having all the information to know exactly how to fix all the references needed, and Rust deals with it by requiring you to not have two references to the memory in that circumstance.
I am still perplexed why you think that by using a dynamic array of chars for a buffer, I am somehow holding pointers into the malloc'd memory anywhere. You can simply access data by copying it:
Note that it still prints 42 despite the allocated memory being freed. The element access copies the data at the specified index into x; x holds no reference to the malloc'd memory.
> I am still perplexed why you think by using a dynamic array of chars for a buffer
Because I'm not limiting the case to just core types. You don't only ever need arrays of chars, ints, doubles, floats, etc. Sometimes you need arrays of structs.
As a simplistic example, perhaps you have a large array of structs, and you want to iterate through them and add all the items that match your criteria to a shorter array of matches, which will be pointers to the real data. Adding a single item to the original array could cause realloc to invalidate every pointer in the array of matches. Of course there are ways around this, but someone starting work on the code might not necessarily expect that adding an item to an array would cause pointers elsewhere in the code to become invalid, unless they look at the implementation of your array code to understand what it's doing.
So basically, your argument boils down to "well what if the person who implements the dynamic array doesn't know C properly and provides an API that exposes the internal realloc'd memory?". Because if not, I have no idea what kind of insane dynamic array implementation you have in mind. There is no way you should be able to access pointers to the internal memory from the API of a dynamic array.
Here's how it works - if your array is of malloc'd structs, then when you access the element, you get back a malloc'd struct. At no point can you access the actual memory, only copies of the values contained at a given index.
So yes, I suppose if your bounds checked dynamic array was implemented by a complete novice or someone that doesn't know C, your hypothetical scenario could happen.
None of this changes the fact that it's easy for a competent C programmer (do I really need to specify this?) to completely avoid buffer overflows in his own code, which is the only thing I have argued.
The basic idea is that sometimes you might want a pointer to an array item, if that item is complex, not just a copy of it, as there's no need to be wasteful if it's a fairly large struct. Any pointers to that array might be invalidated if realloc is called on it. Knowing exactly when that happens means you can note that it might be invalid, and do something about it, but if it can happen any time you add items to the array, that means you need to check for whether it was reallocated every time, or assume it's always invalidated whenever you push an item on the array.
In the example here, I'm determining the struct with the smallest num field. As I keep allocating space to the array (which I'm doing explicitly here), I'm doing the incorrect thing, which is assuming I can continue to use the pointer, which may no longer be valid and just checking if any of the new items are smaller than the existing smallest. I should be recomputing from scratch. It's obvious when I'm calling realloc, but if I was just pushing new items to the array, it would not be obvious at what point it reallocated to a new location in memory unless I specifically checked.
What's happened in that case is we've traded the complexity of explicitly controlling memory allocation of arrays for the complexity of either not allowing pointers to array items or having to keep track of the array location with a separate pointer and checking that they are still the same prior to using any pointers to array items we've stored.
You're right, to an extent we are talking past each other. I totally understand how your code can cause a dangling pointer. But you are using a raw C array, which goes completely against what I've been (poorly) trying to explain in this thread. What I am advocating is something like this:
So, TL;DR: if you don't want to deal with copying structs, malloc them and then push them to the dynamic array, and deal with their pointers. Else, if you push a struct value onto the array, return struct values.
> if you don't want to deal with copying structs, malloc them then push them to the dynamic array
So, manually manage their memory allocation, but allow dynamic allocation of the array of pointers? Sure, there are some cases where that's useful, but if you're already managing memory for the structs themselves, you can probably just manage the memory for the array at the same time.
> Else, if you push a struct value onto the array, return struct values.
So, like I said, "not allowing pointers to array items".
You can do this, but you aren't just making array access a little safer, you're also restricting quite a bit of what you can do for efficiency. If I'm going to throw away the ability to use pointers for efficiency, why am I even using C in the first place? I should just write it in some other language from the start. Presumably I used C because there was a need for that efficiency.
> So, manually manage their memory allocation, but allow dynamic allocation of the array of pointers?
Yes.
> Sure, there are some cases where that's useful, but if you're already managing memory for the structs themselves, you can probably just manage the memory for the array at the same time.
... then you have memory bugs, as your example code clearly shows. What you're suggesting (exposing the internal backing array of a dynamic array) is completely unorthodox and fraught with potential bugs, and I doubt Rust even does this internally.
All I can suggest is that you look up how dynamic arrays are typically implemented in C. The technique I describe is almost universally followed. This is also what happens with std::vector in C++ - std::vector doesn't manage the memory of the elements themselves, just its internal backing array.
> All I can suggest is that you look up how dynamic arrays are typically implemented in C. The technique I describe is almost universally followed.
I think this gets at the crux of people's problem with the idea that you can just work around the problem of manually allocating memory. It's a bolt-on to the language, and the behavior is dependent on the implementation chosen, and it makes the behavior fundamentally different than "native" C arrays, to the point that it might cause problems.
> This is also what happens with std::vector in C++ - std::vector doesn't manage the memory of the elements themselves, just its internal backing array.
Yes, but there's also usage directions for std::vector that specifically state and make very clear what iterators/pointers are invalidated on what actions. Encountering someone's home-rolled array routines may or may not allow you to easily make the same deductions. Are the routines for dynamic arrays, or are they for doing system cleanup at the same time, or have they been combined? Are there comments noting the reason for what's being done, and that certain operations may invalidate pointers, or are you left to intuit that yourself?
These are the problems with having a non-core (and not even a popular implementation to fall back on) way to extend the language. C++ is a step up in that it at least standardizes a bunch of core types so you can learn those and carry your knowledge of how they work around to different projects in the language. C's lack of this means that every project may implement something like this - or not - in their own way, with subtle usage differences.
The main benefit you would get from Rust in a situation like this (ignoring that it would likely either be built in or readily available through a crate), is that on encountering some home-rolled system, you can look for where it uses unsafe to find any problematic behavior you need to be aware of, because otherwise you are fairly protected. Worst case, the whole home-rolled chunk of code is riddled with unsafe blocks, and you know it's definitely something you need to hunker down with to figure out what's going on (assuming you need to use it).
Rust's unsafe is effectively an enforced comment around dangerous code. Put that way, I'm not sure many C programmers would really object.
It's not a winnable debate. You are two different kinds of people: the former an idealist, the latter a pragmatist. Both philosophies are good, both are correct. Before you continue your debate, you should both recognize this difference.
Is this abstraction zero-cost? What's the overhead?
Can you give me a similar set of primitives to manipulate memory in a temporally safe manner as well? What is the cost of that abstraction? How does it compare to a runtime's GC?
I never made the argument that you shouldn't use a safe GC'd language. I merely made the argument (now flagged, for whatever reason) that buffer overflows are an easily solvable issue in C, and if you are having issues with them you really need to up your game and learn to create abstractions.
As for the cost of the abstraction of bound checked arrays of arbitrary length, I can't imagine it being any slower than rust. I could be wrong. If you'd care to provide an example program in whatever language you are advocating, I can give you an implementation in C using a bounds checked array ADT, and we can compare notes.
Obviously, if you have to check an index against a length you're going to be doing a branch. However, because the index checks are intrinsic to the Rust compiler, it can remove them when it proves the code is safe. So, for instance, an iteration loop over an array won't have any index checks in the generated machine code.
Being intrinsic is absolutely not required for the checks to be removed. Bounds checks are just a normal branch with a fairly straightforward condition, and the always-true nature of those conditions is inferred in the same way for both the built-in indexing and non-built-in indexing. This is true in languages other than Rust.
An iterator is designed to just not do any indexing at all (neither using the built-in [] operator or one of the functions that implements manual bounds checks), because it instead just manually (unsafely) walks a pointer along the array.
> An iterator is designed to just not do any indexing at all (neither using the built-in [] operator or one of the functions that implements manual bounds checks), because it instead just manually (unsafely) walks a pointer along the array.
It obviously still has to bound-check that it's not about to walk right out of the array. If you want to call this something other than a "bound-check", I think that's being overly pedantic.
Yes, it checks that it's reached the end of the array as part of the loop, in the same place that `for (int i = 0; i < n; i++)` checks whether it's reached the end of the loop. I think this is different to the indexing bounds checks we've been discussing since the whole process is more controlled (compare and increment a pointer), rather than taking an arbitrary integer.
But yes, strictly speaking you're right: an iterator does do a check when it's reached the end of the array; it just doesn't do any indexing, nor does it use the indexing checks built into the compiler (which is what you implied the iterators benefit from).
> because the index checks are intrinsic to the Rust compiler
This is false. The index checks are written in Rust code as an impl of the Index trait on [T].
The checks being removed have nothing to do with this -- LLVM can prove that certain checks are unnecessary. C compilers do the same if you use a library that provides checked indexing.
What's different is that in Rust, indexing is used much less often overall, because iterators are the dominant pattern, and they sidestep the issue entirely.
> Is this abstraction zero-cost? What's the overhead?
It will cost a single correctly predicted branch, which is effectively free on modern architectures. Any "safe" language will have to make this conditional branch too (Rust's File::read method will check the size of the slice).
Completely agree. The same is true at a slightly higher level in C++. All of the "dangling reference" problems can be avoided by using a value-based collection library and not creating references. Then upgrade to a unique or shared pointer when copying values is too expensive.
It then follows that reading things safely is a completely insurmountable problem in C. There is no possible way for me to write a thin wrapper around read() that works on bounds-checked arrays, rather than using read() cavalierly in my code without thinking. Right?
> It's scary how unfamiliar C programmers are with the rules of the language they're using... it's very difficult to write correct / secure C code without undefined behavior even when you know the rules.
[...]
> It's not feasible to avoid undefined behavior at scale in C or C++ projects. It's simply infeasible. They are not usable as safe tools without using a very constrained dialect of the languages where nearly all real world code would be treated as invalid, with annotations required to prove things to the compiler and communicate information about APIs to it.
Not an argument, or even a point. I don't know why this is so difficult for people to understand. Buffer overflows are easy to avoid in C.
I think I recognise your name from some rather aggressive rust advocacy in another thread, so I'll try and break this down in a way that won't trigger you:
- Buffer overflows are trivial to avoid in Rust, AFAIK. I acknowledge this
- Buffer overflows are very easy to code in C, and have occurred many times in the wild. Again, I acknowledge this.
- My argument is: buffer overflows are easily preventable in C if you provide your own thin abstractions. The fact that people don't take these steps doesn't mean that these steps cannot be taken. Take the Rust compiler itself: AFAIK it's now implemented in C++, an 'unsafe' language, and YET it manages to provide an abstraction, in the form of a language, that makes buffer overflows all but impossible. Do you understand now how it's possible to create safe abstractions in an unsafe language?
> Do you understand now how it's possible to create safe abstractions in an unsafe language?
Who cares? What's "possible" is completely beside the point. What matters is what is done. And so, yes, you can create safe abstractions in C. But if you need to interoperate with someone else's C code, you have to deal with how your abstractions interact with someone else's, or with someone else's lack of them. The point of using a language like Rust or Go or... pretty much any non-C language with lots of traction is that you don't have to provide "your own thin abstractions." You use the same one as everyone else. The point of Rust or Go isn't to make things easier on you. It's to protect you from every other piece of code you interact with. The standard library. Libraries written in the language. Other people working on the same project as you. Yourself five weeks ago. You all use the SAME safe abstraction. Nobody has to roll their own. There aren't 25 safe abstractions for 20 people working on the project. You don't have to reason out or guess whether some library (or even the standard library) is doing the right thing to avoid buffer overruns. You don't need to defend your entire codebase against someone else's code not doing the right thing.
I advocate all memory safe systems programming languages, going all the way back to ESPOL on Burroughs B5500, in 1961!
Because I am old enough to remember when C was only relevant to UNIX users and those of us that care about quality code were able to enjoy much better options.
It is not possible, because one seldom works alone, so regardless of whatever laboratory attempts to write safe C code, they all fall down when the team reaches a size of two, or dependencies to binary libraries are required.
C++ is also not free from this. Regardless of the language features we are able to use, there are always a couple of guys who code it like C, thus making a hole under the castle walls.
As for Rust, as any compiler writer knows, the implementation language has very little to do with the language being compiled. If LLVM had been done in Java, as Chris initially thought of doing, Rust would be using a compiler written in Java instead.
Uhm, in the context of native languages, without further qualification it is clear that a "compiler" is a program that translates said language into native code, i.e. assembly. To dismiss the translation of LLVM IR into assembly as "codegen" is, at best, using a misleading definition of compiler.
The only fair thing to say at present is that Rust's main/most useful/most performant/most whatever implementation is a mixture of Rust and C++.
> Take the Rust compiler itself: AFAIK it's now implemented in C++, an 'unsafe' language, and YET it manages to provide an abstraction, in the form of a language, that makes buffer overflows all but impossible. Do you understand now how it's possible to create safe abstractions in an unsafe language?
Regardless of this being wrong, it also doesn't prove your point. The compiler and the built code are two separate executables. Your "C plus abstractions" language still allows "C without abstractions" within the same executable. And that's the dangerous part, because now you're talking about audits to ensure that only your "C plus abstractions" language is the one being used.
obviously decades of security advisories mean it's not easy. what in the world does 'easy' mean in your usage? surely "tons of failures" is a better indicator of difficulty than "it feels easy to me" or "that's naturally how my brain works," etc.
and of course, if lots of people fail at something, it will get proportionally harder with a larger team / larger codebase / different stakeholders demanding xyz, etc etc etc.
not chiming in about the basic disagreements in this thread, but the repeated claim that something that has caused trouble for countless professionals is "easy" just can't be right. if all the qualified people are "unqualified" because they all fail at something "everyone can do," the speaker is the one who's confused, and it's a lexical problem.
I never said everyone could do it. But a competent C programmer definitely can. I suppose I am an elitist in the sense that I don't think novice or careless programmers should be writing production code.
I mean people manage to do a lot of dangerous things due to sloppiness - cause car accidents, for example. I don't think it's hard to avoid that, and I don't think those people should be driving. I suppose others think we need wider roads and bumper cars.