Rust has some very desirable properties to me. Writing Rust programs from scratch is not as scary as the internet made it sound, either. The documentation is excellent, the compiler's diagnostic messages are very helpful, and the notorious borrow checker didn't stand in my way that much. And I love Cargo and crates.io. I have some projects where Rust is the saner choice than Go or other GC-based languages.
That said, there are real drawbacks to Rust compared with Go, IMHO. When facing a moderately large project written by others, the ergonomics of diving into the codebase are not as smooth as Go's. There is no good full-source indexer like cscope/GNU Global/Guru for symbol navigation across multiple dependent projects. Full-text searching with grep/ack doesn't fill the gap well either, since many symbols in different scopes/paths are allowed to share the same identifier, and call sites need not spell out the full path. That makes troubleshooting/tracing a large, unfamiliar codebase quite daunting compared with Go.
Hmm, I've had a very nice experience using Rusty Code in VS Code. Some useful refactoring functionality is missing for sure, but a lot of that will become possible quite shortly via the RLS (Rust Language Server, a la how TypeScript works in VS Code), and if your preferred editor supports the language server spec (it's an open, common spec, not specific to Rust), it will get the same features at parity, too.
Can anybody make a strong case to me as to why buffer overflows are considered an issue in C, when it takes like 10 minutes to write and test an array implementation that prevents that from ever happening? I do agree that C has issues (though in my opinion neither Rust nor Go addresses almost any of them); I just don't understand why buffer overflows are such a huge problem in C when the same thing is going to come up when trying to work with memory in Rust.
> Can anybody make a strong case to me as to why buffer overflows are considered an issue in C, when it takes like 10 minutes to write and test an array implementation that prevents that from ever happening?
The CVE database. Just because you 'can' write such an array implementation doesn't mean you will, doesn't mean your third-party libs will, doesn't mean any of your legacy code uses it, and certainly doesn't mean you will test said array implementation correctly.
The number of mitigations added to C compilers and OSes dealing mostly with C and C++ code. ASLR, W^X, /GS, -fstack-protector-all, AddressSanitizer, ... - note the lack of similar tools, or demand for them, for, say, JavaScript - despite it enjoying a similar ubiquity.
I ask this in bad faith: I encourage you to share a single nontrivial codebase which actually creates the abstraction you've described and religiously adheres to using it throughout. As to why this is in bad faith: I'm defining "nontrivial" here to mean using 3rd party APIs - which will operate on C-style arrays, not your project-specific safe wrappers - and thus by definition won't be "religiously" sticking to said abstractions when using said APIs. By these definitions, the codebase I'm asking for doesn't exist - by definition. Even relaxing the "third party" rule, I haven't actually worked on a nontrivial C or C++ codebase without buffer overflow problems.
Now, e.g. Rust will have the same problems when interacting with C APIs - and nontrivial programs will end up doing so eventually. However, by virtue of the language itself embracing safe-by-default, you're less likely to run into the same problems when consuming Rust APIs.
You can also use third-party static analysis tools to ensure you're using a "safe C subset" (such as MISRA C), but "nobody" does that.
> I ask this in bad faith: I encourage you to share a single nontrivial codebase which actually creates the abstraction you've described and religiously adheres to using it throughout. As to why this is in bad faith: I'm defining "nontrivial" here to mean using 3rd party APIs - which will operate on C-style arrays, not your project-specific safe wrappers - and thus by definition won't be "religiously" sticking to said abstractions when using said APIs. By these definitions, the codebase I'm asking for doesn't exist - by definition. Even relaxing the "third party" rule, I haven't actually worked on a nontrivial C or C++ codebase without buffer overflow problems.
I work on a C codebase that does this, although in the slightly weaker sense that it does drop the abstractions at a few isolated interaction points with external APIs (think OpenSSL, Linux system calls, and not a whole lot else). Yes, there is quite a lot of NIH. With essentially-uniform use of checked data structures, and an extremely comprehensive suite of automated tests getting run under ASAN (originally Valgrind), memory safety errors almost never get so far as being committed to the main branch. This is a complex, >1M SLOC distributed system that has seen several years of production use at this point, and as far as I can recall we have not seen a single memory-safety-related issue in production (a few have managed to get as far as certification testing). General resource-leak-class issues have struck a few times, but are also pretty rare.
Proprietary, naturally, so I can't actually show you (sorry), but it absolutely can be done in practice. It isn't even really all that difficult, it just needs to be done from the start, and then you just need a bit of discipline to keep it up.
And more power to you. Note the beginning of the parent comment, however:
> Just because you 'can' write such an array implementation doesn't mean you will
So yes, even if the codebase you work on does have these 'mythical', hard-to-achieve properties, that doesn't mean that most or even many C codebases will.
Good engineering entails observing what problems actually occur and working to fix those. Memory safety issues do commonly occur in C codebases. Regardless of whether the fix in C is simple or even trivial, programmers aren't doing it. So, Rust has some value because it forces the programmer to produce code that is largely free from this type of issue.
Enforcing norms like 'be more disciplined when writing C' or 'stop using external libraries' is much harder than simply using a different language.
> I work on a C codebase that does this […]. Yes, there is quite a lot of NIH. With essentially-uniform use of checked data structures, and an extremely comprehensive suite of automated tests getting run under ASAN (originally Valgrind) […]. This is a complex, >1M SLOC distributed system that has seen several years of production use at this point […].
>
> […] it just needs to be done from the start, and then you just need a bit of discipline to keep it up.
Here's a neat idea: wouldn't it be cool and save a lot of time if the compiler did this for you automatically, from the start?
Of course, you say it's easy to do it manually, but something tells me your company might have paid less for development if the compiler had done it automatically, with no human intervention required.
> Here's a neat idea: wouldn't it be cool and save a lot of time if the compiler did this for you automatically, from the start?
Yes, but when the project started, the only existing language whose compilers met all our requirements was C (also C++, although that was not chosen, for reasons I disagree with). We are in a domain where we derive material benefits from the low-level control C gives us (we have a bunch of highly specialized memory management and I/O), and we are not willing to accept GC pauses. There's a common sentiment that we would have used Rust if it had existed when we started, but it didn't, so we didn't, and so it goes.
But Rust still won't warn about an out of bounds access (when accessing using a variable) at compile time, and your code will panic at runtime. This isn't the "safety" anyone ought to be expecting from a language billed incessantly as safe.
Rust, at least in this regard and probably others too, is no better than C, and for me it isn't enough to justify the horrible and complex syntax.
What do you expect a language to do if you index an array using a runtime-calculated value?
Rust checks at runtime and panics if your program exceeds the bounds. You can opt-in to asking if the bounds are exceeded and fail gracefully if you like, or if you want to promise the compiler you know for sure your bounds are tight, you can use unsafe blocks and act like C. Opt-in to danger.
C lets you do it with no checks. You have to opt in to the safe path of checking and failing. You don't automatically segfault if you exceed the bounds; instead you read arbitrary memory. Welcome to the land of undefined behavior. You may crash, but more likely, you will read some value from an unexpected place, and carry on executing incorrectly for who knows how long. Opt-in to safety.
That's what people mean by Rust is safe by default. And that's just bounds checks. Carry that notion over to pointers, references, threads, lifetimes, ...
What ever made you think "Safe Rust" meant "compile time checks of runtime values are possible" or "C is just as safe because it lets you index outside an array"?
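A minimal sketch of the three access styles described above - panicking index, checked `get`, and `unsafe` unchecked access:

```rust
fn main() {
    let v = vec![10, 20, 30];

    // Default: indexing is bounds-checked; v[5] would panic at runtime.
    assert_eq!(v[1], 20);

    // Opt-in graceful handling: get() returns an Option instead of panicking.
    assert_eq!(v.get(1), Some(&20));
    assert_eq!(v.get(5), None);

    // Opt-in danger: unchecked access, C-style; requires the unsafe keyword.
    let x = unsafe { *v.get_unchecked(2) };
    assert_eq!(x, 30);
}
```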
> What do you expect a language to do if you index an array using a runtime-calculated value?
Optimizers already try to prove index bounds to eliminate unnecessary checks, and static analysis tools try to demand necessary ones. Turning the latter into compile-time errors is a reasonable approach if your language can provide sufficient information to deal with false positives - likely by forcing you to add your own bounds checking to explicitly handle out-of-bounds cases.
A language John Carmack was using or researching at one point comes to mind, which had this kind of thing going on IIRC. I'm afraid I can't find it off hand, so I might recall in error.
Sure, but this can be the same in Rust or C, which was the context of this discussion. New languages may provide more tools to eliminate more checks, but in general, if you have a runtime value and you index an array with it, someone has to check somewhere.
And the point of this thread is that Rust's default (check index bounds and fail at runtime unless the check can be proven unnecessary by the compiler or optimizer) is safer than C's default (don't check anything by default and hit UB if the index is out of bounds).
>You don't automatically segfault if you exceed the bounds; instead you read arbitrary memory.
GCC and clang both have sanitizers either built in or available for them. Sure, it's not default, but let's not act like there is no choice in C but to account for every OOB access while programming or to read memory you don't want to.
Furthermore, I never said that compile time checks of variables are possible, but rather we could move to using dependent typing, or at least a way to judge whether a variable would work as a subscript based on the type of the array and variable.
The Rust designers didn't do that. Instead they put in a feature common in the two most popular C compilers to "panic" at runtime instead of accessing memory. That's nothing. It's rubbish. And if you know to "catch" the panic, why don't you check the value of what you're subscripting with? Saying you can catch the panic is missing the point of unintentional OOB accesses, which is that they're unintentional.
C is just as safe with regard to OOB accessing, and to be honest that's pretty poor in 2017.
If you are using gcc or clang, you have more options. True. But not all C compilers give you those options. However, the point is moot, since I never said you can't catch these things in C; I said it wasn't the default. Which you agree with.
> I never said that compile time checks of variables are possible, but rather we could move to using dependent typing
You didn't say anything about dependent typing. You said "Rust is no better than C". And I'm pointing out that it is. Dependent typing may be even better in some cases; I'm not arguing otherwise.
> Saying you can catch the panic is missing the point of unintentional OOB accesses, which is that they're unintentional.
No one said you should catch the panic. You can use Vec::get() for example if you are using runtime-derived indices and want bounds checking in an ergonomic fashion.
And saying a panic for unintentional OOB is the same as in C is not true, since you get a panic by default in Rust, and to get one in C, not only must you be using a specific compiler or two, you must have the sanitizers enabled for every source file in your program. Not "by default" by any stretch.
> C is just as safe with regard to OOB accessing, and to be honest that's pretty poor in 2017.
It is nowhere close, and saying it is is pretty poor in 2017 as well.
And you are still ignoring pointers, references, lifetimes, threads, ... you know, the other things that also help Rust make "Safe by default" and C "dangerous by default".
If you are not sure whether the index is in bounds or not, you could use `slice::get`, which returns an `Option<&T>`. Sure, it's a little more typing compared to `[i]`, but how would one solve this in any language?
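For example, handling a runtime-derived index that may be out of range with `get` looks like this (a small sketch; the index here is just a placeholder for user input):

```rust
fn main() {
    let data = [1, 2, 3];
    let i = 7; // pretend this index arrived at runtime, e.g. from user input

    // get() returns Option<&T>: no panic, you decide what out-of-bounds means.
    match data.get(i) {
        Some(x) => println!("data[{}] = {}", i, x),
        None => println!("index {} is out of bounds, falling back", i),
    }

    // Or collapse the Option to a default value:
    let value = data.get(i).copied().unwrap_or(0);
    assert_eq!(value, 0);
}
```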
C will actually read arbitrary memory, Rust won't, that's the difference.
We're talking about a situation where the length of the array is not statically known and you access it out of bounds at runtime. Rust checks first if it's out of bounds, and if it is, DOES NOT blindly read the memory anyway (as C would) but exits.
I don't understand how you could think the two situations are at all equivalent.
Panics can be a vector for denial-of-service attacks. OTOH, that's still better than remote code execution, and unwinding panics can be caught. Then again, I see that failing to catch them across an FFI boundary is UB, which could get back into RCE territory through sufficiently convoluted shenanigans...
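A sketch of catching an unwinding panic at a boundary with `std::panic::catch_unwind` (note the default panic hook still prints a message to stderr; this only prevents the unwind from crossing the boundary):

```rust
use std::panic;

fn main() {
    let v = vec![1, 2, 3];

    // catch_unwind turns an unwinding panic (here, an out-of-bounds index)
    // into an Err value instead of letting it propagate further.
    let result = panic::catch_unwind(|| v[10]);
    assert!(result.is_err());

    println!("caught the out-of-bounds panic instead of crashing");
}
```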
Would you care to point to or write a blog about the principles of this strategy? It sounds immensely useful to the many C developers out there as a potential set of best practices to adopt.
I was thinking about mentioning MISRAble C. Outside the infotainment system, you should never have a buffer overrun on any micro in your car. Part of that comes from a ban on the use of malloc(). There is still a risk of array bounds problems, but those seem easier to avoid and more likely to be caught in testing.
Why no malloc? It was originally due to the small memory sizes for code and data. The less standard library the better, and dynamic allocation may lead to heap fragmentation and a subsequent crash when malloc fails.
I note that Firefox is using some Rust code now - so perhaps that will change at some point, for at least one of the common JavaScript implementations, in the not too distant future. I don't imagine we'll see it for the majority within the decade - but who knows, maybe I'll be pleasantly surprised.
I have less hope for the widespread adoption of OS kernels written in safer languages - given the general unwillingness to even use C++ there (although plenty of toy/'research' kernels in safer languages do exist.) Although maybe we'll see one within the next century? Perhaps a microkernel for use in containers?
Of course, that still leaves bugs in the JITs, compilers, hardware, 'legacy' native interop, unsafe{} blocks, ...
> I note that Firefox is using some Rust code now - so perhaps that will change at some point, for at least one of the common JavaScript implementations, in the not too distant future. I don't imagine we'll see it for the majority within the decade - but who knows, maybe I'll be pleasantly surprised.
Maybe. I think there's currently no plan for it. Maybe after they finish Servo to the degree where it supports all modern HTML features.
Historical accident, given that the better alternatives lost, for various reasons, their market share to C and C++, while Sun and Microsoft had the dumb idea of not supporting AOT compilation to native code in their new languages from day one.
It's crazy that this isn't solved above the language level. If people really want zero-cost abstractions and architecture friendliness, tooling should at least check buffer logic and flag the binary in case warnings have been ignored.
It is by definition a "zero-cost abstraction." Let's ask Stroustrup, who coined the term:
> C++ implementations obey the zero-overhead principle: What you don’t use, you don’t pay for. And further: What you do use, you couldn’t hand code any better.
Two points:
What you don't use, you don't pay for: if you don't use array indexing, you won't get a bounds check. In addition, you can call an access method without a bounds check as well, so it truly is only if you use the checked version.
What you do use, you couldn't hand-code any better: that bounds check is written the exact same way you'd write it in C.
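That equivalence can be sketched directly: the bounds check behind `v[i]` is conceptually the same `if` you would write by hand (the real standard-library implementation differs in details, but not in cost):

```rust
// Conceptually what a checked index v[i] expands to: a branch, then an
// unchecked load. Hand-written C would contain the same branch.
fn checked_get(v: &[i32], i: usize) -> i32 {
    if i < v.len() {
        // Safe: we just verified the index is in bounds.
        unsafe { *v.get_unchecked(i) }
    } else {
        panic!("index out of bounds: the len is {} but the index is {}", v.len(), i);
    }
}

fn main() {
    let v = [1, 2, 3];
    assert_eq!(checked_get(&v, 2), 3);
    assert_eq!(checked_get(&v, 2), v[2]); // same result as built-in indexing
}
```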
This is actually a very helpful comment. I used to think "zero-cost" meant "at compile-time", as in `newtype` in Haskell, etc. I'm guessing that's what the parent commenter thought as well, and I'd guess is what most people think when they hear the phrase.
I think that's why Stroustrup says "zero overhead" instead of "zero cost". There are costs to many of these abstractions; some at compile time and some at run time. For me, "zero overhead" conveys this a little better.
Well, you can still sort of view it that way. You can imagine the bounds check being a "compile time generation of the C code you'd write to check the bounds anyway".
The difference is that it's not dealt with entirely in the compile phase. i.e. language features that are checked at compilation and known to be true that are not needed at runtime.
The Haskell `newtype` example I gave was meant to illustrate this, as newtype's are respected by the type system and then are treated as the underlying type at runtime.
This is a misrepresentation of his comment. By your interpretation, you could call GC zero-cost!
Most code doesn't use bounds checking, because the branch is a safety net you should never hit, even in theory. Any code that does hit it is already broken. Correct programs using bounds checked indexing will in general be slower than but equivalent to a program where indexing instead results in undefined behaviour.
Most GCs would violate the "What you don't use, you don't pay for". That is, they add runtime cost (and "the size of the runtime" size) to code, even code that doesn't allocate.
"You couldn't hand-code any better", well, I won't argue on that point, as it sounds contentious. ;)
_Should_ never hit is very different from _will_ never hit...
If you never allocate, there's nothing stopping the compiler from optimizing the GC out. Then you get your first property back, in the sense you originally gave.
My point is that Bjarne Stroustrup wasn't comparing against writing the exact same program the exact same way. He was comparing against what you'd get if you dropped down to ye olde C or Assembly and wrote the same algorithm there, without redundant work or waste.
The comparison shouldn't be the language's GC versus SteveGC, it should be the language's GC versus an ideal, manually implemented allocator. Equally it shouldn't be built-in bounds checked indexing versus manual bounds checked indexing, it should be built-in bounds checked indexing versus an ideal, manually implemented indexing scheme. If you want safety against out-of-bounds, it seems to me the ideal method would be a proof, not runtime overhead.
> there's nothing stopping the compiler from optimizing the GC out.
I don't know of a single language that comes with a GC that does this, do you?
> He was comparing against what you'd get if you dropped down to ye olde C or Assembly and wrote the same algorithm there, without redundant work or waste.
Right. I agree with this.
But basically, we are arguing over an extremely fine semantic, which is "should you even want bounds checks in the first place." If you don't, then don't use a method that has bounds checks. The one that does will have them. They'll both cost the exact same as writing it in C or assembly.
I'm not arguing about whether bounds checks are good, but about whether it's correct to call them zero-cost in the canonical sense. Doing so just devalues the term, and IMO feels like misdirection.
To put it another way, let's say I was a C++ developer on the fence about Rust. If I read this conversation, I'd see that indexing gets called "zero-cost" despite the overhead. Since tons of things in Rust are "zero-cost", like traits, closures, borrowing, etc., all of those things now have doubt cast on them. How can I really trust that these things are actually getting compiled efficiently?
If instead the conversation pointed out that this was one of a few cases where safety took priority over truly being zero cost, but that there were tools in place to mitigate the cost (iterators, unsafe indexing, LLVM), I'd have a much more positive outlook that focussed on what Rust did right.
I would suspect that a C++ programmer would be more familiar with that actual definition I mentioned originally, and so would understand the subtleties here.
That said, I can appreciate focusing on other things when talking about the principle; I only brought them up here because we were literally discussing them. I think there are much better examples when actually attempting to convince someone.
I think you have a good point. It makes more sense to think and talk about Rust's implementation of bounds-checked indexing syntax as a zero-cost abstraction over writing `if` statements everywhere. I.e., assuming you want bounds checking, you can't do better than what Rust does, despite having nice syntax, etc.
I think something like this actually happened with one of the first Scheme implementations, but that was because they wrote the GC in Scheme, so they had to do it that way for a very small subset of the system (i.e. the GC code). But yeah, other than that I can't think of any examples (and my example is kind of the exception that proves the rule).
Honestly, we need a few AI coders to replace most of the developers in the world and then this won't be an issue. Bounds checking arrays and calloc instead of malloc isn't rocket science. It's a simple formula.
The problem isn't the language it's the developers.
A change in language does not completely solve the problem. Heartbleed was caused by buffer re-use without zeroing in between uses. A high performance network application could very easily do the same in another language.
That's a different "problem", not to mention a flaw in the application code's design rather than an example of one of the language's building blocks being fundamentally able to allow the entire execution path to be subverted.
OK, so in C you can smash the stack and that's bad. True. But I think you fail to see the larger point I'm attempting to evoke: that array bounds are artificial in the first place, and this doesn't just surface in C.
The "Heartbleed in Rust" example is a great one, and it arises in real life in many high-level language APIs for file I/O and sockets. You have an allocation, and you have a count of available bytes coming back from a read() function which may be lower than the allocation size. So you are creating a "virtual" array bound from nothingness. Fail to respect it (without bounds checks) and you will see bugs.
If you reject that this is a valid way to write code, maybe in your API every read() style function will always return the correct size enforced by your JVM or whatever, but you will do too many allocations and over-tax the GC.
If you accept that this makes sense, then you must embrace a more C style way of thinking, where array bounds are created and destroyed at will and must be enforced through your own actions... And suddenly you see the other side of this coin, which reflects valid and true things about the universe, that you may want to chop up a buffer into multiple pieces - and that's OK.
(Now, I wouldn't be surprised if Rust has mechanisms to chop up arrays in the way I describe and enforce the bounds you provide it... Which would be handy. But frankly does not completely destroy the validity of the C approach or substitute for a proper understanding of it. Without that understanding, you will code more heartbleeds.)
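For what it's worth, Rust does have exactly this: a slice carries its length with it, so a "virtual" bound like the read() count becomes an enforced one, and chopping a buffer into pieces keeps each piece bounds-checked. A small sketch:

```rust
fn main() {
    let buf = [0u8; 16];

    // A read() might only fill the first n bytes of the allocation.
    let n = 10;
    let valid = &buf[..n]; // the "virtual" bound, now carried by the slice

    // Chop the valid region into pieces; each piece keeps its own length.
    let (header, body) = valid.split_at(4);
    assert_eq!(header.len(), 4);
    assert_eq!(body.len(), 6);

    // Access past a piece's bound fails instead of reading the rest of the
    // underlying allocation: body[8] would panic, body.get(8) is None.
    assert!(body.get(8).is_none());
}
```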
His point is that you're not forced to do that. And anyhow, that doesn't solve the issue since you can bungle the creation of the slice with the wrong offset or length.
Not bungle in the sense of overflowing the underlying buffer, but overflowing the logical buffer that is contained within it, i.e. getting the wrong slice.
My point is, a lot of people are spending time on this when it doesn't matter. In the limit where AI starts replacing human developers, these subtle differences between languages approach zero.
New languages here and there every day. Replace this replace that. When, in the end everyone is simply reinventing the "wheel" over-and-over.
This sounds a lot like "why clean up my room when the heat death of the universe is coming anyway?", but if you do think that AI is going to supplant all of programming then be the change you want to see in the world! Get building and we'll see which one happens first.
I honestly don't know who I'd put my money on between "AI takes over the world" and "programmers stop writing buffer overflows"
My guess is that if they make a general purpose programming AI then all other jobs will also be nonexistent besides being famous and doing YouTube reviews of movies. My thought is that the problems in the way of AI programming are more difficult and can be generalised to enough other jobs that programming is going to be the last job automated.
People are spending time on it because such a capable AI doesn't exist and language design problems do. If you changed either of those things, you'd be making a strong argument.
The lack of generics means your array implementation is going to either:
- be implemented with macros and token pasting, and result in a ton of mental overhead because you'll have a pile of types like array_foo for an array of `foo`s, and array_bar for an array of `bar`s, along with a pile of corresponding `foo * array_foo_get(array_foo, size_t)` and `bar * array_bar_get(array_bar, size_t)` functions.
- or, have a runtime cost and lose type safety by storing void* and casting when accessing.
The first case is even worse than it sounds: e.g. I don't know how you handle arrays of types with spaces in them (like `unsigned char`, or `struct bar`) with a macro. And, we haven't even thought about const correctness yet, which would probably require having const_array_foo, const_array_bar (etc.) types defined too.
(And, of course, these only solve one facet of the problems with C's pointers: there's no way to defend against use-after-free or dangling pointers.)
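For contrast, a language with generics needs only one such definition for all element types. In Rust the built-in slice already is that abstraction, but even a hand-rolled version is a single generic function, with no macros, token pasting, or void* casts (a sketch):

```rust
// One generic definition covers ints, strings, structs, etc., and const
// correctness falls out of &T vs &mut T rather than needing parallel types.
fn safe_get<T>(arr: &[T], i: usize) -> Option<&T> {
    if i < arr.len() { Some(&arr[i]) } else { None }
}

fn main() {
    let ints = [1, 2, 3];
    let strs = ["a", "b"];
    assert_eq!(safe_get(&ints, 1), Some(&2));
    assert_eq!(safe_get(&strs, 5), None);
}
```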
C macros-faking-generics really aren't that bad (your "unsigned char" case is really easy to solve - use another macro). It's a bit goofy having different types floating around like "array_of_int", "array_of_float", but you can create them in a single line when needed, and once you create them, they work, and efficiently.
They're not an ideal solution by any stretch, but it's not the nightmare scenario you envision wrt generic data structures in C.
I would be interested in the specifics of the spaces thing (knowing more about ones tools is good) but "10 minutes" to handle all the various issues and bugs that will inevitably pop up from wrangling macros?
Yeah 10 minutes is quite a bit off, at least for me. The various issues and bugs that arose all came about when developing the data structures themselves - they didn't crop up in actual usage. Though my approach was to use a separate header and source file for each new data type I parameterised them by. So I had an "int_array.c" and an "int_array.h". And inside the header would just be function declarations generated by the macro. I basically didn't like the idea of dumping in a whole implementation of a data structure and all its operations with a macro every single time I wanted to use it. YMMV but I found it worked well.
That approach of a single invocation was what I would personally assume would be done (or put it in the header that defines the data type, for a custom one), but I'm not sure how that solves spaces?
Having worked in a code base like this (prior to migration to C++11), I'd agree that they're not necessarily a "nightmare", but they lead to code that's so verbose that it tends to really obscure the actual logic of a piece of code. (Especially anything more complicated than a simple array, e.g. associative maps or multi-dimensional arrays, and iteration constructs can be hellish.)
I actually did this¹⋅² some time ago. Since then I've moved on from being a hardcore C developer to C++ and Ada³, but it wasn't necessarily a bad solution. I think, given how little C gives you to work with, it's the best you can do.
It does make debugging a chore. I ended up with a Makefile rule to run the test suite through the preprocessor (and did some hackery to exclude #include of system headers), format it with clang-format, and build that. Not exactly pretty or easy, but it got the job done.
No, you just allocate enough space to store an extra int at the start for the length, and return a typed pointer to the actual data. Then you need an accessor that checks bounds, if you want safe access. Both of these problems are solved by simple macros.
So you want the array to have type foo * ? Ignoring that this doesn't let the compiler help the programmer with arrays (you still have to manually remember to use the accessor, not []), you also have to manually remember which pointers are pointers and which are arrays, and this representation doesn't work for pointing into subsections of an array (a similar problem to C-style strings), nor does it work well for putting arrays on the stack, which means one is forced to allocate every array (both of which mean the safe C is likely slower than the equivalent in Rust or even C++).
I agree that having to remember is a problem, it's one of the many shortcomings of C that it doesn't let you differentiate between types at compile time.
Pointing into subsections works fine. You just have to create a type for it. This solution doesn't have the same problems as strings because you don't rely on a terminating entry, and it's what languages like Rust or Java do as well.
You can allocate dynamic arrays on the stack in C just fine with alloca(). The only performance cost is when checking bounds, but since it's a dynamic array, it's the same cost you'd pay in Rust.
Creating a type for it, for each type of array, will require exactly the macro array thing I was talking about. And see the sibling comment for how dynamic arrays/alloca isn't relevant, I'm just talking about static arrays. (Dynamic arrays on the stack do have a performance cost, as they get in the way of the compiler's optimiser/code generator: having non-fixed stack frames makes accessing locals annoying.)
I'm not even talking about variably sized arrays, just creating a statically sized one and passing it into functions that take dynamically-sized one. For instance, a read function that fills an existing buffer doesn't care if the buffer is on the heap or on the stack, it only cares that it doesn't overrun the bounds.
alloca-style variable arrays is a whole other can of worms of danger and complexity.
Arrays and pointers in C already have that int. That's why sizeof() works. The issue is an extra if statement on every single array and pointer access.
They don't, sizeof is a compile-time constant. On a pointer, sizeof() just reports the size of the pointer itself (i.e. 4 or 8 bytes on most modern platforms), not the size of the data to which it points (and sizeof(*pointer) reports the size of the type to which pointer points, it doesn't know anything about how many values of that type are stored). For an array, the length is known statically (i.e. it's in the type), and so the computation can be done at compile time.
While "unnecessary"/extra generated code is a trade-off one has to consider when choosing to use the specialize/monomorphise-everything implementation of parametric polymorphism (it isn't the only one), it isn't a problem in this case: the types themselves are a compile-time abstraction and don't exist at runtime, and the functions are all tiny (a branch, a memory access and a function call/abort).
Additionally, all the functions should be inlined anyway because the function call overhead will likely be as much or more than the actual code, and, more importantly, inlining enables other optimisations (removing the branch, vectorising the memory access, etc.). Once inlined, the code will be the same as the manual/macro-based approach of writing `if` statements around each array[index] access.
Because your 'safe' implementation will certainly have a performance cost, and won't be the default. This is why, despite C++ providing std::array, you'll still find buffer overflows in C++ code. C++'s std::array provides the safe 'at' function but you're opting into a performance penalty and it's not the more familiar [] syntax.
Rust arrays/vectors are safe by default. To use the unchecked, unsafe version requires using the 'unsafe' keyword.
    let v = vec![0, 1, 2];
    unsafe {
        let x = v.get_unchecked(5);
    }
This means you can basically grep audit for vulnerabilities, and the above code should be very rare.
In addition, the Rust compiler can also remove the built-in indexing checks if it can prove the code is safe. So, say, iterator loops over an array won't have any index checking.
As others have said, this is a fairly standard optimisation: compilers (Rust, C, C++, Java, PHP, whatever) will remove branches if they can see that it can never be taken. Iterators are slightly different in that they don't have any indexing checks at all: an array iterator yields a plain reference, and these are always dereferenced like plain pointers.
Iterator loops are different. They don't elide bounds checks automatically, as is frequently claimed. Instead they merge the check with the loop branch, so there's no redundant checking.
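A sketch of the two loop styles under discussion (function names are mine, for illustration):

```rust
// Indexed loop: each v[i] is a checked access, but since i < v.len() is
// provable from the loop bound, the compiler can typically fold the check
// into the loop branch rather than checking redundantly per element.
fn sum_indexed(v: &[u64]) -> u64 {
    let mut total = 0;
    for i in 0..v.len() {
        total += v[i];
    }
    total
}

// Iterator loop: elements are yielded as plain references, so there is
// no index and no per-element bounds check to begin with.
fn sum_iter(v: &[u64]) -> u64 {
    v.iter().sum()
}
```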
Neither C nor C++ knows, at the language level, the size of an array, unless that size is fixed. The subscript checking variants of C and C++ have to use "fat pointers" which carry along size information. The overhead for this is large and nobody uses that. Fat pointers used to be a feature you could turn on in gcc, but it's somewhat abandoned now.
> Neither C nor C++ knows, at the language level, the size of an array, unless that size is fixed.
Neither does Rust.
> The subscript checking variants of C and C++ have to use "fat pointers" which carry along size information.
So do Rust's Slices.
> The overhead for this is large and nobody uses that.
People use std::vector all the time for this purpose in C++. It has about the performance you'd expect, with very little overhead except where you want it in bounds-checking.
I don't think there's actually a performance difference here. Rust's default is safer because it requires dropping to unsafe code to do something dangerous, but the same optimizations are available in both.
One thing that I've heard might be a difference, but haven't confirmed yet: Rust's lack of move constructors. So you have a vector, it's full, you push one more. It has to reallocate. How do you copy all of the elements over to the new allocation? In Rust, it's a straight memcpy of T * n bytes. But due to move constructors in C++, IIRC they must be moved one at a time.
Again, I haven't actually dug into this; maybe someone more knowledgeable about this can point me in the right direction here?
This is correct. I suppose if you could get folks to mark all memcpy move ctors explicitly with a macro instead of relying on the default you could specialize std::vector's move with sfinae. Bit hacky. It already specializes for pod types though.
Lack of move and copy ctors in rust greatly simplifies things like this, and makes it very explicit when code is running, but the trade-off is that intrusive data structures are hard to do on the stack in rust.
Well, for trivially copyable types[1] the reallocation can be a straight memcpy. For the rest, I don't know that having or not having a move constructor is the important distinction; it will be preferred over the copy constructor if it is declared as not throwing exceptions, but either way some constructor of the object must be called if it exists (though it might be inlined and optimized away).
I imagine Rust does something similar, copying bytes if the underlying type has the `Copy` trait and calling some actual code if not, but I'm not familiar with the details.
No, all moves in rust are memcpy if not optimized out entirely. Rust has affine types so moves don't need to "invalidate" the source value at runtime, the compiler just doesn't allow you to use the source variable after a move.
Rust's answer to copy ctors is Clone, which is always explicitly called. Variable use in rust is a move. Trivially copyable types (Copy) will be copied without invalidating the old value at compile time.
> copying bytes if the underlying type has the `Copy` trait and calling some actual code if not,
It does not. Moves and copies are both "memcopy these bytes", the only difference is if you can use the previous copy or not. (This is also, of course, subject to the optimizer, which may elide the copy.)
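A minimal sketch of that distinction: both a move and a Copy are plain byte copies; the only difference is whether the compiler still lets you touch the source afterwards.

```rust
fn main() {
    let s = String::from("hello");
    let t = s; // move: the String's (ptr, len, cap) bytes are copied
    // println!("{}", s); // compile error: `s` was moved; the
    // "invalidation" is purely static, there is no runtime flag
    assert_eq!(t, "hello");

    let a = 5u32;
    let b = a; // u32 is Copy: the same byte copy, but `a` remains usable
    assert_eq!(a + b, 10);

    let u = t.clone(); // deep copies are explicit via Clone, never implicit
    assert_eq!(u, t);
}
```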
> either way some constructor of the object must be called if it exists (though it might be inlined and optimized away).
Yeah, this is what I was getting at; this has to happen in C++, but not in Rust. You are right to point out that this only matters for things that aren't trivially copyable.
Interesting. How does Rust handle types that want to do interesting things when copied, like bookkeeping or updating internal pointers? Maybe you just can't, which would preclude some kinds of intrusive data structures, or owning non-copyable mutexes for example. Does `drop()` get called on objects that have been copied from?
You pretty much just can't; that's what Manish was referencing above about this kind of thing being awkward.
It would be cool to have those things, but it also means that there's less "magic" stuff going on, which is nice. And it makes the semantics of stuff like this a lot simpler.
> Does `drop()` get called on objects that have been copied from?
Nope. In fact, Copy types can't have a Drop at all, but types that move don't have their Drop impl called when they move.
Thanks for all the responses, I'm learning a lot. This explains to me why iterators and other references to the internals of a data type have to take ownership of the whole data type, which is something I ran into several times during my (brief) explorations with Rust.
Totally. There's one other subtlety you might find interesting here, and that's self-referencing structs. So for example,
    struct Foo {
        s1: String,
        s2: &str,
    }
where s2 is always intended to point at s1's backing storage. What's unfortunate here is that Rust will disallow this, as it doesn't understand that s2 is pointing to some data on the heap, with a stable address, not the parts of the String struct in s1 that are part of the struct itself. So what this means is that, in plain Rust, this type isn't movable; Rust is concerned about the invalidation.
However, you can get around this restriction with some unsafe code to teach Rust about it; this is the premise of the "owning-ref" crate.
This isn't the case. They only need to borrow it. A borrowed value isn't allowed to move (the borrow itself can be copied around and shared within the scope of the borrow), so that works out.
Most Rust iterators only borrow the container or iterator they operate on. It's only explicitly moving iterators like .into_iter() (which extracts elements by-move) that don't.
This is everyone's favorite excuse to trot out. But in reality the vast majority of projects I've seen never actually measure WHAT the performance cost would be and whether or not it's acceptable. So in the usual case the answer tends to be a mix of laziness and "that's how it's always been done."
How do you suppose runtime bounds checks are done in Rust? They certainly also incur a performance penalty in not-trivial cases.
Also, "safe by grep audit" means "safe according to a human." The argument of course is that it lowers the surface area of what a human must be trusted to verify. I'm still not convinced by that argument, because human error is a thing. And for actual systems programming, "very rare" may not be true.
> How do you suppose runtime bounds checks are done in Rust? They certainly also incur a performance penalty in not-trivial cases.
Certainly. I didn't intend to imply otherwise.
> Also, "safe by grep audit" means "safe according to a human."
Again, totally correct here.
> The argument of course is that it lowers the surface area of what a human must be trusted to verify. I'm still not convinced by that argument, because human error is a thing. And for actual systems programming, "very rare" may not be true.
Well, given a codebase where both safe and unsafe code exists, the amount of unsafe code is strictly less than the amount of both safe and unsafe code. So it does reduce the amount of code needed to audit, even in a very atypical case where a ton of the code is unsafe.
It's true that a project may use egregious amounts of unsafe. That would be unfortunate. Rust is still safer than C in that case, since it just defines more behavior (like arithmetic overflow), but I certainly wouldn't pretend that the rust code should be trusted.
When writing rust one should certainly strive to write less unsafe code, and to always document the invariants required for unsafe code to be safe.
Rust is not 100% safe 100% of the time, I'm only arguing that safe defaults are critical, and that grep auditing is a powerful tool.
I wrote a little tool to help check Rust crates on GitHub. It's been really interesting seeing how different libs use unsafety. https://github.com/alexkehayias/harbor
Rust is safe by default in terms of memory usage. It is not, however, strictly memory safe. It is trivial to overflow a buffer in Rust, for example. I haven't discovered a trivial way to hide it, though.
If you're relying on any random third-party Rust crates you haven't audited yourself, don't you lose the safety guarantee? A given crate might turn out to have implemented operations on some data-structure using unsafe blocks, and then to have failed to mark its own API functions as unsafe in turn (like the Rust stdlib does, but without the "extensive manual auditing" that the stdlib gets).
AFAIK, cargo doesn't have any feature to point out when a crate contains unsafe code—so you pretty much need to grep the source of every crate you consume for "unsafe".
There's a lot more unsafe code in Rust crates than there should be. That's a fixable problem. Some stuff from the early days predates the optimizer getting smart enough that unsafe code isn't needed. I wrote on this a few days ago in a Rust topic.
While I now mostly agree with you that there is more unsafe code than there should be, I still maintain that the frequency of unsafe in a deptree is usually still small enough to be practically auditable, ignoring FFI. It could/should be much less, but it's not too bad. I've done such audits a few times and it's not been too hard and taken very little time.
Auditing FFI is a whole other challenge, however :(
Well, yeah, but you don't really download Rust binary libraries yet :)
You do have C libraries which you access through FFI. This is inevitably unsafe. We should be auditing more there. Though IMO it's still manageable, for most crates.
Right, I was just trying to put a finer point on your use of "using any unsafe" here. It sounds like you mean "using" in the lexical sense (writing the token "unsafe" in your code), but you mean it in the dynamic sense (having an unsafe block in your control flow graph.)
Let me clarify: in Rust it is trivial to use memory unsafely. It is not, so far as I have found, trivial to hide that fact, because it is required to use "unsafe" syntax decoration to do so.
The issue here is that that definition of "safe" language basically excludes all practical languages, including languages like Python, because FFI is possible.
In general when talking about safety in a language it's about the level of explicitness required to trigger unsafety.
I like the distinction made in the nomicon (https://doc.rust-lang.org/stable/nomicon/meet-safe-and-unsaf...) -- Rust comprises two distinct languages. You have everyday Rust, which is completely memory safe, and "unsafe Rust", which looks similar to everyday Rust but is not safe. `unsafe {}` blocks are your FFI between the two. Looking at unsafe blocks as FFI is IMO a very useful mental model especially for understanding the changes to invariants involved.
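A small sketch of that mental model: a safe public function whose body crosses the "FFI boundary" into unsafe Rust, with the invariant documented at the border (the function name is mine, not a std API):

```rust
// Callers of this function never see the unsafe block; the safe/unsafe
// boundary is entirely contained within it.
fn first_unchecked_demo(v: &[u8]) -> Option<u8> {
    if v.is_empty() {
        None
    } else {
        // SAFETY: the emptiness check above guarantees index 0 is in bounds.
        Some(unsafe { *v.get_unchecked(0) })
    }
}
```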
"Actual systems programming" mostly does not involve unsafe code.
For example, most OS code is _not_ interrupt hooks or malloc but the rest of the OS. Most of Postgres is not reading data quickly, but higher-level abstractions.
Large-scale systems programming will always be mostly higher-level abstractions, because that's the only way to write large programs. Name any "systems programming" OSS project, choose a random C file, and you'll see that most code does not require pointer arithmetic except because that's how you do things in C.
You don't bit-twiddle for 200k lines of code, so being able to limit dangerous stuff like that to the 5k lines that actually need it makes work an order of magnitude easier.
> "Actual systems programming" mostly does not involve unsafe code.
True. Or slightly rephrased: Most of the unsafe code is centralized to some core pieces.
From the POV of a PG dev: The big problem using something like Rust for something like PostgreSQL is its portability, stability and uncertainty about where things are going. We do five years of back-branch releases (and for many that's not even enough!). Language and tooling around the new crop of languages simply aren't mature enough for that yet.
To be fair, the [] syntax is an overridable operator and you could just point it to use the "at" method. Not sure why they've implemented it as an unsafe function.
If people start to consider bounds-checking branches as overhead (which, in some extremely rare, limited cases, they are right to do), they should also understand what happens with e.g. some largely used calling conventions, such as the one of Windows. Even in optimised builds, a unique_ptr<> for example can have an overhead compared to a raw pointer, IIRC because it forces going through the stack. If I remember my typical latencies correctly, for modern Intel CPUs and probably a lot of high-perf CPUs, that can actually in some cases have a greater impact than an extra bounds check...
So if you are in a so performance critical section that you start to care about the "zero overhead" kind of stuff and the cost of your (predicted) bound checks, you might be impacted by this non-zero overhead... (that is falsely widely believed to be zero!)
And don't get me started about debug builds with all mainstream compilers. The performance is then complete utter shit. This is made worse by the fact that such an unsafe language needs debug builds more...
C++ was an interesting experiment in its domain, and has been and still is a success in some aspects, but honestly, given that core language modifications are needed to significantly extend the behaviour of core types (like vectors, unique_ptr, strings, etc. -- e.g. with the introduction of rvalue references and all the associated machinery, default constructors, and so on), I'm starting to believe there is little advantage of this approach over integrating such fundamental types/concepts in the core language (I'm not advocating doing that for C++, I'm thinking about other/future languages). Then you can have more truly "zero overhead" stuff, whatever that means.
> if you are in a so performance critical section that you start to care about the "zero overhead" kind of stuff
If I don't have this kind of performance needs, I have no reason to use C++ to begin with. I mean, why use such a monster if I don't need the crazy performance it may offer when used properly? And even then, C, D, or Rust may be viable alternatives.
If a call is statically dispatched, calling convention doesn't matter because static dispatch gets inlined away in hot functions. If a call is dynamically dispatched, you have much bigger things to worry about. Bounds checking is a much bigger problem for hot code.
Not 100% true when using the standard library and compilers like VC++, which do provide such checks in debug builds, while allowing you to selectively turn them on in release builds.
Would you care to provide a short rust implementation of reading an arbitrary length string from standard input?
I have rust installed. I would be curious to benchmark it and see if it is indeed faster than the same thing in C, using a user-defined bounds-checked array.
I'm not going to make any performance claims, but in the interest of sample code, https://doc.rust-lang.org/stable/std/io/struct.Stdin.html#me... has a very small program that does this in the most straightforward way. (You'll have to wrap it in `fn main()`, we don't show that in examples in the docs)
If you want a true speed comparison, you'd have to define more than just "read an arbitrary string". For example, encoding requirements. Your C is going to treat it as just a bag of bytes, and you can do that in Rust too, but it's not the most natural way; you'd convert to UTF-8, which isn't free. Stuff like that.
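In that spirit, a minimal sketch of the straightforward version (the helper name is mine; it treats the input as UTF-8 text): the `String` grows as needed, so there is no fixed-size buffer to overflow.

```rust
use std::io::{self, BufRead};

// Read one line of arbitrary length from any buffered reader; the String
// reallocates as needed, so no fixed-size buffer can be overrun.
fn read_arbitrary_line<R: BufRead>(reader: &mut R) -> io::Result<String> {
    let mut line = String::new();
    reader.read_line(&mut line)?;
    Ok(line)
}

fn main() {
    let stdin = io::stdin();
    match read_arbitrary_line(&mut stdin.lock()) {
        Ok(line) => println!("{}", line.trim_end()),
        Err(e) => eprintln!("read error: {}", e),
    }
}
```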
It's not so much a question of benchmarks, it's that one of the standard C tools for reading an arbitrary string from standard input is gets(). And if you reach for that from the standard toolbox, you've failed before you've started.
gets() isn't part of the C standard anymore (it was entirely removed in C11, and prior to that was marked "obsolescent and deprecated" 18 years ago in C99).
Many rust advocates talk about the performance cost of doing things safely in C, and inform me that rust has "Zero-cost abstractions". So it's fairly natural I ask for something I can benchmark.
> Many rust advocates talk about the performance cost of doing things safely in C
Who's saying this? It doesn't really make sense, taken out of context as it is here.
> inform me that rust has "Zero-cost abstractions"
IMO at best the array bounds checking thing is an extremely poor example of a "zero-cost abstraction"; at worst it's not an example of it at all. As I understand it, the term refers to things like static dispatch on closures and trait methods, iterator fusion, and generally any place where the compiler can transform high level abstractions into really efficient low-level code where in other languages/implementations you might incur a performance penalty, for instance by dynamic dispatch to heap-allocated closures, or allocation of an intermediate vector at every "link" in an iterator method chain.
If Rust solves the kinds of problems you have, then by all means use it. I don't think there exists a language that's in all aspects better than any other. For me personally, C is the language that gets in my way the least without forcing me to write ASM, and that happens to be highest on my list. If there was a language with the same idea of minimalism behind it as C, but with all the quirks and bad syntax choices out, I'd switch in a heartbeat.
Rust (which I have thus far only tinkered with) looks pretty appealing to me, though. Do you suggest using C instead of Rust? If so, why?
I'd probably suggest a higher level language if you can get away with it (portability, performance, etc), but I am not really advocating for using one language over another here.. What I am advocating is writing a thin layer of abstraction to avoid buffer overflows in the already extant code base. I suspect it would be a better choice in terms of cost/benefit than re-writing the code.
I just keep hearing about how rust handles safe bounds checked arrays with (and I quote) "Zero-cost abstractions", which to me implied it would be faster than C, since C pays some performance penalty by branching.
"Zero-cost abstraction" really means zero additional cost beyond what is required. So Rust still pays a cost for bounds checking, but it's the same cost as if you optimally hand-coded the bounds check in C.
Adding the cost of bounds checking is not actually "zero cost" relative to a C default of not bounds checking. There's really no reason to speak in this misleading way except marketing.
Of course, bounds checking is worth paying a cost! As are many other common foot-guns like remembering to free memory.
Which starts to raise the issue of when it would actually be necessary to reach for Rust instead of (arbitrary example) Java. Because a language with enforced bounds checking is not really the same kind of thing as C, and we've already had languages that are safer than C for decades.
"Zero cost abstraction" means that the abstraction doesn't impose an additional cost. It is the same cost of branching that C will have, modulo optimizations.
(Languages with dependent types can help move bounds checks to compile time, but Rust does not have those)
Thanks, I do. The only thing left is to explain it to everybody else, to avoid panicking in third-party crates. I really hate when people use panicking just because it's easier than error handling. So it was a "surprise" to see such behavior from the "[x]" construction.
Like in C, you are not supposed to pass out-of-bounds indexes to the [] operator. If anyone does so, it's a bug. Rust converts it to a panic, which safely terminates the thread/program, instead of C's undefined behavior.
You can also catch this panic at the thread boundary, so other threads in your program can keep running, unless the program was compiled to call abort() on panic.
And then people would simply call .unwrap() on the return of the [] operator, leading to the same situation. It's extremely common for an array dereference to always be within the bounds, by code construction (I'm iterating over the indexes of the array, or I have an index into the array saved inside some other structure, and so on). The programmer knows it will never be out of bounds. The assert!() within the [] operator is only to protect you when the programmer gets it wrong, and in a correctly written program, will never trigger.
(The alternative instead of .unwrap() would be to propagate the error, polluting the whole program with code to handle errors which never will happen. And since they will never happen, many programmers would simply start ignoring them - and in the process, they would by accident end up ignoring errors that can happen. Not a good situation.)
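For reference, the fallible alternative the parent describes looks like this: `get` returns an `Option`, so the caller decides what out-of-bounds means.

```rust
fn main() {
    let v = vec![10, 20, 30];
    // Fallible access: no panic, the caller handles the None case.
    match v.get(5) {
        Some(x) => println!("{}", x),
        None => println!("out of bounds, handled without panicking"),
    }
    // v[5] would panic; v.get(5).unwrap() panics identically, which is
    // exactly the pattern being warned about above.
}
```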
There is a HUGE difference between "unwrap" (which can be found by grep) and silent dangerous behavior. A programmer can only assume, not know, without additional checks. "Polluting with errors" is much better than panicking.
Memory safety aside, what is different about panicking? I'm not trying to argue; I don't know. For me the result is the same: the process failed. What is the difference in consequences for memory?
Not all violations of memory safety result in immediate program termination. There's an intervening period of execution during which your program simply performs unintended operations such as overwriting unrelated memory (which could be a memory-mapped file), terminating "correctly" (and performing whatever operations that entails) with corrupt state, and generally an unbounded set of other potential consequences. An immediate segfault is the absolute best hope for a program without memory safety, since you both see no unintended effects performed on the program's behalf and you learn that your program needs fixing.
There are tools you can use with C compilers that don't access OOB memory on OOB access. What's your point?
I think I'd rather not have it panic at all. They were making a language from scratch and still didn't solve the crash-at-runtime problem that C has. Instead they just made a common C compiler option default and called it 'safe'.
It's worse than the fact that the Go creators ignored years of research after the year 1970 and didn't implement generics, for no good reason.
Rust's safety is a joke, and so is the idea that a lot of it is anything 'new'. They had the chance to fix it and they didn't. Boo.
Which common C compiler has a production quality implementation of array bounds checking?
Before you say Address Sanitizer you should be aware that its authors intended it as a debugging tool and recommend against using it in production as it introduces new attack surface, as described in the "Address Sanitizer local root" mail.
Something crashing, so an error can be reported and a watchdog can restart it (etc.) is far better than just forging on with corrupt memory and wrong answers.
People do not use the sanitisers nearly as much as they should, nor are they designed to be used in production. There have been CVEs issued for them, caused by the testing mindset in which they are written.
In any case, how much a programming language helps the programmer get a correct program (i.e. the most general form of safety) is a matter of degree, not all or nothing. Rust does a lot more than C, pushing many, many errors to compile time, meaning they're fixed before the code even runs. Sanitisers only catch these errors when they actually happen, and will presumably result in the runtime crash you're so concerned about, if used in production. The fact that Rust isn't dependently typed and so can't catch OOB at compile time is unfortunate in this respect, but don't throw the baby out with the bath water.
> A panic isn't safe, and if you are going to claim that it is, you can get the same safety in C using gcc and clang features.
There's one important difference. As far as I know, the bounds checking of gcc or clang can either print a warning and continue, or terminate the process.
A panic in Rust, however, will safely unwind the stack (similar to a C++ exception, in fact it's the same mechanism) and terminate only the thread. The rest of the program can continue running, and even start a new thread to replace the terminated one.
You can see this in action when running "cargo test". A panic in a test (the assert!() and assert_eq!() macros, often used in tests, do a panic in case of failure) will not terminate the whole process; the rest of the tests still run.
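A small sketch of that per-thread isolation (assuming the default unwind-on-panic setting, not abort-on-panic):

```rust
use std::thread;

fn main() {
    let handle = thread::spawn(|| {
        let v = vec![1, 2, 3];
        v[10] // out of bounds: this thread panics and unwinds
    });
    // The parent observes the panic as an Err from join() and keeps running.
    assert!(handle.join().is_err());
    println!("main thread continues after the child's panic");
}
```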
(panics are for "non-recoverable" errors, and so don't _have_ to unwind the stack: you can abort on panic as well. The important difference is that you get well-defined behavior in all cases.)
This sounds like the "fail fast" philosophy in Erlang. Crashing is the standard behavior for any unexpected data, but the process model makes it straightforward to manage those crashes.
> i just don't understand why are buffer overflows such a huge problem in C when the same thing is going to come up when trying to work with memory in Rust.
False. Buffer overflows in C can overwrite the program's memory, so it can be hijacked and supplanted with the attacker's code. This cannot happen in Rust (unless unsafe code has the vulnerability), or any memory safe language.
Sure you can implement a safe array/buffer abstraction and use it in your C programs that abort on invalid indexing. Now how many actually do this? Very few given the prevalence of C programs on vulnerability disclosure lists.
It doesn't help as much as you'd think. With a buffer overflow on the stack, you can overwrite the stack's return address to point wherever you want, and overwrite a bit further to set the arguments to whatever you want, as well as any local variables on the stack. That depends, obviously, on what code is already there, but there's a lot you can do. For instance, if the system() libc function is linked, you can overwrite the return address and arguments to call system("some arbitrary shell code").
You don't need to execute the stack. Just use ROP to fill the stack with return addresses that point to "gadgets". Gadgets are basically a single assembly instruction at the end of a function like XOR EAX,EAX and then a RET. Every time a RET is executed the CPU will jump to the next gadget. There are usually enough gadgets inside a program to basically do anything you want.
DEP only makes attacks harder, not impossible. It's not a magic bullet for a couple reasons: it's opt-out on non-*BSD operating systems so software might not be using it, and it can be circumvented with things like return-to-libc attacks.
I would like to add that there are starting to be mitigations against attacks modifying control flow, for example CFguard under Windows (which will probably be extended to cover ROP also) but like DEP an ASLR those are only mitigations, not something that would make it useless to care about buffer overflow anymore.
In order for DEP to render such attacks impossible it would need to have permission bits at the byte level. DEP on most OSes is page-level, so the stack page is marked as non-executable, which turns an attack from executing code on the stack to executing code from the text segment - either through ROP gadgets or return-to-libc.
I think he means working with raw memory, i.e. using the unsafe keyword, and in that case he is right. And you can't implement certain things in Rust, if you are on the quest for maximum efficiency, without using unsafe.
> And you can't implement certain things in Rust if you are on the quest for maximum efficiency without using unsafe.
Trading efficiency for safety for unaudited code should be difficult. C does not make this difficult. Rust makes this difficult.
Auditing should be supported for this kind of code. C does not support auditing in any meaningful way. Auditing is much easier in Rust, since it explicitly identifies code that may be unsafe.
In conclusion, I disagree emphatically that C and Rust are somehow equivalent even when you are dropping down to unsafe code.
Working with `unsafe` code in Rust has the same potential issue to be sure, but very little (if any) of your Rust code should be `unsafe`. You can usually accomplish what you want without having to drop down into raw C pointers. And everywhere that you do have to do this is explicitly called out with the `unsafe` keyword so it's very easy to audit.
This is throwing the baby out with the bathwater though.
You need unsafe to do some things in Rust, sure. But usually this code is isolated and auditable, and nowhere near the linecount of the rest of the application. Even the operating systems written in Rust have pretty conservative use of unsafe (redox, Phil's OS, etc). Most of your code won't be working with raw memory operations, most of it will work with zero-cost abstractions on top of raw memory.
> You need unsafe to do some things in Rust, sure.
And that's what I wrote.
> Most of your code won't be working with raw memory operations, most of it will work with zero-cost abstractions on top of raw memory.
For some web applications, sure, but industry is not constrained only to writing web applications, especially if you want to get into C's market space. In HPC or time/space-constrained environments this matters.
I'm not talking about web applications. I'm talking about systems programming. Even the operating systems written in Rust use unsafe pretty conservatively. Servo uses unsafe mostly to talk to native libraries. Rust is used more for lower level programming than it is for webapps as far as I can tell.
You might want to define further what you mean by "raw memory here". Rust lets you work with arrays and vectors and the heap safely just fine. These are designed as zero-cost abstractions over raw memory (slices, Vec<T>, Box<T>). The equivalent C (with relevant bounds checks) wouldn't be any faster. Rust does not let you do things like call out directly to malloc/free safely. But that's okay. The existence of these abstractions means that you rarely have to do this.
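To make "zero-cost abstraction over raw memory" concrete, a small sketch:

```rust
fn main() {
    // Heap allocation without touching malloc/free directly.
    let boxed: Box<[u64]> = vec![1, 2, 3, 4].into_boxed_slice();
    // A borrowed, bounds-checked view into the same memory.
    let slice: &[u64] = &boxed;
    assert_eq!(slice.len(), 4);
    assert_eq!(slice[2], 3);
    // The allocation is freed automatically when `boxed` goes out of scope.
}
```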
> And that's what I wrote.
.... huh? that's exactly why I put "sure" there, that means I agree with that statement, what followed was why I disagree with the conclusions.
To add to this, I wrote a high-performance packet generator in 100% safe Rust code for a past internship. With the right backend, it could generate line-rate traffic for a 10 Gbit/s NIC.
You can't implement certain things in C if you are on the quest for maximum efficiency. Thankfully, one rarely is, because efficiency isn't binary. It's about trade-offs.
I could say exactly the same for Rust: that even the unsafe keyword is not enough and I need to drop into assembly. Someone could even say assembly is not enough and we need an FPGA, and then an ASIC, etc. But that's not the point here. The point is that, for example, you can't get safety from out-of-bounds access without bounds checking, no matter which language you use. And bounds checking on certain hot paths is not acceptable.
> And bounds checking on certain hot paths is not acceptable.
I think the history of exploits proves that it's always acceptable for server code. The only possibly justifiable place to omit bounds checking is isolated, high performance numerical code, as used in scientific simulations. But this code isn't exposed to the public, so its vulnerabilities aren't important.
It won't work in all cases :) IIRC Idris will move type info to runtime if it can't prove it at compile time.
Basically, in Idris I can have a function concat which takes a Vector<N, T>, a Vector<M,T>, and produces a Vector<N+M, T>. You can have arbitrary expressions there, and even things like a function which returns a different type based on its boolean argument.
So most signatures will just carry dependencies through, but some things are dependent on runtime input and that's when the "N" part of the type will be tracked at runtime. At least, that's how I think it works.
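As a rough point of comparison (this is const generics, not the dependent types being described): Rust can also carry array lengths in types, so a function like the one below needs no runtime length bookkeeping at all. The `N + M` result length from the Idris example is not expressible on stable Rust, so this sketch fixes concrete sizes instead.

```rust
// Lengths live in the type, so the compiler statically knows the output
// has exactly 5 elements -- no runtime length checks needed here.
fn concat2_3(a: [i32; 2], b: [i32; 3]) -> [i32; 5] {
    let mut out = [0; 5];
    out[..2].copy_from_slice(&a);
    out[2..].copy_from_slice(&b);
    out
}

fn main() {
    assert_eq!(concat2_3([1, 2], [3, 4, 5]), [1, 2, 3, 4, 5]);
}
```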
Dependent types help eliminate large swaths of bounds checks. Not all of them, and I'm not sure how much better it does than a good optimizing compiler.
> I'm not sure how much better it does than a good optimizing compiler.
It can be done by static analysis in compilers for other languages too. Knowing the size at compile time is easy most of the time, and that's not the case we're talking about at all. Your answer that Idris solves this is wrong in the context in which this discussion started (dynamically allocated memory, where the size isn't known at compile time). You still trade time/space compared with a solution without bounds checking, no matter which language you use.
I'm getting a notice that I'm submitting too fast, so the answer to your other comment will have to wait.
Yes, it can be done by static analysis, but by making size dependencies part of the API it makes the static analysis much easier and able to cross API boundaries without needing to inline, on the contrary most optimizers will not handle such invariants across a few layers of function calls. Like I said, Idris doesn't solve this problem completely, but it is able to eliminate bounds checks in way more situations because the default is effectively no bounds checking.
The nice thing about dependent types is that in all the cases where you don't need a bounds check -- where a program invariant enforces that you are within bounds -- your code will not contain a bounds check regardless of optimization. Only in the cases where there are runtime dependencies that can't be resolved will there be issues, and in these cases you need a bounds check anyway.
It's not perfect/complete, and like I said optimizers can get most of the way there anyway. I just wanted to note that there are better solutions for bounds checking.
Obviously, one can program C to do anything, and write all the provably safe abstractions wished. But, that's not really the point. The point is that doing such is not the default. It requires engagement and knowledge of the programmer, especially on distributed projects with loose communication, such as many open source projects. And it only takes one programmer mistake to bring the whole house of cards down.
Why allow programmers to make mistakes? That was fine in the 70's when resources for compiler execution were limited. I don't see any reason for it today.
I mean, just look at the underhanded C contestants and especially winners for ways in which your program can completely blow up for extremely subtle reasons.
> Isn't this also true of Rust, with its unsafe keyword?
> None of these languages are completely safe
No, but the first time I encountered an unexplained segfault in Rust was the first time I've ever found & fixed the cause of a weird segfault message in just a few minutes, because the mistake was in the 12 lines I had written inside a clearly marked unsafe block rather than the other ~3000.
Without something like unsafe blocks, finding known bugs and auditing code for other "weird" memory issues means looking much wider (and often, screaming "what the hell are you talking about?!" at the screen on a routine basis).
You are equating the unsafe keyword in Rust with entire C codebase.
Yes, carelessness in either could amount to trouble. No, they are not close to the same amount of risk because the relative amount of code in each is orders of magnitude different, and unsafe blocks can be heavily reviewed.
This is true of every language: e.g. Python has ctypes, Haskell has Foreign, Java has JNI, etc. It's a matter of defaults/conventions rather than something being literally impossible to screw up.
Totally. But in the case of rust your vulnerabilities are grep'able. For all of the code you have in a project you only have to search for the unsafe keyword when you want to audit it.
That's less of a mistake, and more of turning off the footgun's safety. Yes, it still only takes one programmer. But they have to purposefully enable such actions, as opposed to neglect to perform actions through the "proper" abstractions.
Yes. It's trivial in Rust to write unsafe code. It's less trivial to mask the unsafe code and the (unproven, in my opinion) argument is that the explicit "unsafe" keyword makes it prohibitively difficult.
Is the use of the unsafe keyword a default? I don't know Rust, but from a user interface perspective, it sounds like it has an affordance of "Hey! Pay particular attention to this bit because it is risky!"
Oh, yes, as a UI it is "be extremely wary of this code" Often folks write long comments around unsafe code explaining why it is safe. Not always, sadly.
For a philosophical counterpoint: Why allow anyone to do anything that might possibly be incorrect, harmful, or otherwise perceived by some to be negative?
I've looked at a lot of the talk surrounding "safe/secure languages", "safe/secure programming", etc., and yet every time I've heard people preach about the benefits, I feel like I just vehemently disagree. At a very deep and fundamental level, I feel like somehow we are sacrificing something more important in the pursuit of this "safety", this seemingly overpowering desire to make everything completely safe, mindlessly constricted, and stifling. It's not just software; the whole "war on terrorism" irks me in the same way. I imagine a "completely safe" world, the ones these "safe software" proponents appear to be striving for, would be rather dystopian.
A quote that immediately comes to mind is: "Freedom is not worth having if it does not include the freedom to make mistakes."
People rely heavily on software in many aspects of their lives. They entrust it with their personal information, their money, and in many cases their physical safety. Engineers building software and companies selling it are ethically obliged to make a good faith effort to prevent defects that might betray their users' trust and cause harm. Languages designed to enhance the safety and security of software written in them are one tool that can be used in this effort.
Are engineering standards for public buildings evil because they stifle architects' freedom to design whatever crazy structures tickle their fancy? Should power tools not include safety features like blade guards because their users' freedom to accidentally kill or maim themselves must be held sacrosanct? Probably to both questions, the reasonable answer is "no".
If you're building something for yourself, and offer it to others only with clear warnings, then go nuts and make all the mistakes you want. Nobody's saying you can't do that; that sounds rather dystopian, a society where pointer arithmetic is illegal! But you can't treat a project people are meant to trust and rely on as your personal art project.
> Are engineering standards for public buildings evil because they stifle architects' freedom to design whatever crazy structures tickle their fancy?
The physical analogy is good because even there one can see that there are different standards --- and, unlike what the "safe software" community seems to promote, engineers are not doing the equivalent of making every building strong enough to withstand a nuclear war and calling anything less "unsafe".
This also brings up another difference with software: the "absoluteness". In the physical world, no security is perfect. Locksmiths exist, and with enough determination, essentially anything can be broken into. But in the software world, with good encryption, that can never occur. Provably correct software can be employed for effectively unbreakable DRM and un-rootable/jailbreakable user-hostile devices, of which there is (fortunately) no similar real-world analogy I can think of.
> Nobody's saying you can't do that; that sounds rather dystopian, a society where pointer arithmetic is illegal!
Given that there are attempts at even prohibiting Turing-completeness[1][2], I would not be surprised if that eventually happens. As it is, I'm sure there are already people who would consider you suspicious if you write software in an "unsafe" language, and from there it is not far to complete prohibition.
So... you're saying it's unethical to invent tools that can be used to write software that works as intended and contains no exploitable bugs, because that software could then be used as a tool of oppression?
You can extend that absolutist argument to say that inventing anything that could be used as a tool of oppression is unethical. Like, inventing plumbing may have done great things for human society, but it was ultimately unethical because when the police came to take away your general-purpose computer they used a pipe to hit your kneecaps until you told them where it was. Or, how about this: developing any society beyond the level of the most primitive hunter-gatherer tribe is unethical, because what are governments if not the agents of oppression themselves?
This is the sort of abstract position that can't really be argued with in a vacuum, so I'm not going to try. But it's also not a useful ethic for building a modern society free of large-scale oppression, because it completely ignores the practical realities of doing so.
In the mean time, buggy, exploitable software is out there in the real world hurting real people every day.
> ... making every building strong enough to withstand a nuclear war and calling anything less "unsafe".
Yes. Except the equivalent in software is using something like Coq, where your intention is proven to be correct. (You even mention provably correct software in the next paragraph...) And yes, it would be awesome if we could prove every program, because all we would be doing is proving that the programmer's intent is correctly coded into the software, which doesn't limit freedom of expression at all. (I would argue that such is actually an even stronger form of expression, as it's afterwards impossible to misconstrue your intent.)
> Given that there are attempts at even prohibiting Turing-completeness[1][2], I would not be surprised if that eventually happen...
Rust is not one of these, and no one but you seems to have this in mind. So it's a bit of a strawman to be throwing it out here, no? But yes, obviously such a language would limit your freedom of expression.
Except we're not talking about limiting your freedom of expression. I'd challenge you to give me an example of a program that you can't express in Rust, with the obvious exception of silly and obviously unsafe things like "read from a random memory address". Which, by the way, was deemed so unsafe that the operating system itself will probably kill that program dead. (I can think of one excellent answer off the top of my head that requires `unsafe`.)
We're talking about eliminating a class of programmer error through static analysis. (If you're writing a large C program, you arguably should be using static and runtime analysis tools on it anyway.) This is no different than using a strong type system. Yes, it limits your "freedom", but sending a "banana" to a `sine` function is nonsense anyway. Well, it also ends up that reading 10 bytes into a five byte array is also nonsense, and we now have tools to detect that nonsense and tell you that it is, in fact, nonsense.
You can still be clever in Rust. You just have to try harder and explicitly say "I'm trying to be clever" here. You might make a mistake, of course. But if you need it, the option is there.
There is no philosophical beauty in typos or off by one errors. You are not being stifled when your compiler points out to you that, no, "coutn" is not a variable currently in scope. If we have a static proof that what you just did will never work, why would you still want to do it?
And if you had a reason, you still have a way to do it.
If you're trying to be clever, you need to prove that you know what you're doing.
That's a great quote, and it's one that Rust---as a memory-safe language---completely embraces. You have the freedom to do anything you want. Some of those things simply require you to type "unsafe."
Forgive me for being cheeky, but just as Rust requires you to type "unsafe," C also requires you to opt in by typing "cc".
My serious point is that in practice, the example of C shows that if it is available and people understand that it is "performant" then you will see it all the time, including in libraries you are forced to use.
> Forgive me for being cheeky, but just as Rust requires you to type "unsafe," C also requires you to opt in by typing "cc".
This isn't an accurate analogy at all because I can compile thousands of lines of Rust code without even thinking about needing to use `unsafe`. How many lines of C code can you compile without `cc`?
> My serious point is that in practice, the example of C shows that if it is available and people understand that it is "performant" then you will see it all the time, including in libraries you are forced to use.
Except this is already demonstrably false in the Rust ecosystem.
Freedom is not worth having if you squander it by making easily avoidable mistakes, either. The point is to make mistakes that lead somewhere. I.e., mistakes that we can't yet write programs to easily prevent.
And there is no shortage of mistakes available for you to make, so don't worry about that.
There is no need for another unsafe fast language. There is also no need for another safe slow language. We already have enough of those. Rust's purpose is to find a middle ground between those extremes.
And once you have written a five line safewrite() function, a hundred-line saferecvfrom() function, a 1500-line safeioctl() function, and all of that, and you have a third-party static analysis tool to prove that you're never calling the unsafe read() or ioctl() functions except from the wrappers, what was the advantage of staying in C in the first place?
I can answer: safe read/write from a simple system call can be "easy." Safe read from the network, possible but no longer easy. Safe call to ioctl, which can literally do anything with any device driver... is a "safe" version of that even possible?
ioctl is a marshaling mechanism that can represent a library API of almost unlimited proportions and variety. A safety wrapper basically has to target the specific functions, rather than ioctl itself. That has happened within Unix, with some library functions having originated as wrappers around ioctls.
safeioctl may have to contain a redundant switch (or more OOP-style dispatch across multiple functions) on the command code in order to convert the safe-style arguments to each specific low-level ioctl call. That sort of thing can easily explode in line count. I can see exactly what geofft is talking about.
Ok - it's true that ioctl has a generic interface (hadn't remembered that). We can think of it as hundreds of functions, not just one. So wrapping ioctl "completely" is correspondingly a lot of work.
However the point was of course to make a safe-land layer which contains the needed functionality. The point was never to have a function "safeioctl" in the first place.
The advantage is that you're depending only on yourself and C. That third party static analysis tool can just be "grep". Or some mild text processing on the output of "nm" to validate that no object files (other than the allowed ones) have external refs to those symbols.
All the way through the 80's and up to the early 90's, C toolchains only mattered to those working on UNIX workstations and servers. On home computers it was just yet another language, with compilers generating slower code than junior Assembly programmers.
So of course Rust has to catch up a few decades of market use.
Again, if you're disallowing platform headers and writing tooling to make sure you're never calling libc, what's the advantage of writing in C? You have all the headaches of switching to a new language, with none of the features.
And, honestly, one of the features is "vibrant community of developers." Even if Go and Rust were bad languages, which they're not, they'd still be better choices than C-with-custom-in-house-restrictions-and-libc-wrappers, simply because of the communities around them. If you're writing in Rust, someone else has already written the safe C wrappers, and if they haven't, there's a community of people who will code-review your wrappers for safety and merge them into a centrally-maintained project, which is extremely useful.
I'm not advocating use of C, just pointing out a simpler way to check dependencies.
To answer your question though, the advantage of using C is certainly not the notorious bad-habits standard library. C is fun and productive exactly where there's just you and some bits and bytes to bang around. Coding in the small. Not platforms and architectures.
You can easily access external symbols in C without including a header. The external refs in an object file are the accurate info about what is referenced.
1. Buffer overflows aren't considered the most insidious issue in C nowadays. That award would probably go to use after free, which is not so easy to fix.
2. In C, it is easier and faster to do the wrong thing. Compare "char buf[256]; strcpy(buf, foo); ..." to "array_t buf = array_create(strlen(foo) + 1); strcpy(buf.ptr, foo); ... array_destroy(buf);"
3. Buffer overflows do not in fact come up routinely in Rust the way they do in C.
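To illustrate point 3 with a small sketch: the Rust analogue of the `strcpy` example above turns silent memory corruption into either a deterministic panic or a recoverable `Option`, and growable strings remove the fixed-size buffer entirely.

```rust
fn main() {
    let buf = [0u8; 4];
    let i = 10;
    // buf[i] would panic with "index out of bounds" rather than reading past
    // the array; .get() turns the failure into a value you can handle.
    assert!(buf.get(i).is_none());

    // Growable strings need no fixed-size destination buffer at all:
    let mut s = String::from("hello");
    s.push_str(", world");
    assert_eq!(s, "hello, world");
}
```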
I'm no C expert, but those do two different things (fixed vs. dynamic length, stack vs. heap alloc). And the first one is safe with strncpy, right? Though I don't know why you'd ever have code like this unless you like undefined behavior and want to return the array. The second example seems fine and not particularly difficult or verbose.
> Can anybody make a strong case to me as to why are buffer overflows considered an issue in C when it takes like 10 minutes to write and test an array implementation that prevents that from ever happening?
That it's not a by-default and forced language-feature and that most developer aren't going to spend those 10 minutes when they need an array.
They'll just use the language-provided array-implementation instead. Which in C is very, very unsafe.
Compiler vendors have been resistant towards putting in such features. Bounds checking slows things down, and the performance race is very much a thing in C compiler implementations -- a compiler that can deliver a few percentage points better code can be a big win to teams working on compute-heavy problems. C11 has Annex K, which has a lot of safety features, like memory-safe arrays. Unfortunately, none of the vendors have implemented it even as an option. Which is a shame, because it would solve a lot of problems while requiring minimal rewrites for a lot of code.
I think it breaks down when that array has to interact with system libraries or the C stdlib in any way.
A lot of C string functions have weird gotchas related to terminators and sizes, and any IO you're doing will involve raw buffers being passed into or out of a system IO function that doesn't understand custom array types.
It's useful to be able to remove safety checks for speed. I have C++ code where all data is in array objects. Bounds checking is a compile-time option, and it makes the overall code 2x slower. I can do testing with bounds checking on, but once it gets to a supercomputer that needs to be removed. Address sanitizing by compilers is an even more effective tool for this, especially for C. Bounds checking is critical for security, but if you're only concerned with correct execution, then a segfault is not much different from an exception.
Technically you could probably limit yourself to a "safe" subset of C - basically no pointer arithmetic, no strcpy(), etc. - but that would defeat the purpose of using C in the first place.
Or perhaps just some helper functions in C that wrap array and pointer allocation/access to provide sanity checks. Seems like moving to a new language is rather extreme....
If you want to do this well, as other commenters pointed out, your entire C library interface is going to have to change to accept bounded arrays instead of pointers and lengths as separate arguments, or worse, just pointers and implicit lengths by looking for the NUL character. But then you've given up the ability to directly call into existing code; everything is at best using a little translation layer to check bounds before/after passing buffers to outside functions. So the advantage of staying in the language is minimal (note that you've given up on "C strings" entirely), and if you're going to have all of this, you might as well use a language with compile-time checks (types, etc.) that you're doing all of this right.
Rust and Go are not your only choices here. C++ with aggressive use of modern features and aggressive non-use of everything inherited from C is also a fine choice, but enforcing that discipline is hard, and the compiler won't help you a lot. Honestly, a dynamic language like Python or Ruby is also a fine choice in many cases, although NTP might be too latency-sensitive for that to work. (But it might not be! Premature optimization and all that.)
Sure, glib for example contains all the wrapped network / file / string handling you need. But then you need to make sure you use it correctly. And that no code goes around that API. And that you didn't make mistakes that invalidate the safety of your abstraction.
Moving to an abstraction would potentially solve some overflow issues, but it requires constant attention. Moving to a safe language likely solves all of them, and you get checked by the compiler instead.
Why? Rust includes index checks by default, so you have zero chance of programmer error. And, to boot, it removes the checks when the compiler decides that the code is provably safe.
I never stated whether the checks were static or runtime. In this case, they are runtime checks.
What you are talking about is value range analysis. Rust does a limited amount of it, and if it can prove safe indexing it will remove the runtime checks.
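A hedged sketch of both styles: with explicit indexing the check is usually eliminated because `i < len` is provable from the loop bound, and with iterators there is no index at all, so there is nothing to check.

```rust
fn sum_indexed(v: &[u64]) -> u64 {
    let mut total = 0;
    for i in 0..v.len() {
        total += v[i]; // check typically optimized away: i < v.len() is provable
    }
    total
}

fn sum_iter(v: &[u64]) -> u64 {
    v.iter().sum() // no indexing, so no bounds checks to begin with
}

fn main() {
    let v = [1, 2, 3, 4];
    assert_eq!(sum_indexed(&v), 10);
    assert_eq!(sum_iter(&v), 10);
}
```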
A properly-designed Rust API will not allow code without error handling to compile, so 1) should be much less relevant in Rust.
2) I can't remember if a panic in Rust calls destructors, which would clean up that memory. Can someone answer that please?
3) is only relevant in FFI scenarios in Rust, and everywhere else is irrelevant because Rust does not require the use of such footguns for basic string manipulation.
A panic _may_ call destructors, but it cannot be relied upon. For example, aborting is a perfectly reasonable panic implementation, and your destructors won't get called. If you have unwinding panics, they will call destructors though.
You'd generally use Result<T> for recoverable errors (which won't have any destructor issues), but in some cases you might put a panic::recover at the event loop level (never seen this being done before, but I can imagine it happening) which catches the panic (only calling destructors up to the catch point -- this is done by the unwind mechanism) and moving on to the next event in the loop.
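A sketch of the unwinding behavior described above, using the stable `std::panic::catch_unwind` (the API that replaced the old unstable `panic::recover`). With the default unwinding panic runtime, destructors run during unwind; with `panic = "abort"` they would not.

```rust
use std::panic;

struct Noisy;
impl Drop for Noisy {
    fn drop(&mut self) {
        println!("destructor ran during unwind");
    }
}

fn main() {
    let result = panic::catch_unwind(|| {
        let _guard = Noisy; // dropped while the panic unwinds
        panic!("boom");
    });
    assert!(result.is_err()); // the panic was caught at this boundary
}
```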
Final word on the subject - what we have is people who are trying to make Open Software Reputation Points by finding a problem and fixing it, rather than waiting to find a real problem and fixing that.
When the figure of "ten minutes" was used - that's really what it should be as a mean or median figure, with some long-tail outliers for knotty cases.
While I am (somewhat) sympathetic, I don't miss what it really is: make-work. There's no body of material on language, O/S, and software design available today that wasn't available 40 years ago. People knew better back then, and didn't do better.
But the endless, circular, Utopian arguments just don't cut it. It ain't the crate, it's the pilot. I am sorry that oh so many are put upon to actually - gasp - test their code but that's what this is.
The economics of a Rust or Go or whatever make perfect sense - in the long run. Just recall what Keynes said about the long run.
I see where you're going with this. I normally would agree. In this case, we have a large C program that evolved over time for a complicated feature set. That they trimmed that much fat out might already say something about how much risk might be in that code. It's also been around a long time in important infrastructure, and will continue to be. Converting it to something that's safer in most ways, especially around dynamic memory and types, to reduce the number of vulnerabilities in the near and long term makes sense.
So long as one can convert it in a straight-forward way that preserves the current meaning of the program's statements. If not, the port might introduce new problems.
Edit: Btw, send me an email (see profile). Got something you might be interested in. One or two things.
Yep. It's always about "the space between the buttons" in code bases. That's not pretty; I know that. It's a huge source of human suffering. But I wonder how reducible it truly can be. As they say, it's (still) early days yet.
That (email thing) would be awesome, Nick. Thanks.
This is how I feel. You cut the code to 27%. Great. You C99'd the code. Grand. Now you're going to rewrite the whole danged thing in a new language. Wonderful. This is the sort of stuff I cared about so much as a junior dev. But a user asks, are we there yet?
But adding functionality grows in effort as more code is added. At some point it makes sense to improve a code base so you improve the rate at which you are getting "there". In fact there are a lot of code bases, perhaps most, which simply cannot get there in any sane way and fall short of achieving their goal. You might not care how goals are achieved as a user but as an engineer it's worth exploring and discussing different approaches.
Just to be clear - I don't suspect that the things that pretend to replace C are bad, or no good. I've just seen multiple "pretenders to the throne"[1] and what would appear to have happened is that we just moved the pathology around.
[1] please excuse the horrible metaphor.
After a few iterations of that, one begins to think that perhaps this is a distinctly human problem that is not necessarily addressable by improved language systems.
My greatest concern is that I keep seeing people learn the same things, over and over, on the job. There's no real repository of literature to actually address any of this - each engineer appears to have to learn it mostly from scratch.
There's a famous bon mot from public choice economics - "something must be done; this is something; this must be done." I just hope all the new crop of languages are not that.
Finally: If you can, and if you can learn the right patterns ( isn't that true of all languages, though?), the thing I have gone to, again and again that is better than test equipment in terms of reliability continues to be Tcl.
A 62 KLOC secure NTP server seems like an ideal project for this kind of experiment. I imagine it would be self-contained enough to actually use Rust or Golang instead of just treating them like FFI scripters.
> One such cleanup: we’ve made a strong start on banishing unions and type punning from the code. These are not going to translate into any language with the correctness properties we want.
Really? This sounds like idiomatic rust to me (heavy with enums).
C unions have no discriminant. But yeah it's a pity—I'd try to hack up discriminated unions then. Or convert to Rust unions and then enums and remove unsafety.
C union declarations should correspond pretty closely to rust enum declarations.
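Concretely, a tagged C union like `struct { int kind; union { long i; double d; } u; }` becomes a Rust enum, and the compiler then forces every use site to check the discriminant (a sketch, not code from the NTPsec translation):

```rust
enum Value {
    Int(i64),
    Float(f64),
}

fn describe(v: &Value) -> String {
    match v {
        // match must cover every variant -- there is no way to read
        // the "wrong field" the way an unchecked C union allows
        Value::Int(i) => format!("int: {}", i),
        Value::Float(f) => format!("float: {}", f),
    }
}

fn main() {
    assert_eq!(describe(&Value::Int(42)), "int: 42");
    assert_eq!(describe(&Value::Float(1.5)), "float: 1.5");
}
```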
It's a surprisingly important feature. Every large codebase I've worked on has clunky workarounds for storing heterogeneous types in collections.
Haven't seen a silver bullet; dynamic languages are great at this until you have to scale (either in lines of code, number of types or dataset size) and then they become unmanageable. Static languages force you to type a lot and build in-memory ETL-style transformations but scale better. Using SQL is another solution, but that separates you from the language's type-checker and can be another source of error.
Type safety at the serialization boundary was the promise of CORBA and protobuf but we need better tools. I hope we see more focus on ser/des types in the next generation of industry languages.
I think they meant that the use case corresponds closely, not the representation. Generally C unions get used the same place Rust unions would in a rust program (not vice versa though), unless they're being used for type punning, which is pretty rare anyway.
If the goal is to translate automatically to a rust enum declaration, that will be (a) possible to do with an automatic tool and (b) will give enhanced safety because it will force the type to be checked everywhere these values are used.
Outside of the language war bubble it's really great to see a post like this. Practical concerns, reasonable advantages/disadvantages of each language, a real project dealing with real timelines. Thanks!
Was going to say the same. In the past ESR has come across as a patently arrogant gun maniac, but the first part of this post is great for all the reasons you mentioned.
(What irritated me though was the switch to first-person narrative at the end).
I'm excited to see where this goes because it could go a long way towards providing concrete data for the large "work to replace old infrastucture C code with (Rust||Go||Modern C++)" discussion that has been taking place.
More data points will help to inform discussion, or at the very least add structure to the flame wars.
This is literally the only thing I can think of that "NTPsec" can do that would result in the project having any relevance. I understand why some very specific sites are chained to the ntpd codebase, but the vast, overwhelming majority of the ntpd deployed base not only isn't tied to ntpd, but also doesn't need 99% of what ntpd does. Trying to "secure" that codebase always seemed to me like a very silly windmill to tilt at.
I wish more mention of D would happen. It is compatible with C and C++ libraries and features GC without sacrificing the good things of C and C++. I always loved the idea of Rust and Go but they are nowhere near C or C++ where it matters to me. D fits the bill, otherwise I just use Python. I like being able to design software in my own way as opposed to being told how to do it.
The truth is the Rust leadership did the hard work of building a community around the language, making high-quality tutorials and introductions to the language all before it stabilized for 1.0. Many developers claim to dislike "marketing", but Rust did its marketing/evangelism and D didn't.
From a purely technical standpoint, Rust's borrow checker is able to catch data races, while D has no such functionality (unless I'm not caught up), which is a huge advantage in today's world of multithreaded applications.
Same thing with Ada. If you're looking for a safer low-level language with a long track record of successful use by some major players, Ada is the natural first candidate.
But as someone else said, Rust has had way better marketing (and is newer, which probably catches some people as well.)
I personally don't know any, but I guess avionics, trains and things like power plants might be candidates where one could look.
Automotive has unfortunately settled mostly on C and built a whole ecosystem around it (Autosar). With all the talk around self-driving cars, safer programming languages and tools would make a lot of sense.
I very much enjoy using D and want it to succeed but this statement along with finding "everyday uses" for things (e.g. using zeromq) are huge drawbacks to buying into it. I'll still advocate but it's not easy sometimes
I think three things contribute to a programming language's success or failure: tooling, documentation, and the standard library. Look at Go: great out-of-the-box library, same with Python, but the tooling for Go isn't the greatest (debugger? a real IDE? etc.). Then you look at C#: tooling (Visual Studio, VS Code, debugger), documentation, and even out of the box the .NET Framework is a decent "standard library" with plenty included. If only D had, at the very least, a dedicated, well-supported IDE. I hope more people notice D so that the community around it builds solutions; it has a lot of great potential yet to be tapped, and plenty of groundwork has already been done with D.
What do you mean by "they are nowhere near C or C++ where it matters to [you]"? Specifically with Rust (because I already know why Go isn't necessarily a good replacement for C or C++). I know very little about D, one of the few things I know is that it is compatible with C++, which is neat, but that's surely not a concern when talking about C.
Mostly it's about being able to write code in a paradigm I'm comfortable in, as opposed to being forced into one, when the purpose of a language is to replace C++ (which is what Rust and Go were both aimed at). There's also being forced where to put the opening brace, like in Go, which at first didn't bother me, but after a while it became a code smell in itself. There's plenty of talk about Go programmers copying and pasting code, which violates the DRY principle. I like having the option to use GC in D and still being able to compile my code. I even converted some C# code that a project of mine heavily relied on over to D, and it compiles perfectly fine regardless of the distro or OS I compile it from (granted, it was very small / minimal code).
Me too, but although the community is great, lack of focus on how to go forward in memory management, runtime and deprecated packages, seems to keep hurting the language's appeal.
How much concern would non-standard architecture support matter for ntp? Given how many architectures Linux supports, I would think that C would still be the best choice, until these other languages gain support for those missing architectures.
Or perhaps it's a good opportunity for a language which offers transpilation with ANSI C as the target?
In addition to squiguy7's point, I'd add that A: it may not be clear if this is your only contact with ntpsec.org [1], but this is actually a fork of the classic ntp, so they may be more willing to abandon some older architectures to produce a more secure product going forward than the core NTP project would and B: the older versions will still be around even so.
Also, some experience suggests that older architectures often end up getting supported, out of sheer inertia, past the point when support really should be dropped, though I'm not having luck digging up the articles that prompt me to say this. Are there that many systems out there running ntp that can't run Go and/or Rust, and if there are, are there enough to be worth bending the project around? If the people running those things care, perhaps they should fork the project and maintain it themselves. Which is less harsh than it may sound, because they can still pull from upstream, and they probably just need to tread water rather than stay up with the latest & greatest.
(I should emphasize that the operative question is are there enough to be worth bending the project around, rather than whether there are any. Because there certainly are non-zero numbers of systems running ntpd that can't run Rust or Go. But who's going to pay to maintain them? Especially if classic ntp is still available?)
Well, it's not just older architectures which are not currently supported; it's architectures used in IoT devices, mainframes, and other non-commodity hardware. Specific to LLVM-based languages, if it's not a priority for Apple (or the other big LLVM players), it infrequently gets done.
Specific to IoT devices, I would personally love to see secure everything: NTP, TLS, SSH, etc.
There is a lot of community interest in Rust with regards to running on embedded devices, so I wouldn't be so quick to point to IOT devices as a reason for not using Rust. That said, I don't know what the current state of support for these less popular architectures is. The official page for platform support is https://forge.rust-lang.org/platform-support.html.
Yeah, it should. AIUI Rust uses a fork of LLVM, but they should be regularly pulling in upstream changes (though I don't know how often this happens) as well as submitting their own changes back upstream.
I expect that most of these older devices don't receive much in the way of software updates anyway, so the point is moot. For things that do receive updates, I would much rather see a secure NTP implementation that the majority of the world can use, even if a few parts of the long tail get excluded... and they still have the old versions of the C-based NTPsec they can use; I would also expect there to be a niche market arising for releasing security updates to it after the transition.
(Specifically addressing IoT devices, IoT security is so abysmal in general that there are much easier ways to compromise one than targeting its NTP daemon, if it's even running one, and if it's even "bothering" to run a secure one like NTPsec.)
Dropping support for things is certainly not a decision you make lightly, but sometimes the greater good demands it.
If it's a business requirement, they should hire engineers and contribute. Expecting free support seems unreasonable. There's a high cost for supporting and maintaining features, so somebody ends up having to pay for it.
To be fair, I imagine most of those arch manufacturers simply think "embedded developers use C, we support C, job done."
LLVM support would benefit the developers more, but they have their own budgets and constraints, which are probably not conducive to submitting and maintaining an LLVM plugin.
We are getting some piece of it from companies that still sell safe systems programming languages (e.g. MikroElektronika, Astrobe), or expose them mostly via Java and its variants (e.g. Oracle, Aicas, Aonix, IBM, IS2T, Google) or .NET (Microsoft).
No, LLVM support does not automatically mean Rust support, though it's a prerequisite for it. There are several small things which have to be defined in the Rust compiler. For instance, which registers are used for stack unwinding, and other ABI details.
LLVM is generally usable on sparc, although a significant amount of optimisation work still needs to be done, and there are some specific variants/platforms that sparc is used by that need better support.
That includes one line per architecture / platform (i.e. OS) combination. I only count about 14 architectures, though my rustc is old (GCC has 23 major, 24 minor, and 30 legacy as a point of comparison).
Last I checked, some of the popular IOT architectures (atmega, etc) were not included without some 3rd party plugins.
What arduino-class (that is, low power microcontrollers) devices are moving towards ARM? Seems like a very different use case from portable computers (like the Raspberry Pi) - the power consumption differences are pretty major.
> Nobody is going to be running NTPsec on it
Never say never. :) There are wifi, ethernet, and clock shields for Arduino, all of which are running microcontrollers, and many applications which would benefit from using NTP.
Microcontrollers aren't moving towards application level processors like the Raspberry Pi uses, they're moving towards low power ARM cores like the Cortex M0/M3/M4.
It's not clear to me why you'd choose an AVR for a new product unless you really, really needed a specific feature. Current generation ARM Cortex M0 devices are available at a similar cost and with significant performance gains, with the benefit that if you realise you need a more powerful core you can scale up to hundreds of MHz with effectively the same code base.
I don't know of many Cortex M0+ devices that can source/sink 40 mA like an AVR while sipping only 200 µA like the AtMega328pb. That's active mode, btw; the AtMega328pb drops down to 1.3 µA in sleep mode.
Cortex M0+ devices are "larger" than AVRs. You have more RAM, CPU power and unfortunately... complexity and power usage.
But it's still got an order of magnitude more power usage and complexity than AVR's AtMega328pb. The ATMega's "active" mode is comparable to the LPC811's "sleep" mode.
Most of the LPC811's pins can only sink/source ~4 mA, while the "biggest" pins can only sink/source 20 mA. In contrast, all of the AtMega328pb pins can sink/source 40 mA, easily driving LEDs with only a resistor (instead of having to hook up a transistor or external buffer of some kind).
The LPC811 also shows what happens to cheap ARM chips: they are missing ADC converters, PWM, Real-time Counters (or a deep-sleep mode that still keeps the 32kHz clock active). Sure, you can buy these externally... but the AVRs and PICs have superior integration.
> It's not clear to me why you'd choose an AVR for a new product unless you really, really needed a specific feature.
Running an LED with just a resistor on any pin is a pretty nifty feature IMO.
------------
The AtMega328pb is a "bigger" and more expensive part, though. Perhaps it's more "fair" to compare the LPC811 against the AtTiny44A (which is also $1.50ish). The ATTiny44A has similar power specifications to the 328pb.
I don't have a single source for this, but some hints are: 10 years ago every µC manufacturer had its own core and instruction set and basically sold them exclusively (maybe apart from some 8051 stuff). They were AVRs, PICs, V850, PPC, etc. Now, when you look at the portfolios of those manufacturers, nearly all of them have at least one product with an ARM core. Some might have already stopped releasing new chips with their proprietary cores. Others haven't done that yet but are planning to in the future (I've heard of at least one).
Another hint is when you actually get in touch with a lot of complete product designs. The number of ARM cores that you see in there seems to be steadily climbing.
I dunno if it's a 'major' migration, but NXP is basically doubling down on ARM-only architectures.
Microchip / Atmel (same company now btw) are doing an "everything and anything" strategy. PIC, ATTiny, ATMega, UC3, AND ARM chips are available from them.
It seems like the smaller 8-bit microcontrollers use less power and have better features for embedded engineers (ie: ADC converters, PWM, Real-time clocks, deeper sleep modes).
ARM has the general benefit of being much faster from a computational perspective. But if you want to read the voltage from a simple thermistor and then output it on an I2C bus... I'm thinking a classic 8-bit AVR is going to be superior to any ARM.
--------
I'm still seeing 8051 stuff pop up everywhere, and that thing was supposedly dead years ago.
> Never say never. :) There are wifi, ethernet, and clock shields for Arduino, all of which are running microcontrollers, and many applications which would benefit from using NTP.
FYI: Microcontrollers really ought to just be interpreting the WWVB radio signal (aka: the 60,000 Hz Atomic Clock radio signal throughout the entire continental USA). Alternatively, Microcontrollers easily connect up to GPS modules for an alternative radio signal / alternative time source.
If anything, microcontrollers are a great interface to the 60 kHz atomic clock signal and would therefore probably make the best NTP server.
In any case, the "real" best architecture is probably a microcontroller doing radio logic / digital signal processing for WWVB, connected through a simple interface (e.g. I2C reporting the last announced time from the WWVB signal) to a "bigger" Raspberry Pi, which can handle the Ethernet / server stuff.
I would start by getting the code to compile with g++, then begin migrating the dangerous C constructs to safe C++ constructs. IMO, that would be a safe, reasonable thing to do.
Project leadership, coding standards, automated analysis and code reviews.
But realistically speaking it might be easier to change tools for a significant number of teams, because the software development community is in general poor at leadership and process. A rewrite seems more approachable for the average (not necessarily average skills-wise) dev compared to a change in attitude, self-reflection and incremental improvement at a department or company level.
Then the Rust compiler will pummel everyone into submission. This can be a successful strategy. :)
Which I seldom see in enterprise projects, especially in companies whose business is unrelated to software development and where IT is nothing more than a cost center.
After reading this post, the idea of a C-to-C translator that injects bounds checking, etc. comes to mind. Such a translator could be used by OS distributions to provide safety in the least intrusive way, and possibly completely automatically, for many C codebases they have in their repositories. Translating into Go or Rust, on the other hand, cannot scale beyond individual projects that decide to undertake such efforts. Mainstream C compilers could implement safety features too, but realistically that won't happen, as it's not something most people care about. So a C-to-C translator might be the best bet with the most impact.
As we've discussed, one issue is the big performance cost of these solutions. Anyone interested in working on an automatic C to SaferCPlusPlus[1] translator? It should address the performance issue[2]. And should be much more straightforward than these C to Rust/Go translators.
I thought you had an interesting solution for improving the safety of C++ code. So, your techniques work with C code written in both old and new C styles with no work? Just fire-and-forget translation of C code written in arbitrary styles into completely memory-safe code?
Yeah, the idea is that C/C++ has a finite set of "dangerous" elements. SaferCPlusPlus attempts to provide safe compatible substitutes for those elements. In C, basically the only dangerous elements are pointers and arrays (if you consider them separate things), right? SaferCPlusPlus provides safe compatible replacements for pointers (and new/malloc and delete/free). There is a "general" safe pointer type that can be used as a direct substitute for native pointers in most situations, but can sometimes have a noticeable performance cost, depending on usage. Faster safe pointers are also provided, but cannot be used in all situations.
Replacing all the pointers and arrays in your C code with the safer substitutes will eliminate the possibility of invalid memory access, in your code. Of course this doesn't prevent them from occurring in any unsafe libraries you use, including the standard library.
Also, there are certain behaviors in C that may not translate well to SaferCPlusPlus. Like exotic pointer arithmetic. Or, for example, you could imagine some C code that compares two pointers that point to items that have already been deallocated to see if they were previously pointing to the same item. Is that valid in C? Anyway, that kind of thing is not supported by the safe pointers.
There is not yet an automatic translator from C to SaferCPlusPlus, but the translation for most code is straightforward and direct. No new paradigms or "Rust borrow checker" type restrictions. You can check out the benchmark code I linked to in my comment to see examples of C++ code before and after conversion to SaferCPlusPlus. A little code reorganization sometimes helps to achieve optimal performance. But isn't that always the case? :) And of course, SaferCPlusPlus requires a modern C++ compiler and has dependencies on the standard library.
It's not yet ready for primetime, but Scala Native (http://scala-native.readthedocs.io/en/latest/) might just make a splash in the systems space. I don't think it has anything like ownership yet, but I wouldn't be surprised if it eventually develops that capability. I think you can get it to run without GC, too, but using C Stdlib memory management. Although, that largely defeats the memory-safety.
Just throwing it out there as something to keep an eye on!
Looking at the current new and coming languages, I would take a hard look at Nim. It may not be there yet, but it looks highly appealing, is as fast as the oft-mentioned Rust, and compiles significantly faster. http://nim-lang.org/
I didn't understand why Rust and Go are natural alternatives to C. Wouldn't C++ be a more natural option? (Despite the fact that both Go and Rust are developed by third-party companies.)
I did a lot of C++ years ago, so maybe things have changed since then, but I think Rust and Go addressed a lot of the design flubs of C++.
My experiences getting things to compile across gcc and visual c++, dealing with strings (especially Microsoft's WCHAR), reliable integer sizes (pre stdint.h), and debugging templates were not things I would wish on anyone.
Re-doing some of my side projects in Go and Rust was a lot more enjoyable. I could focus on what I was doing instead of trying to work around deficiencies in the language and its libraries.
I still don't get it. Can't it be done in C++ via appropriate data structures? I mean, it's not like Go is magic; in the end I suppose Go would be doing just that. As far as I know, the main reason people opt for Go over C++ is the compile times...
You can write safe C++, if you're careful, and everyone you work with is careful, all the time. Judicious use of features can make your code more safe, but they can never make it actually safe.
Rust and Go, in various different ways (and to various degrees), make it actually safe.
If C++ is "(nearly) a superset of C", why is the C++ standard twice the size of the C standard? Of course, if we take 0.505... ≈ 1, then it is indeed a (nearly) superset.
My experience is that using C for main(argc,argv)-style programs is rarely a problem. Trouble comes when using long-running single-address-space containers for service-like abstractions with pthreads etc.; in that kind of environment, malloc() and co. don't cut it, because even if you get memory allocation right, memory fragmentation becomes a serious (i.e. insurmountable) problem unless you use pooled memory allocators.
It's been said over and over since at least the Java times that creating OS processes for individual service invocations is bad for performance, but I've never seen proof for this statement in the form of a benchmark.
Even the OpenBSD developers (who know a thing or two wrt. security of memory allocation schemes) diss process-per-service-invocation architectures in their httpd implementation (eg. calling their CGI bridge "slowcgi" and favouring fcgi over it).
Isn't that inconsequential? I mean if there's a performance problem with CGI-like process-per-service invocations, why not target these problems at the OS level (or via pooling of network connections or whatever the bottleneck is)?
Rust is surely fine, and an improvement over C, but its main advantage is that all the rust code is written now, when everyone takes more care about security.
It doesn't have to deal with 40 years of bad legacy code written by sloppy developers.
You can obtain similar quality in a modern C codebase using tools like static and dynamic analyzers. In fact, today the hardest issues come from multi-threading. I won't even dare to write multi-threaded apps without helgrind/TSAN.
And Rust doesn't help in this regard. From: https://doc.rust-lang.org/nomicon/races.html
'So it's perfectly "fine" for a Safe Rust program to get deadlocked or do something incredibly stupid with incorrect synchronization.'
Which higher-level abstraction to use is a question and depends on the problem at hand; for throughput computing in C, Cilk is perhaps the best today (I guess you could call my own project checkedthreads a bit of a Cilk knock-off.) AFAIK, Rust developers are more interested in higher-level abstractions than Go developers (Go's built-in goroutines being on par with threads in terms of ability of automated debugging tools to flag bugs; perhaps there's work in Go on higher-level abstractions but the vibe I felt, perhaps mistakenly, is that these are non-problems, which I disagree with: http://yosefk.com/blog/parallelism-and-concurrency-need-diff...)
Data races are mostly prevented through Rust's ownership system: it's impossible to alias a mutable reference, so it's impossible to perform a data race. Interior mutability makes this more complicated, which is largely why we have the Send and Sync traits (see https://doc.rust-lang.org/nomicon/races.html).
Similar quality in a modern C codebase? No. Much better than in the past, yes, but not similar. Even if the quality _were_ similar, it'd be disproportionally more expensive to write.
I just left a job in which C code was being written from scratch, in 2016. The code was awful. Coverity didn't prevent it from being awful.
Some people take more care about writing safe & secure code today. Most only care just enough to switch to Java or Python in order not to care any more.
The big question for me is what will happen when those same developers writing insecure C and C++ code start writing rust? Will they find ways to subvert the safety checks? Will they even want to subject themselves to them in the first place?
I like Go; I'd love to write libraries in it, but as far as I can tell you can't really create a C-compatible shared library from it.
That is still the common denominator if you want to call into it from other languages.
I'd love to write Python programs with the performance-critical stuff in Go.
I do not contest the opinion that Rust is a good language, but it slightly hurts me when people club C and C++ together. One can easily write correct-by-construction code using modern C++. Use of meta-programs allows you to create typesafe constructs. It provides you with zero-cost abstractions to specify ownership of resources and ..... <I can go on> One has to just strive to not use the C baggage that comes with it.
> One has to just strive to not use the C baggage that comes with it.
This is why they get clubbed together. Regardless of how much I like C++, I have yet to see the use of C baggage successfully forbidden in enterprise teams, let alone when there are third-party dependencies (which is always the case).
So far I have only seen modern, safe C++ being used successfully on a big project I was part of at CERN, where everyone on the team actually cared to write proper C++.
C++ is much better, but still not safe. Null still exists. Moves are runtime moves, so the compiler will still let you access a moved-from value. Iterator invalidation is still an issue.
Of course, C++ might be "safe enough" for your use cases.
> Moves are runtime moves, so the compiler will still let you access a moved-from value.
What do you mean by that? Are you saying the standard decided to make moves non-destructive? Then yes, it is a possible source of error, but also a performance advantage in some cases.
> What do you mean by that? Are you saying the standard decided to make moves non-destructive?
You can still access a value after it has been moved out. use-after-move is allowed by the compiler. It places (stdlib) types in an unspecified but valid state (I've seen C++ code reusing these types and assuming that the state after move is something in particular -- it's not).
Most optimizations you can do by reusing a moved value could be done automatically by the compiler (by using the same stack space since it statically knows that it's been moved out).
Linear types are a pretty well known pattern and could have been used as a part of the design of moves. This would have the additional benefit of removing explicit move constructors from 99% of all types out there. This has not been done (and can't be now).
Edit: The pretty rare optimization potential of runtime moves is pretty much negated by the fact that common things like reallocing a vector can't be made into a memcpy for most types, just because they have nontrivial move ctors which wouldn't exist in the compile-time move scenario.
Use-after-move is allowed because of the existence of value categories. If you use a return value from a function, its creation is usually elided or it is moved, but you can't access that temporary without directly referencing it. When you move an existing object (by specifically saying std::move), what should happen with it? Destroying the object is not a solution, because its variable might still be accessible in the existing scope (something like a dangling reference in the middle of a scope), which implies use-after-move should be prohibited by the language and all variables should have a "not-a-value" state and throw exceptions on use.
> When you move an existing object (by specifically saying std::move), what should happen with it?
You don't destroy it; you treat it as an actual "move", where the compiler will not consider the original variable accessible after this point, and will not allow accesses after the move. The memory it took up is free for reuse, and no destruction code is run on the side of the code doing the moving (in the case of a conditional move this gets a tiny bit more complex, but not too much). "Its variable will still be accessible in local scope" is exactly what I'm getting at; you can enforce at compile time that this isn't the case by simply disallowing access.
You shouldn't have local references active, just like how you shouldn't have local references to the contents of a vector when you push to it. You can already invalidate local references to a part of a struct when something gets moved in modern C++. Being wary of invalidating local references is an established concept in C++; this doesn't exacerbate that problem.
It's even better if you track scopes in references which gets rid of the dangling reference problem entirely but at this point you've reinvented Rust :p
To be clear, I'm talking of a completely different model that could have been used in place of the rvalue reference and move model. Making C++ have linear types now would be tough, but it could have been done before. There are different tradeoffs there, but I suspect it would have been safer.
> all variables should have a "not-a-value" state and throw exceptions on use.
You can make this a compile time error. Like I said, linear typing is a pretty well established pattern.
> Under Linux, some SECCOMP initialization and capability dances having to do with dropping root and closing off privilege-escalation attacks as soon as possible after startup.
I was under the impression that these specific things were actually quite hard to do in Go. I believe that both setuid/setgid and seccomp_load change the current OS thread (only), and since Go multiplexes across multiple threads and gives programmers very little control over which ones are used for what goroutines, I'm not sure how you would, for example, apply a seccomp context across all threads in a Go program. setuid/setgid are currently unsupported for this reason, with the best method being "start a subprocess and pass it file descriptors" (https://github.com/golang/go/issues/1435).
I'd be interested to hear if others have found ways to actually do this reliably for all OS threads underlying a running Go process.
I did this once, by writing a small C program which sets up the seccomp context before exec'ing the Go binary. Unfortunately Go's runtime makes a huge number of system calls in the background, and the whitelist kept growing.
Switched to Rust, and there was only one hidden system call left: getrandom, used to initialize the hashmap.
The leap second thing is a big issue for a server. OpenNTPD serves the wrong time on the day of the leap second. They don't propagate the leap second flag properly either. It's not correct and doesn't belong in a public server pool IMHO. This issue isn't just theoretical: it was responsible for a lot of the bad time being served by the NTP Pool last week.
You're right that the article I linked argues the leap second isn't a big problem to client machines. I guess I sort of believe it, if you don't care if your client is a second off from true time once every couple of years.
Why do they have to use a low-level language? Why not use NodeJS? One argument would be time management, but even with C you cannot be sure that no interrupt happens in between...
Honestly, why not do it once and for all in a strongly typed pure functional language, validate it, and then tweak the GC parameters to get the performance? Use the safest and most powerful language that you can, if you can.
From my understanding by reading the OP, it's not really performance that's the problem, it's just that there are some (fortunately very small) time-critical sections in the NTP code that cannot tolerate a GC pause. If the language runtime supports disabling GC in critical sections (like Go does), then it's doable. If not, the language is likely automatically disqualified.
A more relevant point that would disqualify Haskell is the fact that its runtime (or rather, the GHC's runtime) is nowhere near predictable enough for the realtime guarantees that time synchronisation software would require.
A Haskell DSL that outputs some safer low-level code would be a more likely choice (for example http://hackage.haskell.org/package/atom), but Rust is both more popular and has more commercial support.
When opting for a DSL that compiles to low-level the "Haskell" part is less important.
I've heard the idea of DSLs over and over again, but who actually does that? I know of course of sed, awk, regex, etc but what part of NTPsec is narrow enough in scope and large enough in volume to justify creating a DSL? (just asking -- I'm not familiar with NTPsec).
I'm not an expert in this field, but it's my understanding that this is often the case for C programs executing nowadays. The AMD64 architecture even has the NX bit. There are some other ways that this can be enabled too.
But surprisingly, this doesn't solve all code execution security problems from buffer overflows. Sometimes exploiters can find non-executable memory to change that allows them to change the program behavior to do what they want, such as changing the command string that gets passed to a normal execve call later in the program.
But even if you solve the security issues, buffer overflows are still a huge headache. A one byte buffer overflow can cause your program to crash an hour later, with almost no hope of figuring out why it happened. A developer can easily spend weeks tracking down a single buffer overflow crash.
> Sometimes exploiters can find non-executable memory to change that allows them to change the program behavior to do what they want, such as changing the command string that gets passed to a normal execve call later in the program.
Changing function pointers or the return address on the stack are the normal things to do. In particular, if you corrupt the stack you can simply return into system() with whatever arguments you want.
This wouldn't protect against bugs where you can maliciously trick a program into reading from other places in memory so that it leaks passwords, keys, etc. Think Heartbleed.
Why does 90% of this huge comments section revolve around Rust? Haven't we had enough of the same already? Yes, we know it provides memory safety guarantees. Yes, we know you hate everything written in C with great passion. This has been made apparent and banged into our heads for the past several dog-years. Did anyone bother to look at the sheer amount of refactoring the author & team did? Did anyone realize how difficult this is and what they might learn from this tarrasquesque ordeal they went through? Oh no, apparently it is C, and logic mandates that Rust supersedes it, so why.the.hell.bother.
The huge discussion about Rust is because that's literally what this post is about. It's titled "Getting Past C" and most of the post is about considering Go or Rust as the future language for the project. And, not surprisingly, a lot of people think Rust is a great choice for this (and I agree).
The title is Rust-bait, so let's all chant rust! Rust! RUST! Oh, but there are only three tentative mentions of Rust in it, and this entire comment section is corroded. Just compare the two pages in a browser with Ctrl-F "rust".
I'm not sure why you're under the impression that the number of times something is explicitly mentioned corresponds directly to the number of times it's being discussed, which is what your suggestion to search for occurrences of the name implies. The whole article is about future plans for NTPsec, and the work going into making it so it can be converted to a new language later. The article is about the relative merits of different languages for this task, and Rust is one of those languages. Of course Rust is going to be talked about in the comments. The only odd thing here is that Go isn't talked about more, not that Rust is talked about a lot.
>> But NTPsec is a lot smaller and cleaner now at 62KLOC of C (that’s just 27% of the original size). It’s been brought up to pretty tight C99/ANSI standards conformance, and the few remaining platform dependencies are either already well isolated or can easily be made so.
Then they have a section about future plans and a short comparison of two possible languages. I'm surprised that you're surprised Go is virtually non-existent in this thread.
Hackernews comment sections are strictly for endlessly typing about Rust's memory safety features to seemingly no end. You will find this behavior under any article that mentions C or C++.
At this point I don't care about downvotes, because the whole finding-valuable-comments-to-learn-from experience I've had in the past is getting harder and harder to attain. So instead of bitching about that, I will just go upvote someone who is downvoted.
>Under Linux, some SECCOMP initialization and capability dances having to do with dropping root and closing off privilege-escalation attacks as soon as possible after startup.
Thanks, Eric. Now, FFS, learn how the Cathedral model WORKS, get OpenBSD, learn pledge(2) and stop spreading old FUD against C.
> One of the medium-term possibilities we’re seriously considering for NTPsec is moving the entire codebase out of C into a language with no buffer overruns
For a small one-time fee of 1000 USD I can copy-paste you 50-100 lines of C that provide an implementation of an array without buffer overflows. Cheaper than switching to Rust or, you know, learning C properly.
Can you pass this buffer to e.g. read(), and will it still guarantee no buffer overflow even if the programmer miscalculates the size argument to read()?
Sure, if the array index is limited to 32 or 16 or 8 bits, I can also give it to you at no performance cost for heap-allocated arrays (i.e. no bounds check happens at runtime).
There is some memory overhead per array with this method though.
The fact that it doesn't occur to people to build safe abstractions in C to deal with things like buffer overflows still shocks me. C is fairly low-level by today's standards. You 100% have to build abstractions using the standard library and then use those abstractions, rather than just using standard library functions everywhere.
This isn't an argument against safer languages, or higher level languages, or languages with built in bounds checking or anything else. This is more an argument against programming in a "close to the metal" language like C so thoughtlessly that buffer overflows are actually a serious issue.
Maybe you just need a pedantic mind. When first learning C, as soon as I figured out that writing "char buffer[BUFFER_LEN];" could cause your program to crash, I immediately set out to write a safe dynamic array implementation so I could push chars onto it without having to worry. And I am by no means an expert C programmer.
Because the fact that you have to write that secure abstraction means the majority of people won't do it. And even if they do, the vast amount of C code they'll interface with that doesn't expect it, and will happily index out of bounds if called incorrectly, means they'll always be fighting an uphill battle. I imagine many people do what you did and write abstractions, and then, the more they deal with external libraries, the more they realize they aren't buying themselves as much security as they think. It isn't just a one-time up-front cost: any time they need to interface with code that doesn't use their implementation, they may have to make sure everything is still sane afterwards.
> I could push chars onto it and I didn't have to worry
And how did you implement this? Are you stitching together chunks of memory, or are you reallocating and copying? If you are reallocating and copying, what do you do when a pointer to the old address space still exists? Now, instead of manually updating memory and knowing you have to take care of pointers, your dynamic implementation might change things underneath you without you noticing.
> Because the fact that you have to write that secure abstraction means the majority of people won't do it
What's your point? My only argument has been that it's very easy and achievable to avoid buffer overruns in C. The fact that you assert most people won't do it is completely orthogonal to that.
> And how did you implement this? Are you stitching together chunks of memory, or are you reallocating and copying? If you are reallocating and copying, what do you do when a pointer to the old address space still exists? Now instead of manually updating memory and knowing you have to take care of pointers, your dynamic implementation might change stuff from underneath you without you noticing.
I use the standard library function realloc. The dynamic array is written in a fairly standard way, i.e. I am not returning pointers to the allocated memory of the internal array. I access values inside the dynamic array by value (i.e. they are copied), not as a raw pointer to a slice of memory that could be realloc'd. It would never even occur to me to do that, so your example seems strange and far-fetched.
Realloc may automatically copy the range to a new memory block if the old block cannot be expanded, and when that happens the old range is freed. Any other pointers you had to items in that original array may become invalid every time realloc is called, and if it's automated by your dynamic array code, that could conceivably be any time you push an item onto that array.
This is the waterbed theory of complexity. You can push complexity down in one part, but that just causes it to pop up somewhere else. You can make array size management easier, but the cost is that when it needs to deal with array sizing it's abstracted away to the point where you can't be sure when it happens and when you need to fix problems it might cause, or you can deal with it up front and manually when needed and then when you are manually dealing with the size changes you should remember that the memory might be freed and you may need to deal with that. GC languages deal with this by having all the information to know exactly how to fix all the references needed, and Rust deals with it by requiring you to not have two references to the memory in that circumstance.
I am still perplexed why you think that by using a dynamic array of chars for a buffer, I am somehow holding pointers into the malloc'd memory anywhere. You can simply access data by copying it:
Note that it still prints 42 despite the allocated memory being freed. The element access copies the data at the specified index into x; x holds no reference to the malloc'd memory.
> I am still perplexed why you think by using a dynamic array of chars for a buffer
Because I'm not limiting the case to just core types. You don't only ever need arrays of chars, ints, doubles, floats, etc. Sometimes you need arrays of structs.
As a simplistic example, perhaps you have a large array of structs, and you want to iterate through them and add all the items that match your criteria to a shorter array of matches, which will be pointers to the real data. Adding a single item to the original array could cause realloc to invalidate every pointer in the array of matches. Of course there are ways around this, but someone starting work on the code might not necessarily expect that adding an item to an array would cause pointers elsewhere in the code to become invalid, unless they look at the implementation of your array code to understand what it's doing.
So basically, your argument boils down to "well what if the person who implements the dynamic array doesn't know C properly and provides an API that exposes the internal realloc'd memory?". Because if not, I have no idea what kind of insane dynamic array implementation you have in mind. There is no way you should be able to access pointers to the internal memory from the API of a dynamic array.
Here's how it works - if your array is of malloc'd structs, then when you access the element, you get back a malloc'd struct. At no point can you access the actual memory, only copies of the values contained at a given index.
So yes, I suppose if your bounds checked dynamic array was implemented by a complete novice or someone that doesn't know C, your hypothetical scenario could happen.
None of this changes the fact that it's easy for a competent C programmer (do I really need to specify this?) to completely avoid buffer overflows in his own code, which is the only thing I have argued.
The basic idea is that sometimes you might want a pointer to an array item, if that item is complex, not just a copy of it, as there's no need to be wasteful if it's a fairly large struct. Any pointers to that array might be invalidated if realloc is called on it. Knowing exactly when that happens means you can note that it might be invalid, and do something about it, but if it can happen any time you add items to the array, that means you need to check for whether it was reallocated every time, or assume it's always invalidated whenever you push an item on the array.
In the example here, I'm determining the struct with the smallest num field. As I keep allocating space to the array (which I'm doing explicitly here), I'm doing the incorrect thing, which is assuming I can continue to use the pointer, which may no longer be valid and just checking if any of the new items are smaller than the existing smallest. I should be recomputing from scratch. It's obvious when I'm calling realloc, but if I was just pushing new items to the array, it would not be obvious at what point it reallocated to a new location in memory unless I specifically checked.
What's happened in that case is we've traded the complexity of explicitly controlling memory allocation of arrays for the complexity of either not allowing pointers to array items or having to keep track of the array location with a separate pointer and checking that they are still the same prior to using any pointers to array items we've stored.
You're right, to an extent we are talking past each other. I totally understand how your code can cause a dangling pointer. But you are using a raw C array, which goes completely against what I've been (poorly) trying to explain in this thread. What I am advocating is something like this:
So, TL;DR: if you don't want to deal with copying structs, malloc them and then push them to the dynamic array, and deal with their pointers. Else, if you push a struct value onto the array, return struct values.
> if you don't want to deal with copying structs, malloc them then push them to the dynamic array
So, manually manage their memory allocation, but allow dynamic allocation of the array of pointers? Sure, there are some cases where that's useful, but if you're already managing memory for the structs themselves, you can probably just manage the memory for the array at the same time.
> Else, if you push a struct value onto the array, return struct values.
So, like I said, "not allowing pointers to array items".
You can do this, but you aren't just making array access a little safer, you're also restricting quite a bit of what you can do for efficiency. If I'm going to throw away the ability to use pointers for efficiency, why am I even using C in the first place? I should just write it in some other language from the start. Presumably I used C because there was a need for that efficiency.
> So, manually manage their memory allocation, but allow dynamic allocation of the array of pointers?
Yes.
> Sure, there are some cases where that's useful, but if you're already managing memory for the structs themselves, you can probably just manage the memory for the array at the same time.
... then you have memory bugs, as your example code clearly shows. What you're suggesting (exposing the internal backing array of a dynamic array) is completely unorthodox and fraught with potential bugs, and I doubt Rust even does this internally.
All I can suggest is that you look up how dynamic arrays are typically implemented in C. The technique I describe is almost universally followed. This is also what happens with std::vector in C++ - std::vector doesn't manage the memory of the elements themselves, just its internal backing array.
> All I can suggest is that you look up how dynamic arrays are typically implemented in C. The technique I describe is almost universally followed.
I think this gets at the crux of people's problem with the idea that you can just work around the problem of manually allocating memory. It's a bolt-on to the language, and the behavior is dependent on the implementation chosen, and it makes the behavior fundamentally different than "native" C arrays, to the point that it might cause problems.
> This is also what happens with std::vector in C++ - std::vector doesn't manage the memory of the elements themselves, just its internal backing array.
Yes, but there's also usage directions for std::vector that specifically state and make very clear what iterators/pointers are invalidated on what actions. Encountering someone's home-rolled array routines may or may not allow you to easily make the same deductions. Are the routines for dynamic arrays, or are they for doing system cleanup at the same time, or have they been combined? Are there comments noting the reason for what's being done, and that certain operations may invalidate pointers, or are you left to intuit that yourself?
These are the problems with having a non-core (and not even a popular implementation to fall back on) way to extend the language. C++ is a step up in that it at least standardizes a bunch of core types so you can learn those and carry your knowledge of how they work around to different projects in the language. C's lack of this means that every project may implement something like this - or not - in their own way, with subtle usage differences.
The main benefit you would get from Rust in a situation like this (ignoring that it would likely either be built in or readily available through a crate), is that on encountering some home-rolled system, you can look for where it uses unsafe to find any problematic behavior you need to be aware of, because otherwise you are fairly protected. Worst case, the whole home-rolled chunk of code is riddled with unsafe blocks, and you know it's definitely something you need to hunker down with to figure out what's going on (assuming you need to use it).
Rust's unsafe is effectively an enforced comment around dangerous code. Put that way, I'm not sure many C programmers would really object.
It's not a winnable debate. You are two different kinds of people: the former an idealist, the latter a pragmatist. Both philosophies are good, both are correct. Before you continue your debate, you should both recognize this difference.
Is this abstraction zero-cost? What's the overhead?
Can you give me a similar set of primitives to manipulate memory in a temporally safe manner as well? What is the cost of that abstraction? How does it compare to a runtime's GC?
I never made the argument that you shouldn't use a safe GC'd language. I merely made the argument (now flagged, for whatever reason) that buffer overflows are an easily solvable issue in C, and if you are having issues with them you really need to up your game and learn to create abstractions.
As for the cost of the abstraction of bound checked arrays of arbitrary length, I can't imagine it being any slower than rust. I could be wrong. If you'd care to provide an example program in whatever language you are advocating, I can give you an implementation in C using a bounds checked array ADT, and we can compare notes.
Obviously, if you have to check an index against a length you're going to be doing a branch. However, because the index checks are intrinsic to the Rust compiler, it can remove them when it proves the code is safe. So, for instance, an iteration loop over an array won't have any index checks in the generated machine code.
Being intrinsic is absolutely not required for the checks to be removed. Bounds checks are just a normal branch with a fairly straightforward condition, and the always-true nature of those conditions is inferred in the same way for both the built-in indexing and non-built-in indexing. This is true in languages other than Rust.
An iterator is designed to just not do any indexing at all (neither using the built-in [] operator or one of the functions that implements manual bounds checks), because it instead just manually (unsafely) walks a pointer along the array.
> An iterator is designed to just not do any indexing at all (neither using the built-in [] operator or one of the functions that implements manual bounds checks), because it instead just manually (unsafely) walks a pointer along the array.
It obviously still has to bound-check that it's not about to walk right out of the array. If you want to call this something other than a "bound-check", I think that's being overly pedantic.
Yes, it checks that it's reached the end of the array as part of the loop, in the same place that `for (int i = 0; i < n; i++)` checks whether it's reached the end of the loop. I think this is different to the indexing bounds checks we've been discussing since the whole process is more controlled (compare and increment a pointer), rather than taking an arbitrary integer.
But yes, strictly speaking you're right: an iterator does do a check when it's reached the end of the array; it just doesn't do any indexing, nor does it use the indexing checks built into the compiler (which is what you implied the iterators benefit from).
> because the index checks are intrinsic to the Rust compiler
This is false. The index checks are written in Rust code as an impl of the Index trait on [T].
The checks being removed have nothing to do with this -- LLVM can prove that certain checks are unnecessary. C compilers do the same if you use a library that provides checked indexing.
What's different is that in Rust, indexing is used much less often overall, because iterators are the dominant pattern, and they sidestep the issue entirely.
> Is this abstraction zero-cost? What's the overhead?
It will cost a single correctly predicted branch, which is effectively free on modern architectures. Any "safe" language will have to make this conditional branch too (Rust's File::read method will check the size of the slice).
Completely agree. The same is true at a slightly higher level in C++. All of the "dangling reference" problems can be avoided by using a value-based collection library and not creating references. Then upgrade to a unique or shared pointer when copying values is too expensive.
It then follows that reading things safely is a completely insurmountable problem in C. There is no possible way for me to write a thin wrapper around read() that works on bounds-checked arrays, rather than using read() cavalierly in my code without thinking. Right?
> It's scary how unfamiliar C programmers are with the rules of the language they're using... it's very difficult to write correct / secure C code without undefined behavior even when you know the rules.
[...]
> It's not feasible to avoid undefined behavior at scale in C or C++ projects. It's simply infeasible. They are not usable as safe tools without using a very constrained dialect of the languages where nearly all real world code would be treated as invalid, with annotations required to prove things to the compiler and communicate information about APIs to it.
Not an argument, or even a point. I don't know why this is so difficult for people to understand. Buffer overflows are easy to avoid in C.
I think I recognise your name from some rather aggressive rust advocacy in another thread, so I'll try and break this down in a way that won't trigger you:
- Buffer overflows are trivial to avoid in Rust, AFAIK. I acknowledge this
- Buffer overflows are very easy to code in C, and have occurred many times in the wild. Again, I acknowledge this.
- My argument is: buffer overflows are easily preventable in C if you provide your own thin abstractions. The fact that people don't take these steps doesn't mean that these steps cannot be taken. Take the Rust compiler itself: AFAIK it's now implemented in C++, an 'unsafe' language, and YET it manages to provide an abstraction, in the form of a language, that makes buffer overflows all but impossible. Do you understand now how it's possible to create safe abstractions in an unsafe language?
> Do you understand now how it's possible to create safe abstractions in an unsafe language?
Who cares? What's "possible" is completely beside the point. What matters is what is done. And so, yes, you can create safe abstractions in C. But if you need to interoperate with someone else's C code, you have to deal with how your abstractions interact with someone else's, or with someone else's lack of them. The point of using a language like Rust or Go or... pretty much any non-C language with lots of traction is that you don't have to provide "your own thin abstractions." You use the same one as everyone else. The point of Rust or Go isn't to make things easier on you. It's to protect you from every other piece of code you interact with. The standard library. Libraries written in the language. Other people working on the same project as you. Yourself five weeks ago. You all use the SAME safe abstraction. Nobody has to roll their own. There aren't 25 safe abstractions for 20 people working on the project. You don't have to reason out or guess whether some library (or even the standard library) is doing the right thing to avoid buffer overruns. You don't need to defend your entire codebase against someone else's code not doing the right thing.
I advocate all memory safe systems programming languages, going all the way back to ESPOL on Burroughs B5500, in 1961!
Because I am old enough to remember when C was only relevant to UNIX users and those of us that care about quality code were able to enjoy much better options.
It is not possible, because one seldom works alone, so regardless of whatever laboratory attempts to write safe C code, they all fall down when the team reaches a size of two, or dependencies to binary libraries are required.
C++ is also not free from this. Regardless of the language features we are able to use, there are always a couple of guys who code it like C, thus making a hole under the castle walls.
As for Rust, as any compiler writer knows, the implementation language has very little to do with the language being compiled. If LLVM had been done in Java, as Chris initially thought of doing, Rust would be using a compiler written in Java instead.
Uhm, in the context of native languages, without further qualification it is clear that a "compiler" is a program that translates said language into native code, i.e. assembly. To dismiss the translation of LLVM IR into assembly as "codegen" is, at best, using a misleading definition of compiler.
The only fair thing to say at present is that Rust's main/most useful/most performant/most whatever implementation is a mixture of Rust and C++.
> Take the Rust compiler itself: AFAIK it's now implemented in C++, an 'unsafe' language, and YET it manages to provide an abstraction, in the form of a language, that makes buffer overflows all but impossible. Do you understand now how it's possible to create safe abstractions in an unsafe language?
Regardless of this being wrong, it also doesn't prove your point. The compiler and the built code are two separate executables. Your "C plus abstractions" language still allows "C without abstractions" within the same executable. And that's the dangerous part, because now you're talking about audits to ensure that only your "C plus abstractions" language is the one being used.
obviously decades of security advisories mean it's not easy. what in the world does 'easy' mean in your usage? surely "tons of failures" is a better indicator of difficulty than "it feels easy to me" or "that's naturally how my brain works," etc.
and of course, if lots of people fail at something, it will get proportionally harder with a larger team / larger codebase / different stakeholders demanding xyz, etc etc etc.
not chiming in about the basic disagreements in this thread, but the repeated claim that something that has caused trouble for countless professionals is "easy" just can't be right. if all the qualified people are "unqualified" because they all fail at something "everyone can do," the speaker is the one who's confused, and it's a lexical problem.
I never said everyone could do it. But a competent C programmer definitely can. I suppose I am an elitist in the sense that I don't think novice or careless programmers should be writing production code.
I mean people manage to do a lot of dangerous things due to sloppiness - cause car accidents, for example. I don't think it's hard to avoid that, and I don't think those people should be driving. I suppose others think we need wider roads and bumper cars.