> I find a bit of solace in the fact that implementing a data structure like this in a non-garbage collected language without Rust is also quite tricky
What? No it isn't. The entire problem here is incorrectly assuming that "A owns B" implies "A has a pointer to B". I don't know if this is a Rust-imposed constraint, but it certainly isn't a logically necessary one. Just do what C++ (and C#, etc.) do: the data structure (std::list, etc.) owns all the nodes, and the nodes merely reference each other. The nodes don't own their siblings. It makes sense and it doesn't get tricky.
The difficulty in Rust reflects how little difference there is between a correct doubly-linked list and an incorrect one: it's not too hard to end up with dangling pointers due to bad destruction.
It's (very) hard for a compiler to tell that a back-pointer won't be held around after the thing to which it points is destroyed, or even the pointer to a node after something else deallocates it, e.g. in pseudo-code:
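    # unlink 'node' from the list
    node.prev.next = node.next
    node.next.prev = node.prev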
Unfortunately, this would be a use-after-free if one were to translate that code into C++ using the modern features that are touted as making "modern C++ safe enough", with the obvious definition:
    class ListNode {
        std::unique_ptr<ListNode> next;
        ListNode *prev;
        // some data or whatever
    };
One could even switch to 'std::optional<std::reference_wrapper<ListNode>> prev' (i.e. avoiding the C legacy that people sometimes blame for the unsafety in C++), and... it doesn't change anything.
Each node is owned by its previous one, meaning 'node.prev.next = node.next' overwrites the owner of 'node', and so the next line is accessing freed memory.
Of course, it's not very hard to fix (or find, in the first place) this particular example, e.g. by just swapping the lines, or by using a completely different ownership scheme à la what you say about std::list. But the above code looks very reasonable, is a problem even using good modern idioms in C++, and is only a minor change from correct code (and C and C++ do not offer much assistance in finding or fixing these sorts of problems).
Fun fact: The `unique_ptr` approach you mention is also broken (and I would go so far as to say it's wrong entirely). The destructors will blow the stack. I really do think std::list's approach is the only correct solution for a generic linked list.
Using the default destructor is broken, yes, but the approach isn't: one can avoid it in the list's destructor with a loop that walks over the list so that nodes aren't destroyed recursively (i.e. clearing the next pointers):
    node = head
    while node:
        next = transferOwnershipAndClear(node.next)
        node = next
Where the third line would be 'std::unique_ptr<...> next(std::move(node.next)); ' in C++ that uses unique_ptr<...>, or 'let next = node.next.take()' in Rust that uses Option<Box<...>>.
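To make that concrete, here's roughly what the iterative teardown looks like in Rust for a singly linked Box-based list (the type and field names are just for illustration):

    struct Node<T> {
        value: T,
        next: Option<Box<Node<T>>>,
    }

    struct List<T> {
        head: Option<Box<Node<T>>>,
    }

    impl<T> Drop for List<T> {
        fn drop(&mut self) {
            // Detach each node's tail before the node itself is dropped, so every
            // Box is destroyed with an empty `next` and the drop depth stays constant.
            let mut node = self.head.take();
            while let Some(mut boxed) = node {
                node = boxed.next.take();
            }
        }
    }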
Of course, you could definitely argue that there's little point using unique_ptr if you're still having to write a destructor for the list itself.
You can keep putting bandages over it, but the approach itself is fundamentally broken. It fundamentally intertwines the memory management with the data structure management when the two really are pretty orthogonal to each other. For example, if you ever want to use a different allocation function, you'll run into trouble. How do you get the right function to deallocate each node, and at what penalty? Does every node keep a pointer to a common allocator now? Or, for example, if you have a move constructor/assignment like that, now you're declaring that it makes sense to "move" one node onto another, even if they use different allocators. But does it really? Their ownership simply doesn't have anything to do with their sibling relationship, and you're forcing them to be tied together.
Like, yeah, you can keep putting bandages on top of this, rub some alcohol on it, and give it crutches to make it work, and I'm sure you'll eventually make it work, but the right thing to do is to step back and realize the flaw is in the approach itself: some concerns are global across the nodes (like memory management), and some are local (data structure management), and hence it doesn't make sense to mix the two.
(Oh, and did I point out that all of this means you'll have to find a different solution when the data structure is no longer linear enough to allow a destructor hack like that? The fact that the scope of the approach is limited by a property that really isn't all that relevant to the actual problem is another sign that it isn't the right one.)
I am not advocating for it being a great way to implement a linked list, just using it as an example that can be easily understood in a throwaway comment.
As an alternative:
    # Delete all the elements from node 'from' to (but not including) node 'to'
    erase(list, from, to):
        from.prev.next = to
        to.prev = from.prev
        while from and from != to:
            next = from.next
            list.allocator.free(from)
            from = next
        # update 'tail' if necessary

    destroy(list):
        node = list.head
        while node:
            next = node.next
            list.allocator.free(node)
            node = next
This code is also wrong. Calling erase(list, list.tail, list.head) or similar will create a loop, and then destroy will loop forever/read dangling pointers. This is also very close to correct (I believe 'destroy' looking for node == tail instead of node being null would avoid the loop), but isn't.
Yes, it's easy to write a linked list, but it's also easy to screw up in subtle ways, and that line is thin. The safe subset of Rust restricts what is legal in order to get some sort of control over pointer soups, so that it can verify that problems like the above don't turn into really bad problems. (The equivalent thing couldn't happen with vector-and-indices or arena-allocated nodes, sketched below, and would result in the less dangerous[1] problem of an infinite loop/memory leak with Rc/Weak or garbage collection.)
[1]: A denial of service is better than remote code execution (proof: the latter can be turned into the former).
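For comparison, a rough sketch of the vector-and-indices flavour in Rust (names invented): the Vec owns every node and the links are plain indices, so a stale link is a logic bug rather than a dangling pointer.

    struct Node<T> {
        value: T,
        prev: Option<usize>,
        next: Option<usize>,
    }

    struct List<T> {
        nodes: Vec<Node<T>>, // the list owns all the nodes
        head: Option<usize>,
        tail: Option<usize>,
        free: Vec<usize>,    // recycled slots from erased nodes
    }

    impl<T> List<T> {
        fn push_back(&mut self, value: T) -> usize {
            let node = Node { value, prev: self.tail, next: None };
            // Reuse a freed slot if available, otherwise grow the backing Vec.
            let idx = match self.free.pop() {
                Some(i) => { self.nodes[i] = node; i },
                None => { self.nodes.push(node); self.nodes.len() - 1 },
            };
            if let Some(t) = self.tail {
                self.nodes[t].next = Some(idx);
            } else {
                self.head = Some(idx);
            }
            self.tail = Some(idx);
            idx
        }
    }

Get the index bookkeeping wrong and the list's structure is corrupted, but every access still goes through the Vec's bounds checks, so the failure is a wrong answer or a panic rather than a use-after-free.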
So now your rebuttal to me pointing out this design flaw is that it is possible to write a buggy implementation even in the absence of such design flaws, and that Rust can help you avoid those bugs too. I'm not entirely sure what else I'm supposed to say in response, but yes, that is a true statement.
I'm arguing against your top-level original comment; I have no attachment to any particular scheme for implementing linked lists, they're just examples of things that are almost right, but are wrong enough to result in memory corruption.
My thesis is "it does get tricky to implement a linked list without garbage collection".
My original example was simple to avoid having to work through a longer example. Now that I reread, I agree that that particular example isn't relevant.
My second example was demonstrating that even a list fitting into your proposed ownership scheme (which, you say, "doesn't get tricky") does get tricky. You apparently agree that this is problematic, so you also apparently agree that it is tricky to implement a linked list.
Rust rejects most attempts because of the risk of memory unsafety, and it's hard to convince it otherwise, because there's little difference between 'safe' and 'unsafe' in a pointer soup.
Oh, so all you're arguing against is my statement that "it doesn't get tricky"? In that case I think you're somehow missing the entire point of these discussions. I didn't make that statement in the abstract; it had some context behind it. The thesis of the article (and hence the basis for my comment) was that linked lists are tricky in Rust -- and presumably not (as much) elsewhere, or Rust wouldn't be the focus of the article. I am saying, no, that trickiness only comes about because he's insisting on the wrong design, and if he took the same approach as in C++ (or C#) he would no longer encounter it. That's what "it doesn't get tricky" means. If you pull it out of context and make your thesis that "it does get tricky to implement a linked list without garbage collection" in general, then okay, yeah, sure... linked lists are tricky, pointers are tricky, programming is tricky, life is tricky, etc... but then we get nowhere since all of those are entirely missing the Rust context behind the discussion.
No, I am disagreeing with your explicit disagreement with "implementing a data structure like this in a non-garbage collected language without Rust is also quite tricky".
Now that I reread for a third time, I suspect you may've picked up on the "a data structure like this" to mean "a linked list with this exact ownership arrangement", whereas I interpreted it as the looser "a data structure like a linked list". That puts the rest of your comments into more context, and sure, I grant you that it's a suboptimal choice, but I don't think it's what the author meant.
In any case, I think I was rather assuming too much Rust context: taking your suggestion does not resolve the trickiness in Rust (as in, you won't be able to convince the compiler to accept it without using `unsafe` or reference counting). It's a very common meme about Rust that there's no way to write a safe C++-style linked list with pointers, something independent of the choice of ownership scheme. "Fixing" the design doesn't actually help: it's still way too much of a pointer soup.
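For what it's worth, here's roughly what that std::list-shaped design looks like when transplanted into Rust (a sketch with invented names, not the standard library's implementation): the list owns every node it allocates, the sibling links are non-owning raw pointers, and essentially every link manipulation has to sit inside unsafe with the invariants maintained by hand.

    use std::ptr;

    struct Node<T> {
        value: T,
        prev: *mut Node<T>, // non-owning sibling links, as in the std::list scheme
        next: *mut Node<T>,
    }

    struct List<T> {
        head: *mut Node<T>,
        tail: *mut Node<T>,
    }

    impl<T> List<T> {
        fn new() -> Self {
            List { head: ptr::null_mut(), tail: ptr::null_mut() }
        }

        fn push_back(&mut self, value: T) {
            // The list allocates and owns the node; Box::into_raw hands back a raw
            // pointer whose lifetime we now track entirely by hand.
            let node = Box::into_raw(Box::new(Node {
                value,
                prev: self.tail,
                next: ptr::null_mut(),
            }));
            unsafe {
                if self.tail.is_null() {
                    self.head = node;
                } else {
                    (*self.tail).next = node;
                }
            }
            self.tail = node;
        }
    }

    impl<T> Drop for List<T> {
        fn drop(&mut self) {
            // Reclaim every node exactly once by walking the forward links.
            let mut node = self.head;
            while !node.is_null() {
                unsafe {
                    let next = (*node).next;
                    drop(Box::from_raw(node));
                    node = next;
                }
            }
        }
    }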
> No, I am disagreeing with your explicit disagreement with "implementing a data structure like this in a non-garbage collected language without Rust is also quite tricky". [...]
I see what you're saying, and again: it's missing the entire baseline and context for that claim. My thesis was that if he had known/considered/used the actual std::list design (which I assume he hadn't, or he wouldn't have proposed nodes that owned siblings), he would not have considered linked lists to be tricky in non-GC'd languages according to whatever his baseline is for that (presumably, tricky enough to blog about it). But somehow you simply extracted my reply with that one quote I replied to, discarded all the rest of the context (his blog post and my comment and all), lowered the baseline for "tricky" (from ≈"tricky enough for him to call it 'tricky' and blog about it" to ≈"possible for the average programmer to write an initially-buggy implementation thereof"), and then ran away with this beautiful straw man to refute. =P Except that (obviously, I had thought) wasn't my claim in the first place...
That doesn't work. std::weak_ptr is a reference counted pointer matched to std::shared_ptr; there's no concept of weak pointers (i.e. non-owning with dynamic checks) for std::unique_ptr, which corresponds almost exactly to Rust's Box.
However, that LinkedList implementation requires unsafe{} to be implemented. All unsafe really means is that the compiler isn't going to hold your hand: the usual memory-ownership footguns are available at your discretion.
unsafe shouldn't be this mythical thing you don't touch, like people seem to think it is. If you need to escape the compiler's very helpful guidance you can and should, but test thoroughly!
Going on a tangent, but I honestly think 'unsafe' might suffer from a naming issue. It should've been called 'unchecked' or 'unverifiable' or something that says the code is merely not verified to be safe, not that it is actually unsafe.
Nope, unsafe does exactly what it says on the tin.
C# tackled this problem 15 years ago. I'm sure other languages (Haskell) did it even earlier. When to use unsafe is a judgement call. Each developer and team will have to set their own standards. Some people will abuse it. None of this is new. At first it scares people. They think this is the brave new world, using unsafe feels gross and backwards! Eventually they understand where it is and isn't appropriate.
You might think "so what? Why even bother with a safe-by-default language?"
Because it greatly restricts the problem space. Rather than being forced to examine every line of code for every possible bit of undefined behavior, or every path of flow control for memory errors, you only need to think really hard about the edge cases inside the unsafe blocks. Simply because the unsafe code is confined to a relatively small number of short blocks, the problem of safety and correctness becomes easier to understand. Easier to test. Easier to reason about.
Unsafe is a tool. It's a dangerous tool so you should always wear your gloves and safety goggles. But when faced with a problem for which it is the best tool you should use it without regret.
> Nope, unsafe does exactly what it says on the tin.
Depends on how you interpret the name—whether it's referring to what it does (makes things no longer automatically safe), or whether it's referring to what the code inside it does.
If you write only safe code, inside an unsafe{} block, then nothing unsafe is happening. Fewer compile-time static-analysis checks are happening, but if you manually verify that the code is "safe anyway" the way C/C++ programmers are forced to do, then you can declare that the code is safe, maybe with a comment like:
// this is safe
unsafe{ ... }
That seems bizarre and contradictory, no? But it would seem less weird if it was:
// this is safe
unchecked{ ... }
Of course, there's no reason you should be using an unsafe{} block for only safe code, so unsafe{} is usually a pretty good label for the content of the block.
Right but the hope was that Rust was all the way safe, not just most of the way safe. That’s its main niche. That’s why someone would choose it over C++. But if 95% of your code is safe and 5% of your code isn’t, and the safe 95% uses that unsafe 5% all over the place, it inherently makes the safe code an entry point into unsafe code, kinda-sorta making it unsafe too. So it ends up feeling like all the hard work to keep things safe was a waste.
But when you're debugging, you know where to focus your efforts.
If 95% of your code touches the other 5%, then that 5% is probably pretty important and useful and hopefully fast. Spending some extra time to verify safety in exchange for speed/control is a small price to pay, and will pay dividends from the other 95% of code that doesn't have to be inspected so closely.
No, the objective for Rust was never to be all the way safe.
Rust takes a lot of inspiration from C++ and seeks to be a systems language where "there should be no room for a lower-level language between [it] and native machine code".
If you want that, you need unsafe blocks. The intent is to use those blocks to build safe abstractions that can be used for the lion's share of your program.
I'm not having a problem with it personally, I'm just saying that trying to educate the developers instead of addressing it on Rust's end seems like a potentially losing battle, as unfortunate as it might be.
I hope we don't settle for unsafe being okay forever. Right now, sometimes it is the right thing to do and there shouldn't be any regret. But in the future, I hope Rust's compilers become better.
There are two things I consider necessary for that. First, that the Rust compilers become smarter in proving the safety of things by themselves. Second, that the Rust compilers become capable of verifying proofs given to them that show the safety of a given piece of code the compilers can't prove as safe on their own.
I think that it is a worthwhile goal to be able to someday formally prove all the unsafe blocks correct in, say, the standard library and popular crates.
However, I honestly feel that the Rust language itself isn't really the right language to be doing these kinds of proofs in. I think that the right language to do the actual formal verification in is likely to be something closer to Coq. Whoever undertakes this effort would probably use an automated theorem prover to prove the unsafe Rust code correct, as was done for seL4 using Isabelle.
You can think of this setup as offering a sort of layered verification: once the small core of code (the unsafe code in the standard library and popular crates, say) is proven correct, the type system and borrow checker effectively prove the rest to be memory- and thread-safe. In fact, that's what would make this system practical: most programmers wouldn't have to understand anything about the complexities of the theorem prover. They would get the benefit of verified memory- and thread-safety for free just by learning Rust.
I was also thinking foremost of standard libraries and other heavily used code, or, say, critical code in some OS.
We don't need to verify every single occurrence of unsafe, but whenever unsafe is necessary I feel the lack of some other guarantee holding us back.
Having optional small-scale(!) verification in Rust would be awesome.
Using other theorem provers has the same problems as when one tries to establish tools for checking memory safety in C: it isn't the default, just an addon.
In my eyes, there is a scale of verification-readiness in which Rust can position itself. The least ready would be having it completely separate and done by other tools in other files, the most ready would be having syntax for it in the language, having it in the same files, and checked by the standard compiler.
I think every bit of verification-readiness Rust has by default will have a strong effect that we can't achieve through other means.
Maybe some of this doesn't have to impact those not interested in verification. Say, alongside 'safe' and 'unsafe', also have a 'verified' environment that contains a proof language as a core part of Rust. That way all ordinary code is still valid and everybody is free to use Rust without verification in mind.
Any ideas what could be done to make Rust verification-readier?
Rust needs unsafe to be Rust. It's reasonable to expect (safe) Rust to become more expressive over time so that more things can be written in safe Rust (or written more conveniently) but expecting the escape hatch to go away entirely is misguided. It is as unlikely as C++ or C trying to do away with inline assembly.
IMO that's less of a Rust problem than a problem for the next generation of verified languages. Probably total functional languages, or maybe just usable versions of Coq.
The Rust compiler will not solve the halting problem. It is pretty trivial to write programs which are safe if and only if they halt. So 100% safe is simply absurd.
You don't need to solve the halting problem just to verify an existing proof of a semantic property, nor to use smarter heuristics to avoid requiring such a proof. 100% safe is totally reachable, though I bet the syntax would be pretty hairy added to today's Rust.
Hmm, good point. You'd have to extend the formal verification into the C code, at least. If you can do that, it might be easier to just write verified C.
Inline ASM is exactly as verifiable as the underlying CPU, given an adequate model in the verifier. That's probably easier than verifying C, which introduces extra ambiguity in its semantics. But yeah, verifiable CPUs would be nice.
That may be due to a quirk of how we use it in English (and maybe other languages, but I can't speak to that). For some reason, unsafe or not safe isn't always perceived as a logical negation the way some other concepts usually are, and is instead parsed as the opposite of safe. "Verified" seems to suffer from this less, but is also less strong in what it positively implies.
Is it possible that the link to personal safety of the terminology is what provides the best positive connotation in terminology but also causes this unwanted ambiguity in negated interpretation? If so, that's an annoying catch-22...
The word that I have seen in similar contexts is 'trusted', which I like and would have preferred -- the block has extra privileges and isn't machine verified. Some people tend to give 'trusted' an opposite reading when they first come across it, though.
The problem with that word is it doesn't say who is doing the trusting, which is the crucial point. In fact, "trusted" can be used to describe both safe code and unsafe code. In the safe code, the programmer is trusting the compiler. In unsafe code, the compiler is trusting the programmer. Both code environments are "trusted," but the trust is being given to different parties.
Oh yeah, 'trusted' would definitely give the opposite of the intended meaning! I had to read your comment twice just to get why you were calling it that.
Agreed. Pointers are unsafe. References are safe, and almost powerful enough to make one think they don't need pointers. But if you find yourself needing pointers over references, they're there.
It can't. And there are many things std does that are in unsafe blocks. There's nothing wrong with that if you have a specific use case for it (and wrap that functionality in a safe function).
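As a trivial sketch of that pattern (the function and its name are made up, purely for illustration): the safe wrapper establishes the invariant that the unsafe block relies on, so callers never see the footgun.

    // A safe wrapper around an unchecked operation.
    fn first_half(bytes: &[u8]) -> &[u8] {
        let mid = bytes.len() / 2;
        // SAFETY: `mid <= bytes.len()`, so the range is always in bounds.
        unsafe { bytes.get_unchecked(..mid) }
    }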
I find that evangelical Rustaceans often speak ill of C++ without knowing how one would solve a problem in C++, especially C++11/14/17.
I’ve been told that “it takes a decade to grow a good C++ developer”. I think that’s an overestimate, having only written it for a few years, but it does take a lot of skill to write good C++, while Rust’s borrow-checker enforces safe practices.
> The entire problem here is incorrectly assuming that "A owns B" implies "A has a pointer to B". I don't know if this is a Rust-imposed constraint in some fashion, but it certainly isn't a necessary one.
Rust has support for weak references[1] to support this exact use case.
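Something along these lines, for example (a minimal sketch; the node layout is invented): strong Rc links in one direction and Weak back-pointers in the other, so the reference counts can't form a cycle that keeps every node alive.

    use std::cell::RefCell;
    use std::rc::{Rc, Weak};

    struct Node {
        value: i32,
        next: Option<Rc<RefCell<Node>>>,   // strong: keeps the next node alive
        prev: Option<Weak<RefCell<Node>>>, // weak: a back-reference, not an owner
    }

    fn main() {
        let first = Rc::new(RefCell::new(Node { value: 1, next: None, prev: None }));
        let second = Rc::new(RefCell::new(Node { value: 2, next: None, prev: None }));
        first.borrow_mut().next = Some(Rc::clone(&second));
        second.borrow_mut().prev = Some(Rc::downgrade(&first));
        // upgrade() returns None once the node it points back to has been dropped.
        assert!(second.borrow().prev.as_ref().unwrap().upgrade().is_some());
    }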
For 90% of use cases, "A owns B" tends to produce much cleaner architecture. Having ambiguous ownership is just a recipe for resource leaks and non-deterministic behavior.
There's no safe, general way to do shared ownership without ref counting or garbage collection. You can't expect Rust to provide one; it already provides ref counting and raw pointers, which is as good as any other Non-GC language does.
Yes, but a single owner and a refcount of 1 are semantically the same. If you want to be strict about this, just keep the Rc private to the implementation and only hand out Weak<T>.
Not quite what I meant. What I meant was that the hidden assumption is that ownership must be expressed through a direct pointer, leading him to ignore the possibility of something like std::list, which doesn't have a direct pointer to all the nodes.