
Frankly, I'd just use unsafe pointers for the backrefs, and wrap the tree API up in a typesafe layer, and build on top of that.

RefCells seem to add unnecessary overhead here. You'll take a hit for the runtime borrow check on every pointer chase up the tree. If walking from a leaf to the root is important, you don't want to add an extra compare/branch/set to every pointer chase: it turns a single memory read into a branch, a write, and two reads, probably about 5x slower at least.
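
For illustration, a minimal sketch of that layering (hypothetical names, not from the comment above): the raw parent pointer stays private, and the safe methods maintain the invariant that a node's parent pointer is set exactly while the parent owns it. Nodes are boxed, so their addresses stay stable.

    use std::ptr;

    pub struct Node {
        value: i32,
        parent: *const Node, // raw backref: no borrow flag to check at runtime
        children: Vec<Box<Node>>,
    }

    impl Node {
        pub fn new(value: i32) -> Box<Node> {
            Box::new(Node { value, parent: ptr::null(), children: Vec::new() })
        }

        pub fn add_child(&mut self, mut child: Box<Node>) {
            child.parent = self as *const Node; // link established: invariant holds
            self.children.push(child);
        }

        /// Walking to the root is one plain memory read per level.
        pub fn root(&self) -> &Node {
            let mut cur: *const Node = self;
            // SAFETY: parent pointers stay valid while the tree owns the nodes.
            unsafe {
                while !(*cur).parent.is_null() {
                    cur = (*cur).parent;
                }
                &*cur
            }
        }
    }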




This is why, as a user of GC languages, I find it a bit sad that the ergonomics for cyclic data structures are so bad that this solution is most likely what the majority will turn to.


The majority never need to write a cyclic data structure, because they can use an existing one from a library.


Assuming it covers their use case, both in its API and in its big-O guarantees.


I'd take a similar approach. After all, the intended use cases for `unsafe` include implementing data structures.


I've argued this issue before.

The answer isn't generics, Coq, or some fancy type system nobody will understand or use properly. There are only a few inherent trouble spots in pure Rust code safety. The big two are:

- Partially initialized arrays. "Vec" has to be unsafe because growing an array involves uninitialized slots. You just need a way to say "this array is initialized from 0..N only", where N is in a data structure associated with the array. Then you need an operation that says "initialize entry N+1 and update the count". That's all it takes. (A sketch of this follows the list below.)

"Map" could be implemented on top of "Vec", instead of using unsafe code. It would be worth trying this and seeing what the performance penalty is. That may be a premature optimization.

- Backpointers. Backpointers have an easily checked invariant relationship with the forward pointer that owns their containing object, but there's no way to tell the language that something is a backpointer.
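
On the first point, a hedged sketch of "initialized from 0..len only" using today's std::mem::MaybeUninit (type and method names invented here; Drop handling omitted for brevity, so initialized elements would leak):

    use std::mem::MaybeUninit;

    /// Fixed-capacity array with the invariant: slots[0..len] are initialized.
    pub struct PartialArray<T, const N: usize> {
        slots: [MaybeUninit<T>; N],
        len: usize,
    }

    impl<T, const N: usize> PartialArray<T, N> {
        pub fn new() -> Self {
            Self { slots: std::array::from_fn(|_| MaybeUninit::uninit()), len: 0 }
        }

        /// "Initialize the next entry and update the count" as one operation.
        pub fn push(&mut self, value: T) -> Result<(), T> {
            if self.len == N {
                return Err(value); // full: hand the value back
            }
            self.slots[self.len].write(value);
            self.len += 1;
            Ok(())
        }

        pub fn get(&self, i: usize) -> Option<&T> {
            if i < self.len {
                // SAFETY: i < len, and slots[0..len] are initialized by `push`.
                Some(unsafe { self.slots[i].assume_init_ref() })
            } else {
                None
            }
        }
    }

The unsafe is confined to one line, and its justification is exactly the invariant the comment wants to be able to state.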


> The answer isn't generics, Coq, or some fancy type system nobody will understand or use properly.

The solution is a formal semantics for unsafe Rust, so that programmers can prove that their unsafe Rust code is safe to use by whatever means they prefer. (Mine would be by hand.)

---

Reply to dmix:

A formal semantics doesn't have to be particularly fancy, although in Rust's case it will in all likelihood not be straightforward either.


This is being worked on! Several bits of the Rust standard library have already been proven safe in the initial model.


Is that a fancy <x> system that nobody will understand or use properly? (Honest question)


Probably. I used to do formal proof of correctness work and headed a project to build a verifier.[1] That stuff is very hard.

The partially initialized array thing is an issue of expressive power. You can't talk about that in Rust yet. This is a classic issue. The three big headaches in C around memory safety are "how big is it", "who owns it", and "who locks it". The language lacks the syntax to even talk about those issues. Rust can talk about those, which is a huge step forward.

Before you can even consider verifying something, you have to be able to talk about it in some formal language. Preferably the one you're programming in. Having to do formal specifications in a separate language is a huge headache. Been there, done that.

There are a few standard trouble spots. I've listed two of them. Most other unsafe code comes from

1) Foreign functions, which can be expected to decline over time as more libraries are implemented in Rust. (How's SSL/TLS in Rust coming along?)

2) "Optimization", which may be premature. This usually consists of bypassing subscript checks. I'd rather have the subscript checks on all the time, and see effort put into hoisting subscript checks out inner loops. (Subscript checks that aren't in inner loops usually aren't significant overhead items.)

3) Replicating C/C++ code in Rust. (An early attempt was a transliteration of Doom into Rust, with lots of pointer arithmetic.)
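
On point 2, a generic illustration (not from this comment) of hoisting a subscript check out of an inner loop in Rust: slicing up front performs one check, after which the loop body needs none.

    // One subscript check per element, unless the optimizer can prove i < v.len():
    fn sum_indexed(v: &[i32], n: usize) -> i32 {
        let mut total = 0;
        for i in 0..n {
            total += v[i]; // checked on each iteration in the general case
        }
        total
    }

    // The check is hoisted by hand: `v[..n]` panics once up front if n is out
    // of range, and the per-element accesses are then provably in bounds.
    fn sum_hoisted(v: &[i32], n: usize) -> i32 {
        v[..n].iter().sum()
    }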

Remember, it can blow at any seam. It only takes one buffer overflow to allow an exploit.

[1] https://github.com/John-Nagle/pasv


> Before you can even consider verifying something, you have to be able to talk about it in some formal language. Preferably the one you're programming in.

I'd personally be excited to see a modern language implement this. I saw the potential for this type of verification in my (hobbyist) dabbling with Haskell, which subsequently inspired me to relearn math, including a great book on proofs recommended on HN that really changed the way I viewed/approached math.

The use-case analogies for formal verification can probably best be drawn from automated testing and TDD, which is another 'optional' part of programming with varying degrees of usage, although with a lower barrier to entry.

Types/proofs seem to have a positive influence on the 'best practice' side as well, as they promote a programming style which forces you to really consider the implementations you're coding. Very similar to testing.

At the moment, formal proof tools seem to me like a rabbit hole with questionable practical ROI, so I've been hesitant to try out the current state of the art.

Assuming it does get built into the language, even users who never touch it directly will likely benefit just by being able to build on layers beneath that were proven in the standard/popular libraries. I felt a similar sense of assurance when building on top of well-typed Haskell libraries.


We built verification into the language in Pascal-F, 30 years ago.[1] That's rarely been done since in real-world imperative languages. It's much easier to keep the verification statements correct if they're in the same file and the same language as the program.

But Pascal was a small language. Getting this into today's bloated languages is tough. Back then, we looked at Ada, sized the project, and realized it was comparable to building an optimizing compiler for the language.

[1] http://www.animats.com/papers/verifier/verifiermanual.pdf


> We built verification into the language in Pascal-F, 30 years ago.

You sound like a very interesting person to buy a beer for, assuming you'd be patient enough for my questions :p. I'll check out the paper instead when I get the time.

> It's much easier to keep the verification statements correct if they're in the same file and the same language as the program.

Agreed. Hell, even using Dialyzer in Erlang/Elixir for static type checking (which I make the effort to do often in my daily side-project hacking) just doesn't feel right, even though it's still appended to function definitions in the original files.

This is one of those things that needs to be a core part of the language design IMO, not just a tool built on top - 3rd party, by the core team, or otherwise. But, that said, tooling is still superior to the complete absence of it.

Maybe you can answer a question I've been struggling to find the answer to via Google. Do you know the name of the popular older language/tool used by Microsoft for doing formal verification of C/C++-style code? It sounds like tkk or something similar? I can't seem to find it.

My other question: do you think the testing analogy I mentioned in my comment above applies here, or do you see it as an entirely different paradigm - basically changing how you program rather than adding on a tool/skillset?


What book was that?


> Before you can even consider verifying something, you have to be able to talk about it in some formal language.

Sure, but it doesn't have to be a programming language.

> Preferably the one you're programming in.

Who says so? Programming languages (justifiably) optimize for the ability to express computation, not the ability to express proof.

In spite of Curry-Howard, there exist important differences between proofs and programs:

(0) Proofs are primarily for humans to understand. Programs are primarily for computers to execute.

(1) Proofs are sometimes allowed to be non-constructive. Even when they aren't, an easily understandable proof beats one that corresponds to an efficient program.

(2) Allowing non-terminating programs is actually a good thing: it allows for proofs of correctness that use techniques not anticipated by the language designer. OTOH, if your logic lets you prove a contradiction, it's the end of the world (for your logic).


Proofs and specifications are different things. Users need to be able to read specifications. For example, if you want to talk about definedness for arrays, a predicate

    defined(arrayname, lowbound, highbound)
is helpful. That goes in assert statements. It's run-time checkable, if you have some extra state in the form of "is defined" flags. But you'd rather prove it once so the run-time checks are unnecessary. With a few simple theorems, such as

    defined(a,i,j) and defined(a[j+1]) implies defined(a,i,j+1)
and a simple automated prover, you can deal with most of the issues around partially defined arrays.

Think of proof support as being an extension to optimization of assertions. Most assertions in programs can be proven easily with an automated prover. Users need never see those proofs. Some will be hard and require more proof support.
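
For instance, the runtime-checkable form could look like this in Rust (a sketch with invented names; the flag vector is the "extra state" mentioned above):

    struct TrackedArray {
        data: Vec<f64>,
        is_defined: Vec<bool>, // the "is defined" flags
    }

    impl TrackedArray {
        /// Runtime form of defined(a, lo, hi): every slot in lo..=hi was set.
        fn defined(&self, lo: usize, hi: usize) -> bool {
            self.is_defined[lo..=hi].iter().all(|&d| d)
        }

        /// Setting the next slot extends definedness by one, mirroring the
        /// theorem: defined(a,i,j) and defined(a[j+1]) implies defined(a,i,j+1).
        fn set(&mut self, i: usize, v: f64) {
            self.data[i] = v;
            self.is_defined[i] = true;
        }
    }

An assert!(a.defined(lo, hi)) before a read is exactly the check a prover would discharge at compile time, making the flags and the runtime test removable.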


> That goes in assert statements. It's run-time checkable

And runtime checks help because...?

> But you'd rather prove it once so the run-time checks are unnecessary.

No. I'd rather prove it to rule out the program being wrong. The runtime check is totally beside the point.

> With a few simple theorems, such as (...)

I think you mean “proposition”. It's not a theorem until it has been proven.

> Think of proof support as being an extension to optimization of assertions.

I never use runtime-checked assertions, so I never need to optimize them away.

> Users need never see those proofs.

But, you see, I want to see the proofs. How am I supposed to maintain a program I don't understand?


    defined(a,i,j) and defined(a[j+1]) implies defined(a,i,j+1)
is a theorem. It looks like this in Boyer-Moore theory:

    (PROVE-LEMMA arraytrue-extend-upward-rule (REWRITE) 
        (IMPLIES (AND (EQUAL (arraytrue A I J) T) 
                 (EQUAL (alltrue (selecta A (ADD1 J))) T)) 
                 (EQUAL (arraytrue A I (ADD1 J)) T)))

Name the conjecture *1.

We will try to prove it by induction. There are three plausible inductions. They merge into two likely candidate inductions. However, only one is unflawed. We will induct according to the following scheme:

    (AND (IMPLIES (LESSP J I) (p A I J))
         (IMPLIES (AND (NOT (LESSP J I))
                       (p A (ADD1 I) J))
                  (p A I J))).

Linear arithmetic informs us that the measure (DIFFERENCE (ADD1 J) I) decreases according to the well-founded relation LESSP in each induction step of the scheme. The above induction scheme leads to three new formulas:

    Case 3. (IMPLIES (AND (LESSP J I)
                          (ARRAYTRUE A I J)
                          (EQUAL (ALLTRUE (SELECTA A (ADD1 J)))
                                 T))
                     (ARRAYTRUE A I (ADD1 J))),

which simplifies, rewriting with ARRAYTRUE-VOID-RULE and SUB1-ADD1, and opening up ARRAYTRUE and LESSP, to the following eight new conjectures:

...

That finishes the proof of *1. Q.E.D.

"arraytrue" is defined recursively:

    (DEFN arraytrue (A I J) 
        (IF (LESSP J I) T                          -- the null case is true
            (AND (EQUAL (alltrue (selecta A I)) T) -- next element is true
                 (arraytrue A (ADD1 I) J)))        -- and rest of array is alltrue
And yes, there's a machine proof that this terminates.


> defined(a,i,j) and defined(a[j+1]) implies defined(a,i,j+1) is a theorem. (proof)

Sure. My point is just that a theorem is a proposition equipped with a proof. If the user just enters a proposition into the system, then they aren't entering a theorem. They're entering a proposition that the system can turn into a theorem.


> > But you'd rather prove it once so the run-time checks are unnecessary.

> No. I'd rather prove it to rule out the program being wrong. The runtime check is totally beside the point.

That's the same thing. It's just a matter of it being automated.

> > Think of proof support as being an extension to optimization of assertions.

> I never use runtime-checked assertions, so I never need to optimize them away.

If you've ever used a language with bounds-checked array access, you have. IIRC, you've used Rust some, so you've used bounds-checked array access, except where Rust was able to prove that out-of-bounds access was impossible and elide the check.

> > Users need never see those proofs.

> But, you see, I want to see the proofs. How am I supposed to maintain a program I don't understand?

There's a difference between need and ability. Just because it's stated that you don't need to see something doesn't mean you can't. If you want to see a proof, you dig in and find it, or find where it's accepted canon that we don't need to reproduce it. (Do we need proofs for integer addition? That seems of limited use to me, but maybe a language formally proved as far down as possible has some interesting benefits.)


> That's the same thing. It's just a matter of it being automated.

Re-read what he said. He presented runtime checks as an acceptable but non-optimal scenario for performance reasons. My reply was that runtime checks don't prevent a wrong program from being wrong, and it's this wrongness itself that I consider unacceptable.

> IIRC, you've used Rust some, so you've used bounds checked array access, except where Rust was able to prove that out of bounds access was impossible and elide the code that checks.

Runtime bounds-checked array manipulation is literally the single thing I hate the most about the languages I like the most (ML and Rust). It introduces a control flow path that shouldn't be reachable in a correct program, and hence shouldn't exist. This is particularly painful because beautiful array-manipulating algorithms have existed since, like, forever, yet in 2017 I still can't express them elegantly.

> If you want to see a proof, you dig in and find it, or find where it's accepted canon that we don't need to reproduce it (do we need proofs for integer addition?)

I don't need to rebuild arithmetic from scratch again, because I've already done it at some point in time, and once is enough as long as you understand the process.

On the other hand, when I'm first confronted with an already existing program, I don't understand why the program is correct (if it is even correct in the first place), so I do have to set some time aside to properly study it.


We intend for it not to be; more on that as we get closer to actually having a model.


In the meantime, Rust's support for automated tests is great.


Alas, tests aren't proofs.


Yet, tests are still useful!


Not if the goal is to prove that an algorithm meets a specification, which is more often than not my goal.


Good Map implemented on top of Vec: https://github.com/bluss/ordermap


This is like a PHP map (insertion-ordered), correct? That's the one language feature that made PHP stand out from the crowd.


Python has followed suit.


Very nice.


A hashmap built on top of vectors is a pretty common implementation. Not sure what Rust uses, though.


IIRC, Rust's HashMap uses a single raw vector containing, for each entry, its hash code, its key, and its value; empty entries have a special hash code as a marker, with the key/value left uninitialized. Also, the hashes are kept together (separate from the rest) for better cache behavior during the linear probing.


Wouldn't that be two parallel vectors? (Side note: nobody ever thinks to bring up cache behavior in interviews where I ask about how a hash map could be implemented - it's nice to know that the library writers care :) )

How does that special hash marker work? What happens when something actually hashes to it? Just silently increment the hash?


> Wouldn't that be two parallel vectors?

Yes, but it's a single contiguous memory allocation. (It used to be three parallel vectors in a single allocation, with keys and values also kept separate to avoid padding between them, but experiments showed that had worse cache behavior.)

> How does that special hash marker work? What happens when something actually hashes to it? Just silently increment the hash?

Looking at the code, it always sets the most significant bit of every real hash value (since the least significant bits select the bucket, it makes no difference), and the marker has the most significant bit clear (in fact, all bits of the marker value are clear).
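
Schematically, that works out to something like this (a paraphrase, not the actual standard library source):

    const EMPTY_BUCKET: u64 = 0; // all bits clear marks an empty slot

    // Every stored hash has its top bit forced on, so no real entry can
    // ever equal EMPTY_BUCKET. The bucket index comes from the low bits,
    // which this does not disturb.
    fn stored_hash(real_hash: u64) -> u64 {
        real_hash | (1u64 << 63)
    }

    fn bucket_index(stored: u64, table_mask: u64) -> usize {
        (stored & table_mask) as usize // low bits select the bucket
    }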


Oh, clever. Of course you can do it that way.


The problem is avoiding having to use Vec<Option<T>>.


If T is a reference type, doesn't Rust do an optimization where Option<T> is implemented using null pointers?


Yes. This optimization is expected to be expanded in the future, but currently there are "NonZero" types; Rust notices them and uses the zero value as the enum discriminant.

So in this case the generated code is identical to that of a nullable pointer.
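
This is easy to check (a small standalone example; sizes assume a 64-bit target):

    use std::mem::size_of;

    fn main() {
        // A reference or Box can never be null, so Option reuses the all-zero
        // bit pattern as None, at no extra space cost:
        assert_eq!(size_of::<Option<&u8>>(), size_of::<&u8>());
        assert_eq!(size_of::<Option<Box<u8>>>(), size_of::<Box<u8>>());
    }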


Yes, but that doesn't help when T is not a reference type.


If it's not a reference type, then the Option discriminant is probably small relative to the rest of the data in the type.


Option<i32> is 8 bytes



