This is C to Rust, not C++ to Rust. That's important. DARPA was willing to award...

juliangmp · 2025-10-02T08:21:50 1759393310

> The test case which caused a segfault in the C version also caused a segfault in the unsafe Rust version. At least it generates compatible code.

I mean... That's what I would expect. It's a translation not a rewrite. The question here is really “what's the end goal?”. Are you gonna refactor parts of the translations output, or is that too much work? Cause a pure translation won't give you anything besides a different compiler and build system. Ah well I guess you can claim you're memory safe and blazingly fast

Animats · 2025-10-02T18:02:56 1759428176

The end goal for DARPA's project is safe Rust. Translate unsafe constructs to safe constructs. C arrays of unknown length become Rust arrays or Vec items of known length. Pointer arithmetic becomes slices. (This covers most string manipulation, a high-risk activity in C.) Ownership translation probably will involve too much Rc/RefCell at first.

The key to this is using an LLM to recognize idioms, then formal methods to verify that the LLM guessed right. Which is what the current project is trying to do.

For example, consider C code with a function:

    void foo(int* tab, size_t len) { ... }

TRACTOR needs to do something like:

- Look at foo and its callers. Determine whether "tab" is an array or an indirect pass of a single integer.

- Look at foo and see if tab is ever modified. That determines whether it is mutable. It may be necessary to go down the call chain. This analysis may not be able to determine if it is mutable, but if determines it is immutable, tab can be passed immutably.

- Ask the LLM to figure out the size of tab from context in both foo and its callers. Is len the length of tab? That's a common idiom in C an LLM can recognize, but it might be a wrong guess. Is it the length in bytes or in ints? Remember, the programmer had to be able to figure this out, so the LLM has a good chance of doing it.

- If tab is immutable, hypothesize that the correct translation is

    fn foo(tab: &[int]) { ... }

and within "foo", len comes from tab.len()

Now attempt to use "lifting" formal analysis, as mentioned in the paper, to see if that changes the semantics of the program. Does "foo" subscript out of range? Can you prove that it doesn't?

This problem needs both LLMs and formal methods. LLMs to infer intent, and formal methods to check that inference.

estebank · 2025-10-02T15:26:49 1759418809

There's one reason I could think of for translating from C to Rust without further changes: simplifying the build. If you can remove the need for a build.rs script running make, that's one headache less. At that point you have exactly the same behaviour as wrapping the original C code in a -sys crate, so all of the usual caveats are warranted.