Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

This is C to Rust, not C++ to Rust. That's important.

DARPA was willing to award multiple contracts, but the only one awarded is to a group from several universities. [1]

"The team’s approach, Formally-Verified Compositional Lifting of C to Rust (which forms the acronym ForCLift, pronounced “forklift”), uses Verified Lifting, which combines formal methods and program analysis with AI techniques such as Large Language Models, so as to create accurate translations of complex C code into safe, idiomatic Rust. The approach seeks to enable formal verification of the translated code while also preserving performance-critical behavior."

This involves something called "Verified Lifting."[2] "Metalift uses verified lifting to search for possible candidate programs in the target language that the given input can be translated to." So it's looking for matching idioms in the target language and using proof techniques to check that they are equivalent. Going uphill in abstraction automatically is rare. This will definitely be interesting. It's essential to good translation.

The old C2Rust is all downhill. What comes out of C2Rust are huge numbers of calls to functions that exactly emulate C semantics, bugs and all. I tried it once, on a buggy JPEG 2000 decoder. The test case which caused a segfault in the C version also caused a segfault in the unsafe Rust version. At least it generates compatible code.

[1] https://www.cs.wisc.edu/2025/07/15/translating-legacy-code-f...

[2] https://metalift.pages.dev/



> The test case which caused a segfault in the C version also caused a segfault in the unsafe Rust version. At least it generates compatible code.

I mean... That's what I would expect. It's a translation not a rewrite. The question here is really “what's the end goal?”. Are you gonna refactor parts of the translations output, or is that too much work? Cause a pure translation won't give you anything besides a different compiler and build system. Ah well I guess you can claim you're memory safe and blazingly fast


The end goal for DARPA's project is safe Rust. Translate unsafe constructs to safe constructs. C arrays of unknown length become Rust arrays or Vec items of known length. Pointer arithmetic becomes slices. (This covers most string manipulation, a high-risk activity in C.) Ownership translation probably will involve too much Rc/RefCell at first.

The key to this is using an LLM to recognize idioms, then formal methods to verify that the LLM guessed right. Which is what the current project is trying to do.

For example, consider C code with a function:

    void foo(int* tab, size_t len) { ... }
TRACTOR needs to do something like:

- Look at foo and its callers. Determine whether "tab" is an array or an indirect pass of a single integer.

- Look at foo and see if tab is ever modified. That determines whether it is mutable. It may be necessary to go down the call chain. This analysis may not be able to determine if it is mutable, but if determines it is immutable, tab can be passed immutably.

- Ask the LLM to figure out the size of tab from context in both foo and its callers. Is len the length of tab? That's a common idiom in C an LLM can recognize, but it might be a wrong guess. Is it the length in bytes or in ints? Remember, the programmer had to be able to figure this out, so the LLM has a good chance of doing it.

- If tab is immutable, hypothesize that the correct translation is

    fn foo(tab: &[int]) { ... }
and within "foo", len comes from tab.len()

Now attempt to use "lifting" formal analysis, as mentioned in the paper, to see if that changes the semantics of the program. Does "foo" subscript out of range? Can you prove that it doesn't?

This problem needs both LLMs and formal methods. LLMs to infer intent, and formal methods to check that inference.


There's one reason I could think of for translating from C to Rust without further changes: simplifying the build. If you can remove the need for a build.rs script running make, that's one headache less. At that point you have exactly the same behaviour as wrapping the original C code in a -sys crate, so all of the usual caveats are warranted.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: