Hacker News new | past | comments | ask | show | jobs | submit login
Ask not what the compiler can do for you (github.com/juspay)
100 points by manojr13 on May 5, 2023 | hide | past | favorite | 79 comments



I don't mean this to sound superior or critical.

I found this article a little surprising because the authors seem to have an intuition about the compiler that I have never had. I've never expected potential stack overflows from unbounded recursion to be caught by the compiler. I've been delighted every now and then when a compiler gives me a warning when it can detect it. But I have always had stack overflows in my mental bucket of "stuff it's my responsibility to think about".

Maybe this says something about how compilers have gotten smarter and more helpful since I first cut my teeth on them.


What I also think is weird is that people think seg faults and crashes are "unsafe".

In the terms of Cardelli, a stack overflow is a trapped runtime error -- and thus a SAFE behavior of a language runtime.

What's not safe is when your program keeps running in an unknown state:

- buffer overflows - who knows what it does after memory corruption?

- integer overflows, especially in C where the behavior isn't defined. But in Rust's release mode, these runtime checks are omitted, so that's unsafe too.

---

On the other hand, the behavior of null pointer exceptions and a stack overflows is SAFE. It protects you from undefined behavior at runtime.

Crashing is a DEFINED behavior -- it's also what happens on divide-by-zero by default, and apparently when Rust's allocators run out of heap memory.

This isn't a theoretical distinction, because diagnosing and fixing NPEs and stack overflows is easy compared to the other two.

And crashes can lead to denial of service, but not remote code execution, which is WAY worse.


I think that you're the weird one here.

The most common reason to GET a segmentation fault is memory corruption. Frequently from a buffer overflow. Which makes them the canary in the coal mine for bugs that often turn out to be exploitable security vulnerabilities.

So if a program segfaults on me and I don't know exactly why, I should dial my paranoia up and expect that it is insecure. That an attacker who looked at it could use it to take over my computer instead of simply leaving me with a core dump.


Instead of stack overflow causing a crash, it could silently overflow into heap memory. Do you want that?

What I'm saying is that crashes are the thing that PROTECT you from undefined behavior at runtime.

---

If a C program crashes, then you might suspect memory corruption.

But here's the thing: if it DOESN'T crash, you actually have NO less reason to suspect memory corruption! (This isn't theoretical, I'm certain that there are thousands or millions of C programs that have memory unsafety 100% of the time at runtime, especially in embedded systems.)

If a Rust program crashes, would you suspect memory corruption? It could be, but it's still BETTER if it crashes 100% of the time than if it nondeterministically crashes.

---

Do you think there are any other options besides crashing or silent undefined behavior?


I agree that segmentation faults are the right thing to do in a bad situation. But the fact that we are in that bad situation says that there is a problem that we should worry about.

As for memory corruption, you're right that nothing can ever prove that code has no memory corruption. But that doesn't stop crashing from being a pretty good signal that THIS program is experiencing corruption NOW. Which should raise our alert level.

I don't have enough experience with Rust directly as opposed to C to have a great sense of how valuable this signal is for those programs. But then again the programs I see segfaulting all of the time tend to be C programs. So the problem is kind of moot.


> So if a program segfaults on me and I don't know exactly why, I should dial my paranoia up and expect that it is insecure.

This is true in that the program contains at least one bug, but some memory errors are safer than others. A SIGBUS is safer than SIGSEGV (usually means you mmapped a file and got an I/O error reading it) and a read is safer than a write.

This comes up if you're prioritizing crashes from fuzzing.


Rust has a particular definition of unsafe that they've documented here: https://doc.rust-lang.org/nomicon/what-unsafe-does.html

They have also documented some frequently discussed behavior that is not considered unsafe here: https://doc.rust-lang.org/reference/behavior-not-considered-...

For each of the items on the second list, I believe there are languages that would consider those behaviors "unsafe", but rust doesn't - ultimately it's an arbitrary choice. So, it's important to be clear about what model of "unsafety" we're talking about.


Yes definitely, which is why I mentioned Cardelli. Previous comment on the same subject: https://news.ycombinator.com/item?id=21832009

A program fragment is safe if it does not cause untrapped errors to occur. Languages where all program fragments are safe are called safe languages. Therefore, safe languages rule out the most insidious form of execution errors: the ones that may go unnoticed.

-- Luca Cardelli

It's certainly fine for Rust to have a keyword called "unsafe". But as you note, users should remember that this isn't equivalent to semantic safety.

---

As a trivial example, integer overflow IS unsafe in Rust, but it's allowed WITHOUT using the unsafe keyword.

Any language that traps integer overflow, regardless of debug/release mode is safer than Rust in that respect.

Python is also safer than Rust in that respect, because it allows arbitrary sized integers.

Rust made a tradeoff for efficiency, which is appropriate for its use cases, just like C did at the time.

---

Programmers often use/abuse terms from computer science, which is perhaps OK when the difference doesn't have engineering consequences.

Here it absolutely has consequences -- a crash is SAFE because it protects you from undefined behavior at runtime. Crashes are also easy to debug.

I think the fallacy is that there's any other option. Many errors can't be caught at compile time, because they're simply incomputable problems. The mathematically BEST behavior is to crash.

Related thread: https://old.reddit.com/r/ProgrammingLanguages/comments/tqtxf...

e.g. "Regular expression" was abused by programmers, and that one also has significant engineering consequences. O(n) time and O(1) space are desirable engineering properties.

So if people let one definition of "unsafe" take over, then they've lost the ability to think clearly about the problem.

There are languages that are safer than Rust now (e.g. Python), and there will likely be languages in the future that are both safer and more efficient. (Maybe Mojo if they do something different WRT bounds checking).


> As a trivial example, integer overflow IS unsafe in Rust, but it's allowed WITHOUT using the unsafe keyword.

To reach this conclusion you need to decide that even though the behaviour (in compiler modes which don't panic on overflow you get wrapping) is well-defined it's somehow an "untrapped error" which sounds a great deal like nonsense to me. Maybe you've been talking too much to the gentlemen Babbage complained about, "if you put into the machine wrong figures, will the right answers come out?"

If you can afford to do overflow checking in production and would rather panic, today your two options are: checked arithmetic (e.g. 127i8.checked_add(127i8).unwrap() will panic) or to switch on the overflow checks even in optimised builds.

Some day, perhaps a Checked<T> will join Wrapping<T> and Saturating<T> so that you can explicitly express this in the type itself. I actually find Wrapping<u8> is rather pleasant to work with, having the same properties as unsigned char in C but with the overflow behaviour more explicit as part of the type.

As to what can't be done. You're think about General Purpose languages, but there's no good reason that so much of our software should be written in a General Purpose language. When the language can't express "Email the new model schematics and 2024 financials to our competitors in China" even if that was specifically what you wanted to do, good luck to hackers trying to make your program do that when it wasn't supposed to. This is the rationale for languages like WUFFS.

WUFFS doesn't have integer overflows, they don't compile. That is what safer than Rust looks like, not Python.


> And crashes can lead to denial of service, but not remote code execution, which is WAY worse.

Some crashes in the past have lead to remote code execution, because a program that was supposed to be guarding/filtering something was caused to die, and the system failed open.


Yeah that's an interesting consideration, but it's more of a systems design issue, outside the scope of what the C or Rust compilers and runtime can do anything about. (Or Python, JavaScript, etc.)


Dereferencing a null pointer is safe only on systems with an MMU to mark the 0 page as inaccessible so the processor can trap when it's accessed. On embedded systems with no MMU then 0 is a valid address just like any other.

By making null pointer dereference unsafe the compiler doesn't have to insert null checks for every single pointer access to maintain the same behavior as a CPU with an MMU.


> On the other hand, the behavior of null pointer exceptions and a stack overflows is SAFE.

For many languages only if the OS (sometimes with a bit of help from the runtime/ compiler) protects you against them. Not all language specs require programs to abort when they occur, if only because checking for them on systems without a MMU is expensive.


how is integer overflow unsafe? integer overflow in rust is not ub, unless you opt into the *_unchecked methods


The branding (or at least the zeitgeist) around Rust is that it is a "safe language", which implies that bugs are all logic bugs or somehow the fault of the programmer for doing something "unsafe" (hence the keyword).

Anyone who thinks about it for a bit knows there are limits to guarantees of safety. It's notoriously impossible to prove that a program terminates appropriately, for instance.

So I can understand some shock to run into the arcane all of a sudden. It must be especially shocking because it feels especially out of place.


Rust is a safe language for a specific definition of safe regarding a class of memory errors.


Regarding multiple classes of memory errors I think. Many languages are memory safe from control flow takeover, rust extends this to data races corrupting data.


Rust (without `unsafe`) protects you from Undefined Behavior. Behavior that is defined as crashing your program doesn't fall under that.


I have chosen this as a hill to die upon:

Philosophically, the Halting Problem and the Pigeonhole Principle are very similar. They both talk about what is not possible with arbitrary input, but they have very little to say about practical input. You can't compress a random number generator. You can compress human communication, in any form, some of them famously. The existence of PP or HP are only boundary conditions, as entropy goes to infinity.

Humans also call entropy 'chaos', which has a connotation that's doubly meaningful in information theory. Chaos obfuscates. It's only useful when obfuscation is the point (encryption). We run away from it as fast as we can, and the farther we get from in the more interesting opportunities open up. Languages that disallow certain patterns can solve the halting problem for pieces of the application, maybe even entire 'useful' applications.


For some categories of applications, or even components of applications, the inability to meet a deadline in handling an event is a safety violation in the category of "or else people could die". I mentioned the halting problem instead since it's close enough to those requirements for an HN comment.

Rust has nothing special to say about that definition of safety, except that it thinks it has a better model for avoiding certain kinds of memory bugs that could contribute to those issues. Given that, I can see that "safe language" as a concept can be confusing and even leave engineers a little unprepared for unexpected categories of bugs.

Anyway, I brought up the halting problem to illustrate. I'm sure there are other illustrations you can substitute regardless of your opinion on the halting problem.


Rust is pretty clear about “safe” meaning no memory unsafety in unsafe-free code. It certainly doesn't market that code without unsafe will be bug free (this would force the safe subset to no longer be Turing complete).


It can still be surprising because in other languages it may be the case that a stack overflow can lead to memory unsafety. However, in Rust, a stack overflow leads to a guaranteed abort on major platforms, and a best-effort abort on niche platforms (the reason being that LLVM only provides stack probes for major platforms, so niche platforms have to make do with only guard pages).


My understanding is that there is a decent amount of undefined behavior on stack overflow in Rust, due to its use of TLS.

https://github.com/rust-lang/rust/issues/43146


I'm unclear how that would lead to undefined behavior on stack overflow. Stack overflow on Rust is specified to immediately abort the entire process (with some aforementioned wrinkles as to how achievable that is on certain targets), which necessarily nukes all TLS.

The fact that the linked issue isn't tagged with "I-unsound" suggests that nobody has ever indicated that that might somehow be a soundness issue. If you think otherwise, I encourage you to bring it up so that any soundness issue can be properly tracked.


To push this point a little further: in Rust when there's not a clean way to prevent memory unsafety (or when you haven't requested to), it aborts the process. Out of array indexing, stack overflows, etc.


Minor clarification: there's a difference between a "panic" (which may either unwind or abort, based on compilation flags) and a "guaranteed abort" (which is not negotiable). Array-index-out-of-bounds is merely a panic, as are almost all "unrecoverable errors" in Rust. Things that are guaranteed to abort are rare in Rust; I can only think of stack overflows, and OOM for certain stdlib types.


Various refcount overflows also abort: mpmc, Arc, Rc: https://doc.rust-lang.org/src/alloc/rc.rs.html#2628

I think we're lucky the usual "aha rust isn't so safe after all, checkmate rustaceans!" crowd hasn't run into these aborts yet, because they won't have any kind of debugging spew so are not very friendly. You have to intentionally leak refcounts to hit these aborts or else you first run out of memory just storing the pointers; but then again, intentionally leaking things is just the sort of thing the anti-rust crowd is likely to try.


Mostly once they discover they can leak things, Rust is pronounced "not really safe" by such people anyway.


My understanding is that the code which aborts the process contains the potential for UB.


As far as I know, in the case of stack overflow, the code that aborts the process should literally just be a single line of assembly that contains an invalid opcode, resulting in a (safe) segfault. What has given you the impression that there is potential for UB in these code paths?

EDIT: ah, I see your other comment.


Stack overflow in Rust does abort the process, but not immediately: the signal handler first distinguishes stack overflow from other sources of SIGSEGV/SIGBUS, and then prints a message containing the name of the current thread. Both of these require accessing TLS, which in turn may panic, or invoke other non-async-signal safe functions. That is UB.

I don't know what the definition of a "soundness issue" is, but I think there are some nasal demons to be found here; it will be fun to try at least.

edit the linked issue missed a case! Rust aborts by using libc::abort(), which is not async signal-safe in glibc prior to 2.26.


I have put a bug report where my mouth is. https://github.com/rust-lang/rust/issues/111272


Thanks for the report! We can continue there. :)


My point is more that debugging a core dump (and similar activities) would be all the more foreign. The engineer might even be shocked that the need to do that sort of thing even exists for a piece of Rust code.


It's actually very easy for rustc to raise a compiler error if you have recursion inside an async fn. It naturally falls out from the way async is implemented, because the resulting Future type would be infinitely sized.

An `async fn` essentially compiles to a state machine where each `.await` is a transition to another state, and local variables and the future being `.await`'d are data of that state. If a future were to `.await` itself, that state data would contain the future itself, which means the future would need to be larger than itself, which is impossible. Similarly corecursion would simultaneously require that `FooFuture` contain `BarFuture and `BarFuture` contain `FooFuture`, which would require both to be bigger than the other, which is again impossible.

The solution to the infinite size problem is to wrap the future in `Box<dyn Future>` before `.await`ing it, so that the state data is a fixed size (`Box<dyn Trait>` is always two pointers long, a data pointer and a vtable pointer). Since TFA was using the `#[async_trait]` macro which changes the function to return `Box<dyn Future>` (as the stable alternative to async-fn-in-traits / TAIT), it ended up suppressing the recursion error too.


I have the same thought! But then again when I tried a simple example in clang the -Winfinite-recursion warning triggered. In the compiled code, clang compiled the function as just `ret`, which is alright because C++ standard says if a thread does not eventually do an atomic op or call certain functions then it's undefined behavior.

Still wouldn't recommend anyone to rely on the compiler to catch this.


> But I have always had stack overflows in my mental bucket of "stuff it's my responsibility to think about".

I remember working on kernel modules, and coming from userland being surprised to find kernel stacks of 16k or 24k are common. Anything that could help with that without having to do kernel debugging on weird errors would be great.


Static stack analysis is a thing, though it's a thing which annoyingly lacks any real open-source implementation, and any good tooling whatsoever in my experience. It does limit what you can do though, since dynamic dispatch and recursion can quickly make it fail.


It's certainly a Rust thing, and definitely a part of the appeal.

It doesn't remove the "stuff it's my responsibility to think about" but it certainly alleviates it. Your comment actually sounds like the first step to becoming a 'Rustacean'.


Fortunately this particular problem should go away soon. Async-fn-in-traits is due to be stabilized in the next few months [1], which means that the use of #[async_trait] will no longer be necessary, which means the compiler will correctly surface a warning for unbounded recursion.

[1] https://blog.rust-lang.org/inside-rust/2023/05/03/stabilizin...



The lack of warning does not seem to be related to #[async_trait] -- it's reproducible on a bare function, as shown in the post and in sibling comment's 2nd link.


The lack of warning is from returning `Box<dyn Future>`, which `#[async_trait]` emits. ( https://news.ycombinator.com/item?id=35834091 )

With async-fn-in-trait that flooow linked to (and more generally type-alias-impl-trait) there won't be a motication to use `Box<dyn Future>` in the first place.


I don't fully understand the context or the overview of the problem, but for a large codebase, if you are using `tokio` runtime, which most rust based application go with. `tokio-console` is good for profiling and debugging, have you folks looked into that during the whole debugging process? I personally trust debuggers and debugging tools in such circumstances, as they can easily help zeroing down on such issues.


Is this the case with all statically typed languages? I think I have faced the same issue with PureScript

(PS: I'm a contributor)


With exclusively static dispatch I feel like it should be possible to unconditionally detect at least the presence of recursion at compile-time, but in the presence of dynamic dispatch (as in the article) I think it might be impossible for a compiler to guarantee that it can statically detect recursion. (I have no basis for these claims other than gut feeling.)


> With exclusively static dispatch I feel like it should be possible to unconditionally detect at least the presence of recursion at compile-time,

This is equivalent to solving the halting problem, and therefore impossible in the general case.

Tools can use heuristics to make a guess. Dynamic dispatching to mutually recursive functions is one example where those tools fail. But that also includes things like checking a conditional before calling a function. The tool can check unconditional infinite recursion, but it can't tell you for certain that a branch that may be infinite is always taken.


Whether or not the recursion ever successfully reaches the base case in order to terminate is certainly equivalent to the halting problem, and thus undecidable for Turing-complete languages.

However, I was making a much weaker claim, which is that if you have exclusively statically-dispatched function calls, it should be possible for a compiler to detect the possibility of recursion 100% of the time.

For every function, you should be able to make a complete list of every other function that it calls directly (including branches that may never actually be taken at runtime (which it cannot know, because of the halting problem), making this overly conservative, like any type system). This list is both perfectly precise (because of static dispatch) and finite. With this, you can then construct a graph and run a cycle detector over the graph.


You can also determine that _some_ programs terminate. That is, there is a class of recursive programs that the compiler can look at and say "yes this terminates". Where it fails is that the programs that it says "no" to may still terminate. (So it's a "yes" or "maybe".)

Languages that I've worked with that do this are Lean4, Idris2, and Agda. It's needed for dependent type checking as you only want to run total functions at the type level (to ensure type checking terminates).

Idris will build the call graph as you described and analyze whether values get structurally smaller as they pass through any cycles in the graph. (Also lexicographical stuff like "arg1 gets smaller or arg1 stays the same and arg2 gets smaller".)

I don't know the details of Agda.

Lean has a couple of things that it tries: detect straight up structural recursion, detect general recursion where an argument gets smaller, and it also lets you provide a function on the args for what gets smaller (e.g. length - ix) and a proof that it gets smaller.

I gave this a try with a heap implementation that I wrote. I worked through the proof and hit a wall having to show 0 < 0. It turned out that I had an off-by-one error (I was doing the math for a starts at 1 array). Once I fixed that, I got my proof and the compiler certified that it was total.

Writing proofs is not something your average programmer would want to do for the average program, but I think there is some utility in the compiler being able to say "yes this terminates" or "maybe this doesn't terminate".


Even without object-oriented dynamic dispatch, if the language supports closures or function pointers, you still lose the ability to statically analyze all call targets.


I'm comfortable saying that any direct use of function pointers counts as dynamic dispatch, unless your language's semantics for function pointers are especially strict or static. :)

As for closures, I'm not entirely convinced; closures in Rust don't support recursion because they're lexically-scoped (unless "something something y-combinator", which I confess I don't grasp), and otherwise closures are (usually) just as statically-dispatched as any other function call.


Yes, I wasn't sure if by "dynamic dispatch" you mean "object-oriented polymorphic method calls" or more generally any kind of indirect calls which would include calls through function pointers.

Once you have callsites where you can't resolve the target function statically, you've lost the ability to statically prove the absense of stack overflows.

With Rust, I think methods on trait objects should be enough dynamic dispatch to make it impossible to prove the absence of stack overflows. But I'm not enough of a Rust programmer to know for sure.


Yes, perhaps "indirect call" would better capture my thinking here rather than "dynamically dispatched". :)

> With Rust, I think methods on trait objects should be enough dynamic dispatch to make it impossible to prove the absence of stack overflows.

Well, trait objects are absolutely dynamically-sized types, but in order to store them on the stack you'd need to stuff them behind a Sized pointer type. But there might be other strange edge cases that prevent the ability to guarantee the absence of stack overflows (to say nothing of the fact that stack size differs between platforms).


> Well, trait objects are absolutely dynamically-sized types, but in order to store them on the stack you'd need to stuff them behind a Sized pointer type.

I don't think the question here is about Sized vs. !Sized, but instead about the analyzability of the program's control flow. Aside from layout information, a trait object's vtable is literally just an array of function pointers. So since the target of the dynamic dispatch can only be computed at runtime, proving that it is never recursive can only be done with escape analysis of all trait objects, alongside all function pointers.

(Also, you're correct that Rust does not support directly recursive closures; you can use opaque types to get around the naming problem, but the compiler sees that the closure type becomes cyclic. At least one closure in the chain would have to use dynamic dispatch.)


It doesn't remove it completely: if you have strong type safety on your dynamic dispatch and function pointers, you can use that to restrict the possible control flow. You can also constrain things further if you can relate objects or function pointers passed into a particular function to a callsite. But it does rely on whole-program analysis and while you can make an analysis that will either find the full callgraph or report an error you can't make it accept all valid inputs and reject all invalid ones (because this would still basically reduce to the halting problem).

In practice it's often close enough: I have a hacked-together static stack analysis tool I use for an embedded system and it can deal with function pointers used for callbacks with basically one additional annotation (basically saying "any function passed in as a pointer to this function can be called from this other function"), and it works despite not actually having all the information the compiler would have. I would not however expect it to work on an arbitrary application, especially one which pulls in many libraries, which was not designed with at least some attention to allowing this kind of analysis in mind.


> With exclusively static dispatch I feel like it should be possible to unconditionally detect at least the presence of recursion at compile-time,

That's true, and Rust as a language does do so, but in case of recursion for `async` function rust itself suggests to go the `dyn` route. Adding more to the post, it seems that the use of `async-trait` was the reason that they didn't have any choice in the matter to choose static dispatch, as the language as a whole doesn't support async traits, (just yet! ^-^)


I don't think the recursion in this case is dynamically dispatched. It returns a dyn Future, but that isn't the same thing.


I haven't thought about it closely enough to know whether or not it's feasible to detect recursion in this specific case of dynamic dispatch (compilers are certainly capable of devirtualization, after all). However, any time you see a `dyn Foo` in Rust, any method called on it will be dynamically-dispatched (unless the aforementioned devirtualization optimization happens to kick in): https://doc.rust-lang.org/std/keyword.dyn.html


We are not calling a method on a `dyn Foo`. The type in the original example is statically known to be `KafkaWrapper`. The reduced example case isn't even a method -- it's just an ordinary function:

  fn do_something() -> Pin<Box<dyn Future<Output = ()>>> {
      Box::pin(async { do_something().await })
  }
(The infinite recursion happens before anything is ever returned --- the return type is irrelevant.)


> (The infinite recursion happens before anything is ever returned --- the return type is irrelevant.)

This is not true, the closure is not actually called until the future is awaited. Each instance of do_something does actually return.


I'm not an expert on fuzzing, but I think a fuzzer would be able to discover this issue.


Any UT of the function would have caught this issue.


Sure, but fuzzers are best-effort heuristics, rather than a static guarantee of correctness.


Not sure rustc can do a lot about this case but maybe a static analysis tool can complement it.


To some extent, all compilers are static analysis tools.

I think clippy could be enhanced to spot this.


Yep, it likely could. Spotting something that usually is an error now being wrapped in `Pin<Box<...>>` would hopefully be not too hard.


If it can detect the recursion in the non-dyn return case, it seems like it ought to be able to do the same once dyn is added.


Are there any tools to identify stack sizes and function traces?


Have not used it myself but looks like there is StackAnalyzer[0]. Have done it myself by adding analysis to angr[1] but it pretty much boils down to trying to solving the halting problem for many large programs (an unbounded recursion or alloca makes the max stack size infinite)

[0] https://www.absint.com/stackanalyzer/index.htm

[1] https://angr.io/


For rust code, I have found https://github.com/japaric/cargo-call-stack to be the best available option, as it does take advantage of how Rust types are implemented in LLVM-IR to handle function pointers / dynamic dispatch a little better. An even better solution would try to use MIR type information as well to further narrow down targets of dynamic calls in a Rust-specific way, but no such tool exists that I know of.


We spent decades trying to invent a sufficiently smart compiler when we should have been inventing a sufficiently empathetic one - @ekuber

It all makes sense now


Is there any flag in Rust's compiler, like GCC's `-fstack-check` that checks for such behavior?


rustc already tells LLVM to use stack probing for sufficiently large stack frames, so it can give a short error message on stderr ("thread 'whatever' has overflowed its stack") before aborting the program. But the author is looking for a backtrace here.

The problem is that printing a backtrace is the responsibility of the program, as part of the global panic hook. But running arbitrary user code (e.g., Backtrace::capture()) during a stack overflow would be extremely unsafe, since the thread could be in the middle of some operation that does not expect to be interrupted.

The soundness criteria would likely be very similar to those for async signal handlers, which are extremely platform-dependent, so it would be a challenge to document all the safety preconditions. An alternative would be to especially bless backtrace-rs for stack overflows alongside the default panic hook, but I don't think even it can guarantee complete safety on all supported OSes.


> The problem is that printing a backtrace is the responsibility of the program, as part of the global panic hook. But running arbitrary user code (e.g., Backtrace::capture()) during a stack overflow would be extremely unsafe, since the thread could be in the middle of some operation that does not expect to be interrupted.

> The soundness criteria would likely be very similar to those for async signal handlers, which are extremely platform-dependent

This is more or less exactly the same as any stack-print-on-crash handler you might be familiar with. In theory you should be able to unwind using DWARF eh_frame data from anywhere. In practice this stuff isn't always perfect and it's much easier to rely on framepointers, if your compiler emits them. But switching to an alternative signal handler stack and unwinding your main stack is mostly worth trying -- worst case, you're already crashing.


Of course, it's far from impossible, I'm just saying it wouldn't be a walk in the park. The backtrace handling would likely have to be fixed in the runtime (probably enabled with a -Zbuild-std option) rather than being user-configurable, since there's hardly any way to start documenting which standard library features are async-signal-safe.

But even then, doing it in a POSIXly-correct way is hard: already, the standard library unsoundly uses thread-local storage in the SIGSEGV handler (not async-signal-safe per POSIX) to verify that it came from hitting the thread's guard page. backtrace-rs mostly delegates the hard work of unwinding to _Unwind_Backtrace on Unix-like systems, but it unconditionally uses a global mutex (not async-signal-safe) to protect backtrace operations on all platforms, perhaps because of dbghelp.dll on Windows.

I think _Unwind_Backtrace is async-signal-safe (since libunwind claims to be async-signal-safe, and it supposedly uses _Unwind_Backtrace), so it could probably be done. But someone would have to put a fair bit of work into both the backtrace-rs API and the SIGSEGV handler in std, especially since _Unwind_Backtrace apparently needs additional incantations to work properly from a signal handler.


Is this what happens when we are too complicit and entirely depend on our tooling to find mistakes?


I guess we can never expect the tools to find all the mistakes only the straight-forward ones (which seems to be a moving target)...

This could be an example of when things fall through the crack, while rust & its tooling eliminates most of the errors during compile time, it's not entirely impervious...

Overall I'd prefer depending on tooling to catch a subset of errors (which could miss 1 out of 20 times), rather than the added manual review effort with each change...


What happens? We debug and find the error exactly like we would have to if the tooling was “worse”


ask what ChatGPT can do for you instead.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: