I totally agree with de-emphasizing the old "recoverable" vs. "unrecoverable" dichotomy (https://blog.burntsushi.net/unwrap/#what-about-recoverable-v...). Every time I've heard programmers (especially in the context of exceptions) try to define it, I've found it imprecise and open to debate.
When invariant violations or mistakes by programmers (aka bugs) are detected, the program should halt as it is in an inconsistent state and continuing could be very dangerous (think privacy/security/data corruption). Otherwise, don't halt (handle it or have the caller handle it).
The criterion I tend to prefer is "expected" versus "unexpected" errors. I/O errors, especially network errors, are things that are going to be expected under reasonable operation, therefore it makes sense that code should handle them. Similarly, user input resulting in incorrectly formatted code should be reasonably expected and therefore handled.
But the same kinds of failures might not be reasonably expected in other circumstances--I wouldn't expect that corruption of the internal configuration files of an application should occur in reasonable operation, and therefore it makes sense to panic if they're corrupted... even if the cause is an I/O operation on a local disk, or parsing some JSON or TOML or INI or whatnot file.
One implication of this is that it needs to be easy for any error system to promote an "expected error" into an "unexpected error"--which is what unwrap/expect does. The recoverable/unrecoverable dichotomy suggests that there ought to be no reason to do this, but there is absolutely a reason to do so: what category an error falls into is ultimately decided by the context of the error, not the generation of the error itself.
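To make that concrete, here's a minimal Rust sketch (the file path is made up): the function that reads the file reports an ordinary, "expected" error, and it's the call site that promotes it into a panic, because in this context a missing internal config is a bug.

    use std::fs;

    fn main() {
        // `read_to_string` reports an "expected" error (io::Error). At this
        // call site the file is an internal asset, so a failure to read it is
        // a deployment bug: promote the error into a panic with context.
        let config = fs::read_to_string("assets/internal-config.toml")
            .expect("internal config must exist and be readable");
        println!("read {} bytes of config", config.len());
    }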
I write Java and for me it's not exactly "expected" versus "unexpected".
There are unexpected errors for sure. For example "StackOverflowError" which could be thrown from any method call.
There should be an unexpected-errors handler which does some sane thing in the given circumstances. Usually it involves logging the error and its details (stacktrace, maybe something else) and returning some kind of generic error to the caller (e.g. HTTP 500).
But the thing is, this handler is very often suitable for handling errors that I expect but don't want to bother handling individually. For example, those errors might be rare enough that writing handling code for them would be a net negative (every line of code is a maintenance burden, and error handling code is a maintenance burden squared). And I'm totally OK with those errors being handled in a generic way.
Even if it's user input sometimes. For example, the user could be me, and I know the input format. I don't want to write error handling for myself; all I need is to prevent data corruption. Getting HTTP 500 InvalidNumberFormatException is totally fine in some situations.
And the language should provide means for writing that kind of code. At least that's my opinion, and that's what I truly miss in those languages with explicit error handling of every function call.
You might call it lazy coding. I call it reasonable coding.
You can also implement `From` (see `impl From` in the above doc) to define how you want external error types to be converted.
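For instance, a minimal sketch (the error type and function here are made up): the `From` impls are what let the `?` operator convert external errors into your own error type automatically.

    use std::{io, num};

    // A hypothetical application-level error type.
    #[derive(Debug)]
    enum AppError {
        Io(io::Error),
        Parse(num::ParseIntError),
    }

    // `From` impls define how external error types convert into AppError,
    // which is what the `?` operator uses below.
    impl From<io::Error> for AppError {
        fn from(e: io::Error) -> AppError { AppError::Io(e) }
    }

    impl From<num::ParseIntError> for AppError {
        fn from(e: num::ParseIntError) -> AppError { AppError::Parse(e) }
    }

    fn read_port(path: &str) -> Result<u16, AppError> {
        let text = std::fs::read_to_string(path)?; // io::Error -> AppError
        Ok(text.trim().parse::<u16>()?)            // ParseIntError -> AppError
    }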
It's one of the rare languages where there isn't much boilerplate involved in using `Result`-style errors, even compared to exceptions (Swift is like this too).
> The criterion I tend to prefer is "expected" versus "unexpected" errors
The way I've often heard this phrased is "exceptions are for exceptional behavior", and it's always rubbed me the wrong way a bit (although maybe this is partially just because I don't think wordplay is a sufficient argument to do something; I've made similar arguments in the past to friends who sung the praises of "no-shave November" and "thirsty Thursdays"). From digging a bit deeper when I've heard this opinion espoused, it seems like it mostly boils down to the fact that exceptions tend not to be as efficient as happy-path code, so using them for circumstances that are too common is not going to lead to good performance. I guess I don't really find this a subtle enough concept to warrant needing to introduce another abstraction layer into the discussion, especially one that's much vaguer like "is this behavior expected?" If there's a performance concern, I think it's much better addressed directly rather than shifting the discussion to a proxy.
> I/O errors, especially network errors, are things that are going to be expected under reasonable operation, therefore it makes sense that code should handle them
It makes some sense, but often, code should do relatively little and process should do most of the handling. Recovering from I/O errors and properly testing the recovery code can take huge amounts of time and effort.
Often, aborting the program and rerunning it or even restarting a service is the best way to handle these because properly handling them in code costs time better spent on other things. Logging a decent error message and aborting may be the better choice.
But of course, that depends on the use case. Databases need lots of recovery code for I/O errors and deadlock recovery, for example, even for cases that occur maybe once every year.
(And yes, process nowadays is often automated, making it code again, but IMO, that’s a different kind of code)
I'm a fan of "is this going to a user, or to another part of the calling program?" If expecting it to go to a user, exceptions make sense. Need to bubble it all the way up to the part of the system that is closest to the user.
The only thing that gets to make that decision is the call frame that initiated the program execution. Meaning -- only fn main gets to return stuff to the user. Everything else, all other functions, should (almost always) express faults as returned errors.
If you are writing a library, sure. If you are designing a system, this is silly. Functions aren't written in isolation, so there can be a broad definition of all the exceptions that are expected to go to a user.
In the full system, you will have many side channels of data flow. Metrics and logging are also worth considering.
If you really want to boil a system down to a single entry point, at least do it like lisp, where the system has pluggable restart conditions defined.
Functions should absolutely be written in isolation. That's the point of functions!
If I see this line of code
result = val.method(foo, bar)
I make certain assumptions. Primarily that the implementation of val.method will in general only interact with foo, bar, and the fields of val. Metrics and logging should be dependencies injected to the val struct, not globals; expedience sometimes requires us to take shortcuts, sure, but those are exceptions, not patterns.
More fundamentally, that line of code, as it exists on the page, expresses an unambiguous flow of execution. Callers provide specific inputs to a function, the function goes off to do _something_ in a new level of the call stack, and, importantly, returns a result to me. Then I continue to the next thing.
The call stack underneath that function can be arbitrarily huge, sure. But I should be able to trust that when I read a sequence of expressions in my source code, the flow of execution through those expressions is exactly what it appears to be. Exceptions subvert this intuition. They make it so I can't know, or even predict, what will happen to my flow of execution when I jump into a new call stack. Every expression is potentially a return statement. That means I have to read the implementation of every function I call, recursively, in order to understand the execution flow of my code. This isn't tractable.
We obviously have to make some affordances for major failures, OOMs, div by 0, etc., so this isn't an absolute rule. But these have to be exceptional cases, not something that programmers need to consider as a matter of course. We simply don't have the cognitive capacity to model behavior in terms of recursively and arbitrarily complex implementation details. We need to be able to ignore this level of detail via simple and consistent abstractions.
Simple functions, certainly. But, not all things can be simple.
Consider your accelerator pedal in a car. Somewhat simple method to increase fuel to the engine. If something is wrong with the engine, it doesn't message that back to you through the pedal, even though the pedal won't work. This can be as trivial as the car not being on: the pedal will not work.
So, if you are picking the straw man where every function has side effects and communicates back through a side channel, I agree. But if you are building a system where some things would require way more effort and code to get what you are aiming for, then we are in the realm of this article, where a panic that sends it back to the user makes far more sense.
> Consider your accelerator pedal in a car. Somewhat simple method to increase fuel to the engine. If something is wrong with the engine, it doesn't message that back to you through the pedal, even though the pedal won't work. This can be as trivial as the car not being on: the pedal will not work.
I don't see how this is a problem? The accelerator pedal API, so to speak, is purely mechanical -- its "return value" is whether or not you were able to put it to the position you wanted, it doesn't promise to return any information about the downstream effects of that pressure/setting.
The accelerator is tightly coupled to the valve which controls how much fuel is given to the engine, but anything after that is a downstream (event-driven?) effect that I as a driver have to observe through other means. So if the valve is stuck I expect to learn about that failure after I call this fn, sure. But the impact on the intake, or the engine RPM, or the transmission, or my wheel speed, or my actual velocity, these are all things I discover through other means.
So it's not
fn depress_accelerator(f64 amount) -> CarState result
It's not complicated. When I call a function, I expect it will do what its signature says it can do, and I expect it to return control flow to me. If you can't rely on these assumptions, it's effectively impossible to build reliable software.
Well, first, the pedal is decidedly not coupled to the valve in many (most?) vehicles. It is hooked up to something else which is used as input to whatever is going to control the acceleration in the vehicle.
That said, this is precisely my point. Downstream from where my direct interaction is with the system, something can go wrong. And it doesn't make sense to think of it only in terms of a single "main." Instead, there are various parts of the system that all have different responsibilities. In particular, side channels of information are set up to route some errors/information back to a user that are not necessarily part of any individual function. And some systems should panic/fail instead of trying to continue, given the state of the system.
So, if something is being punted up from inside a software system, it makes sense to have an exception be there, to me. As it does, as well, to the article in this post. I can think this, all the while also agreeing that most of the time using return based responses makes a lot of sense.
Edit: Incidentally, I think your first function is more correct there. The state should also reflect where the accelerator pedal is currently sensed at. Any actual acceleration will be in response to the rest of the systems reading that state. There is literally no way for the pedal to know if it succeeded or not.
> There is literally no way for the pedal to know if it succeeded or not.
Success for the pedal is defined in terms of what it can know, same as any other component in a system. The pedal is typically a mechanical device, so what it can know is restricted to its mechanical outcomes. Was the floor mat stuck behind it? If so, pressing the thing to 100% only results in an outcome of 50% depression -- maybe that's failure. And so on.
It's incoherent for depress_accelerator to return information about the car in which the engine it is connected to is installed. Does `ls /mnt/volume/file` return information about the chassis of the server in which the hard drive hosting `/mnt/volume` is installed?
> it doesn't make sense to think of [programs/systems] only in terms of a single "main."
Why not? You can absolutely model any program, composed of arbitrarily many parts/components, as a graph of components with a single root. I acknowledge that's not the only model, but it's definitely universal and effective.
> In particular, side channels of information are set up to route some errors/information back to a user that are not necessarily part of any individual function. And some systems should panic/fail instead of trying to continue, given the state of the system.
What do you mean by "errors/information ... not part of any individual function"? Isn't it the case that anything which could produce an error is necessarily part of a function i.e. a sub-tree of the execution call stack? If I write a fn doSomething(input x), why should the result of my implementation do anything, in general, other than return execution to my caller with an appropriate result value?
> So, if something is being punted up from inside a software system, it makes sense to have an exception be there, to me. As it does, as well, to the article in this post. I can think this, all the while also agreeing that most of the time using return based responses makes a lot of sense.
Functions define layers of abstraction. A given function knows only its own implementation, narrow properties of its caller (input parameters and output expectations), and the characteristics of the functions it calls as defined by the signatures of those called functions.
There should never be the concept of something "being punted up from inside a software system". A function is defined in terms of its inputs and outputs. Its success or failure is defined in terms of those things only.
Network errors might be retryable/routable differently, but often (especially when starting out) they should probably be returned to the user. I mean, if S3 is down you can retry the call, but often it will still be down then.
Sort of. At my company, if we removed retries from our services, our reliability would drop precipitously. Something like 99.99% of retries succeed on the second try, if there's not a hard service outage. If there is a hard outage, well, not much to do about that.
Only the root of the call stack, fn main, should be able to return anything to the user. Everything else should return errors to their callers through the normal return mechanisms. Anything else, anything that introduces the possibility of shadow control flow, makes it basically impossible to maintain a working mental model of nontrivial programs.
> I wouldn't expect that corruption of the internal configuration files of an application should occur in reasonable operation, and therefore it makes sense to panic if they're corrupted.
Whether an error returned from a fn is expected or unexpected has to be a property of the fn signature and language conventions in isolation. If semantic or other locally-unknowable details influence fault classification and error control flow at call sites, your program becomes unmaintainable over time.
So like if you have fn parse_config that takes a string, or a file descriptor, or whatever -- it should return a Result<Config, Error> and yield an error for any un-parseable input. There is no reason this fn should ever panic.
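As a minimal sketch of that contract (the Config shape and error type here are made up): the library reports every un-parseable input as an ordinary error and leaves the policy to the caller.

    // Hypothetical config type and parse error, for illustration only.
    #[derive(Debug)]
    struct Config { name: String }

    #[derive(Debug)]
    struct ConfigError(String);

    // The library never panics: any un-parseable input becomes an Err.
    fn parse_config(input: &str) -> Result<Config, ConfigError> {
        let name = input
            .strip_prefix("name=")
            .ok_or_else(|| ConfigError("expected `name=...`".to_string()))?;
        Ok(Config { name: name.trim().to_string() })
    }

    fn main() {
        match parse_config("name=example") {
            Ok(cfg) => println!("loaded {:?}", cfg),
            Err(e) => eprintln!("could not parse config: {:?}", e),
        }
    }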
IMO -- call stacks are sacrosanct, making control flow visible and obvious is one of if not the most important thing to optimize in nontrivial programming contexts.
It's kind of the opposite. If you have a network error, often you shouldn't bother to try handling it, the best strategy is often to zero out your state, give up and try again later.
By contrast, you should give up parsing a JSON if it's a config file read on startup but probably not if it's user input.
> When invariant violations or mistakes by programmers (aka bugs) are detected, the program should halt as it is in an inconsistent state and continuing could be very dangerous (think privacy/security/data corruption). Otherwise, don't halt (handle it or have the caller handle it).
Well it's not always the case. There are situations in which if you detect errors you want the program to continue running, and have only that particular functionality fail.
I tend to write resilient code, since I work in embedded systems and what you never want is the system to crash. Halting a CPU on an invariant violation (i.e. an assert failing) is something useful for debugging (you trigger the debugger and you then analyze why it happened), but something you generally don't want in production.
Better to have a ton more checks and, in case of an invariant violation (which may result from a programmer mistake, but there is always the possibility of hardware memory corruption), to return an error and handle it in some way (for example, restart the task that returned the error, trying to go back to the last working state).
> There are situations in which if you detect errors you want the program to continue running, and have only that particular functionality fail.
Yes, like a web server. If a request handler fails by panicking, in a Rust program, you catch the panic, respond with a 500 error and log the panic somewhere. But you continue serving other requests.
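In Rust that usually means something like `std::panic::catch_unwind` at the top of the handler. A minimal sketch (the request/response types and the handler are made up):

    use std::panic::{self, AssertUnwindSafe};

    // Hypothetical request/response types, for illustration only.
    struct Request { path: String }
    struct Response { status: u16, body: String }

    fn handle(req: Request) -> Response {
        // A bug: this handler panics on one particular input.
        assert!(req.path != "/boom", "unexpected path");
        Response { status: 200, body: format!("ok: {}", req.path) }
    }

    // Catch a panic from a single handler and turn it into a 500; the server
    // keeps serving other requests. (AssertUnwindSafe is commonly needed when
    // the handler captures references to server state.)
    fn serve_one(req: Request) -> Response {
        panic::catch_unwind(AssertUnwindSafe(|| handle(req))).unwrap_or_else(|_| {
            // A real server would log the panic payload here.
            Response { status: 500, body: "internal server error".into() }
        })
    }

    fn main() {
        let resp = serve_one(Request { path: "/boom".into() });
        println!("{} {}", resp.status, resp.body);
    }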
I talked about this in the blog post.
The problem with your strategy is that it requires you to be aware of your own mistakes. That doesn't sound like a robust strategy, unless you're investing huge resources into sophisticated tooling and have drastically restricted the expressivity of your programming environment. That exists and is fine, and I even addressed that in the blog post too.
Great article BTW, loved it! Will certainly become a classic.
The web server example scares me.
Something happened during execution that the programmer didn't expect. There's a bug in the program. What if the panic is due to memory corruption (less likely in Rust) or internal data structure corruption?
Without knowledge to the contrary, swallowing a panic and YOLO'ing execution is driving full speed down the road of very poorly defined behavior.
If the programmer had sufficient knowledge to conclude it was safe, they could have just used a Result<> to report the error.
So ... don't make panic part of your API, and don't recover from panics?
Doesn't scare me in a memory safe language. Request handlers tend to be very well isolated. It's standard practice for web servers written in Go too.
> So ... don't make panic part of your API, and don't recover from panics?
That's the wrong lesson. As my blog says, don't use panicking for error handling. Making panicking part of the API is standard practice. Otherwise `slice[i]` wouldn't exist. (Its API is to panic when `i` is out of bounds.)
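To make the `slice[i]` point concrete, the standard library offers both contracts side by side, and the caller picks which one it wants:

    fn main() {
        let xs = [10, 20, 30];
        let i = 3;

        // Panicking API: the contract is "i must be in bounds", and violating
        // it is a bug in the caller.
        // let x = xs[i]; // panics: index out of bounds

        // Error-style API: out of bounds is an expected case the caller handles.
        match xs.get(i) {
            Some(x) => println!("xs[{}] = {}", i, x),
            None => println!("{} is out of bounds", i),
        }
    }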
Lest you think you should just never panic and convert everything into errors, that's addressed in the blog too. :)
> I mean, if all bets are off, all bets are truly off.
We have to expect that panic (e.g. from slice index operator) exercised an execution path that the library developer had never even considered, much less validated.
So then, post-panic, are we left with any guarantees about the validity of postconditions?
If you want to go on crashing your process and restarting it, then that's fine. It's a fine strategy. But you can't ignore that request handlers tend to be very well isolated, and catching a panic from one of them while continuing to service requests actually works well in practice. Oodles of folks are doing this in production. I've been doing it production for years. I have literally, not once, not ever, seen any problems arise because of this.
Consider what happens if there's a 'slice[i]' that panics in a regex library. The stack unwinds and any objects associated with that specific search that panicked get dropped. And since 'Regex' implements 'UnwindSafe', it says, "If a panic occurs it's all good to continue using me if you catch the panic."
You keep wanting to argue the theory. Rust has UnwindSafe for that. But please, consider looking at the practice too. You don't seem to have acknowledged that at all.
In a classic forking webserver, you get isolation from the OS; if your request panics, let that process die, and the main process will respawn it; further requests are isolated from the failure (although, if the request panicked after writing broken shared state outside the isolated process, who knows if further requests will work). If the part that panicked isn't isolated, you really should propagate the panic until the panicking part is isolated. But maybe that's the OTP/Erlang talking ;)
That's actually kind of a myth nowadays. For example the honest answer to the classic C10K question (how do you write a webserver that can serve 10000 requests concurrently) is that you spawn 10k threads and let the OS scheduler handle it and it'll be fine.
Even a modern Linux kernel on modern hardware will struggle with 10k processes if those processes are doing nontrivial (i.e. syscall-effecting) work. But that benchmark is what, 10 or 15 years old now? Contemporary benchmarks target 100k connections per server (with some constraints).
You don't have to kill/respawn the process after every request, only the ones that ended poorly. Maybe you can provide enough isolation through other means, but OS process per request is probably the most isolation you can get while being sensible and easy to use.
If you don't bind request lifetime to process lifetime, it's not process-per-request, it's a worker pool implemented with processes. That means you need a dispatcher, which tends to become the bottleneck. OS processes get you super excellent isolation, I agree, and sometimes that's the most important thing to have and it makes sense to design things that way. But request-per-process is just really hard to make fast. OS scheduling overhead is way less than it used to be but it's still a lot.
Blocking accept(2) makes a decent to good dispatcher, depending on your OS, and if you can stomach one request per socket (if you can't, you would need to pass the sockets back to a dispatcher between requests to wait for the socket to become ready. In the good old days, you could use accept filters and not see incoming connections until they were ready, but that doesn't really work for TLS or modern http with persistent connections.) You could make that pretty fast if you run one dispatcher per core, and align them with the NIC queues; each dispatcher with its own pool of workers.
If your work is mostly compute, then you usually don't really want to run more concurrency than one, maybe a few workers per core, and then OS scheduling is easy. If your work is more of waiting for i/o, large concurrency makes more sense, but the OS scheduling is not going to be too hard there, because it takes almost nothing for the OS to leave a process blocked on i/o; but you do need to have good timer scalability if you have a lot of processes, since they're all going to want to set and clear a timeout on most of the syscalls. io_uring etc with a small number of os processes/threads might be less work for the kernel, but certainly at the cost of isolation.
My experience is that basically all request-servicing work is I/O-bound. And AFAIK there is no request-servicing system in normal production use which does process-per-request. Even request-per-socket is basically outmoded; modern protocols multiplex logical requests over physical connections one way or another, e.g. HTTP/2.
> So ... don't make panic part of your API, and don't recover from panics?
Don't recover from panic except at a high level - retrying or failing that "scope" is ok, but don't try to do business logic around panicking or not (if you absolutely must, catch the panic at the lowest possible level and turn it into a result).
> Without knowledge to the contrary, swallowing a panic and YOLO'ing execution is driving full speed down the road of very poorly defined behavior.
I mean, if you’re worried about code “corrupting” things such that it may not be safe to even serve the next request, you probably need to worry about successful requests as well. I mean, if all bets are off, all bets are truly off.
At some point, if you decide to colocate code from multiple “apps” in one server, you gotta have some level of trust in said apps. You’d obviously never do this for untrusted code. And you’d never do this for code that (hand-waving as you have) “corrupts” things in such a way that it’s not safe to handle the next request, panic or no.
> Doesn't panic imply that the program entered a state from which it was never explicitly designed to recover?
No? I think you’re making a lot of assumptions about the “state” of a program. Panicking doesn’t mean “I maintain state in between requests and that state is now poison and I can’t be called again”. In fact it doesn’t mean anything really. It means someone wrote a code path that panics, no more, no less.
Sure it’s possible that code can be written where a panic indicates that there is some poison state and now nothing can be trusted etc. But… the same is possible for code that doesn’t panic, so I don’t really know where we’re going with this discussion.
Well, shouldn't there be a standard way to convey that, that is understood (as a matter of culture) to mean that? And if it's not panic then what is it?
You want there to be a language-standard way of saying “I’m a backend for a web server and I don’t want to cause an error 500 but instead I want to crash the whole process”? Seems pretty specific to me.
Panic means “something’s wrong, this error isn’t recoverable”. In most contexts, the best thing to do is crash in this case. In some specific cases, if you keep things well-isolated enough (making HTTP handlers stateless, etc) you can design a system where panic means “something’s wrong, this error isn’t recoverable, and inform the remote end as such by rendering a 500 error”, and everything works just fine.
You seem to want some sort of truly fatal way of conveying errors. Panic is not that. If it was, Rust would not have designed panic to be handleable. It would be called "crash", not panic. Go is similar... Go's panic is not meant for typical error handling, but it's not synonymous with "crash", either. It's meant for situations where you want the moral equivalent of a crash, but taking the form of whatever makes sense for your use case. A multiplexing web server that renders proper 500 errors on a panic is basically the premier use case for this.
Panic means "something wrong and this system didn't anticipate it, so now any random invariant might be broken".
Or at least that's what it was back when people first started making the distinction vs handle-able errors. I guess it got skewed over time, just like everything else.
> The problem with your strategy is that it requires you to be aware of your own mistakes. That doesn't sound like a robust strategy, unless you're investing huge resources into sophisticated tooling and have drastically restricted the expressivity of your programming environment. That exists and is fine, and I even addressed that in the blog post too.
Those "huge resources" and "sophisticated tooling" could just mean something like Erlang, where fault isolation is a core design principle.
But fault isolation works in simpler systems, too. Here's a recent example: I wrote a function that automatically generates a table of contents for an HTML document. While extensively testing it, I ran across a number of edge cases that would lead to faults (mostly due to messy underlying libraries, but also due to programmer error). I fixed all that I found, but I still didn't feel comfortable not adding a generic exception handler to the function.
In case that an exception is ever thrown in that code, the function will just return the input HTML unchanged (and log an error, of course). That's graceful degradation, and a good example of recovering from a potentially unknown fault.
Similar situations can often be found in many applications. Not all functions are essential, and we're not always 100% confident about all parts of the code working correctly.
(Note that I'm not talking about Rust here, I lack sufficient knowledge about it. It's possible that this strategy wouldn't work in Rust for reasons of memory-safety or anything else.)
> I tend to write resilient code, since I work in embedded systems and what you never want is the system to crash.
That might be okay too depending on what your system is. I've had a cell phone (the monochrome dumb kind) display an assertion error at me, that was kind of cool. A screw was coming loose, so there were hardware issues. Worst case, I'd have to bring it to the store and get it repaired or replaced; there was no threat of injury or large monetary damage.
To me the real issue is this is an extremely forced binary and there's really at least three meaningful categories (especially in software with a UI of any sort):
- unactionable invariant violation (poisoned mutex, hard memory errors): crash immediately, something that should never happen happened and there's no way to either handle or present the error to the user in a meaningful way.
- unactionable (at the call site) but normal errors (couldn't open a file, disconnected from the remote end of a connection, etc): these need to be propagated up to where they can be turned into actionable information for a user, ideally. This is rarely a thing the call site where it happened can usefully do.
- immediately actionable and normal errors (user input didn't validate, file user wanted to open doesn't exist, connection failed but can be retried with a backoff, etc). These need to be handled at the call site or maybe one or two levels up.
You need an exception-like mechanism (or at least a process for emulating one, a la go MRV or C errno) to handle the second case, you often want it for the third case, but it never really makes sense to use it for the first.
That said, I think in non-test rust code you should use expect instead of unwrap, because sometimes invariants do trip and that little tiny extra bit of info can make a huge difference to resolving it.
Recoverable vs unrecoverable comes down to requirements. Certain companies known for having software that 'just works' tend to have both very few unrecoverable errors and very conservative feature sets to help facilitate that short list.
It is very clearly a choice, even if many people are deciding by default. By not tackling an issue, you've chosen to have that issue.
I think we have compatible views. Each layer of the software must decide its requirements and handle errors appropriately per requirements. You're right I didn't articulate when to handle an issue locally vs. pass it up. I think that's where requirements (and also explicit API guarantees) come into play.
I do think that APIs that "overpromise" by not returning the errors they do not handle to the caller, and instead halt or throw an exception, do their users a disservice in the long-run. These just become undocumented cases that bite you later on. Better libraries have all these conditions baked into the API itself.
> You're right I didn't articulate when to handle an issue locally vs. pass it up.
I'm saying this is an everybody problem. Most of us just pass the buck by default. It can take quite a bit of cajoling to get error passing pushed back down to the layer that can best cope with it, once it has escaped up the stack. And when we split teams horizontally, instead of vertically, this happens practically all the time.
This is one of the primary reasons I push for feature teams instead of client/server teams. The distance between, "I think this is done" and "this is ready to go to production" is mercifully small, instead of unknowable.
In my experience, APIs that throw rarely define all the exceptions that can come from it, especially transitively. I see exceptions as a failed (because undocumented, but still important for correctness) attempt at compromising between halting and returning an error.
You can still, at the very least, `catch(Exception e)` or `catch(...)` to handle a failure. Odds are very good in my experience that you will anyway have nothing better to do than log the error and abort the higher-level operation (e.g. HTTP request handler), even if you do know the specific type of exception that happened.
Also, even in languages that have error return types/codes, it's very uncommon to see anything other than the most generic error value/return code allowed by the convention (e.g. `return -1` in C or `return fmt.Errorf("...")` in Go). Writing an API to document all possible failure modes is hard, and is rarely done, regardless of the mechanics of how errors are returned.
One of the exceptions I've often seen is in SQL APIs, which generally do need to report the specific SQL errors that were signaled. And here, I've seen all possible errors explicitly exposed in the API, typically through a rich Error or Exception type that has a field for the specific SQL error code.
Not to mention, we were mostly discussing cases where a library finds itself in a bug situation, say a null pointer case. I would not expect the API to express the possibility of "NullPointerException" or "ArrayIndexOutOfBounds" as possible return values, but I do want the language to raise these and allow me to decide how to handle them instead of simply halting the entire program - at least in managed memory languages where memory corruption is not possible/likely (if there is a good chance of memory corruption, like in C++, halting is indeed much better than raising an exception).
To be fair, there are two reasons why it's considered a hassle.
One reason, which is a bad reason, is that many people just don't like to document and handle errors. Instead of being happy that the compiler is telling them that they forgot to handle or declare an IOException from this call, they get annoyed that it's yelling at them "for no reason". This is simply lack of understanding/care for how you program.
The other reason, which is actually a problem with the language that could be fixed, is that Java doesn't allow functions to be polymorphic in their Exception types, like it does for argument and return types. This makes higher-order constructs very annoying - for example, `stream.map(Function)` should `throw` the same Exceptions that `function` throws, as should `Arrays.sort(array, Comparator)`. Without this capability, you end up with an extremely ugly and brittle pattern of catching each checked exception inside the lambda and re-throwing it wrapped in an unchecked one.
This huge verbosity is obviously unnecessary and ugly, and could be removed with some compiler support (compiler could just insert this), or some more complex runtime support. In my opinion, if Java did this, it would actually have the best error handling mechanism of any language on the market - much better than Haskell or Rust.
> One reason, which is a bad reason, is that many people just don't like to document and handle errors. Instead of being happy that the compiler is telling them that they forgot to handle or declare an IOException from this call, they get annoyed that it's yelling at them "for no reason". This is simply lack of understanding/care for how you program.
Usually, what happens is that you call some library code (or some code that a teammate wrote) and that code will declare an IOException or something like that. In many cases, there's no point in handling that, as that file that you're trying to open or similar isn't supplied by the user but e.g. a static resource. That's the entire point of the article, that panicking when encountering a violated precondition is totally acceptable.
The unchecked vs. checked exception distinction also suffers from the fact that it's usually the call site, and not the declaration site, that knows whether an exception is recoverable or not.
Java checked exceptions would be fine if it had something like "unwrap", which just converts a checked exception to an unchecked one, but it doesn't, and that's why everyone kind of hates them.
> Java checked exceptions would be fine if it had something like "unwrap", which just converts a checked exception to an unchecked one, but it doesn't, and that's why everyone kind of hates them.
It's a bit verbose, but try { doThing(); } catch (Exception e) { throw new RuntimeException(e); } is exactly that, and in today's Java it is very easy to put in a utility function.
> The unchecked vs. checked exception distinction also suffers from the fact that it's usually the call site, and not the declaration site, that knows whether an exception is recoverable or not.
But that is true of every possible compiler-enforced error handling mechanism. The declaration site is responsible for declaring what errors it can produce (through exceptions, result types, multiple returns, error codes, the Either monad etc.), and the compiler ensures that every call site handles those error values in some way.
> It's a bit verbose, but try { doThing(); } catch (Exception e) { throw new RuntimeException(e); } is exactly that [...]
The verbosity is exactly what is bothering most people. It's annoying to write and clutters the code. Even if you yourself agree that it should be written like that, your team mates will maybe not. The "swallow an error and pretend it didn't happen" pattern is incredibly common in Java from my experience and it simply wouldn't be so common if there existed something like "unwrap()".
> and in today's Java it is very easy to put in a utility function.
I had to do quite a bit of trial and error and googling to see how this can be done (you have to use the "Callable" interface, instead of something like "Supplier"). I suspect, this will be the same for most people, who wouldn't know how to write this, or just wouldn't bother.
And even if you add this, you'd have to call it from some utility class, which makes the call site much more verbose.
Defaults matter, and if there is no good built-in solution for this in Java, people will not use it.
> But that is true of every possible compiler-enforced error handling mechanism.
Not every language bothers with a compiler-enforced handling mechanism, precisely because it's hard for the compiler to predict whether an error should be recovered from or not, or at which part of the stack it should be handled.
If you do want compiler-enforced error handling, result types are probably better because they are regular language constructs that can be manipulated and passed around in regular ways, and it's easy to convert them to a panic, as with Rust's "unwrap()".
This is also why Kotlin switched away from checked exceptions and now maintains that if you do care about compiler-enforced error handling, you should use Result types instead.
Now, that said, I wouldn't mind better tooling in languages with exceptions (Java and Kotlin, for example), where I could ask the compiler about a function and it could tell me about all exceptions that could (transitively) be thrown from that function, or an annotation to the effect of "please, compiler, verify that this function only throws these types of exceptions (be they checked or unchecked)". But that's something to be used judiciously for some critical code paths, and not everywhere necessarily.
> The verbosity is exactly what is bothering most people. It's annoying to write and clutters the code. Even if you yourself agree that it should be written like that, your team mates will maybe not.
As I said, a bad reason to dislike compiler-enforced error handling. Especially since we're now discussing the small minority of places where you want to call a function that returns errors, but you believe those errors are not possible in your case.
> I had to do quite a bit of trial and error and googling to see how this can be done
I admit that only after writing that I remembered that Java has a dozen different interfaces that represent various flavors of functions, so indeed it's not that easy to write the utility I was thinking of.
> If you do want compiler-enforced error handling, result types are probably better because they are regular language constructs that can be manipulated and passed around in regular ways, and it's easy to convert them to a panic, as with Rust's "unwrap()"
I never understand this point, though I've seen it raised a lot in these types of discussions. Exceptions are also regular language constructs, there is nothing that magical about them. All of the problems that people list with checked exceptions are there for Result types as well, and then some. You try writing the result type for a function that can fail in 20 different ways, or use that result type to handle 2 specific error types and ignore the 18 others.
Note that another name for a "panic()" is "throwing an unchecked exception".
> This is also why Kotlin switched away from checked exceptions and now maintains that if you do care about compiler-enforced error handling, you should use Result types instead.
I have looked at their docs, and they do no such thing. They took the same path as C#, and never added checked exceptions in the first place, and cite C#'s designers for this decision in their docs, and a maintainer for what later became apache-commons. They give various reasons, that all apply to result types just as much - accumulation of error types, functions/interfaces that want to conditionally throw exceptions, interface breaking when a new error type is added to what a function can return etc.
All of these can be handled to various degrees. Unfortunately Java, while somewhat improved, is still an exceptionally verbose language, and this shows in its exception handling as well.
> As I said, a bad reason to dislike compiler-enforced error handling. Especially since we're now discussing the small minority of places where you want to call a function that returns errors, but you believe those errors are not possible in your case.
This comment thread is in response to an article in which it is (IMHO correctly) argued that "just let it crash/panic" is the correct response in many cases, so it's not a "minority of situations", in my view.
> I never understand this point, though I've seen it raised a lot in these types of discussions. Exceptions are also regular language constructs, there is nothing that magical about them.
You showed yourself how checked exceptions in Java mess up higher-order functions. That's because you can't deal generically with functions that may or may not throw exceptions, since return values and exceptions are different language mechanisms. Sure, Java could add special handling for map etc., but barring that, how would you actually implement "map" yourself? You would need to be able to parameterise the mapping function according to whether it throws an exception, and if so, which ones (it could be multiple ones!). To my knowledge, there is simply no language construct in Java that allows you to even express something like that. Result types allow you to simply handle that in library code, where a function can just pass on the return value of another function.
Moreover, you can write code that is generic in terms of whether it's dealing with an optional, a result type or a list (for which "empty list" can signal "no results", which is in a sense similar in kind to optional or result). That's because all these types have a monad instance. That means that you can convert code that returns "null" into code that returns result types (if you want to add more information to the error condition), without the caller needing to be aware of it. That's simply not possible with exceptions.
> Note that another name for a "panic()" is "throwing an unchecked exception".
And I don't generally mind unchecked exceptions, I mind checked exceptions. Panics are a bit more limited in scope than your regular Java unchecked exceptions, though (not in small part because a language like Rust just has a better type system that allows you to express more invariants - at least historically, I know that Java has sealed interfaces now too), and there are restrictions in terms of how you can recover from them - which should dissuade people from abusing them for control flow, something that is common in Java.
> I have looked at their docs, and they do no such thing.
They probably don't think that code guidelines belong in official documentation, but:
One of the early lambda proposals for Java - the one by Neal Gafter, if I remember correctly - actually had exception parameters for generics (including type unions), so you could do HOFs like that. But, at the end of the day, they went for something simpler.
Don't exceptions also halt your program if you ignore them?
Also, if I'm using a library I don't want a bug in it to bring my program down; then I am forced to use workarounds like creating a child process to run the library, starting it from the main process and checking on it to see if it fails or succeeds, which would be bad for performance and ugly.
Yep, there's not really any such thing (IME) as a 'recoverable' error - except with respect to I/O.
There's either I/O errors - or there's logic errors. A failure with logic should nuke due to the app being in an inconsistent state; trust is lost. An I/O error should fail softly.
> There's either I/O errors - or there's logic errors. A failure with logic should nuke due to the app being in an inconsistent state; trust is lost.
nah. In GUI apps, for instance, you want a failure in the logic of a sub-sub-function to just show the user a "whoops" error when the button that triggered the action was clicked, not nuke the app (unless you hate your users). e.g. imagine a 3D software which allows you to do mesh operations - user clicks on the "Smooth the mesh" button somewhere. Programmer forgot to handle a division by zero in some degenerate case of the smoothing computation which ends up leading to an exception: a value becomes zero, someone used unsigned integers for n in an "n - 1" computation which ends up in a call to array_of_floats.resize(0xffffffffffffffff) (and a likely std::bad_alloc being thrown if you're in c++).
The original mesh is unchanged as the operation waits until the computation is complete to replace the old mesh with the new.
If you ever decide to crash in this situation I am sure you will have great reviews on 3D modeling software comparisons.
> Programmer forgot to handle a division by zero in some degenerate case of the smoothing computation which ends up leading to an exception: a value becomes zero, someone used unsigned integers for n in an "n - 1" computation which ends up in a call to array_of_floats.resize(0xffffffffffffffff) (and a likely std::bad_alloc being thrown if you're in c++).
To me this is a textbook case of why panic::catch_unwind exists in Rust. Conceptually, the smoothing algorithm failed in an unexpected way, so a panic is appropriate: there is nothing else for the procedure to do but crash. But crashing would be a very poor user experience, because the smoothing operation is conceptually isolated from the application as a whole, so the program shouldn't be affected by a single operation crashing. This is why Rust goes to the trouble to implement unwinding on panic: in certain domains, software fault isolation is important.
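A rough sketch of what that isolation can look like in Rust (the Mesh type and the buggy smoothing step are stand-ins, not how any real modeling package is written):

    use std::panic;

    #[derive(Clone, Debug)]
    struct Mesh { vertices: Vec<f32> }

    // A buggy operation: it panics on a degenerate input nobody considered.
    fn smooth(mesh: &Mesh) -> Mesh {
        assert!(!mesh.vertices.is_empty(), "degenerate mesh");
        Mesh { vertices: mesh.vertices.iter().map(|v| v * 0.5).collect() }
    }

    fn on_smooth_clicked(current: &mut Mesh) {
        // Work on a snapshot; the document's mesh is only replaced if the
        // operation completes, so a panic leaves it untouched.
        let snapshot = current.clone();
        match panic::catch_unwind(move || smooth(&snapshot)) {
            Ok(new_mesh) => *current = new_mesh,
            Err(_) => eprintln!("smoothing failed; the mesh was left unchanged"),
        }
    }

    fn main() {
        let mut mesh = Mesh { vertices: vec![] };
        on_smooth_clicked(&mut mesh);
        println!("{:?}", mesh);
    }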
Another possibility, of course, would be to spawn a separate process for the smoothing operation, which would effectively replace software fault isolation with hardware fault isolation. But this might be awkward (and slow), which is why few modeling software packages do this.
("A complex smoothing operation in a 3D modeling package fails" is maybe the best example of the need for panic::catch_unwind I've ever heard of--thanks for offering it. I wish I had been able to deploy this example back when we had the debate as to whether catch_unwind should exist at all. Thankfully, we kept it in.) :)
> Another possibility, of course, would be to spawn a separate process for the smoothing operation, which would effectively replace software fault isolation with hardware fault isolation. But this might be awkward (and slow), which is why few modeling software packages do this.
I do agree this is a much better model, and it doesn't have to be particularly slow or painful. An example system that does this well is XPC on macOS [1]. The nice thing about defining a good service model is that you can also run your third-party plugins the same way - with full process-level privilege isolation - and in the event of a crash the service can simply be restarted.
The problem with this approach is that, when a panic actually happens at runtime, there is no way to tell whether it was caused by an isolated mistake like forgetting to check for zero, or by memory corruption writing garbage all over your data structures. If it's the first, recovering is fine; if it's the second, you risk exposing the user to data loss and security vulnerabilities. So the only safe thing to do is crash.
Now, it may be true that you get better reviews selling software that corrupts data or gets its users hacked than selling software that crashes. But unless you hate your users, those are probably things you want to avoid.
In memory-safe languages, memory is not suddenly corrupted because of some exception that is thrown.
Instead, the bigger issue is corrupt external state, e.g. in a database. But that can of course happen just as well if you panic. What you'll need to do is to wrap your code in a transaction, if possible.
Corrupt internal state is just as much of a problem. It doesn't have to be corruption in a sense of garbage memory. It just has to violate invariants - say, one variable got updated, and the other one that depends on it did not.
But that doesn't just happen on its own just because an exception is thrown.
Either you don't handle it and it kills the process, or you handle it at the top level of a web server or similar, and it just kills that request (and that doesn't affect other requests), or you've written explicit error handling logic in which case you've hopefully thought about how to handle the error so as not to carry garbage, inconsistent state around, and hopefully also written some tests for it. This gets even easier if you mostly use immutable data, or at least limit shared state between components.
GGP's original point was that "the only safe thing to do [when encountering an exception] is crash" and for managed-memory-languages that is simply not true.
It happens because a piece of code that was supposed to execute atomically ended up having an exception in the middle - so one assignment did execute, while the other one did not.
If you can well and truly isolate the state that concerns a single request from any other state, then yes, it is safe. The problem is proving that such isolation holds.
Memory-safe or not doesn't actually matter much here.
I would actually want that to crash, yes. That would ensure it gets caught and resolved by developers during development. Or at least increase the odds thereof. Further, what if it allocates a few GB, or allocates some other large amount of memory, because the size parameter got smashed?
If it crashes enough that you'd get terrible reviews it would definitely be caught during development.
If it messes up your mesh silently then you'll definitely still get bad reviews.
And actually having an std::bad_alloc thrown is even worse since you have no idea what state it left things after running some destructors when you weren't planning for it.
> That would ensure it gets caught and resolved by developers during development
i don't know in which reality you live but in mine there's not much relationship between the existence of crashes, and them being resolved during development
> And actually having an std::bad_alloc thrown is even worse since you have no idea what state it left things after running some destructors when you weren't planning for it.
i don't even know what to say. that's the whole point of destructors - you know that things will be unwound in the reverse order from which they were created on automatic storage. that's, like, why c++ exists
> i don't know in which reality you live but in mine there's not much relationship between the existence of crashes, and them being resolved during development
Really? Crashes during development are the easiest thing to resolve aren't they? Your debugger gets triggered automatically. In my experience crashes are treated as highest priority.
> i don't even know what to say. that's the whole point of destructors - you know that things will be unwound in the reverse order from which they were created on automatic storage. that's, like, why c++ exists
> Really? Crashes during development are the easiest thing to resolve aren't they?
sure but here we're talking about the edge cases which you won't encounter during development.
e.g. imagine you're writing a 3D .obj parser and someone somewhere uses a mesh which for some reason has a NaN instead of a real value somewhere, and your parser parses it and gets a NaN into your system - you'll only encounter the subsequent crashes if your tests had expected such a thing.
now multiply this by infinity - it's impossible to think of everything beforehand. But you really don't want the person to crash their 3D software because of that invalid .obj file, just to show them a pop-up "the mesh could not be loaded" and roll-back to before the loading was attempted.
This hasn't been an issue for more than a decade. clang 3.0 (2011) and gcc 4.1 (2007) already warned for this (and there aren't older compilers on CE that I can test).
There are people who start learning programming today who weren't even born when GCC implemented this FFS
Only if you have implemented your destructors correctly. Forgot to add a 'virtual' on one of them? Too bad. Have some pointers to objects? Better remember to call delete on them all. You never really 'know' things are destroyed, that's why tools like Valgrind exist.
The real fun is really the combination with exceptions and destructors. So much so, some projects essentially ban exceptions in C++ code.
A parser library can return an error when the input is wrong. It doesn't necessarily mean the app is in an inconsistent state when that happens; it all depends on what the application does. It follows that aborting is often a sensible decision in application code and rarely in library code.
Partially true. In practice people implement multiplexed servers, for many reasons, including performance / throughput. A logic failure should nuke _the offending request_, not the entire server with all the unrelated concurrent requests.
This is an attitude that I often see -- library authors who believe they own the process. Aborting may be the only sensible thing to do in runtimes where you can end up corrupting the entire process memory or the like, making recovery "dubiously possible", but absolutely not for anything higher level, where recovery may be safe and possible.
I gave examples covering this area in the post. How would you rewrite the code in this section[1], for example, to conform to your views? (Assume this code is in a library.)
* Short term: add explicit runtime checks for every step that may panic. With the way jump prediction works in modern processors, the runtime cost may be smaller than one may naively assume it is. Some of us would take, say, a 10% perf. degradation instead of chasing prod panics in the middle of the night.
* Long term: plug-in a richer type (logic) system, so one can safely prove the most costly runtime invariants at compile time.
> add explicit runtime checks for every step that may panic
And do... what when the check fails? Can you please write the code for it? Because I don't understand what the heck you mean here. If you don't know Rust, pseudo code is fine.
> Long term: plug-in a richer type (logic) system, so one can safely prove the most costly runtime invariants at compile time.
This is just copying what I already said in the blog post. Until someone can show me how to prove the correctness of arbitrary DFA construction and search from a user provided regular expression pattern and demonstrate its use in a practical programming language like Rust, I consider this a total non-answer, not a "long term" answer.
Basically, your answer here confirms for me that your views on how code should be structured are incoherent.
So it looks like I already provided a rewrite of the code for you in your preferred style:
    // Returns true if the DFA matches the entire 'haystack'.
    // This routine always returns either Ok(true) or Ok(false) for all inputs.
    // It never returns an error unless there is a bug in its implementation.
    fn is_match(&self, haystack: &[u8]) -> Result<bool, &'static str> {
        let mut state_id = self.start_id;
        for &byte in haystack {
            let row = match state_id.checked_mul(256) {
                None => return Err("state id too big"),
                Some(row) => row,
            };
            let row_offset = match row.checked_add(usize::from(byte)) {
                None => return Err("row index too big"),
                Some(row_offset) => row_offset,
            };
            state_id = match self.transitions.get(row_offset) {
                None => return Err("invalid transition"),
                Some(&state_id) => state_id,
            };
            match self.is_match_id.get(state_id) {
                None => return Err("invalid state id"),
                Some(&true) => return Ok(true),
                Some(&false) => {}
            }
        }
        Ok(false)
    }
My favorite part is the docs that say "this never returns an error." Because if it did, the code would be buggy. And now all sorts of internal implementation details are leaked.
You can't even document the error conditions in a way that is related to the input of the routine, because if you could, you would have discovered a bug!
Another nail in the coffin of your preferred style is that this will absolutely trash the performance of a DFA search loop.
One additional data point: The Rust compiler got ~2% faster when they made some of the core traits in the serializer infallible (ie. panic instead of using Result): https://github.com/rust-lang/rust/pull/93066
> The Decoder trait used for metadata decoding was fallible, using Result throughout. But decoding failures should only happen if something highly unexpected happens (e.g. metadata is corrupted) and on failure the calling code would just abort. This PR changed Decoder to be infallible throughout—panicking immediately instead of panicking slightly later—thus avoiding lots of pointless Result propagation, for wins across many benchmarks of up to 2%.
So that's 10% right there. We can keep going though. Let's get rid of the bounds checks. We have to use unsafe though. The stakes have risen. Now if there's a bug, we probably won't get a panic. Instead we get UB. Which might lead to all sorts of bad stuff.
    fn find_fwd_imp<A: Automaton + ?Sized>(
        dfa: &A,
        input: &Input<'_, '_>,
        pre: Option<&'_ dyn Prefilter>,
        earliest: bool,
    ) -> Result<Option<HalfMatch>, MatchError> {
        let mut mat = None;
        let mut sid = init_fwd(dfa, input)?;
        let mut at = input.start();
        while at < input.end() {
            sid = unsafe {
                let byte = *input.haystack().get_unchecked(at);
                dfa.next_state_unchecked(sid, byte)
            };
            if dfa.is_special_state(sid) {
                if dfa.is_match_state(sid) {
                    let pattern = dfa.match_pattern(sid, 0);
                    mat = Some(HalfMatch::new(pattern, at));
                } else if dfa.is_dead_state(sid) {
                    return Ok(mat);
                }
            }
            at += 1;
        }
        eoi_fwd(dfa, input, &mut sid, &mut mat)?;
        Ok(mat)
    }
This is so awesome, thanks for sharing! I am happy that the "~10% perf hit" intuition is reasonably close to what the data shows. What is the right tradeoff? I am not an author of widely popular regex engines, so I'll leave that evaluation to more qualified people.
10% is an excellent win. And given that I consider the first code snippet to be effectively nonsense, the right trade off is obvious to me. :)
I bet if me and you sat down with a big codebase side-by-side, I could convince you that the style you're proposing is totally unworkable.
Now, whether it's worth it to use unsafe and elide bounds checks is harder. The gain I showed here is small, but it can be bigger in other examples. It depends. But it is certainly less clear. It helps that this is probably the easiest kind of unsafe to justify, because you just have to make sure the index is in bounds. Other types of unsafe can be quite tricky to justify.
Your short term suggestion appears confused. Panics are already the result of runtime checks. The issue isn't whether or not we have runtime checks, but what to do when those checks fail.
But to your original point, it's true that a multiplexing server should probably try to stay up even if there is a bug uncovered in handling a request. But that's precisely why Rust allows you to catch panics.
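A minimal sketch of what that looks like for isolating one request (the handler and response strings here are hypothetical):

    use std::panic::{catch_unwind, AssertUnwindSafe};

    fn handle_request(body: &str) -> String {
        // Pretend a logic bug lurks here: this expect() models a broken invariant.
        let n: usize = body.trim().parse().expect("BUG: body was validated upstream");
        format!("you sent {}", n)
    }

    fn serve_one(body: &str) -> String {
        // A panic in the handler nukes only this request, not the whole server.
        match catch_unwind(AssertUnwindSafe(|| handle_request(body))) {
            Ok(resp) => resp,
            Err(_) => "HTTP 500: internal error".to_string(),
        }
    }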
"It is not recommended to use this function for a general try/catch mechanism. The Result type is more appropriate to use for functions that can fail on a regular basis. Additionally, this function is not guaranteed to catch all panics, see the “Notes” section below."
Edit. Re. already existing runtime checks, I'd expect a decent compiler to optimize away redundant checks, i.e. explicitly coded checks vs. checks injected automatically by the compiler, as long as their semantics are identical.
Can you point to any compiler in the world that can see through the construction of a DFA from user input to the point that it can automatically optimize away the transition table lookup?
> The Result type is more appropriate to use for *functions that can fail on a regular basis.*
Emphasis mine. The DFA::is_match routine can never fail on any input. Yet, you want to use a 'Result' with it.
Seriously. Please please pretty please read the blog post. You are literally caught in precisely the tangled mess that I tried to untangle in the post. If my post didn't do it for you, then please consider responding to specific points you disagree with in the post.
Sorry, I only meant to address with some nuance the statement "A failure with logic should nuke due to the app being in an inconsistent state". If you are confident that the specific DFA code in the blog post will not fail, more power to you! And to us, who transitively trust the code via your judgment ;)
Edit. For technical geeking around:
* The point is about redundant check elimination, not full check elimination. A decent compiler should be able to eliminate redundant checks, though perhaps the language could offer some intrinsics (was: pragmas) to make this 100% reliable.
Panic practically everywhere is just not an admissible option, most definitely not for anything that tries to call itself a systems language. Proving that there is no reasonable alternative would be equivalent to claiming the language is unusable.
Turing complete languages are legacy cruft. There are no compelling use cases where they're actually more expressive, and sheer code execution performance is never the bottleneck in a realistic-sized system.
Feel free to back it up with real examples. Otherwise, previous conversations with you have just been frustrating exchanges about pie-in-the-sky stuff. Just seems like you want to be bombastic for the sake of being bombastic.
> and sheer code execution performance is never the bottleneck in a realistic-sized system.
TIL that code execution perf doesn't matter and I've wasted almost a decade of my life focusing on it. Wow I wish you told me sooner!
> Just seems like you want to be bombastic for the sake of being bombastic.
Ironically, that's exactly how you seem when you just drop the "Turing completeness" bomb on a discussion about error handling. It's very hard not to see it as a ridiculously childish tantrum, since not only is there no actual argument, it is by definition impossible for it to be relevant to the discussion.
Anyway, I would not be writing this if it weren't for the fact that you keep directing personal attacks at everyone who dares counter your posts, which are already in a terrible tone.
Thanks for the free lesson, Random Internet Stranger. I appreciate it. Unfortunately, the cognitive dissonance you give me is just too much to bear. Your wisdom is clearly beyond reproach, and so therefore Rust cannot be a systems language since it has panics. Yet, I and many others use Rust for systems programming. My brain might explode from the contradiction!
> therefore Rust cannot be a systems language since it has panics
Or, the obvious alternative that I was trying to point out: your analysis is wrong, and therefore, assuming the language is not useless, there must be a proper alternative to "panics everywhere" * that you are not accounting for !
* This is just to explicitly write the phrase again before it is (again) twisted as calling for a strict prohibition of all non-terminating conditions or some other childish nonsense.
Even unwinding panics (which are exceptions in all but name) can be an alternative, and Rust does support them.
Cool, good thing I didn't say to panic everywhere.
That lkml post has zero relevance here. He's asking for fallible allocations. Which is totally reasonable.
Serious question: did you read the entire blog? If so, can you specifically point out passages you disagree with and why?
EDIT: And seriously, I started out this conversation with a serious comment asking you to elaborate on what you meant by asking something very concrete: how would you change the code example in my blog to conform to your view? You decided to flippantly respond with a non-answer in the first place. Why not go back to the original question I asked and give a real answer? If you don't know Rust, then use pseudo code. We're talking about a dozen lines here. It shouldn't take long.
The problem is that Rust developers don't just use unwrap() when it should panic. I've seen plenty of "production grade" crates basically just unwrap because the author didn't know how to handle it gracefully or just wanted to get the code compiling, then forgot about it.
That is a problem. Another similar problem is that programmers write buggy code. I don't see anything particularly special about 'unwrap()'. 'unwrap()' is a common manifestation of it. But as others have pointed out to you, this same phenomenon manifests everywhere and in other programming languages, but through different means.
This is why I write in a 2-pass manner quite often. In the first pass I use unwrap a lot to get the functionality going. In the second pass I search for all the unwrap and expect and take a moment to think about in what cases these may happen.
This works for me really well to get a prototype going, but then have a solid program later. Because the unwrap/expect is so easy to search for, that you can really postpone the error handling until later.
not sure why you're getting downvoted. maybe the scope of what you're saying ('plenty of "production grade" crates unwrap [when they shouldn't]') calls for a reference? Or maybe because you're singling out Rust developers while the issue is certainly observed in all languages with similar mechanisms (see unchecked exceptions abuse in java, or aborting asserts abuse in C)
Anyway i wouldn't say "plenty", but i did come across crates (parsers :/) that would unwrap on malformed input. the workaround is to encapsulate their use in a catch_unwind.
for the record, i had a similar issue in a c++ lib where the author elected to abort on unsupported input, so i'm somewhat thankful that in Rust the idiomatic mechanism is panic (which is recoverable if need be)
Guilty as charged, and honestly, I wish I had a good mechanism/function call to separate "I'm just being lazy here, this is a TODO" from "actually, this should never happen."
.map_err(|_| unimplemented!()).unwrap()
or something. (I hope you get the gist of that, as I'm 99% sure that code doesn't compile. Error(E) -> !, Ok(T) -> T)
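Something compilable along those lines, as a rough sketch (the trait and its name are made up, not std):

    trait UnwrapTodo<T> {
        fn unwrap_todo(self) -> T;
    }

    impl<T, E: std::fmt::Debug> UnwrapTodo<T> for Result<T, E> {
        // Same behavior as unwrap(), but the name marks it as a shortcut and is
        // trivially greppable. (A real version would likely also want
        // #[track_caller] so the panic points at the call site.)
        fn unwrap_todo(self) -> T {
            self.expect("TODO: this error path still needs real handling")
        }
    }

Then `grep -rn unwrap_todo` (or a CI check) finds every leftover shortcut, while plain unwrap()/expect() stays reserved for real invariants.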
I've thought about this, it just requires some effort, particularly if you want the result to be grep-able. (And, in >1 person groups, it also requires coordination, although perhaps all of those should be removed prior to committing anyway…)
I think anything you do here is going to require some kind of effort.
Personally, I think it would be better to stop thinking about 'unwrap()' as its own special thing. 'unwrap()' is just a common manifestation of something much more fundamental: runtime invariants. Think about those.
Maybe it would have been a good idea to have two names for unwrap, one which would mean "I'm certain that this value will always be okay", and another which would mean "I'm taking a shortcut because I'm writing a script or just want it to compile for now". Maybe a longer name like "assert_valid()" for the one where you're sure it's okay. That might make it easier to find the places where shortcuts were taken and forgotten.
Consider using anyhow. At that point, unwrap isn't a shortcut anymore except for one place: you are in a deeply nested function call and you suddenly realize you need to bubble up an error a few layers. Now you have to change a bunch of function signatures.
I don't find myself in that position too frequently. Certainly not enough to warrant two identical but differently named functions.
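For reference, a minimal sketch of the anyhow pattern (assuming the anyhow crate; the file name is made up): `?` bubbles any error type up without ceremony, so there's little temptation to reach for unwrap() as a shortcut.

    use anyhow::{Context, Result};

    fn load_config(path: &str) -> Result<String> {
        // Any underlying error converts into anyhow::Error via `?`.
        let raw = std::fs::read_to_string(path)
            .with_context(|| format!("failed to read {}", path))?;
        Ok(raw)
    }

    fn main() -> Result<()> {
        let cfg = load_config("app.toml")?;
        println!("{} bytes of config", cfg.len());
        Ok(())
    }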
In that case, I would suggest coming up with your own pattern. Perhaps an unwrap() with a FIXME comment. Or a expect("FIXME").
That's a non-sequitur. Just because it would be more effective doesn't mean it's actually worth trying to standardize. I see no accounting of trade offs. I see no accounting for how common the pattern is in practice. I see no accounting for the confusion that will result when there are two methods that behave identically, are named differently and should be used in two very subtly different and nuanced ways.
I'd prefer if you just leave me alone to be honest.
Maybe if Rust code wasn't so hard to get working at all, then more of it would work correctly (including checking errors instead of giving up and unwrapping).
not sure what you're referring to, but yes having a first iteration of code with unwrap here and there, and then a second one with proper error handling is a viable strategy. the "loudness" of unwrap makes it super easy to notice any leftover in code review
I'm confused, you seem to be saying opposite things. You're saying authors don't use unwrap when it should panic, and in the next sentence, you say they use unwrap (which causes a panic) when the failure could be gracefully recovered.
When writing a library, wouldn't the prudent thing to do be to return a Try/Either or have the method accept a function to invoke in case of an error (like an IOC solution)?
But luckily when that time does come to refactor the code to handle the error case, the Rust compiler does a lot to help you make sure it’s handled properly.
One runtime panic I wish was a compile-time error in the rust standard library is the use of an incorrect memory order, eg Ordering::Release with AtomicBool::load().
It would have been fairly trivial to set up generic constraints specifying if a read, write, or read-write ordering semantic is expected and to fail to compile if it wasn’t met.
Likely only once we finally get const parameters specified as regular arguments.
It doesn’t need const at all, though. You just need three traits and either first class enum variants as types or a pseudo enum (mod/struct Ordering with ZST structs Acquire, Release, etc)
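A rough sketch of that idea for just the load side (made-up names, not the std API): each ordering becomes its own zero-sized type, and a marker trait limits which ones a load accepts, so e.g. Release on a load is a compile error rather than a runtime panic.

    use std::sync::atomic::{AtomicBool, Ordering};

    struct Relaxed;
    struct Acquire;
    struct SeqCst;
    struct Release;

    trait LoadOrdering { const ORDERING: Ordering; }
    impl LoadOrdering for Relaxed { const ORDERING: Ordering = Ordering::Relaxed; }
    impl LoadOrdering for Acquire { const ORDERING: Ordering = Ordering::Acquire; }
    impl LoadOrdering for SeqCst  { const ORDERING: Ordering = Ordering::SeqCst; }
    // No impl for Release: checked_load(&flag, Release) would not compile.

    fn checked_load<O: LoadOrdering>(flag: &AtomicBool, _order: O) -> bool {
        flag.load(O::ORDERING)
    }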
I think the reason it's not a compile-time error is that it's actually possible to select your ordering at runtime, like this:
    use std::sync::atomic::*;

    let x = AtomicU64::new(0);
    let ordering = if rand::random() {
        Ordering::Relaxed
    } else {
        Ordering::SeqCst
    };
    x.fetch_add(1, ordering);
Yes, I tried (a few months ago) mocking up a PR for this that refactored the enum variants into ZST structs, and that was the only sticking point for backwards compatibility.
When you've gotten to the point of saying, "well, it's ok to panic in example code" you've already lost the game. Novice programmers learn from example code, and novice programmers are an order of magnitude more common than experienced ones. A programming ecosystem that depends on 9 out of every 10 people being able to intuit that which the other 1 understands is not an ecosystem that's going to produce good code.
Rust is a minefield of bear traps laid by experts, and I fear for the future of our industry if Go and Java programmers are required by some quirk of network effects or first mover advantage or whatever to start programming (badly) in Rust.
So you're saying that the 'expect()' message when a regex compilation error occurs should be a translation from a terse domain specific language to bloviating prose? :-)
Have you ever seen a Regex::new(..).unwrap() fail? It sounds like maybe not. It also sounds like you haven't seen an 'unwrap()' fail either.
> Without it, you just know that some regex somewhere is invalid.
That's bologna. As I discuss in the blog post, 'unwrap()' tells you the line number at which it panicked. There's even an example showing exactly this. There's even another example showing what happens when you call 'Regex::new(..).unwrap()' and it fails[1]:
    $ cargo run
        Finished dev [unoptimized + debuginfo] target(s) in 0.00s
         Running `target/debug/rust-panic`
    thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: Syntax(
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    regex parse error:
        foo\p{glyph}bar
           ^^^^^^^^^
    error: Unicode property not found
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    )', main.rs:4:36
    note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
So you get the line number (as is standard with any 'unwrap()') and you get the actual error message from the regex. No need to even enable the backtrace. You just don't need 'expect()' here.
When you unwrap/expect on an error, the inner error value is already printed. Are you suggesting something more? Reference the implementation details from the article:
    impl<T, E: std::fmt::Debug> Result<T, E> {
        pub fn unwrap(self) -> T {
            match self {
                Ok(t) => t,
                Err(e) => panic!("called `Result::unwrap()` on an `Err` value: {:?}", e),
            }
        }
    }
> When checking preconditions, make sure the panic message relates to the documented precondition, perhaps by adding a custom message. For example,
> `assert!(!xs.is_empty(), "expected parameter 'xs' to be non-empty")`.
This panics with
> thread 'main' panicked at 'expected parameter 'xs' to be non-empty', src/main.rs:79:5
Without the custom message it's
> thread 'main' panicked at 'assertion failed: !xs.is_empty()', src/main.rs:79:5
Given that panics should be for bugs, i.e. interpreted by developers, I'd say the second message is clear enough and a custom message just adds noise in the source code.
I don't disagree. Definitely depends on how opaque the assertion test is. In this case, yeah, probably don't need a message. Pithy illustrative examples are hard.
Hah. Ironically, set_var might be deprecated at some point and replaced with an unsafe alternative. (Long story. Short story is that it's currently unsound. If you have C code trying to read from the environment at the same time you end up with UB. If everything is Rust code though, then I believe you're fine.)
I was trying to find a "turn on backtrace" official API but couldn't find one in 90 seconds of looking. I see the backtrace crate for catching them in user space in process, which is nice.
What would you recommend? Forking and execing yourself to set the env var? backtrace::enable_full() or something to that effect would be nice.
fork-exec is probably the most robust way. But having some std API to enable backtraces seems reasonable too. Someone just needs to put in the design work and champion it.
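A sketch of the fork-exec approach, for what it's worth (call it at the very top of main; the function name is mine):

    use std::env;
    use std::process::{exit, Command};

    fn ensure_backtraces() {
        // If RUST_BACKTRACE isn't set, re-run the current executable with it set,
        // then exit with the child's status. This sidesteps set_var's soundness
        // issues because the child starts with the variable already in place.
        if env::var_os("RUST_BACKTRACE").is_none() {
            let exe = env::current_exe().expect("current_exe");
            let status = Command::new(exe)
                .args(env::args_os().skip(1))
                .env("RUST_BACKTRACE", "1")
                .status()
                .expect("failed to re-exec self");
            exit(status.code().unwrap_or(1));
        }
    }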
tl;dr: Author convincingly argues that Rust `unwrap()` (Java `Optional.get`, Haskell `fromJust`) is fine when you either have checked that the call will not fail, or when you're in a unit test or similar where a panic is a helpful result.
(And, conversely, that it's not fine to use it to avoid doing real error handling)
Nope. But people are thoroughly confused by it. Comes up all of the time. And a lot of people have more extremist positions. Banning unwrap. Or banning any panicking branches at all. That's why my blog post covers an example where we convert a function that never panics (sensible) to a function whose signature says it might return an error, but it actually will never return an error (bonkers).
People advocate for this. After publishing this article, it almost seems like people are more confused than I thought. Idk.
Anyway, no, this blog is not meant to be controversial. It is meant to untangle knots.
Maybe people didn’t read all the way to the end of your article?
I mean, my gut-feel response to the first half was: this is crazy talk.
Look, given an API that, when I call it, either a) panics or b) returns an error, I'm pretty clear on which one I prefer in general.
…but the point (or what I got out of it anyway, by the end) isn't “It’s fine to panic”.
It’s not fine to panic arbitrarily.
Code should not panic generally speaking.
…but there are also times when it is unavoidable, or pointlessly pedantic (e.g. slice index, FSM example), to avoid it completely.
I certainly started reading the top section (aka “Therefore, when runtime invariants arise…”) and my response was: yes, but most of the time you don’t have those.
…and if you don’t, why would you use unwrap?
I can say, at least, you made me have a good think about it.
One thing worth pointing out, is that I did try to head off an experience like what you had by stating my position up front. In the first section after the ToC, the third bullet point reads:
> If a Rust program panics, then it signals a bug in the program. That is, correct Rust programs don’t panic.
I suspect you might have just accidentally skipped over that. Because I think it re-affirms that you should indeed not panic arbitrarily.
As for runtime invariants, I have them a lot in Rust. I tend to work on lower level text primitives (regex engines) where perf is quite important. Perf tends to add complexity in the form of an increased number of runtime invariants. I do my best to push them to compile time where I can, but it just isn't always possible.
No, that isn't buried in the "or similar" of the GP. They mention that explicitly with "checked that the call will not fail." The "or similar" refers to documentation examples and prototyping/one-off scripts.
my problem with panic is that it is like walking in a minefield: you don't know which function will blow up at any given time, and it's not checked by the compiler.
If there were some sort of signal to mark a function as panicking (& vice versa), that would be nice.
Not really. It's possible to verify that a call graph can't call panic anywhere, and there exist 3rd party hacky solutions for this already. Rust is just lacking first-class features for this.
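One such third-party option is dtolnay's no_panic crate; a rough sketch of how it's used (only meaningful in optimized builds, since it works by failing the link if a panicking path survives):

    use no_panic::no_panic;

    #[no_panic]
    fn first_byte(bytes: &[u8]) -> Option<u8> {
        // .first() has no panicking branch; writing bytes[0] here instead would
        // make the crate fail to link under #[no_panic].
        bytes.first().copied()
    }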
But I stand by what I said. My perspective here is like this. "Okay, so you're complaining about panics, what's your alternative?" No, really, like what do you instead of panicking? That's what my blog explores.
So what I'm saying is, if you're going to complain about all the various panic branches, and assuming those panic branches are legitimate (i.e., hitting them would be a bug), then to me, that's just like complaining about bugs in general. At least with panics, they're a lot more visible.
Even something like 'untrusted'[1] (used in crypto libs like 'ring') uses 'unwrap()' in its implementation. What else are you going to do to remove that panicking branch? Complaining about its existence, to me, is basically like complaining that bugs exist and that they're hard to find. (Except panics are better, because panicking branches are much easier to find than arbitrary bugs.)
Assurance that there can be no panics has practical benefits, beyond the "but what about bugs?" question. Exception safety is a complication for unsafe code, and has been a source of unsoundness in Rust (to the point there's an RFC proposing to remove the ability to unwind from Drop entirely). It has performance costs (code bloat, inhibits code motion and autovectorization).
No-panic as an assertion has a value of ensuring that your expectations match. You can ensure that an infallible function really is infallible. You can ensure that a function that returns Result fails only this way. You can ensure your code running in non-Rust stack frames doesn't need a double catch_unwind sandwich.
Even when the code correctly uses panics for what it considers bugs, maybe the code's contract needs to be changed. For example you can have a function that requires a convex polygon as an input. If you pass it a non-convex polygon, it's clearly your bug. But if you get points from an untrusted source, adding an `is_convex` check may be insufficient, because due to rounding errors your check and function's check may disagree and you get a DoS vector, and you pay cost of checking twice. If you need it in real-time graphics, a non-panicking garbage-in garbage-out approach may be better.
That all sounds fine and good. Of course assurance that there are no panics has practical benefits. Sign me up.
I'm still not sure what that has to do with complaining that Rust makes it too easy to panic. What's the alternative? If you're suggesting "garbage-in-garbage-out" everywhere, then I don't think that's a plausible answer for things like 'slice[i]' or 'RefCell::borrow()'. Whether garbage-in-garbage-out makes sense as a strategy depends a lot on the specifics of the situation. (And your polygon example sounds like one of those.) But that to me doesn't suggest something systemically wrong with the frequency of panicking branches in Rust code.
The alternative is to have control, so you can choose (and enforce) the right strategy for your context: unwinding, or aborting, or only infallible/Result/GIGO (with turning `slice[i]` and `RefCell::borrow()` into compile-time errors where necessary). Rust's default of "anything can unwind anywhere at any time" is just one of these strategies, and it's not always the right one. And when it's not the right one, it does create a "mine field" as the OP says.
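As a rough sketch of what per-crate enforcement can look like today (using Clippy's restriction lints, not a language feature):

    #![deny(clippy::indexing_slicing, clippy::unwrap_used)]

    fn third(xs: &[u8]) -> Option<u8> {
        // xs[2] or .unwrap() would now be rejected by the lints; get() forces
        // the caller to handle the missing case explicitly.
        xs.get(2).copied()
    }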
I think I see what you're saying now. I guess time will tell, but I don't think the alternative you pose will get rid of the so-called "mine field." So I still stand by what I said: complaining about too many panics is like complaining about too many bugs. I think a different complaint, i.e., "Rust doesn't give enough visibility and control with respect to panics," is a different story entirely and is valid. (Although I don't particularly suffer from its absence.) With that said, it would be cool to have the level of control you're talking about, but it's not clear to me the extra language complexity required for it is worth it. But hard to say at just a conceptual level.
Your focus on unwrap/expect seems arbitrary. Why aren't you commenting on the ease with which 'slice[i]' and 'x * y' fail? Or alloc failure? Is slice index syntax not also too easy to reach for?
I ask because I suspect you've got a bit of motte and bailey going on here. The motte is "hey let's make unwrap/expect more verbose because we want people to be REALLY sure," but the bailey is "let's actually make everything that can panic a lot more verbose and totally change the character of the language and make it a lot less practical."
Your response is kind of harsh, and you're making a bit of an unsympathetic straw man out of what I said.
IMHO there's a qualitative difference in the programmer's expectations when indexing a slice vs calling a function which has been explicitly written to return an error condition. Years of convention cause us to expect indexing errors and to write defensively. But the implied contract of a Rust function with a Result is that the user should probably do something with the result other than panic in most cases.
I agree that panicking is a legit option. And I agree with the scenarios laid out in the article. And I also don't think lint is the right way to handle it.
But I'm currently in a codebase that is full of unwraps all over -- which the developers did for expediency "get this thing shipped" reasons -- and that (and other codebases I've seen) is what leads me to the conclusion that the ergonomics of putting unwrap right out there in our faces aren't ideal.
Hell, even calling it "result_or_panic" would have perhaps made casual users of it pause and think about what they were doing. There are likely syntactical tools that could have been put in place to really make the user think before creating a panic.
(FWIW safety isn't my primary reason for preferring Rust. I'd be fine with "C++ with a ML-style type system." The general tamping down of footguns is great, though)
Fair enough. Sorry about the harshness. Although you respond to the slice indexing problem with some reasoning about expectations (I'm not so sure I buy that), you do leave off the alloc and arithmetic aspects. I don't think arithmetic, for example, is something we've been trained to be guarded about. But it can be a source of bugs too.
Anyway, I'm sympathetic to the plight of finding yourself in a codebase full of bugs. But I think 'unwrap()' gets a bad rap here. It just so happens that it makes those bugs easily visible and it happens to be a common manifestation of it.
Personally, I don't think a different name for 'unwrap()' would have helped things. One common suggestion is 'or_panic()'. If we could go back and do it all over again, I think 'or_panic()' would be worth a shot. But I think that ship has sailed.
> Is slice index syntax not also too easy to reach for?
It absolutely is. Modern language design should be discouraging getting an element by index; there are usually better alternatives e.g. iterating through a datastructure, or using combinators like zip to build the datastructure/view you need.
Go build that language, build a few high performance tools like I have and call me when you're done. Or point me to existing tools with a benchmark comparison. Then I'll take a look at seriously considering your claims. Until then, your comments read like nonsense to me.
I've been writing valuable software (with corresponding recompense) for over ten years and have never seen a case where this kind of thing was vital for meaningful performance. Yes, languages without this are never going to write the world's fastest grep (or at least, not without further work that is not currently a priority), but as far as I can see that's just as much of an artificial microbenchmark as measuring how fast your chosen language can loop 1000 times.
If you elide such a feature from the entire language then you also won't be able to build a fast (standard) library. One of the reasons you usually don't have to overly worry about performance is that the standard library (as well as other libraries) are typically pretty optimized. If you take away the possibility for that optimisation then the entire language becomes slower, by virtue of having a slower standard library.
"Artificial microbenchmarks" are very useful if there's a good chance the code might be called a few thousand times in a loop. If you write generic code that may get used by thousands or even millions of applications then these small differences do matter, and add up.
Also I think a lot of code will be needlessly awkward; sometimes I really just want to get the first or second or last index. I don't want or need to iterate: I just want exactly the nth entry, nothing more, nothing less. Yes, you need to be careful with it, but entirely removing it is not much of a solution.
I'll be sure to let my artificial users know that.
Like, my god man, next time just come out and say, "I do not consider your work or the use cases you care about valid, and thus I am going to dismiss everything you say on that basis." At least be honest.
There's a lot more stuff out there besides grep tools that need to go as fast as possible. I'm glad Rust exists for that and I'm glad its design is full of practical trade offs that people like you don't seem to see as reasonable.
You're soc right? There's a reason I have you blocked on every social media web site for which we both are a part and for which it is possible. Go away.
The author notes that API simplicity might be a reason to avoid pushing invariants to compile time:
> What do I mean by “API simplicity?” Well, this panic could be removed by moving this runtime invariant to a compile time invariant. Namely, the API could provide, for example, an AhoCorasickOverlapping type, and the overlapping search routines would be defined only on that type and not on AhoCorasick. Therefore, users of the crate could never call an overlapping search routine on an improperly configured automaton. The compiler simply wouldn’t allow it.
> But this adds a lot of additional surface area to the API. And it does it in really pernicious ways. For example, an AhoCorasickOverlapping type would still want to have normal non-overlapping search routines, just like AhoCorasick does. It’s now reasonable to want to be able to write routines that accept any kind of Aho-Corasick automaton and run a non-overlapping search. In that case, either the aho-corasick crate or the programmer using the crate needs to define some kind of generic abstraction to enable that. Or, more likely, perhaps copy some code.
> I thus made a judgment that having one type that can do everything—but might fail loudly for certain methods under certain configurations—would be best. The API design of aho-corasick isn’t going to result in subtle logic errors that silently produce incorrect results. If a mistake is made, then the caller is still going to get a panic with a clear message. At that point, the fix will be easy.
What I gather from this is that the author chose to define a type (call it A) with an attribute that when set in a certain way will cause certain functions to panic. This was preferred to the alternative (two types, A and B) with functions specific to each and where panic was not possible.
This kind of design decision comes up a lot, so understanding the reasoning here could be helpful in a lot of situations. Unfortunately, the passage is less than clear due to lack of source code inline and the highly-specific nature of the problem. An example with source code using more accessible algorithms might be an improvement here.
That said, I'm skeptical that the full range of approach was considered. I sometimes find that the presence of unwrap is a smell pointing to types that have not been fully fleshed out.
As an extreme case, consider a struct whose fields contained diverse data (numbers, colors, enumerated values), but which are all defined as strings. It will be very easy to put this struct into an inconsistent runtime state because nothing can be checked at compile time. The type itself is anemic. Replacing strings with more constrained types eliminates opportunities for panic - possibly all of them.
I get that the whole point is "at what cost?" All I'm saying is that the tradeoffs aren't clear from the example in the passage.
It also turns out that some match kinds are more amenable to other types of searches, such as overlapping searches. Overlapping searches report every possible match, but leftmost "match kinds" specifically prune certain matches from the automaton. An overlapping search with a "leftmost" match kind produces weird results that are difficult to characterize.
So, when an automaton is configured with a "leftmost" match kind, you have a choice: allow overlapping searches or disallow them. I chose to disallow them. Once you make that choice, you then must choose whether to disallow them at compile time or disallow them at runtime. I chose runtime, for the reasons stated.
If I chose compile time, then I'd need a new `AhoCorasickOverlapping` type which provides the overlapping search routines in addition to the non-overlapping search routines. Then I could get rid of the overlapping search routines on `AhoCorasick`. I'd then also need to add a new build method[1] to the `AhoCorasickBuilder` that let you build an overlapping automaton.
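To make the tradeoff concrete, here's a rough sketch of the shape that compile-time alternative would take (made-up stubs, not the real aho-corasick API):

    pub struct AhoCorasick { /* supports leftmost match kinds */ }
    pub struct AhoCorasickOverlapping { /* standard match kind only */ }

    impl AhoCorasick {
        pub fn find(&self, _haystack: &str) -> Option<(usize, usize)> { None } // stub
        // No overlapping routines here: calling one is a type error, not a panic.
    }

    impl AhoCorasickOverlapping {
        // ...and the non-overlapping routines get duplicated (or pulled behind a
        // new generic abstraction), because this type reasonably wants them too.
        pub fn find(&self, _haystack: &str) -> Option<(usize, usize)> { None } // stub
        pub fn find_overlapping(&self, _haystack: &str) -> Vec<(usize, usize)> { Vec::new() } // stub
    }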
> That said, I'm skeptical that the full range of approach was considered. I sometimes find that the presence of unwrap is a smell pointing to types that have not been fully fleshed out.
I am a fallible human. I might be wrong. So sure, be skeptical!
> As an extreme case, consider a struct whose fields contained diverse data (numbers, colors, enumerated values), but which are all defined as strings. It will be very easy to put this struct into an inconsistent runtime state because nothing can be checked at compile time. The type itself is anemic. Replacing strings with more constrained types eliminates opportunities for panic - possibly all of them.
I'm not sure what this is a case of? Like yeah, I agree, that sounds bad?
> I get that the whole point is "at what cost?" All I'm saying is that the tradeoffs aren't clear from the example in the passage.
Understood. Small illustrative examples are hard. Especially API design. API design is a very domain specific thing. I could probably write an entire blog post on the API design of just the aho-corasick crate. It has gone through many iterations and many lessons have been learned. (And I still have at least one more iteration to go.) I tried to distill down one small part of it in order to talk about the idea of not pursuing literally every possible compile time restriction because sometimes keeping invariants maintained at runtime leads to a simpler API. If you accept that principle already, then all is well.
But some people think the entire farm should be bet on pushing every possible thing to a compile time invariant, regardless of the cost. I am not one of those people and I think it leads to bad API design. And it's very relevant to this topic because if you don't push something to compile time, then, well, you probably need a panicking branch somewhere.
Does this mean using C inside of rust is ok? I'm pretty sure the original team kept hitting memory problems and it was in fact not ok. Browsers crash often enough that I don't want unwrap making it worse
And how do you propose they do that? What you're saying is tautological. It's like saying, "I don't want browsers to use runtime invariants because they can be broken and thus causing my browser to terminate."
Like... yeah, cool. How do you do that exactly?
Also, please do consider reading the blog post I wrote. I wrote it to try to clear up a lot of confusion around this topic. It is nuanced, and it just can't be untangled in a few sentences in a HN comment.
I think you missed the point of the article. In fact, the article specifically covers catching panics, so it doesn't bring down the entire application. As well as covering when panic is okay. It doesn't say you should just unwrap everything and treat panic as error handling.
I see 3 mentions of catch, all in the same paragraph. It says you can catch it, offers no reasons why you would want to, then provides an example of output when you don't.
That doesn't sound like it "specifically covers catching panics"