The Scourge of Error Handling (drdobbs.com)
53 points by mmastrac on Dec 8, 2012 | 17 comments



Error handling is indeed very hard to get right. In Rust we've been experimenting with different mechanisms:

* Much of our code uses the Result<T,E> type for local error handling, which is very similar to the way Haskell handles exceptions with the Error monad.

* For long-distance, fatal error handling, Graydon was very influenced by the "crash-only software" paper [1]. There is a `fail` statement which brings down the task permanently with no chance of recovery. (The only code that executes after a `fail` expression is evaluated is the set of destructors attached to the data the task owns.) Of course, other tasks can continue executing and might restart the crashed task.

* For long-distance, nonfatal error handling there is a new condition system like Common Lisp's -- you register a handler and that handler gets called whenever an error happens. The handler could tell the function that signaled an error to restart with a new value, to return a value of the handler's choice, or to fail the task (the default in most cases).

The hope is that this is a more robust and performant model than the traditional exceptions model, while not being particularly verbose.

[1]: https://www.usenix.org/conference/hotos-ix/crash-only-softwa...


Crash-only code sounds good.

It is worth remembering that the convention in early C development was to check for a null return value on every memory allocation. In most PC programs this convention added considerable overhead for more or less nothing, since once a program runs out of memory, recovery isn't really possible (oh, and Linux doesn't even return NULL when it's out of memory, it just kills programs).
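
For reference, that convention looked roughly like this (a minimal sketch; the function name is made up):

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    char *copy_string(const char *s) {
        /* The early-C convention: check every allocation for NULL. */
        char *p = malloc(strlen(s) + 1);
        if (p == NULL) {
            fprintf(stderr, "out of memory\n");
            return NULL;        /* push the problem up to the caller */
        }
        strcpy(p, s);
        return p;
    }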


Checking for null values from malloc does not create serious overhead; it creates minor overhead for your branch predictor, unless for some reason malloc frequently returns null.

A single context switch back to the kernel generates far more load than the null checks ever could.


I mean that all those checks create considerable mental overhead in the form of code bloat.


I feel the same way about error handling, but I think it is more of a design issue than a language issue. Ideally, an application has a barrier that deals with anything from the outside world that can cause an error. Past that barrier, code can concentrate on the main goal, not errors.

Bertrand Meyer once said that exceptions are for cases where you can't tell whether an operation will succeed or not before trying it. Generally, that happens in I/O, system calls and input validation. The problem with a lot of error handling is that it moves beyond that realm and mixes with the logic of the system.


A compiler usually has three parts, commonly called the front end, middle end, and back end. Errors can usually only happen in the front end (syntax errors, type checking) and rarely in the back end (not enough space for the output). In the middle end, the optimizer, no errors can happen. However, the compiler is usually not perfect and might contain bugs, so for debugging purposes some error handling is helpful. Should we use different error handling mechanisms for internal and external errors? For performance reasons, we could remove the internal error handling from release builds.
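
In C the usual way to get exactly that split is assert(), which compiles to nothing when NDEBUG is defined. A small sketch with made-up function names:

    #include <assert.h>
    #include <stdio.h>

    /* Internal invariant: the optimizer should never see a negative count.
       Building with -DNDEBUG removes this check from the release build. */
    static int optimize(int instruction_count) {
        assert(instruction_count >= 0);
        return instruction_count;   /* placeholder for the real pass */
    }

    /* External error: bad input from the outside world is always checked. */
    static int parse(const char *source) {
        if (source == NULL) {
            fprintf(stderr, "parse error: no input\n");
            return -1;
        }
        return 0;
    }

    int main(void) {
        if (parse("x = 1") == 0)
            optimize(3);
        return 0;
    }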

Another problem is to distinguish between inside and outside. A library usually does some input checking, because the caller is considered "outside". To remove this overhead, the library could instead document some restrictions and leave the checking to its users, which usually does not end very well. For example, memcpy requires that src and dst must not overlap, but in reality this causes problems [0]. This only works if the programmer who writes the caller has the ability and the skills to adapt the callee. In other words, there is no inside-outside difference for her.

[0] https://bugzilla.redhat.com/show_bug.cgi?id=638477#c38
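As a concrete illustration of that restriction (a sketch): memcpy's behavior is undefined when the buffers overlap, while memmove documents the weaker, safer contract:

    #include <stdio.h>
    #include <string.h>

    int main(void) {
        char buf[] = "abcdef";
        /* Shifting within the same buffer: src and dst overlap.
           memcpy(buf, buf + 2, 4) would be undefined behavior here;
           memmove makes the overlap explicit and handles it. */
        memmove(buf, buf + 2, 4);
        buf[4] = '\0';
        printf("%s\n", buf);   /* prints "cdef" */
        return 0;
    }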


> The problem with a lot of error handling is that it moves beyond that realm and mixes with the logic of the system.

What is the motivation for this? What are the perceived pains that programmers are trying to cure which are not, "cases where you can't tell whether an operation will succeed or not before trying it?"


Most often I think it's due to poor separation of concerns. The programmer is about to push an input down a stack of method calls several layers deep. There's the success condition, the failure condition, then the exception which captures unexpected behavior from the rat's nest of code he just called.


How often can this be addressed with orthogonal finite state machines? (For example: one which embodies whatever business process, and another that embodies errors and failures with IO and the network.)


I personally avoid reporting errors with null return values, though that approach might actually be useful in some situations. I've seen cases where less-than-skilled developers had an irrational fear of exceptions and treated them as undesirable error states rather than as an error reporting and handling mechanism. It resulted in systems where every layer of the call stack was polluted with blocks such as:

if (result != null) { ... } else { return null; }

This usually results in the real cause of an error being completely and utterly lost somewhere in the call stack, as using this kind of binary approach to error reporting is inadequate to communicate what really happened.

Question:

How do you handle data concurrency issues in big multi-user systems? For instance, if your lowest level service (usually data access layer) throws a data concurrency exception due to some other user modifying the data that the current user is trying to modify, how do you communicate to your user that the data has changed and that they are supposed to refresh the web page, if you are masking the real cause of the problem by just returning null values from the lowest level of the call stack?

I personally handle cases like this by allowing the appropriate exception to propagate to the highest level so that I can make a judgment call whether it's the kind of error that needs to be reported to the user, silently logged or something else. On lower levels I still might have try-catch blocks for logging purposes and in certain situations I might rethrow exceptions if the error needs to be reported to a higher level.


"the conventional use of a null return both as an indicator of an error condition and as an actual data item"

Yet, this is a thing that I have never done, nor would ever do. Can someone explain to me how it makes sense for this to happen in any other mode than as a genuine error? If I want to indicate an error condition I throw an exception, perhaps even one that I've written a class for. This allows the compiler to check that I've put proper handling in the code to deal with this condition. If I am stupid and I handle it with { e.printStackTrace(); } well, this gets me nowhere... but if I listen to the compiler and write in handling code that repairs the condition then all proceeds nicely, as if the problem has been properly dealt with and all...

Or is it just me?

return null; just says "your problem f*wad" to me...

Not acceptable.


A null value is used to denote several things: a missing value, an incorrect value, or an unknown value (there are other semantics).

A program that transforms one value into another may return NULL to denote that the processed value is missing or incorrect due to an error. That's how it was done in C, which has no exceptions.

Your example is nice, but if you are thinking exceptions aren't a problem, keep in mind they add several branching paths to code that appears non-branching. This hides the errors and causes all sorts of other confusion. In fact I remember an anecdotal story of a team working on a Java system of critical importance forbidding the use of exceptions in order to guarantee that the code would run smoothly.

Maybe Java 8 will fix some pain points[1] of exceptions.

[1] http://java8blog.com/post/37385501926/fixing-checked-excepti...


In C you have no exception handling. The closest you can get is segfaults. With a segfault the debugger can give you a backtrace at almost the exact point of the issue.

The easiest way to get segfaults is by dereferencing NULL. In C dereferencing NULL is invalid. Thus returning NULL on failure will result in something close to what raising an exception in other languages does.

In any language with real exception handling it makes much less sense, ignoring optimization.
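
A tiny sketch of what that looks like in practice (hypothetical function and path):

    #include <stdio.h>
    #include <stdlib.h>

    /* Returns NULL on failure, as in the C convention discussed above. */
    static double *load_samples(const char *path) {
        FILE *f = fopen(path, "r");
        if (f == NULL)
            return NULL;            /* "raise" by returning NULL */
        /* ... read the data ... */
        fclose(f);
        return calloc(16, sizeof(double));
    }

    int main(void) {
        double *samples = load_samples("/no/such/file");
        /* No check: the segfault below is the "unhandled exception",
           and the debugger's backtrace points almost exactly here. */
        printf("%f\n", samples[0]);
        return 0;
    }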


Dereferencing NULL is undefined -- on AIX the page at 0x0 is readable.


You don't need to fault the program to take a stack trace.


setjmp and longjmp can be used for exception handling, if you want to lose your mind.
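
For the curious, a minimal sketch of the idea (and of why it costs you your mind):

    #include <setjmp.h>
    #include <stdio.h>

    static jmp_buf on_error;            /* the "catch" destination */

    static void parse(const char *input) {
        if (input == NULL)
            longjmp(on_error, 1);       /* "throw": unwind back to setjmp */
        printf("parsed: %s\n", input);
    }

    int main(void) {
        if (setjmp(on_error) != 0) {    /* the "catch" block */
            fprintf(stderr, "caught a parse error\n");
            return 1;
        }
        parse("hello");
        parse(NULL);                    /* jumps back to the setjmp above */
        return 0;
    }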


Perhaps some taxonomy is required to help us deal with this.

Some errors are essential/problem domain errors. They represent an impossibility with regards to the purpose of the code. I place validation code under this heading.

Some are accidental/solution domain errors. They represent a failure of the environment or computing platform. They are present to allow the code to decide what to do when a design assumption (network availability, disk availability, RAM availability etc) is broken.

There's also the question of who must handle the error. Exceptions allow errors to propagate up a stack. Both C-style and multivalue return error handling styles force inline handling. Common Lisp conditions allow out-of-band error handling (some other agent freezes execution and steps in).
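
A sketch of the inline-handling style (hypothetical function, C-style return code plus out-parameter):

    #include <stdio.h>

    /* C-style: the status comes back as the return value, the result
       through an out-parameter, and the caller must handle it inline. */
    static int divide(int a, int b, int *out) {
        if (b == 0)
            return -1;      /* error: cannot propagate up on its own */
        *out = a / b;
        return 0;
    }

    int main(void) {
        int result;
        if (divide(10, 0, &result) != 0) {
            fprintf(stderr, "division failed\n");
            return 1;
        }
        printf("%d\n", result);
        return 0;
    }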

To me the core problem is this:

1. Error handling obfuscates the purpose of a piece of code. It hides the happy case and alternative cases.

2. Error handling is an unavoidable requirement of all code. Things go wrong.

How best to reconcile these problems?

It would be nice to have an environment that hides and shows code in terms of the path of execution. So you'd have a happy path view, views for each alternative path, a view for the malloc fail path and so on.

No idea how that could be done. Sounds dangerously like generated code; or alternatively a self-contained image environment à la Smalltalk/Lisp.



