A Manifesto For Error Reporting

jerf · on March 5, 2013

I'd expand on one of his points and observe that one of the core problems is that they are called exceptions, which prejudices the discourse in advance. They aren't exceptions. They are a way of declaring a handler that is scoped to receive certain types of objects on a second control plane (beyond normal control flow), which, when invoked, destructively unwinds the stack until a matching handler is found or the program terminates due to falling off the top.

That is what they are. One of the things this mechanism is used for is exceptions, but that is a separate concept. It is possible that exceptions are generally a good idea but that this is the wrong mechanism for them (see, for instance, Lisp's conditions and restarts for a good argument to that effect [1]). It is also possible that exceptions in general are a bad idea, and that it is better to use inline error codes than the separate control plane, but that there is still some other valid use for this second control plane (see all the non-exception uses of "exceptions", and note this is not hypothetical; see Python's StopIteration exception [2]).
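A minimal Python sketch of the non-error use mentioned above: a `for` loop ends when its iterator raises `StopIteration`, and the loop machinery catches that silently. `Countdown` and `drain` are hypothetical names; `drain` spells out by hand roughly what the `for` statement does internally:

```python
class Countdown:
    """Iterator that yields n, n-1, ..., 1 and then signals exhaustion."""

    def __init__(self, n):
        self.n = n

    def __iter__(self):
        return self

    def __next__(self):
        if self.n <= 0:
            # Not an error: this is how the iterator protocol says "done".
            raise StopIteration
        self.n -= 1
        return self.n + 1


def drain(iterator):
    """Roughly what a for loop does: call next() until StopIteration."""
    items = []
    while True:
        try:
            items.append(next(iterator))
        except StopIteration:
            # The handler absorbs the signal and normal flow resumes.
            break
    return items


print(drain(Countdown(3)))  # [3, 2, 1]
print(list(Countdown(3)))   # [3, 2, 1], via the built-in machinery
```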

Conflating the feature with the most popular use just leads to lots of confusing debates with people talking past each other.

[1]: http://www.gigamonkeys.com/book/beyond-exception-handling-co...

[2]: http://www.python.org/dev/peps/pep-0234/ , use Find to search for "It has been questioned" for a direct response to this issue

RickHull · on March 5, 2013

> I'd expand on one of his points and observe that one of the core problems is that they are called exceptions, which prejudices the discourse in advance. They aren't exceptions. They are a way of declaring a handler ...

I can't figure out what you're referring to with "they". I am guessing wrapped exceptions, but maybe also the example where nil was passed to the constructor instead of a hash?

jerf · on March 5, 2013

Exceptions are called exceptions. But they really aren't. They're two concepts conflated together.

RickHull · on March 5, 2013

> Exceptions are called exceptions. But they really aren't. They're two concepts conflated together.

> They aren't exceptions. They are a way of declaring a handler ...

Hm, if that's the case, then you described the handling of exceptions, which is distinct from the declaration of the exceptional case as well as the act of throwing or raising the exception.

I feel like your comment unhelpfully adds mystique rather than clarifies. I may be holding you to an "unfair" standard though, based on your comment history, of which I am an enthusiastic fan.

jerf · on March 5, 2013

It's hard to say "exceptions aren't actually exceptions" without essentially being guilty of equivocation in advance. The important part of my message was where I broke it down in two pieces, neither of which I call exceptions in a desperate and apparently failed attempt to avoid further confusion. Sorry. There's the control flow construct, and there's the use of the control flow construct for handling certain types of errors, and the two get bundled together under one word in a way I think is misleading.

mcherm · on March 5, 2013

> If you catch an exception, I need to know about it unless you’re really goddamn sure I don’t

I don't think I agree with this one. We agree that when a piece of code encounters a problem, it should raise an exception with a stack trace and useful information. But you seem to believe that when an exception handler catches an exception, it should log it and then do something about it. I think that exception handlers should do one of two things:

(1) Catch the exception, add additional information that the lower-level function didn't have access to (eg: what file was being processed when it occurred), then re-throw the exception.

(2) HANDLE the problem somehow.

(Anything else and you just shouldn't handle the exception.)
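Option (1) might look something like this in Python 3, where `raise ... from` keeps the lower-level exception attached as `__cause__`. `ProcessingError`, `parse_record` and `process_file` are hypothetical names for illustration, not from the article:

```python
class ProcessingError(Exception):
    """Raised when a file cannot be processed; carries the file and line."""

    def __init__(self, path, line_no):
        super().__init__(f"failed to process {path!r} at line {line_no}")
        self.path = path
        self.line_no = line_no


def parse_record(text):
    # Low-level function: it sees the bad value but not which file it came from.
    return int(text)


def process_file(path, lines):
    total = 0
    for line_no, text in enumerate(lines, start=1):
        try:
            total += parse_record(text)
        except ValueError as exc:
            # Add what only this level knows (file, line), then re-raise.
            # "from exc" keeps the original error and its traceback as __cause__.
            raise ProcessingError(path, line_no) from exc
    return total
```

Calling `process_file("data.txt", ["1", "x"])` raises `ProcessingError` naming the file and line 2, with the original `ValueError` (and its traceback) still reachable through `__cause__`.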

Now, (2) has several possible things. Perhaps if the web service is down we can fall back on using the cached values from the database -- that's an example of FIXING the problem. Perhaps the value isn't really needed and we can leave it out -- that's AVOIDING the issue. And the most common is to show an error message to the user in one form or another -- that's REPORTING it.

If you REPORT the problem, I believe you should always output the exception and stack trace someplace. But if you FIX or AVOID the problem, then it may or may not be appropriate to log it. A FIX or AVOID situation that occurs quite rarely is probably worth logging; one which occurs under normal circumstances (the web service goes down for maintenance for several hours each week) may only need a counter in some admin console.
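A rough sketch of the FIX case, logging quietly and bumping a counter rather than raising an alarm. `fetch_rates`, the cached dictionary and the counter are all hypothetical stand-ins for the web service, the database and the admin console:

```python
import logging
from collections import Counter

log = logging.getLogger("rates")
fallback_counts = Counter()  # stand-in for a counter shown in an admin console

_cache = {"EUR": 1.30}  # hypothetical cached values from the database


def fetch_rates():
    # Hypothetical web-service call; it always fails here to simulate an outage.
    raise ConnectionError("rates service unavailable")


def get_rates():
    try:
        return fetch_rates()
    except ConnectionError:
        # FIX: fall back to cached values. Outages are routine (scheduled
        # maintenance), so count them and log at a low level instead of alerting.
        fallback_counts["rates_service_down"] += 1
        log.info("rates service down, using cached values", exc_info=True)
        return dict(_cache)
```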

(PS: using exceptions as control flow is an extreme case of FIX -- reaching the end of the list was an exception that is FIXED by moving on to the next piece of work.)

DRMacIver · on March 5, 2013

Yes, in retrospect I'm not sure I agree with this one as strongly as I wrote it. Perhaps better to say that you should log by default.

But that being said, problems that the code needs to fix may be symptoms and you may need them for debugging. I think they're worth logging more often than not. Maybe just less noisily than your normal error reporting.

pswenson · on March 5, 2013

exceptions usually happen because of one of two reasons: 1) bugs 2) system failures.

bugs you can't recover from, you need to fix them.

system failures can sometimes be recovered from, but often they need manual intervention. usually it's best to just report them and let someone fix the problem rather than spending a bunch of resources trying to code some way around the problem (which introduces more bugs).

there are situations where it might be appropriate to try to recover. If an HTTP call fails, no reason not to retry a few times. But if your DB is down - sorry you are F'ed. Alert it - make it visible.
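A minimal retry sketch for the "retry a few times" case. Which errors count as transient (here just `ConnectionError`), the number of attempts and the delays are assumptions to tune per system; the important part is that the final failure is re-raised rather than swallowed, so a persistent outage like a dead DB stays visible:

```python
import time


def with_retries(call, attempts=3, base_delay=0.5, sleep=time.sleep):
    """Call `call()` up to `attempts` times, backing off between failures.

    Only ConnectionError is treated as transient; anything else propagates
    immediately. After the last attempt the error is re-raised.
    """
    for attempt in range(1, attempts + 1):
        try:
            return call()
        except ConnectionError:
            if attempt == attempts:
                raise
            # Exponential backoff: base_delay, then 2x, 4x, ...
            sleep(base_delay * 2 ** (attempt - 1))
```

Passing `sleep` in as a parameter is only there so the backoff can be skipped in tests.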

gingerlime · on March 5, 2013

I try to apply this to my logger statements. For example:

    unless signals.has_key? key.to_sym
      logger.error("wrong signal received: #{key.inspect} not in #{signals.inspect}")
      raise ActiveRecord::RecordNotFound
    end

Classifying what's an `error`, `warning` and `info` can be confusing sometimes, but I have quite clear guidelines, and they help me deal with errors better overall.

error/fatal - anything that I simply can't recover from, e.g. a missing parameter, where there's no way to even guess. Error logging is almost always accompanied by an exception being raised. (BTW, in our Rails app we use the logging-rails[1] gem, which emails those errors to us.)

warning - something seems wrong, but we can somehow still continue, or it's an error that I don't want an email about. For example, blocking spam on a form submission.

info - useful stuff to know what's going on with the app: user registered / logged in, payment received. Those are also sent to Graphite for measuring.

debug - all the other stuff you need when writing code.
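For comparison, the same guidelines expressed with Python's standard `logging` module. This is an analogy, not the logging-rails API; `handle_signal` and `SIGNALS` are hypothetical and loosely mirror the Ruby snippet above:

```python
import logging

log = logging.getLogger("signals")

SIGNALS = {"start": 1, "stop": 2}  # hypothetical set of known signals


def handle_signal(key, is_spam=False):
    if is_spam:
        # warning: something looks wrong, but we can continue; no email wanted.
        log.warning("blocked spam submission for signal %r", key)
        return None
    if key not in SIGNALS:
        # error: cannot recover, so log the offending value, then raise.
        log.error("wrong signal received: %r not in %r", key, sorted(SIGNALS))
        raise KeyError(key)
    # info: business events worth knowing about (and measuring).
    log.info("signal %r received", key)
    # debug: everything else that helps while writing the code.
    log.debug("signal table: %r", SIGNALS)
    return SIGNALS[key]
```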

[1] https://github.com/TwP/logging-rails

adrianmsmith · on March 5, 2013

Yes I've thought about this topic as well.

All loggers in all languages have these different levels. But rarely is it defined which to use when! It's considered "obvious" but, unless it's actually defined, each programmer will find it obvious in a different way. Reading the logfile in production or using rules to only display logs beyond a certain severity won't be useful if every piece of code uses different levels to mean different things.

Here are the rules I came up with:

http://www.databasesandlife.com/which-log-levels-to-use-when...

gingerlime · on March 5, 2013

looks like we have the same idea of which levels make sense. It's surprising how many developers don't bother thinking about it, or don't realise the importance of logging in general and consistent logging in particular.

pjungwir · on March 5, 2013

That Ruby logging framework looks really nice. But from reading the READMEs, something seems missing: is there a way to set the log level for different appenders (rather than different classes)? For example, you want the `file` appender to keep everything, but the `email` appender should only report errors.

gingerlime · on March 5, 2013

yes, that's the way it works pretty much. Appenders are independent and you can pick and choose. I've defined it to only send an email when errors are logged, but everything goes to a file.

It also has a nice feature that it won't bombard you with an email for each error, but instead it collates the errors and only sends an email after 60 seconds (configurable too).

I really recommend this gem. While you're at it, try out lograge gem too, which makes logs much more compact. Both gems work well together.
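The gem's own configuration for collating emails isn't shown here. As a rough Python analogue only, here is a sketch of a handler that buffers records and delivers them as one digest once a time window has passed. `DigestHandler` and its `send` callback are hypothetical; `logging.handlers.BufferingHandler` and its `shouldFlush`/`flush` hooks are the standard-library pieces it builds on:

```python
import logging
import logging.handlers
import time


class DigestHandler(logging.handlers.BufferingHandler):
    """Collect records and deliver them as one digest once `interval` seconds
    have passed since the first buffered record.

    `send` is a hypothetical delivery function (e.g. one that emails the text).
    The time check only runs when a new record arrives; a real implementation
    would also need a timer so a lone error is not held back indefinitely.
    """

    def __init__(self, send, interval=60.0, clock=time.monotonic):
        super().__init__(capacity=10_000)
        self.send = send
        self.interval = interval
        self.clock = clock
        self.first_at = None

    def emit(self, record):
        if self.first_at is None:
            self.first_at = self.clock()
        super().emit(record)  # appends, then calls shouldFlush()

    def shouldFlush(self, record):
        return (len(self.buffer) >= self.capacity
                or self.clock() - self.first_at >= self.interval)

    def flush(self):
        self.acquire()
        try:
            if self.buffer:
                self.send("\n".join(self.format(r) for r in self.buffer))
            self.buffer = []
            self.first_at = None
        finally:
            self.release()
```

Injecting `clock` makes the time window testable without actually waiting 60 seconds.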

pjungwir · on March 5, 2013

Do you have to do anything special to get that behavior from the email appender? I can't find anything in the docs about setting appender-specific log levels.
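In Python's standard `logging` module (not the Ruby gem under discussion), levels can be set on each handler as well as on the logger, which is the per-appender filtering being asked about. A sketch, with `StringIO` objects standing in for the file and email destinations; in real code `logging.FileHandler` and `logging.handlers.SMTPHandler` are the stdlib options:

```python
import io
import logging

logger = logging.getLogger("app.orders")
logger.setLevel(logging.DEBUG)  # let everything through to the handlers
logger.propagate = False

# "file" destination keeps everything (StringIO stands in for a log file).
file_stream = io.StringIO()
file_handler = logging.StreamHandler(file_stream)
file_handler.setLevel(logging.DEBUG)

# "email" destination only sees errors (StringIO stands in for SMTPHandler).
email_stream = io.StringIO()
email_handler = logging.StreamHandler(email_stream)
email_handler.setLevel(logging.ERROR)

for handler in (file_handler, email_handler):
    logger.addHandler(handler)

logger.info("order 42 paid")
logger.error("order 43: card declined")
```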

henrik_w · on March 5, 2013

Lots of good advice. I am constantly amazed at how many error messages don't contain any dynamic information (as mentioned in the article - what the offending value was, and why it was wrong).
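A small Python sketch of a message that carries both the offending value and the reason it was rejected. `parse_port` and its range check are hypothetical examples:

```python
def parse_port(value):
    """Validate a port number, reporting both the bad value and the rule.

    Compare: raise ValueError("Bad argument"), which says neither what nor why.
    """
    try:
        port = int(value)
    except (TypeError, ValueError) as exc:
        # Keep the original error chained rather than discarding it.
        raise ValueError(f"Bad port {value!r}: should be an integer") from exc
    if not 1 <= port <= 65535:
        raise ValueError(f"Bad port {port}: should be between 1 and 65535")
    return port
```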

One of the best fixes is for developers to have to spend time debugging/troubleshooting. If nothing else, it teaches you the importance of good error/log messages.

praptak · on March 5, 2013

Python-specific advice on re-raising upon catching while keeping the original trace: in Python 3, please use chained exceptions: http://www.python.org/dev/peps/pep-3134/

For older Python versions please see http://blog.ianbicking.org/2007/09/12/re-raising-exceptions/
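A short Python 3 sketch of the two kinds of chaining PEP 3134 introduced: explicit (`raise ... from`, stored in `__cause__`) and implicit (raising while handling another exception, stored in `__context__`). The function names are hypothetical:

```python
import traceback


def load_config(text):
    try:
        return int(text)
    except ValueError as exc:
        # Explicit chaining: the original error becomes __cause__.
        raise RuntimeError("config value is not a number") from exc


def cleanup_then_fail():
    try:
        1 / 0
    except ZeroDivisionError:
        # Implicit chaining: an error raised while handling another one
        # records the first as __context__ automatically.
        raise KeyError("missing key during cleanup")


def full_report(err):
    """Format the whole chain, including the original exception's trace."""
    return "".join(traceback.format_exception(type(err), err, err.__traceback__))
```

When printed, the explicit case is separated by "The above exception was the direct cause of the following exception", the implicit one by "During handling of the above exception, another exception occurred".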

masklinn · on March 5, 2013

> For older Python versions please see http://blog.ianbicking.org/2007/09/12/re-raising-exceptions/

This is super important. I recently rediscovered this with a colleague (the company's tool was fucking up exceptions everywhere, either over-logging things or throwing stack traces out in an attempt to rewrap them, and we decided to fix this).

Although the second-to-last example:

    new_exc = Exception("Error in line %s: %s"
                    % (lineno, exc or exc_class))
    raise new_exc.__class__, new_exc, tb

would be much better written as:

    raise Exception, "Error in line %s: %s" % (lineno, exc or exc_class), tb
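Both snippets use the Python 2 three-argument `raise`, which is a syntax error in Python 3. A rough Python 3 equivalent attaches the traceback with `with_traceback()`; `parse_lines` is a hypothetical stand-in for the blog's line-processing loop:

```python
def parse_lines(lines):
    values = []
    for lineno, line in enumerate(lines, start=1):
        try:
            values.append(int(line))
        except Exception as exc:
            # Python 3 spelling of "raise Exception, msg, tb": attach the
            # original traceback to the new exception explicitly.
            raise Exception(
                "Error in line %s: %s" % (lineno, exc)
            ).with_traceback(exc.__traceback__)
    return values
```

Python 3 also sets `__context__` automatically here, so the printed traceback shows the original exception too; `raise ... from exc` is usually the more explicit choice.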

Roboprog · on March 5, 2013

God knows I've seen enough "two year old" error messages: "I don't like it! <spits out>". Well, what would you like, you sniveling little diaper-wetting sot of a program?!?

I could not agree more with his comment about "Bad value {X} should be ...".

Oh, and the part about the amazing disappearing stack trace -- I've seen way too much "print e.toString()" which discards all that wonderful "where" information.
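The Python cousin of `print e.toString()` is printing `str(e)`. A sketch contrasting it with keeping the trace; `risky`, `run_badly` and `run_well` are hypothetical:

```python
import traceback


def risky():
    return {}["missing"]


def run_badly():
    try:
        risky()
    except KeyError as e:
        # Only "'missing'" survives; the file/line "where" information is lost.
        return str(e)


def run_well():
    try:
        risky()
    except KeyError:
        # Keeps the full stack trace. Inside an except block,
        # logger.exception(msg) logs msg at ERROR level with this same trace.
        return traceback.format_exc()
```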

moe · on March 5, 2013

> disappearing stack trace

In many (most?) OO-languages it takes a surprising amount of gymnastics to properly chain exceptions. Especially if you're a library-author depending on other libraries.

The blame goes squarely to the language designers here. This feature should be baked into the core of every language because adding it with a 3rd party library is far from trivial in most.

Here's an example of such a library (ruby): https://github.com/pangloss/nested_exceptions

Use it!

Roboprog · on March 6, 2013

For all my grumbling about Java, I guess exception chaining is one thing they actually did get right from the beginning:

    throw new RuntimeException("App feature X broke ...", e);

Could be worse, could be C -- setjmp/longjmp :-)

Actually, longjmp is pretty useful, but it typically makes for "coarse-grained" error handling, and you had better have a good error message logged before jumping back, as well as protect yourself from resource leaks...

islon · on March 5, 2013

When I'm coding I normally think about 2 things:

- What should I do here to help me and other developers understand what's gone wrong? This generally results in one of three categories:

    - Is it a "shouldn't happen" error, like a hardware malfunction or the internet being down?
    - Is it an invalid data error? Someone screwed up the data and I got the error here.
    - Is it an algorithm error? I got something wrong in my code and this wasn't supposed to happen.

- What should I show to my user? Divided into three categories too:

    - Unrecoverable error: should show a big red screen.
    - Recoverable error: should show an informative message with instructions.
    - Errors that don't affect the user: no need to show anything, just log the error.
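One way to encode both axes in Python, as a sketch only: one exception class per developer-facing category, and a small function choosing the user-facing response. The class names and the policy (for instance, treating invalid data as recoverable) are assumptions for illustration, not islon's rules:

```python
class EnvironmentFailure(Exception):
    """"Shouldn't happen" conditions outside our code: hardware, network."""


class InvalidData(Exception):
    """Someone upstream handed us bad data."""


class BugError(Exception):
    """Our own algorithm got something wrong."""


def user_response(exc, affects_user=True):
    """Pick what to show the user for a failure (hypothetical policy)."""
    if not affects_user:
        return "log only"
    if isinstance(exc, InvalidData):
        # Assumed recoverable: the user can fix the input and retry.
        return "informative message with instructions"
    # Environment failures and bugs are treated as unrecoverable here.
    return "big red screen"
```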

Roboprog · on March 6, 2013

Who is this "user" you speak of? :-)

I'd extend that a bit for batch systems: invoke some kind of tool to file a trouble ticket to get the operations/support group's attention, as there won't be anybody sitting at a console watching. Alternatively, you could view this as generating a big red screen; just realize that any users are sitting "someplace else".

Yeah, some of us still have batch operations going, as well as a web app or two.

HeyLaughingBoy · on March 5, 2013

There's a third thing to consider: how to recover from the error. In many cases, the recovery is something the software has to do itself: e.g., server down, switch to alternate server.

My preference is to consider error reporting as something completely separate from error handling. This is even more the case if you have different classes of users.
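A sketch of keeping the two apart in Python: the function below recovers by failing over to the next server, but hands reporting to a `report` callback, so each class of user can get a different channel. The servers are hypothetical callables, and treating only `ConnectionError` as a failover trigger is an assumption:

```python
def call_with_failover(servers, request, report):
    """Try each server in order.

    Handling (failover) happens here; reporting is delegated to `report`,
    e.g. an ops dashboard for admins or a short notice for end users.
    """
    errors = []
    for server in servers:
        try:
            return server(request)
        except ConnectionError as exc:
            errors.append(exc)
            report(f"server failed, trying next: {exc}")
    # Nothing left to fail over to: surface every underlying error.
    raise ConnectionError(f"all {len(servers)} servers failed: {errors!r}")
```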

stinos · on March 5, 2013

All well and good, but that third one depends heavily on context. If I have an application for collecting/analysing data that is going to be used to publish papers and whatnot, it is absolutely critical that the algorithms I write are correct, no matter what. If they are not, it's time to go berserk (as in the first one). Ignoring it would be the worst thing I could do at that point.

TheOnly92 · on March 5, 2013

I think that, depending on your users, they may not understand your backend well enough to produce proper error reports.

Take the "Bad argument" example: they don't know how many different situations trigger the same error, and they don't know whether what they did beforehand would help you figure out their problem. In short, they don't know enough to be helpful.

I don't have any great solution to this, but putting a middleman in between might solve it: someone who can't help you with the programming, but knows enough about what is helpful and what isn't to provide useful error reports.

evolve2k · on March 5, 2013

Tools like https://crashlog.io can help a dev team collect error reports and debug more easily. Maybe that's a good approach: raise the exception, but capture it and send it to the devs.

DRMacIver · on March 5, 2013

I agree that this can be a problem.

I think a good default here is that if you don't know enough about what you should report, you should report as much as you reasonably can - for example if you can't say what's wrong with a value, at least say what that value was.

eksith · on March 5, 2013

This is why we had a logger running in the background that received all anomalous events, from warnings to exceptions (the class, the method, the passed data including parameters, and the relevant line). And we had a catch-all trigger for when something goes really, really, really badly wrong and there's nothing else available to dump to the logger.

Annoying during development, a godsend in production. No one is immune from stupid mistakes.
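A comparable last-resort catch-all in Python is `sys.excepthook`, which is called for any exception that reaches the top of the main thread (threads have `threading.excepthook` since Python 3.8). A minimal sketch; the logger name is hypothetical, and a real setup would attach a handler that ships these records somewhere durable:

```python
import logging
import sys

log = logging.getLogger("crash")


def log_uncaught(exc_type, exc_value, exc_tb):
    """Last-resort hook: record any exception nothing else handled."""
    if issubclass(exc_type, KeyboardInterrupt):
        # Let Ctrl-C behave normally.
        sys.__excepthook__(exc_type, exc_value, exc_tb)
        return
    log.critical("uncaught exception", exc_info=(exc_type, exc_value, exc_tb))


sys.excepthook = log_uncaught
```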

DRMacIver · on March 6, 2013

Yes, indeed, we have the same, but it becomes much more useful if the exceptions being thrown by your dependencies are of high quality.

br1 · on March 5, 2013

Windows developers have it easier because the debuggers are better. The same debugger can work on VB6 code calling C++ calling C#. Memory dumps also seem to be more common and useful on Windows than on Unix. If you get a dump for every crash, stack traces seem almost useless in comparison.

wilmoore · on March 5, 2013

The only tragedy about this thread is that more people aren't commenting. Either people are taking the advice and moving on or they don't care. I really hope it is the former :)