Error Handling in Node.js (joyent.com)
151 points by lsm on Nov 4, 2016 | 91 comments



My non-node specific suggestions:

1 - Don't catch errors unless you can actually handle them (and chances are, you can't). Let them bubble up to a global handler, where you can have centralized logging. There's a fairly old discussion with Anders Hejlsberg that talks about this in the context of Java's miserable checked exceptions, which I recommend [1]. This is also why, in my mind, Go gets it wrong.

2 - In the context of error handling (and system quality), logging and monitoring are the most important things you can do. Period. Only log actionable items, or else you'll start to ignore your logs. Make sure your errors come accompanied by a date (this can be done in your central error handler, or at ingestion time, via logstash or what have you).

3 - Display generic/canned errors to users...errors can contain sensitive information.

4 - Turn errors you run into and fix into test cases.

[1] - http://www.artima.com/intv/handcuffs.html
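For point 1, a minimal Node sketch of such a global handler might look like this (logging to stderr here; a real service would ship this to its log pipeline):

```javascript
// Minimal sketch of a single last-resort handler in Node. Nothing below
// this level catches what it cannot handle; everything unhandled lands
// here, gets logged centrally with a timestamp, and the process exits.
process.on('uncaughtException', (err) => {
  console.error(new Date().toISOString(), 'unhandled error:', err.stack || err);
  process.exit(1);
});

process.on('unhandledRejection', (reason) => {
  console.error(new Date().toISOString(), 'unhandled rejection:', reason);
  process.exit(1);
});
```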


When it comes to 1), maybe a better way to state it is that you shouldn't ignore errors unless you were able to completely handle them.

Catching exceptions to throw exceptions with better messages is something I would strongly suggest, since almost no exceptions are useful without contextual information. E.g. which file was not found? The config, not the input or output. Things like this. This is especially useful in C++, where you don't get stack traces, but in other languages too you'll want to present non-technical (i.e. non-dev) users with meaningful messages. Stack traces will just frighten them off.

Your comment on sensitive information also plays into this.

I'll agree that just swallowing errors and going on is a recipe for disaster. This is something that regularly bugged me in most of the C code I've encountered so far.


Catching exceptions to throw exceptions with better messages is something I would strongly suggest

Also, not every user in the world understands English.

I catch exceptions at two levels. One is the user-initiated action. I seldom see this specified, but it seems obvious to me: if the user decides to do X, either X is done or a clear and meaningful message is shown, explaining what could not be done and why.

There is another, finer-grained location to do what you say (collecting details), logging and then re-raising up to the other level.
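A sketch of those two levels, with invented stand-ins for the backend call and the UI:

```javascript
// Hedged sketch of two catch levels; the failing call and the UI object
// are stand-ins, not real APIs.
function fetchProfile(userId) {
  // Finer-grained level: collect details, then re-raise.
  try {
    throw new Error('connection reset'); // stand-in for a failing backend call
  } catch (err) {
    throw new Error(`failed to load profile ${userId}: ${err.message}`);
  }
}

function onOpenProfileClicked(userId, ui) {
  // User-action level: either X is done, or one clear message is shown.
  try {
    ui.show(fetchProfile(userId));
  } catch (err) {
    console.error(err.message); // full detail for developers
    ui.show('Your profile could not be opened right now.'); // for the user
  }
}
```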

Swallowing errors is evil and no, not limited to C code.


Interesting point. I have to confess that I've never seen logging done usably in an i18n sense. Of course, for UI applications, you're absolutely right.

When it comes to logging details, you have a point. But I still think a clear final error message is something to aspire to - especially if you're writing very busy and potentially multithreaded (or - even worse - async) services.

On this note, filtering logs by thread ID can be very helpful. I wonder if that is also easily possible with "fibers".

> Swallowing errors is evil and no, not limited to C code.

Of course, nothing ever is - it's just where I've seen this thing the most (highly anecdotal evidence, I know :-) ). Unfortunately, the nature of many C APIs also makes it very non-obvious if you're skipping through the code, whereas an empty catch statement stands out somewhat.


> Catching exceptions to throw exceptions with better messages is something I would strongly suggest, since almost no exceptions are useful without contextual information.

Much more important is to wrap exceptions generated in internal code in custom types. The list of raised exceptions is part of your code's interface, and one generally shouldn't expose this kind of internal detail as public API.


If you're wrapping third party code that uses custom exception types, absolutely. However, regarding custom types, I found that you can get astonishingly far with most standard error types defined by many languages. So, if the internal code already uses those, I wouldn't see them as much of a problem - a FileNotFoundException essentially may as well stay one.

Of course, default exception types won't work if you have to convey contextual information that is missing from the corresponding interface. Also, using the same default exception type for different underlying errors can be problematic (an example of this is C#'s AppSettingsReader.GetValue()[1] - it makes it impossible to distinguish between a parsing error and a missing key via the API).

[1]: https://msdn.microsoft.com/de-de/library/system.configuratio...
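A sketch of that boundary in JavaScript (`ConfigError` and `readFn` are invented for illustration):

```javascript
// Sketch: wrap a module's internal failures in one custom type at the
// boundary, but let a standard "file not found" stay what it is.
// `readFn` stands in for whatever actually reads the settings.
class ConfigError extends Error {
  constructor(message, cause) {
    super(message);
    this.name = 'ConfigError';
    this.cause = cause; // keep the wrapped internal error for logs
  }
}

function loadSettings(readFn, path) {
  try {
    return readFn(path);
  } catch (err) {
    if (err.code === 'ENOENT') throw err; // a FileNotFound may as well stay one
    throw new ConfigError(`settings at ${path} are unreadable`, err);
  }
}
```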


You can sometimes get stack traces in C++, but it's platform and compiler specific. Gcc's system is pretty good.

(I have some code for WinCE that walks stack traces in conjunction with SEH so that crashes in production - segfault etc - get logged in a useful manner. It does rely on parsing and decoding instructions ...)


> Only log actionable items

Easy to say but much harder to implement. For example, if you communicate with another service, a few network errors are usually not actionable and you'd have some fallback mechanism in your code. But tons of network errors (e.g. > 20%) are a problem that needs to be fixed now. So would you log the network error or not?


Set a threshold, and log only once you hit that threshold.
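One hypothetical way to implement that: track outcomes over a sliding window and only signal "log now" once the failure rate crosses the line (the class name, threshold, and window size are all made up):

```javascript
// Toy sketch of threshold-gated logging: individual failures are
// recorded silently; only a sustained failure rate triggers a log.
class ErrorRateMonitor {
  constructor({ threshold = 0.2, windowSize = 100 } = {}) {
    this.threshold = threshold;
    this.windowSize = windowSize;
    this.outcomes = []; // true = failure, newest last
  }

  // Record one call's outcome; returns true when the failure rate over
  // the recent window reaches the threshold (i.e. "log now").
  record(failed) {
    this.outcomes.push(failed);
    if (this.outcomes.length > this.windowSize) this.outcomes.shift();
    const failures = this.outcomes.filter(Boolean).length;
    return failures / this.outcomes.length >= this.threshold;
  }
}
```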


And keep track of that state across 20 different instances?

What we do is just log the failure and have a system like New Relic monitoring everything so that it can alert us when we hit 20% network failure.


Sure - but then the developer-facing "log" is the New Relic interface, and your instances transmit failure information to it via some API (I mean I suppose you could have one program output a plain-text log file and then another program or service parse that to figure out how many errors were happening, but you wouldn't do that for any other kind of inter-system communication).


Monad transformers offer a more disciplined and pleasant alternative to your #1. You should handle errors at the value level, not with some magic error passing system provided by your language. You can catch the errors at whatever level you like, and you are forced to deal with them properly by construction.
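Even without a type system, the value-level idea can be sketched as a toy Either (names invented; not a real monad-transformer library):

```javascript
// Toy value-level error handling: errors are ordinary values that must
// be unwrapped, not a separate control-flow channel.
const ok  = (value) => ({ ok: true,  value });
const err = (error) => ({ ok: false, error });

// flatMap/bind: short-circuits on the first error, so errors "bubble up"
// by construction rather than by a language mechanism.
const andThen = (result, fn) => (result.ok ? fn(result.value) : result);

function parsePositive(s) {
  const n = Number(s);
  if (!Number.isFinite(n)) return err(`not a number: ${s}`);
  if (n <= 0) return err(`not positive: ${s}`);
  return ok(n);
}

// The caller cannot reach the value without checking `.ok` first.
const halfOf = (s) => andThen(parsePositive(s), (n) => ok(n / 2));
```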


Are you talking about things like EitherT? IMO there isn't so much of a difference from exception handling. Both approaches make it hard to see at a glance where most code could fail, and you can (but are not encouraged to) transform errors explicitly.

Add Java's checked exceptions, and the practical differences become quite subtle. Of course it's nice to be able to do transformations with higher-level functions.


Standard "canned" reply: Java doesn't force you to annotate exceptions or even to handle them all. Typed returns do. And as to the "where", well, in the originating function.

But that gets us to some real criticism here: You run the danger of getting an Either Monad out of about every function in your code-base... So maybe throwing exceptions isn't the worst thing after all. (/me ducks all the stuff being thrown my way from hardcore [EDIT:] ~Haskell~ functional programming fans :-))

EDIT: s/Haskell/functional programming/ (because its unfair - Haskell can throw stuff: http://www.randomhacks.net/2007/03/10/haskell-8-ways-to-repo...)


Source? I was under the impression that it does (apart from things like NullPointerException, which of course has Haskell equivalents). And I've had to do a significant project in Java.


You mean Unchecked Exceptions (that extend from RuntimeException) vs Checked Exceptions?


You can manipulate value-level things like EitherT in ways that are not convenient or possible in Java.

EitherT and friends also force you to handle exceptions explicitly before getting a value out. In Java, you can usually just ignore it and "let it bubble up" as someone suggested earlier.


"Manipulate", that's what I meant by "transformations by higher order functions". The point is acknowledged, but I feel in practice it's not always beneficial. There are typically not many more points of use than different transformations. Doing the stuff inline (with catch blocks) is often better since there's less conceptual overhead.

"Bubbling up", that's exactly what the monad instance gives you. How EitherT is supposed to be used. That's why I said there's not much difference from a practical standpoint. So no, EitherT does not force any better style than checked exceptions (but it makes it really inconvenient to traverse regions of code with differing sets of exceptions).

I still feel that the C-style way of handling error codes is superior in most situations, from a writability and readability perspective. The big problem is it doesn't enforce error handling. Another problem is it's totally unsuited for quick and dirty scripts like I can write with Python's "unchecked" exceptions: just do it, and tell me if there were errors (I might or might not know which ones are possible) only at runtime.


For point nr 4, at our company we tracked how many fixed bugs came back afterwards (a project with 80 developers and a more-than-a-decade-old code base). We saw that it's very rare for the same bug to pop up twice after the fix, and therefore decided not to write regression tests for them, and to focus our testing efforts where it mattered more.


It depends on what kind of error it is. If it is an off-by-one error, there isn't much you can do to test it, but if it is an input edge case you didn't handle correctly, then a test makes a lot of sense, if for nothing else than to verify the fix.


Of course, it is rare that the same bug will pop up twice, but a bug demonstrates areas which are not covered by tests, so new tests will cover those areas and catch other bugs.


> This is also why, in my mind, Go gets it wrong.

Considering the number of places I see C# code catch errors and continue as though nothing happened, I have to wonder how many Go and C programmers in the wild simply ignore error codes.


Once I got used to it I found I kind of like the Go paradigm of checking for errors every time something could go wrong, and (usually) passing the first one up the chain with some additional context info.

However, the fact that "ignore error" is an easy and built-in paradigm that even shows up in the official docs:

    fragileThing, _ := scary.MightNotWork()
fills me with dread.


Could you elaborate?

Assuming that scary.MightNotWork() is some kind of ancillary function that is non-essential, why would I want to let it impact the main program? The example that comes to mind would be logging. If I have set up my own "write logs into network share" call, I'd never ever expect it to throw errors that took down the app. Share down? Don't care. Logfile locked/corrupt? Don't care. Try and log, if you can't, fail silently without impacting the main purpose of the application/service/whatever.
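A best-effort logger in that spirit might look like this sketch, where `write` stands in for the network-share call:

```javascript
// Best-effort logging: a broken log sink never takes down the app.
// Whether silently swallowing the failure is wise is exactly the debate.
function makeBestEffortLogger(write) {
  return function log(line) {
    try {
      write(`${new Date().toISOString()} ${line}`);
    } catch (_err) {
      // Share down? Logfile locked? Deliberately swallowed.
    }
  };
}
```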


> Try and log, if you can't, fail silently without impacting the main purpose of the application/service/whatever.

Are you being serious? I agree that there can be "some kind of ancillary function that is non-essential" but in the case of failed logging you should try sending an email / showing some warning if you have a GUI / try other outputs / crash with a meaningful error especially if you are running under some sort of supervisor... etc.

Of course that doesn't invalidate your main point.


Sure. If your function does things internally that might return errors, the normal thing to do is have your function also potentially return an error, namely the first error it finds.

If you call a function the error of which isn't a big deal to your function, you'd normally check the return value to make sure it meets your expectations. If it does, fine -- proceed accordingly. If it doesn't, then send the error up the chain.

So -- totally bogus example -- say you want to return the mod time of a file or, if the file doesn't exist, the epoch. The file not existing is an error, but not one you'd abort on; other file errors though would be problematic:

    // ModOrEpoch returns the modification time of the file at path, or the epoch
    // time if there is no such file.  Unexpected file conditions are returned
    // as errors.
    func ModOrEpoch(path string) (time.Time, error) {

    	epoch := time.Unix(0, 0)
    	info, err := os.Stat(path)
    	if err != nil {
    		if os.IsNotExist(err) {
    			return epoch, nil
    		}
    		return time.Now(),
    			fmt.Errorf("File error for %s: %s", path, err.Error())
    	}
    	if info.IsDir() {
    		return time.Now(),
    			fmt.Errorf("File is a directory: %s", path)
    	}

    	return info.ModTime(), nil

    }
https://play.golang.org/p/AMSJMV3tks

I suppose it's possible you might really, really not care about an error, but that would be extremely un-idiomatic in Go. And, I would argue, a very bad habit in any language where the error could possibly be other than what you expect; sort of like

    try { foobar(); } catch(e) { /* global thermonuclear war? */ }
The thing that bugs me about seeing the errors ignored in official docs is that the Go world puts a lot of emphasis on writing idiomatic code, and the Go "idiom" is much less flexible than, say, Perl's. You might reasonably look to the official documentation to learn what is and isn't idiomatic. And you would hopefully figure out soon enough that ignoring errors isn't, but then again you might not.


> I would argue, a very bad habit in any language where the error could possibly be other than what you expect

Assuming that ModOrEpoch was being used to populate a mouseover somewhere on a UI that literally could not be less important, any error propagation is going to be degrading service considerably more than swallowing any error and returning epoch time.

Unless there is (and there could well be) a subset of errors which actually cause serious concerns but don't cause errors in any other function in the application? Do you have any examples/thoughts of what that might entail. :)


I think I might not be getting my point across here.

I'm not saying you should propagate the error all the way up to the end-user in a UI.

I'm saying that you very, very rarely would be so uninterested in a lower-level error that you wouldn't even want to log it, except for a set of expected errors.

So when we imagine a "subset of errors which actually cause serious concerns," we should be realistic about what "serious" means -- but I say that for the "subset" that is all unexpected errors, the minimum level of "serious" is that you want to know it happened. In my contrived example code, imagine the web server process doesn't have permission to stat the given file. You'd want to know about that, right?

If you're not doing the mouseover properly N% of the time, you have a more-or-less serious problem with your UI, and you presumably want to know that's happening and why. Maybe what you do about it is expand your set of known errors. But at least you have the option of doing something about it.


I think that's in the docs mostly for conciseness. I don't think anyone thinks that it's a good practice in real code.


So you admit Go's errors-as-values are verbose. Because they are. The best system IMHO is Java's. If a method throws, then that should be part of the method signature, and dealing with the error (try/catch) should be mandatory. The irony is that Go has some form of (inferior) try/catch with panic/recover. So it has both, but still pretends exceptions are bad.


The problem with checked exceptions is nothing more and nothing less than that they didn't work. Yes, in theory, or at least some theories, checked exceptions ought to be awesome. But they aren't in practice. Go's error handling in practice works better than Java's checked exceptions. Where fact and theory conflict, fact wins.

(I emphasize "checked" because there is a much more robust and interesting discussion about exceptions in general, and then of course a number of sidecar discussions about other error handling mechanisms like Either/Option, etc. I'm only making this claim about checked exceptions. Which is kinda shooting fish in a barrel; arguably C's error handling worked better than Java's checked exceptions and I think C's "error handling" isn't even worthy of the term.)


If you can't handle an error, why would you allow it? It makes no sense. As a developer you have control of the errors/exceptions that get raised, and raising an error that you can't deal with seems... bad.


I'm not sure if I understand you correctly. What do you mean by "allow it"?

I would even define an error to be a situation the code can't handle properly, and those things are usually not under your control (if I rip out a hard disk while some program is running, the developer can't do anything about that - but I would hope that the program fails accordingly and states why it failed).

I'm obviously paraphrasing here, but things like this do happen (USB devices get disconnected, remote services go down etc).

Edit: Obviously, this applies to different code at different levels. Serialization code might fail due to input - but that also is not under the control of the dev writing the serialization logic. Thus, it should fail.


You "allow" it by doing something that can potentially give an error without handling it there. I'm referring to the parent's recommendation "Don't catch errors unless you can actually handle them" .. and why you wouldn't handle them. You should always handle them. To put it another way, I disagree with the design decision of depending a top level/global exception handler.


Concurrency means you can't prevent errors. Every time you open a file, it could have been deleted out from underneath you, in a race with some other process.

Most files a program opens are not as a result of user action: configuration, libraries, resources, etc. And usually it doesn't make sense to catch these errors at the point of occurrence, because they'll be all over the codebase. And there's very little you can do in response to them.


Giant post about the nightmare that is making robust code in node.js. Summary: don't use a language that makes it so easy to leak errors and exceptions in large projects. There's something to be said for the compiler forcing you to declare what exceptions your code can throw, to make you think about this stuff up front.


Not really, just normal error handling issues like most other languages.

You should always know how the API you are using reports an error, and handle it - most languages have multiple common ways of notifying of an error.

As to leaking, coming from Java's terrible error handling, you just start using runtime exceptions all the time anyway; at least with JS the result is more concise.


Node is actually best used with Promises, which aren't even mentioned in this post.

Promises, aside from being far more concise with a huge amount of utility, do not leak errors or exceptions.

    doAMillionThings()
    .catch((err) => handleAnything(err));
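With async/await the same funnel reads as one ordinary try/catch; this sketch uses a stub in place of real work:

```javascript
// Stub standing in for a long promise chain; any rejection or throw
// anywhere inside it surfaces in the single catch below.
const doAMillionThings = async () => { throw new Error('thing #417 failed'); };

async function main() {
  try {
    await doAMillionThings();
    return 'ok';
  } catch (err) {
    return `handled: ${err.message}`; // one place handles everything
  }
}
```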


What better alternatives do you know? I don't know any programming language for real world projects that would make error handling trivial.


Trivial? No. But you can at least make it better than the horrible nightmare described in the OP. Monadic error handling in a strongly typed language gives you a huge leg up on safely and sanely managing the complexity of error handling, because it provides simple value-level error semantics and clear type-level indications of exactly what kinds of errors you have to deal with and how you have to deal with them.


For me, error handling has a major flaw: stack unwinding - an extremely annoying thing to happen when the program state took many, many hours to achieve. I don't think there is any language other than CL that allows restarts etc. to be defined; the SLIME REPL too is invaluable when debugging.

http://www.gigamonkeys.com/book/beyond-exception-handling-co...


Right! A long time back we came up with a really nice way of validating CSV files using restarts. I wrote a bit about it: http://lisper.in/restarts


Nice. One thing you might want to add. From reading you present two ways to deal with restarts:

1) interactive debugger

2) programmatical

There is another one:

3) restart dialog

The program presents you a list of restarts, for example in a GUI dialog, and the end user can select a restart - without interacting with a debugger.

The debugger is just one program, which may display the restarts.

That's how one used it in applications on a Lisp Machine. To call a debugger could be an option in the list of restarts. For real end users, even the call to the debugger might not be available and all they can do is choose an option from the list of restarts. Symbolics offered something called 'Firewall', which did all it could to hide the underlying Lisp system from the end user - here the end user should not interact with a debugger or Lisp listener.

But even in a Lisp listener, if you used the 'Copy File' command you might get a dialog shown with the typical options: abort, try again, use other file, enter debugger, ...


Nice idea! I will add this suggestion.


Neat. I use it for things like line search and regularization (in optimization):

https://github.com/matlisp/matlisp-optimization/blob/master/...

As a fellow Indian lisper, are you by any chance using CL for work?

Last I heard, the only big CL shop, Cleartrip, moved their whole codebase to OCaml.


Nah, not using CL for work these days :-(

I used to work at Cleartrip a long time back, before they moved off Lisp.


Ah, that's too bad :( CL is very very underrated as a language.

Any insider info on why Cleartrip moved away from Lisp?


Since it happened after I left, I am not privy to the exact reasons for the move. However, I guess they were worried about their Lispers moving away (which to some degree had already happened) and them not being able to find new ones.


Some of this looks like horrible advice, particularly the defeatist attitude towards what the article calls "programmer errors". Statements to the effect that you can never anticipate or handle a logic error sensibly so the only thing you should ever do is crash immediately are hard to take seriously in 2016. What about auto-saving recovery data first? Logging diagnostic information? Restarting essential services in embedded systems with limited interactivity? This article basically dismisses decades of lessons learned in defensive programming with an argument about as sophisticated as "It's too hard, we should all just give up".

As others have already mentioned, much of the rest is quite specific to Node/JS, and many of the issues raised there could alternatively be solved by simply choosing a better programming language and tools. The degree to which JS has overcomplicated some of these issues is mind-boggling.


> What about auto-saving recovery data first?

Basically the argument is that once you reach a logic error (e.g. NullReferenceException, IndexOutOfBounds, etc.) you have already potentially corrupted the application state, so using any part of the application state is dangerous, and saving it to be used once the program has been restarted makes it worse - then you load the corrupted state into your restarted program. So while saving data is prudent, it should be done at regular intervals, so that after a logic/programmer error is detected, the program can reload saved data from before the error occurred, not after.

One can also imagine having nested "top level" handlers for the various contexts, where errors in one type of context are not as serious as in others. Example: in a graphical application, an exception arising from a mistake in UI code does not affect the "document" the user has open, so it might be possible to "handle" this error by simply reinitializing the UI and reloading the active document (since we know the active document). An exception due to a logic error thrown during a transaction on the document, on the other hand, should probably be considered corrupting, so the application must try to reload some document state from earlier instead. If there is no such state, then the correct thing to do is to tear down the application even if it means losing the document. It's better to lose the work and let the user start over than to allow the user to continue working with data they aren't aware is corrupt.
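A rough sketch of such nested contexts (the `app` object and its fields are invented for illustration):

```javascript
// Sketch of nested "top level" handlers: a UI-layer failure only resets
// the UI; a document-transaction failure rolls back to a snapshot taken
// *before* the error, never the possibly-corrupt post-error state.
function runUiAction(app, action) {
  try {
    action();
  } catch (err) {
    console.error('UI error, reinitializing view:', err.message);
    app.view = app.initView(); // document untouched
  }
}

function runTransaction(app, mutate) {
  const snapshot = JSON.parse(JSON.stringify(app.document)); // pre-error copy
  try {
    mutate(app.document);
  } catch (err) {
    console.error('transaction failed, rolling back:', err.message);
    app.document = snapshot; // discard possibly-corrupt state
  }
}
```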


The "let it crash" philosophy assumes that there is some external system monitoring & restarting the program that crashes. They mention this explicitly in the article, but it's worth repeating: you need this external system anyway. Your program may stop executing for all sorts of reasons other than a bug in your program, from bugs in your dependencies to uncaught errors to infinite loops to cosmic rays to someone tripping over the power cord to an earthquake destroying the entire U.S. west coast. Your distributed system needs to handle these as operational errors, and in extreme cases you might not even have power available for 1000 miles; there is no possible way that a single process could recover from that.

They also recommend configuring Node to dump core on programmer error, which includes (literally) all of the diagnostic information available on the server.


What about auto-saving recovery data first?

It really depends upon the language and environment used. I work with C (almost legacy code at this point), and if the program generates a segfault, there is no way to safely store any data (for all I know, it could have been trying to auto-save recovery data when it happened). About the best I can hope for is that it shows itself during testing but hey, things slip into production (last time that happened in an asynchronous, event-driven C program, the programmer maintaining the code violated an unstated assumption by the initial developer (who was no longer with the company) and program go boom in production). At that point, the program is automatically restarted, and I get to pore through a core dump to figure out the problem.

I'm not a fan of defensive programming as it can hide an obvious bug for a long time (I consider it a Good Thing that the program crashed, otherwise we might have gone months, or even years, without noticing the actual bug).

Logging is an art. Too little, and it's hard to diagnose. Too much and it's hard to slog through. There's also the possibility that you don't log the right information. I've had to go back and amend logging statements when something didn't parse right (okay, what are our customers sending us now? Oh nice! The logs don't show the data that didn't parse---the things you don't think about when coding).

And then there are the monumental screw-ups that no one foresaw the consequences of. Again, at work, we receive messages on service S, which transforms and forwards the request to service T, which queries service E. T also sends continuous queries (a fixed query we aren't charged for [1]) to E to make sure it's up. Someone, somewhere, removed the fixed query from E. When the fixed query to E returned "not found," the code in T was written in such a way that failed to distinguish "not found" from "timed out" (because that fixed query should never have been deleted, right?) and thus, T shut down (because it had nothing to query), which in turn shut down S (because it had nothing to send the data to), which in turn meant many people were called ...

Then there was the routing error which caused our network traffic to be three times higher than expected and misrouted UDP replies ...

Error handling and reporting is hard. Maybe not cache-invalidation-and-naming-things hard, but hard nonetheless.

[1] Enterprise system here.


> I'm not a fan of defensive programming as it can hide an obvious bug for a long time (I consider it a Good Thing that the program crashed, otherwise we might have gone months, or even years, without noticing the actual bug).

Not when you do it the right way! You should only mitigate unexpected situations if you also log them, monitor them, and handle them with error callbacks, etc.

Also see my other comment in this thread : https://news.ycombinator.com/item?id=12871541


> and if the program generates a segfault, there is no way to safely store any data

FWIW Inkscape tries to save the current document to (IIRC) the user's home directory, displays a message to tell the user about it and quits.


> I'm not a fan of defensive programming as it can hide an obvious bug for a long time (I consider it a Good Thing that the program crashed, otherwise we might have gone months, or even years, without noticing the actual bug).

I've had segfaults "hidden" for a long time because my artist coworkers weren't reporting crashes in their tools. They assumed a 5 minute fix was something really complicated. Non-defensive programming is no panacea here. Worse, non-defensive programming often meant crashes well after the initial problem anyways, when all sane context was lost.

My takeaway here is that I need to automatically collect crashes - and other failures - instead of relying on end users to report the problem. This is entirely compatible with defensive programming - right now I'm looking at sentry.io and its competitors (and what I might consider rolling myself) to hook up as a reporting back end for yet another assertion library (since none of them bother with C++ bindings). On a previous codebase, we had an assert-ish macro:

  ..._CHECKFAIL( precondition, description, onPreconditionFailed );
Which let code like this (to invent a very bad example) not fatally crash:

  ..._CHECKFAIL( texture, "Corrupt or missing texture - failed to load [" << texturePath << "]", return PlaceholderTexture() );
  return texture;
Instead of giving me a crash deep in my rendering pipeline minutes after loading, with no context as to what texture might be missing. Make it as annoying as a crash in your internal builds and it will be triaged as a crash. Or even more severely, possibly, if simply hitting the assert automatically opens a bug in your DB and assigns your leads/managers to triage it and CCs QA, whoever committed last, and everyone who reviewed the last commit ;)

> Logging is an art.

You're right, and it's hard. However. It's very easy to do better than not logging at all.

And I think something similar applies to defensive programming. You want null to crash your program? Do so explicitly, maybe with an error message describing what assumption was violated, preferably in release too instead of adding a possible security vulnerability to your codebase: http://blog.llvm.org/2011/05/what-every-c-programmer-should-... . Basically, always enabled fatal asserts.

This might even be a bit easier than logging - it's hard to pack too much information into a fatal assert. After all, there's only going to be one of them per run.
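An always-enabled fatal assert can be sketched in a few lines (the exact abort policy is the part each project would tune):

```javascript
// Sketch of an always-on fatal assert: fail loudly at the violated
// assumption instead of crashing later with no context.
function check(condition, message) {
  if (!condition) {
    const err = new Error(`assertion failed: ${message}`);
    console.error(err.stack);
    throw err; // in a real program: abort/exit, even in release builds
  }
}
```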


Please, please, don't roll your own. It seems like an easy problem at a glance, but it's far from it. The more fragmentation in these communities, the worse off we all are. Sentry's totally open source, and we have generous free tiers on the hosted platform. Happy to talk more about this in detail, but if there are things you don't feel are being solved, let us know.


> Please, please, don't roll your own. It seems like an easy problem at a glance, but its far from it. The more fragmentation in these communities the worse off we all are.

I've rolled my own before, for enough of the pieces involved here, to confirm you're entirely correct. There's a reason I'm looking at your tech ;)

> Happy to talk more about this in detail, but if there's things you dont feel are being solved let us know.

No mature/official C or C++ SDK. Built in support for native Windows and Android callstacks would be great - I see you've already done some work for handling OS X symbols inside the Cocoa bindings at least. Plus hooks to let me integrate my own callstack collection for other platforms you haven't signed the NDAs for (e.g. consoles) and whatever scripting languages we've embedded.

All the edge cases. I want to receive events:

* When my event reports a bug in my connection loss handling logic (requiring resending it later when the connection is restored.)

* When my event reports I've run out of file handles (requiring preopening files or thoroughly testing the error handling.)

* When I run out of memory (requiring preallocating - and probably reserving some memory to free in case writing a file or socket tries to allocate...)

* When I've detected memory corruption.

* When I've detected a deadlock.

Some of these will be project specific - because it's such an impossibly broad topic that sentry's SDKs can't possibly handle them all.

No hard crash collection - this might be considered outside of sentry.io's scope, though? It's also hideously platform specific, to the point where some of the tools will be covered by console NDAs again. Even on Windows it's fiddly as heck - I've seen the entire pipeline of configuring registry keys to save .mdmp files, using scripts to run ngen to create symbols for the unique-per-machine mscorlib.ni.dll and company - so you can resolve crashdumps with mixed C++/C# callstacks - and then using cdb to resolve the same callstack in multiple ways... it's a mess. I could still use the JSON API to report crash summaries, though.

On a less negative note, I see breadcrumbs support landed in unstable for the C# SDK.

EDIT: And then there's all the fiddly nice-to-haves, ease-of-use shortcuts, local error reporting, etc. - some of which will also be project specific - but rest assured, the last thing I want to do is retread the same ground that sentry.io already covers. And where there are gaps, pull requests are one of the easier options...


At work, we regard exception collecting as essential for both development and production - if an application reaches internal QA, it's already reporting to an exception collector. This is separate to whatever logging is going on.

Sentry.io is one of the services that we use, but I don't have any connection beyond being a customer. I would echo the sentiment about not rolling your own, though: you want your exception collector to be a thoroughly battle-tested bit of code, and if it's reporting to a remote service, you want that to be as separate as possible from the application infrastructure, and extremely reliable.


The main conclusion I drew from this is that node.js has three "standard" ways to return/propagate an error, along with "traditional" methods (return code, global errno, etc).

What's the deal? To someone who programs primarily in C and Ruby, this feels like a tremendous complication of the normal programming process.


Part of the problem in JS is that there's pretty much 2 classes of functions. Asynchronous functions and synchronous functions. Both are extremely common.

Async/await solves this to some extent, because you can just go back to 1 way of error handling, which is throwing and catching exceptions.

The third way (working with EventEmitter) is an odd pattern, but it's really more for specialized use-cases. Wouldn't really call this standard. Imagine a long-running operation that can occasionally broadcast that a non-fatal error occurred.

A global error number is a terrible idea, and return codes are just not idiomatic.

So really there's just two: one for synchronous and one for asynchronous operations.

You'd be in a very similar situation with C. I don't know C too well, but I imagine that most asynchronous operations would be done with threads, and for those operations you also can't just return an error code.

Does Ruby have concurrency or async primitives? I don't know it really well. If it doesn't, it's also obvious why you wouldn't have this problem. If it does, how do you handle exceptions in asynchronous operations? To me it seems that Javascript, Ruby, C, PHP, Java are all pretty similar in these regards and JS is not at all unique.

Go gets this right. The equivalent of this ES7 function call in javascript:

await foo();

In go is a straight up regular function call:

foo()

But not waiting for the result in javascript:

foo();

Is actually handled with the go keyword:

go foo()

This, to me, is the major difference in the asynchronous model between Go and Javascript. In javascript (with ES7) blocking is opt-in, in Go it's opt-out. Go is by far the saner model for a programming language that relies heavily on 'green threads' / reactor pattern.


The trouble with "go foo()" is that it's fire-and-forget; foo's return value is literally discarded. When you need to know what happened (which should be nearly always), foo and every caller all have to opt-in to passing any result and/or error and/or panic value over a channel or something. It's one of many places where Go gives you tiny pieces of the right thing and makes you assemble them yourself.


Either that or you wrap it up in a function that makes a channel, calls the function with it, then waits on that channel for the return value. Basically you can go back/forth between async and sync(ish) in Go much more easily than in JavaScript.

In saying that though, if you have to do it a lot it probably means some of those functions should have been synchronous in the first place.


Node runs all your Javascript on a single thread so it strongly discourages writing f(g(x)) if g is slow or expensive. Instead you write g(x, f) in continuation-passing style so the framework can start g (send a request or whatever), give up control, and do something useful when the result is available (a response arrives or whatever). But if g fails, either f has to expect to receive an error object that came from g, or you need some glue that checks that g succeeded before invoking f.

Eventually Javascript will probably let you write f(await g(x)) and transform one async function into a chain of Promises and continuation functions (await will throw if g fails), but it's not yet a standard part of the language and not everyone wants to preprocess this experimental dialect into something Node can run today.
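
As a concrete sketch of that glue (with a toy `g` standing in for the slow operation):

```javascript
// Continuation-passing style: g takes an error-first callback instead of returning.
function g(x, cb) {
  setImmediate(() => {
    if (typeof x !== 'number') return cb(new TypeError('x must be a number'));
    cb(null, x * 2);
  });
}

function f(result) {
  return result + 1;
}

// The glue: check that g succeeded before invoking f.
g(20, (err, value) => {
  if (err) return console.error('g failed:', err.message);
  console.log(f(value)); // 41
});
```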


I think the main confusion is the callback thing, as you can either feed in a function to get the returned result, or emit an event and use another function somewhere else to receive that result. JavaScript was designed for the browser (GUI), which means its main job was to handle user actions. So the event system was invented to decouple the "triggers" from the "actions". You can implement exactly the same paradigm in any other language (a lot of GUI SDKs actually have an equivalent facility). And if you treat the callback as the return in other languages, then everything becomes clear.

The conclusion is: 1. Throw an exception (which will stop the program) if it's a programming error. 2. Handle it in place using a callback, just like you would return an error code in another language, if it's an operational error (e.g. the user didn't enter a password when logging in). Emitting an error event is really a special case of the callback, for when you want to handle the error somewhere else (maybe globally).

So I think the article is really about when to "throw an exception" vs "return the error", once you take out the JavaScript-specific juice.


Why the suggestion to use an error's name rather than instanceof and the error's class?


Error.prototype.toString() reads e.name, not e.constructor.name, so you can't rely on everyone to have subclassed Error.

https://developer.mozilla.org/en-US/docs/Web/JavaScript/Refe...


I'm not following, why can't I use:

   e instanceof Error
or:

   e instanceof MyError
why does toString() have anything to do with this?


It's very likely someone did

  const e = new Error('bad stuff happened')
  e.name = 'MyError'
without actually creating a MyError class to check with instanceof.


Unless it's common in popular libraries/packages, I don't see why I need to take it into account.

Which popular libraries do this?

If it's just in a few places, it should be handled specifically, and use sane choices in other places.


This is what I do when working with my own Error types.
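
For what it's worth, a subclass that satisfies both checks might look like this (a sketch, not from the article):

```javascript
// Works with both `e instanceof MyError` and `e.name === 'MyError'`,
// and keeps Error.prototype.toString() informative.
class MyError extends Error {
  constructor(message) {
    super(message);
    this.name = 'MyError';
  }
}

const e = new MyError('bad stuff happened');
console.log(e instanceof MyError); // true
console.log(e.name);               // "MyError"
console.log(String(e));            // "MyError: bad stuff happened"
```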


It beats me why Node.js is anywhere near as popular as Elixir if real concurrency and error handling are a priority. Is programming just a fashion industry? What's popular certainly doesn't seem to have any connection with engineering principles.


I blame non-technical managers who push "microservices" and "node js" because they went to some conference and heard that it's "the best".


If you're going to check every argument's type and throw on failure, either use a statically typed language or adopt a concise way of type checking. Many of the examples have big groups of assert() calls at the top. Gross.


Can anyone explain why this pattern doesn't work? Or point me to some resource?

  function myApiFunc(callback)
  {
    /*
     * This pattern does NOT work!
     */
    try {
      doSomeAsynchronousOperation(function (err) {
        if (err)
          throw (err);
        /* continue as normal */
      });
    } catch (ex) {
      callback(ex);
    }
  }


Try/catch is not async-aware, and exceptions do not bubble up through the async context, which makes sense, as the caller has already moved on with execution. Try this in your console:

    try {
        console.log("see, ");
        setTimeout(() => {throw new Error("oops")}, 100);
        console.log(
            "I can't assume here that"
            + " the prev. line succeeded"
        );
    } catch(e) {
        console.log("error!");
    }
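
A working variant of the question's pattern keeps the error handling inside the async callback, where the caller's callback is still reachable (the stub `doSomeAsynchronousOperation` is made up here so the sketch runs):

```javascript
// Stub standing in for the question's async operation.
function doSomeAsynchronousOperation(cb) {
  setImmediate(() => cb(new Error('operation failed')));
}

function myApiFunc(callback) {
  doSomeAsynchronousOperation(function (err) {
    if (err) return callback(err); // handle it here, not via try/catch outside
    /* continue as normal */
    callback(null);
  });
}

myApiFunc((err) => {
  if (err) console.log('caught:', err.message); // "caught: operation failed"
});
```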



much obliged


Pushing the party line of callback hell as a high-quality "Production Practice" without introducing the user to the concept of Promises is an incredible disservice. The article already assumes a basic knowledge of exception handling, so at the very least it should hint at the saner choice.


I'm surprised such an in-depth article doesn't even mention promises. Upcoming async/await (already available via transpilation) will make error handling in Node sane again.


This article promotes the fail-fast approach, something I very much dislike (against popular opinion it seems).

I'm very much in favor of the opposite approach, defensive coding. Often when I read opinion pieces about how bad defensive coding is, they almost always seem to forget that defensive coding without proper logging, error-handling and monitoring is NOT defensive coding. It is extremely dangerous to just detect error conditions without any feedback: you have no idea what is going on in your system!

IMHO properly applied defensive coding, works as follows:

* Detect inconsistent situations (e.g. in a method, expected an object as input argument, but got a null)

* Log this as an error and provide feedback to the caller of the method that the operation failed (e.g. through an error callback).

* The caller can then do anything to recover, (e.g. reset a state, or move to some sort of error state, close a file or connection, etc.).

* The caller should then also provide feedback to its caller, etc. etc.
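
A minimal Node-style sketch of those steps (the logger and names are invented for illustration):

```javascript
function loadUser(db, userId, callback) {
  // Detect the inconsistent situation instead of letting it blow up later.
  if (!db || typeof userId !== 'string') {
    const err = new TypeError('loadUser: expected a db handle and a string userId');
    console.error(err.message); // log with semantic context...
    return callback(err);       // ...and report failure to the caller
  }
  db.get(userId, (err, user) => {
    if (err) {
      console.error('loadUser: lookup failed:', err.message);
      return callback(err);     // caller can recover: retry, error state, etc.
    }
    callback(null, user);
  });
}
```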

This programming methodology gives the following advantages:

* You are made to think about the different problems that can occur and how you should recover them (or not)

* Highly semantic feedback about what is going wrong when an issue occurs; this makes it very easy to pinpoint issues and fix them

* Server application keeps on running to handle other requests, or can be gracefully shut down.

* Client side application UIs don’t break, user is kept in the loop about what is happening

Of course you will need to keep a safety net to catch uncaught exceptions, properly logging and monitoring them (and restart your application if relevant)

The fail-fast approach, as I have seen it applied, doesn’t do any checking or mitigation, with the effect that:

- you are thrown out of you normal execution path, losing a lot of context to do any mitigation (close a file, close a connection, tell a caller something went wrong)

- you only get a stack trace from which it can be hard to figure out what went wrong

- there can be a big impact on user experience: UIs can stop working, servers stop responding (for all users).

I have very good experiences with using the defensive coding paradigm, but it takes more work to do it right; for many, especially in communities that use dynamic typing, such as the JS community, this seems to be too big a hurdle to take. This is unfortunate, because IMO it could greatly improve software quality.

Any feedback is welcome!

(Edit: formatting to improve readability) (Edit: clarified defensive coding as an opposite approach to fail-fast)


FWIW, I wouldn't suggest the term "defensive coding" as the opposite to "fail fast". It's very similar to the established term "defensive programming", which IMHO is more about designing systems to make fewer assumptions. How you then handle a situation where you do detect that some expectation has not been met, including the fail-fast strategy, seems like a related but separate issue.

Terminology aside, though, I agree with much of what you say. The idea that it's generally acceptable for buggy code to just crash out seems to be making an unwelcome return recently, often among the same kinds of developers who don't like big design up front or formal software architecture because they want everything to be done incrementally and organically, and in the case of web apps specifically, often among developers who also consider code that runs for a year or two to be long-lived anyway.


With defensive coding I indeed meant defensive programming. You always want to fail fast (faster also means that you can fix more bugs), but this is often interpreted as "fail hard": no prevention or mitigation whatsoever. In this sense I meant it is the opposite of defensive programming.

What I notice is that developers who also have a background in statically typed (system) languages, are much more disciplined when it comes to defensive programming and logging/error handling. (I'm afraid this also correlates with age).

BTW, I like your description, "designing systems to make fewer assumptions", for defensive programming!


On error goto...


This isn't stuff every programmer should know, it only concerns people who are trying to write complex non-blocking Javascript without async/await (which are already implemented in Babel and proposed for ES7). It also focuses on Node-only idioms which IMHO should be deprecated in favor of ES6 Promises (which Node's LTS release supports natively!)


The title of the post is just "error handling in node.js".

Probably should change the submission title.


if you're not writing complex non-blocking Javascript in 2016, what are you even doing with your life? /s


This is a joke


I'm not really sure how to phrase this constructively, but this is horrible. Not the article, just the fact that humans expose themselves to this sort of stuff. Why would you choose to use a language that makes something as mundane as error handling this ridiculous and unpleasant?


There is nothing mundane about error handling, in fact it's one of the hardest things to get right in a programming language (see Rust's error handling saga for instance).

There is no language I know of where error handling is both simple and not overbearing.


The most annoying thing about all this is that the central argument of this article "Separate recoverable errors from bugs" never made it to a widely used imperative language. C# had the opportunity but blew it.

Java mixed the two kinds of exceptions up completely and checked exceptions just added insult to that injury.

The best implementation I have seen for an imperative language is in Midori (the language used in Microsoft's research OS of the same name).

http://joeduffyblog.com/2016/02/07/the-error-model/#bugs-are...

It's basically "C# done right". The blog post is well worth reading.


That blog post is indeed very interesting reading.



