Great work Louis! These essays should become a little book some day.
> This is called intentional programming.
I am stealing that phrase. Often, when trying to explain the coding style in Erlang, I end up describing what you don't do ("don't handle errors in-band"), but this turns it into a positive prescription -- do what the intent of the code is.
And note that these things are very easy and straightforward in Erlang. Erlang is one of the few language runtimes built with this in mind. Fault tolerance was at the top of the todo list. That is what makes it stand out from the crowd.
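A minimal sketch of the primitive everything else is built on (watch/1 and the worker fun are illustrative, not a library API): link to a worker, trap exits, and restart it when it dies.

    watch(WorkerFun) ->
        process_flag(trap_exit, true),
        Pid = spawn_link(WorkerFun),
        receive
            {'EXIT', Pid, normal} -> ok;                  %% finished cleanly
            {'EXIT', Pid, _Reason} -> watch(WorkerFun)    %% crashed: just start it again
        end.

In real code you would reach for an OTP supervisor rather than hand-rolling this, but the building blocks are just links, exit signals and a receive.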
But if you want, you can still copy this pattern in your system. For example, in Python, use a green thread and a queue to emulate an actor. If an exception is thrown (and you don't use linked custom C modules, which will screw you over in Erlang as well!), signal a supervisor thread and let it restart the original thread.
You can apply this to a large system. Use OS processes. This is the good ol' Unix way. Build watchdogs that watch your processes and restart them on failure. But now you'll also be building the messaging system, so there is some work you must do. But you can if you want to.
You know how they say "learn functional programming because it will help you program better"? Well, the same can be said about actor-based and fault-tolerant programming. Learn it because it will help you program better, even if you don't end up using Erlang.
You, the author, and Joe Armstrong may confuse some of us. I know "intentional programming" from Charles Simonyi[1] and I suspect the two uses are completely unrelated.
On the other hand, this use is more useful and in my opinion, better.
The OP's concept of "data flow" is also unrelated to how that term is usually used. (I think he is using it to refer to pattern matching when the match can fail.)
> Build watch dogs that watch your processes and restart them on failure.
I've tried to think about watchdogs and processes before and concluded that this isn't easy: it only works well if your process is an event loop. Otherwise, it's quite difficult to know whether a multi-threaded process is working or has failed.
Excellent writing; it really makes me want to pick up some Erlang at some point. It's also giving me some interesting ideas in regards to the language I'm currently working on.
I'll also echo a sentiment brought up by another poster, which is that the Go style of error handling seems really unattractive. I'm not sure I'm totally on board with the Erlang style of "just let it die" (although it seems to work great for Erlang), but being forced to deal with errors right away in all cases seems like it would clutter your code and cause a lot of headaches. I'm also curious what happens if you neglect to handle the error. If you wrote something like
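    file, err := os.Open("file.go")
    return file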
What would happen if an error occurred? Would `file` be nil? If so, wouldn't that easily propagate up, even to functions which, on their surface, shouldn't be expected to fail?
First of all, that wouldn't work in Go; it would complain that you don't use the err variable. You could change it to:
file, _ := os.Open("file.go")
The underscore tells the compiler to ignore the error. And yes, file is <nil> now!
As a sidenote, it would be more idiomatic to do this:
if file, err := os.Open("file.go"); err != nil {
    // handle error
} else {
    // do stuff with file
}
This has bitten me before - if you use a non-existent key on a map of strings (or dictionary, or hash, however you call it) you get "" and a false ok value back.
my_hash := make(map[string]string)
value, ok := my_hash["no_key"]
value is "", ok is 'false'.
What value you get back from a missing key depends on the map's value type, but you always get the ok boolean back! Maps storing ints return 0, maps storing strings return "", etc.
Go always forces you to look at each error, since errors are always different.
Erlang's "let it die" philosophy is surrounded by other support structures that make it work. It works better in that language than it would in most others. I do find that my production-quality Erlang code does somewhat often have to catch errors so it can properly log them, though the relatively recent addition of line numbers is a big help there. (Prior to that, you could end up with a "badmatch" identified only by what function it occurred in. This is not necessarily a problem, but when you see dozens of them per minute filling up your log it made it tricky to figure out what was really going wrong so you could fix it. Even with line numbers it's not always quite enough, though.)
In Go, for that code snippet you'd be looking at one of two possibilities, taking it a bit more generally than just the specific Open call, since I think this is a philosophical question rather than a literal "what does this code do?" question. Either "file" would come back nil, in which case the code using it would crash with a nil access, or you'd get something that would fail to work correctly. Probably the zero value for the type, but only probably; it would be syntactically legal for the function to half-construct a value and end up returning it to you, expecting you to pick up the err value and know not to use it. I actually don't know what the common practice is here... obviously, when I put it that way, returning a half-constructed object is not a good idea, but I don't know in practice whether many things do that, because you always should deal with the error.

There is at least one call in the shipped library that can return both an error and a result, which is the Reader interface's Read call (http://golang.org/pkg/io/#Reader). It is legal (though not mandatory) to return both that some bytes were read and that we've hit EOF. (It is mandatory that the next call will return only EOF as the error in that case.)
Erlang and Go are at opposite ends of the spectrum on error handling, and I think it's easy for either side to look at the other and think it's crazy, but I also think they're both solid ways of operating; in practice I find I produce similar levels of reliability with similar amounts of cognitive effort in both regimes. It's one of those cases where the extremes are viable, and it's the options in the middle that suck. In Erlang, you don't worry much about errors, the system is built around that, and the fact that "=" is not assignment but pattern-match-and-bind makes:
{ok, Value} = complicated_function()
into an efficient assertion that the function correctly completed and the correct thing is in Value xor I crashed. It makes a naive (in the computer science sense, not the insulting one), "sweep it under the rug" approach to errors do what it should. This is a huge positive... making simple code that anyone can write do the right thing is a very powerful approach, and failing to consider this is one of the Great Software Engineering Sins, in my opinion.
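A concrete instance of the same assertion, assuming a config file that simply must be there:

    %% file:open/2 returns {error, enoent} for a missing file, so the match fails
    %% and the process crashes with {badmatch, {error, enoent}}.
    {ok, Fd} = file:open("app.config", [read]).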
On the Go side, it mostly sticks errors in your face and makes damn sure you do something sensible with them. It's not perfect (it's possible to ignore an err... "someday" I'd like to build a go vet extension to help check for this), but it mostly doesn't let you ignore errors and write idiomatic code. Sure, it's more code than it needs to be, but, then, Erlang is not exactly a master of concision either (the number of multi-ugly-line Erlang functions that I could collapse down to one line of Haskell, sigh). This also works.
Partisans can have fruitful debates about the pros and cons of each. What doesn't work is making it so that naively-written code ends up completely ignoring errors and just skipping over dealing with them. That said, honestly, nowadays I can only really think of C that still works that way. Most everything else has some solution to this problem (exceptions, mostly).
I have imagined (for my toy language) that instead of forcing one way or the other (always handle or always check), it could use the Unix idea of STDOUT/STDERR.
A function returns values on STDOUT, and exceptions always go to STDERR.
This is Go style:
file | maybeError? := open('thisnotexist.ever')
And it's totally OK to say, Python style:
file := open('thisnotexist.ever')
and higher up the chain it's possible to catch it:
match MyAwesomeFunction():
    StdIn:
    StdErr:
Or maybe try/catch.
The idea is that if STDERR is consumed, the error is handled right there, but if not, it "throws" and works like an exception.
Coding towards a single flow path and crashing whenever you go off that path has been the hardest and most enjoyable adjustment I've made as I've been learning the Erlang VM (through Elixir). It's definitely weird at first, but very refreshing when you get to leave out all those if/elses and exception handling. I'm still quite bad at Erlang/Elixir, but I'm enjoying it a lot more.
I find myself writing libraries (like xmerlrpc for XML-RPC). The rule of "just let it die" doesn't apply too much in such code.
You typically don't want to decide to die in library code. You want to postpone the decision until the actual application/service is written.
It's easy to convert an error reported by return value into an error reported by exception: just cause a badmatch or a failing guard. But converting an exception into a value is more troublesome; you need to use try..catch.
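A minimal sketch of both directions (module and function names are illustrative):

    -module(convert_sketch).
    -export([must_read/1, safe_read/1]).

    %% Value -> exception: if file:read_file/1 returns {error, Reason}, the match
    %% fails and raises error:{badmatch, {error, Reason}}.
    must_read(Path) ->
        {ok, Bin} = file:read_file(Path),
        Bin.

    %% Exception -> value: this direction needs try..catch.
    safe_read(Path) ->
        try {ok, must_read(Path)}
        catch
            error:{badmatch, {error, Reason}} -> {error, Reason}
        end.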
To properly catch errors in a library, one can either use a global try..catch hidden in the library code, or be much more precise about errors and carefully intercept all the errors that could happen. The latter approach gives an opportunity to produce very specific error messages and allows hypothetical bugs in code used by the library to bubble up instead of being disguised as "invalid argument".
As the article says, there are cases when the rule of thumb needs to be broken. Writing a library is such a case.
The idiomatic Erlang way to deal with such problems is to generate an exception so that the caller can deal with the problem at the next level up that is set up to handle trouble. This could be the level immediately above the library, but conceivably it is much higher up, or the (Erlang) process may in fact terminate due to an uncaught exception.
The decision to bail or try to recover should be made at the lowest level capable of making that decision and libraries are simply not empowered to make that decision. You also don't want to force people to catch exceptions around every call to every function in the library.
Is it? A lot of the library code I've seen in Erlang (especially in the standard library) is written to avoid exceptions, and instead returns either {ok, Val} or {error, Reason}.
Edit: To clarify tone, I actually would like to know, because I've run into this exact dilemma before. Per the original poster, it seems bad form to throw an exception that has to be handled somewhere in the calling process if I want to be able to send that data to another process. That is, process A makes a library call, cares nothing about the return, just wants to pass it to process B. If it's an error tuple, it can just do that. If it's an exception, it has to explicitly handle it, wrap it in an error tuple, and pass *that* along instead, which seems inelegant from library code. It also prevents the library from being able to declare what it returns in the event of a problem via a Dialyzer spec; with an error tuple you can fully enumerate what sorts of errors you can return.
Sometimes Erlang standard library code is not the prototypical Erlang development code. For example, file:open/2 is a library function returning an error tuple, since an error is one of the obvious results it can produce.
Now in your code it depends on what you expect or what your guarantees are. If this is code that expects to open a config file that is always there, well, then a match is better: if it doesn't work, an exception is thrown, a report is written, and maybe your process gets restarted.
But now imagine you are writing a configuration file parser. Now your code acts like a library, so the input file not being there is a common case, and maybe the code above needs to decide whether it should blow up on it or not.
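Something like this sketch, with my_config as an illustrative module name:

    -module(my_config).
    -export([read/1]).

    %% For a config parser the missing file is a common case, so report it as a
    %% value and let the caller decide whether it is fatal.
    -spec read(file:filename()) -> {ok, [string()]} | {error, term()}.
    read(Path) ->
        case file:read_file(Path) of
            {ok, Bin}       -> {ok, string:lexemes(binary_to_list(Bin), "\n")};
            {error, Reason} -> {error, Reason}   %% e.g. enoent
        end.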
Anyway that is my amateurish understanding of it. Maybe someone who knows more can chime in and correct me.
On a tangent, one thing I dislike about Go is the error handling. I find it really annoying to have to check for errors at every single step. Erlang has probably the most elegant error handling mechanism: just let it fail, and then restart the whole process/subsystem.
You're going to have a hard time maintaining consistent state, but if you manage your bookkeeping very well that might work in a long-lived process. If not, you might end up leaking resources. Goroutines and Erlang processes do not map 1:1 onto each other, and I (maybe naively) assume that resources such as file descriptors and other subtle state modifications could easily survive a naive implementation of such a scheme, causing eventual resource depletion.
That is true, and I suppose the fact that third party go libraries won't be written with this philosophy in mind means that you would have to audit all third party library use pretty carefully, which would make using this approach on projects of significant size untenable.
Defers are still called when the stack is unwound due to a panic. It's standard practice to clean up resources with a defer, so maybe it wouldn't be too unreasonable if you're mainly dealing with the standard libraries.
Would you bet your phone switchboard, nuclear plant or assembly line to that strategy?
Go is very much a work in development, it is a vast improvement over C but I highly doubt that this strategy will get you out of every corner case. It might get you from 'crash now' to 'crash a (little) while later' but I think you will still end up having an unpredictable element in there. It all depends on how ugly the crash is and once things become unpredictable leakage (even between go-routines) is not to be ruled out categorically.
Think of Erlang processes as being about as well isolated as Unix processes, and of goroutines as a bit more isolated than Unix threads, but not much more. The trick here is that Erlang is essentially an OS inside a process, while goroutines are cooperative multitasking inside a process, aided by one or more CPU threads. That's a lot closer to the C multi-threading model than Erlang, and that implies there are some risks.
It's worth noting that Erlang is only a soft-realtime language. I wouldn't bet my nuclear plant on it, and only some assembly lines. (There's plenty I do bet on Erlang, though.)
Soft is the new hard ;-) If you reason about hard realtime, you pretty soon end up with the question: how high can the probability be that it misses a deadline? If you run on modern embedded CPUs with pipelines and caches, that's often all you can do. (I am aware that some people model the CPU with caches, memory and everything to achieve 100%, but that's very expensive, and if you switch CPUs there is half a year's worth of modeling down the drain.)
Real-world hard realtime systems are often built like this: let's test it (with a deadline miss triggering a failure), and if it works, make sure we still have a 10% safety margin (which will later be melted away by features ;-)
That's valid; hard realtime problems require guarantees that Erlang can't give, and that was an unwarranted exaggeration on my part. But anything that needs to be very long-running (years or more) will need ironclad guarantees that it won't be leaking resources, and I can't prove but suspect that Erlang will do a lot better than Go for those applications. Let's leave the nuclear plants and the assembly lines to RTOSes and QNX then :).
"require guarantees that erlang can't give" ... at the moment, working on it.
In the meantime note that the percentage of stuff that really needs hard realtime in many systems is quite low. What I do at the moment is run Erlang on RTEMS (see my other posts) keep the hard realtime parts as simple as possible (always a good idea), write them in C and run them on a higher prio than the Erlang runtime which handles all the complicated things. Works like a breeze in practice
I have a project planned which, when funded, will give us hard realtime guarantees for certain Erlang processes.
This can only work, of course, if the underlying OS is also hard-realtime capable. I have already ported Erlang to the open-source RTEMS (http://www.rtems.org). More details at http://www.grisp.org (sorry, the website is not up to date, but it soon will be).
It's an interesting question. I think you'd need at least:
* a way to define scheduling requirements on a process level (beyond current, coarse, priority settings),
* bounded process message queues (this would be handy in soft-realtime Erlang),
* deadline accurate receives,
* accurate control over resource allocation
I've been meaning to sit down with some theory and think through this more analytically, but the above are my BART-thinking-time-derived set of requirements. I'm sure there are cases--some of which would be obvious, in retrospect--that are not covered by the above.
You'd have to run Erlang directly on the hardware without an intermediary OS, and you'd have to schedule the individual Erlang processes using pre-emption and by giving them priorities. You'd also have to add a ready-list per priority.
After all, as long as BEAM is running as a child process of a host OS that is not hard realtime, you can't guarantee much of anything.
Of course. Apologies, I assumed that was a given. A large part of the appeal of Erlang the Language is Erlang the VM and the two aren't often divorced.
You'd also need to assert every time you called a function to make sure you got what you expected. I know basically nothing about Go, but I found this, which says that it doesn't even have assertions:
Great article, very much appreciate this guy and his blog as I continue to learn Erlang.
A question: the pattern of let it crash makes sense to me. However, I struggle with it when implementing RESTful web services. Letting the process crash will typically yield a 500 - the monitor on the connection process in the web server library ensures that. Clearly though, a status code and some additional information is a more appropriate response to the client. An example is returning a 400 for "missing arguments". In a contrived example of idiomatic Erlang, I feel like I'd write this:
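    %% Sketch of the contrived handler; request:get_argument/2, db:find/3 and
    %% respond/2 are the illustrative names used elsewhere in this thread.
    handle_request(Request) ->
        Username = request:get_argument(Request, username),
        Password = request:get_argument(Request, password),
        {ok, UserRec} = db:find(my_db, users, #{<<"username">> => Username}),
        #{<<"password">> := Password} = UserRec,   %% badmatch if the password is wrong
        respond(200, UserRec).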
And if the key username did not exist in Request, request:get_argument/2 would return undefined or {error, Reason}, giving me a badmatch error. Or if the password didn't match, I'd get the same. In order to intercept that and return a reasonable status code, I would have to catch that in handle_request. My question, then, is: am I missing a best practice on how to handle this? Or is this just the place where I do need to catch errors and process them? And if that is so, isn't it at odds with the whole concept of writing intentional code?
1. Use a finite-state-machine REST framework like webmachine or cowboy_rest. This will help you in the long run once you grok how they work.
2. Your intention here is that the user might have done something silly. Write a helper which can load arguments from the request and fail if some of them are missing (a sketch follows below). The best approach is to shuffle as much as possible into a routing layer and then let the routing layer return the 4xx responses. That only leaves optional arguments, where an undefined option is what you want to handle. Look at how, e.g., cowboy is doing this.
You can essentially avoid all of this boilerplate, if you construct your HTTP RESTful API correctly.
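A minimal sketch of the helper from point 2, reusing the hypothetical request:get_argument/2 from the question (all names are illustrative):

    %% Pull the required arguments out of the request; fail loudly on a missing one
    %% so the routing layer can turn the throw into a 400.
    require_args(Request, Keys) ->
        [case request:get_argument(Request, Key) of
             undefined -> throw({missing_argument, Key});
             Value     -> Value
         end || Key <- Keys].

The routing layer then wraps handler calls in a try ... catch throw:{missing_argument, _} and replies 400, so the handler itself stays free of argument boilerplate.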
db:find/3 should probably return either {ok, UserRec} or not_found. Something along the lines of
case db:find(my_db, users, #{<<"username">> => Username}) of
    {ok, #{<<"password">> := Password} = UserRec} -> respond(200, ...);
    {ok, UserRec} -> respond(401, ...);
    not_found -> respond(404) % or something more appropriate
end.
> Or is this just the place where I do need to catch errors and process them? And if that is so, isn't it at odds with the whole concept of writing intentional code?
No it's not. The author touches on this:
Note the word intentional. In some cases, we do expect calls to fail. So we just handle it like everyone else would, but since we can emulate sum-types in Erlang, we can do better than languages with no concept of a sum-type.
So handle errors you expect explicitly and let the rest crash.
case user:authenticate(Username, Password) of
    {ok, UserRecord} -> ...;
    {error, notfound} -> ...;
    {error, badpass} -> ...
end,
Think about the client's point of view. It is much better to return "401 Unauthorized" or some other reason rather than 500.
I would do a case on db:find and proceed to check the password if you get a user, or fail with 401; same with the password check: if it's correct, go on, and if not, 401 :)
These are excellent points, and I have passed them along to my colleagues. There's something that a lot of these essays and tutorials about Erlang kind of gloss over that I'd like to see, though:
Ok, I let it crash. Now what? So many of them seem to stop here and think that everything is great. It isn't: if I have some kind of long-running program, like, say, a web site, I should probably do more than just happily let everything crash as the error propagates its way up the call chain (no, it doesn't do that immediately, but as each thing starts and fails again enough times, it does propagate). For instance, displaying something human readable on a UI, or sending email or something other than logging a difficult to read Erlang error.
More examples of what real world programs do after the crash, please!
Crashing in Erlang is a strategy that lets you handle unexpected errors. In your web server example it corresponds to HTTP 500. Other problems (e.g. 404) you need to handle yourself, not relying on crashing.
Ok, right, but that means that something is catching that crash and returning a 500, and not just crashing the entire web server. I'd like to see more people delve into that aspect of Erlang architecture.
with the term "error kernel", but not a lot of space is dedicated to practical examples, compared to the amount of writing dedicated to how great it is to let things crash.
In fact, it is actually crashing the whole "web server", because the whole "web server" in Erlang is an Erlang process. And you can have millions of them. Every single request is handled by its own "web server" process, spawned on demand for that request. And it's still reasonably fast.
I get what you are trying to say, but: not really. Something like cowboy is an Erlang "application", and cowboy most assuredly does not crash on one bad request: it has a try/catch in order to deal with errors without tearing everything down.
Somewhere in an Erlang system, there has to be a judicious use of try/catch in order to keep things running when some kind of persistent error occurs.
You are technically correct and wrong at the same time.
Cowboy is an application, but cowboy is not the "web server". It is an application that spawns and supervises multiple "web servers". Every one of them can crash, without bringing cowboy down, because that's the way its supervisor is configured. It's called simple_one_for_one strategy.
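For readers who haven't met it, a minimal sketch of such a supervisor (this is not cowboy's actual code; handler_sup and my_handler are illustrative names):

    -module(handler_sup).
    -behaviour(supervisor).
    -export([start_link/0, start_handler/1, init/1]).

    start_link() ->
        supervisor:start_link({local, ?MODULE}, ?MODULE, []).

    %% Called once per request: spawns a fresh, disposable worker under the supervisor.
    start_handler(Args) ->
        supervisor:start_child(?MODULE, [Args]).

    init([]) ->
        SupFlags = #{strategy => simple_one_for_one, intensity => 5, period => 10},
        Child = #{id => handler,
                  start => {my_handler, start_link, []},
                  restart => temporary},   %% a crashed handler is logged, not restarted
        {ok, {SupFlags, [Child]}}.

A crash in one handler takes down only that process; the supervisor and its other children keep running.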
And no, it is not mandatory to have "try/catch". Well, if you mean in the general sense, yes of course. But that is done by OTP or cowboy (I'm not sure about cowboy). What is significant though is that the programmer does not have to deal with that at all. There is a huge difference in the way you write Erlang code and PHP code. And that's the whole point of jlouis' article.
We try not to have "persistent errors", like things that crash thousands of times. We usually crash only in processes that don't live forever. In fact, we try to keep most of our processes disposable.
Also, what he refers to as "kernel" is a bit hard to explain to non-Erlangers. It is usually either a library or a process/server that is strictly deterministic and testable. So we don't expect to have bugs/crashes there, and we almost never do. If we cannot make it this way, then we have several communicating kernels, each of which is again deterministic on its own.
We try to have the "crashing parts" in non-persistent processes, like the HTTP handlers. That is usually the largest part of the code, and there we can safely "let it crash". But, for example, we don't want to have shaky code in the parts dealing with the database. So we have something like:
1. db communicators. Can't crash. Very little code without bugs. This is one kernel
2. data converters. Usually deterministic, with unit tests. Do not crash, but are written in the "intentional style", as jlouis calls it. If this crashes, then we have a bigger problem and should fix the bug.
3. Data dispatchers. For example, a per-user database lock (to a non-ACID data store in our case). This is a process that does not crash (because of problems in its code). A second kernel.
4. HTTP handlers. Can crash and should crash, especially on unexpected events, for example corrupted user input. Written in a very aggressive intentional style.
So what we usually do is in 4. we make sure to crash if something unexpected happens. Then 4. sends a message to 3. like "execute function A (part of 2.) with user's data, here are the arguments, we are pretty sure they are correct". 3. retrieves user data from 1., applies it (with arguments) to function A of 2., saves the result and returns a reply to 4.
Currently we don't shield 3. from errors in 2. If we have a bug in 2. we want to know about it. And especially we don't want 3. to write corrupted data, which is very possible if we catch errors from 2. So if something unexpected happens in 2., data dispatcher (3.) will crash, we see the logs and fix the bug.
And please note. All this is (except for 1.) one user only! Even if we have some persistent problem with that user, others are usually not affected. We fix the user's account and write some edge case code to fix the bug.
Cowboy uses try/catch because even with supervisors, if the child crashes enough, the supervisor will eventually crash too! LYSE talks about using try/catch in the error kernel.
> db communicators. Can't crash. Very little code without bugs. This is one kernel
DB connections most certainly can go down. I do know a thing or two about that:
Great job on the write-up! I have precisely no experience with Erlang but I always enjoy a good programming article. I won't speak to Erlang's philosophies or paradigms, but I would like to respond to the bit about Go's handling of errors.
I recall Rob Pike making an argument that errors are not, or should not be, special things, and so Go doesn't treat them as such. If you don't care about an error and don't want to deal with it, simply ignore it:
image, _ = jpeg.Decode(r)
If you care about an error and need to do something to handle it (a log, a panic, or something more complicated), you can signal that to the compiler.
One advantage of this is you still have access to the returned bit of data, however mangled or incomplete it _might_ be, which can be useful in certain instances.
I'm not sure I consider Go errors "silly" unless all you're doing is a panic every few lines. In actual practice, though, I have found I can pick and choose when I care about an error, and if I do care about it, there's an action specific to that error I want to perform. Hardly silly - cumbersome perhaps.
As for your statement, "On the Go side, it mostly sticks errors in your face and makes damn sure you do something sensible with them ... " I reject the notion that in Go you are forced to deal with errors, but I also reject the notion you shouldn't do something sensible when you are dealing with them. :)
The Erlang practice of letting things fail is also SOP for experienced implementors of any large SOA system. You just let things fail. If some process getting messages from a queue can't talk to the Database, don't retry; just exit and let the supervisor deal with it.
Erlang is great for writing robust software because this type of error handling is a first-class feature of the language and runtime.