Archive for 2009/09/14

Let it crash (the right way…)

I do a lot of useless stuff on the interwebz; Like facebooking, twittering, blogging, redditing etc… but once in a while do some really useful stuff as well E.g.,, /r/programming/ + many other things… and among all this stuff I read a lot of programming articles and blogs. The latest one I’ve come across which is pretty interesting (even though it is basically the same record spinning) is this podcast interview with Joe Armstrong by SE Radio.

I have at several occasions spoken to Joe, even if he probably doesn’t remember, hehe. As they say in Sweden; “Alla känner apan men apan känner ingen” which means “Everyone knows the monkey but the monkey doesn’t know anyone” and I know his speeches/rhetoric and this one was not very different. However, this time he mentioned something that made me remember a very common misunderstanding that I’ve see in the years I have been doing Erlang consulting, namely “Letting everything crash” in the name of “Let it crash”-philosophy. Allow me to elaborate:

Joe mentions the following things which I have learnt and is constantly advising others to follow: “only program the happy case, what the specification says the task is supposed to do”, when talking about “happy case programming”. He goes on saying “when writing code from a specification, the specification says what the code is supposed to do, it does not tell you what you’re supposed to do if the real world situation deviates from the specification” and finally “so what do the programmers do.. they take ad hoc decisions” when talking about what the programmers do when they program defensively in order to solve this problem. And even though Joe sometimes have some strange ideas mostly he makes a lot of sense and the topic of defensive programming and “Let it crash” philosophy did hit the spot.

Personally, when ever I make a branch in my code (not like Git branches, but more like ifs and cases) to handle an “error” or “fault” then I always ask myself one single question, which is what this whole thing boils down to;

Do I know how I’m supposed to handle an error?

The answer is simple; Yes -> Implement, No -> Don’t implement, Let it crash. Well, let me take some of that back… the world is rarely that naive, there are a few things you need to consider but in the end it is all dependent on your spefication. Like Joe said, people take ad hoc decisions and I keep asking myself why? Why do people insist on wrapping many of their function calls in case-clauses, match on error and then return the same error?! I understand it if it says in the specification that this is what’s expected but not otherwise. If the answer to the above question is No then the first follow up question should logically be; “Well, should I know?” and then define it in the specification. The “defensive” part of the code is not that you are matching on the error (because your specification might say that you must!), the “defensive” part is that you are matching on the error because you don’t know what to do.

Here are a few other questions I ask myself when facing a case or if or similar (like a function clause to “handle” all):

  • Do I know how I’m supposed to handle an error here?
  • If not, then should I handle it? (Thus going back to specification)
  • If something goes wrong in this function call, can I continue? E.g. Line 2 depends on a file descriptor from line 1
  • When I restart, can I recover from the crash here? If not then it is a good indication that you need a specification for how to handle the error. Can I reset my state? Do I need to clean up? etc are also good questions to ask yourself
  • Am I crashing in a valid place? E.g. you should never crash (intentionally) inside a gen_server unless it is some kind of a worker process. A “main” process shouldn’t really be allowed to crash in the same sense as a worker process can. A worker process, say for an HTTP request, shouldn’t really be very defensive (perhaps a top-level try-catch to return some useful error)

I’m far from writing truly “happy case programming” but I’m going there, it is unbelievable how many LOC you save just on seeing things more postive (Note to self: Perhaps this applies to real life; More positive attitude == Less unnecessary brain activity :)). An important note to all these fancy philosophies is that you need to design your software to be ready to let stuff crash, if anyone says to you that it is easy then they are lying… the only difference is that Erlang makes it possible which is a huge advantage.

Still today I give myself a virtual slap on the wrist inside my mind when ever I type something like:

handle_call(action, _From, State) ->
    %% ... do something
    {reply, ok, State};
hande_call(Cmd, From, State) ->
    io:format("Unhandled: ~p From: ~p ~n",[Cmd, From]),
    {reply, ok, State}.

Why do I put the “catch-all-calls” clause there? Where would this call come from anyway? And if I get one, why am I not aware of it?!?

Categories: Erlang Tags: ,