Home > Erlang > Let it crash (the right way…)

Let it crash (the right way…)

I do a lot of useless stuff on the interwebz; Like facebooking, twittering, blogging, redditing etc… but once in a while do some really useful stuff as well E.g. trapexit.org, stackoverflow.com, /r/programming/ + many other things… and among all this stuff I read a lot of programming articles and blogs. The latest one I’ve come across which is pretty interesting (even though it is basically the same record spinning) is this podcast interview with Joe Armstrong by SE Radio.

I have at several occasions spoken to Joe, even if he probably doesn’t remember, hehe. As they say in Sweden; “Alla känner apan men apan känner ingen” which means “Everyone knows the monkey but the monkey doesn’t know anyone” and I know his speeches/rhetoric and this one was not very different. However, this time he mentioned something that made me remember a very common misunderstanding that I’ve see in the years I have been doing Erlang consulting, namely “Letting everything crash” in the name of “Let it crash”-philosophy. Allow me to elaborate:

Joe mentions the following things which I have learnt and is constantly advising others to follow: “only program the happy case, what the specification says the task is supposed to do”, when talking about “happy case programming”. He goes on saying “when writing code from a specification, the specification says what the code is supposed to do, it does not tell you what you’re supposed to do if the real world situation deviates from the specification” and finally “so what do the programmers do.. they take ad hoc decisions” when talking about what the programmers do when they program defensively in order to solve this problem. And even though Joe sometimes have some strange ideas mostly he makes a lot of sense and the topic of defensive programming and “Let it crash” philosophy did hit the spot.

Personally, when ever I make a branch in my code (not like Git branches, but more like ifs and cases) to handle an “error” or “fault” then I always ask myself one single question, which is what this whole thing boils down to;

Do I know how I’m supposed to handle an error?

The answer is simple; Yes -> Implement, No -> Don’t implement, Let it crash. Well, let me take some of that back… the world is rarely that naive, there are a few things you need to consider but in the end it is all dependent on your spefication. Like Joe said, people take ad hoc decisions and I keep asking myself why? Why do people insist on wrapping many of their function calls in case-clauses, match on error and then return the same error?! I understand it if it says in the specification that this is what’s expected but not otherwise. If the answer to the above question is No then the first follow up question should logically be; “Well, should I know?” and then define it in the specification. The “defensive” part of the code is not that you are matching on the error (because your specification might say that you must!), the “defensive” part is that you are matching on the error because you don’t know what to do.

Here are a few other questions I ask myself when facing a case or if or similar (like a function clause to “handle” all):

  • Do I know how I’m supposed to handle an error here?
  • If not, then should I handle it? (Thus going back to specification)
  • If something goes wrong in this function call, can I continue? E.g. Line 2 depends on a file descriptor from line 1
  • When I restart, can I recover from the crash here? If not then it is a good indication that you need a specification for how to handle the error. Can I reset my state? Do I need to clean up? etc are also good questions to ask yourself
  • Am I crashing in a valid place? E.g. you should never crash (intentionally) inside a gen_server unless it is some kind of a worker process. A “main” process shouldn’t really be allowed to crash in the same sense as a worker process can. A worker process, say for an HTTP request, shouldn’t really be very defensive (perhaps a top-level try-catch to return some useful error)

I’m far from writing truly “happy case programming” but I’m going there, it is unbelievable how many LOC you save just on seeing things more postive (Note to self: Perhaps this applies to real life; More positive attitude == Less unnecessary brain activity :)). An important note to all these fancy philosophies is that you need to design your software to be ready to let stuff crash, if anyone says to you that it is easy then they are lying… the only difference is that Erlang makes it possible which is a huge advantage.

Still today I give myself a virtual slap on the wrist inside my mind when ever I type something like:

handle_call(action, _From, State) ->
    %% ... do something
    {reply, ok, State};
hande_call(Cmd, From, State) ->
    io:format("Unhandled: ~p From: ~p ~n",[Cmd, From]),
    {reply, ok, State}.

Why do I put the “catch-all-calls” clause there? Where would this call come from anyway? And if I get one, why am I not aware of it?!?

Categories: Erlang Tags: ,
  1. Vince
    2009/10/12 at 14:17

    Thanks for the post. I don’t code in Erlang (Python, actually). I’ve only recently got into exception handling, and your post explains how to handle exceptions (i.e. errors) well, especially the “questions I ask myself” part.

  2. Jack9
    2009/10/12 at 20:11

    I think most ppl still use wristslaps because when you’re implementing a new feature, especially when it’s an app you’ve long forgotten about.

  3. Johnny Boy
    2009/10/13 at 02:10

    In java, the same principle applies. It also makes development faster. The best thing to do when jotting down java code is to throw any checked exceptions all the way through to main. Unless you are absolutely certain you know what to do (which you never are). Finally have main throw a generalized exception.
    As you throw your test cases into the system, you can decide how to handle odd and exceptional cases.

  4. 2013/09/14 at 14:22

    Very interesting post, thank you. I really much like the idea of only handling errors you can handle as well as handle them only in that part of the code that knows what to do. Actually in safety-related (embedded) systems the traditional approach is still to use and handle (or ignore) return codes. Some colleagues and me did an evaluation regarding the applicability of “let it crash” (LiC) in safety related systems. I would like to ask you to take a look (http://bit.ly/Z-Blog_let-it-crash) and tell me what you think. Say a thank you for this great idea to Joe next time you see him 😉

  5. heXa_ger
    2015/05/05 at 10:30

    I recently discovered this for shell scripting. Every single command goes trough one function which basically does: cd into directory; check return value; eval $CMD; check return value; on error log $CMD and a “stacktrace” then exits the shellsript;

  6. kp
    2015/05/05 at 12:06
    • mazenharake
      2015/12/31 at 23:15

      Thank you kp

  1. 2009/10/13 at 04:47
  2. 2010/10/31 at 20:37
  3. 2014/01/07 at 04:40
  4. 2017/06/22 at 03:12
  5. 2019/07/03 at 07:19

Leave a comment