My New Employer and Erlang, Java, C, C++, Python and C#

2011/02/16

So a few days ago I started working with HiQ here in Göteborg. I’m looking forward to some of the interesting projects we have talked about; it sounded like an interesting company to work for. We spoke a bit about Erlang and how Göteborg doesn’t have much use for it :) but we also spoke about more mainstream technologies like Java, C, C++, Python and C#, and how they shouldn’t be a problem except for the frustration that they (in some fields) suck compared to Erlang. Anyway, I’m interested in what is coming. Sometime next week I’m hoping to get assigned somewhere and hopefully I’ll be allowed to say where. I really felt bad for another company I turned down because the guy I spoke to was very interesting to talk to and we had a great conversation… but I’m happy with my decision.

So about Erlang: this latest development does NOT mean that Erlang is out the window. On the contrary, I will now have more time for doing the things I want to do myself in Erlang, and I’m more motivated to do them since I’m not sitting with Erlang every day. I’m also free to release everything I wish (obviously as long as it is not code written on company time and/or company equipment or for company purposes). The past week I’ve thought out a few ideas I would like to implement, along with one idea I’ve had since University which actually doesn’t have anything to do with Erlang but which I think would be really cool to do. More about that… well… whenever, who’s waiting anyway :)

At work my initial projects will probably be in C++, which can be interesting. For me C++ is a language which I “understand” and have seen and heard a lot about but never actually used. I’ve taken a closer look at it now and it didn’t seem that bad until I reached templates and operator overloading; that’s when shit started getting weird. It’s like someone had a sadistic sense of humour and wanted to screw people’s minds up, but then again that is what I’ve heard about Erlang syntax and functional programming in general (closures, currying, higher order functions, lazy evaluation and recursion anyone? =)) so this will absolutely be interesting.

As for my open source projects I have plans to continue a little on Cecho and Entop, because for some mysterious reason I lost interest in them for a while and didn’t touch them despite having many ideas for them. We’ll see how it goes. Besides, since I’m working with C++ now I was thinking about creating a small scalable 2D game server in Erlang and using C++ (or perhaps even Java) to create the clients for it; the idea should be a rather simple RPG: running around like Link, collecting coins, killing monsters, getting exp etc… The graphics will be a stretch because I’m not very good at it but who knows.

I have so many ideas right now it is ridiculous. There are so many projects, products and gadgets I would like to try out or implement and I probably won’t have time for even 1% of them but I like that I’m not short of ideas.

This weekend though I’m doing a post on generating Mazes in Erlang based on this great blog post by a fellow named Jamis Buck. I’m also taking his tip on making it my default project for new languages which means I’ll be implementing this in Object Oriented C++ at some point. Haha! THAT should be interesting :P

Categories: Personal

From Cairo to London to Göteborg.

2011/02/10

My 2½ years in Cairo came to a sudden end. When the protests began we simply just stayed in, we thought that maybe it will be better tomorrow. I called my wife early that Friday morning to tell her that my Vodafone account was blocked and that my Mobinil account was probably going to get blocked as well. I told her to call my parents to let them know I’d stay in the apartment all the time and that they shouldn’t worry, it was “just going to be a demonstration today”, I had even said to my colleagues that “I’ll see you on Sunday” (working day in Egypt).

When they cut the international calls, the mobile network and the Internet it started getting serious. When we saw burning cars and buildings, chanting and violence/looting we started to get scared. It wasn’t a nice end to an otherwise great part of my life. From the 19th floor apartment we could see far into the city and the end of the 6th of October bridge which leads into Tahrir square, and we saw everything burning and smoke coming from at least a dozen places.

At around 00:30 local time Friday, I managed to get an international call through to my parents. After a discussion with my father we decided I’d stay put and see how the next day went. Everywhere the same message came across: “Saturday is key… if Saturday is calm it will be ok!”. As it turned out later, neither Saturday nor the rest of the week was “calm”. The next day I went early to the shop to buy some food, preparing to stay in for a few days: bread and butter and cheese and other basic stuff like pasta and chicken and nuts and of course water. When I got into the shop it was like the opening of IKEA in Saudi Arabia… The whole of Zamalek seemed to have had the same idea as me… “just in case”. When I got there the shop was trying to bake bread as fast as people could grab it… the water shelves were completely empty, so I had to go with the small bottles I managed to get my hands on.

After some shopping I got the word that I should get out of the country. I was 3 days away from leaving anyway, it was end of assignment… I couldn’t shake the idea of how bad the timing was, I even joked a bit about it with my friends saying “They knew I was leaving maybe they are angry because of that” and “I can’t believe the timing, couldn’t they have postponed it just one week?”. When I called the London office they quickly arranged a booking on a flight back to London through Athens, I’m glad I got through and that it worked so quickly. This was Saturday, the mobile network was jumpy but worked. I got a booking for the next day… so… no work on Sunday I guess. No proper good byes to friends.

That Saturday was one of the longest to date. It seemed to never end. I was sitting with some friends in their flat and we were all discussing possible ways out… the only thing which made us hesitate was whether it was safe to get to the airport. The police had pulled back from the streets. Previously I had been worried that if I went out I would be mistaken for a protester and get the crap beaten out of me, but now it was worse. I could reason with a policeman (to some degree) and at least show that I’m Swedish, but with a criminal, of which many were rumoured to be out on the streets, I could (probably) not reason. Stories were going around about people hijacking cars and stealing everything in them.

I decided that the next day I had to try to go to the airport no matter what. I called my driver and he hesitated, understandably, about coming across town just to pick me up and get me to the airport. After a small discussion he said he would come the next morning. He came in a taxi; he couldn’t come in his own car. The curfew had been lifted 30 minutes earlier and we had to move quickly. On our way to the airport we went across the 6th of October bridge… Empty. All that was left was 3 burned cars on the side of the road, one of them a police transporter. The evidence on the streets showed that there had been a big battle here, the same battle I had seen on TV the day before. I was relieved it was so calm. On the way we were stopped 3 times by neighbourhood groups that had formed to protect their areas. The first one was the worst as I didn’t know what to expect from these “good guy” groups, but it went well. The bridge leading to the airport was so jammed it felt like going to work in the morning. Good. The military had a small checkpoint. Good, no problems here.

Arriving at the airport I had only 100EGP notes and some coins. The taxi asked for 51EGP, I gave him 71EGP, “Allah ma3ak” (God be with you), and then I left. I went with my driver into the airport; chaos. People were fighting at the ticket offices to buy tickets; they wouldn’t let anyone in to the check-in desks without a physical ticket. I didn’t have one… I tried to convince the first guard that the booking reference and the booking number I was showing him on my smart-phone WAS a ticket; he wouldn’t accept it, saying it had to be on paper. Giving up, I went to another entrance; bingo, this guy was a bit more clever. I said good bye to my driver, put my hands in my pocket and took out all the cash I had left. He had risked something, leaving his family at home, to come and get me. I don’t know how much I gave him exactly but around 400EGP, “Ntibih 3a 7alak…” (Take care of yourself).

After getting in the chaos was worse. People were running back and forth, virtually every flight was delayed or cancelled, 2 fights broke out at two different check-in desks, and children were crying while their mothers dragged them along to hurry somewhere. I didn’t feel scared at any point, just worried that I would have to stay the night like I had heard many others had had to. After the check-in and passport control I went to my gate and bought a coffee on the way. An older woman was asking the person at the till if he accepted cards; he said no, only cash. She wanted to buy some water… “What do you need?” I said. She said “I only want to buy a coffee and a bottle of water”. I took up my wallet and looked through all the currencies I always keep as a buffer when travelling, “Here, it is on me..”, giving her $10. Shit, I realized afterwards how much that actually was and that she didn’t need that much. Too late now; besides, she might need it later, so I justified it with that and forgot about it.

When the plane took off 4 hours later I realized what had happened. It was a surreal feeling; I had not really felt how bad it had been until I was sitting there about to take off. The adrenaline started pumping like mad, which it always does for me when the plane takes off. I forgot about Egypt for a few minutes, concentrating on the sound of the air plane engines. Thoughts like “I’m glad they don’t build those like people build software” and “I’m sure that screeching sound is normal” crossed my mind. When we reached cruising altitude the adrenaline wore off. I started feeling “safe”.

Being back in London was nice. I met all my friends and colleagues and told them about my “Escape from Cairo”, re-iterating it several times. Constantly watching the news… keeping myself updated, “My friends are still there”. My last day with Erlang Solutions consisted of making sure I didn’t leave any stuff behind and saying good bye to everyone. It was time for me to move back “Home” and start building a life with my Wife. Talk about timing: the Jan 25 uprising in Cairo coincided with my last days of 5 years with Erlang Solutions. I couldn’t help thinking that it would be a good story to tell my grand children some day.

When I landed at Göteborg it felt as if I was on vacation, as if I was going back to Cairo or London soon. It has now sunk in that I won’t. I’m now Home. Chapter 4 starts now…

/M

Categories: Personal

TopSwop

I’ve been working lately on a problem known as “topswop”. “Solving” it is not the hard part, in fact it is ridiculously easy (4 lines in Erlang). The hard part is finding a starting list that takes a long sequence of steps to solve.

This is the problem:

Imagine you have an unordered list of the positive integers from 1 to n. All numbers are unique; no number occurs in the list twice. E.g.
[3,1,4,2,5]
Now take the first number (in this case 3) and extract that many numbers from the head of the list (in this case [3,1,4]) and then reverse them (becoming [4,1,3]) and join them with the tail again. This would produce the list [4,1,3,2,5].

Now repeat the same procedure until the number 1 is first in the list. E.g. (from the beginning):

1) [3,1,4,2,5] -> [4,1,3] [2,5]
2) [4,1,3,2,5] -> [2,3,1,4] [5]
3) [2,3,1,4,5] -> [3,2] [1,4,5]
4) [3,2,1,4,5] -> [1,2,3,4,5] <- solved

Think of it as a deck of cards facing up and you get the idea. This particular sequence took 4 iterations to solve. Even though this problem is easy to solve it is really difficult to find long sequences. The longest sequence for n = 5 is 7.
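To make the procedure concrete, here is a minimal sketch of a step counter plus a brute-force search over every permutation. The module name topswop_sketch and the functions steps/1, longest/1 and perms/1 are my own illustrative names, and trying all permutations is only feasible for small n; it is not how anyone would attack the contest sizes.

```erlang
-module(topswop_sketch).
-export([steps/1, longest/1]).

%% Count the flips needed until 1 is at the head of the list.
steps(List) -> steps(List, 0).

steps([1 | _], N) -> N;
steps([I | _] = List, N) ->
    {Head, Tail} = lists:split(I, List),
    %% lists:reverse/2 reverses Head and appends Tail in one go.
    steps(lists:reverse(Head, Tail), N + 1).

%% Brute force: try every permutation of 1..N and keep the largest count.
longest(N) ->
    lists:max([steps(P) || P <- perms(lists:seq(1, N))]).

%% All permutations of a list (only sensible for small lists).
perms([]) -> [[]];
perms(L) -> [[H | T] || H <- L, T <- perms(L -- [H])].
```

With this, topswop_sketch:steps([3,1,4,2,5]) gives 4, matching the walk-through above, and topswop_sketch:longest(5) gives 7.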

Currently I’ve gotten good results up to n = 17, but after that it is really difficult to find optimal solutions. For those of you who are interested, there is a contest at http://azspcs.net/ which is now running (with a prize and everything :)) but I’m mostly in it for fun.

I’m looking mostly at how Erlang’s capabilities can be used wisely to solve problems such as these and currently looking at a distributed genetic algorithm but I haven’t come very far yet. If I have the time (or interest) to finish it I will post it here. If you enter the contest and use Erlang in a clever way then let me know :)

My solve function looks like this:

solve([1|_], N) -> N;
solve([I|_] = List, N) ->
    {L1, L2} = lists:split(I, List),
    solve(lists:reverse(L1) ++ L2, N + 1).

Categories: Erlang

Erlang User Conference 2010 and late night ideas

Erlang User Conference; The 16th of November. The place where everyone who is serious about Erlang will be. I’ll be there… (obviously :P) If you feel like having a chat just look me up! But DON’T give me verbal bug-reports ok! promise!? good :)

I will also be running a tutorial on dbg (and a little about ttb); it will be the day before the conference, the 15th. You can look at what tutorials there are and the schedule for the tutorials here: http://www.erlang-factory.com/conference/testingtutorialworkshop2010 and if you haven’t registered… you are missing out.

Tomorrow I’m running a dry-run for my colleagues. I’m sure there will be good feedback.

In other news… I got the idea to hook up ttb to a sequence diagram tool which then creates some DSL code, which in turn (using a system specific generator, call it a “compiler”) is compiled into a test case. That way one can run a trace, and if this trace is successful (everything looks as it should) one could argue that a test with the same input should always work (depending on which factors are dynamic) and can thus be used as a system regression test. Hmmmmm….

Sounds too complicated to fly though, I’ll see what grows out of this one. I hate having hundreds of ideas and not enough time/motivation to realize many of them.

See you at the EUC! Don’t be late! :P

Categories: Erlang

9 Erlang pitfalls you should know about

Mistakes are a pure waste of time and I believe one should invest time in doing things thoroughly rather than messily. But since much of our work requires fast delivery times we must find tools to help us minimize the mistakes we make. We need to cover as much of the “mistake spectrum” as we can, and our most important tool in the end is going to be experience.

Here is a list of 9 mistakes which are easily made in Erlang and which I consider important to know about. Some of them are really subtle in the sense that they don’t necessarily cause a problem at compile time or when tested. This makes them dangerous in live systems and potentially really big time hogs, but good tests written with these pitfalls in mind should help raise confidence in the system. The order of the list below doesn’t mean anything, it just helps you count to 9 ;)

1. Forgetting the new state.

Imagine you have a gen_server and that gen_server handles a few handle_calls and a state of some sort. At some point you want to manipulate the state and return it as the “new state” by returning it to the gen_server. Consider the following example:

handle_call(cmd, _From, State) ->
    ...
    NState = manipulate_state(State),
    ....
    {reply, ok, State#state{ last_update = now() }}.

The problem in this case is that it is sometimes easy to forget the new state when updating the state a second time on the last line (or several times). In the above example the state is updated with a time stamp right before the returning and this is done with the state State and not NState. This usually happens because one sees the two updates separately and when writing the last statement the previous update has been forgotten.

Solution: Write shorter functions and update as much of the data as you can in one place rather than in “staged” updates.
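As a rough illustration of the “one place” advice, here is a small sketch where both changes to the state are made in a single record update. The module name state_sketch and the count field are made up for the example; last_update is the field from the example above, and os:timestamp() stands in for now().

```erlang
-module(state_sketch).
-behaviour(gen_server).
-export([init/1, handle_call/3, handle_cast/2]).

%% count is a hypothetical field standing in for "the manipulated data".
-record(state, {count = 0, last_update}).

init([]) -> {ok, #state{}}.

handle_call(cmd, _From, State) ->
    %% Both changes happen in one expression, so there is no stale
    %% intermediate variable to return by mistake.
    NState = State#state{count = State#state.count + 1,
                         last_update = os:timestamp()},
    {reply, ok, NState}.

handle_cast(_Msg, State) -> {noreply, State}.
```

There is only one variable left that could possibly be returned as the new state, which is the whole point.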

2. Accidentally using pattern matching.

So let’s say you have a function which binds some variables in the header. Usually this is something that represents a value which is related to what the function does. Inside the function however you can have more logic which binds more variables (e.g. by using pattern matching). This can present a problem, because if you accidentally use the same variable name but intended it to be a different variable then you are pattern matching and not binding. Consider:

read_update_records(Table, N) ->
    NewRecords = lists:map(fun foo/1, db:read(Table, N)),
    %% something, something...
    case db:update(DbCon, NewRecords) of
        {ok, N} ->
            log("Yay, updated ~p number of records", [N]);
        {error, Reason} ->
            log("Fail!",[]),
            exit({error, Reason})
    end.

In this example let’s say 10 records are read; they are manipulated and then written back. Assume for argument’s sake that the N returned by db:update is the number of records updated. Because N is already bound in the function head, {ok, N} is a match against that value, not a fresh binding. Usually the two numbers will be the same and thus no problems (N matches in all places), but that is pure luck and a rather weak assumption: the day they differ, neither clause matches and the function crashes with a case_clause error. One might get away with this for a while but eventually it will probably fail. I have seen this type of bug in production code, where it lasted for months waiting for the right moment to strike; usually it comes at a time when you are celebrating your great success in creating a system with such an impressive uptime ;)

Another example is that the _Variable syntax is a valid variable and gets bound; the only difference is that the compiler doesn’t complain about them not being used so you might think they are safe. A variable _Variable should not be confused with the “don’t-care-variable”. An example:

1> _Foo = 1.
1
2> _Foo = 2.
** exception error: no match of right hand side value 2
3> _ = 1.
1
4> _ = 2.
2

Solution: Don’t give your variables too generic names; ideally the code should be “self documenting” by the use of good variable names. Also: write short functions and don’t re-use “don’t-care” variables. If you run into this bug it is usually a good indicator that your functions are too big or doing too many things.
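A small sketch of what the naming advice can look like in practice. The module match_sketch and the function report/2 are hypothetical; the second argument stands in for whatever the database call returned.

```erlang
-module(match_sketch).
-export([report/2]).

%% Requested is bound in the head. Updated is a fresh name, so the
%% clauses bind it rather than silently matching it against Requested.
report(Requested, UpdateResult) ->
    case UpdateResult of
        {ok, Updated} when Updated =:= Requested ->
            {ok, Updated};
        {ok, Updated} ->
            %% The mismatch is now an explicit, visible case.
            {partial, Requested, Updated};
        {error, Reason} ->
            exit({error, Reason})
    end.
```

Whether a partial update should be returned or should crash is a design decision for the real system; the sketch only makes the comparison explicit instead of accidental.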

3. "private property" -- "property" /= "private "

This bites people who are not used to how Erlang handles strings. As you might know, strings are just lists and assuming two lists A and B then this case can be read like this:

For each element in B, find it in A. If it exists in A then remove the first occurrence of it from A. Return the rest of A.

If we assume A = "private property" and B = "property" then this means:

1> "private property" -- "property".
"iva pert"

Solution: Remember that strings are lists… it is as simple as that.
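If what you actually wanted was to cut a known ending off a string, one possible sketch uses lists:suffix/2 and lists:sublist/2. The module strip_sketch and the function strip_suffix/2 are names I made up for the example.

```erlang
-module(strip_sketch).
-export([strip_suffix/2]).

%% Remove Suffix from the end of String only if String really ends with it.
strip_suffix(String, Suffix) ->
    case lists:suffix(Suffix, String) of
        true  -> lists:sublist(String, length(String) - length(Suffix));
        false -> String
    end.
```

strip_sketch:strip_suffix("private property", "property") returns "private ", which is what the title of this pitfall was hoping for.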

4. Guard tests silently fail

Exactly what the title says; be careful about this. An example:

1> F = fun(List) when length(List) > 0 -> ok; (_) -> not_ok end.
#Fun<erl_eval.6.13229925>
2> F([1,2]).
ok
3> F({1,2}).
not_ok

Outside a guard length({1,2}) results in a bad argument error, but inside a guard the error just makes the guard fail. This also affects list comprehensions and has surprised many:

1> [ X || X <- [1, 2, foo, bar, 5, 6], X rem 2 == 0 ]. 
[2,6]

Normally foo rem 2 will result in a bad argument error but in this case the element is just skipped. This is logical behaviour but sometimes not so obvious. I most often hear complaints about this one when a set of corrupted data is being worked on and the guards silently fail, not taking the failing piece of data into account. An example: we want to update 10 rows in a database but only 9 get updated, and it turns out that one of the values we are iterating over is not an integer.

Solution: Learn it, get over it ;)
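If silently skipping bad data is not acceptable, one option is to validate first and filter afterwards. The following is only a sketch; the module guard_sketch, the function evens/1 and the not_integers error term are invented for the example.

```erlang
-module(guard_sketch).
-export([evens/1]).

%% Split the input first so that non-integers are reported instead of
%% being dropped silently by the failing filter.
evens(List) ->
    case lists:partition(fun erlang:is_integer/1, List) of
        {Ints, []}   -> {ok, [X || X <- Ints, X rem 2 == 0]};
        {_Ints, Bad} -> {error, {not_integers, Bad}}
    end.
```

guard_sketch:evens([1, 2, foo, bar, 5, 6]) now returns {error, {not_integers, [foo, bar]}} instead of quietly answering [2,6].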

5. Returning arbitrary {error, Reason}

This is a very common mistake and is bound to cause problems at some point in time. This is one of the most apparent cases where the “let it crash” philosophy applies, but still even some experienced developers fail to recognize it. Ponder the following example: assume that you have a function that does something in a database, and for the sake of argument let’s say that you have to manipulate some data before you update the database. E.g:

update_db_value(Key, Value) ->
    case db:connect() of
        {ok, Con} ->
            NValue = term_to_special_format(Value),
            case db:write(Con, Key, NValue) of
                ok ->
                    ok;
                {error, Reason} ->
                    log(Reason),
                    db:disconnect(Con),
                    {error, unable_to_write_value}
            end;
        {error, Reason} ->
            log(Reason),
            {error, unable_to_connect}
    end.

Now there are several things to consider: are you supposed to return the errors? Will they (the errors) be understood by the layer above? Are you writing a library, so that the layer above adjusts to you? etc…

If we assume that this code sits under a management application and that this code is glue code then we could argue that returning arbitrary error messages is a bad idea. If you find yourself in a situation like this then you need to ask yourself whether you need to go back to the specification and define what to do rather than just returning “error-something”.

Usually, however, the choice depends on higher layers since a crash will propagate. Tests might not be enough, for example 1) your stubs of the management application might accept the return values but the real one might not. 2) There might be too many cases where you haven’t considered returning an error tuple and your code becomes very inconsistent (sometimes crashing, sometimes not) or 3) You might be defining the wrong behaviour in code and thus there is an inconsistency between code and specification/design. In my humble opinion I think the following would be better (assuming the circumstances allow):

update_db_value(Key, Value) ->
    {ok, Con} = db:connect(),
    ok = db:write(Con, Key, term_to_special_format(Value)),
    db:disconnect(Con).

This particular point can be a subject for endless discussion so I’ll just stop here; my point is just simply: be careful about what you return from a function because error tuples don’t just disappear.

Solution: Read this and don’t blindly return {error, Reason}
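To see what “letting it crash” looks like from the caller’s side, here is a tiny sketch. The module crash_sketch and the function update/3 are hypothetical, and Write is just a fun standing in for the database call so the example can run on its own.

```erlang
-module(crash_sketch).
-export([update/3]).

%% Write is a hypothetical fun standing in for the database call.
%% Asserting on ok means any error tuple crashes right here, with the
%% unexpected value in the exit reason, instead of travelling upwards
%% disguised as a normal return value.
update(Write, Key, Value) ->
    ok = Write(Key, Value),
    ok.
```

A failing write such as fun(_, _) -> {error, full} end makes update/3 exit with a badmatch on {error, full}, which a supervisor or the caller can then deal with according to the specification.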

6. #record{} ambiguity in function clause and function body

When you write #record{} in a function body the compiler replaces it with a tuple whose arity is the number of fields in the record definition plus one (for the record name), and fills each position with the default value given in the record definition.

-record(foo, { bar = 1, baz }).

With this definition, #foo{} in a function body becomes:

{foo, 1, undefined}

Now you might have expected it to be replaced the same way in a function clause… well, not really. When the #record{} notation is used in a function head it becomes a pattern that only tests the fields you actually wrote; it works like a set of guards on the clause. Any variable you bind to the whole record in the head still binds to the actual tuple with its actual values. Consider this example:

-module(foo).
-record(foo, { bar, baz }).
function() ->
     r(#foo{}),
     r(#foo{ bar = 1 }).

r(#foo{ baz = undefined } = Foo) -> io:format("1: bar == ~p~n",[Foo#foo.bar]);
r(#foo{ bar = 1 } = Foo) -> io:format("2: bar == ~p~n",[Foo#foo.bar]).

If we didn’t know better we would think the output would be

1: bar == undefined
2: bar == 1

but when we run the example it shows

1: bar == undefined
1: bar == 1

In the source code example above, line 4 is replaced with r({foo, undefined, undefined}) and logically matches the first clause of the r/1 function on line 7. Line 5 is replaced with r({foo, 1, undefined}), but it still matches the first clause on line 7. We would expect it to match on line 8, on the assumption that the record in the clause head on line 7 is replaced with {foo, undefined, undefined}; that is not what we passed (namely {foo, 1, undefined}), so line 7 shouldn’t match and we should fall through to line 8. So what is going on?

Well, what happens is that the record, as mentioned, is replaced with different things in different places. In a function body #foo{} is simply replaced with {foo, undefined, undefined}, but in a function head the clause is extended with a series of tests. The tests are derived from what we wrote in the code; the remaining fields are not checked. E.g. #foo{} is replaced with tests that the argument is a tuple, that it has the right arity and that its first element is the atom foo, but nothing is checked about element 2 or 3. If we write #foo{ baz = undefined } a test is added that baz == undefined (or rather the position known as baz). Not including baz is not the same as including it and setting the value to undefined (even though that is usually the “default” value). This means that in the example above the value in position bar has no effect on whether the clause on line 7 matches, so the second call matches there too.

To show my point more clearly you can compile the module above like this:

$> erlc -S foo.erl

A part of the output file for me shows:

...
{function, r, 1, 4}.
  {label,3}.
    {func_info,{atom,foo},{atom,r},1}.
  {label,4}.
    {test,is_tuple,{f,3},[{x,0}]}.
    {test,test_arity,{f,3},[{x,0},3]}.
    {get_tuple_element,{x,0},0,{x,1}}.
    {get_tuple_element,{x,0},1,{x,2}}.
    {get_tuple_element,{x,0},2,{x,3}}.
    {test,is_eq_exact,{f,3},[{x,1},{atom,foo}]}.
    {test,is_eq_exact,{f,5},[{x,3},{atom,undefined}]}.
    {move,{atom,foo},{x,1}}.
    ...
  {label,5}.
    {test,is_eq_exact,{f,3},[{x,2},{integer,1}]}.
    {test,is_tuple,{f,6},[{x,0}]}.
    {test,test_arity,{f,6},[{x,0},3]}.
    {get_tuple_element,{x,0},0,{x,1}}.
    {get_tuple_element,{x,0},1,{x,2}}.
    {test,is_eq_exact,{f,6},[{x,1},{atom,foo}]}.
    ...

Note that lines 11 and 12 do not test the middle value ({x, 2}), thus this clause will match.

Solution: N/A, just learn the difference and don’t make assumptions about record values in a function clause.
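A compact way to convince yourself is a sketch like the one below. The module rec_sketch and the functions which/1 and demo/0 are made-up names; the record is the same foo record as above.

```erlang
-module(rec_sketch).
-export([which/1, demo/0]).
-record(foo, {bar, baz}).

%% Only the fields written in the head are tested.
which(#foo{baz = undefined}) -> first;
which(#foo{bar = 1})         -> second.

demo() ->
    [which(#foo{}),                       % baz is undefined
     which(#foo{bar = 1}),                % baz is still undefined
     which(#foo{bar = 1, baz = set})].    % baz differs, falls through
```

rec_sketch:demo() returns [first, first, second]: the value of bar never stops the first clause from matching, only baz does.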

7. gen_server, trap_exit and terminate/2

The terminate/2 callback function in gen_server is supposed to be the opposite of the init/1 function: setup and tear down. The truth, however, is that terminate/2 does not always run; there are some preconditions that we need to be aware of.

If anything happens inside the gen_server itself or it issues a stop-tuple as a return then terminate/2 will always be called, which is logical. However if it is under a supervisor the documentation says:

If the gen_server is part of a supervision tree and is ordered by its supervisor to terminate, this function will be called with Reason=shutdown if the following conditions apply:

  • the gen_server has been set to trap exit signals, and
  • the shutdown strategy as defined in the supervisor’s child specification is an integer timeout value, not brutal_kill.

So in other words (but still very similar ones): if the gen_server is shut down and it is not trapping exits then the terminate/2 function will not be called. This might seem strange to some because one always expects the gen_server to get a chance to “clean up” after itself, but if you think about it, it is logical: if a process gets an exit signal and doesn’t trap it, it should die with the same reason. The gen_server code has a case clause which explicitly checks for exit signals and only then allows terminate/2 to be called.

The last statement wasn’t entirely true though. The documentation further states that:

Even if the gen_server is not part of a supervision tree, this function will be called if it receives an ‘EXIT’ message from its parent. Reason will be the same as in the ‘EXIT’ message.

Note: from its parent. In other words, what was written in the previous section only applies to exit messages coming from the gen_server’s parent process, which means that (if the process is supervised) the supervisor is the parent. This means that if you start a gen_server process and that process in turn starts another process (which it links to) and that second process crashes, then the first gen_server process will die without calling terminate/2, unless of course it traps exits, in which case it will only receive a message (delivered to handle_info/2).

All according to predictable behaviour but can be overlooked so think twice when it comes to restart strategies and trapping exits. E.g:

In the following examples I will use this module:

-module(gensrv).
-compile(export_all).

start_link(BoolFlag) -> gen_server:start_link(?MODULE, BoolFlag, []).

init(BoolFlag) ->
    process_flag(trap_exit, BoolFlag),
    {ok, undefined}.

handle_call({spawn_link, BoolFlag}, _, _) ->
    {ok, Pid} = gen_server:start_link(?MODULE, BoolFlag, []),
    {reply, {ok, Pid}, Pid}.
    
handle_info({'EXIT', _Pid, _Reason}, St) ->
    {noreply, St}.

terminate(_Reason, _St) ->
    ok.

If we use this module to first start a gen_server and then spawn a linked process under it we can observe the behaviour previously described. The example below shows a gen_server spawning another process and finally being ordered to shut down by its parent (the shell). In both processes terminate/2 is called. (The lines starting with a pid in parentheses are trace output from dbg.)

> process_flag(trap_exit, true).     
false
> {ok, P1} = gensrv:start_link(true).
{ok,<0.389.0>}
> {ok, P2} = gen_server:call(P1, {spawn_link, true}).
{ok,<0.391.0>}
> exit(P1, shutdown).
true
(<0.389.0>) call gensrv:terminate(shutdown,<0.391.0>)
(<0.391.0>) call gensrv:terminate(shutdown,undefined)
> flush().
Shell got {'EXIT',<0.389.0>,shutdown}
ok

In this following scenario we start the two processes like before but the second one doesn’t trap exits (so we can kill it using reason shutdown from the shell).

> {ok, P1} = gensrv:start_link(true).
{ok,<0.407.0>}
> {ok, P2} = gen_server:call(P1, {spawn_link, false}).
{ok,<0.409.0>}
> exit(P2, shutdown).                                 
(<0.407.0>) call gensrv:handle_info({'EXIT',<0.409.0>,shutdown},<0.409.0>)
true
> exit(P1, kill).
true
> flush().
Shell got {'EXIT',<0.407.0>,killed}

Here we can see that even if we do trap exits the first process won’t shut down because it wasn’t the parent process that sent the exit signal, it was the process it spawned. Since the first process is still alive we kill it off at the end.

This third scenario shows the common misunderstanding about the terminate/2 function. In this example we start one gen_server which in turn starts another one (just like before) but this time the first one doesn’t trap exit but the second one does:

> {ok, P1} = gensrv:start_link(false).                
{ok,<0.422.0>}
> {ok, P2} = gen_server:call(P1, {spawn_link, true}). 
{ok,<0.424.0>}
> exit(P1, shutdown).
true
(<0.424.0>) call gensrv:terminate(shutdown,undefined)
> flush().
Shell got {'EXIT',<0.422.0>,shutdown}
ok

Even though both processes exit with shutdown they behave differently because one is trapping exits and the other one isn’t. The first process receives an exit signal (reason shutdown) from its parent (the shell) but is not trapping exits and thus just exits with the same reason. The second process gets notified that its parent (the first process) exited with reason shutdown, and since it is trapping exits it calls terminate/2.

This is all logical behaviour if you consider how processes and links work in general. The only exception here is the set of rules added by the OTP behaviour for “shut down” signals, which are really just a convention using the shutdown reason in an ‘EXIT’ message. Clever, but it can be confusing.

So in short; always remember:

  • If a gen_server process terminates itself (i.e. it returns stop or an exit occurs inside the callbacks) then terminate/2 will always be called
  • A gen_server process that is not trapping exits will not have its terminate/2 callback called when it receives an exit signal
  • If a gen_server process is not trapping exits but its child processes are, then the child processes will have their terminate/2 functions called

Solution: Always spawn processes under a supervisor. If you don’t then make sure your own “top-level” process traps exits and cleans up after itself.
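As a rough sketch of the second half of that advice, a “top-level” gen_server can set trap_exit in init/1 so that terminate/2 gets a chance to run when the parent sends a shutdown. The module name trap_demo is an assumption for this example and is not the gensrv module used in the scenarios above.

```erlang
%% trap_demo.erl (hypothetical module, for illustration only)
-module(trap_demo).
-behaviour(gen_server).

-export([start_link/0]).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2,
         terminate/2, code_change/3]).

start_link() ->
    gen_server:start_link(?MODULE, [], []).

init([]) ->
    %% Without this flag an exit signal from the parent kills the process
    %% right away and terminate/2 is never called.
    process_flag(trap_exit, true),
    {ok, undefined}.

handle_call(_Request, _From, State) -> {reply, ok, State}.

handle_cast(_Msg, State) -> {noreply, State}.

%% 'EXIT' messages from linked processes other than the parent end up here.
handle_info(_Info, State) -> {noreply, State}.

terminate(Reason, _State) ->
    %% Clean-up code goes here.
    io:format("cleaning up, reason: ~p~n", [Reason]),
    ok.

code_change(_OldVsn, State, _Extra) -> {ok, State}.
```

Starting it with trap_demo:start_link() from the shell and then calling exit(Pid, shutdown) should print the clean-up line before the process exits.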

8. Trying to use record_info/2 in runtime

This pitfall will appear as an error when you compile but can waste time if you don’t know the idea behind record_info/2. Since records don’t really exist, their fields don’t exist either, well… not their names anyway. Records only exist in code but not in runtime; as mentioned before the records are simply replaced with something else (tuples and/or guards). Record fields (as seen in code) don’t exist either and are only references for the compiler to do the right thing. This means that if we specify the record -record(foo, { bar, baz }) and later use #foo{ bar = 1 } then the compiler uses the identifier bar to know in which position in the tuple it should put the value 1. It does not know the name bar in runtime.

This can be tricky in the beginning because one might think that the “functions” record_info(fields, Record) -> [Field] and record_info(size, Record) -> Size can be used in runtime when they actually can not. These “functions” are simply replaced during a transformation made at compile time. In order for them to work the record has to be defined somewhere in the module (or header file) and the record name has to be given explicitly.

This example will not work:

get_record_info(RecordName) -> record_info(fields, RecordName).

because during compile time the record name is not known and therefore it can not expand to anything.

This example will work:

get_record_info() -> record_info(fields, foo).

because it will simply be replaced (according to the record definition) with:

get_record_info() -> [bar, baz].

This also means that you can not make “dynamic” records and get their field names; the record has to be known at compile time.

Solution: Understand that records are not “objects” or runtime constructs; they are only syntactic sugar.
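A small sketch of what this looks like in practice. The module name rec_demo and the function to_proplist/1 are made up for the example. Because the record name is written out literally, record_info/2 can be expanded at compile time, and the result can be zipped with the values of the underlying tuple.

```erlang
%% rec_demo.erl (hypothetical module, for illustration only)
-module(rec_demo).
-export([to_proplist/1]).

-record(foo, {bar, baz}).

%% record_info(fields, foo) is expanded to [bar, baz] at compile time.
%% In runtime the record is just the tuple {foo, Bar, Baz}, so dropping
%% the first element (the record name) leaves the field values.
to_proplist(#foo{} = Rec) ->
    lists:zip(record_info(fields, foo), tl(tuple_to_list(Rec))).
```

Since the record is only a tuple in runtime, calling rec_demo:to_proplist({foo, 1, 2}) from the shell works even though the shell knows nothing about the record definition.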

9. Using and/or when you mean andalso/orelse

and and or evaluate both sides before determining an expression’s truth value, while andalso and orelse evaluate the left side first and, depending on its value, decide whether to evaluate the right side. These are called short-circuit expressions but are actually just acting like one would normally expect.

Example:

> true or exit(1).
** exception exit: 1
> true orelse exit(1).
true
> false and exit(1).
** exception exit: 1
> false andalso exit(1).
false

Solution: Only use and/or if there is an absolute reason to; otherwise use andalso/orelse.
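A typical place where this bites is a check where the right side only makes sense if the left side is true. The sketch below uses a made-up module bool_demo with two versions of the same check. Note the parentheses in the and version: and binds tighter than the comparison operators, which is a small trap of its own.

```erlang
%% bool_demo.erl (hypothetical module, for illustration only)
-module(bool_demo).
-export([non_empty_list/1, non_empty_list_strict/1]).

%% length/1 is only evaluated if X really is a list.
non_empty_list(X) ->
    is_list(X) andalso length(X) > 0.

%% Both sides are always evaluated, so length/1 is called even when
%% X is not a list and the call fails with badarg.
non_empty_list_strict(X) ->
    is_list(X) and (length(X) > 0).
```

With these definitions non_empty_list(foo) returns false while non_empty_list_strict(foo) raises a badarg error.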

Conclusion

Test more thoroughly and don’t make too many assumptions.

Peace.

EDIT: Fixed a few mistakes and spelling errors.

Categories: Erlang

Hacking entop

entop was built to be extensible, mostly because one can’t build a monitoring tool which suits everyone and every project. In this post I’ll go through how to extend entop and how to show the information that is important to you and/or your project.

Future releases of entop will have a better way to specify the callback modules that one wishes to use when starting entop. This will make it possible to use your own modules without recompiling entop every time and/or to keep some generic callback modules around between projects. Currently though you have to recompile entop, but I’m getting ahead of myself.

The UI

The entop UI (just like top) has two sections; headers and a table.

The headers consist of three parts. The first part is the static information gathered from the remote node. This is the kind of information that will never change (or very, very rarely changes) between polls. It is displayed on the first line and is not intended to be changed (unless you change it in the code). The second part consists of four rows of data and is intended to be customized by the callback function; we will come back to this part later. The third and last part contains some information on how the rows in the table are displayed and how long it took to fetch the data from the remote node. This is displayed on line five (last line) and is not intended to be changed (again, unless you change the code).

The table consists of two parts; columns and rows (yes, really!). Column titles show what the information in each column is supposed to show and each row after that is the item (row/data) itself. No-brainer. Both the columns and the rows are intended to be customizable, so you can show as many columns as you want with whatever information you want.

The collector module

When entop starts up it tries to connect to the node that was given as an argument to it. If successful it will read the static data from that node and then start polling the node for the dynamic data. Before the polling starts, entop pushes the binary code of the callback module to the remote node and does so every time entop establishes a connection. Which module is used as the collector module is currently specified directly in code, but future versions of entop will make it possible to change this through configuration or a CLI argument.

Note: Because a module is pushed and loaded on the other node it is important that the name of the module doesn’t clash with something else so be careful when naming your modules.

The collector module needs to have one function exported, get_data/0. This function is the one that will be called every time the node is polled. This function will return data that is later used by a format module (we’ll get to that module later). The get_data/0 function MUST return a tuple of three:

get_data() ->
    ...
    {ok, HeaderData, TableData}

The first element is the atom ok. The second element can be any Erlang term; it will be used later on to format the header lines, so it must be something that makes sense to the format module. The third element is a list of Erlang terms where each item is the data used to format one row in the table. Each item in the list must likewise be something that makes sense to the format module. The default collector module in entop is currently entop_collector.erl.

Note: HeaderData can be any Erlang term and TableData must be a list of Erlang terms.
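To make the contract concrete, here is a minimal sketch of a collector. The module name entop_demo_collector and the choice of data (memory totals for the header, message queue lengths for the rows) are my own assumptions for this example and not something entop ships with.

```erlang
%% entop_demo_collector.erl (hypothetical module, for illustration only)
-module(entop_demo_collector).
-export([get_data/0]).

get_data() ->
    %% Any term will do for the header data; here it is a proplist.
    HeaderData = erlang:memory([total, processes]),
    %% One element per table row. Processes that die between processes()
    %% and process_info/2 return undefined and are filtered out by the
    %% pattern in the second generator.
    TableData = [{Pid, Len} || Pid <- processes(),
                               {message_queue_len, Len} <-
                                   [process_info(Pid, message_queue_len)]],
    {ok, HeaderData, TableData}.
```

The matching format module then has to expect a proplist in header/2 and a {Pid, Len} tuple in row/2.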

The format module

When the data has been fetched entop will call a format module to format the information it will display on the screen. To do this the format module needs to implement the init/1, header/2 and row/2 functions.

The init/1 will be called when the callback is initialized (once when the application is started and a connection was successfully established with the remote node). It is called with one argument which is the node name of the target node and must return a tuple of three:

init(Nodename) ->
    ...
    {ok, {ColumnSpec, DefaultColumn}, State}

First element is the atom ok. The second element is a tuple of two in which the first element is a column specification and the second element the default column number to sort on. The third element is the user defined state.

A column specification looks like this:

    Columns = [ {"Title", Width, Options}, {"Title2", Width, Options}, ...].

Each column is specified by a tuple of three. The first element, the title, must be a flat io list. The second element, the width, is a number which reserves that many characters for the column and should be larger than three; excess characters will be cut off. The third element, options, is a property list with options that can be applied to that column; currently there is only one such option: {align, ...} which specifies if the text should be aligned right or left (default left) in the column.
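As a sketch, an init/1 with a two-column specification could look like the following. The module name columns_demo and the columns themselves are invented for the example, and I am assuming that the align option takes the atom right (or left) as its value.

```erlang
%% columns_demo.erl (hypothetical module, for illustration only)
-module(columns_demo).
-export([init/1]).

init(_Node) ->
    Columns = [{"Pid", 12, []},                  %% left aligned by default
               {"Queue", 8, [{align, right}]}],  %% assumed option value
    %% Sort on the second column by default; no user state is needed.
    {ok, {Columns, 2}, undefined}.
```

Numbers usually read better right aligned, which is why only the second column sets the option here.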

After the data has been retrieved from the remote node the header/2 function will be called first. The first argument to this function is the HeaderData which was returned by the get_data/0 function and the second argument is the user state data which was returned by init/1. The return value of this function must be a tuple of three as described below:

header(HeaderData, State) ->
    Line1 = "First row in the header (second after the node info)",
    Line2 = "Second row in the header ...",
    Line3 = "Third row ...",
    Line4 = "...",
    {ok, [Line1, Line2, Line3, Line4], NewState}.

The second element of the return value must be a list of exactly four lines. Each line in that list must be a flattened io list with no newlines, formatted the way the user wishes to have it; entop will not touch them. The lines will be shown on rows 1-4 in the header section of the UI. If a line is too long for the width of the screen it will be cut off.

Side note: I am aware that having a list with a fixed size is not very intuitive and I will probably change this to be a tuple of size four later on but currently this is because I’m lazy in the background implementation ;)

After the headers have been processed the table will be populated. For each element in the TableData list, the function row/2 will be called together with the user state.

row(TableDataElement, State) ->
    ...
    {ok, {"Col1", [{test}], atom, 1337}, NewState}.

The second element of the return value must be a tuple whose size is exactly the length of the column list specified in the init/1 function (size(ReturnTuple) == length(Columns)). The values in the tuple will be used to populate each cell in the table and can be any term; whatever term is put there will be formatted, made into a string, flattened and truncated to the width of the column. If a particular row is to be skipped (not included in the table), return {ok, skip, State} instead.
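Here is a small sketch of a row/2 that uses the skip return. The module name row_demo and the {Pid, QueueLen} shape of the table data are assumptions made for this example; it also assumes a two-column specification, so the returned tuple has size two.

```erlang
%% row_demo.erl (hypothetical module, for illustration only)
-module(row_demo).
-export([row/2]).

%% Processes with an empty message queue are left out of the table.
row({_Pid, 0}, State) ->
    {ok, skip, State};
%% Everything else becomes a row with one value per column.
row({Pid, QueueLen}, State) ->
    {ok, {Pid, QueueLen}, State}.
```

The values are returned as they are (a pid and an integer) since entop turns whatever term it gets into a string.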

Putting it together: An application viewer

So as a useful example let’s create a view that shows us all the applications which are loaded on the target node, which state they are in (started or just loaded) and whether they have a main process (i.e. the application is not a library).

I recommend starting by creating the view; this lets you think about what you want to show and how, and it gives you a mock-up of the view. Start by creating the file entop_application_format.erl (never mind the corny name, whatever floats your boat for now :))

First we need to think about which columns we want to show. My idea was that we have six columns as we want to show the following information; Name, Description, Version, State (Loaded/Started), Type (permanent/temporary) and Pid (if it has one).

Enter the module and export attribute and implement the init/1 function as follows:

%% entop_application_format.erl
-module(entop_application_format).
-export([init/1, header/2, row/2]).

init(_Node) ->
    Columns = [{"Name", 13, []},
               {"Description", 20, []},
               {"Version", 8, []},
               {"State", 8, []},
               {"Type", 12, []},
               {"Pid", 10, []}],
    {ok, {Columns, 1}, undefined}.

This specifies the six columns and their width. It also specifies that the first column is going to be the one to sort on by default and also that we don’t care about the state so we set it to undefined.

Usually I now create a dummy header/2 and row/2 function to give myself a mockup of how it will be (and perhaps to get the column width right) but I will skip that step here and go straight to implementation. Below is the header function which will be implemented:

header(HeaderProplist, State) ->
    SysMem = proplists:get_value(system, HeaderProplist),
    ProcMem = proplists:get_value(processes, HeaderProplist),
    AtomMem = proplists:get_value(atom, HeaderProplist),
    BinMem = proplists:get_value(binary, HeaderProplist),
    CodeMem = proplists:get_value(code, HeaderProplist),
    EtsMem = proplists:get_value(ets, HeaderProplist),

    Line1 = lists:concat(["System: ", SysMem, ", Process: ", ProcMem, ", Atom: ", AtomMem]),
    Line2 = lists:concat(["Binary: ", BinMem, ", Code: ", CodeMem, ", ETS: ", EtsMem]),
    Line3 = "Machine uptime: ",
    Line4 = proplists:get_value(machine_uptime, HeaderProplist),

    {ok, [Line1, Line2, Line3, Line4], State}.

To make the concept clearer I have bloated the function for readability. Lines 2-7 read various values from a proplist; this means that we expect a proplist from the get_data/0 function which we implement later. During development I don’t make any assumptions about the data structures here but rather use dummy values and come back and change them to fit the real data structure after I have implemented the collector module. Lines 9-12 format the four strings that we need and finally I return them as a list (remember; must be a list of four!).

After the header has been implemented we need to provide a callback to fill the rows in the table. The row function will be called for every element in the TableData. Here is the row function:

row({Name, Desc, Vers, Pid, Type}, State) ->
    {AppState, PidStr} =
        case Pid of
            not_running ->
                {"Loaded", "-"};
            undefined ->
                {"Started", "-"};
            Pid ->
                [_, Mdl, End] = string:tokens(erlang:pid_to_list(Pid), "."),
                {"Started", lists:concat(["<0.", Mdl, ".", End])}
        end,
    {ok, {Name, Desc, Vers, AppState, Type, PidStr}, State}.

As mentioned before I wouldn’t make any assumptions about the data structure I get in the function header but this time I happen to know ;). Most of this should be self-explanatory but a note about lines 9-10: if you retrieve a pid (as a pid type) from a remote node, the pid reference on the local node will indicate that it isn’t a local pid by setting the first identifier to some number (other than 0). This is useful but ugly because in this case you want to make it seem as if you are on the other node showing its data. One hack to overcome this is to make the pid into a string on the remote side when you ask for it; this will return a string based on a local pid and thus the local reference format. Another hack is to get the remote pid, make it into a string, strip away the “remote” part of it and replace it with a “local” part. The latter is what I’m doing here. The return of this function is a tuple with a value for each column in the table, in our case exactly six since we have six columns.
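The pid rewriting on lines 9-10 can be tried on its own. The sketch below pulls it out into a made-up module pid_demo that works on the string form directly, so it can be tested without a second node; the example pid string is invented.

```erlang
%% pid_demo.erl (hypothetical module, for illustration only)
-module(pid_demo).
-export([localize/1]).

%% "<6234.45.0>" is split on "." into ["<6234", "45", "0>"]; the first
%% token (the "remote" part) is dropped and replaced with "<0.".
localize(PidStr) ->
    [_, Mdl, End] = string:tokens(PidStr, "."),
    lists:concat(["<0.", Mdl, ".", End]).
```

For example, pid_demo:localize("<6234.45.0>") gives "<0.45.0>", which is how the pid would be printed on the remote node itself.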

The final part is to provide the collector module, call it entop_application_collector.erl:

%% entop_application_collector.erl
-module(entop_application_collector).
-export([get_data/0]).

get_data() ->
    HeaderProplist = [{machine_uptime, os:cmd("uptime")} |
                      erlang:memory([system, processes, atom, binary, code, ets])],

    AppInfo = application:info(),
    Loaded = proplists:get_value(loaded, AppInfo),
    Running = proplists:get_value(running, AppInfo),
    Started = proplists:get_value(started, AppInfo),

    MapFun = fun({Name, Desc, Vers}, Acc) ->
                      AppPid = proplists:get_value(Name, Running, not_running),
                      StartType = proplists:get_value(Name, Started, not_running),
                      [ {Name, Desc, Vers, AppPid, StartType} | Acc ]
                  end,

    Applications = lists:foldl(MapFun, [], Loaded),

    {ok, HeaderProplist, Applications}.

This function first produces the proplist that is passed to the header/2 function (lines 6-7) and then produces a list of tuples; row/2 will be applied to every element in that list.

One last step: specify the callbacks. Currently there is no easier way to do this but it will probably be part of future versions of the tool. To specify the callbacks edit the file entop.hrl and change the state record definition so that the field callback = entop_application_format and the field remote_module = entop_application_collector.

If you did it all right, it should look something like this:

If you want to continue experimenting try adding a “Crash’n Recovery” notification which says how many times an application has crashed and restarted during the time it has been monitored (Tip: Use the state to monitor the pids).

Good luck, have fun

Categories: entop

Renaming ntop to entop

2010/08/16 1 comment

So the ntop application I released yesterday has received pretty nice feedback from friends and unknowns but apparently there is another ntop application out there; the “other” ntop is “Network Top” and is found at www.ntop.org. So I’m changing the name of my ntop to entop! It is a corny name… I know… but I liked the name ntop so this will just have to work + Google doesn’t show anything software related when searching for entop (except for some Finnish stuff which doesn’t look software related :D) so I just picked that name.

So… ntop is now entop which stands for “Erlang Node top”. Enjoy it here:

http://github.com/mazenharake/entop

Also; here is a new screenshot! It looks exactly the same as the previous one… but says entop :P

Categories: cecho, entop, Erlang, ntop, Software