From Cairo to London to Göteborg.

2011/02/10 4 comments

My 2½ years in Cairo came to a sudden end. When the protests began we simply stayed in, thinking that maybe it would be better tomorrow. I called my wife early that Friday morning to tell her that my Vodafone account was blocked and that my Mobinil account was probably going to get blocked as well. I told her to call my parents to let them know I’d stay in the apartment all the time and that they shouldn’t worry; it was “just going to be a demonstration today”. I had even said to my colleagues “I’ll see you on Sunday” (a working day in Egypt).

When they cut the international calls, the mobile network and the Internet it started getting serious. When we saw burning cars and buildings, chanting and violence/looting we started to get scared. It wasn’t a nice end to an otherwise great part of my life. From the 19th floor apartment we could see far into the city and the end of the 6th of October bridge which leads into Tahrir square, and we saw fires burning and smoke coming from at least a dozen places.

At around 00:30 local time Friday, I managed to get an international call through to my parents. After a discussion with my father we decided I’d stay put and see how the next day went. Everywhere the same message came across: “Saturday is key… if Saturday is calm it will be ok!”. As it turned out later, neither Saturday nor the rest of the week was “calm”. The next day I went early to the shop to buy some food, preparing to stay in for a few days: bread and butter and cheese and other basic stuff like pasta and chicken and nuts and of course water. When I got into the shop it was like the opening of IKEA in Saudi Arabia… The whole of Zamalek seemed to have had the same idea as me… “just in case”. When I got there the shop was trying to bake bread as fast as people could grab it… the water shelves were completely empty, so I had to go with the small bottles I managed to get my hands on.

After some shopping I got the word that I should get out of the country. I was 3 days away from leaving anyway, as it was the end of my assignment… I couldn’t shake the idea of how bad the timing was. I even joked a bit about it with my friends, saying “They knew I was leaving, maybe they are angry because of that” and “I can’t believe the timing, couldn’t they have postponed it just one week?”. When I called the London office they quickly arranged a booking on a flight back to London through Athens. I’m glad I got through and that it worked so quickly. This was Saturday; the mobile network was jumpy but worked. I got a booking for the next day… so… no work on Sunday I guess. No proper goodbyes to friends.

That Saturday was one of the longest to date. It seemed to never end. I was sitting with some friends in their flat, and we were all discussing possible ways out… the only thing which made us hesitate was whether it was safe to get to the airport. The police had pulled back from the streets. Previously I had been worried that if I went out I would be mistaken for a protester and get the crap beaten out of me, but now it was worse. I could reason with a police officer (to some degree) and at least show that I’m Swedish, but a criminal, many of whom were rumoured to be out on the streets, I could (probably) not reason with. Stories were going around about people hijacking cars and stealing everything in them.

I decided that the next day I had to try to go to the airport no matter what. I called my driver and he hesitated, understandably, about coming across town just to pick me up and get me to the airport. After a short discussion he said he would come the next morning. He came in a taxi; he couldn’t come in his own car. The curfew had been lifted 30 minutes earlier, so we had to move quickly. On our way to the airport we went across the 6th of October bridge… Empty. All that was left was 3 burned cars on the side of the road, one of them a police transporter. The evidence on the streets showed that there had been a big battle here, the same battle I had seen on TV the day before. I was relieved it was so calm. On the way we were stopped 3 times by neighbourhood groups that had formed to protect their areas. The first one was the worst as I didn’t know what to expect from these “good guy” groups, but it went well. The bridge leading to the airport was so jammed it felt like going to work in the morning. Good. The military had a small checkpoint. Good, no problems here.

Arriving at the airport I had only 100EGP notes and some coins. The taxi driver asked for 51EGP; I gave him 71EGP, “Allah ma3ak” (God be with you), and then I left. I went with my driver into the airport; chaos. People were fighting at the ticket offices to buy tickets; they wouldn’t let anyone in to the check-in desks without a physical ticket. I didn’t have one… I tried to convince the first guard that the booking reference and the booking number I was showing him on my smartphone WAS a ticket, but he wouldn’t accept it, saying it had to be on paper. Giving up, I went to another entrance; bingo, this guy was a bit more clever. I said goodbye to my driver, put my hands in my pockets and took out all the cash I had left. He had risked something, leaving his family at home, to come and get me. I don’t know how much I gave him exactly but around 400EGP, “Ntibih 3a 7alak…” (Take care of yourself).

After getting in, the chaos was worse. People were running back and forth, virtually every flight was delayed or cancelled, 2 fights broke out at two different check-in desks, and children were crying while their mothers dragged them along to hurry somewhere. I didn’t feel scared at any point, just worried that I would have to stay the night like I had heard many others had had to. After check-in and passport security I went to my gate and bought a coffee on the way. An older woman was asking the person at the till if he accepted cards; he said no, only cash. She wanted to buy some water… “What do you need?” I said. She said she only wanted to buy a coffee and a bottle of water. I took out my wallet and looked through all the currencies I always keep as a buffer when travelling. “Here, it is on me..”, I said, giving her $10. Shit, I realized afterwards how much that actually was and that she didn’t need that much. Too late now; besides, she might need it later, so I justified it with that and forgot about it.

When the plane took off 4 hours later I realized what had happened. It was a surreal feeling; I had not really felt how bad it had been until I was sitting there about to take off. The adrenaline started pumping like mad, which it always does for me when the plane takes off. I forgot about Egypt for a few minutes, concentrating on the sound of the aeroplane engines. Thoughts like “I’m glad they don’t build those like people build software” and “I’m sure that screeching sound is normal” crossed my mind. When we reached cruising altitude the adrenaline wore off. I started feeling “safe”.

Back in London was nice. I met all my friends and colleagues and told them about my “Escape from Cairo”, repeating it several times. I was constantly watching the news… keeping myself updated, “My friends are still there”. My last day with Erlang Solutions consisted of making sure I didn’t leave any stuff behind and saying goodbye to everyone. It was time for me to move back “Home” and start building a life with my wife. Talk about timing; the Jan 25 uprising in Cairo coincided with my last days of 5 years with Erlang Solutions. I couldn’t help thinking that it would be a good story to tell my grandchildren some day.

When I landed at Göteborg it felt as if I was on vacation, as if I was going back to Cairo or London soon. It has now sunk in that I won’t. I’m now Home. Chapter 4 starts now…

/M

Categories: Personal

TopSwop

I’ve been working lately on a problem known as “topswop”. “Solving” it is not the hard part; in fact it is ridiculously easy (4 lines in Erlang). The hard part is finding a starting sequence of numbers that takes many steps to solve.

This is the problem:

Imagine you have an unordered list of the positive integers from 1 to n. All numbers are unique; no number occurs in the list twice. E.g.
[3,1,4,2,5]
Now take the first number (in this case 3) and extract that many numbers from the head of the list (in this case [3,1,4]) and then reverse them (becoming [4,1,3]) and join them with the tail again. This would produce the list [4,1,3,2,5].

Now repeat the same procedure until the number 1 is first in the list. E.g. (from the beginning):

1) [3,1,4,2,5] -> [4,1,3] [2,5]
2) [4,1,3,2,5] -> [2,3,1,4] [5]
3) [2,3,1,4,5] -> [3,2] [1,4,5]
4) [3,2,1,4,5] -> [1,2,3,4,5] <- solved

Think of it as a deck of cards facing up and you get the idea. This particular sequence took 4 iterations to solve. Even though this problem is easy to solve it is really difficult to find long sequences. The longest sequence for n = 5 is 7.

Currently I’ve gotten good results up to n = 17 but after that it is really difficult to find optimal solutions. For those of you who are interested, there is a contest running at http://azspcs.net/ (with a prize and everything :)) but I’m mostly in it for fun.

I’m looking mostly at how Erlang’s capabilities can be used wisely to solve problems such as these and I’m currently looking at a distributed genetic algorithm, but I haven’t come very far yet. If I have the time (or interest) to finish it I will post it here. If you enter the contest and use Erlang in a clever way then let me know 🙂

My solve function looks like this:

solve([1|_], N) -> N;
solve([I|_] = List, N) ->
    {L1, L2} = split(I, List),
    solve(append(reverse(L1), L2), N + 1).
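
The function above relies on split, append and reverse from the lists module being imported. As a minimal sketch (the module name topswop and the helpers steps/1, longest/1 and perms/1 are my own names for illustration, not part of the contest or of the code above), here is a self-contained version together with a brute-force search that is only feasible for very small n:

```erlang
-module(topswop).
-export([steps/1, longest/1]).

%% Number of reversals needed until 1 is at the head of the list.
steps(List) -> steps(List, 0).

steps([1|_], N) -> N;
steps([I|_] = List, N) ->
    {L1, L2} = lists:split(I, List),
    %% lists:reverse/2 reverses L1 and appends L2 in one go.
    steps(lists:reverse(L1, L2), N + 1).

%% Brute force: try every permutation of 1..N and keep the largest count.
%% The number of permutations is N!, so this is only usable for small N.
longest(N) ->
    lists:max([steps(P) || P <- perms(lists:seq(1, N))]).

perms([]) -> [[]];
perms(L) -> [[H|T] || H <- L, T <- perms(L -- [H])].
```

With this, topswop:steps([3,1,4,2,5]) gives the 4 iterations worked through above, and topswop:longest(5) gives the 7 mentioned for n = 5. The factorial blow-up is exactly why larger n needs something smarter than exhaustive search.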

Categories: Erlang

Erlang User Conference 2010 and late night ideas

Erlang User Conference; The 16th of November. The place where everyone who is serious about Erlang will be. I’ll be there… (obviously :P) If you feel like having a chat just look me up! But DON’T give me verbal bug-reports ok! promise!? good 🙂

I will also be running a tutorial on dbg (and a little about ttb); it will be the day before the conference, the 15th. You can look at what tutorials there are and the schedule for the tutorials here: http://www.erlang-factory.com/conference/testingtutorialworkshop2010 and if you haven’t registered… you are missing out.

Tomorrow I’m running a dry-run for my colleagues. I’m sure there will be good feedback.

In other news… I got the idea to hook up ttb to a sequence diagram software that then creates some DSL code which then (using a system specific generator, call it “compiler”) compiles a test case. That way one can run a trace, and if this trace is successful (everything looks as it should) one could argue that a test with the same input should always work (depending on which factors are dynamic) and thus it can be used as a system regression test. Hmmmmm….

Sounds too complicated to fly though, I’ll see what grows out of this one. I hate having hundreds of ideas and not enough time/motivation to realize many of them.

See you at the EUC! Don’t be late! 😛

Categories: Erlang

9 Erlang pitfalls you should know about

Mistakes are a pure waste of time and I believe one should invest time in doing things thoroughly rather than messily. But since much of our work requires fast delivery times we must find tools to help us minimize the mistakes we make. We need to cover as much of the “mistake spectrum” as we can and our most important tool in the end is going to be experience.

Here is a list of 9 mistakes which are easily made in Erlang that I consider to be important to know about. Some of them are really subtle in the sense that they don’t necessarily cause a problem at compile time or when tested. This makes them dangerous in live systems and they can potentially be really big time hogs, but using good tests when knowing these pitfalls should help raise confidence in the system. The order of the list below doesn’t mean anything, it just helps you count to 9 😉

1. Forgetting the new state.

Imagine you have a gen_server and that gen_server handles a few handle_calls and a state of some sort. At some point you want to manipulate the state and return it as the “new state” by returning it to the gen_server. Consider the following example:

handle_call(cmd, _From, State) ->
    ...
    NState = manipulate_state(State),
    ....
    {reply, ok, State#state{ last_update = now() }}.

The problem in this case is that it is easy to forget the new state when updating the state a second time on the last line (or several times). In the above example the state is updated with a time stamp right before returning, and this is done with the state State and not NState, so whatever manipulate_state/1 did is silently thrown away. This usually happens because one sees the two updates separately and when writing the last statement the previous update has been forgotten.

Solution: Write shorter functions and update as much of the data as you can in one place rather than in “staged” updates.
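
As a small sketch of what “one place” can look like (the module name state_demo, the counter field and update_state/1 are made up for illustration; erlang:monotonic_time/0 stands in for the time stamp), the callback below never holds more than one state variable, so there is no older one to return by mistake:

```erlang
-module(state_demo).
-export([handle_call/3]).

-record(state, {counter = 0, last_update}).

%% The callback only ever sees one state variable; all updates happen
%% inside update_state/1, so there is no stale State to return by accident.
handle_call(cmd, _From, State) ->
    {reply, ok, update_state(State)}.

%% Every field that changes for this command is set in a single expression.
update_state(State) ->
    State#state{counter = State#state.counter + 1,
                last_update = erlang:monotonic_time()}.
```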

2. Accidentally using pattern matching.

So let’s say you have a function which binds some variables in the header. Usually this is something that represents a value which is related to what the function does. Inside the function however you can have more logic which binds more variables (e.g. by using pattern matching). This can present a problem because if you accidentally use the same variable name but intended it to be a different variable then you are pattern matching and not binding. Consider:

read_update_records(Table, N) ->
    NewRecords = lists:map(fun foo/1, db:read(Table, N)),
    %% something, something...
    case db:update(DbCon, NewRecords) of
        {ok, N} ->
            log("Yay, updated ~p number of records", [N]);
        {error, Reason} ->
            log("Fail!",[]),
            exit({error, Reason})
    end.

In this example let’s say 10 records are read; they are manipulated and then written back. Assume for argument’s sake that the N returned by db:update is the number of records updated. Because N is already bound in the function head, {ok, N} is a pattern match and not a new binding. Usually the two values will be the same and thus there are no problems (N matches in all places), but that is pure luck and a somewhat weak assumption: the day fewer records are updated than were read, neither clause matches and the process crashes with a case_clause error. One might get away with this but eventually it will probably fail. I have seen this type of bug in production code, where it has lasted for months waiting for the right moment to strike; usually it comes at a time when you are celebrating your great success in creating a system with such an impressive uptime 😉

Another example is that the _Variable syntax is a valid variable and gets bound; the only difference is that the compiler doesn’t complain about them not being used so you might think they are safe. A variable _Variable should not be confused with the “don’t-care-variable”. An example:

1> _Foo = 1.
1
2> _Foo = 2.
** exception error: no match of right hand side value 2
3> _ = 1.
1
4> _ = 2.
2

Solution: Don’t give your variables too generic names; ideally the code should be “self documenting” by the use of good variable names. Also: write short functions and don’t re-use “don’t-care” variables. If you run into this bug it is usually a good indicator that your functions are too big or doing too many things.
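
A hedged sketch of the naming advice (match_demo and check/2 are hypothetical names, and the {ok, Count} result shape is the same assumption as in the example above): the first clause matches against the already bound Requested on purpose, and the second clause uses a fresh name so a differing count is handled instead of crashing.

```erlang
-module(match_demo).
-export([check/2]).

%% Requested is bound in the function head. The first case clause matches
%% against it deliberately; the second uses a fresh variable (Updated), so
%% a differing count is reported instead of causing a case_clause crash.
check(Requested, UpdateResult) ->
    case UpdateResult of
        {ok, Requested} ->
            all_updated;
        {ok, Updated} ->
            {partial, Updated, Requested};
        {error, Reason} ->
            exit({error, Reason})
    end.
```

Here match_demo:check(10, {ok, 10}) returns all_updated while match_demo:check(10, {ok, 9}) returns {partial, 9, 10}.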

3. "private property" -- "property" /= "private "

This bites people who are not used to how Erlang handles strings. As you might know, strings are just lists and assuming two lists A and B then this case can be read like this:

For each element in B, find it in A. If it exists in A then remove the first occurrence of it from A. Return the rest of A.

If we assume A = "private property" and B = "property" then this means:

1> "private property" -- "property".
"iva pert"

Solution: Remember that strings are lists… it is as simple as that.
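
If what you actually wanted was to cut a trailing substring off, a minimal sketch using only lists:suffix/2 and lists:sublist/2 could look like this (the module and function names are made up for the example):

```erlang
-module(str_demo).
-export([strip_suffix/2]).

%% Remove Suffix from the end of String if (and only if) String really
%% ends with it; otherwise return String unchanged.
strip_suffix(String, Suffix) ->
    case lists:suffix(Suffix, String) of
        true  -> lists:sublist(String, length(String) - length(Suffix));
        false -> String
    end.
```

str_demo:strip_suffix("private property", "property") returns "private ", which is what the -- expression in the title was naively expected to give.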

4. Guard tests silently fail

Exactly what the title says; be careful about this. An example:

1> F = fun(List) when length(List) > 0 -> ok; (_) -> not_ok end.
#Fun<erl_eval.6.13229925>
2> F([1,2]).
ok
3> F({1,2}).
not_ok

length({1,2}) should result in a bad argument but it doesn’t. This also affects list comprehensions and has surprised many:

1> [ X || X <- [1, 2, foo, bar, 5, 6], X rem 2 == 0 ]. 
[2,6]

Normally foo rem 2 will result in a bad arithmetic expression error but in this case the element is just skipped. This is a logical behaviour but sometimes not so obvious. I most often hear complaints about this one when a set of corrupted data is being worked on and the guards silently fail, not taking the failing piece of data into account. An example is that we want to update 10 rows in a database and we only update 9; it could turn out that one of the values we are iterating over is not an integer.

Solution: Learn it, get over it 😉
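
If you would rather crash on corrupted data than skip it, one option is to move the test out of guard context. A sketch (guard_demo, evens_lenient/1, evens_strict/1 and is_even/1 are names invented for this example): a filter that is an ordinary function call is not a guard, so its exception is not swallowed.

```erlang
-module(guard_demo).
-export([evens_lenient/1, evens_strict/1]).

%% The filter is a guard expression, so `foo rem 2` fails silently and
%% the offending element is simply skipped.
evens_lenient(List) ->
    [X || X <- List, X rem 2 == 0].

%% The filter is an ordinary function call, not a guard, so bad data
%% raises badarith instead of being dropped.
evens_strict(List) ->
    [X || X <- List, is_even(X)].

is_even(X) -> X rem 2 == 0.
```

guard_demo:evens_lenient([1, 2, foo, bar, 5, 6]) returns [2,6] as in the shell example, while guard_demo:evens_strict/1 on the same list exits with badarith.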

5. Returning arbitrary {error, Reason}

This is a very common mistake and is bound to cause problems at some point in time. This is one of the most apparent cases where the “Let it crash” philosophy applies, but still even some experienced developers fail to recognize it. Ponder the following example: assume that you have a function that does something in a database and, for the sake of argument, let’s say that you have to manipulate some data before you update the database. E.g:

update_db_value(Key, Value) ->
    case db:connect() of
        {ok, Con} ->
            NValue = term_to_special_format(Value),
            case db:write(Con, Key, NValue) of
                ok ->
                    ok;
                {error, Reason} ->
                    log(Reason),
                    db:disconnect(Con),
                    {error, unable_to_write_value}
            end;
        {error, Reason} ->
            log(Reason),
            {error, unable_to_connect}
    end.

Now there are several things to consider: are you supposed to return the errors? Will they (the errors) be understood by the layer above? Are you writing a library and thus the layer above adjusts to you? etc…

If we assume that this code sits under a management application and that this code is glue code then we could argue that returning arbitrary error messages is a bad idea. If you find yourself in a situation like this then you need to ask yourself whether you should go back to the specification and define what to do rather than just returning “error-something”.

Usually, however, the choice depends on higher layers since a crash will propagate. Tests might not be enough, for example 1) your stubs of the management application might accept the return values but the real one might not. 2) There might be too many cases where you haven’t considered returning an error tuple and your code becomes very inconsistent (sometimes crashing, sometimes not) or 3) You might be defining the wrong behaviour in code and thus there is an inconsistency between code and specification/design. In my humble opinion I think the following would be better (assuming the circumstances allow):

update_db_value(Key, Value) ->
    {ok, Con} = db:connect(),
    db:write(Con, Key, term_to_special_format(Value)),
    db:disconnect(Con).

This particular point can be a subject for endless discussion so I’ll just stop here; my point is just simply: be careful about what you return from a function because error tuples don’t just disappear.

Solution: Read this and don’t blindly return {error, Reason}

6. #record{} ambiguity in function clause and function body

When you write #record{} in a function body the compiler will replace that with a tuple of the same arity as there are number of fields in the record definition and put the different positions in the tuple to the default values which are also specified in the record definition.

-record(foo, { bar = 1, baz }).

in a function body becomes:

{foo, 1, undefined}

Now you might have expected it to be replaced the same way in a function clause… well no, not really. When the #record{} notation is used in a function clause it is replaced by a set of guards which are used to match the function clause. Any variable you bind in the function clause will however still bind to the actual tuple and the correct values. An example, consider this:

-module(foo).
-record(foo, { bar, baz }).
function() ->
     r(#foo{}),
     r(#foo{ bar = 1 }).

r(#foo{ baz = undefined } = Foo) -> io:format("1: bar == ~p~n",[Foo#foo.bar]);
r(#foo{ bar = 1 } = Foo) -> io:format("2: bar == ~p~n",[Foo#foo.bar]).

If we didn’t know better we would think the output would be

1: bar == undefined
2: bar == 1

but when we run the example it shows

1: bar == undefined
1: bar == 1

In the source code example above; line 4 is replaced with r({foo, undefined, undefined}), and logically matches the first clause of the r/1 function on line 7. However on line 5 the line will be replaced with r({foo, 1, undefined}) but it will still match on the first function clause on line 7. We would think that it should match on line 8 because of an assumption that the record in the function clause on line 7 is replaced with {foo, undefined, undefined} which is not what we were trying to match (namely {foo, 1, undefined}) thus line 7 doesn’t match and we go to line 8. So what is going on?

Well what happens is that the record, as mentioned, is replaced with different things at different places. In a function body the #record{} notation is just replaced with {foo, undefined, undefined} but in a function clause the function clause is extended with a series of guards. The guards that are specified are derived from what we wrote in code, the rest are not checked. E.g. #foo{} is replaced with guards to check that the argument passed is a tuple, that it is the same arity as specified and that the first element is the atom foo, but it doesn’t check anything about element 2 or 3. This means that if we write #foo{ baz = undefined } it will add a guard to check that baz == undefined (or rather the position known as baz). Not including baz is not the same as including it and setting the value to undefined (even though that is usually the “default” value). This means that when you run the above example the second statement doesn’t have any effect since the value in position bar does not change the matching of the function on line 7.

To show my point more clearly you can compile the module above like this:

$> erlc -S foo.erl

A part of the output file for me shows:

...
{function, r, 1, 4}.
  {label,3}.
    {func_info,{atom,foo},{atom,r},1}.
  {label,4}.
    {test,is_tuple,{f,3},[{x,0}]}.
    {test,test_arity,{f,3},[{x,0},3]}.
    {get_tuple_element,{x,0},0,{x,1}}.
    {get_tuple_element,{x,0},1,{x,2}}.
    {get_tuple_element,{x,0},2,{x,3}}.
    {test,is_eq_exact,{f,3},[{x,1},{atom,foo}]}.
    {test,is_eq_exact,{f,5},[{x,3},{atom,undefined}]}.
    {move,{atom,foo},{x,1}}.
    ...
  {label,5}.
    {test,is_eq_exact,{f,3},[{x,2},{integer,1}]}.
    {test,is_tuple,{f,6},[{x,0}]}.
    {test,test_arity,{f,6},[{x,0},3]}.
    {get_tuple_element,{x,0},0,{x,1}}.
    {get_tuple_element,{x,0},1,{x,2}}.
    {test,is_eq_exact,{f,6},[{x,1},{atom,foo}]}.
    ...

Note line 11 and 12 do not test the middle value ({x, 2}) thus this clause will match.

Solution: N/A, just learn the difference and don’t make assumptions about record values in a function clause.
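
One way to get the dispatch that was originally expected is to spell out every field the clause depends on. A small sketch (rec_demo, run/0 and the atoms first/second are invented for the example):

```erlang
-module(rec_demo).
-export([run/0]).

-record(foo, {bar, baz}).

run() ->
    [r(#foo{}), r(#foo{bar = 1})].

%% Both fields are written out, so this clause only matches a record
%% where bar AND baz are undefined.
r(#foo{bar = undefined, baz = undefined}) -> first;
%% Only bar is constrained here; baz may hold anything.
r(#foo{bar = 1}) -> second.
```

rec_demo:run() returns [first, second], because the first clause no longer accepts a record whose bar is 1.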

7. gen_server, trap_exit and terminate/2

The terminate/2 callback function in gen_server is supposed to be the opposite of the init/1 function: setup/tear down. The truth, however, is that terminate/2 doesn’t always run; there are some preconditions that we need to be aware of.

If anything happens inside the gen_server itself or it issues a stop-tuple as a return then terminate/2 will always be called, which is logical. However if it is under a supervisor the documentation says:

If the gen_server is part of a supervision tree and is ordered by its supervisor to terminate, this function will be called with Reason=shutdown if the following conditions apply:

  • the gen_server has been set to trap exit signals, and
  • the shutdown strategy as defined in the supervisor’s child specification is an integer timeout value, not brutal_kill.

So in other words (but still very similar ones): if the gen_server is shut down by its supervisor and it is not trapping exits then the terminate/2 function will not be called. This might seem strange to some because one always expects the gen_server to get a chance to “clean up” after itself, but if you think about it it is logical; if a process gets an exit signal it should die with the same reason if it didn’t trap the signal. The gen_server code has a case clause which explicitly checks for exit signals and only then allows terminate/2 to be called.

The last statement wasn’t entirely true though. The documentation further states that:

Even if the gen_server is not part of a supervision tree, this function will be called if it receives an ‘EXIT’ message from its parent. Reason will be the same as in the ‘EXIT’ message.

Note: from its parent. In other words what was written in the previous section only applies to exit messages coming from the gen_server’s parent process, which means that (if the process is supervised) the supervisor is the parent. This means that if you start a gen_server process and that process in turn starts another process (which it links to) and that second process crashes then the first gen_server process will die without calling terminate/2, unless of course it traps exits, and in that case it will only receive a message (received by handle_info/2).

All according to predictable behaviour but can be overlooked so think twice when it comes to restart strategies and trapping exits. E.g:

In the following examples I will use this module:

-module(gensrv).
-compile(export_all).

start_link(BoolFlag) -> gen_server:start_link(?MODULE, BoolFlag, []).

init(BoolFlag) ->
    process_flag(trap_exit, BoolFlag),
    {ok, undefined}.

handle_call({spawn_link, BoolFlag}, _, _) ->
    {ok, Pid} = gen_server:start_link(?MODULE, BoolFlag, []),
    {reply, {ok, Pid}, Pid}.
    
handle_info({'EXIT', _Pid, _Reason}, St) ->
    {noreply, St}.

terminate(_Reason, _St) ->
    ok.

If we use this module to first start a gen_server and then spawn a linked process under it we can observe the behaviour previously described. The example below shows a gen_server spawning another process and finally being ordered to shut down by its parent (the shell). In both processes terminate/2 is called.

> process_flag(trap_exit, true).     
false
> {ok, P1} = gensrv:start_link(true).
{ok,<0.389.0>}
> {ok, P2} = gen_server:call(P1, {spawn_link, true}).
{ok,<0.391.0>}
> exit(P1, shutdown).
true
(<0.389.0>) call gensrv:terminate(shutdown,<0.391.0>)
(<0.391.0>) call gensrv:terminate(shutdown,undefined)
> flush().
Shell got {'EXIT',<0.389.0>,shutdown}
ok

In this following scenario we start the two processes like before but the second one doesn’t trap exits (so we can kill it using reason shutdown from the shell).

> {ok, P1} = gensrv:start_link(true).
{ok,<0.407.0>}
> {ok, P2} = gen_server:call(P1, {spawn_link, false}).
{ok,<0.409.0>}
> exit(P2, shutdown).                                 
(<0.407.0>) call gensrv:handle_info({'EXIT',<0.409.0>,shutdown},<0.409.0>)
true
> exit(P1, kill).
true
> flush().
Shell got {'EXIT',<0.407.0>,killed}

Here we can see that even if we do trap exits the first process won’t shut down because it wasn’t the parent process that sent the exit signal, it was the process it spawned. Since the first process is still alive we kill it off at the end.

This third scenario shows the common misunderstanding about the terminate/2 function. In this example we start one gen_server which in turn starts another one (just like before) but this time the first one doesn’t trap exit but the second one does:

> {ok, P1} = gensrv:start_link(false).                
{ok,<0.422.0>}
> {ok, P2} = gen_server:call(P1, {spawn_link, true}). 
{ok,<0.424.0>}
> exit(P1, shutdown).
true
(<0.424.0>) call gensrv:terminate(shutdown,undefined)
> flush().
Shell got {'EXIT',<0.422.0>,shutdown}
ok

Even though both processes exit with shutdown they have different behaviour because one is trapping exits and the other one isn’t. The first process receives an exit signal (reason shutdown) from its parent (the shell) but is not trapping exits and thus just exits with the same reason. The second process gets notified that its parent (the first process) exited with reason shutdown and since it is trapping exits it calls terminate/2.

This is all logical behaviour if you consider how processes and links work in general; the only exception here is the rules added by OTP for “shut down” signals, which are really just a convention of using the shutdown reason in an ‘EXIT’ message. Clever, but it can be confusing.

So in short; always remember:

  • If a gen_server process self terminates (i.e. it returns stop or an exit occurs inside the callbacks) then terminate/2 will always be called
  • A gen_server process will not have its terminate/2 callback called if it is not trapping exits
  • If a gen_server process is not trapping exits but its child processes are; then the child processes will have their terminate/2 functions called

Solution: Always spawn processes under a supervisor. If you don’t then make sure your own “top-level” process traps exits and cleans up after itself.
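
A minimal sketch of a gen_server that does get its clean-up run when its parent tells it to shut down (the module name cleanup_srv and the idea of sending a {terminated, Reason} message to a Notify pid are only there to make the behaviour observable; they are not part of any real API):

```erlang
-module(cleanup_srv).
-behaviour(gen_server).

-export([start_link/1]).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2, terminate/2]).

%% Notify is a pid that will be told when terminate/2 has run.
start_link(Notify) ->
    gen_server:start_link(?MODULE, Notify, []).

init(Notify) ->
    %% Without this flag an exit signal from the parent kills the
    %% process directly and terminate/2 is never called.
    process_flag(trap_exit, true),
    {ok, Notify}.

handle_call(_Request, _From, Notify) -> {reply, ok, Notify}.
handle_cast(_Msg, Notify) -> {noreply, Notify}.

%% 'EXIT' messages from linked processes other than the parent end up here.
handle_info(_Info, Notify) -> {noreply, Notify}.

terminate(Reason, Notify) ->
    Notify ! {terminated, Reason},
    ok.
```

From a process that itself traps exits (like the shell sessions above), {ok, P} = cleanup_srv:start_link(self()) followed by exit(P, shutdown) should leave a {terminated, shutdown} message in the caller’s mailbox.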

8. Trying to use record_info/2 in runtime

This pitfall will appear as an error when you compile but can waste time if you don’t know the idea behind record_info/2. Since records don’t really exist, their fields don’t exist either, well… not their names anyway. Records only exist in code but not in runtime; as mentioned before the records are simply just replaced with something else (tuples and/or guards). Record fields (as seen in code) don’t exist either and are only references for the compiler to do the right thing. This means that if we specify the record -record(foo, { bar, baz }) and later use #foo{ bar = 1 } then the compiler uses the identifier bar to know in which position in the tuple it should put the value 1; the name bar is not known in runtime.

This can be tricky in the beginning because one might think that the “functions” record_info(fields, Record) -> [Field] and record_info(size, Record) -> Size can be used in runtime when they actually can not. These functions are simply replaced by the parse transformation made before compilation. In order for them to work the record has to be defined somewhere in the module (or header file) and the record name has to be given explicitly.

This example will not work:

get_record_info(RecordName) -> record_info(fields, RecordName).

because during compile time the record name is not known and therefore it can not expand to anything.

This example will work:

get_record_info() -> record_info(fields, foo).

because it will simply be replaced (according to the record definition) to:

get_record_info() -> [bar, baz].

This also means that you can not make “dynamic” records and get their field names, it has to be known at compile time.

Solution: Understand that records are not “objects” or runtime constructs; they are only syntactic sugar.
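
If you do need the field names at runtime, one workable pattern is to let the compiler expand record_info/2 once per record you care about and dispatch on the record name yourself. A sketch (recinfo_demo, fields/1 and to_proplist/1 are hypothetical names):

```erlang
-module(recinfo_demo).
-export([fields/1, to_proplist/1]).

-record(foo, {bar, baz}).

%% One clause per known record; the record name is a literal atom in each
%% record_info call, so the compiler can expand it.
fields(foo) -> record_info(fields, foo).

%% Pair the field names with the values stored in the tuple,
%% skipping the record tag in the first position.
to_proplist(#foo{} = Rec) ->
    lists:zip(record_info(fields, foo), tl(tuple_to_list(Rec))).
```

recinfo_demo:fields(foo) returns [bar,baz], and recinfo_demo:to_proplist({foo, 1, 2}) returns [{bar,1},{baz,2}].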

9. Using and/or when you mean andalso/orelse

and and or evaluate both sides before determining an expression’s truth value while andalso and orelse evaluate the left side first and, depending on its value, decide whether to evaluate the right side. These are called short-circuit expressions but actually are just acting like one would normally expect.

Example:

> true or exit(1).
** exception exit: 1
> true orelse exit(1).
true
> false and exit(1).
** exception exit: 1
> false andalso exit(1).
false

Solution: Only use and/or if there is an absolute reason to, otherwise use andalso/orelse
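
A typical place where the difference matters is a type check followed by a test that only makes sense for that type. A small sketch (the module and function names are invented for the example):

```erlang
-module(bool_demo).
-export([non_empty_list/1, non_empty_list_strict/1]).

%% andalso stops as soon as is_list/1 returns false, so length/1 is
%% never called on something that is not a list.
non_empty_list(X) ->
    is_list(X) andalso length(X) > 0.

%% and evaluates both operands, so length/1 is called even for a tuple
%% and raises badarg.
non_empty_list_strict(X) ->
    is_list(X) and (length(X) > 0).
```

bool_demo:non_empty_list({1,2}) returns false, while bool_demo:non_empty_list_strict({1,2}) exits with badarg.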

Conclusion

Test more thoroughly and don’t make too many assumptions.

Peace.

EDIT: Fixed a few mistakes and spelling errors.

Categories: Erlang

Hacking entop

2010/09/15 1 comment

entop was built to be extendible, mostly because one can’t build a monitoring tool which suits everyone and every project. In this post I’ll go through how to extend entop and how to show the information that is important to you and/or your project.

Future releases of entop will have a better way to specify which callback modules to use when starting entop. This will let you use your own modules without recompiling entop every time, and keep generic callback modules around between projects. Currently, though, you have to recompile entop, but I’m getting ahead of myself.

The UI

The entop UI (just like top) has two sections; headers and a table.

The headers consist of three parts. The first part is the static information gathered from the remote node, the kind of information that never (or very rarely) changes between polls. It is displayed on the first line and is not intended to be changed (unless you change it in the code). The second part consists of four rows of data which are intended to be customized by the callback function; we will come back to this part later. The third and last part contains some information on how the rows in the table are displayed and how long it took to fetch the data from the remote node. This is displayed on line five (last line) and is not intended to be changed (again, unless you change the code).

The table consists of two parts: columns and rows (yes, really!). Column titles show what the information in each column is supposed to show, and each row after that is the item (row/data) itself. No-brainer. Both the columns and the rows are intended to be customizable, so you can show as many columns as you want with whatever information you want.

The collector module

When entop starts up it tries to connect to the node that was given as an argument to it. If successful it will read the static data from that node and then start polling the node for the dynamic data. Before the polling starts, entop pushes the binary code of the callback module to the remote node and does so every time entop establishes a connection. Which module is used as the collector module is currently specified directly in the code, but future versions of entop will make it possible to change this through configuration or a CLI argument.

Note: Because a module is pushed and loaded on the other node it is important that the name of the module doesn’t clash with something else so be careful when naming your modules.

The collector module needs to have one function exported, get_data/0. This function is the one that will be called every time the node is polled. This function will return data that is later used by a format module (we’ll get to that module later). The get_data/0 function MUST return a tuple of three:

get_data() ->
    ...
    {ok, HeaderData, TableData}

The first element is the atom ok. The second element can be any Erlang term; it will be used later on to format the header lines, so it must be something that makes sense to the format module. The third element is a list of Erlang terms where each item in that list is data which will be used to format one row in the table. Each piece of data in the list must be something that makes sense to the format module. The default collector module in entop is currently entop_collector.erl.

Note: HeaderData can be any Erlang term and TableData must be a list of Erlang terms.

The format module

When the data has been fetched entop will call a format module to format the information it will display on the screen. To do this the format module needs to implement the init/1, header/2 and row/2 functions.

The init/1 will be called when the callback is initialized (once when the application is started and a connection was successfully established with the remote node). It is called with one argument which is the node name of the target node and must return a tuple of three:

init(Nodename) ->
    ...
    {ok, {ColumnSpec, DefaultColumn}, State}

First element is the atom ok. The second element is a tuple of two in which the first element is a column specification and the second element the default column number to sort on. The third element is the user defined state.

A column specification looks like this:

    Columns = [ {"Title", Width, Options}, {"Title2", Width, Options}, ...].

Each column is specified by a tuple of three. The first element, the title, must be a flat io list. The second element, the width, is a number which reserves that many characters to display in the column and should be larger than three, excess characters will be cut off. The third element, options, is a property list with options that can be applied to that column; currently there is only one such option: {align, ...} which specifies if the text should be aligned right or left (default left) in the column.

After the data has been retrieved from the remote node the header/2 function will be called first. The first argument to this function is the HeaderData which was returned by the get_data/0 function and the second argument is the user state data which was returned by init/1. The return value of this function must be a tuple of three as described below:

header(HeaderData, State) ->
    Line1 = "First row in the header (second after the node info)",
    Line2 = "Second row in the header ...",
    Line3 = "Third row ...",
    Line4 = "...",
    {ok, [Line1, Line2, Line3, Line4], NewState}.

The second element of the return value must be a list of length == 4. Each line in that list must be a flattened io list with no newlines, formatted exactly the way the user wishes to have it; entop will not touch it. The lines will be shown on rows 1-4 in the header section of the GUI. If a line is too long for the width of the screen it will be cut off.

Side note: I am aware that having a list with a fixed size is not very intuitive and I will probably change this to be a tuple of size four later on but currently this is because I’m lazy in the background implementation 😉

After the headers have been processed the table will be populated. For each element in the TableData list, the function row/2 will be called together with the user state.

row(TableDataElement, State) ->
    ...
    {ok, {"Col1", [{test}], atom, 1337}, NewState}.

The function must return a tuple whose size is exactly the length of the column list specified in the init/1 function (size(ReturnTuple) == length(Columns)). The values in the tuple will be used to populate each cell in the table and can be any term; whatever term is put there will be formatted, made into a string, flattened and truncated to the width of the column. If a particular line is to be skipped (not included in the table) the tuple {ok, skip, State} can be used.

Putting it together: An application viewer

So as a useful example let’s create a view that shows us all the applications which are loaded on the target node, which state they are in (started or just loaded) and whether they have a main process (i.e. the application is not a library).

I recommend starting by creating the view; this lets you think about what you want to show and how, and it gives you a mock-up of the view. Start by creating the file entop_application_format.erl (never mind the corny name, whatever floats your boat for now :))

First we need to think about which columns we want to show. My idea was that we have six columns as we want to show the following information; Name, Description, Version, State (Loaded/Started), Type (permanent/temporary) and Pid (if it has one).

Enter the module and export attribute and implement the init/1 function as follows:

%% entop_application_format.erl
-module(entop_application_format).
-export([init/1, header/2, row/2]).

init(_Node) ->
    Columns = [{"Name", 13, []},
               {"Description", 20, []},
               {"Version", 8, []},
               {"State", 8, []},
               {"Type", 12, []},
               {"Pid", 10, []}],
    {ok, {Columns, 1}, undefined}.

This specifies the six columns and their width. It also specifies that the first column is going to be the one to sort on by default and also that we don’t care about the state so we set it to undefined.

Usually I now create a dummy header/2 and row/2 function to give myself a mockup of how it will be (and perhaps to get the column width right) but I will skip that step here and go straight to implementation. Below is the header function which will be implemented:

header(HeaderProplist, State) ->
    SysMem = proplists:get_value(system, HeaderProplist),
    ProcMem = proplists:get_value(processes, HeaderProplist),
    AtomMem = proplists:get_value(atom, HeaderProplist),
    BinMem = proplists:get_value(binary, HeaderProplist),
    CodeMem = proplists:get_value(code, HeaderProplist),
    EtsMem = proplists:get_value(ets, HeaderProplist),

    Line1 = lists:concat(["System: ", SysMem, ", Process: ", ProcMem, ", Atom: ", AtomMem]),
    Line2 = lists:concat(["Binary: ", BinMem, ", Code: ", CodeMem, ", ETS: ", EtsMem]),
    Line3 = "Machine uptime: ",
    Line4 = proplists:get_value(machine_uptime, HeaderProplist),

    {ok, [Line1, Line2, Line3, Line4], State}.

To make the concept clearer I have bloated the function for readability. Lines 2-7 read various values from a proplist; this means that we expect a proplist from the get_data/0 function which we implement later. During development I don’t make any assumptions about the data structures here but rather use dummy values and come back and change them to fit the real data structure after I have implemented the collector module. Lines 9-12 format the four strings that we need and finally I return them as a list (remember: it must be a list of four!).

After the header has been implemented we need to provide a callback to fill the rows in the table. The row function will be called for every element in the TableData. Here is the row function:

row({Name, Desc, Vers, Pid, Type}, State) ->
    {AppState, PidStr} =
        case Pid of
            not_running ->
                {"Loaded", "-"};
            undefined ->
                {"Started", "-"};
            Pid ->
                [_, Mdl, End] = string:tokens(erlang:pid_to_list(Pid), "."),
                {"Started", lists:concat(["<0.", Mdl, ".", End])}
        end,
    {ok, {Name, Desc, Vers, AppState, Type, PidStr}, State}.

As mentioned before I wouldn’t normally make any assumptions about the data structure I get in the function header but this time I happen to know ;). Most of this should be self-explanatory but a note about lines 9-10: if you retrieve a pid (as a pid type) from a remote node, the pid reference on the local node will indicate that it isn’t a local pid by setting the first identifier to some number (other than 0). This is useful but ugly because in this case you want to make it seem as if you are on the other node showing its data. One hack to overcome this is to make the pid into a string on the remote side when you ask for it; this will return a string based on a local pid and thus the local reference format. Another hack is to get the remote pid, make it into a string, strip away the “remote” part of it and replace it with a “local” part. The latter is what I’m doing here. The return value of this function is a tuple with a value for each column in the table, in our case exactly six since we have six columns.

The final part is to provide the collector module, call it entop_application_collector.erl:

%% entop_application_collector.erl
-module(entop_application_collector).
-export([get_data/0]).

get_data() ->
    HeaderProplist = [{machine_uptime, os:cmd("uptime")} |
                      erlang:memory([system, processes, atom, binary, code, ets])],

    AppInfo = application:info(),
    Loaded = proplists:get_value(loaded, AppInfo),
    Running = proplists:get_value(running, AppInfo),
    Started = proplists:get_value(started, AppInfo),

    MapFun = fun({Name, Desc, Vers}, Acc) ->
                      AppPid = proplists:get_value(Name, Running, not_running),
                      StartType = proplists:get_value(Name, Started, not_running),
                      [ {Name, Desc, Vers, AppPid, StartType} | Acc ]
                  end,

    Applications = lists:foldl(MapFun, [], Loaded),

    {ok, HeaderProplist, Applications}.

This function will first produce a proplist to which the header/2 function will be applied (lines 6-7) and then produce a list of tuples over which the function row/2 will be mapped (applied to every element in that list).

One last step: specifying the callbacks. Currently there is no easier way to do this but it will probably be part of future versions of the tool. To specify the callbacks edit the file entop.hrl and change the state record definition so that the field callback = entop_application_format and the field remote_module = entop_application_collector.

If you did it all right, it should look something like this:

If you want to continue experimenting try adding a “Crash’n Recovery” notification which says how many times an application has crashed and restarted during the time it has been monitored (Tip: Use the state to monitor the pids).

Good luck, have fun

Categories: entop

Renaming ntop to entop

2010/08/16 1 comment

So the ntop application I released yesterday has received pretty nice feedback from friends and unknowns but apparently there is another ntop application out there; the “other” ntop is “Network Top” and is found at www.ntop.org. So I’m changing the name of my ntop to entop! It is a corny name… I know… but I liked the name ntop so this will just have to work. Also, Google doesn’t show anything software related when searching for entop (except for some Finnish stuff which doesn’t look software related :D) so I just picked that name.

So… ntop is now entop which stands for “Erlang Node top”. Enjoy it here:

http://github.com/mazenharake/entop

Also, here is a new screenshot! It looks exactly the same as the previous one… but says entop 😛

Categories: cecho, entop, Erlang, ntop, Software

Announcing ntop – A top-like monitoring tool for Erlang nodes

2010/08/15 8 comments

The name in this post is old; the application is now named ‘entop’, just to clear that confusion.

Introducing ntop 0.0.1

ntop is a tool which aims to be similar to the unix tool ‘top’ but instead of displaying the OS processes it displays the processes (and various information) of a given Erlang node. If you don’t know what ‘top’ is then see this wikipedia page.

ntop uses cecho (must be version 0.3.0 or later)

I wasn’t too sure what information one would want other than what I put in it; I have only my own and my colleagues’ experience of what we need when monitoring systems, so to make sure that this can fit anyone I made the columns and headers customizable enough to print out different information (which might be more relevant to other people). I’ll go through how to write a different version and how to extend ntop in a different post.

ntop is released under the 2-clause BSD license, do whatever you want with it. Here is a screenshot of how it looks:

If you try it out then please let me know if you find it useful and if there is something missing or needed! I’m sure there are bugs as well and I’ll fix them as I go.

You can find it (and cecho) on my github page: http://github.com/mazenharake

Enjoy.

[Edit]: Doh! Forgot to give the link to my github page. 🙂

Categories: cecho, Erlang, ntop, Software

Euler 11

2010/07/08 1 comment

After packing yesterday I thought I’d spend the last half an hour solving the Euler 11 problem. I had used pen and paper in the car on the way home (I don’t drive myself 😉) to come up with a nice enough algorithm. However, no matter how I twisted and turned the problem I realized that I still need to check every possible solution. The first naive way on paper was to check every direction.

This is an example: assume we have a (zero indexed) 10×10 grid and that our position is, say, at 4,4 (in blue); then the initial idea was to check all possible directions (below in red)


x x x x x x x x x x
x x x x x x x x x x
x x x x x x x x x x
x x x x x x x x x x
x x x x x x x x x x
x x x x x x x x x x
x x x x x x x x x x
x x x x x x x x x x
x x x x x x x x x x
x x x x x x x x x x

I soon realized that the opposite of each direction is just a reiteration of something that was already calculated. E.g. 4,4 -> 7,1 is the same as 7,1 -> 4,4 . This means that it is enough to just test half of these.
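The symmetry is easy to check on a small grid. What follows is a minimal sketch and not part of my original solution; the line_product helper and the 8×8 sample grid are made up purely for illustration.

```python
def line_product(grid, y, x, dy, dx, length=4):
    """Multiply `length` cells starting at (y, x) and stepping by (dy, dx)."""
    product = 1
    for step in range(length):
        product *= grid[y + step * dy][x + step * dx]
    return product


# Hypothetical 8x8 sample grid: the cell at row y, column x holds y * 8 + x + 1.
sample = [[y * 8 + x + 1 for x in range(8)] for y in range(8)]

# Walking from row 4, column 4 up and to the right ends at row 1, column 7 ...
forward = line_product(sample, 4, 4, -1, 1)
# ... and walking back from there down and to the left covers the same cells.
backward = line_product(sample, 1, 7, 1, -1)

print(forward == backward)
```

Both walks multiply the same four cells, so one direction out of each opposite pair is enough.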


x x x x x x x x x x
x x x x x x x x x x
x x x x x x x x x x
x x x x x x x x x x
x x x x x x x x x x
x x x x x x x x x x
x x x x x x x x x x
x x x x x x x x x x
x x x x x x x x x x
x x x x x x x x x x

Now I only have to make sure I don’t go out of bounds. Since I’m traversing the grid from top left to bottom right, I want to do as few checks as possible. Three out of the four checks extend to the right of the position, so for those I first check that x <= width - 4; on top of that, the diagonal going up needs y >= 3 and the diagonal going down needs y < height - 3. The last check, straight up, only needs y >= 3. After that I simply take the max of those values.

This yields the following in python 3:

def local_max(y, x):
    llarge = drdlarge = ularge = dlularge = 0
    if x <= 16:
        # Check from position to right
        llarge = grid[y][x]*grid[y][x+1]*grid[y][x+2]*grid[y][x+3]
        if y >= 3:
            # Check from position diagonally right up
            dlularge = grid[y][x]*grid[y-1][x+1]*grid[y-2][x+2]*grid[y-3][x+3]
        if y <= 16:
            # Check from position diagonally right down
            drdlarge = grid[y][x]*grid[y+1][x+1]*grid[y+2][x+2]*grid[y+3][x+3]
    if y >= 3:
        # Check from position up
        ularge = grid[y][x]*grid[y-1][x]*grid[y-2][x]*grid[y-3][x]
    tmax = max(llarge, drdlarge, ularge, dlularge)
    # print("Local max:", tmax)
    return tmax

Then it is just a matter of iterating across the grid and getting the largest value:

import time

def traverse_grid():
    largest = 0
    for y in range(0, 20):
        for x in range(0, 20):
            candidate = local_max(y, x)
            if candidate > largest:
                largest = candidate
    return largest

if __name__ == '__main__':
    start = time.time()
    print("Answer:", traverse_grid())
    print("Completed in", str(time.time()-start), "seconds")

Two important lessons followed (This is the biggest reason to learn a language by actually USING it):

  1. On line 12 in the local_max() function I put x >= 3 by mistake. I thought everything was OK when I ran the code because I did get the right answer. However, when looking at the code while writing this blog entry (I wrote most of this yesterday evening and revised it today) I noticed that x >= 3 is wrong! Python has a “feature” which is pretty dangerous if you don’t know about it. If you have a tuple, e.g. grid = ((1,2,3),(4,5,6)), and then do grid[-1][0], the result is not a crash; it is simply 4. This is because in Python a negative index counts from the end of the sequence, so grid[-1] is the last row. Very important detail!
  2. Because I usually program Erlang I sometimes put a comma at the end of a line by mistake. On line 5 in the traverse_grid() function I put a comma by mistake after the function call. This is another “feature” in Python which means that I’m creating a one-value tuple. I was banging my head senseless trying to understand this late at night and simply extracted the first element, but I couldn’t understand why I had to. Today I asked my friend to be my rubber duck and while explaining the code I realized what I was doing. This is also important to know because it “fails” silently if you don’t realize what is going on as an Erlang programmer.
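Both gotchas can be reproduced in a few lines. This is a small sketch for illustration only; the answer() function is a made-up stand-in for a call like local_max(y, x).

```python
grid = ((1, 2, 3), (4, 5, 6))

# A negative index counts from the end, so this quietly reads the last row.
last_row_first = grid[-1][0]
print(last_row_first)  # 4

# An index past the end does raise, unlike a negative one.
try:
    grid[2][0]
except IndexError:
    print("IndexError for grid[2][0]")


def answer():
    """Hypothetical stand-in for a function returning a plain number."""
    return 42


# A stray trailing comma wraps the value in a one-element tuple.
wrapped = answer(),
plain = answer()
print(wrapped, plain)  # (42,) 42
```

Neither mistake raises an error, which is why both are easy to miss when the final answer happens to come out right.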

That’s all…

I’m off to Turkey! See you at the end of the month!

Categories: Python, Software

Project euler and Python3

I’m back. A lot of things are happening right now… I’m getting married (is one of them) so I haven’t gotten around to updating this page. Anyway, I’ll start again now.

So I’ve decided to start learning a new programming language just to learn and get different perspectives of things (keep my brain fit :P). Erlang remains my number 1 language but I decided to learn python3. The reason is not important but I just wanted to know if python was any good.

I chose python3 over python2 because I didn’t like the fact that python2 was not going to be updated after 2.7 (which actually got released recently, if I remember correctly). I am aware of the fact that much legacy code is written in python2.x but since I’m not using it commercially I don’t really care; it is for my own sake anyway.

To learn python I searched for a page with problems to solve (best way to learn a language is to use it) and I found projecteuler.net, I suggest that anyone who wants a challenge (in various levels) to go there. The page will explain more what it is but basically it presents a bunch of problems that you solve and you can solve them however you like (pen and paper even if you like that sort of things).

Currently I’m at problem 11 which reads as follows:

In the 20×20 grid below, four numbers along a diagonal line have been marked in red.

08 02 22 97 38 15 00 40 00 75 04 05 07 78 52 12 50 77 91 08
49 49 99 40 17 81 18 57 60 87 17 40 98 43 69 48 04 56 62 00
81 49 31 73 55 79 14 29 93 71 40 67 53 88 30 03 49 13 36 65
52 70 95 23 04 60 11 42 69 24 68 56 01 32 56 71 37 02 36 91
22 31 16 71 51 67 63 89 41 92 36 54 22 40 40 28 66 33 13 80
24 47 32 60 99 03 45 02 44 75 33 53 78 36 84 20 35 17 12 50
32 98 81 28 64 23 67 10 26 38 40 67 59 54 70 66 18 38 64 70
67 26 20 68 02 62 12 20 95 63 94 39 63 08 40 91 66 49 94 21
24 55 58 05 66 73 99 26 97 17 78 78 96 83 14 88 34 89 63 72
21 36 23 09 75 00 76 44 20 45 35 14 00 61 33 97 34 31 33 95
78 17 53 28 22 75 31 67 15 94 03 80 04 62 16 14 09 53 56 92
16 39 05 42 96 35 31 47 55 58 88 24 00 17 54 24 36 29 85 57
86 56 00 48 35 71 89 07 05 44 44 37 44 60 21 58 51 54 17 58
19 80 81 68 05 94 47 69 28 73 92 13 86 52 17 77 04 89 55 40
04 52 08 83 97 35 99 16 07 97 57 32 16 26 26 79 33 27 98 66
88 36 68 87 57 62 20 72 03 46 33 67 46 55 12 32 63 93 53 69
04 42 16 73 38 25 39 11 24 94 72 18 08 46 29 32 40 62 76 36
20 69 36 41 72 30 23 88 34 62 99 69 82 67 59 85 74 04 36 16
20 73 35 29 78 31 90 01 74 31 49 71 48 86 81 16 23 57 05 54
01 70 54 71 83 51 54 69 16 92 33 48 61 43 52 01 89 19 67 48

The product of these numbers is 26 × 63 × 78 × 14 = 1788696.

What is the greatest product of four adjacent numbers in any direction (up, down, left, right, or diagonally) in the 20×20 grid?

I guess I could just brute force it but I’m trying to come up with a good solution in python, we’ll see how it works out.
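For reference, a brute-force pass could look roughly like the sketch below. It is only one possible way to do it, not a finished solution: the greatest_product name and the small 5×5 grid are made up for illustration, and starting from 0 assumes the grid holds no negative numbers.

```python
def greatest_product(grid, length=4):
    """Return the largest product of `length` adjacent cells in any direction."""
    height = len(grid)
    width = len(grid[0])
    # Right, down, diagonal down-right, diagonal down-left.
    # The four opposite directions would only revisit the same lines.
    directions = ((0, 1), (1, 0), (1, 1), (1, -1))
    best = 0
    for y in range(height):
        for x in range(width):
            for dy, dx in directions:
                end_y = y + (length - 1) * dy
                end_x = x + (length - 1) * dx
                # Explicit bounds check: negative indices must not wrap around.
                if not (0 <= end_y < height and 0 <= end_x < width):
                    continue
                product = 1
                for step in range(length):
                    product *= grid[y + step * dy][x + step * dx]
                best = max(best, product)
    return best


# Hypothetical 5x5 grid with the large values placed on one diagonal.
small = [
    [1, 1, 1, 1, 1],
    [1, 9, 1, 1, 1],
    [1, 1, 8, 1, 1],
    [1, 1, 1, 7, 1],
    [1, 1, 1, 1, 6],
]

print(greatest_product(small))  # 3024, i.e. 9 * 8 * 7 * 6
```

For the real problem the 20×20 grid above would be passed in as a list of rows of integers.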

For all problems [that I manage to solve] in projecteuler I’m hoping to write solutions in Erlang, Python, C and Javascript, just to see how they are solved in different languages. The other day I saw a solution in Haskell (for problem 10) which was very elegant and opened my eyes to Haskell, but I’ll see about that (Ocaml has been proposed as well and I really would love to have the time to look at Lisp or Scheme); I simply don’t have time for everything.

Oh… and I’m getting married so I guess that will put me off for a month or two 😛

Categories: Python

Eirc – An IRC client library for Erlang

I just released eirc. An IRC client lib. I haven’t done any proper testing and no performance testing either but it works for writing simpler IRCBots which was my intention.

Next step will probably be a git commit watcher/announcer which uses eirc to announce events from a git repo. I’ll figure out how to do that later.

Btw: If anyone knows of any guides on how to work with .git or bare repos (read: how they are structured, which files say what, and how to extract various information from the binaries, if there are any), please let me know.

peace

EDIT:

The link to the library is here: http://github.com/mazenharake/eirc

Categories: eirc, Erlang, Software