Archive for the ‘Philosophy’ Category

Bridging SVN to Git

At work I have a problem: I am doing many things at the same time. Usually that is fine, but in this case it is not. I am the developer and maintainer of a few components that are simultaneously in production (where I need to fix bugs), in development (where I add enhancements) and in research (where I try out designs and concepts), and I need to maintain several versions of each component in each of these. The core problem is that I don’t have the right tools to do this effectively. Since we use SVN as our version control system, I’m somewhat limited in how I can work.

To make it a bit more annoying, I also work on two laptops. One is my personal laptop, which is more customized, has the applications I want, and lets me install whatever I want. The other is my work laptop, which must have certain applications, doesn’t let me add whatever I want, and a few times a day automatically updates Windows and reboots. My work laptop is honestly a pain in the ass sometimes.

Anyway. There are four things I would really like to achieve which I can’t when using only SVN:

  1. I want to be able to commit my work without having access to a central server
  2. I want to be able to create branches and merge between them without keeping track of when they were made or last merged, and I want to create an arbitrary number of branches to test stuff offline
  3. I want to be able to easily transfer my work from one laptop to the other without going through the central server, including a merge when I do so. E.g. if I make a small change on my personal laptop, then switch to my work laptop and do more work there without that change, and then want to bring that work back to my personal laptop, I don’t want to overwrite what I did there previously; I want to merge it!
  4. I want someone else to be able to contribute to the component, so that if they notice something while I am away and fix it, I can simply pull their fix in, without them sending me an email saying “here is the diff” (bleeeh!) and without me having to update from the SVN repo and get conflicts

That’s what I want(!) and fortunately I can achieve all this with Git (the rest that comes with it is just extra!). I should mention that there is something called git-svn, but since I don’t want to work with a whole repo, just a part of it, I didn’t want something as heavy as git-svn. Git-svn supports almost everything and tries to “merge” the Git and SVN commands, which is not what I want… I find it confusing and unnecessary.

Now the problem I had to solve was: how do I create a bridge where I can have my own structure, with a central repo for things that are ready to be committed to SVN? After a few minutes I realized that what I want is a Git repo living inside the SVN checkout, acting as the staging point between my own Git clones and SVN.

So this is a step-by-step guide to how I achieved this; maybe someone out there will find it useful.

1. Check out your directory from the SVN repo
Check out the directory you want to bridge. SVN creates one “.svn” directory in each subdirectory that is part of the SVN tree (including the top directory). This directory changes on every action that you do in that directory (and every parent directory will know about it). Git doesn’t do that; it sees things differently (fortunately! you will understand why later) and creates only one directory, “.git”, at the top level. Below is an example:

:~> svn co svn://hostname/repo/trunk/app
A     app/file1
A     app/file2
...
A     app/fileX
Checked out revision 2823.
:~>

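2. Initialise a Git repo inside the SVN checkout
OK, so after we have checked out the SVN directory, we need to initialise the Git repo inside the directory that SVN created (app in the example above), so change into that directory first; the remaining commands are all run from there.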

:~> git init
Initialized empty Git repository in .git/
:~>

3. Have SVN ignore the .git directory
Since we don’t want to track the .git directory in SVN, we have to tell SVN to ignore it. This is done by setting the svn:ignore property on the top-level directory (where the .git directory is). Here is how it is done:

:~> svn propset svn:ignore ".git" .
property 'svn:ignore' set on '.'
:~>

Note the last character in the command (the period): it sets the property on the current directory.

4. Have Git ignore the .svn directories
When we push and pull back and forth, we don’t want Git to care about the .svn directories: they change every time any of the SVN-tracked files change, and we want to be able to push/pull to the staging directory without Git getting confused. This is how it is done in Git:

:~> echo -e ".svn\n*/.svn/*" >> .git/info/exclude
:~>

A few things to note: the first pattern, .svn, ignores the top-level .svn directory, and the “*/.svn/*” pattern ignores all the other .svn subdirectories. You can add more patterns here; as an example, I use the string “.svn\n*/.svn/*\n*.beam\n*.app” to also ignore Erlang binaries and the Erlang application file (which my Makefile generates). If you are using other languages, just adjust the patterns to whatever you want to ignore. Personally I’m only interested in tracking source files.
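As a hedged aside: “echo -e” is not handled the same way by every shell, so a more portable way to append the same patterns is printf. The sketch below does that in a throwaway repository and then asks Git what it still sees as untracked. The scratch directory and the file names are made up for illustration; they are not part of my real setup.

```shell
#!/bin/sh
# Sketch only: the scratch directory and file names are hypothetical.
set -e
demo=$(mktemp -d)
cd "$demo"
git init -q .

# Fake an SVN working copy with one source file and one build artefact.
mkdir -p .svn src/.svn
touch .svn/entries src/.svn/entries src/app.erl src/app.beam

# printf writes one pattern per line, the same way in every POSIX shell.
mkdir -p .git/info
printf '%s\n' '.svn' '*/.svn/*' '*.beam' '*.app' >> .git/info/exclude

# List what Git still considers untracked (ideally only source files).
git status --porcelain --untracked-files=all
```

With these files, the only untracked entry left should be src/app.erl. As far as I know, a pattern without a slash (like .svn) already matches at every directory level in current Git versions, so the “*/.svn/*” line is harmless but probably redundant there.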

5. Add the files you want to make available to Git
Now all your files are tracked by SVN, and SVN is ignoring Git. Git, however, is not tracking anything yet. Fix this by explicitly adding all the files you want to have available in the Git “cloud”, or, if your exclude file is specific enough, do it like in this example:

:~> git add ./*
:~> git commit -m "Initial commit"
10 files changed ...
...
create mode 100644 file10
:~>

That’s it. Now you can pull this directory from wherever you want, commit to the staging directory until you are ready, and pull/push between computers. Have fun! 🙂
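To make the round trip between machines concrete, here is a minimal sketch of it. It is an illustration under assumptions, not my exact setup: two local directories stand in for the bridge (the SVN checkout with Git initialised in it) and for the clone on the second laptop, and the commit identity is a placeholder. In real use the clone source would be a path or URL that reaches the other machine.

```shell
#!/bin/sh
# Sketch only: all paths, file names and the identity are hypothetical.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid

work=$(mktemp -d)
bridge="$work/app"       # stands in for the SVN checkout with 'git init' done
laptop="$work/laptop"    # stands in for the clone on the second machine

git init -q "$bridge"
echo "v1" > "$bridge/file1"
git -C "$bridge" add file1
git -C "$bridge" commit -q -m "Initial commit"

# On the other machine: clone, work and commit locally (no server needed).
git clone -q "$bridge" "$laptop"
echo "v2" > "$laptop/file1"
git -C "$laptop" commit -q -am "Change made on the second laptop"

# Back in the bridge: pull the work in; after that it is ready for 'svn commit'.
git -C "$bridge" pull -q "$laptop"
cat "$bridge/file1"
```

The sketch pulls from the bridge side rather than pushing into it, because Git by default refuses a push into the branch that is currently checked out in a non-bare repository.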

!WARNING!
There is (what I consider a really serious) bug in Git. This is how it happens:

  1. Create two directories, e.g. foo/ and bar/, and initialise a Git repo in both
  2. Create a file “baz” in both directories, e.g. 'echo "data" > foo/baz' and 'touch bar/baz'
  3. Go to foo/, add the file to the Git repo and commit it: 'git add baz; git commit -m "whatever"'
  4. Go to bar/ and do a pull: 'git pull ../foo/'. Git will silently overwrite the file in bar/

This behaviour is utterly stupid; git should warn of existing files or, even better, try to merge the files! I haven’t found where to submit a bug report yet but I will as soon as I find out where.
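To check whether the Git version you have installed still behaves this way, the four steps above can be scripted roughly as follows. This is only a sketch in a scratch directory with a placeholder commit identity, and it deliberately claims no outcome, since the result may differ between Git versions.

```shell
#!/bin/sh
# Sketch only: reproduces the steps above in a hypothetical scratch directory.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid

work=$(mktemp -d)
cd "$work" || exit 1
mkdir foo bar
git init -q foo
git init -q bar

echo "data" > foo/baz    # the version that gets committed in foo/
touch bar/baz            # untracked, empty file with the same name in bar/

git -C foo add baz
git -C foo commit -q -m "whatever"

# Pull into bar/ and report whether Git accepted or refused it.
if git -C bar pull -q ../foo/; then
    echo "pull succeeded"
else
    echo "pull was refused"
fi

# An empty bar/baz means the local file survived; content means it was replaced.
if [ -s bar/baz ]; then
    echo "bar/baz now contains: $(cat bar/baz)"
else
    echo "bar/baz is still the empty local file"
fi
```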

Hope this is useful 🙂 Good luck!

Categories: Philosophy, Software

TDD, time wasting and Pseudo-Tests

2009/11/04

I’m probably going to be shot, stabbed, burned, drowned and buried for writing this, but I hope that I reach enough people before I go!

My friends, watch out! TDD can be a really huge timewaster.

When did you ever know exactly how you expected your application/component to behave before you started? Probably you didn’t, until you tried a few things out, sketched on some paper, thought some more, tried some other stuff, went back to paper, and so on. After that you had a perhaps clearer idea of what it was that you wanted to create. This is prototyping, and it should be the first stage in any new or enhanced application. Once this has been done and you know the properties of your system, then you can commence TDD.

If you didn’t do prototyping and you naively started writing your unit tests, then you have just wasted your time. I really don’t understand how people can argue that they should write tests before they implement anything. If you write a test and expect it to mysteriously define your application, then you are better off just using waterfall. I don’t feel very “agile” when I’m sitting there writing unit tests for something I have yet to develop, only to realize that I need to rewrite the tests every 10 minutes. ALL the tests that I write before I write code, I need to constantly rewrite… This is a black hole for time.

Instead I propose pseudo-tests.

Before I continue: some people are probably going to argue that I “didn’t do it right”, or that “it works for me, you must be doing something wrong”. Well, I don’t really care, because you probably wasted your time and didn’t know it. I tried the whole lot, and it is so annoying that I don’t understand how it has become so popular. TDD should have a big IF in front of it, like this: “IF you are going to do TDD, then make sure that you KNOW what it is you are going to develop”.

So what are pseudo-tests?

A pseudo-test is a test that doesn’t test anything; it is just used to conceptually affect the design of your code. It is like pseudo-code, but it describes how the tests will run instead of how your algorithm or code will run. This has the following benefits for your development:

  1. Pseudo-code is _very_ easy to change. You don’t have to care about the actual interfaces, compilation, running, or changing details every time your code changes.
  2. Your code takes testing into consideration. Things like get/set functions and various introspection functions are taken into account from the start, so you seldom have to add them later. In Erlang, for example, it is usual to have a start/0 function instead of start_link/0 for testing purposes.
  3. The tests become simpler, smaller and easier to understand. Why? Because you don’t have to rewrite your tests 600 times; you rewrite less often. Many people forget that tests are code as well, and rewriting something and building on top of it at a very fast pace creates bulky code.
  4. If you write enough pseudo-tests, then the step to property-based testing (using tools like QuickCheck) is not far.

Example: if we want to create a supervisor, one test could be:

test_sup() ->
  Start supervisor
  Start a child under the supervisor
  Check that the supervisor has 1 child
  Check that the child of that supervisor is alive
  Kill the child
  Check that the supervisor has 1 child
  Check that the child of that supervisor is alive
  Stop supervisor

Then during implementation you realize that it would be a good idea to kill the children when the supervisor is stopped. Instead of implementing that in your test code immediately, you simply update your pseudo-test first, adding:

test_sup() ->
  ...
  Stop supervisor
  Check that the child is dead

This is much faster and you don’t have to care about implementation details… you can get on with your coding. The test is your specification and your guide, without wasting your time, and as you can see… this isn’t very far from a property if you define it well enough. Just abstract this test to handle N children and you are good to go for a property test. When you have something substantial to test, implement your tests and run them!

NOTE however:
I am NOT saying that you shouldn’t test! That is not at all what I’m saying… you should always properly test your code. I’m just saying don’t spend all your time writing silly tests just because you are doing “TDD” or whatever… have the tests make sense! Test something substantial that makes sense!

Just a thought… Now I’m going to bed…

 

Categories: Erlang, Philosophy

in case in case this happens

2009/10/15

I truly hate nested case clauses in Erlang; they are the root of all that is unholy and should be banned. I have what some consider a fanatical principle when it comes to case clauses, and that is: never have more than one case clause per function. Sure, you will argue that it’s not that bad, that if you know what you are doing it’s fine, that sometimes you can’t avoid it, and bla bla bla!

Well guess what, it sucks and I have solid proof to back that up!

Why nested case clauses suck

Consider this code:

case foo() of
  not_foo ->
    case bar() of
      not_bar ->
        case baz() of
          not_baz ->
            {error, none};
          yes_baz ->
            baz
        end;
      yes_bar ->
        bar
    end;
  yes_foo ->
    foo
end.

And oops, you get a case_clause error in your live system… It happens every now and then, you have no idea why or when, and your tests sure aren’t finding this really nasty edge case… so what do you do now? You could try to trace foo(), bar() and baz(), but tracing in a live environment isn’t a very good idea… besides, you might have to leave your trace on for hours because the bug might not happen very often! You could argue that if you don’t know what the possible return values are, you designed it wrong… well, that is kind of the point I’m trying to get across! There are a lot of ways to “re-write” this; one example is the classic “construct a tuple” shortcut:

case {foo(), bar(), baz()} of
  {_, _, yes_baz} -> baz;
  {_, yes_bar, _} -> bar;
  {yes_foo, _, _} -> foo
end.

I deliberately put the clause that matches on the right-hand element first, to show that this shortcut really doesn’t make sense: you are running three functions (which might very well have dependencies between them) when you only want the result of one of them. For true/false functions, this is another common variant:

case (foo() orelse bar() orelse baz()) of
  true ->
    %% do something...
    ok;
  false ->
    %% do something else...
    ok
end.

The problem here is that you don’t know which one succeeded or caused the truth value. In the end I have found that breaking this down into three functions is the best:

f() ->
 case foo() of
   true -> foo;
   false -> b()
 end.

b() ->
 case bar() of
   true -> bar;
   false -> c()
 end.

...

This still gives you more readable code, and if you get an error such as a case_clause, you know where it is. You wouldn’t want to trace a live system, but imagine that you did anyway: in the nested version you would have to trace at least three calls, while in this version you can trace just one (this is still a fairly naive way of looking at it, but it is still better).

NOTE: Seriously, don’t do it though (trace a live system). I once turned on a trace in a live system and by mistake put it on a very actively used function; we had to kill the node! Being as clever as we usually are, we had set things up so that the node could safely be killed, because we had other nodes to fail over to without losing any data (hahaa!)… but in the end I was still lucky, and it was a stupid thing to do :D.

If you have one case clause per function and you have small functions, then you are much, much more likely to know where an error will end up; it becomes clear as day… You know what you called and with what arguments, so you should be able to narrow down the reasons for the error. Simply put: you can’t trace case clauses, and that means a lot.

I think I shall call this one the “single clause” philosophy (since it probably applies to if clauses as well). The fact is that it has helped me shorten my functions a lot. Probably the number one mistake that people make in functional programming is to treat functions as something other than functions: using if statements instead of pattern matching, etc. Functions should be small (maybe 5-10 lines, very rarely more) and do only one thing (unless it is a glue function whose purpose is to call other functions).

Pattern matching is the power of all righteous.

Categories: Erlang, Philosophy