Going away from bzr toward git

(This is a small rant about why I like bzr less and less and git more and more; it is only a personal experience, not a general git vs bzr comparison, so take it as such.)

Source control systems are a vital tool for any serious software project. They provide a history of the project, are invaluable for the release process, etc… When I started to develop some code outside school exercises, I wanted to learn one for my own projects.

Using svn

This was not so long ago – 3-4 years ago, and at that time, SVN was the logical choice. I wanted to use it on my machine, to keep history and to be able to go back; since I mainly code for scientific research, the history and rollback aspects were particularly important.

Using SVN did not really make sense to me at that time: using it to track other projects was of course easy (checkout, log, commit), but I could not really understand how to use it for my own projects:

  • I could not understand their concept of branches and tags. Note that I did not even know what those terms meant at that time; I did not understand why it would matter at all where I put the tags and branches, why I needed to copy things to make tags, etc… From the svn-book, it was not really clear what the difference between branches and tags was.
  • Setting up svn on one machine is awkward: why should I create a repository somewhere, and populate it from somewhere else? How should I back up the repository?
  • Going back in time is unintuitive: you have to “merge back” the revisions you want to roll back. This is really error prone.

Bzr, the first source control which made sense to me

In the end, I found it easier to just use tarballs to save the state of my projects (my projects are always quite small). Then, a bit more than two years ago, I discovered bzr (bzr-ng at that time): it was a better Arch, the SCS developed by Tom Lord for distributed development. Arch always intrigued me, but it was extremely awkward: it did not handle Windows very well, it used strange filenames, and it was invasive to the source tree. Even checking out other projects like rhythmbox was painful. bzr, on the contrary, was really simple:

  • Creating a new project? Run bzr init in the top directory, then add the code and commit. No separate directory for the db, no “bzradmin” to create the repository.
  • Branches and tags (tags came a bit later in bzr, starting at version 0.15 IIRC) were dead easy: bzr branch creates the branch, with no need for copy commands, etc… Tags are even easier.

I have used bzr ever since for all my projects; in the meantime, I have become much more involved with several open source projects, which all use svn, and I always felt svn was an inferior, more complicated tool compared to bzr. With bzr, I understood what branches could be used for, and more generally how an SCS can be helpful for development.

Since bzr was so pleasant to use, I of course wanted to use it for the projects I was involved with, so I was really excited by bzr-svn, which tracks svn repositories. Unfortunately, bzr-svn has never been a really pleasant experience. One problem was that the Python bindings for libsvn were really buggy (to the point that bzr-svn now has its own bindings). Also, it was extremely slow to import revisions, and it failed on some of the repositories I used it on. That’s how I started to look at other tools, in particular hg: hg could import svn repositories, and in my experience it was more reliable than bzr-svn. But it was not really practical for committing back to an svn repository, so I never investigated it very deeply.

Bzr annoyances

At the same time, there were some things about bzr I was never thrilled with. Two in particular:

One branch per directory

That’s a conscious design decision from the bzr developers. It makes it a bit simpler to know where you are (a branch is a path), but I find it awkward when you need to compare branches or “jump” from branch to branch. When you are deep down inside the tree of your project, comparing branches (diff, log, etc…) becomes annoying because you have to refer to branches by their path.

Revision numbers

bzr assigns each commit a revid, a unique identifier; that is what bzr deals with internally. But for most UI purposes, you deal with revnos, which are simple integers: of course, because of the distributed nature of bzr, those numbers are not unique within a repository, only within a branch. I find this extremely confusing, and it shows most clearly when comparing several branches at the same time. For example, when I have not worked on a project for a long time, I may not remember the relative state of its branches: the bzr missing command is then very useful to know which commits are unique to one branch. But the numbers mean different things in different branches, which makes them useless in that case; being useless would actually have been ok, but they are in fact very confusing.

For example, I recently went back to a branch I had not worked on for more than a month. Let’s say my current development focus is in branch A, and I want to see the status of branch B. I can use bzr missing for that purpose, and I see that 5 revisions, from 300 to 305, are missing. I then go into branch B and study the source code a bit, in particular with bzr blame. I see some code in branch B with a revision under 300, which I could not see in branch A. This was confusing: according to bzr missing, any revision before 300 is in A too, so how can bzr blame report different code in A and B for a section committed with a revno < 300? The reason is that revision 305 is actually a merge, and going through the detailed log in branch B, I can see that revision 305 contains 296.1.1, then 299.1, 299.2, 299.3 and 299.4. I can’t see how this is useful behavior. Maybe I am biased as someone doing math all day long, but having 296.1.1 after 304 does not make any sense to me. What’s the point of supposedly simple numbers when they have an arbitrary ordering, which changes depending on where you look at them? SVN revnos were already quite confusing with branches, but bzr made it worse in my opinion.

Nitpicks

There were also things which were less significant for me, but still unpleasant: bzr startup is really slow, and it is not very useful in scripts – if you want to do anything substantial, you have to study the plugin API. Also, it started to feel a bit inflexible for some things: for example, incorporating a second project also tracked by bzr into a first project is difficult (if not impossible; I never managed to do it), history-related operations are often slow, and using a lot of branches takes a lot of space unless you use a shared repository, which feels like a hack more than a real solution, etc…

(Re)-Discovering git

About the same time, I had to use git for a project I was interested in. I found it much easier to use than when I looked at it for the first time. There was no cogito anymore, and the basic commands were similar to bzr’s. I decided to give git-svn a try, and it was much faster than bzr-svn at importing some projects; the resulting repositories were also extremely small [1]. Also, although the git UI is still quite arcane, I found git itself a pleasure to use: it felt simple, because the concepts were simple – much more so than bzr’s, in fact. SHA-1 revision identifiers are not awkward, because you barely use them at the UI level (the git UI is very powerful for human-friendly revision handling: no numbers, but you can easily ask for the parent in a branch or in the DAG relative to a given revision, and you can search by committer, by a string in the commit message or the code, by date, etc…); bzr revnos feel like a hack once you are used to git. For example, wherever I am, if I want to compare branch2 to branch1, in git I can do:

git log branch1..branch2
git diff branch1..branch2
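Here is a minimal sketch of the commands above, assuming only a stock git install. The script builds a throwaway repository with two hypothetical branches (branch1, branch2) and compares them from deep inside the tree. It also shows a few of the human-oriented revision selectors mentioned earlier: the parent of a tip, a commit-message search, a content ("pickaxe") search, and author and date filters. File names, commit messages and the identity are made up for illustration.

```shell
#!/bin/sh
# Sketch: compare two diverging branches by name, from a subdirectory.
# All file names, messages and the user identity are hypothetical.
set -e

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

mkdir -p src/deep
echo "base" > src/deep/module.py
git add src
git commit -q -m "base"

git branch branch1              # branch1 stays at the base commit
git checkout -q -b branch2      # branch2 starts from the same commit
echo "feature" >> src/deep/module.py
git commit -q -am "add feature"
echo "fix" >> src/deep/module.py
git commit -q -am "fix feature"

cd src/deep                     # ranges work the same from any subdirectory
git log --oneline branch1..branch2    # commits in branch2 but not in branch1
git diff --stat branch1..branch2      # same as: git diff branch1 branch2

# Relative revision selection instead of numbers:
git log -1 --format=%s branch2~1      # first parent of branch2's tip
git log --oneline --grep=fix          # search commit messages
git log --oneline -S feature          # commits changing the count of "feature"
git log --oneline --author="Example"  # filter by author
git log --oneline --since="1 week ago"
```

Note that git diff branch1..branch2 compares the two branch tips, exactly like git diff branch1 branch2. To see only what changed on branch2 since it forked, git diff branch1...branch2 (three dots) diffs against the merge base instead.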

Also, git is scriptable, which appeals to the Unix user in me. I can understand the POV of the bzr developers concerning extensibility through plugins (it is not unlike the argument of UNIX pipes vs Windows COM extensions developed by Miguel in his Let’s make Unix not suck [2]), but in the end I prefer the git model. The bzr decision to go toward extensibility with plugins is not without merit: I think the good error reports from bzr are partly a consequence of this choice. OTOH, git messages can be cryptic; but git’s simplicity at the core level makes this much less significant than I first expected.

A key difference between git and bzr is that git is really just a content tracker. It does not track directories at all, nor does it record renames: it instead tries to detect when you rename files. I remember at least one occasion when this was mentioned on the bzr ML [3], where a bzr developer argued that bzr could do like git, while keeping explicit meta information (when you tell bzr to rename a file). One obvious drawback of explicit tracking is that bzr’s behavior differs depending on how a change was made to the tree (a patch vs a merge, for example); this is very serious in my opinion. Especially for a language like Python, where file and directory names matter, directory renames should be propagated quickly, and they can never be done lightly anyway. It also means git can be much better at dealing with renames when importing external data, merging between unrelated branches, etc… Because its rename detection algorithm is used all the time, it has to work quite well. It is a bit similar to the merge capability of distributed SCS: there is no reason for them to be inherently better at merging, but because they would be unusable without good merge tracking, it has to work reliably from the start in a DVCS. Even if, in theory, bzr could detect renames like git (in addition to its explicit rename handling), in practice it has not happened, and as far as I am aware, nobody has done any work in that direction.

There is another advantage of git I have not mentioned yet, because it has been rehashed ad nauseam and it is the most obvious one to anyone using both tools: git is incredibly fast. Many things I would never do with bzr because they would take too much time are doable with git. Sometimes git favors speed too much (in its rename detection, for example: you should really be aware of the -M and -C options of log and other history-related commands), but even when told to spend time detecting renames, it is still much faster than bzr.
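To make the rename discussion concrete, here is a small sketch using hypothetical file names in a throwaway repository. It shows that git stores a rename as a plain delete plus add, and only reports a rename or copy when detection is requested with -M or -C. Newer git releases turn rename detection on by default for diff and log, so --no-renames is used here to show the raw view.

```shell
#!/bin/sh
# Sketch: git records no rename metadata; it infers renames and copies
# from content similarity when asked. File names are hypothetical.
set -e

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

printf 'line %s\n' 1 2 3 4 5 6 7 8 9 10 > old_name.py
git add old_name.py
git commit -q -m "add module"

git mv old_name.py new_name.py    # staged as a deletion plus an addition
git commit -q -m "rename module"

cp new_name.py copy_of_module.py
git add copy_of_module.py
git commit -q -m "copy module"

# Without detection: a deletion (D) and an addition (A).
git log --no-renames --name-status --format=%s -1 HEAD~1
# With -M: the pair is reported as a rename (R100 = 100% similar).
git log -M --name-status --format=%s -1 HEAD~1
# -C -C also considers unmodified files as copy sources (C100).
git log -C -C --name-status --format=%s -1
# --follow keeps tracking one file's history across the rename.
git log --follow --format=%s -- new_name.py
```

The number after R or C is the similarity score; the threshold can be adjusted, for example with -M50%.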

Finally, git is getting a lot of traction: it is used by Linux, Xorg, Android, RoR and a lot of freedesktop projects, and it is being discussed for KDE. This means it will become even better, and that other DVCS will have a very hard time competing. As a very concrete example, git UI improvements during the last year were much more significant than bzr speed improvements. In my experience, bzr speed has not improved much since 0.92 and the pack format: long history and network latency make bzr almost unusable for big projects with a large history contributed by a large team across the world. OTOH, git 1.5.3 was the first git version I could use without hurting my head too much.

For all those reasons – simplicity of the core model, flexibility, scriptability, and speed – I think I will start to use git for all my projects, and give up on bzr. I think bzr is still superior to git for some things, and depending on the project or the tree you are tracking, bzr may be better (in particular because it tracks directories, which git does not, and this can matter; I am also not sure whether git would be appropriate for tracking /etc or your $HOME).
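The directory point can be checked in a few lines. In this sketch, which uses hypothetical directory names in a throwaway repository, an empty directory never shows up in git status. The usual workaround is to commit a placeholder file, conventionally named .gitkeep, but that name is only a community convention and not a git feature.

```shell
#!/bin/sh
# Sketch: git tracks file content, not directories, so an empty
# directory is invisible to it. Directory names are hypothetical.
set -e

repo=$(mktemp -d)
cd "$repo"
git init -q

mkdir -p empty_dir
git status --porcelain            # prints nothing: empty_dir is not reported

# Conventional workaround (not a git feature): add a placeholder file.
touch empty_dir/.gitkeep
git add empty_dir/.gitkeep
git status --porcelain            # now lists the staged empty_dir/.gitkeep
```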

[1] For every project I have imported so far, the git clone is as small as or smaller than an svn checkout. You read that right: a single revision checked out from svn is often bigger than the full git history. I have imported the full history of numpy, scipy and scikits into my github account, and I have not used much more than half of my 100 MB quota.

[2] http://primates.ximian.com/~miguel/bongo-bong.html

[3] https://lists.ubuntu.com/archives/bazaar/2007q3/028591.html


10 thoughts on “Going away from bzr toward git”

  1. (You use the non-word “ackward” several times in the article; I think you want the word “awkward” instead.)

    I agree that Bazaar currently does not interact well with Subversion. I’ve been burned several times trying to use Bazaar as a “better Subversion client”, only to have arbitrary noise appear in the repository which causes existing Subversion users of the same repository to get quite annoyed.

    Does Git do any better at this; i.e. can one use Git as a Subversion UI, without any other user of the same branches needing to know? Or does Git not have the ability to interact seamlessly with an existing Subversion branch?

    The speed increases of Bazaar over the past year (since mid-2007, say) have been quite impressive from my perspective; I’m noticing much better network performance and commit times, just for a couple of examples.

    In the end, one thing that keeps me using Bazaar and that I don’t see in other VCSes is the flexibility to support different workflows on top of the same repository, *without* needing to change the tool, or even decide ahead of time which workflow to follow: one can switch to or from centralised checkouts at will, for example. http://bazaar-vcs.org/Workflows

  2. Thanks for the spelling remark, I fixed it. I did not notice that the Firefox spell checker did not work inside the WordPress edit box.

    git-svn does not leave anything in the svn repository, but it is much simpler than bzr-svn. In particular, you can’t create svn branches and hope to ‘mirror’ them in svn (I have never tried it with bzr-svn either, but I think it is possible). Combined with rebase, this makes git a nice way to deal with patches: you create git branches, and you can manage relatively big patches against svn and update them quite easily.

    Concerning speed: my reference for saying that bzr speed has not improved much is 0.92 (when the pack format was introduced). I agree that going from pre-pack to post-pack was a drastic change in speed, especially for working trees. There are several dimensions to speed: speed wrt working trees, speed wrt history size, speed wrt network latency. Long histories are really slow in bzr (by long, I mean something like > 10000 revisions). I don’t know if you have tried bzr on the emacs or python repositories, but I would say bzr is unusable on those repositories as of today. And neither repository is very big (they have long histories, though, of the order of 100 000 revisions); this is an important point, because you commit much more often with a DVCS than with svn. But network performance has become a killer for me (I live in Japan and interact a lot with launchpad; I don’t know if that matters).

    As for flexibility, I find git much more flexible, because you can create your own model on top of git. For example, there is no need for a new concept to deal with multiple branches (no shared repository). I have never really used the different workflows (like bound checkouts, etc…). I can certainly use git for centralized development, since I use it on top of svn repositories.

  3. Hey Ondrej. I think that Stefan is getting converted too. I don’t think git is way better than mercurial: git does not work as well as hg (or bzr) on Windows, for example.

    I agree that for branch handling, speed, and advanced usage, git is clearly better than bzr ATM. The important thing is that git is not just faster and more complicated: some things are actually simpler with git than with bzr.

  4. Well, in fact, due to:

    http://code.google.com/p/msysgit/

    it actually works much better than hg, which must be used from the crappy Windows cmd.exe. But you probably meant GUI tools, which I think are so far better for hg. But I don’t want a GUI, just a terminal, and here I think git really rocks, including on Windows.

    It took me a little while to get used to remote branches:

    http://wiki.sympy.org/wiki/Git_hg_rosetta_stone#how_to_checkout_remote_branch

    but once I got it, I am really happy with git.

  5. Thanks for pointing me to this on the SciPy list. Msys-Git did work for me after uninstalling the old cygwin version I had installed. I did run into a little trouble trying to get SSH public key support working, but with a little help from #git, I got it going.

    Then I watched this tutorial: http://excess.org/article/2008/07/ogre-git-tutorial/
    And was really just blown away.
    Git really seems to be able to do just about anything you can imagine.
    And I love the rich curses gui. I was always annoyed by the choice between Tortoise*** and dumb command-line tools. I like Git’s self-contained gui much better. And staging is great. I was surprised at first that you had to “add” things just to commit them. But so many times with SVN or Hg I’ll find I want to do a commit — except for this one file, etc. Or I’ll accidentally check in a bunch of files that I forgot that I had changed. And the per-hunk commits — wow. Just wow. The other tools aren’t even dreaming about features like that yet.

    I think Git does itself a disservice by promoting itself as merely “fast”. I always got the impression from that that it was “fast (but clunky to use and not very feature-rich)”. And that’s just not the case.

  6. @Ondrej:

    I work with some Windows tools that only work properly from cmd.exe, so working properly with cmd.exe is important to me. Fortunately msys-Git can do it either way.

    Also the Git gui actually works fine on Windows.

    I tried the hg Tortoise replacement and just couldn’t figure it out. It certainly didn’t bear much resemblance to TortoiseBzr or TortoiseSVN. Since then I haven’t used any gui for Hg. Perhaps Hg’s gui just needs an excellent video tutorial like the one for Git in the link I posted above.

    And after watching that video I understand how the Git gui works and it’s really just great. But what’s also great is how much you can do via interactive command-line commands with git.

    Small rant: Personally I wish people would just cut it out with the Tortoise*** shell extensions already. I really don’t need to see the SVN status of my files all the time. And having to right-click and dig down a few menu levels to do anything is *TERRIBLE UI*. It also makes Explorer slow (especially TortoiseBzr — it makes Explorer basically unusable on a repo with a few hundred files). The icon caching processes they use also create locks to files which makes it impossible to safely eject removable hard drives. Very annoying.

    They should just make the Tortoise functionality available as a stand-alone special file browser, so that when I want to see that stuff I can have it, but when I’m done it can get out of my way. Kinda like how WinCVS used to work, though they had kind of a weird browser interface that was slightly different from the way Explorer works for no apparent reason.

    Anyway, I think I really like Git gui. It seems to show exactly the info you need in order to make decisions about what to commit and what to put in the log message. Really seems to be designed by someone who knew what they were doing in terms of workflow.

  7. Hi Bill,

    Yes, git blows away the competition on many fronts, and I agree that git is not just faster. Now that I am familiar with git, I will never go back to bzr for most things (except for handling documents, etc…, because it feels a bit safer in those cases; I don’t push them publicly). Even if bzr were faster than git, I would prefer git. The staging area takes some time to get used to, but it is very helpful, particularly for merging.

    I am still not satisfied by the remote handling. It is still too low level, and I keep getting some strange errors about refs which are not helpful at all.

    Apart from that, git-svn has been a big time saver for me on numpy/scipy. I think I would have burned out without it.

  8. @david

    I would like to convert an SVN repo to git. I don’t really care about keeping in sync after that. Is git-svn the right tool to use even in that case?

    • It depends – if you have access to the svn dump, then git fast-import is the solution: http://www.kernel.org/pub/software/scm/git/docs/git-fast-import.html. I would guess that in that case, your git repository will have tags, and branches as git branches (tags are branches in git-svn, because of svn’s brain-damaged tag implementation).

      git-svn does not give you the whole power of git (for branches, etc…, because you can’t push git concepts back to the svn repo).
