Ian Bicking: the old part of his blog

Re: Distributed vs. Centralized Version Control

I don't think current centralized SCMs can accommodate large numbers of development branches. The problem comes when mainline development activity is merged into those branches to produce a more up-to-date branch. In CVS and svn, such a merge duplicates the mainline changes--even changes to unmodified files--into the back-end storage system. So, a hundred actively maintained development branches means your mainline development is consuming a hundred times as much space. I've brought up cleverer ways of representing big merges in Subversion, but there hasn't been a whole lot of interest so far. For the most part, projects seem happy storing patches in an issue-tracking system to represent small changes--not a very elegant solution, but one which eliminates 95% of the efficiency problem with branches.
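A rough back-of-the-envelope sketch of that multiplication, in Python. The function name and the byte figures are made up for illustration; it only assumes, as described above, that each branch re-stores every mainline delta it merges.

```python
def backend_delta_bytes(mainline_delta_bytes: int, branches: int) -> int:
    """Estimate bytes stored for mainline changes when every branch
    merges them and the back end stores each merged copy separately.

    Hypothetical model: one copy on the mainline itself, plus one
    duplicate per branch that merged the changes.
    """
    duplicated = mainline_delta_bytes * branches
    return mainline_delta_bytes + duplicated


# 10 MB of mainline deltas merged into 100 active branches.
total = backend_delta_bytes(10_000_000, branches=100)
print(total / 10_000_000)  # storage relative to the mainline alone
```

Under this model the duplicates alone cost a hundred times the mainline's own deltas, which is the blowup in question.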

The distributed SCM answer is to use a hundred times as many hard drives to hold the data. That works fine, but the perception of distributed functionality as an anti-feature is not unique to you: other projects, like gcc, would much rather see development go on "in the open" in the central repository, rather than off to the side where it's harder to find.

Comment on Distributed vs. Centralized Version Control
by Greg Hudson


I just stumbled across this thread again, and noticed that something appears to have gone uncorrected here. Subversion does do branching cheaply. The whole tree is not copied into the branch: Subversion internally stores what amounts to a symbolic link.

It may be the case that if you check out from the top of the tree, you'll get multiple copies of the files. I'm just pointing out how the repository itself works. (This is all described in the svn book.)

# Kevin Dangoor

I don't think that's the duplication Greg is talking about. What he's referring to is this sequence:

  1. Branch trunk -> foo (cheap, no dup files)
  2. hack hack hack on trunk (modify 20 files)
  3. Merge trunk changes to foo (oops)

At stage 3, we've now got 20 modified files in foo, for which we are storing diffs in both the trunk and in foo. Instead, the modified trunk files could be "re-branched". You'd still need a new revision in the branch, obviously, but it would just be with the new pointers. This is probably yet another issue that really needs true branch and merge history tracking.
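To make the difference concrete, here is a toy model in Python. It is not how Subversion stores data: `ToyRepo`, `merge_by_copy`, and `merge_by_repointing` are hypothetical names, whole-file blobs stand in for diffs, and it assumes foo has not touched the merged files itself, so re-pointing is safe.

```python
class ToyRepo:
    """Toy store: a branch maps paths to blob ids; each blob is stored once."""

    def __init__(self):
        self.blobs = {}      # blob id -> content
        self.branches = {}   # branch name -> {path: blob id}
        self._next_id = 0

    def _store(self, content):
        blob_id = self._next_id
        self._next_id += 1
        self.blobs[blob_id] = content
        return blob_id

    def commit(self, branch, path, content):
        self.branches.setdefault(branch, {})[path] = self._store(content)

    def branch(self, src, dst):
        # Cheap copy: only the pointers are duplicated, not the blobs.
        self.branches[dst] = dict(self.branches[src])

    def merge_by_copy(self, src, dst, paths):
        # Stage 3 as described: the branch stores its own copy of each change.
        for p in paths:
            content = self.blobs[self.branches[src][p]]
            self.branches[dst][p] = self._store(content)

    def merge_by_repointing(self, src, dst, paths):
        # The "re-branch" idea: the branch just points at the trunk's blobs.
        for p in paths:
            self.branches[dst][p] = self.branches[src][p]


def blobs_after_scenario(merge_method_name, n_files=20):
    """Run the three-step sequence and return how many blobs are stored."""
    repo = ToyRepo()
    paths = [f"file{i}.txt" for i in range(n_files)]
    for p in paths:
        repo.commit("trunk", p, f"v1 of {p}")
    repo.branch("trunk", "foo")                  # 1. branch trunk -> foo
    for p in paths:
        repo.commit("trunk", p, f"v2 of {p}")    # 2. hack on trunk
    getattr(repo, merge_method_name)("trunk", "foo", paths)  # 3. merge
    return len(repo.blobs)


print(blobs_after_scenario("merge_by_copy"))        # 60
print(blobs_after_scenario("merge_by_repointing"))  # 40
```

In both cases foo ends up with the same content; the copying merge simply stores the 20 trunk changes a second time.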

# Steve Greenland