rjbs (4671)

I'm a Perl coder living in Bethlehem, PA and working in Philadelphia. I'm a philosopher and theologian by training, but I was shocked to learn upon my graduation that these skills don't have many associated careers. Now I write code.

Journal of rjbs (4671)

Monday September 10, 2007
08:37 PM

splitting up a "big" git repo, take two

[ #34411 ]

Converting to git has been fun, educational, and annoying, at various times. Here are some notes (mostly to myself) on the fun, educational, and annoying bits that I solved today.

When I converted my personal Subversion repository (notes on my RPG, talks, letters, config files) to Git, I just turned one big Subversion repo into one big Git repo. It was about 400 megabytes, in Git.

I decided, a little later, that there was no reason that all my huge Keynote slides needed to be in the same place as my letters to my grandmother or my notes on my RPGs. A little digging made it seem like this would work just fine:

$ git filter-branch --subdirectory-filter letters HEAD

With that, my history would be rewritten to include only changes in the letters directory, which would become the repository root. When I ran it, though, I found that every file was still in my working copy, listed as new or modified. That was easy to deal with, too:

$ git reset --hard
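If you'd like to see what --subdirectory-filter does before pointing it at a real repository, here's a minimal sketch that builds a throwaway repo with made-up letters/ and talks/ directories (the file names are hypothetical too) and splits out letters/. It assumes a reasonably recent git; FILTER_BRANCH_SQUELCH_WARNING only silences the warning, and the short pause, that newer versions print before running filter-branch.

```shell
#!/bin/sh
# Sketch: try --subdirectory-filter on a throwaway repository.
# The layout (letters/, talks/) and file names are made up for illustration.
set -e
export FILTER_BRANCH_SQUELCH_WARNING=1   # newer gits warn and pause otherwise

repo=$(mktemp -d)
cd "$repo"
git init -q
mkdir letters talks
echo "Dear Grandma," > letters/grandma.txt
echo "slides" > talks/keynote.txt
git add .
git -c user.name=demo -c user.email=demo@example.com commit -q -m "initial import"

# Rewrite history so that letters/ becomes the repository root.
git filter-branch --subdirectory-filter letters HEAD >/dev/null
# Make sure the working copy matches the rewritten HEAD.
git reset --hard -q

ls   # only grandma.txt should remain, now at the top level
```

The original, unfiltered history isn't gone yet: filter-branch keeps a backup of the old branch under refs/original/.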

Great! Now I had only my fifteen letters with twenty total revisions. Only one small problem:

$ du -sh ../letters
381M    ../letters

$ git gc && git prune
$ du -sh ../letters
381M    ../letters

I couldn't figure out what the hell was going on, mostly because I didn't know much about how git stored things. In particular, I didn't realize that when you clone a git repository from one directory to another on the same filesystem, git clone hardlinks the object files instead of copying them. Then, when cleaning up, git sees that those files are in use and doesn't purge them. They weren't actually taking up any additional space (du simply counts every hardlinked file it finds), and once I had removed the original, big repository that I'd cloned from, a cleanup would have worked.
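The du confusion above doesn't need git to reproduce. Here's a small sketch using plain files: "big" and "clone" are made-up directory names standing in for the two repositories, and a 1 MiB file of random bytes stands in for a pack file. Measured on its own, the hardlinked copy looks full-sized; measured alongside the original in one du invocation, the shared inode is counted only once.

```shell
#!/bin/sh
# Sketch: why du reports the full size of a hardlinked copy.
# "big", "clone", and "pack" are hypothetical stand-ins for the repos and an object file.
set -e
tmp=$(mktemp -d)
mkdir "$tmp/big" "$tmp/clone"
head -c 1048576 /dev/urandom > "$tmp/big/pack"   # 1 MiB of incompressible data
ln "$tmp/big/pack" "$tmp/clone/pack"             # second name for the same inode

# Measured alone, the clone appears to hold the whole megabyte...
clone_alone=$(du -sk "$tmp/clone" | cut -f1)
# ...but in a single du run over both, the shared inode is counted once.
both=$(du -sk "$tmp/big" "$tmp/clone" | awk '{ total += $1 } END { print total }')

echo "clone alone: ${clone_alone}K, big + clone: ${both}K"
find "$tmp/clone" -type f -links +1   # files whose link count is greater than 1
```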

I only realized this later, while reading the docs to solve another problem. By then I had long since removed the trimmed-down repositories, because I thought they were taking up too much space. To avoid the problem from the start, I did something like this:

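# Assumes the original, unsplit repository is in ./big (a placeholder name).
for dir in talks/exporter rpg/deliverance letters; do
  (
    name=$(basename "$dir")
    git clone big "$name"                       # local clone: objects hardlinked
    cd "$name"
    git filter-branch --subdirectory-filter "$dir" HEAD
    git reset --hard
    cd ..
    git clone --no-hardlinks "$name" "_$name"   # independent copy of the objects
    cd "_$name"
    git gc --aggressive
    git prune
  )
done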

The important thing here is the --no-hardlinks option to git clone. It prevents hardlinking, copying the object files instead. With that done, the gc and prune commands can work as I had expected, removing all the objects not used in this particular repo.
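To check whether a clone actually shares object files with its source, you can count the files under .git/objects whose link count is above one. This is a minimal sketch using a throwaway repository (the src, linked, and copied directory names are made up); on a filesystem that supports hardlinks, I'd expect the default local clone to report some shared objects and the --no-hardlinks clone to report none.

```shell
#!/bin/sh
# Sketch: compare a default local clone with a --no-hardlinks clone.
# The throwaway repository "src" is hypothetical; any local repo would do.
set -e
tmp=$(mktemp -d)
git init -q "$tmp/src"
echo "hello" > "$tmp/src/file.txt"
git -C "$tmp/src" add file.txt
git -C "$tmp/src" -c user.name=demo -c user.email=demo@example.com commit -q -m "one commit"

git clone -q "$tmp/src" "$tmp/linked"                  # local clone: objects hardlinked
git clone -q --no-hardlinks "$tmp/src" "$tmp/copied"   # local clone: objects copied

# Count object files that share their inode with another name.
linked=$(find "$tmp/linked/.git/objects" -type f -links +1 | wc -l | tr -d ' ')
copied=$(find "$tmp/copied/.git/objects" -type f -links +1 | wc -l | tr -d ' ')
echo "hardlinked objects: default=$linked, --no-hardlinks=$copied"
```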

More and more, I find that git is a really well-designed system. I wish I had a reason to learn much more about it!

  • Thanks for the write-up: I was trying almost the same thing! I had imported a pretty big repository (made up of various subprojects) to git, which is sorta what you described a while ago. I had also renamed some directories along the way, so when I tried to split them up with git-filter-branch, it failed to collect all the revisions needed.

    I ended up writing two helper scripts to solve my problem, and I wish I hadn't glossed over --no-hardlinks, because that would have saved quite a bit of time.