Backup and Synchronization with Git
Posted by Ryan Coyner on November 24, 2008 under Git - 3 Comments
I have a desktop and a laptop both running Arch Linux. The desktop used to run Ubuntu, but I was a little frustrated with its bloat and complexity. Arch is faster, leaner and easier to configure thanks to its simple model. Being a power user, it didn't make sense for me to keep Ubuntu.
I've started to use my desktop more regularly since installing Arch on it. This eventually became a problem because when I am on my desktop there is bound to be a file on my laptop that I desperately need, and vice versa. No, not porn. Usually homework assignments or a critical piece of a project I'm working on.
I was determined to solve this problem because it hindered my productivity significantly. The first solution that came to mind was rsync. I did a little reading up on it and concluded that it wasn't the tool for me. For synchronization with rsync to be practical, you have to synchronize at the end of every session, because rsync doesn't merge conflicting changes the way a VCS does. When I'm working from my laptop I'm not guaranteed to be connected to the net, so I can't always complete the synchronization. If I skip it at the end of a session, rsync can't combine the separate modifications I make to a single file on different machines when I synchronize later.
So why the hell not just use a VCS? There is no reason not to, so that's exactly what I did - I put my entire home directory into a Git repository. Not only does it give me fast and reliable synchronization, I also get version control. Sweet.
Setting Up a Remote Repository
I hosted the repository on my server and set up a remote repository:
$ mkdir /var/git/ryan.git
$ cd /var/git/ryan.git
$ git --bare init
Depending on your system setup you may have to change some ownerships and privileges.
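As a rough sketch of what that can look like, the function below creates a bare repository and restricts it to its owner. The function name create_bare_repo and the choice of owner-only permissions are assumptions for illustration; your server may need group access or a chown to the SSH user instead.

```shell
#!/bin/sh
# Sketch: create a bare repository and restrict it to its owner.
# create_bare_repo is a hypothetical helper, not part of Git.
create_bare_repo() {
    repo="$1"
    mkdir -p "$repo" || return 1
    git init --bare --quiet "$repo" || return 1
    # Assumption: only the owning user should read or write the repository.
    # X adds execute permission to directories (and already-executable files).
    chmod -R u+rwX,go-rwx "$repo"
}

# Example (may need root, or a chown to your SSH user afterwards):
# create_bare_repo /var/git/ryan.git
```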
Adding Only Necessary Files
I initialized an empty repository in the home directory of one of my machines:
$ git init
And added the remote repository:
$ git remote add origin ssh://username@serveraddress:port/var/git/ryan.git
At this point I had to be careful as to what I added. I didn't want to add any directories that are already under version control because that would be redundant. The directory containing all my music was also excluded; otherwise the repository would have become insanely large and exploded. I also had to be careful as to which hidden files I added. Some hidden files contain hardware information, cached information, or information specific to a machine that I didn't want to synchronize. I listed those files and directories in .gitignore:
.Xauthority
.bash_history
.config/epdfview/
.config/gtk-2.0/
.easytag/
.dbus/
.fehbg
.fontconfig/
.gegl-0.0/
.gimp-2.6/
.java/
.lesshst
.local/
.macromedia/
.mozilla/
.mpd/
.openoffice.org/
.opera/
.recently-used
.recently-used.xbel
.ssh/
.subversion/
.thumbnails/
.vimbackup/
.viminfo
/media/music/
/projects
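Before adding everything, it can help to ask Git which paths the ignore rules actually cover. The two helpers below are a minimal sketch; is_ignored and list_ignored are made-up names, and they assume a Git version that ships git check-ignore and git status --ignored.

```shell
#!/bin/sh
# Sketch: check the ignore rules before running `git add .`.
# Run from the top of the repository (the home directory in this post).

# Exit status 0 means the path matches an ignore rule.
is_ignored() {
    git check-ignore -q "$1"
}

# Print every ignored path that currently exists; ignored entries
# are marked with "!!" in the short status format.
list_ignored() {
    git status --short --ignored | grep '^!!'
}
```

For example, is_ignored .viminfo && echo "skipped" prints "skipped" when .viminfo is listed in .gitignore.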
Now add the entire home directory and commit:
$ git add .
$ git commit
To make committing simple, rename .git/hooks/prepare-commit-msg.sample to .git/hooks/prepare-commit-msg and append this at the end of the file:
NODE=`uname -n`
sed -i "1 s/^/$NODE/" "$1"
This will put the machine name at the beginning of the commit message. I'm too lazy to write a proper commit message each time just for synchronization, and you can't commit without writing a message. This lets me commit without having to type a single letter.
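If you would rather not edit the sample file, the same idea can be written as a standalone hook. This is a sketch rather than the setup described above: the function name prefix_with_hostname is made up, and it goes through a temporary file instead of sed -i, since sed -i takes different arguments on non-GNU systems.

```shell
#!/bin/sh
# .git/hooks/prepare-commit-msg (standalone sketch)
# Git passes the path of the commit message file as the first argument.

prefix_with_hostname() {
    msg_file="$1"
    node=$(uname -n)
    tmp=$(mktemp) || return 1
    # Write the machine name, then the original message, so the name
    # lands at the start of the first line.
    { printf '%s' "$node"; cat "$msg_file"; } > "$tmp" && cat "$tmp" > "$msg_file"
    rm -f "$tmp"
}

# Only act when Git actually supplied a message file.
if [ -n "${1:-}" ]; then
    prefix_with_hostname "$1"
fi
```

The hook file has to be executable (chmod +x .git/hooks/prepare-commit-msg) or Git will skip it.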
Synchronization
Once it's been committed, push to the repository:
$ git push origin master
From the other machine, clone the repository:
$ git clone ssh://username@serveraddress:port/var/git/ryan.git
If you've already cloned it and just want to update it, pull instead:
$ git pull origin master
I've used this setup for about two weeks now and it's been working great. I've been consistent about pushing at the end of my sessions and pulling at the beginning of my sessions so I haven't faced any major file conflicts yet.
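The pull-at-the-start, push-at-the-end routine can be wrapped in two small functions. This is only a sketch: sync_start and sync_end are hypothetical names, the remote and branch are assumed to be origin and master as above, and the commit message is passed with -m instead of relying on the prepare-commit-msg hook.

```shell
#!/bin/sh
# Sketch of the session routine; run from the home directory.
# Assumes the remote is named origin and the branch is master.

sync_start() {
    # Beginning of a session: fetch and merge what the other machine pushed.
    git pull origin master
}

sync_end() {
    # End of a session: stage new, changed and deleted files.
    git add -A . || return 1
    # `git diff --cached --quiet` exits 0 when nothing is staged,
    # so commit only when there is something to record.
    if ! git diff --cached --quiet; then
        git commit --quiet -m "$(uname -n) sync" || return 1
    fi
    git push origin master
}
```

If the prepare-commit-msg hook is installed as well, it should still run for commits made with -m and prefix the machine name a second time, so you would probably keep only one of the two.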
Pros
- Automatic Redundant Backup - Not only do you have your files on multiple machines, you also have a repository you can check out from at any time. This works great if you have a server. I haven't touched my external hard drive since I set this up.
- Version Control - Made a mistake? Roll back. Modified the same file from both machines? Conflict handling. All the tools of Git are at your disposal.
- Fast - Git is really fast and has excellent compression.
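The conflict handling mentioned under Version Control can be tried out safely in a throwaway directory. The demo below is a sketch under a few assumptions: the directory names desktop and laptop, the file notes.txt and the demo identity are all made up, and the two machines edit different lines of the file, so Git can merge them without asking for help.

```shell
#!/bin/sh
# Demo: two clones edit different lines of one file, and a pull merges both.
# Everything happens in a temporary directory; nothing touches your home.
merge_demo() (
    set -e
    export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
    export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
    work=$(mktemp -d)
    git init --bare --quiet "$work/remote.git"
    git -C "$work/remote.git" symbolic-ref HEAD refs/heads/master

    # Desktop: create a five-line file and push it.
    mkdir "$work/desktop" && cd "$work/desktop"
    git init --quiet && git symbolic-ref HEAD refs/heads/master
    git remote add origin "$work/remote.git"
    printf 'alpha\nbeta\ngamma\ndelta\nepsilon\n' > notes.txt
    git add notes.txt && git commit --quiet -m "desktop: add notes"
    git push --quiet origin master

    # Laptop: clone, change the last line, push.
    git clone --quiet "$work/remote.git" "$work/laptop"
    cd "$work/laptop"
    printf 'alpha\nbeta\ngamma\ndelta\nEPSILON\n' > notes.txt
    git commit --quiet -am "laptop: edit last line"
    git push --quiet origin master

    # Desktop: change the first line without pulling first, then pull.
    cd "$work/desktop"
    printf 'ALPHA\nbeta\ngamma\ndelta\nepsilon\n' > notes.txt
    git commit --quiet -am "desktop: edit first line"
    git pull --quiet --no-rebase --no-edit origin master

    # The merged file carries both edits.
    cat notes.txt
)
```

Calling merge_demo should print the file with both ALPHA and EPSILON in upper case. Edits to the same line would stop the pull with a conflict for you to resolve by hand.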
Cons
- Adding Files & Committing - You need to remember to add files and commit often. I'm used to using Git so this is only a minor annoyance. There may also be a clever way to fully automate the committing process.
- Space - You need space for your repository. Not only for the remote repository itself, but also for the local .git/ directory. Right now my .git/ is hovering at 61 MB. The files in the home directory that are under version control total 94.5 MB. Make sure your hard disk is not hurting for space.

3 Comments
Jakub Narebski November 25, 2008 at 08:21
Why not use some sync tool with merge (and diff) capabilities like Unison?
Ryan Coyner November 26, 2008 at 11:31
That would be because I had never heard of Unison until now. I will give it a whirl later, but just from reading the features it doesn't seem to offer a decisive advantage over Git. I might do a comparison of the two later.
Dieter_be January 2, 2009 at 18:15
More cons:
1) I think it's inefficient with large binary files if they change regularly. I'm not sure; I forgot exactly how the compression works.
2) no notion of owner/group/permissions
3) no support for empty directories (unless using workarounds)
Gibak is one of the solutions to these problems. See http://eigenclass.org/hiki/gibak-0.3.0