Backup and Synchronization with Git

Posted by Ryan Coyner on November 24, 2008 under Git - 3 Comments

I have a desktop and a laptop both running Arch Linux. The desktop used to run Ubuntu, but I was a little frustrated with its bloat and complexity. Arch is faster, leaner and easier to configure thanks to its simple model. Being a power user, it didn't make sense for me to keep Ubuntu.

I've started to use my desktop more regularly since installing Arch on it. This eventually became a problem because when I am on my desktop there is bound to be a file on my laptop that I desperately need, and vice versa. No, not porn. Usually homework assignments or a critical piece of a project I'm working on.

I was determined to solve this problem because it hindered my productivity significantly. The first solution that came to mind was rsync. I did a little reading on it and concluded that it wasn't the tool for me. For rsync to be practical you have to synchronize at the end of every session, because rsync makes the destination match the source and doesn't merge conflicting changes the way a VCS does. When I'm working from my laptop I'm not guaranteed to be connected to the net to complete synchronization. If I skip a synchronization and then edit the same file on both machines, rsync can't combine the two sets of modifications when I synchronize later; one copy simply overwrites the other.

So why the hell not just use a VCS? There is no reason not to, so that's exactly what I did - I put my entire home directory into a Git repository. Not only does it give me fast and reliable synchronization, I also get version control. Sweet.

Setting Up a Remote Repository

I host the repository on my server, so I started by setting up a bare repository there:

$ mkdir /var/git/ryan.git
$ cd /var/git/ryan.git
$ git init --bare

Depending on your system setup you may have to change some ownerships and privileges.
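
What "ownerships and privileges" means depends on your server, so treat the following as a rough sketch rather than the exact commands I ran. It creates a bare repository and locks it down to its owner. REPO is a placeholder that defaults to a throwaway directory so you can try it safely; on the server it would be /var/git/ryan.git, and the chown line in the comment assumes you log in over SSH as "username".

```shell
# Sketch: create a bare repository and restrict it to its owner.
# REPO is a placeholder; the post itself uses /var/git/ryan.git.
REPO="${REPO:-$(mktemp -d)/ryan.git}"
mkdir -p "$REPO"
git init --bare --quiet "$REPO"

# Only the owning user may read or write the repository.
# (X adds execute permission to directories only, not to plain files.)
chmod -R u+rwX,go-rwx "$REPO"

# On a real server you would usually also hand the directory to the
# account you connect as over SSH. This needs root, so it is left as a
# comment here:
#   chown -R username:username /var/git/ryan.git
```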

Adding Only Necessary Files

I initialized an empty repository in the home directory of one of my machines:

$ git init

And added the remote repository:

$ git remote add origin ssh://username@serveraddress:port/var/git/ryan.git

At this point I had to be careful as to what I added. I didn't want to add any directories that are already under version control because that would be redundant. The directory containing all my music was also excluded; otherwise the repository would have become insanely large and exploded. I also had to be careful as to which hidden files I added. Some hidden files contain hardware information, cached information, or information specific to a machine that I didn't want to synchronize. I listed those files and directories in .gitignore:

.Xauthority
.bash_history
.config/epdfview/
.config/gtk-2.0/
.easytag/
.dbus/
.fehbg
.fontconfig/
.gegl-0.0/
.gimp-2.6/
.java/
.lesshst
.local/
.macromedia/
.mozilla/
.mpd/
.openoffice.org/
.opera/
.recently-used
.recently-used.xbel
.ssh/
.subversion/
.thumbnails/
.vimbackup/
.viminfo
/media/music/
/projects
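
With a list this long it's easy to miss something, so it can help to preview what Git would stage before adding anything for real. The sketch below does that in a throwaway directory with a three-line .gitignore; the file names are made up for the demonstration. It uses git add --dry-run, and git check-ignore, which needs Git 1.8.2 or later.

```shell
# Sketch: preview what "git add ." would stage, in a throwaway directory.
DEMO=$(mktemp -d)
cd "$DEMO"
git init --quiet

# A shortened version of the ignore list above.
printf '%s\n' '.bash_history' '.ssh/' '/media/music/' > .gitignore

# Made-up files standing in for a real home directory.
mkdir -p .ssh media/music notes
touch .bash_history .ssh/id_rsa media/music/song.mp3 notes/todo.txt

# Dry run: lists the files that would be added, without staging anything.
PREVIEW=$(git add --dry-run .)
echo "$PREVIEW"

# Ask Git which ignore rule matches a given path.
git check-ignore -v .ssh/id_rsa
```

Only .gitignore and notes/todo.txt should show up in the preview; everything matched by the ignore list stays out.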

Now add the entire home directory and commit:

$ git add .
$ git commit

To make committing simple, rename .git/hooks/prepare-commit-msg.sample to .git/hooks/prepare-commit-msg, make sure it is executable, and append this at the end of the file:

NODE=$(uname -n)
sed -i "1 s/^/$NODE/" "$1"

This puts the machine name at the beginning of the commit message. I'm too lazy to write a proper commit message each time just for synchronization, and Git won't commit with an empty message. With the hook in place I can commit without typing a single letter: I just close the editor.
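
If you'd rather see the whole thing in one piece, here is a self-contained sketch you can run in a throwaway directory. It is a variation on the two lines above, not a copy: it writes a complete hook instead of appending to the sample, avoids sed -i (whose syntax differs between GNU and BSD sed), and only touches plain commits, leaving merges and commits made with -m alone. The demo identity and the use of "true" as a do-nothing editor are assumptions made so the example runs unattended.

```shell
# Sketch: install a prepare-commit-msg hook and try it out.
DEMO=$(mktemp -d)
cd "$DEMO"
git init --quiet
git config user.name "Demo User"
git config user.email "demo@example.com"

cat > .git/hooks/prepare-commit-msg <<'EOF'
#!/bin/sh
# $1 is the path of the commit message file; $2 is the message source.
# $2 is empty for a plain "git commit" (no -m, merge, squash, or amend).
if [ -z "$2" ]; then
    NODE=$(uname -n)
    # Put the machine name on the first line of the message.
    printf '%s\n' "$NODE" | cat - "$1" > "$1.tmp" && mv "$1.tmp" "$1"
fi
EOF
chmod +x .git/hooks/prepare-commit-msg

echo hello > file.txt
git add file.txt

# "true" stands in for an editor that is closed without typing anything.
GIT_EDITOR=true git commit --quiet

# Show the subject line of the commit that was just made.
git log -1 --format=%s
```

The last command should print the machine name, which is all the commit message contains.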

Synchronization

Once it's been committed, push to the repository:

$ git push origin master

From the other machine, clone the repository:

$ git clone ssh://username@serveraddress:port/var/git/ryan.git
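
One caveat: git clone creates a new directory and won't clone into a directory that already has files in it, which a home directory on a second machine almost certainly does. One possible way around that, sketched below, is to initialize a repository in place, fetch, and then check out master. This is an assumption about how you might handle it, not a step from my own setup. WORK, REMOTE_URL and TARGET are placeholder names, and a local stand-in for the server is built first so the example can run anywhere.

```shell
# Sketch: attach an existing, non-empty directory to the remote.
WORK=$(mktemp -d)
REMOTE_URL="$WORK/remote.git"   # stands in for ssh://username@serveraddress:port/var/git/ryan.git
TARGET="$WORK/home"             # stands in for the second machine's home directory

# --- stand-in for the server and the first machine ---
git init --bare --quiet "$REMOTE_URL"
git init --quiet "$WORK/first"
cd "$WORK/first"
git symbolic-ref HEAD refs/heads/master   # make sure the branch is called master
echo "set -o vi" > .bashrc
git add .bashrc
git -c user.name=demo -c user.email=demo@example.com commit --quiet -m first
git push --quiet "$REMOTE_URL" master

# --- the second machine: a directory that already contains files ---
mkdir -p "$TARGET"
echo "local only" > "$TARGET/scratch.txt"
cd "$TARGET"
git init --quiet
git remote add origin "$REMOTE_URL"
git fetch --quiet origin
# Create a local master that tracks origin/master and check it out.
# Git refuses if an untracked local file would be overwritten.
git checkout --quiet -t origin/master
```

Files that exist only locally, like scratch.txt here, are left alone and show up as untracked.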

If you've already cloned it and just want to update it, pull instead:

$ git pull origin master

I've used this setup for about two weeks now and it's been working great. I've been consistent about pushing at the end of my sessions and pulling at the beginning of my sessions so I haven't faced any major file conflicts yet.
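
That routine is easy to wrap in a pair of shell functions. The sketch below is one way to do it; sync_start, sync_end and SYNC_DIR are names I made up for the example, and it assumes the remote is called origin and the branch master, as in the commands above. It needs Git 1.8.5 or later for the -C option.

```shell
# Sketch: pull at the start of a session, commit and push at the end.
# SYNC_DIR is a hypothetical override; it defaults to the home directory.

sync_start() {
    dir=${SYNC_DIR:-$HOME}
    # Fetch and merge whatever the other machine pushed last.
    git -C "$dir" pull --quiet origin master
}

sync_end() {
    dir=${SYNC_DIR:-$HOME}
    # Stage additions, modifications, and deletions.
    git -C "$dir" add --all . || return 1
    # Commit only when something is staged; the message is the machine name.
    if ! git -C "$dir" diff --cached --quiet; then
        git -C "$dir" commit --quiet -m "$(uname -n)" || return 1
    fi
    git -C "$dir" push --quiet origin master
}
```

Put the functions in your shell startup file, then run sync_start when you sit down and sync_end before you leave. Because sync_end passes the message with -m, it does not need an editor at all.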

Pros

Cons

3 Comments

Jakub Narebski November 25, 2008 at 08:21

Why not use some sync tool with merge (and diff) capabilities like Unison?

Ryan Coyner November 26, 2008 at 11:31

That would be because I had never heard of Unison until now. I will give it a whirl later, but just from reading the features it doesn't seem to offer a decisive advantage over Git. I might do a comparison of the two later.

Dieter_be January 2, 2009 at 18:15

More cons:
1) I think it's inefficient with large binary files if they change regularly. I'm not sure; I forgot exactly how the compression works.
2) no notion of owner/group/permissions
3) no support for empty directories (unless using workarounds)

Gibak is one of the solutions to these problems. See http://eigenclass.org/hiki/gibak-0.3.0
