May 10, 2008
Beginner to Beginner: rsync exclude-from
Oh, I am so about to make a fool of myself in public…
I now have a D-Link DNS-323 plugged into my home network. It’s a network storage device that I want to use as a centralized backup for my family’s various computers because some of us don’t always plug our Macs into our USB external hard drive to let the Mac Time Machine work its backup magic. Unfortunately, the hack I found on the Net to get Time Machine to recognize the DNS-323 doesn’t work for me: Time Machine lets me say I want the backup to be housed on the DNS-323, but the software craps out when it actually tries to back up to it. If there’s an easy way around that, I’d love to hear about it.
In the interim, I’ve been playing with rsync, a command-line utility included in Leopard that does backups. I’ve had no luck with rsyncX, which is a Mac-specific version, but rsync is working. It took some doing to get it running on the DNS-323, including installing fun_plug (the DNS-323 is a Linux box) and writing a config file that specifies which machines rsync recognizes. My Linux hacker nephew Greg did that part of it for me. (Thanks, Greg.)
There’s a script that enables rsync to mimic Time Machine. It’s been working pretty well (my hourly backups go far slower than they should, so I’m undoubtedly doing something wrong), but I had a heck of a time telling it which directories I want it to back up. You gain control over the backup set by specifying a file of inclusions and exclusions. You do this in the rsync command line by saying “--exclude-from filename” (that is two plain hyphens), where you replace “filename” with the name of the file that has the list.
After a bunch of Internet research and way too much trial and error, I now have a list that does what I want, although I’m sure it’s laughably kludgy, and possibly fatally wrong. Nevertheless, here’s how I think it works…
The file can list both includes and excludes. You indicate which is which by prefacing each item with a + or a - followed by a space. Patterns are read relative to whichever directory you specified in the rsync command line. So, if your command line said that you want to back up “/Users/me/”, then you would tell it to exclude “/Users/me/junk” by putting the following line in your exclude-from file. (Strictly, a pattern with no leading slash matches a directory of that name at any depth; write /junk/ to pin it to the top level.)
- junk/
Likewise, to include /Users/me/importantstuff/ you’d put in the line:
+ importantstuff/
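But, at least in my experiments, that line by itself will not bring along anything inside importantstuff, neither its files nor its subdirectories (at least not once the list ends with the catch-all exclude described below). After failing to understand the instructions I found on the Net, and after a lot of trial and error, I’ve found that it works if I also include the line: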
+ importantstuff/**
The double stars tell it to back up everything inside importantstuff, including all the subdirectories and all their subdirectories, ad infinitum. I’ve found I have to put in both the line without the stars and then the line with the stars. You’d think the line with the stars would be enough, but in my tries and my errors, it wasn’t.
The list of inclusions and exclusions is sensitive to the order of the list, because rsync acts on the first rule that matches. If you have particular subdirectories you want to exclude (e.g., importantstuff/junk/), put them first:
- importantstuff/junk/**
If you want rsync to back up only designated directories, list your excludes first, then your includes, and end with
- *
which tells it to exclude anything you didn’t already tell it to include. I have the feeling that that may be an ugly hack with unintended consequences. Remember, I don’t know what I’m doing.
So, my exclude-from file looks roughly like this:
- *Azureus*/
- *Azureus*/**
- Documents/TiVo*
- Documents/Aptana*
+ Sites/
+ Sites/**
+ Pictures/
+ Pictures/**
+ Music/
+ Music/**
+ Documents
+ Documents/**
- *
Two important notes: 1. The -n parameter on the command line will run rsync in “what if” mode, showing you what it would do without actually doing it. 2. As I’ve likely made some embarrassing and awful mistakes, please read the comments in hopes that some knowledgeable and kind soul will correct me.