Recently I had to re-engineer my whole backup strategy because of a looming hardware issue on my old backup server. Since I operate servers at Hetzner, I purchased some backup space there and built my new system around their offering. The new strategy consists of several key components:
- Mounting a remote directory via sshfs
- Ensuring the remote mountpoint is always available using autofs
- Creating a loopback filesystem on the remote storage
- Incremental rsync snapshots using my Hactar backup script
I will explain all the necessary steps to set up this system on a Debian Linux system (my current system at the time of writing runs on version 7.8).
Install and configure autofs & sshfs
Both software packages are available through Debian’s apt package management system. Install them via:
apt-get install sshfs autofs
First make sure you can connect to your remote server via SSH without a password. To achieve this for the Hetzner backup space read the article on their Wiki: Backup Space SSH Keys. For any other SSH accessible host you can refer to my (rather old) explanation here on this blog: SSH public key authentication.
Create the directory you would later like to mount your SSH filesystem to. In my case I chose
Once you can reach your remote system without the need to input a password, proceed to configure autofs. Edit the file named /etc/auto.master and add the following line:
/mnt/sshfs /etc/auto.sshfs uid=0,gid=0,--timeout=30,--ghost
Next create the file /etc/auto.sshfs with the following contents:
my-backup-space -fstype=fuse,rw,nodev,nonempty,allow_other,reconnect,max_read=65536 :sshfs\#email@example.com\:/
Let’s break down each of the fields in this command.
- my-backup-space – This will be the name of the directory in /mnt/sshfs where your remote directory will be available, just as if it were a local mountpoint.
- -fstype=fuse,rw,nodev,nonempty,allow_other,reconnect,max_read=65536 – These are the same parameters you would put in /etc/fstab. The parameters reconnect and max_read are specific to sshfs. Read more details in the sshfs manpage.
- :sshfs\#email@example.com\:/ – This instructs autofs to use sshfs to mount this directory. The hash symbol (#) must be escaped, or everything following it would be treated as a comment. Next come the username and the hostname or IP address of the machine where your backup will reside, separated by an @ symbol. The last colon must be escaped as well and is followed by the path on the remote system that should be mounted into your local directory (“/” in this example).
Next reload autofs using service autofs reload (yes, we are still in pre-systemd times on Debian 7.8). Now each time you access the specified directory (e.g. /mnt/sshfs/my-backup-space), autofs will ensure that everything is available via sshfs without any interaction from your side.
Sparse image file
To be able to actually back up to the remote filesystem and retain all permissions, it is necessary to create a filesystem image on the backup host, which we will later mount and use as our actual backup target.
dd if=/dev/zero of=/mnt/sshfs/my-backup-space/backup.img bs=1 seek=500G count=1
mkfs.ext4 /mnt/sshfs/my-backup-space/backup.img
The seek parameter will create a sparse image on the remote directory, meaning it won’t actually fill every bit of space with zeroes which could take a very long time over a network connection.
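To see the effect of seek locally before creating a 500G image over the network, the following sketch builds a much smaller sparse file in a temporary directory and compares its apparent size with the space actually allocated. The 10M size and the temporary path are illustrative choices, and the commands assume GNU dd, stat and du.

```shell
#!/bin/sh
# Sketch: create a small sparse file and compare apparent vs. allocated size.
# Assumes GNU coreutils (dd size suffixes, stat -c, du -k).
tmpdir=$(mktemp -d)
img="$tmpdir/sparse.img"

# Skip 10 MiB forward in the output file, then write a single zero byte.
dd if=/dev/zero of="$img" bs=1 seek=10M count=1 2>/dev/null

apparent_bytes=$(stat -c %s "$img")      # logical file size in bytes
allocated_kib=$(du -k "$img" | cut -f1)  # disk space actually used, in KiB

echo "apparent size: $apparent_bytes bytes"
echo "allocated:     $allocated_kib KiB"

rm -rf "$tmpdir"
```

The apparent size is the seek offset plus the one byte written, while the allocated size stays far below it on any filesystem that supports sparse files.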
Now, to also mount this loopback device automatically, add the following line to /etc/auto.master:
/mnt/auto /etc/auto.img --timeout=30
Next create the file /etc/auto.img with the following contents:
/backup -fstype=ext4,defaults,sync,dirsync,commit=1,loop :/mnt/sshfs/my-backup-space/backup.img
This will mount your filesystem image as /backup on your local machine. With those steps behind us, we can start to actually save files and create our backup strategy.
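Because the backup target only exists while the image is mounted, it can be worth checking the mount before writing anything, so a failed mount does not silently fill the local disk. The sketch below is one possible guard, not part of the original setup: is_mounted is a hypothetical helper that looks up a path in /proc/mounts and ignores the autofs trigger entries themselves. Its optional second argument exists only so the function can be tried against a sample file.

```shell
#!/bin/sh
# Sketch: succeed only if a real filesystem is mounted at the given path.
# is_mounted is a hypothetical helper; the second argument (mount table file)
# is an illustrative hook and defaults to /proc/mounts on Linux.
is_mounted() {
    mountpoint_path=$1
    mount_table=${2:-/proc/mounts}
    # Field 2 is the mount point, field 3 the filesystem type.
    # Entries of type "autofs" are only the automounter's own triggers.
    awk -v mp="$mountpoint_path" \
        '$2 == mp && $3 != "autofs" { found = 1 } END { exit !found }' \
        "$mount_table"
}

# Accessing the path first gives autofs the chance to mount it.
ls /backup >/dev/null 2>&1 || true

if is_mounted /backup; then
    echo "/backup is mounted, safe to run the backup"
else
    echo "/backup is not mounted, skipping backup" >&2
fi
```

A cron job could call such a check first and exit early when it fails.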
Incremental rsync snapshots
Previously my backup “strategy” just consisted of a nightly rsync run via cron using the command rsync -avz --files-from=rsync.includes --delete / email@example.com:/backup/hostname. That is nice, but it just gives you a clone of the directories designated in rsync.includes without any data retention. This time around, I wanted to be a bit more sophisticated and have a backup solution with the following features:
- Backup all of the files, only excluding a few defined directories that don’t need restoration in case of system restore
- Keep old backups for a defined number of days
- Have everything available in one neat script that takes a few arguments for configuration
With those points in mind I wrote Hactar, an incremental daily rsync backup script. What it does will be outlined below.
On my server Hactar lives in /usr/local/bin/hactar so it is available in my path, but in theory it could be saved anywhere. Just make sure you give it executable permissions using chmod +x hactar. By default Hactar expects an rsync excludes file in /etc/hactar.excludes, but you can also alter this path in the script itself or pass it on the command line with -e /full/path/to/excludes.file. This is my excludes file with what I think are reasonable defaults for a webserver running Debian Linux:
/backup
/bin
/boot
/dev
/lib
/lib64
/lost+found
/media
/mnt
/proc
/run
/sbin
/selinux
/sys
/tmp
/var/cache
/var/lock
/var/run
/var/tmp
/var/swap.img
/var/lib/mysql
The basic usage of the script is hactar [OPTIONS] SOURCE TARGET, where source and target can be local or remote directories using the familiar rsync syntax.
Here are a few example calls to hactar:
hactar -v -e my_excludes / /backup
hactar example.com:/home /backup
hactar /var/www example.com:/mnt/backup
hactar -r 7 /var/log /backup/logs
What the script does is create a new subdirectory in the specified target directory, named after the current day in the form yyyy-mm-dd (e.g. 2015-01-01). On the first run Hactar will rsync everything (minus excluded directories) to this subdirectory and you will have a full snapshot of the system you want to back up. On the next run (on the next day), the script will create another new subdirectory parallel to the previous one and use the previous day’s subdirectory as a hardlink target for the current day. This means rsync will only transfer files that are new or changed into the new subdirectory and hardlink everything else to the previous day. That way the system only uses as much space as the new and changed files add to the backup, yet the current directory still contains a full backup snapshot.
The default retain time is 29 days. This means Hactar will delete the oldest subdirectory after 30 days when it’s time for its daily (nightly) run, after it has created today’s backup. Since we hardlinked all files, there won’t be any problems with files that were linked to the now deleted snapshot. Hardlinked files remain on disk as long as there is at least one link pointing to them.
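A retention step along these lines can be sketched in a few lines of shell. prune_snapshots below is a hypothetical helper written for illustration, not taken from Hactar. It assumes GNU date and treats every yyyy-mm-dd directory dated before the cutoff as expired, so the exact day boundary may differ from what Hactar does.

```shell
#!/bin/sh
# Sketch: delete dated snapshot directories older than a retention period.
# prune_snapshots is a hypothetical helper, not Hactar's actual code.
# Assumes GNU date (for the "N days ago" syntax).
prune_snapshots() {
    target_dir=$1
    retain_days=$2
    cutoff=$(date -d "$retain_days days ago" +%Y%m%d)

    for snapshot in "$target_dir"/[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]; do
        [ -d "$snapshot" ] || continue            # glob matched nothing
        name=${snapshot##*/}                       # e.g. 2015-01-01
        stamp=$(printf '%s' "$name" | tr -d '-')   # e.g. 20150101
        if [ "$stamp" -lt "$cutoff" ]; then
            echo "removing $snapshot"
            rm -rf -- "$snapshot"
        fi
    done
}
```

It would be called after the day's backup has finished, for example as prune_snapshots /backup/localhost 29. Removing an old snapshot only drops its links, so files shared with newer snapshots stay on disk.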
With the script and the excludes file in place I just make sure I can access each server I want to back up via passwordless SSH login and add another line to my crontab.
0 1 * * * /usr/local/bin/hactar / /backup/localhost > /dev/null
0 2 * * * /usr/local/bin/hactar example.com:/var/www /backup/webserver > /dev/null
Those two example lines will back up everything from localhost and all website files from a remote system into my locally mounted backup space at 1 a.m. and 2 a.m. respectively. Appending > /dev/null to each cronjob command discards the script's regular output (stdout), so it will not clutter your syslog. Error output (stderr) is not redirected, so cron will still send you an email in case anything unusual happens during the nightly backup run.
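The difference between the two output streams is easy to check in a shell. In this small sketch, noisy_job is a hypothetical stand-in for a backup run that writes to both stdout and stderr.

```shell
#!/bin/sh
# Sketch: "> /dev/null" discards stdout only; stderr still gets through.
# noisy_job is a hypothetical stand-in for a backup run.
noisy_job() {
    echo "routine progress message"      # stdout
    echo "something went wrong" >&2      # stderr
}

# Capture what is left over after "> /dev/null".
# Redirections apply left to right: stderr goes to the capture first,
# then stdout is sent to /dev/null.
remaining=$(noisy_job 2>&1 >/dev/null)
echo "cron would mail: $remaining"
```

Only the stderr line survives the redirect, which is the part cron would put into its email.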
I am always open to comments, questions and critique on this system; just use the form below. Also, I am very happy to receive pull requests on the Hactar GitHub Repository to further improve the script’s usability and squash any bugs that have evaded me thus far.
Some of you might have spotted that my default excludes file has the line /var/lib/mysql. That’s because I save all databases separately twice per day using mysqldump and don’t want the redundancy of having the data files in my backup as well. Since this method could potentially cause integrity issues when used on a running MySQL server (although I have thankfully never experienced any in the 10+ years I have used it), I am very open to suggestions about better database backup solutions. Ideally I would like to do this via LVM snapshots, but none of my current servers has its databases on LVM volumes at the moment, so I am looking for alternatives with minimal downtime and maximal integrity.