Why Longstop?

Graham T. Smith

Introduction

Backup is an important activity for anyone using a computer. However, choosing a backup strategy is not simple, and many of the applications in the *Nix world are not that easy to use. This document is intended as an overview of current technologies, their strengths and weaknesses, and the issues that Longstop is intended to address.

A backup or archive is a copy of selected data from a machine, kept over time. A single copy or mirror of a file structure is not in itself a backup unless there is a mechanism to track changes over time. While data can be lost suddenly for various reasons, what is much more frequent is a gradual corruption or loss of data integrity over time. Merely replicating or mirroring data gives no protection against the latter process.
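The difference between a mirror and a backup can be shown with a small sketch. It is purely illustrative: the function names and the in-memory "stores" are assumptions made for the example, not part of Longstop or any other tool. A file is silently corrupted before the third run; the mirror faithfully replicates the damage, while the versioned store still holds the good copy from the first run.

```python
import hashlib

def mirror(store, path, data):
    """A mirror keeps only the latest copy; earlier contents are overwritten."""
    store[path] = data

def versioned_backup(store, path, data, run):
    """A backup keeps every distinct version, labelled with the run that saw it."""
    digest = hashlib.sha256(data).hexdigest()
    history = store.setdefault(path, [])
    # Only record a new entry when the content has actually changed.
    if not history or history[-1][1] != digest:
        history.append((run, digest, data))

mirror_store, backup_store = {}, {}

# Contents of the same file as seen on three successive runs.
# The third copy has been silently corrupted.
versions = [b"good report", b"good report", b"g\x00\x00d rep\x00rt"]

for run, data in enumerate(versions, start=1):
    mirror(mirror_store, "report.txt", data)
    versioned_backup(backup_store, "report.txt", data, run)

# The mirror now holds only the corrupted copy.
# The versioned store holds two entries: the good copy (run 1)
# and the corrupted one (run 3), so the good copy can still be restored.
```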

As important as backing something up is getting that something out of the backup. So any backup tool needs to know what is kept where.

Storage

The first question one needs to ask is where to put the backup data. There are obviously a number of options. Firstly, any backup solution that relies on storage internal to the machine is not a safe backup solution. Any backup or archive solution requires storage that is separate from the machine being backed up. Ideally one also needs at least two distinct backup solutions. It is surprising how many times a backup solution fails when you need it (or maybe Murphy just does overtime on such occasions).

An external disk drive

With the current price of disk storage this is an obvious option. However, it is not a flexible option and does not allow for more traditional media rotation strategies. Buying two drives (minimum) for every single drive in use for real data is not really a practicable option. It should be noted that some people will suggest that disk mirroring or RAID is a backup strategy. It is not. It is a strategy to ensure fault tolerance, not data integrity. Two copies of corrupted data are just as bad as one copy of corrupted data.

It is quite common for people to use old data drives for this purpose, a practice which I would personally suggest undermines the primary reason for backup. Ideally your backup medium (or device) should be at least as reliable as the machine you are backing up.

At best this is good for short term emergency recovery or data migration. For more robust solutions something else is required.

Networked storage

This can be quite expensive. However, even in the home and small business environment there are now affordable options. It could be argued that an external USB drive is such a solution, but the devices dedicated to this purpose usually have additional security options. The one issue that remains is how you back up the backup (see above).

Tape

The old faithful backup solution. Tape backup units are usually expensive, and the media is expensive for any capacity that is reasonable for modern use. This is a reliable and well trusted solution for a medium to large enterprise but not a usable solution for other users. Cheap tape units can also be chronically unreliable to use, and their tapes equally so.

DVD/CD

It is not surprising that this medium is neglected as a backup option. The media is remarkably prone to data errors. Most audio or video media formats can handle this, but it can cause problems with computer data. So any backup solution that utilises this media will need to cope with the possibility of data errors. However, it does have some advantages. It is cheap, and one can extend the amount of media as required rather than purchase everything up front. With the advent of double layer and high density disk formats, the potential exists to store quite significant amounts of data on this medium.

Software

The second question is what software to use. For a small business or home user I would not recommend any solution that requires tape units. There are a number of other options.

GUI based solutions may be attractive, but the number of files which can be involved probably limits a GUI's utility in selecting how and what to archive in an easily comprehensible manner. (Currently my backups involve nearly a million files; tuning this through a GUI usually produces some interesting problems.)

Some common tools that I know to be in use are...

Rsync

This is very good for disk or remote snapshot backups. It is an 'Industrial Strength' tool. There are quite a few utilities which extend Rsync's capabilities to do effective backup or make it more accessible to the less technically skilled. However, at its core it is a disk replication tool. It is not really suitable for use with removable media. Having said this, if one has the facilities it is a very good idea to use Rsync as part of a backup strategy.

Drive Imaging (Ghost, dd etc)

Not a backup solution. What this is good for is when one has got a machine into the state one wants it in, with all the applications one wants installed and configured. One can then make an image of that machine so that, if for any reason one has to rebuild from scratch, this can be easily done. Monolithic drive images are usually extremely vulnerable to corruption, and time consuming to produce and validate.

Tar, Dar and KDar

Tar is a well understood archiving format. One can use Tar to create archives on drives or DVD media, but corrupted compressed Tar archives can be difficult to recover. Dar introduces some interesting concepts, but its file management options are more limited than Tar's and using it with removable media is not always easy. KDar is the KDE front end for Dar but is really only useful for generating Dar commands which can be modified later.

Longstop as an alternative

Longstop archives files to CD or DVD and uses a fairly simple command file syntax to control the backup. Apart from a knowledge of Perl style regular expressions there should not be many major technical challenges in creating a command file.
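To give a feel for how regular expressions can drive file selection, here is a minimal sketch in Python. It does not reproduce Longstop's command file syntax, which is not shown in this document; the rule list, the action names and the paths are all hypothetical. The patterns used here mean the same thing in Perl and in Python's re module.

```python
import re

# Hypothetical rules, each a (regular expression, action) pair.
# The first rule that matches a path decides what happens to it.
RULES = [
    (re.compile(r"\.(tmp|bak)$|~$"), "skip"),          # scratch and editor backup files
    (re.compile(r"^/home/[^/]+/Mail/"), "full"),       # always copy mail directories whole
    (re.compile(r"^/home/"), "incremental"),           # everything else under /home
]

def action_for(path, default="skip"):
    """Return the action of the first rule whose pattern matches the path."""
    for pattern, action in RULES:
        if pattern.search(path):
            return action
    return default
```

With these rules, action_for("/home/anna/notes.txt") gives "incremental", while a path under /home/anna/Mail/ gives "full" and anything ending in .tmp is skipped. Rule order matters: the more specific patterns must come first.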

The location of files on media is tracked by a MySQL database backend. The main backup script is intended for use in a cron process so is command line only. The scripts to list what is archived or extract from an archive are currently command line only but could be given a GUI/CUI front end if sufficient interest exists.
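The idea of a catalogue that records which file lives on which disc can be sketched as follows. This is an assumption-laden illustration: it uses Python's built-in sqlite3 module as a stand-in for MySQL so that it runs without a server, and the table name, column names and sample values are invented for the example rather than taken from Longstop's actual schema.

```python
import sqlite3

def open_catalogue(db_path=":memory:"):
    """Open (or create) a catalogue database with a hypothetical schema."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS archived_file (
               path      TEXT NOT NULL,     -- original location of the file
               media_set TEXT NOT NULL,     -- label of the set of discs
               disc      INTEGER NOT NULL,  -- disc number within the set
               tar_name  TEXT NOT NULL,     -- tar file on that disc
               backed_up TEXT NOT NULL      -- ISO date of the backup run
           )"""
    )
    return conn

def record(conn, path, media_set, disc, tar_name, backed_up):
    """Note that a copy of path was written to the given disc and tar file."""
    conn.execute(
        "INSERT INTO archived_file VALUES (?, ?, ?, ?, ?)",
        (path, media_set, disc, tar_name, backed_up),
    )

def locate(conn, path):
    """Return every recorded copy of path, newest first."""
    rows = conn.execute(
        "SELECT media_set, disc, tar_name, backed_up FROM archived_file "
        "WHERE path = ? ORDER BY backed_up DESC",
        (path,),
    )
    return rows.fetchall()

# Example with made-up values: the same file archived on two runs.
conn = open_catalogue()
record(conn, "/home/anna/notes.txt", "set-A", 1, "home-0001.tar", "2008-01-05")
record(conn, "/home/anna/notes.txt", "set-A", 3, "home-0007.tar", "2008-01-19")
copies = locate(conn, "/home/anna/notes.txt")
```

A listing or extraction script only needs to ask the catalogue which disc to request, rather than searching every disc in the set.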

Many file orientated backup systems merely copy files. With databases and some other applications this may be problematic (particularly when migrating between software versions). Longstop has the capability to back up MySQL databases and Subversion repositories by using the respective application dump tools. How a backup or archive is performed can be defined by directory or file, so directory structures (such as email) which may be awkward to recover from differential or incremental backups can be included appropriately within the same backup operation.
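Dumping through the application's own tool, rather than copying its data files, might look roughly like the sketch below. The helper names are hypothetical and this is not how Longstop itself invokes the tools. It assumes mysqldump and svnadmin are installed and on the PATH, and that MySQL credentials come from the user's option file. Note that --single-transaction only gives a consistent snapshot for transactional tables such as InnoDB.

```python
import subprocess

def mysql_dump_command(database):
    """Build a mysqldump command that writes one database as SQL to stdout."""
    return ["mysqldump", "--single-transaction", "--databases", database]

def svn_dump_command(repo_path):
    """Build an svnadmin command that writes a repository dump to stdout."""
    return ["svnadmin", "dump", "--quiet", repo_path]

def run_dump(command, out_path):
    """Run a dump command, saving its output to out_path.

    Raises subprocess.CalledProcessError if the tool reports failure,
    so a failed dump is never mistaken for a good one.
    """
    with open(out_path, "wb") as out:
        subprocess.run(command, stdout=out, check=True)
```

For example, run_dump(mysql_dump_command("shop"), "shop.sql") would leave a plain SQL file that can be archived like any other file and reloaded into a newer server version.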

Longstop uses Tar format files, but rather than compressing the tar files, files are compressed individually and then tarred. Optional checksumming is performed on these files, and if one of these tar files (or its contents) should be corrupted, the extract process will continue to look for copies of files that have not yet been extracted across the available media in a particular media set.
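The compress-then-tar approach, and the way it limits damage, can be sketched as follows. This is a minimal illustration and not Longstop's implementation: the function names are hypothetical, SHA-256 is an assumed choice of checksum, and tar files are held in memory rather than on discs. Because each member is compressed on its own, a damaged member can be skipped and the rest of the tar file remains readable.

```python
import gzip
import hashlib
import io
import tarfile

def build_archive(files):
    """Compress each file separately, then store the results in a plain tar.

    files maps a name to its bytes. Returns (tar_bytes, checksums), where
    checksums maps each name to the SHA-256 of its compressed form.
    """
    buf = io.BytesIO()
    checksums = {}
    with tarfile.open(fileobj=buf, mode="w") as tar:  # "w" = no tar-level compression
        for name, data in files.items():
            packed = gzip.compress(data, mtime=0)  # mtime=0 keeps output repeatable
            checksums[name] = hashlib.sha256(packed).hexdigest()
            info = tarfile.TarInfo(name + ".gz")
            info.size = len(packed)
            tar.addfile(info, io.BytesIO(packed))
    return buf.getvalue(), checksums

def extract_good(tar_bytes, checksums):
    """Return the files in one tar whose checksums match, skipping damaged ones."""
    recovered = {}
    with tarfile.open(fileobj=io.BytesIO(tar_bytes), mode="r:") as tar:
        for member in tar:
            name = member.name[:-3]  # strip the ".gz" suffix
            packed = tar.extractfile(member).read()
            if hashlib.sha256(packed).hexdigest() != checksums.get(name):
                continue  # corrupted or unknown member: look for it elsewhere
            recovered[name] = gzip.decompress(packed)
    return recovered

def restore(media, checksums):
    """Work through several tar files, keeping the first good copy of each file."""
    restored = {}
    for tar_bytes in media:
        for name, data in extract_good(tar_bytes, checksums).items():
            restored.setdefault(name, data)
    return restored
```

If one tar file in the list passed to restore has a damaged member, that member is skipped and picked up from a later tar file that holds an intact copy. Had the whole tar been compressed as a single stream instead, damage near the start could make everything after it unreadable.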