Linux administration

Maintain an Effective Data Backup Strategy

Nguyen Hai Chau
Vietnam National University

Reasons for Backup

  • Disks fail
  • Bugs in software can cause corruption
  • Configuration mistakes by the administrator
  • Accidental deletion or overwriting (e.g., with rm, mv or cp)
  • Malicious deletion or virus attack
  • Theft of machines along with their hard drives
  • Fire, or other disasters which can destroy hardware

Backup Media

  • Traditionally, backups have been made to tape
    • Can store lots of data on reasonably cheap tapes
  • Copying to a different hard disk
    • There is a risk of losing the backup along with the original, e.g., if both disks are in the same machine
    • Safer if the disk is in a remote computer
  • CD writers can be used to store backups on CDs
    • Convenient for long-term storage
    • Easy to take to a remote location for safekeeping

Types of Backup

  • Full backup — includes everything of importance
    • Might not include system files that can be reinstalled from the installation CD
    • Can include a lot of files, many of which hardly ever change
  • Differential backup — only includes changes since last full backup
    • Nightly backup only needs to include files changed since the last full backup
    • Recovery requires the full backup on which it was based, plus the most recent differential backup
  • Incremental backup — only includes changes since last backup
    • Nightly backup only includes files changed since the previous night's backup
    • Recovery requires the last full backup and a complete sequence of incremental backups after that
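One way to see the difference in practice is GNU tar's --listed-incremental option. It keeps a snapshot file that records what earlier runs archived. The sketch below assumes GNU tar. The scratch directory and the names data, full.tar, inc1.tar and backup.snar are made up for illustration.

```shell
set -eu
# Throwaway demo tree (hypothetical names); assumes GNU tar.
WORK=$(mktemp -d)
mkdir -p "$WORK/data"
echo "unchanged" > "$WORK/data/old.txt"
sleep 1   # keep file timestamps clearly apart from the backup start

# Full (level 0) backup: the snapshot file does not exist yet,
# so everything is archived and backup.snar is created.
tar -cf "$WORK/full.tar" --listed-incremental="$WORK/backup.snar" \
    -C "$WORK" data

sleep 1
echo "written after the full backup" > "$WORK/data/new.txt"

# Incremental backup: reusing (and thereby updating) the same snapshot
# file archives only what changed since the previous run.
tar -cf "$WORK/inc1.tar" --listed-incremental="$WORK/backup.snar" \
    -C "$WORK" data

# Should show data/new.txt but not data/old.txt.
tar -tf "$WORK/inc1.tar"
```

Reusing the updated snapshot file night after night gives incremental backups. For a differential scheme, one approach is to copy the snapshot made by the full backup to a scratch file before each run and pass the copy instead. Every run is then compared with the full backup. To restore, extract full.tar and then each later archive in order, adding --listed-incremental=/dev/null when extracting.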

Backup Strategy

  • The backup schedule should be regular and well known by those who rely on it
    • It must be decided what to back up and what can be left out
  • Typically a full backup is done once a week or once a month
    • Daily changes are recorded in a differential or incremental backup each night
  • Large sites might have more than these two levels to their strategy
  • Monthly tapes might be kept for a long time, in case an old version of a file becomes important

Archiving Files with tar

  • tar can package up files for distribution or backup
    • Originally for "tape archive" manipulation
    • Archives can be written to any file, not only to a tape
  • Encapsulates many files in a single file
    • Known as a tar archive or a tarball
  • Has unusual command-line option syntax
    • Common options are given as single letters
    • No hyphen is needed
  • tar must be given exactly one action option
    • Indicates which operation to perform
    • Should be the first option; GNU tar is more forgiving, but other implementations may not be
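GNU tar also accepts the same operations as ordinary hyphenated short options and as long options. The following is a minimal sketch, assuming GNU tar, with made-up directory and archive names.

```shell
set -eu
# Scratch directory with one file (hypothetical names).
WORK=$(mktemp -d)
mkdir -p "$WORK/documents"
echo "hello" > "$WORK/documents/readme.txt"
cd "$WORK"

# Three spellings of the same "create" request:
tar cf old-style.tar documents                # traditional letters, no hyphen
tar -cf short-style.tar documents             # short options with a hyphen
tar --create --file=long-style.tar documents  # long option names

# All three archives hold the same members.
tar tf old-style.tar
tar tf short-style.tar
tar tf long-style.tar
```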

Creating Archives with tar

  • Use the c option to create an archive
  • For example, to create an archive called docs.tar.gz containing everything in the documents directory:
$ tar czf docs.tar.gz documents
  • f specifies the archive's filename
    • The filename must be the next argument, so f usually comes last in the group of letters
    • A .tar extension is conventional for uncompressed archives
    • Any further options placed after the filename need a hyphen
  • The z option compresses the archive with gzip
    • .tar.gz extension used to indicate compression
    • .tgz extension also popular
  • The list of files and directories to archive follows the options.
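Here is a small sketch of the create step, run in a throwaway directory. The documents directory and its contents are made up, and gzip is assumed to be installed. It also shows an option placed after the filename, which therefore needs its hyphen.

```shell
set -eu
# Scratch directory with made-up contents.
WORK=$(mktemp -d)
mkdir -p "$WORK/documents"
printf 'Alice 555-0100\n' > "$WORK/documents/phone-numbers.txt"
cd "$WORK"

# c = create, z = compress with gzip, f = the archive name comes next.
tar czf docs.tar.gz documents

# An option after the filename needs its hyphen; -v prints each name added.
tar czf docs.tgz -v documents

# The result is an ordinary gzip file, so gzip can test its integrity.
gzip -t docs.tar.gz && echo "docs.tar.gz passed gzip -t"
```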

Listing the Files in tar Archives

  • To check that a tar file has been made correctly, use the t operation (for 'list'):
$ tar tzf docs.tar.gz
  • The z and f options work as for the c operation
  • To show more information about files, add the v (for 'verbose') option
    • Shows information similar to ls -l
    • Can also be specified with c to list filenames as they are added
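Here is a sketch of checking an archive after creating it, using made-up data in a scratch directory. The -C option tells tar to change to a directory first. It is not covered above and is used here only to keep the example self-contained.

```shell
set -eu
WORK=$(mktemp -d)
mkdir -p "$WORK/documents"
echo "draft" > "$WORK/documents/report.txt"
# -C makes tar change into $WORK first, so member names start with documents/
tar czf "$WORK/docs.tar.gz" -C "$WORK" documents

# t lists the member names; z and f are needed here just as with c.
tar tzf "$WORK/docs.tar.gz"

# v adds ls -l style details: permissions, owner, size and date.
tar tzvf "$WORK/docs.tar.gz"

# Check that one particular file made it into the archive.
if tar tzf "$WORK/docs.tar.gz" | grep -qx 'documents/report.txt'; then
    echo "documents/report.txt is in the archive"
fi
```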

Extracting Files from tar Archives

  • Use the x operation to extract files from an archive:
$ tar xzvf docs.tar.gz
  • The v option lists the files as they are extracted
  • To extract individual files, list them on the command line:
$ tar xzvf docs.tar.gz documents/phone-numbers.txt
  • Other useful options:
    • k (--keep-old-files) will not overwrite any existing files, only extracting missing ones
    • p (--preserve-permissions) will set extracted files to have the permissions they had when archived
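This sketch shows both kinds of extraction on made-up data. Instead of changing directory first, it uses tar's -C option (not covered above) to choose where files are written. The member name given for a single file must match the name shown by tar t, with no leading slash.

```shell
set -eu
# Made-up archive to extract from.
WORK=$(mktemp -d)
mkdir -p "$WORK/documents"
echo "Alice 555-0100" > "$WORK/documents/phone-numbers.txt"
echo "notes" > "$WORK/documents/notes.txt"
tar czf "$WORK/docs.tar.gz" -C "$WORK" documents

# Extract everything into a separate directory instead of the current one.
mkdir -p "$WORK/restore-all"
tar xzvf "$WORK/docs.tar.gz" -C "$WORK/restore-all"

# Extract a single member, named exactly as `tar t` shows it.
mkdir -p "$WORK/restore-one"
tar xzf "$WORK/docs.tar.gz" -C "$WORK/restore-one" documents/phone-numbers.txt
```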

Device Files for Accessing Tapes

  • Under Linux, tape drives are accessed through several groups of device files
  • Within each group, drives are numbered starting from 0
  • These are the most commonly used devices:
    • /dev/st0 — SCSI tape drive, which will be automatically rewound after each operation
    • /dev/nst0 — the same drive, but with no automatic rewinding
    • /dev/ft0 — floppy-controller (ftape) tape drive
    • /dev/nft0 — the same without rewinding
    • /dev/ht0 — ATAPI tape drive
    • /dev/nht0 — the same without rewinding

Using tar for Backups

  • Tape drive devices can be read and written directly by tar
  • To write a backup of /home to the first SCSI tape drive:
# tar cvf /dev/st0 /home
  • The z option (compression) has been left out on purpose
    • This might make the backup slower, at least on less powerful machines
    • Compressing the whole archive would make it much less resilient against corruption
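  • The auto-rewinding device is used, so the tape is rewound when tar finishes and the archive can be extracted straight away, for example into /tmp/restored-home:
# mkdir -p /tmp/restored-home
# tar xvf /dev/st0 -C /tmp/restored-home
  • tar strips the leading / from member names, so the files are restored under /tmp/restored-home/home/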
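The backup-and-restore cycle can be tried without a tape drive, since tar writes to an ordinary file in the same way. This sketch assumes GNU tar and uses a file named fake-tape.tar as a stand-in for /dev/st0. It also uses the d (--compare) operation, not mentioned above, which reports archive members that differ from the files on disk.

```shell
set -eu
# An ordinary file stands in for the tape device /dev/st0.
WORK=$(mktemp -d)
TAPE="$WORK/fake-tape.tar"
mkdir -p "$WORK/home/alice"
echo "thesis draft" > "$WORK/home/alice/thesis.txt"

# Write an uncompressed backup of home, as on the tape.
tar cf "$TAPE" -C "$WORK" home

# d (--compare) checks the archive against the files on disk;
# GNU tar exits with status 0 when nothing differs.
tar df "$TAPE" -C "$WORK" && echo "backup matches the files on disk"

# After a change, the comparison reports the file and exits non-zero.
echo "second draft" >> "$WORK/home/alice/thesis.txt"
if ! tar df "$TAPE" -C "$WORK"; then
    echo "files have changed since the backup"
fi

# Restore into a separate directory, as in the /dev/st0 example.
mkdir -p "$WORK/restored"
tar xf "$TAPE" -C "$WORK/restored"
```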

Controlling Tape Drives with mt

  • mt can move tapes backwards and forwards, and perform other operations on them
  • Usage: mt [-f device] command [count]
  • The -f option sets the tape device to use (e.g., /dev/st0)
    • The default is usually /dev/tape, which should be a symlink to a non-rewinding device, like /dev/nst0
  • These are some of the more common commands:
    • fsf, bsf — move forwards or backwards over one file (or count files)
    • eod — go to the end of the valid data
    • rewind — go to the start of the tape
    • offline — eject the tape

Deciding What to Back Up

  • Being selective about what is included in the backups can drastically reduce the time and space taken
  • For example, /bin, /sbin, /lib and /usr could be restored from an installation CD
    • But it might still be worth backing them up to make restoration simpler
  • The things which are most likely to be important in backups are:
    • /home
    • The CVS repository, or other places where project work is stored
    • Some directories under /var (particularly email)

What Not to Back Up

  • Some other areas which shouldn't be backed up are:
    • /tmp — usually doesn't contain anything of lasting value
    • /proc — automatically generated by the kernel
    • /dev — if using devfs this is also generated automatically
    • /mnt — removable media mounted here, such as CD-ROMs, typically aren't backed up
    • Filesystems mounted remotely whose backup is taken care of elsewhere
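With GNU tar, the --exclude option (not covered above) is one way to leave such areas out when archiving a larger tree. The sketch builds a tiny, made-up tree standing in for / and skips its tmp directory.

```shell
set -eu
# A miniature, made-up directory tree standing in for /.
WORK=$(mktemp -d)
ROOT="$WORK/fsroot"
mkdir -p "$ROOT/home/alice" "$ROOT/tmp" "$ROOT/var/mail"
echo "important" > "$ROOT/home/alice/notes.txt"
echo "mail"      > "$ROOT/var/mail/alice"
echo "scratch"   > "$ROOT/tmp/junk"

# Archive the tree but skip its tmp directory. With GNU tar, --exclude
# options should come before the names of the files to archive.
tar czf "$WORK/backup.tgz" --exclude=fsroot/tmp -C "$WORK" fsroot

tar tzf "$WORK/backup.tgz"
```

On a real system, GNU tar's --one-file-system option is another way to keep /proc, /dev and remotely mounted filesystems out of a backup of /. These are normally separate mounted filesystems.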

Scripting Backup

  • It is common to have a script to perform backups each night
    • Might perform different types of backup, e.g., a full backup on Saturday night and a differential one on other nights
  • Such a script can be run with cron, making backup automatic
  • Example scripts are readily available on the WWW
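Here is a minimal sketch of such a script, assuming GNU tar and a cron daemon. The function name backup_night, the snapshot file names and all paths are hypothetical. The function takes the weekday as an argument so that both branches can be tried on throwaway data.

```shell
#!/bin/sh
# Sketch of a nightly backup: full on Saturday, differential otherwise.
# Assumes GNU tar (--listed-incremental); all paths are hypothetical.
set -eu

# backup_night PARENT DIR DEST WEEKDAY
#   PARENT/DIR is the tree to back up, DEST holds archives and snapshots,
#   WEEKDAY is 1-7 as printed by `date +%u` (6 means Saturday).
backup_night() {
    parent=$1 dir=$2 dest=$3 weekday=$4
    stamp=$(date +%Y%m%d)
    mkdir -p "$dest"
    if [ "$weekday" -eq 6 ] || [ ! -f "$dest/level0.snar" ]; then
        # Full backup: start a fresh level-0 snapshot.
        rm -f "$dest/level0.snar"
        tar czf "$dest/full-$stamp.tgz" \
            --listed-incremental="$dest/level0.snar" -C "$parent" "$dir"
    else
        # Differential backup: compare against a copy of the level-0
        # snapshot, so each night holds all changes since the full backup.
        cp "$dest/level0.snar" "$dest/diff.snar"
        tar czf "$dest/diff-$stamp.tgz" \
            --listed-incremental="$dest/diff.snar" -C "$parent" "$dir"
    fi
}

# Trial run on throwaway data: a "Saturday" run, then a "Monday" run.
WORK=$(mktemp -d)
mkdir -p "$WORK/data/home"
echo "old" > "$WORK/data/home/a.txt"
sleep 1
backup_night "$WORK/data" home "$WORK/backups" 6
sleep 1
echo "new" > "$WORK/data/home/b.txt"
backup_night "$WORK/data" home "$WORK/backups" 1
```

On a real machine, replace the trial run at the end with something like backup_night / home /var/backups "$(date +%u)". Install the script (here at the hypothetical path /usr/local/sbin/nightly-backup) with a crontab entry such as 30 1 * * * /usr/local/sbin/nightly-backup, which runs it at 01:30 every night.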

Other Backup Software

  • cpio — alternative archiving program
  • afio — similar to cpio, but allows files to be compressed individually
    • Compressed archives can be made which are resilient to corruption
  • amanda — open-source network backup system for Unix, GNU/Linux and Windows; on Unix/Linux it can use native tools and formats such as GNU tar

Exercise 1

  • a. Create a single file in your home directory containing a backup of all the contents of /etc/.
  • b. Create another archive with the same contents, but compressed to save disk space. Compare the sizes of the two archives.
  • c. List the contents of each of your archives.
  • d. Extract the contents of one of the archives into a directory under your home directory.
  • e. Create a new subdirectory, and extract a single file from the archive into it.
    • Your current directory must be the one into which to extract.
    • You will need to specify the path of the file to be extracted, but without a leading slash.

Exercise 2

With an archive of /etc/, extracted under your home directory:

  • a. Modify at least two files in your extracted copy. Re-extract one of them from the archive, losing your changes in that one file but preserving your changes elsewhere.
  • b. Delete some files in your extracted copy. Make tar work out which files are missing and re-extract them from the archive, without clobbering changes made to other files.

Exercise 3

  • a. Produce a list of the names of all files under /home/ which have been modified in the past day. Only include regular files in this list.
  • b. Create a tarball containing all files under /home/ modified in the past day. Why is including directories in this list not sensible?
  • c. Create a tarball containing all files on the system that have changed in the past day, and which are in directories you deem worthy of being backed up.
  • d. Set up a cron job to make a daily incremental backup of the system.
    • It should run at 18:00 every day.
    • The files created should be stored under /var/tmp/backup/.
    • Each day's backup should be in a file named with that day's date, such as /var/tmp/backup/2017/0111.tgz