Blog

Howto – local and remote snapshot backup using rsync with hard links

Introduction

Your Linux servers are running smoothly? Fine.
But if something unexpectedly goes wrong, you also need an emergency plan.
And even when everything is going fine, a backup that is stored directly on the same server can still be useful:

  • for example when you need to see what was changed in a specific file,
  • or when you want to find the list of files that were installed or modified after the installation of an application

This article shows how you can set up an open-source backup solution, using the magic of the famous ‘rsync’ tool and some shell-scripting, without the need to invest in expensive proprietary software.
Another advantage of a shell-script is that you can easily adapt it to your specific needs, for example for your DRP architecture.

The proposed shell-script is derived from Mike Rubel’s great rotating-filesystem-snapshot utility (cf. http://www.mikerubel.org/computers/rsync_snapshots).
It creates backups of your full filesystem (snapshots) with the combined advantages of full and incremental backups:

  • It uses as little disk space as an incremental backup because all unchanged files are hard linked with existing files from previous backups; only the modified files require new inodes.
  • Because of the transparency of hard links, all backups are directly available and always online for read access with your usual programs; there is no need to extract files from a full archive, and no complicated replay of incremental archives is necessary.

The script is capable of doing local (self) backups, and it can also be run from a remote backup server to centralize all backups in a safe place and therefore avoid correlated physical risks.
‘rsync’ features tremendous optimizations of bandwidth usage and transfers only the portions of a file that were changed, thanks to its brilliant algorithm created by Andrew Tridgell (cf. http://bryanpendleton.blogspot.ch/2010/05/rsync-algorithm.html).
‘rsync’ also encrypts the network traffic via ‘ssh’.
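
To give an idea of the core technique before diving into the full script: a single ‘rsync’ call with ‘--link-dest’ creates a new snapshot in which every unchanged file is a hard link into the previous snapshot. A minimal sketch with example paths (the real script below adds rotation, locking, quota checks and ‘chattr’ protection):

DST="/backup/snapshot/$(hostname -s)" #example destination
rsync -a -x --delete --numeric-ids \
  --link-dest="${DST}/snapshot.001" \
  / "${DST}/snapshot.000" #then rotate: 000->001, 001->002, ...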

The script lets you achieve:

  • local or remote backups with extremely low bandwidth requirements
  • file-level deduplication between backups using hard links (on the remote backup server, also across servers)
  • a configurable bandwidth limit to moderate the network and I/O load on production servers
  • a backup retention policy:
    • per-server disk quota restrictions: for example never exceed 50GB and always keep 100GB of free disk
    • rotation of backups with a non-linear distribution, based on the idea that recent backups are more useful than older ones, but that sometimes you still need a very old backup
  • filter rules to include or exclude specific patterns of folders and files
  • integrity protection: the backups get a ‘chattr’ read-only protection, and an MD5 integrity signature can also be calculated incrementally

 

Installation

The snapshot backups are saved into the ‘/backup’ folder.
You can also make it a symbolic link pointing to another partition with more disk space, for example:

ln -sv /mnt/bigdisk /backup

Then create the folders:

mkdir -pv /backup/snapshot/{$(hostname -s),rsync,md5-log}
[ -h /backup/snapshot/localhost ] || ln -vs $(hostname -s) /backup/snapshot/localhost
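
After these commands, the layout should look roughly like this (assuming the example hostname ‘server05’):

/backup/snapshot/server05/    #snapshots of this host (hostname -s)
/backup/snapshot/localhost    #symlink to the folder above
/backup/snapshot/rsync/       #the backup scripts
/backup/snapshot/md5-log/     #md5 signature lists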

Now create the shell-script ‘/backup/snapshot/rsync/rsync-snapshot.sh’ (download rsync-snapshot.sh):

#!/bin/bash
# ----------------------------------------------------------------------
# created by francois scheurer on 20070323
# derived from mike rubel's handy rotating-filesystem-snapshot utility
# see http://www.mikerubel.org/computers/rsync_snapshots
# ----------------------------------------------------------------------
#rsync note:
#    1) rsync -avz /src/foo  /dest      => ok, creates /dest/foo, like cp -a /src/foo /dest
#    2) rsync -avz /src/foo/ /dest/foo  => ok, creates /dest/foo, like cp -a /src/foo/. /dest/foo (or like cp -a /src/foo /dest)
#    3) rsync -avz /src/foo/ /dest/foo/ => ok, same as 2)
#    4) rsync -avz /src/foo/ /dest      => dangerous!!! overwrite dest content, like cp -a /src/foo/. /dest
#      solution: remove trailing / at /src/foo/ => 1)
#      minor problem: rsync -avz /src/foo /dest/foo => creates /dest/foo/foo, like mkdir /dest/foo && cp -a /src/foo /dest/foo
#    main options:
#      -H --hard-links
#      -x --one-file-system
#      -a equals -rlptgoD (no -H,-A,-X)
#        -r --recursive
#        -l --links
#        -p --perms
#        -t --times
#        -g --group
#        -o --owner
#        -D --devices --specials
#    useful options:
#      -S --sparse
#      -n --dry-run
#      -I --ignore-times
#      -c --checksum
#      -z --compress
#      --bwlimit=X limit disk IO to X KiB/s
#    other options:
#      -v --verbose
#      -y --fuzzy
#      --stats
#      -h --human-readable
#      --progress
#      -i --itemize-changes
#    quickcheck options:
#      the default behavior is to skip files with same size & mtime on destination
#      mtime = last data write access
#      atime = last data read access (can be ignored with noatime mount option or with chattr +A)
#      ctime = last inode change (write access, change of permission or ownership)
#      note that a checksum is always done after a file synchronization/transfer
#      --modify-window=X ignore mtime differences less or equal to X sec
#      --size-only skip files with same size on destination (ignore mtime)
#      -c --checksum skip files with same MD5 checksum on destination (ignore size & mtime, all files are read once, then the list of files to be resynchronized is read a second time, there is a lot of disk IO but network traffic is minimal if many files are identical, log includes only different files)
#      -I --ignore-times never skip files (all files are resynchronized, all files are read once, there is more network traffic than with --checksum but less disk IO, log includes all files)
#      --link-dest does the quickcheck on another reference-directory and makes hardlinks if quickcheck succeeds
#        (however, if mtime is different and --perms is used, the reference file is copied in a new inode)
#    see also this link for a rsync tutorial: http://www.thegeekstuff.com/2010/09/rsync-command-examples/



# ------------- the help page ------------------------------------------
if [ "$1" == "-h" ] || [ "$1" == "--help" ]
then
  cat << "EOF"
Version 2.00 2012-08-31

USAGE: rsync-snapshot.sh HOST [--recheck]

PURPOSE: create a snapshot backup of the whole filesystem into the folder
  '/backup/snapshot/HOST/snapshot.001'.
  If HOST is 'localhost' it is replaced with the local hostname.
  If HOST is a remote host then rsync over ssh is used to transfer the files
  with a delta-transfer algorithm to transfer only minimal parts of the files
  and improve speed; rsync uses for this the previous backup as reference.
  This reference is also used to create hard links instead of files when
  possible and thus save disk space. If original and reference file have
  identical content but different timestamps or permissions then no hard link
  is created.
  A rotation of all backups renames snapshot.X into snapshot.X+1 and removes
  backups with X>512. About 10 backups with non-linear distribution are kept
  in rotation; for example with X=1,2,3,4,8,16,32,64,128,256,512.
  The snapshot folders are protected read-only against all users, including
  root, using 'chattr'.
  The --recheck option forces a sync of all files even if they have the same
  mtime & size; it can verify a backup and fix corrupted files;
  --recheck also recalculates the MD5 integrity signatures without using the
  last signature-file as precalculation.
  Some features like filter rules, MD5, chattr, bwlimit and the per-server
  retention policy can be configured by modifying the scripts directly.

FILES:
    /backup/snapshot/rsync/rsync-snapshot.sh  the backup script
    /backup/snapshot/rsync/rsync-list.sh      the md5 signature script
    /backup/snapshot/rsync/rsync-include.txt  the filter rules

Examples:
  (nice -5 ./rsync-snapshot.sh >log &) ; tail -f log
  cd /backup/snapshot; for i in $(ls -A); do nice -10 /backup/snapshot/rsync/rsync-snapshot.sh $i; done
EOF
  exit 1
fi




# ------------- tuning options, file locations and constants -----------
SRC="$1" #name of backup source, may be a remote or local hostname
OPT="$2" #options (--recheck)
HOST_PORT=22 #port of source of backup
SCRIPT_PATH="/backup/snapshot/rsync"
SNAPSHOT_DST="/backup/snapshot" #destination folder
NAME="snapshot" #backup name
LOG="rsync.log"
MIN_MIBSIZE=5000 # older snapshots (except snapshot.001) are removed if free disk <= MIN_MIBSIZE. the script may exit without performing a backup if free disk is still short.
OVERWRITE_LAST=0 # if free disk space is too small, then this option lets us remove snapshot.001 as well and retry once
MAX_MIBSIZE=20000 # older snapshots (except snapshot.001) are removed if their size >= MAX_MIBSIZE. the script performs a backup even if their size is too big.
#old: SPEED=5 # 1 is slow, 100 is fast, 100000 faster and 0 does not use slow-down. this allows to avoid rsync consuming too much system performance
BWLIMIT=100000 # bandwidth limit in KiB/s. 0 does not use slow-down. this allows to avoid rsync consuming too much system performance
BACKUPSERVER="rembk" # this server connects to all other to download filesystems and create remote snapshot backups
MD5LIST=0 #to compute a list of md5 integrity signatures of all backed-up files, needs 'rsync-list.sh'
CHATTR=1 # to use the 'chattr' command and protect the backups against modification and deletion
DU=1 # to use 'du' command and calculate the size of existing backups, disable it if you have many backups and it is getting too slow (for example on BACKUPSERVER)
SOURCE="/" #source folder to backup

HOST_LOCAL="$(hostname -s)" #local hostname
#HOST_SRC="${SRC:-${HOST_LOCAL}}" #explicit source hostname, default is local hostname
if [ -z "${SRC}" ] || [ "${SRC}" == "localhost" ]
then
  HOST_SRC="${HOST_LOCAL}" #explicit source hostname, default is local hostname
else
  HOST_SRC="${SRC}" #explicit source hostname
fi

if [ "${HOST_LOCAL}" == "${BACKUPSERVER}" ] #if we are on BACKUPSERVER then do some fine tuning
then
  MD5LIST=1
  MIN_MIBSIZE=35000 #needed free space for chunk-file tape-arch.sh
  MAX_MIBSIZE=12000
  DU=0 # NB: 'du' is currently disabled on BACKUPSERVER for performance reasons
elif [ "${HOST_LOCAL}" == "${HOST_SRC}" ] #else if we are on a generic server then do some other fine tuning
then
  if [ "${HOST_SRC}" == "ZRHSV-TST01" ]; then MIN_MIBSIZE=500; CHATTR=0; DU=0; MD5LIST=0; fi
fi




# ------------- initialization -----------------------------------------
shopt -s extglob                                            #enable extended pattern matching operators

OPTION="--stats
  --recursive
  --links
  --perms
  --times
  --group
  --owner
  --devices
  --hard-links
  --numeric-ids
  --delete
  --delete-excluded
  --bwlimit=${BWLIMIT}"
#  --progress
#  --size-only
#  --stop-at
#  --time-limit
#  --sparse
if [ "${HOST_SRC}" != "${HOST_LOCAL}" ] #option for a remote server
then
  SOURCE="${HOST_SRC}:${SOURCE}"
  OPTION="${OPTION}
  --compress
  --rsh=\"ssh -p ${HOST_PORT} -i /root/.ssh/rsync_rsa -l root\"
  --rsync-path=\"/usr/bin/rsync\""
fi
if [ "${OPT}" == "--recheck" ]
then
  OPTION="${OPTION}
  --ignore-times"
elif [ -n "${OPT}" ]
then
  echo "Try rsync-snapshot.sh --help ."
  exit 2
fi




# ------------- check conditions ---------------------------------------
echo "$(date +%Y-%m-%d_%H:%M:%S) ${HOST_SRC}: === Snapshot backup is created into ${SNAPSHOT_DST}/${HOST_SRC}/${NAME}.001 ==="
STARTDATE=$(date +%s)

# make sure we're running as root
if (($(id -u) != 0))
then
  echo "Sorry, must be root. Exiting..."
  echo "$(date +%Y-%m-%d_%H:%M:%S) ${HOST_SRC}: === Snapshot failed. ==="
  exit 2
fi

# make sure we have a correct snapshot folder
if [ ! -d "${SNAPSHOT_DST}/${HOST_SRC}" ]
then
  echo "Sorry, folder ${SNAPSHOT_DST}/${HOST_SRC} is missing. Exiting..."
  echo "$(date +%Y-%m-%d_%H:%M:%S) ${HOST_SRC}: === Snapshot failed. ==="
  exit 2
fi

# make sure that no other rsync-snapshot.sh or rsync process (started by rsync-cp.sh or by a remote rsync-snapshot.sh) is already running in the background.
if [ "${HOST_LOCAL}" != "${BACKUPSERVER}" ] #because BACKUPSERVER sometimes needs to perform an rsync-cp.sh, it must disable the "already started" check.
then
  RSYNCPID=$(pgrep -f "/bin/bash .*rsync-snapshot.sh")
  if ([ -n "${RSYNCPID}" ] && [ "${RSYNCPID}" != "$$" ]) || pgrep -x "rsync"
  then
    echo "Sorry, rsync is already running in the background. Exiting..."
    echo "$(date +%Y-%m-%d_%H:%M:%S) ${HOST_SRC}: === Snapshot failed. ==="
    exit 2
  fi
fi




# ------------- remove some old backups --------------------------------
# remove certain snapshots to achieve an exponential distribution in time of the backups (1,2,4,8,...)
for b in 512 256 128 64 32 16 8 4
do
  let a=b/2+1
  let f=0 #this flag is set to 1 when we find the 1st snapshot in the range b..a
  for i in $(eval echo $(printf "{%.3d..%.3d}" "${b}" "${a}"))
  do
    if [ -d "${SNAPSHOT_DST}/${HOST_SRC}/${NAME}.${i}" ]
    then
      if [ "${f}" -eq 0 ]
      then
        let f=1
      else
        echo "Removing ${SNAPSHOT_DST}/${HOST_SRC}/${NAME}.${i} ..."
        [ "${CHATTR}" -eq 1 ] && chattr -fR -i "${SNAPSHOT_DST}/${HOST_SRC}/${NAME}.${i}"
        rm -rf "${SNAPSHOT_DST}/${HOST_SRC}/${NAME}.${i}"
      fi
    fi
  done
done

# remove additional backups if free disk space is short
remove_snapshot() {
  local MIN_MIBSIZE2=$1
  local MAX_MIBSIZE2=$2
  for i in {512..001}
  do
    if [ -d "${SNAPSHOT_DST}/${HOST_SRC}/${NAME}.${i}" ] || [ ${i} -eq 1 ]
    then
      let d=0 #disk space used by snapshots and free disk space are ok
      echo -n "$(date +%Y-%m-%d_%H:%M:%S) Checking free disk space... "
      FREEDISK=$(df -B M ${SNAPSHOT_DST} | tail -1 | sed -e 's/  */ /g' | cut -d" " -f4 | sed -e 's/M*//g')
      echo -n "${FREEDISK} MiB free. "
      if [ ${FREEDISK} -ge ${MIN_MIBSIZE2} ]
      then
        echo "Ok, bigger than ${MIN_MIBSIZE2} MiB."
        if [ "${DU}" -eq 0 ]
        then #avoid slow 'du'
          break
        else
          echo -n "$(date +%Y-%m-%d_%H:%M:%S) Checking disk space used by ${SNAPSHOT_DST}/${HOST_SRC} ... "
          USEDDISK=$(du -B 1048576 -s "${SNAPSHOT_DST}/${HOST_SRC}/" | cut -f1)
          echo -n "${USEDDISK} MiB used. "
          if [ ${USEDDISK} -le ${MAX_MIBSIZE2} ]
          then
            echo "Ok, smaller than ${MAX_MIBSIZE2} MiB."
            break
          else
            let d=2 #disk space used by snapshots is too big
          fi
        fi
      else
        let d=1 #free disk space is too small
      fi
      if [ ${d} -ne 0 ] #we need to remove snapshots
      then
        if [ ${i} -ne 1 ]
        then
          echo "Removing ${SNAPSHOT_DST}/${HOST_SRC}/${NAME}.${i} ..."
          [ "${CHATTR}" -eq 1 ] && chattr -fR -i "${SNAPSHOT_DST}/${HOST_SRC}/${NAME}.${i}"
          rm -rf "${SNAPSHOT_DST}/${HOST_SRC}/${NAME}.${i}"
        else #all snapshots except snapshot.001 are removed
          if [ ${d} -eq 1 ] #free space is too small even with only snapshot.001 left
          then
            if [ "${OVERWRITE_LAST}" -eq 1 ] #last chance: remove snapshot.001 and retry once
            then
              OVERWRITE_LAST=0
              echo "Warning, free disk space will be smaller than ${MIN_MIBSIZE} MiB."
              echo "$(date +%Y-%m-%d_%H:%M:%S) OVERWRITE_LAST enabled. Removing ${SNAPSHOT_DST}/${HOST_SRC}/${NAME}.001 ..."
              rm -rf "${SNAPSHOT_DST}/${HOST_SRC}/${NAME}.001"
            else
              for j in ${LNKDST//--link-dest=/}
              do
                if [ -d "${j}" ] && [ "${CHATTR}" -eq 1 ] && [ $(lsattr -d "${j}" | cut -b5) != "i" ]
                then
                  chattr -fR +i "${j}" #undo unprotection that was needed to use hardlinks
                fi
              done
              echo "Sorry, free disk space will be smaller than ${MIN_MIBSIZE} MiB. Exiting..."
              echo "$(date +%Y-%m-%d_%H:%M:%S) ${HOST_SRC}: === Snapshot failed. ==="
              exit 2
            fi
          elif [ ${d} -eq 2 ] #disk space used by snapshots is too big even with only snapshot.001 left
          then
            echo "Warning, disk space used by ${SNAPSHOT_DST}/${HOST_SRC} will be bigger than ${MAX_MIBSIZE} MiB. Continuing anyway..."
          fi
        fi
      fi
    fi
  done
}

# perform an estimation of required disk space for the new backup
while : #this loop is executed a 2nd time if OVERWRITE_LAST was ==1 and snapshot.001 got removed
do
  OOVERWRITE_LAST="${OVERWRITE_LAST}"
  echo -n "$(date +%Y-%m-%d_%H:%M:%S) Testing needed free disk space ..."
  mkdir -p "${SNAPSHOT_DST}/${HOST_SRC}/${NAME}.test-free-disk-space"
  chmod -R 775 "${SNAPSHOT_DST}/${HOST_SRC}/${NAME}.test-free-disk-space"
  cat /dev/null >"${SNAPSHOT_DST}/${HOST_SRC}/${LOG}"
  LNKDST=$(find  "${SNAPSHOT_DST}/" -maxdepth 2 -type d -name "${NAME}.001" -printf " --link-dest=%p")
  for i in ${LNKDST//--link-dest=/}
  do
    if [ -d "${i}" ] && [ "${CHATTR}" -eq 1 ] && [ $(lsattr -d "${i}" | cut -b5) == "i" ]
    then
      chattr -fR -i "${i}" #unprotect last snapshots to use hardlinks
    fi
  done
  eval rsync \
    --dry-run \
    ${OPTION} \
    --include-from="${SCRIPT_PATH}/rsync-include.txt" \
    ${LNKDST} \
    "${SOURCE}" "${SNAPSHOT_DST}/${HOST_SRC}/${NAME}.test-free-disk-space" >>"${SNAPSHOT_DST}/${HOST_SRC}/${LOG}"
  RES=$?
  if [ "${RES}" -ne 0 ] && [ "${RES}" -ne 23 ] && [ "${RES}" -ne 24 ]
  then
    echo "Sorry, error in rsync execution (value ${RES}). Exiting..."
    echo "$(date +%Y-%m-%d_%H:%M:%S) ${HOST_SRC}: === Snapshot failed. ==="
    exit 2
  fi
  let i=$(tail -100 "${SNAPSHOT_DST}/${HOST_SRC}/${LOG}" | grep 'Total transferred file size:' | cut -d " " -f5)/1048576
  echo " ${i} MiB needed."
  rm -rf "${SNAPSHOT_DST}/${HOST_SRC}/${LOG}" "${SNAPSHOT_DST}/${HOST_SRC}/${NAME}.test-free-disk-space"
  remove_snapshot $((${MIN_MIBSIZE} + ${i})) $((${MAX_MIBSIZE} - ${i}))
  if [ "${OOVERWRITE_LAST}" == "${OVERWRITE_LAST}" ] #no need to retry
  then
    break
  fi
done




# ------------- create the snapshot backup -----------------------------
# perform the filesystem backup using rsync and hard-links to the latest snapshot
# Note:
#   -rsync behaves like cp --remove-destination by default, so the destination
#    is unlinked first.  If it were not so, this would copy over the other
#    snapshot(s) too!
#   -use --link-dest to hard-link when possible with previous snapshot,
#    timestamps, permissions and ownerships are preserved
echo "$(date +%Y-%m-%d_%H:%M:%S) Creating folder ${SNAPSHOT_DST}/${HOST_SRC}/${NAME}.000 ..."
mkdir -p "${SNAPSHOT_DST}/${HOST_SRC}/${NAME}.000"
chmod 775 "${SNAPSHOT_DST}/${HOST_SRC}/${NAME}.000"
cat /dev/null >"${SNAPSHOT_DST}/${HOST_SRC}/${LOG}"
echo -n "$(date +%Y-%m-%d_%H:%M:%S) Creating backup of ${HOST_SRC} into ${SNAPSHOT_DST}/${HOST_SRC}/${NAME}.000"
if [ -n "${LNKDST}" ]
then
  echo " hardlinked with ${LNKDST//--link-dest=/} ..."
else
  echo " not hardlinked ..."
fi
eval rsync \
  -vv \
  ${OPTION} \
  --include-from="${SCRIPT_PATH}/rsync-include.txt" \
  ${LNKDST} \
  "${SOURCE}" "${SNAPSHOT_DST}/${HOST_SRC}/${NAME}.000" >>"${SNAPSHOT_DST}/${HOST_SRC}/${LOG}"
RES=$?
if [ "${RES}" -ne 0 ] && [ "${RES}" -ne 23 ] && [ "${RES}" -ne 24 ]
then
  echo "Sorry, error in rsync execution (value ${RES}). Exiting..."
  echo "$(date +%Y-%m-%d_%H:%M:%S) ${HOST_SRC}: === Snapshot failed. ==="
  exit 2
fi
for i in ${LNKDST//--link-dest=/}
do
  if [ -d "${i}" ] && [ "${CHATTR}" -eq 1 ] && [ $(lsattr -d "${i}" | cut -b5) != "i" ]
  then
    chattr -fR +i "${i}" #undo unprotection that was needed to use hardlinks
  fi
done
mv "${SNAPSHOT_DST}/${HOST_SRC}/${LOG}" "${SNAPSHOT_DST}/${HOST_SRC}/${NAME}.000/${LOG}"
gzip -f "${SNAPSHOT_DST}/${HOST_SRC}/${NAME}.000/${LOG}"




# ------------- create the MD5 integrity signature ---------------------
# create a gziped 'find'-list of all snapshot files (including md5 signatures)
if [ "${MD5LIST}" -eq 1 ]
then
  echo "$(date +%Y-%m-%d_%H:%M:%S) Computing filelist with md5 signatures of ${SNAPSHOT_DST}/${HOST_SRC}/${NAME}.000 ..."
  OWD="$(pwd)"
  cd "${SNAPSHOT_DST}"
#  NOW=$(date "+%s")
#  MYTZ=$(date "+%z")
#  let NOW${MYTZ:0:1}=3600*${MYTZ:1:2}+60*${MYTZ:3:2} # convert localtime to UTC
#  DATESTR=$(date -d "1970-01-01 $((${NOW} - 1)) sec" "+%Y-%m-%d_%H:%M:%S") # 'now - 1s' to avoid missing files
  DATESTR=$(date -d "1970-01-01 UTC $(($(date +%s) - 1)) seconds" "+%Y-%m-%d_%H:%M:%S") # 'now - 1s' to avoid missing files
  REF_LIST="$(find ${SNAPSHOT_DST}/${HOST_SRC}/${NAME}.001/ -maxdepth 1 -type f -name 'snapshot.*.list.gz' 2>/dev/null)"
  if [ -n "${REF_LIST}" ] && [ "${OPT}" != "--recheck" ]
  then
    REF_LIST2="/tmp/rsync-reflist.tmp"
    gzip -dc "${REF_LIST}" >"${REF_LIST2}"
    touch -r "${REF_LIST}" "${REF_LIST2}"
    ${SCRIPT_PATH}/rsync-list.sh "${HOST_SRC}/${NAME}.000" 0 "${REF_LIST2}" | sort -u | gzip -c >"${HOST_SRC}/${NAME}.${DATESTR}.list.gz"
    rm "${REF_LIST2}"
  else
    ${SCRIPT_PATH}/rsync-list.sh "${HOST_SRC}/${NAME}.000" 0 | sort -u | gzip -c >"${HOST_SRC}/${NAME}.${DATESTR}.list.gz"
  fi
  touch -d "${DATESTR/_/ }" "${SNAPSHOT_DST}/${HOST_SRC}/${NAME}.${DATESTR}.list.gz"
  cd "${OWD}"
  [ ! -d "${SNAPSHOT_DST}/${HOST_SRC}/md5-log" ] && mkdir -p "${SNAPSHOT_DST}/${HOST_SRC}/md5-log"
  cp -al "${SNAPSHOT_DST}/${HOST_SRC}/${NAME}.${DATESTR}.list.gz" "${SNAPSHOT_DST}/${HOST_SRC}/md5-log/${NAME}.${DATESTR}.list.gz"
  mv "${SNAPSHOT_DST}/${HOST_SRC}/${NAME}.${DATESTR}.list.gz" "${SNAPSHOT_DST}/${HOST_SRC}/${NAME}.000/${NAME}.${DATESTR}.list.gz"
  touch -d "${DATESTR/_/ }" "${SNAPSHOT_DST}/${HOST_SRC}/${NAME}.000"
fi




# ------------- finish and clean up ------------------------------------
# protect the backup against modification with chattr +immutable
if [ "${CHATTR}" -eq 1 ]
then
  echo "$(date +%Y-%m-%d_%H:%M:%S) Setting recursively immutable flag of ${SNAPSHOT_DST}/${HOST_SRC}/${NAME}.000 ..."
  chattr -fR +i "${SNAPSHOT_DST}/${HOST_SRC}/${NAME}.000"
fi

# rotate the backups
if [ -d "${SNAPSHOT_DST}/${HOST_SRC}/${NAME}.512" ] #remove snapshot.512
then
  echo "Removing ${SNAPSHOT_DST}/${HOST_SRC}/${NAME}.512 ..."
  [ "${CHATTR}" -eq 1 ] && chattr -fR -i "${SNAPSHOT_DST}/${HOST_SRC}/${NAME}.512"
  rm -rf "${SNAPSHOT_DST}/${HOST_SRC}/${NAME}.512"
fi
[ -h "${SNAPSHOT_DST}/${HOST_SRC}/${NAME}.last" ] && rm -f "${SNAPSHOT_DST}/${HOST_SRC}/${NAME}.last"
for i in {511..000}
do
  if [ -d "${SNAPSHOT_DST}/${HOST_SRC}/${NAME}.${i}" ]
  then
    let j=${i##+(0)}+1
    j=$(printf "%.3d" "${j}")
    echo "Renaming ${SNAPSHOT_DST}/${HOST_SRC}/${NAME}.${i} into ${SNAPSHOT_DST}/${HOST_SRC}/${NAME}.${j} ..."
    [ "${CHATTR}" -eq 1 ] && chattr -i "${SNAPSHOT_DST}/${HOST_SRC}/${NAME}.${i}"
    mv "${SNAPSHOT_DST}/${HOST_SRC}/${NAME}.${i}" "${SNAPSHOT_DST}/${HOST_SRC}/${NAME}.${j}"
    [ "${CHATTR}" -eq 1 ] && chattr +i "${SNAPSHOT_DST}/${HOST_SRC}/${NAME}.${j}"
    [ ! -h "${SNAPSHOT_DST}/${HOST_SRC}/${NAME}.last" ] && ln -s "${NAME}.${j}" "${SNAPSHOT_DST}/${HOST_SRC}/${NAME}.last"
  fi
done

# remove additional backups if free disk space is short
OVERWRITE_LAST=0 #next call of remove_snapshot() will not remove snapshot.001
remove_snapshot ${MIN_MIBSIZE} ${MAX_MIBSIZE}
echo "$(date +%Y-%m-%d_%H:%M:%S) ${HOST_SRC}: === Snapshot backup successfully done in $(($(date +%s) - ${STARTDATE})) sec. ==="
exit 0
#eof

 

Then create the file ‘/backup/snapshot/rsync/rsync-include.txt’ (download rsync-include) that contains the include and exclude patterns:

#created by francois scheurer on 20120828
#
#note:
#  -be careful with trailing spaces, '- * ' is different from '- *'
#  -rsync stops at first matched rule and ignore the rest
#  -rsync descends recursively through the folder tree
#  -'**' also matches zero, one or several '/'
#  -get the list of all root files/folders
#     pdsh -f 1 -w server[1-22] 'ls -la / | sed -e "s/  */ /g" | cut -d" " -f9-' | cut -d" " -f2- | sort -u
#  -include all folders with '+ */' (missing this rule implies that '- *' will override all the inclusions of any subfolders)
#  -exclude all non explicited files with '- *'
#  -exclude everything except /etc/ssh: '+ /etc/ssh/**  + */  - *'
#  -exclude content of /tmp but include foldername: '- /tmp/*  + */'
#  -exclude content and also foldername /tmp: '- /tmp/  + */'
#  -exclude content of each .ssh but include foldername: '- /**/.ssh/*  + */'
#
#include everything except /tmp/:
#- /tmp/
#same but include /tmp/ as an empty folder:
#- /tmp/*
#include only /var/www/:
#+ /var/
#+ /var/www/
#+ /var/www/**
#- *
#same but also include folder structure:
#+ /var/www/**
#+ */
#- *




#pattern list for / (include by default):
+ /

- /lost+found/*
- /*.bak*
- /*.old*
#- /backup/*
#- /boot/*
#- /etc/ssh/ssh_host*
#- /home/*
- /media/*
- /mnt/*/*
#- /opt/*
- /opt/fedora*/data/*
- /opt/fedora*/lucene/*
- /opt/fedora*/tomcat*/logs/*
- /opt/fedora*/tomcat*/temp/*
- /opt/fedora*/tomcat*/work/*
- /postgresql/*/main/pg_log/*
- /postgresql/*/main/pg_xlog/*
- /postgresql/*/main/postmaster.opts
- /postgresql/*/main/postmaster.pid
#- /postgresql/*/main/*/*
- /proc/*
- /root/old/*
#- /root/.bash_history
- /root/.mc/*
#- /root/.ssh/*openssh*
- /root/.viminfo
- /root/tmp/*
#- /srv/*
- /sys/*
- /tmp/*
#- /usr/local/franz/logstat/logstat.log
- /var/cache/*
- /var/lib/mysql/*
- /var/lib/postgresql/*/main/*/*
- /var/log/*
#- /var/spool/*
- /var/tmp/*

#pattern list for /backup/ (exclude by default):
+ /backup/
- /backup/lost+found/*
- /backup/*.bak*
- /backup/*.old*
+ /backup/snapshot/
+ /backup/snapshot/rsync/
+ /backup/snapshot/rsync/**
- /backup/snapshot/*
- /backup/*
+ /mnt/
+ /mnt/*/
+ /mnt/*/backup/
+ /mnt/*/backup/snapshot/
+ /mnt/*/backup/snapshot/rsync/
+ /mnt/*/backup/snapshot/rsync/**
- /mnt/*/backup/snapshot/
- /mnt/*/backup/

#pattern list for /boot/ (include by default):
+ /boot/
- /boot/lost+found/*
- /boot/*.bak*
- /boot/*.old*
+ /boot/**

#pattern list for /home/ (include by default):
+ /home/
- /home/lost+found/*
- /home/*.bak*
- /home/*.old*
- /home/xen/*
+ /home/**

#include folder structure by default:
#+ */
#include everything by default:
+ *
#exclude everything by default:
#- *
#eof

 

And finally create the optional shell-script ‘/backup/snapshot/rsync/rsync-list.sh’ (download rsync-list.sh) that calculates the MD5 integrity signatures:

#!/bin/bash
# created by francois scheurer on 20081109
# this script is used by rsync-snapshot.sh,
# it recursively prints to stdout the filelist of folder $1 and computes md5 signatures
# it deals correctly with special filenames containing newlines or '\'
# note1: the script assumes that a file is unchanged if its size and ctime are unchanged;
#   this assumption has a very small risk of being wrong:
#   it could be wrong if two files with different contents but same filename and size are created in the same second in two directories;
#   if the first directory is then removed and the second is renamed as the first, the file is not detected as changed.
# note2: ctime checking can be replaced by mtime checking if CTIME_CHK=0;
#   this is needed by rsync-snapshot.sh (because of hard links creation that do not preserve ctime).




# ------------- the help page ---------------------------------------------
if [ "$1" == "-h" ] || [ "$1" == "--help" ]
then
  cat << "EOF"
Version 1.6 2009-06-19

USAGE: rsync-list.sh PATH/DIR CTIME_CHK [REF_LIST]

PURPOSE: recursively prints to stdout the filelist of folder PATH/DIR and computes md5 integrity signatures.
  It deals correctly with special filenames containing newlines or '\'.
  If a ref_list is provided, it is used to avoid the re-calculation of md5 on files
  with unchanged filename and ctime.
  A ref_list is a file containing the output of a previous execution of this shell-script.
  The script assumes that a file is unchanged if its size and ctime are unchanged.
  The ref_list_mtime is used to force a md5 re-calculation of all files with newer ctime:
  -if file_ctime > ref_list_mtime then re-calculate md5
  -if file_ctime = ref_file_ctime then use ref_list
  CTIME_CHK can be 1 to base the algorithm on ctime or 0 to base it on mtime.

NOTE: the script assumes that all processes avoid all file modifications in PATH/DIR during the script's execution,
  you should read the following remarks if this assumption cannot be guaranteed:
  -a recent ref_list_mtime (>= date_of_first_write_to_ref_list) causes the script
   to miss all files with: ref_list_mtime >= file_ctime > ref_file_ctime
   solution: 'touch' ref_file_mtime with date_of_first_write_to_ref_list - 1 second
  -an old ref_list_mtime (< date_of_last_write_to_ref_list) causes the script
   to double all files with: ref_list_mtime < file_ctime = ref_file_ctime
   solution: pipe the output to 'sort -u'

EXAMPLE:
  DATESTR=$( date -d "1970-01-01 UTC $(( $( date +%s ) - 1 )) seconds" "+%Y-%m-%d_%H:%M:%S" ) # 'now - 1s' to avoid missing files
  REF_LIST="/etc.2008-11-23_10:00:00.list.gz"
  REF_LIST2="/tmp/rsync-reflist.tmp"
  gzip -dc "${REF_LIST}" >"${REF_LIST2}"
  touch -r "${REF_LIST}" "${REF_LIST2}"
  ./rsync-list.sh "/etc/" 1 "${REF_LIST2}" | sort -u | gzip -c >"/etc.${DATESTR}.list.gz" # 'sort -u' to avoid doubling files
  rm "${REF_LIST2}"
  touch -d "${DATESTR/_/ }" "/etc.${DATESTR}.list.gz"
EOF
  exit 1
elif [ $# -ne 2 ] && [ $# -ne 3 ]
then
  echo "Sorry, you must provide 2 or 3 arguments. Exiting..."
  exit 2
fi




# ------------- file locations and constants ---------------------------
SRC="$1" #path of the folder to list (PATH/DIR)
CTIME_CHK=$2 #1 for ctime checking, 0 for mtime checking
if [ "$CTIME_CHK" -eq 1 ]
then
  CTIME_STAT="%z"
  CTIME_FIND="-cnewer"
else
  CTIME_STAT="%y"
  CTIME_FIND="-newer"
fi
REF="$3" #filename of optional reference list
SCRIPT_PATH="/backup/snapshot/rsync"
FINDSCRIPT="$SCRIPT_PATH/rsync-find.sh.tmp" # temporary shell-script to calculate filelist




# ------------- using reference list to reduce md5 calculation time ----
if [ -n "$REF" ] #we have a previous md5 list
then

  if ! [ -s "$REF" ] #invalid reference list
  then
     echo "Error: $REF is not a valid reference list. Exiting..."
     exit 2
  fi

  touch /tmp/testsystime.tmp
  if ! [ /tmp/testsystime.tmp -nt "$REF" ] #if system time is incorrect then exit
  then
    echo "Error: system time is older than mtime of $REF. Exiting..."
    rm /tmp/testsystime.tmp
    exit 2
  fi
  rm /tmp/testsystime.tmp

  cat "$REF" | while read -r LINE #consider all previous files that still exist now with same ctime and size and print their already calculated md5
  do
    SIZE_AND_CTIME="${LINE#* md5sum=* * * * }" #extract size and ctime from reference list
    SIZE_AND_CTIME="${SIZE_AND_CTIME% \`*}"
    LINE2="${LINE%% md5sum=*}"    #1) keep only the filename part of the line
    LINE2="${LINE2//\\n/
}"                                #2) replace '\n' with newline, the problem now is that '\\n' is replaced, too (following is not a solution because it removes the previous char: LINE2="${LINE2//[^\\]\\n/newline}")
    LINE2="${LINE2//\\
/\\n}"                          #3) replace '\'+newline with '\n', fixing the problem of 2)
    LINE2="${LINE2//\\\\/\\}" #4) replace '\\' with '\'
    if [ -a "$LINE2" ] || [ -h "$LINE2" ] #check if file still exists
    then
      SIZE_AND_CTIME2=$( stat -c"%s $CTIME_STAT" "$LINE2" )
      SIZE_AND_CTIME2="${SIZE_AND_CTIME2#* md5sum=* * * * }" #get size and ctime from current file
      SIZE_AND_CTIME2="${SIZE_AND_CTIME2% \`*}"
      if [ "$SIZE_AND_CTIME" == "$SIZE_AND_CTIME2" ] #current file unchanged (see above note), so print the already calculated md5
      then
        echo "$LINE"
      elif [ "${SIZE_AND_CTIME#* }" == "${SIZE_AND_CTIME2#* }" ] #size is different but ctime is same: update current file's ctime to force md5's recalculation (see below)
      then
        if [ "$CTIME_CHK" -eq 1 ]
        then
          chmod --reference="$LINE2" "$LINE2" #update ctime (note: system time is assumed to be correct)
        else
          touch -m "$LINE2" #update mtime (note: system time is assumed to be correct)
        fi
      fi
    fi #else the file has been either deleted or modified (different ctime) and reference list is here useless
  done
  CNEWER_REF="$CTIME_FIND $REF" #prepare 'find' -cnewer option
else
  CNEWER_REF=""
fi




# ------------- calculation of md5 sums --------------------------------
#this 1st method is not slow but fails on filenames with newlines or '\'
find "${SRC}" $CNEWER_REF ! \( -path "*
*" -o -path "*\\*" -o -path " *" -o -path "* " \) | while read LINE
do
  LINE2="$LINE"
  if ! [ -h "$LINE" ] && [ -f "$LINE" ]
  then
    RES=$( md5sum "$LINE" )
    LINE2="$LINE2 md5sum=${RES%% *}"
  else
    LINE2="$LINE2 md5sum=-"
  fi
  RES=$( echo $( stat -c"%A %U %G %s $CTIME_STAT \`%F'" "$LINE" ) )
  echo -E "$LINE2 $RES"
done
#this 2nd method is slow but works on filenames with newlines or '\'
( cat << "EOF"
#!/bin/bash
  LINE="$1"
#  LINE2="${LINE//\\/\\\\}" # replace '\' with '\\'
  LINE2="${LINE//\\/\\\\}" # replace '\' with '\\'
  LINE2="${LINE2//
/\\n}" # replace newline with '\n'
  if ! [ -h "$LINE" ] && [ -f "$LINE" ]
  then
    RES=$( md5sum "$LINE" )
    LINE2="$LINE2 md5sum=${RES%% *}"
  else
    LINE2="$LINE2 md5sum=-"
  fi
  RES=$( echo $( stat -c"%A %U %G %s $CTIME_STAT \`%F'" "$LINE" ) )
  echo -E "$LINE2 $RES"
EOF
) >"$FINDSCRIPT"
chmod +x "$FINDSCRIPT"
find "${SRC}" $CNEWER_REF \( -path "*
*" -o -path "*\\*" -o -path " *" -o -path "* " \) -print0 | xargs --replace --null "$FINDSCRIPT" "{}"
rm "$FINDSCRIPT"
#eof

 

Set the ownerships and permissions:

chown -cR root:root /backup/snapshot/rsync/
chmod 700 /backup/snapshot/rsync/rsync-*.sh
chmod 600 /backup/snapshot/rsync/rsync-include.txt

 

Usage

When you call the script ‘rsync-snapshot.sh’ without parameters, or with the hostname of the server itself (or ‘localhost’), the script performs a self-snapshot of the complete filesystem ‘/’.
You can and should use filter rules to exclude things like ‘/proc/*’ and ‘/sys/*’; for this, edit the configuration file ‘/backup/snapshot/rsync/rsync-include.txt’.
A description of the filter rule syntax is included as comments in the file itself.
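
For example, to create one backup per night, you could schedule the script with a cron job; a sketch (the schedule and log path are assumptions to adapt):

#example /etc/cron.d/rsync-snapshot entry, running every night at 02:30:
30 2 * * * root nice -5 /backup/snapshot/rsync/rsync-snapshot.sh localhost >>/var/log/rsync-snapshot.log 2>&1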

The snapshot backup is created in ‘/backup/snapshot/HOST/snapshot.001’, where ‘HOST’ is your server’s hostname. If the folder ‘snapshot.001’ already exists, it is rotated to ‘snapshot.002’ and so on, up to ‘snapshot.512’, after which it is removed. So if you create one backup per night, for example with a cronjob as sketched above, this retention policy gives you 512 days of retention. This is useful but can require too much disk space, which is why we have included a non-linear distribution policy. In short, we keep only the oldest backup in the range 257-512, likewise in the range 129-256, and so on. This exponential distribution in time retains more backups in the short term and fewer in the long term; it keeps only 10 or 11 backups but still spans a retention of 257-512 days.
In the following table, each column shows the current set of snapshots after one more rotation step (limited to snapshot.1 through snapshot.16 in this example):

1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1
2   2   2   2   2   2   2   2   2   2   2   2   2   2   2   2   2
    3       3       3       3       3       3       3       3
4       4       4       4       4       4       4       4       4
    5               5               5               5
        6               6               6               6
            7               7               7               7
8               8               8               8               8
    9                               9
        10                              10
            11                              11
                12                              12
                    13                              13
                        14                              14
                            15                              15
16                              16                              16

 

To save more disk space, ‘rsync’ makes a hard link for each file of ‘snapshot.001’ that already existed in ‘snapshot.002’ with identical content, timestamps and ownerships.
For example, the following session creates a backup and then uses ‘du’ to show the disk space used by the 4 existing backups:

root@server05:~# /backup/snapshot/rsync/rsync-snapshot.sh
2012-09-11_19:07:43 server05: === Snapshot backup is created into /backup/snapshot/server05/snapshot.001 ===
2012-09-11_19:07:43 Testing needed free disk space ... 0 MiB needed.
2012-09-11_19:07:45 Checking free disk space... 485997 MiB free. Ok, bigger than 5000 MiB.
2012-09-11_19:07:45 Checking disk space used by /backup/snapshot/server05 ... 11011 MiB used. Ok, smaller than 20000 MiB.
2012-09-11_19:07:46 Creating folder /backup/snapshot/server05/snapshot.000 ...
2012-09-11_19:07:46 Creating backup of server05 into /backup/snapshot/server05/snapshot.000 hardlinked with  /backup/snapshot/server05/snapshot.001 ...
2012-09-11_19:07:52 Setting recursively immutable flag of /backup/snapshot/server05/snapshot.000 ...
Renaming /backup/snapshot/server05/snapshot.003 into /backup/snapshot/server05/snapshot.004 ...
Renaming /backup/snapshot/server05/snapshot.002 into /backup/snapshot/server05/snapshot.003 ...
Renaming /backup/snapshot/server05/snapshot.001 into /backup/snapshot/server05/snapshot.002 ...
Renaming /backup/snapshot/server05/snapshot.000 into /backup/snapshot/server05/snapshot.001 ...
2012-09-11_19:07:55 Checking free disk space... 485958 MiB free. Ok, bigger than 5000 MiB.
2012-09-11_19:07:55 Checking disk space used by /backup/snapshot/server05 ... 11050 MiB used. Ok, smaller than 20000 MiB.
2012-09-11_19:07:56 server05: === Snapshot backup successfully done in 13 sec. ===
-----------------------------
root@server05:~# du -chslB1M /backup/snapshot/localhost/snapshot.* | column -t
10901  /backup/snapshot/localhost/snapshot.001
10901  /backup/snapshot/localhost/snapshot.002
10901  /backup/snapshot/localhost/snapshot.003
10901  /backup/snapshot/localhost/snapshot.004
0      /backup/snapshot/localhost/snapshot.last
43602  total
-----------------------------
root@server05:~# du -chsB1M /backup/snapshot/localhost/snapshot.* | column -t
10898  /backup/snapshot/localhost/snapshot.001
40     /backup/snapshot/localhost/snapshot.002
45     /backup/snapshot/localhost/snapshot.003
45     /backup/snapshot/localhost/snapshot.004
0      /backup/snapshot/localhost/snapshot.last
11026  total
-----------------------------

 

We can see that the 4 snapshot backups use 10.9 GB each, so without hard links they would sum to 43 GB; the last command shows that, on the contrary, the real used size is only 11 GB, thanks to the hard links.
By the way, the following commands can be very useful to replace all duplicate files with hard links to the first file in each set of duplicates, even if they have different names or paths:

chattr -fR -i /backup/snapshot/localhost/snapshot.*
fdupes -r1L /backup/snapshot/localhost/snapshot.*
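
To verify the deduplication, you can inspect the hard-link count of a file that is rarely modified; ‘%h’ is the number of hard links and ‘%i’ the inode number (‘/etc/hostname’ is just an example):

stat -c '%h links, inode %i: %n' /backup/snapshot/localhost/snapshot.00?/etc/hostname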

A good tutorial on how to use the ‘rsync’ command is available here:
http://www.thegeekstuff.com/2010/09/rsync-command-examples/
 

When called with a remote hostname as parameter, the script performs a snapshot backup via the network. This can be very useful for a DRP (Disaster Recovery Plan), in order to have a server farm replicated every night to a secondary site. In addition, you could implement a continuous replication of the databases, for example. The ‘BWLIMIT’ variable can then be changed inside the shell-script to limit the network bandwidth usage and the disk I/O overhead; this helps moderate the performance impact and avoid slowing down critical production servers.
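
For example, on the central backup server, after creating the destination folder once, a remote snapshot of a target server (here the hypothetical ‘server05’) is triggered with:

mkdir -pv /backup/snapshot/server05 #only needed once
nice -10 /backup/snapshot/rsync/rsync-snapshot.sh server05
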
Other variables can also be modified at the beginning of the script, either as global settings or as specific tuning for some servers; a ‘BACKUPSERVER’ section is already provided for this purpose and lets you tune specific settings for the remote central backup server:

HOST_PORT=22                            #port of source of backup
SCRIPT_PATH="/backup/snapshot/rsync"
SNAPSHOT_DST="/backup/snapshot"         #destination folder
NAME="snapshot"                         #backup name
LOG="rsync.log"
MIN_MIBSIZE=5000                        #older snapshots (except snapshot.001) are removed if free disk <= MIN_MIBSIZE. the script may exit without performing a backup if free disk is still short.
OVERWRITE_LAST=0                        #if free disk space is too small, then this option lets us remove snapshot.001 as well and retry once
MAX_MIBSIZE=20000                       #older snapshots (except snapshot.001) are removed if their size >= MAX_MIBSIZE. the script performs a backup even if their size is too big.
BWLIMIT=100000                          #bandwidth limit in KiB/s. 0 does not use slow-down. this allows to avoid rsync consuming too much system performance
BACKUPSERVER="rembk"                    #this server connects to all other to download filesystems and create remote snapshot backups
MD5LIST=0                               #to compute a list of md5 integrity signatures of all backed-up files, needs 'rsync-list.sh'
CHATTR=1                                #to use the 'chattr' command and protect the backups against modification and deletion
DU=1                                    #to use 'du' command and calculate the size of existing backups, disable it if you have many backups and it is getting too slow (for example on BACKUPSERVER)
SOURCE="/"                              #source folder to backup

if [ "${HOST_LOCAL}" == "${BACKUPSERVER}" ] #if we are on BACKUPSERVER then do some fine tuning
then
  MD5LIST=1
  MIN_MIBSIZE=35000 #needed free space for chunk-file tape-arch.sh
  MAX_MIBSIZE=12000
  DU=0 # NB: 'du' is currently disabled on BACKUPSERVER for performance reasons
elif [ "${HOST_LOCAL}" == "${HOST_SRC}" ] #else if we are on a generic server then do some other fine tuning
then
  if [ "${HOST_SRC}" == "ZRHSV-TST01" ]; then MIN_MIBSIZE=500; CHATTR=0; DU=0; MD5LIST=0; fi
fi

 

To let the backup server connect via ‘ssh’ to the target servers without interactively entering a password, you should create an ‘ssh’ key pair with an empty passphrase ‘/root/.ssh/rsync_rsa’ and copy the public key to the target servers:

#on each targetserver:
mkdir -p ~root/.ssh/
chown root:root ~root/.ssh/
chmod 700 ~root/.ssh/
touch ~root/.ssh/authorized_keys
chown root:root ~root/.ssh/authorized_keys
chmod 600 ~root/.ssh/authorized_keys
#update manually /etc/ssh/sshd_config to have 'AllowUsers root'
service ssh reload

#on the backupserver, create the key with an empty passphrase:
ssh-keygen -f ~/.ssh/rsync_rsa
#and upload the public key to the targetserver:
MYIP=$(hostname -i) #assign here the backupserver's external IP if necessary
echo "from=\"${MYIP%% *}\",no-port-forwarding,no-X11-forwarding,no-agent-forwarding,no-pty,command=\"rsync \${SSH_ORIGINAL_COMMAND#* }\" $(ssh-keygen -yf ~/.ssh/rsync_rsa)" | ssh targetserver "cat - >>~/.ssh/authorized_keys"

Note that the ‘command=’ restriction (http://larstobi.blogspot.ch/2011/01/restrict-ssh-access-to-one-command-but.html) will not apply if ‘/etc/ssh/sshd_config’ already has a ‘ForceCommand’ directive.
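
You can verify the setup from the backup server: because of the forced command, the rsync part of the request is re-executed on the target, so a simple version query is a safe connectivity test (‘targetserver’ stands for your target’s hostname):

ssh -p 22 -i /root/.ssh/rsync_rsa root@targetserver rsync --version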
This central backup server could also be used to centralize the administration of all other servers via pdsh/ssh (LINK).
 

Because the script does not freeze the filesystem during its operation, there is no guarantee that the backup will be a strict snapshot; in other words, the files will not all be copied at the exact same moment. This is usually not an issue, except for databases. In order to keep the consistency of a database, you should follow the instructions of http://www.postgresql.org/docs/9.1/static/continuous-archiving.html and http://www.anchor.com.au/blog/documentation/better-postgresql-backups-with-wal-archiving/.

The following example applies for PostgreSQL 9.1 on Debian:

#sudo -u postgres mkdir /var/lib/postgresql/9.1/main/wal_archive
#sudo -u postgres chmod 700 /var/lib/postgresql/9.1/main/wal_archive

#vi /etc/postgresql/9.1/main/postgresql.conf
# - Archiving -
#archive_mode = on               # allows archiving to be done
#                                # (change requires restart)
#archive_command = 'test ! -f /var/lib/postgresql/9.1/main/backup_in_progress || (test ! -f /var/lib/postgresql/9.1/main/wal_archive/%f && cp %p /var/lib/postgresql/9.1/main/wal_archive/%f)'           # command to use to archive a logfile segment
#archive_timeout = 0            # force a logfile segment switch after this
#                                # number of seconds; 0 disables



#check if postgresql is running:
if sudo -u postgres /usr/lib/postgresql/9.1/bin/pg_ctl -D /var/lib/postgresql/9.1/main/ status
then
  touch /var/lib/postgresql/9.1/main/backup_in_progress
  #freeze postgresql writing (all writes will go only into the pg_xlog WAL-files), in order to make a clean backup at filesystem-level:
  sudo -u postgres psql -c "SET LOCAL synchronous_commit TO OFF; SELECT pg_start_backup('rsync-snapshot', true);"
fi

#perform the backup
/backup/snapshot/rsync/rsync-snapshot.sh

#check if postgresql is running:
if sudo -u postgres /usr/lib/postgresql/9.1/bin/pg_ctl -D /var/lib/postgresql/9.1/main/ status
then
  #unfreeze postgresql writing:
  sudo -u postgres psql -c "SET LOCAL synchronous_commit TO OFF; SELECT pg_stop_backup();"
  rm /var/lib/postgresql/9.1/main/backup_in_progress
  chattr -R -i /backup/snapshot/localhost/snapshot.001/var/lib/postgresql/9.1/main &>/dev/null
  mv /var/lib/postgresql/9.1/main/wal_archive/* /backup/snapshot/localhost/snapshot.001/var/lib/postgresql/9.1/main/pg_xlog/
  rm /backup/snapshot/localhost/snapshot.001/var/lib/postgresql/9.1/main/{backup_in_progress,backup_label}
  chattr -R +i /backup/snapshot/localhost/snapshot.001/var/lib/postgresql/9.1/main &>/dev/null
fi

#NB:
#  -If you need to re-create a standby server while transactions are waiting, make sure that the commands to run pg_start_backup() and pg_stop_backup() are run in a session with synchronous_commit = off, otherwise those requests will wait forever for the standby to appear.
#  -see also the 'pg_dump' and 'pg_basebackup' commands.

 

It is even possible to freeze an ext3/ext4 filesystem before backing it up, but this is quite dangerous, because all processes that try to write to it will be blocked until you unfreeze the filesystem.
You should therefore avoid using this on production servers! But for the sake of information, here are the steps to freeze the filesystem mounted on ‘/mnt/folder’ for 30 seconds on Debian:

wget ftp://ftp.kernel.org/pub/linux/utils/util-linux/v2.22/util-linux-2.22-rc2.tar.gz
tar xfz util-linux-2.22-rc2.tar.gz
cd util-linux-2.22-rc2
aptitude install ncurses-dev libncurses5-dev mkcramfs cramfsprogs zlib1g-dev libpam-dev libpam0g-dev
./configure
make
man -l sys-utils/fsfreeze.8
./fsfreeze -f /mnt/folder && sleep 30 && ./fsfreeze -u /mnt/folder

 

I hope this ‘rsync-snapshot.sh’ script can be useful to you! ^_^
Another script will be posted on the blog soon to show you how to archive those snapshot backups on tapes using ‘tar’ with encrypted split chunks of data.

Comments

  1. Mariano Martinez Peck

    Hi,

    Thank you very much for this tool. I wanted to know if you think this script could work with CentOS 6 + ext3 partitions. Thanks!!

    • Francois Scheurer

      hi Mariano, it was tested on Debian and Ubuntu but should work on all Linux distributions.
      best regards

  2. I have some similar utilities to the ones you mention here for rsync and database backups: https://sourceforge.net/projects/discremental/ and https://sourceforge.net/projects/mysqlarchives/

    I am having trouble getting the backup data (with hard links) to replicate to another server. Have you tried replication? Any gotchas? I’m working with a few Terabytes of backups.

    • Francois Scheurer

      Hello JT,

      yes, we do replication over the internet with only 2 MB/s (16 Mbps) of about 2 TB every weekend.
      It works well but needs about 36 hours! Without rsync it would be almost impossible.

      Hard links can be replicated with the ‘-H’ option. As I already posted (to another question), you can use the following rsync options:

      (see also the beginning of rsync-snapshot.sh for useful comments on rsync options):

      #linux and linux
      rsync -n --bwlimit=5000 -HaxS --numeric-ids -zyc -vPh --stats -e'ssh -o ServerAliveInterval=60' --delete --delete-delay --delete-excluded --exclude myfile SRC/myfolder/. DST/myfolder 2>&1 | less -S+F

      #linux and windows (ntfs)
      #cygwin: (rsync problem with ntfs acl: use --no-perms and put 'noacl' in /etc/fstab)
      rsync -n --bwlimit=4000 --delete -HaxSzyv --no-perms --numeric-ids --stats --progress --partial -e'ssh -o ServerAliveInterval=60 -p2200' /cygdrive/c/parent/folder user@host:/parent/ 2>/tmp/rs.err | tee.exe /tmp/rs.log

      Best Regards

  3. Savas

    Hello Francois,
    First of all, thank you for sharing this awesome script! I am using it with a cronjob, set to send me an info mail in case of any error.
    I have one question though:

    root@mserver:/home/backup/snapshot/server2backup# du -chslB1M /home/backup/snapshot/server2backup/snapshot.* | column -t
    1732 /home/backup/snapshot/server2backup/snapshot.001
    1426 /home/backup/snapshot/server2backup/snapshot.002
    1426 /home/backup/snapshot/server2backup/snapshot.003
    1426 /home/backup/snapshot/server2backup/snapshot.005
    0 /home/backup/snapshot/server2backup/snapshot.last
    6008 total
    root@mserver:/home/backup/snapshot/server2backup# du -chsB1M /home/backup/snapshot/server2backup/snapshot.* | column -t
    1727 /home/backup/snapshot/server2backup/snapshot.001
    1417 /home/backup/snapshot/server2backup/snapshot.002
    1417 /home/backup/snapshot/server2backup/snapshot.003
    1417 /home/backup/snapshot/server2backup/snapshot.005
    0 /home/backup/snapshot/server2backup/snapshot.last
    5978 total
    root@mserver:/home/backup/snapshot/server2backup#

    As can be seen above, my backups seem not to be hardlinked (at least almost none of them). I am also using rsync-list.sh to check the files (rsync-snapshot.sh ${SERVER_NAME} --recheck).
    Am I missing something?
    (the backup server is just a mysql server)

    • Francois Scheurer

      Hi Savas!

      Without --recheck, rsync-snapshot.sh will use ‘rsync’ with the standard ‘quick-check’ strategy (cf. man rsync on the topics ‘--ignore-times’ and ‘--checksum’). All files with the same size and time of last modification (mtime) get their comparison skipped and are hard-linked.

      Other files are compared, and the fantastic rsync delta-transfer algorithm detects the minimal set of chunks to transfer via the network for a remote backup; for a local backup the algorithm is not used and the file is simply copied.
      However, if the source and destination files have the same content but different attributes (like permissions, ownerships or timestamps), the destination file is not overwritten but its attributes are simply updated, saving time also for local backups.
      Now if the destination file has more than one hard link, this optimization cannot be used, or it would alter the attributes of the other copies, too, which is not desirable; that is why rsync will create a new copy (instead of a hard link) in this case.

      For example, if /etc/hosts was unchanged since the last backup, then the new backup will make a hard-link copy.
      But if it was changed, a new copy (not hard-linked) will be created.
      You have to understand that by ‘changed’ we mean all kinds of changes, including a different time of last access (atime).
      Depending on your filesystem mount options, you may be in the situation where ‘--recheck’ (which uses ‘rsync --ignore-times’ and forces a read of all files) will update the ‘atime’ timestamp of ALL files!…
      This would end with a full copy without hard links.

      Check your mount options with ‘cat /proc/mounts | column -t’; you should see ‘relatime’ or ‘noatime’.
      Otherwise you are using the kernel default ‘strictatime’, where all read operations update the ‘atime’ timestamp.
      Then remount your filesystems with ‘relatime’, or stop using the ‘--recheck’ option.

      Even with the mount option ‘relatime’, using ‘--recheck’ would be a bad choice: the backup will use hard links, but it will be very slow because it reads the whole filesystem (source + destination) twice. The idea of the ‘--recheck’ option is for paranoid users who want to check the backup integrity, to avoid the scenario where a corrupted file gets hard-linked again and again without being detected; on a reliable system with RAID redundancy this should never happen.

      I hope this solves your problem.

      BTW, I will soon post a new version of the script that detects databases (MySQL & PostgreSQL) automatically and exports the tables.

      Franzi

      PS:
      Avoiding altering the attributes of the other copies allows all snapshot backup files to stay strictly identical to the original; for example, if you check a backup file with the command ‘stat’, you will see the same ownerships, permissions and timestamps as on the original. This can be important if, for example, you want to check what the ownership or the last modification time of a backup file was.
      Imagine that you have made a mess by changing permissions recursively on some system folders and you want to restore from an old backup; you also want the most recent backup (with the wrong permissions) not to have altered the older backups. That should explain why rsync avoids hard links in that case.

      • Savas

        Thank you, Francois, for your informative answer. As you said, I checked the mount options and decided not to use the --recheck option.
        Now it seems to work as it should.
        We (the ones with less Linux system administration experience) will be looking forward to your new script.

  4. anu

    thanks

  5. yemu

    Hi,

    i’m using your backup script (thanks), but I only have the snapshot.000 and 001 directories created; the script does not create more backups. do you know why this may happen? thanks!

    y

    • Francois Scheurer

      Dear Yemu,

      If your disk has only little free space, the script will delete older snapshots, so you may end up with only snapshot.001.

      You can configure some settings by editing the script to change that:
      MIN_MIBSIZE=5000 # older snapshots (except snapshot.001) are removed if free disk <= MIN_MIBSIZE. the script may exit without performing a backup if free disk is still short.
      OVERWRITE_LAST=0 # if free disk space is too small, then this option let us remove snapshot.001 as well and retry once
      MAX_MIBSIZE=80000 # older snapshots (except snapshot.001) are removed if their size >= MAX_MIBSIZE. the script performs a backup even if their size is too big.

      Sizes are given in MiB.
      So here we delete older backups if the total size of all snapshots is greater than 80 GB or if the free disk space is less than 5 GB.

      Note: snapshot.000 is the temporary name and should disappear at the end of the script execution.
      If not, it means the script exited with an error, could you therefore paste the output of the script?

      Regards

  6. yemu

    Hi,

    how can I restore the backup? what command should I use?

    • Francois Scheurer

      Hi Yemu,

      Basically, you can just copy the backup files with ‘cp -a source/. destination/’.
      In case you need to restore a whole system, you should boot from a live CD (or from the LAN with a PXE server); please look at the answer to Brent from 2014-05-01.

      Best regards

  7. I have found that the USB jump drive used as the destination was in FAT32 format and causing most of my issues. Lots of backup features I am looking for are in your script. I will be testing and evaluating your script that I downloaded.

    –Mark

    • Francois Scheurer

      You are welcome Mark!

      Attribute incompatibilities between FAT and Linux filesystems will cause issues with permissions/ownerships/timestamps/sparse-files/hard-links.
      See also my comment to your previous post.
      You can then reformat your usb disk as ext4/3 or create a loop device in the FAT partition with a ext4/3 filesystem within.

      Best regards

  8. I am looking for a backup/restore solution for my various Ubuntu (14.10, Wubi 14.03, and Lubuntu) platforms that could support Windows and MacOS in the future.

    So far I have used ArchLinux’s “Full system backup with rsync” documentation and script and have tried the Grsync app. I have experienced problems with both. The initial ‘rsync’ appears to have worked. Closer investigation showed that hard links and symbolic links were causing issues. My initial observation was that the backed-up partition was larger, thus my suspicion that hard links were not being replicated, and that some of the “--excluded={ . . .}” patterns were not being honored, which should have made the backup smaller.
    Another problem I experienced is that a test of ‘rsync’ on the /bin directory did not copy the symbolic links and gave errors. It also left the mounted partition in “read-only” mode, causing very strange issues until I remounted the partition. I tried various combinations of the “-H” and “-s” options to address the hard/symbolic link issues, with no positive results.

    My first question: is ArchLinux different enough that ‘rsync’ works there and not on Ubuntu? Second, will the PointSoftware script address my link issues and provide a restore-partition solution? One partition is a Windows 7 bootable one, and hopefully a VMplayer/VirtualBox solution soon. Any issues with this partition?

    –Mark

    BTW, I am a really long-time experienced *nix user/administrator. I started with a variant of Multics (NOS/VE) in the early ’80s and was later the team-lead administrator on a 15K-20K user Berkeley/System V OS from HP called Tru64 in the ’90s. A little Debian in the last 15 years related to network security.

    • Francois Scheurer

      Hi Mark,
      Were your rsync issues with ArchLinux or with Ubuntu?
      Most of the time, such problems can be fixed with some options.

      As long as the filesystems for your source and backup are unix-based (ext2/3/4), then yes, ‘rsync-snapshot.sh’ should do the job as you expect.
      If you back up a Linux fs to NTFS, it will not work, because NTFS will not properly store Linux attributes (permissions, etc.).
      You can back up Linux to NTFS if you only need document files and do not care about permissions/ownerships/timestamps/sparse-files/hard-links and so on.
      Backing up from Windows to Linux has the same limitation.

      A workaround (for Linux to Windows) could be to create a loop device on Windows, create a Linux filesystem inside it and access it from Linux via a cifs/smb mount point, but performance may be suboptimal…
      In the other direction (Windows to Linux) you can create an NTFS read-write partition on Linux as the backup folder and use rsync with Cygwin, but even then you will not preserve all Windows properties, like the hard-coded physical address of special files like pagefile.sys.
      So to back up a bootable Windows system you need a Windows tool.
      A tradeoff can be a scheduled local Windows backup (with the Microsoft backup program) that is then replicated remotely with rsync.
      Files and folders without important NTFS permissions can be backed up remotely with rsync to benefit from hard-link deduplication.

      I give you my personal favorite rsync options that you can test (see also at the beginning of rsync-snapshot.sh for useful comments on rsync options):

      #linux and linux
      rsync -n --bwlimit=5000 -HaxS --numeric-ids -zyc -vPh --stats -e'ssh -o ServerAliveInterval=60' --delete --delete-delay --delete-excluded --exclude myfile SRC/myfolder/. DST/myfolder 2>&1 | less -S+F

      #linux and windows (ntfs)
      #cygwin: (rsync problem with ntfs acl: use --no-perms and put 'noacl' in /etc/fstab)
      rsync -n --bwlimit=4000 --delete -HaxSzyv --no-perms --numeric-ids --stats --progress --partial -e'ssh -o ServerAliveInterval=60 -p2200' /cygdrive/c/parent/folder user@host:/parent/ 2>/tmp/rs.err | tee.exe /tmp/rs.log

      Best Regards

  9. Brent

    Hi Francois,

    I really appreciate this script and your thorough write-up. I am testing things out now and I intend to put this to use soon. I do not administer Linux systems on a daily basis; however, I have a client with a need for a backup solution like this.

    The one question I have is: what do I do if/when it’s time to actually do a full disk restore? I understand that if I’m simply restoring a specific file then I can browse the snapshot and retrieve it. But what I’m missing is what I would need to do to (for example) take a brand-new hard drive and copy these snapshot backups to it so that it’s like nothing ever happened. In other words, a bare-metal restore.

    Thanks again!

    • Francois Scheurer

      Hi Brent!

      Sorry for my late reply.
      Your question is actually a very good and important one.

      You should boot your new server with a linux live CD and apply the following steps:

      1) partition the new disk and set the active partition (for boot); useful commands:
      lshw -C disk #show disks
      lsblk -b #show partitions
      blkid #show filesystem UUID (http://www.cyberciti.biz/faq/linux-finding-using-uuids-to-update-fstab/)
      lsscsi #show scsi disks
      fdisk -l #show MBR partition tables
      fdisk -l /dev/sdY
      sfdisk -d /dev/sdX | sfdisk /dev/sdY #copy the MBR partition table from sdX to sdY
      parted -l #show GPT partition tables
      parted /dev/sdY p
      sgdisk /dev/sdX -R /dev/sdY; sgdisk -G /dev/sdY #copy the GPT partition table from sdX to sdY
      #reread partition table:
      partprobe
      partx -a /dev/sdY

      2) create the RAID arrays if you are using mdadm

      3) create the filesystems/swaps with the same UUID as before (compare ‘/etc/fstab’ with ‘blkid’)
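      For example (a sketch; the UUIDs are placeholders, take the real ones from ‘/etc/fstab’ or the old ‘blkid’ output):
      mkfs.ext4 -U 11111111-2222-3333-4444-555555555555 /dev/sdY1   #recreate the root filesystem with its old UUID
      mkswap -U 22222222-3333-4444-5555-666666666666 /dev/sdY2      #recreate the swap with its old UUID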

      4) mount them and simply copy the local snapshot onto them, for example with ‘rsync’ or ‘cp’, like:
      cp -ai /mnt/disk/backup/snapshot/myhost/snapshot.001/. /mnt/newrootfs/
      remember also to mount, for example, boot and home inside /mnt/newrootfs if they are on their own partitions
      if the backup is on a remote host, then use for example:
      rsync -n --bwlimit=5000 -HaxSzycvPh --numeric-ids --stats -e'ssh -o ServerAliveInterval=60 -p22' --delete --delete-delay --delete-excluded --exclude myfiletobeexcluded user@host:/backup/snapshot/myhost/snapshot.001/. /mnt/newrootfs/ 2>&1 | less -S+F

      5) and finally install the bootloader (like grub2) and reboot
      #chroot into new system
      for i in /proc/ /sys/ /dev/ /dev/pts/ /sys/kernel/debug/ /sys/kernel/security/ /dev/shm/ /tmp/ /var/run/ /var/lock/ /proc/sys/fs/binfmt_misc/; do mount -vo bind $i /mnt/newrootfs/$i; done
      chroot /mnt/newrootfs/ /bin/bash
      #for example grub2:
      #dpkg-reconfigure grub-pc
      #or with: update-grub2; grub-install /dev/sda; grub-install /dev/sdb
      exit
      umount -v /mnt/newrootfs/boot/; for i in /proc/ /sys/ /dev/ /dev/pts/ /sys/kernel/debug/ /sys/kernel/security/ /dev/shm/ /tmp/ /var/run/ /var/lock/ /proc/sys/fs/binfmt_misc/; do umount -vl /mnt/newrootfs/$i; done
      #reboot:
      shutdown -r now

      Best Regards

      • Brent

        Hi Francois,

        Excellent, thanks so much for this. I have this script up and running on 2 machines, but I’m having a problem with the 3rd; I’m hoping you can help.

        When I run the command to create the folders, it names the computer folder (for example) computername.domain.com; this name is correct, as confirmed by running the “hostname” command. This is identical to the two other machines.

        The strange thing is that when I try to run the snapshot script, it tells me it fails because the folder “computername” doesn’t exist. I’m not sure why the script picks up only “computername” when “computername.domain.com” is the hostname. It isn’t a major issue; I simply symlinked computername to the computername.domain.com folder and that allows it to progress further, but it then fails after the space calculation. I’m unsure whether it’s related to this folder issue or not.

        2014-05-06_12:10:21 Testing needed free disk space …rsync: hlink.c:533: finish_hard_link: Assertion `(((unsigned char *)(node->data))[0]) == 0' failed.
        rsync: stat "/backup/snapshot/computername/snapshot.test-free-disk-space/usr/bin/git" failed: No such file or directory (2)

        If I browse to that folder, the snap…-space folder does exist, but there is no ./usr/bin/git.

        Is this related to the folder naming issue, or is something else going on? The other 2 are similar machines and work great.

        Thanks again.

        • Francois Scheurer

          Hi Brent, I am glad that you use the script.
          When you create the folder, I wrote to use ‘hostname’, but the script is actually using ‘hostname -s’, which returns the short hostname (not the FQDN).
          Normally ‘hostname’ also returns the short name; more precisely, it returns the content of /etc/hostname, and apparently your 3rd server has an FQDN (maybe a mail server?) in /etc/hostname, which is completely OK.
          So the solution is to use ‘mkdir -vp $(hostname -s)’; I will correct the documentation to reflect that.
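          To illustrate (assuming, as on your 3rd server, that /etc/hostname contains the FQDN):
          hostname      #prints computername.domain.com (the content of /etc/hostname)
          hostname -s   #prints computername (everything before the first dot)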

          I do not know how to interpret the error message you got. Maybe you can post the output of ‘ls -la; rsync --version’, but anyway, simply deleting the symlink and renaming the folder computername.domain.com to computername should do the trick.

          best regards

  10. stelios

    Hi Francois, I am using your very helpful script for backing up. However, I am having a small issue using rsync-include.txt.

    I wish, for example, to exclude all folders under / except /home and its contents.
    I understand from the guide that it’s done like this:
    + /home/**
    - *

    However, this doesn’t seem to work and excludes everything.

    Thanks for your help.

    • Francois Scheurer

      Hi Stelios

      Yeah, it is not surprising that you have trouble with the include syntax of rsync. It is quite hard to understand the rsync man pages on this topic.
      That is why I put some additional explanations at the beginning of rsync-include.txt.
      One thing to remember is that rsync goes deeper into folders by iteration: first over all root files, second into all folders, third into all subfolders, and so on.
      Another point is that the first matching rule wins.
      A rule like ‘- *’ will exclude everything, so in order to include /var/www we need to put three includes (for the 3 first iterations) prior to ‘- *’:
      + /var/
      + /var/www/
      + /var/www/**
      The trailing ‘**’ matches all characters, including ‘/’, so it covers all paths inside /var/www/.

      Now let’s see some examples:
      #include everything except /tmp/:
      - /tmp/

      #same but include /tmp/ as an empty folder:
      - /tmp/*

      #include only /var/www/:
      + /var/
      + /var/www/
      + /var/www/**
      - *

      #same but also include folder structure:
      + /var/www/**
      + */
      - *

      In the last example we do not need ‘+ /var/’ and ‘+ /var/www/’ because ‘+ */’ includes all folders (and sub-folders…).
      So all folders will be created, but only the files from /var/www will be copied.

      The rsync-include.txt file will usually end with the following lines:
      #include everything by default:
      + *

      or these:
      #include folder structure by default:
      + */
      #exclude everything by default:
      - *

      Now, the concrete answer to your question:
      + /
      + /home/
      + /home/**
      - *
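      You can verify such rules before running a real backup with an rsync dry run; a sketch (the include file path follows this article’s layout, the destination is a dummy):
      rsync -n -avx --include-from=/backup/snapshot/rsync/rsync-include.txt / /tmp/test/ | less
      With ‘-n’ nothing is copied; rsync only lists the files that the filter rules would select.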

      • Francois Scheurer

        Hi again Stelios,
        BTW, I would recommend backing up at least /etc/ along with your /home/.

        Note: I personally prefer to back up the whole system (/) and exclude specific things like logs or temp folders.

        Note: a Debian server will usually need less than 2 GB for a full system backup, and every additional backup will often require less than 20 MB of incremental disk space.
        So backing up the whole system will not cost you a lot ;-)

  11. Matthias

    Hi Francois!

    This looks really interesting. I’ll check it out as soon as I find the time.
    I was wondering what happens if the backup process gets interrupted, for example because I shut down the PC or the laptop goes to sleep in the middle of the backup process.

    • Francois Scheurer

      Hi Matthias

      Good question! If the process gets interrupted, you will see an empty snapshot.test-free-disk-space folder or an incomplete backup named snapshot.000.
      You can delete those folders or simply ignore them, because the next run of the script will overwrite them anyway.
      When the backup is complete, snapshot.000 gets renamed to snapshot.001 and (optionally) protected against modification/deletion with ‘chattr’.

      best regards ^^

      PS: BTW, you can set up cron to use ‘cronic.sh’ as the launcher of the script; this is quite cool because you will get email alerts only in case of errors, while a complete log of every execution (with or without errors) is saved locally. I will update the article to mention that at the end.
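      A minimal sketch of such a crontab entry (in /etc/crontab syntax; the paths assume the layout from this article, and that ‘cronic.sh’ takes the command to run as its argument, like the original cronic wrapper):
      #run the snapshot backup every night at 02:30; cron mails the output only when cronic reports an error:
      30 2 * * * root /backup/snapshot/rsync/cronic.sh /backup/snapshot/rsync/rsync-snapshot.sh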

  12. David Highley

    Looks like some posts were dropped, so I’m re-posting this information.

    We installed these scripts on our Fedora 20 systems. It took a bit of investigation and testing to get the authorized_keys file entries to work. Below is the syntax that worked for us:
    from="10.2.2.9",command="$SSH_ORIGINAL_COMMAND",no-port-forwarding,no-X11-forwarding,no-agent-forwarding,no-pty ssh-rsa PUBLIC_KEY root@hostname

    The quotation marks were needed, and we had to pass the original ssh command; just rsync or an absolute path would not work. The root@hostname part is just a comment.

    • Francois Scheurer

      Hi David, sorry for the delay; we just enabled your comment.
      Thanks for sharing!

      It is also possible to write ‘AllowUsers root@10.2.2.9’ inside sshd_config to restrict the source IP, but your solution looks better, as it puts specific restrictions on that key only.

      The following URLs may be helpful to restrict the ssh key to the rsync command only:
      #http://larstobi.blogspot.ch/2011/01/restrict-ssh-access-to-one-command-but.html
      #http://www.eng.cam.ac.uk/help/jpmg/ssh/authorized_keys_howto.html
      They propose:
      from="fromserver.example.com",no-port-forwarding,no-X11-forwarding,no-agent-forwarding,no-pty,command="/usr/bin/rsync ${SSH_ORIGINAL_COMMAND#* }"
      However, it only makes sense for non-root users, because if rsync is granted to root, a client could easily do whatever he wants by overwriting configuration files.
      PS: ‘chattr +i’ could be used to protect authorized_keys and sshd_config. But it is still difficult to protect all files (think about passwd, shadow, pam.d, etc.).
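      For example (a sketch):
      chattr +i /root/.ssh/authorized_keys /etc/ssh/sshd_config   #make them immutable, even for root
      lsattr /etc/ssh/sshd_config                                 #verify the 'i' flag is set
      chattr -i /root/.ssh/authorized_keys                        #remove the flag before editing again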

  13. Johannes

    Thanks for this script! I’m just wondering which is the up-to-date version: the linked file or the version on this page inside the pre-tags?

    • Francois Scheurer

      Hi Johannes, I just uploaded the latest version now, so the linked file is the latest.

  14. David Highley

    It took us a few days to figure out the authorized_keys file syntax that works for the Fedora hosts we use. We ended up with entries like this:
    from="IPAddress",command="$SSH_ORIGINAL_COMMAND",no-port-forwarding,no-X11-forwarding,no-agent-forwarding,no-pty ssh-rsa KEY root@hostname

    The quotes were needed, and we had to use the secure shell parameter in order to get the command entry to work; it would not work with just rsync as the command. This somewhat weakens the security, as any command passed to ssh would be allowed to execute. KEY is the public RSA key value. The root@hostname part is just a comment.

  15. Gerome

    First of all, thank you for publishing these scripts; they are very useful.
    I wonder if it wouldn’t be better to rename the symlink “snapshot.last”
    to “snapshot.oldest”, because it points to the oldest, not the last backup. Maybe I’m just getting it wrong, but for me “the last” means in this context the last one made, not the last one in a row from newest to oldest.

    • Francois Scheurer

      Hi Gerome!

      You are absolutely right!
      And the newest backup is of course snapshot.001.
      I will make your proposed change.

  16. David Brooks

    Great script!! Thank you.

    One query though — in line 336 LNKDST is set as:

    find "${SNAPSHOT_DST}/" -maxdepth 2 -type d -name "${NAME}.001" -printf " --link-dest=%p"

    Should this not take into account HOST_SRC, with a maxdepth of 1? i.e.

    find "${SNAPSHOT_DST}/${HOST_SRC}" -maxdepth 1 -type d -name "${NAME}.001" -printf " --link-dest=%p"

    Thanks again,
    Dave

    • Francois Scheurer

      Hi Dave!

      Basically you have LNKDST=$(find "${SNAPSHOT_DST}/" -maxdepth 2 -type d -name "${NAME}.001" -printf " --link-dest=%p"),
      which may give "--link-dest=/backup/snapshot/mylocalserver.ch/snapshot.001 --link-dest=/backup/snapshot/myremoteserver.ch/snapshot.001"
      if you have two servers to back up, one being remote and one being the localhost.

      So as you see, $HOST_SRC will always belong to that list.
      The reason this list also contains all the other hosts is to improve the deduplication even more! :-D
      Not only is a file that is unmodified across daily backups hard-linked instead of copied, but the same file is also hard-linked across different hosts when possible.
      It means, for example, that if you have a central server (we called it the remote backup server) that is backing up 10 remote Linux servers, then all system files (like /usr/ etc.) will be saved only once on the remote backup server:
      11 daily backups × 10 servers × 10 GB of Linux system files = 1100 GB.
      But by using hard links we need only about 10 GB, reducing the needed disk space by about 99%! That is so cool, isn’t it?
      Many thanks to Andrew Tridgell and Paul Mackerras for having written rsync!
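      You can observe this deduplication on the backup server by comparing real and apparent disk usage; a small sketch using GNU du, whose ‘-l’ option counts hard-linked files as often as they appear:
      du -shc /backup/snapshot/*/snapshot.* | tail -n1    #real usage: hard-linked files counted once
      du -shlc /backup/snapshot/*/snapshot.* | tail -n1   #apparent usage: as if every snapshot were a full copy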

  17. Jonathan

    Very nice easy-to-use script!

    I added the possibility to have per-host excludes too, like this:
    192,193c192
    < INCLUDE="$SCRIPT_PATH/rsync-include-$HOST_SRC.txt"; [[ -e "$INCLUDE" ]] || INCLUDE="$SCRIPT_PATH/rsync-include.txt"

    348c347
    --include-from="${SCRIPT_PATH}/rsync-include.txt" \
    393c392
    --include-from="${SCRIPT_PATH}/rsync-include.txt" \

    Additionally, one must escape the inner quotes and the dollar sign in the line adding the command restriction and ssh key to a remote host.

    • Francois Scheurer

      Hi Jonathan,

      Thanks for your comment and additions.
      Happy that you use it ;)
      It is basically just another backup-with-rsync script, but I like the rotation with the exponential distribution of older backups.
      I think it really helps to have daily backups in the short term and also some sparse backups that are 6-12 months old.

      The last 2 lines of your pasted diff should be:

      348c347
      --include-from="${INCLUDE}" \
      393c392
      --include-from="${INCLUDE}" \

      • Jonathan

        … not even that exactly; in fact each of these two lines should be added after each of the two ${SCRIPT_PATH} ones. I guess the filtering of the post removed them, as they begin with a “less than” sign ;)
        Here it is with plus and minus :)

        @@ -192 +192,2 @@
        -
        +INCLUDE="$SCRIPT_PATH/rsync-include-$HOST_SRC.txt"; [[ -e "$INCLUDE" ]] || INCLUDE="$SCRIPT_PATH/rsync-include.txt"
        +echo "Using include patterns from $INCLUDE"
        @@ -347 +348 @@
        - --include-from="${SCRIPT_PATH}/rsync-include.txt" \
        + --include-from="${INCLUDE}" \
        @@ -392 +393 @@
        - --include-from="${SCRIPT_PATH}/rsync-include.txt" \
        + --include-from="${INCLUDE}" \

        Thanks for your comment :)

  18. J Wilson

    What a wonderful script. I can’t thank you enough for this. I did make some small adjustments to the rsync command (less verbose, and making the results more human-readable with -h), but apart from that it is working really well.

    Thank you for putting this online.

    • Francois.Scheurer

      glad you use it J.Wilson ~(°e°)~

  19. Catalin

    Very nice script; I use it daily. However, it would be nice to add per-host exclude lists, as not all servers are alike, and an option to specify the SSH port per host as well, as not all servers listen on 22.
    I modified the script to check for rsync-include-{host-name}.txt.

    • Francois.Scheurer

      Your modification is a good idea, Catalin!

  20. This will be nicer:
    1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
    —2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
    —–3 3 3 3 3 3 3 3
    ——-4 4 4 4 4 4 4
    ———-5 5 5 5 5 5 5
    ————6 6 6
    —————7 7 7
    —————–8 8 8

  21. I tried to do a proof-of-concept of the retention scheme and I got a different result than you. I took 8 snapshots as an example, which gave this result:

    1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1
    ..2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2
    ….3…3…3…3…3…3…3…3
    ……4…4…4…4…4…4…4..
    ……..5…5…5…5…5…5…5
    ……….6………6………6..
    …………7………7………7
    …………..8………8……..

    So we converge on the column with the set 1 2 3 5 7, where you get 1 2 3 7 (no 5); normally your script will not delete 4, and then it will be included in the rotation process and transformed to 5. Do you agree?

    • Francois.Scheurer

      Hi Anouar,

      Yes, on the 7th daily backup we will have the set 1 2 3 5 7, where 4 is renamed to 5 as you wrote.
      On the 4th day you will have 1 2 3 4, and not 1 2 4, however.
      See the article; I updated the table to show it.
      Best Regards

  22. Hi,

    Thanks for sharing the scripts, they are very interesting.

  23. Thank you very much for sharing this.

    I succeeded in creating the necessary files and got it running. It creates a complete backup with a .000 extension, but rerunning the program just recreates the complete backup with the same .000 extension instead of the expected snapshot.001. Could you give me hints as to why I get this unexpected behaviour?

    • Francois.Scheurer

      Hi Wim!

      The ‘snapshot.000’ folder is created when you start the script and will be renamed to ‘snapshot.001’ at the end of the process.
      So in your case something caused the script to exit before completion.
      Could you send the output of the script with the error message?
      You can also send the output of ‘uname -a; cat /proc/mounts’.
      You may try to disable chattr (modify the script with ‘CHATTR=0’) if this command is not supported on your system.
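      A quick test to check whether the backup filesystem supports chattr (a sketch):
      touch /backup/.chattr-test
      chattr +i /backup/.chattr-test && echo supported || echo not supported
      chattr -i /backup/.chattr-test; rm -f /backup/.chattr-test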

      Best regards

      • Francois.Scheurer

        I noticed that the version on the blog is not up to date and has a bug with the ‘seq’ command, which needs an explicit ‘-1’ increment if the series is decreasing.
        I previously used ‘for i in {512..001}’, which does not need an explicit increment, but this is unsupported on older bash versions.
        I will update the blog this evening with the latest version of the script.
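        To illustrate the fix (a sketch):
        for i in $(seq -w 512 -1 1); do   #'-1' makes the series decrease, '-w' pads with zeros
          echo "snapshot.$i"              #snapshot.512 ... snapshot.001
        done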

  24. Thanks Francois!
