Altbackup

Alternative Simple Backup using scp rather than rsync

Multiple command arguments are accepted together with configuration options:

altbackup.py --help
altbackup.py check_source
altbackup.py dump check_source transfer purge_target

On target node only:

altbackup.py check_target              # digest recomputation and comparison against sidecar dna
altbackup.py extract_tracdb            # today's tarball, if already copied over
altbackup.py extract_tracdb --day -1   # yesterday's tarball
altbackup.py extract_tracdb --day 2013/04/13   # a specific tarball; only a few days of tarballs are retained

altbackup.py examine_tracdb --day -1   # yesterday's tarball
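
The `--day` values used above (no value, a relative offset such as `-1`, or an explicit `2013/04/13`) could be interpreted along the following lines. This is only a sketch: `interpret_day` is a hypothetical helper, not the function used by altbackup.py, and treating any integer string as an offset in days is an assumption.

```python
from datetime import date, timedelta

def interpret_day(day, today):
    """Return a YYYY/MM/DD string for a --day style argument (hypothetical helper)."""
    if day is None:
        # no --day given: use today
        return today.strftime("%Y/%m/%d")
    try:
        offset = int(day)          # assumption: "-1" means one day before today
    except ValueError:
        return day                 # not an integer: taken to be an explicit YYYY/MM/DD
    return (today + timedelta(days=offset)).strftime("%Y/%m/%d")

today = date(2013, 5, 20)
print(interpret_day(None, today))
print(interpret_day("-1", today))
print(interpret_day("2013/04/13", today))
```

With `today` fixed at 2013-05-20, the first call yields `2013/05/20`, consistent with the "interpreted day string None into 2013/05/20" log line shown later on this page.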

Available commands:

transfer
scp matched files from source to target
purge_target
deletes remote files within the subfold on the targetnode, retaining only the last cfg.keep files within each <catfold>/<subfold>. Also deletes empty remote directories within the subfold. Only one search for empties is made within each subfold, so repeated invocation is typically needed to purge all empty directories. NB assumes lexically sorted file paths are in date order.
check_source
finds matched files and checks that the sidecar dna matches the locally recomputed dna
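
The retention rule described for purge_target (keep the last cfg.keep files within each <catfold>/<subfold>, relying on lexical order being date order) can be sketched as below. `select_purge` is a hypothetical name, and the paths are assumed to be given relative to the backup base directory; the real script may organise this differently.

```python
from collections import defaultdict

def select_purge(relpaths, keep):
    """Return the paths to delete, keeping the last `keep` per <catfold>/<subfold>."""
    groups = defaultdict(list)
    for rel in relpaths:
        catfold, subfold = rel.split("/")[:2]
        groups[(catfold, subfold)].append(rel)
    doomed = []
    for paths in groups.values():
        # lexical order equals date order for a YYYY/MM/DD/HHMMSS directory layout
        paths.sort()
        doomed.extend(paths[:-keep] if keep > 0 else paths)
    return sorted(doomed)

paths = [
    "tracs/dybsvn/2013/05/17/104702/dybsvn.tar.gz",
    "tracs/dybsvn/2013/05/18/104702/dybsvn.tar.gz",
    "tracs/dybsvn/2013/05/19/104702/dybsvn.tar.gz",
    "folders/svnsetup/2013/05/19/104702/svnsetup.tar.gz",
]
print(select_purge(paths, keep=2))
```

The lexical-order assumption holds only while every date component is zero padded, which is the case for the paths listed on this page.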

Commands which can be run on the configured targetnode only:

check_target
checks that the sidecar dna matches the locally recomputed dna
extract_tracdb
extract trac.db out of the last backup tarball on the targetnode
examine_tracdb
examine trac.db out of the last backup tarball, printing last build times from each of the slaves
dump
print configuration parameters
tls OR ls
list target tarballs on targetnode (unenforced)
sls
list source tarballs on sourcenode (unenforced)
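
The check_source and check_target commands both compare a sidecar dna against a locally recomputed one. A minimal sketch of that comparison follows. The sidecar naming (`<tarball>.dna`) is taken from the listings on this page, but the digest algorithm and the sidecar layout are assumptions: here the sidecar is taken to be plain text holding a hex MD5 digest, which may not match what altbackup.py actually writes. `recompute_digest` and `check_dna` are hypothetical names.

```python
import hashlib
import os

def recompute_digest(path, chunk=1 << 20):
    """Hex MD5 of a file, read in chunks so multi-GB tarballs fit in memory."""
    md5 = hashlib.md5()
    with open(path, "rb") as fp:
        for block in iter(lambda: fp.read(chunk), b""):
            md5.update(block)
    return md5.hexdigest()

def check_dna(path):
    """Return 'ok', 'mismatch' or 'no dna' for one tarball (hypothetical helper)."""
    sidecar = path + ".dna"
    if not os.path.exists(sidecar):
        return "no dna"            # the case reported as SKIPPING in the log
    with open(sidecar) as fp:
        expected = fp.read().strip()   # assumption: sidecar holds only the hex digest
    return "ok" if recompute_digest(path) == expected else "mismatch"
```

Keeping "no dna" distinct from "mismatch" matters for monitoring, since a missing sidecar usually means the transfer has not finished rather than that the tarball is corrupt.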

Timings and cron invocation

See the commentary in the bash wrapper script altbackup.sh, which invokes this python script and handles email notifications of non-zero return codes.

Issues

truncated transfer due to connection timeout

[blyth@cms01 05]$ pwd
/data/var/scm/alt.backup/dayabay/tracs/dybsvn/2013/05

[blyth@cms01 05]$ find . -type f  -name '*.tar.gz' -exec ls -l {} \;
-rw-r--r--  1 blyth blyth 1529629254 May  8 14:06 ./08/104701/dybsvn.tar.gz
-rw-r--r--  1 blyth blyth 1531229474 May  9 14:01 ./09/104702/dybsvn.tar.gz
-rw-r--r--  1 blyth blyth 1147682816 May 10 14:08 ./10/104701/dybsvn.tar.gz
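
In the listing above the May 10 tarball is roughly a quarter smaller than its neighbours, which is presumably the truncated one. A quick size comparison can flag such files before a digest check is even attempted. This is only a sketch: `flag_truncated` and the 0.9 tolerance are hypothetical, and a size check complements rather than replaces the dna comparison.

```python
from statistics import median

def flag_truncated(sizes, tolerance=0.9):
    """Return paths whose size is below tolerance * median size (hypothetical check)."""
    mid = median(sizes.values())
    return sorted(p for p, s in sizes.items() if s < tolerance * mid)

# sizes in bytes, copied from the find listing above
sizes = {
    "./08/104701/dybsvn.tar.gz": 1529629254,
    "./09/104702/dybsvn.tar.gz": 1531229474,
    "./10/104701/dybsvn.tar.gz": 1147682816,
}
print(flag_truncated(sizes))   # → ['./10/104701/dybsvn.tar.gz']
```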

slow transfer leading to monitor warning

altbackup notification email received at 15:30:

=== altbackup_notify: FAILURE Wed May 29 15:33:12 CST 2013 /home/blyth/cronlog/altbackup.log cms01.phys.ntu.edu.tw 

2013-05-29 15:30:05,374 __main__ INFO     ================================ check_target 
2013-05-29 15:30:05,374 __main__ INFO     alt_check /data/var/scm/alt.backup/dayabay ['dybsvn', 'svnsetup'] : checking sidecar dna matches locally recomputed   
2013-05-29 15:30:05,374 __main__ INFO     looking for ['dybsvn'] source tarballs beneath /data/var/scm/alt.backup/dayabay from 2013/05/29 
2013-05-29 15:30:05,564 __main__ WARNING  SKIPPING AS no dna for path /data/var/scm/alt.backup/dayabay/tracs/dybsvn/2013/05/29/104701/dybsvn.tar.gz 
2013-05-29 15:30:05,707 __main__ INFO     found 1 matching tarballs

Repeating the command does not reproduce the warning:

[blyth@cms01 ~]$ altbackup.py dump check_target

A find reveals that the transfer was more than an hour slower than usual, so the dna was not yet in place when the monitoring ran. If this repeats, the cron time will need to be moved:

[blyth@cms01 ~]$ find /data/var/scm/alt.backup/dayabay -name 'dybsvn.tar.gz.dna' -exec ls -l {} \; 
-rw-r--r--  1 blyth blyth 64 May 27 14:24 /data/var/scm/alt.backup/dayabay/tracs/dybsvn/2013/05/27/104701/dybsvn.tar.gz.dna
-rw-r--r--  1 blyth blyth 64 May 29 15:32 /data/var/scm/alt.backup/dayabay/tracs/dybsvn/2013/05/29/104701/dybsvn.tar.gz.dna
-rw-r--r--  1 blyth blyth 64 May 28 14:07 /data/var/scm/alt.backup/dayabay/tracs/dybsvn/2013/05/28/104701/dybsvn.tar.gz.dna
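
The reasoning here is a comparison of each sidecar's timestamp against the time of the daily check (15:30 in the log above). As a sketch, with the timestamps copied from the find output and `late_sidecars` being a hypothetical helper:

```python
from datetime import datetime, time

def late_sidecars(mtimes, check_time):
    """Return the days whose dna sidecar appeared after the daily check time."""
    return sorted(day for day, stamp in mtimes.items() if stamp.time() > check_time)

# modification times of dybsvn.tar.gz.dna taken from the listing above
mtimes = {
    "2013/05/27": datetime(2013, 5, 27, 14, 24),
    "2013/05/28": datetime(2013, 5, 28, 14, 7),
    "2013/05/29": datetime(2013, 5, 29, 15, 32),
}
print(late_sidecars(mtimes, time(15, 30)))   # → ['2013/05/29']
```

On the two normal days the sidecar arrived more than an hour before the check, so the current cron time leaves a reasonable margin unless slow transfers become common.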

monitor warnings from no dna

In the past week (late Nov 2013) there have been several days with monitor mail warnings complaining of no dna. Checking on the source node at IHEP finds an error message in cronlog/altbackup.log:

ssh(15574) ssh: cms01.phys.ntu.edu.tw: Temporary failure in name resolution^M
lost connection

Switching to the IP address rather than the name in .ssh/config avoids the name lookup.

bash wrapper for altbackup.py

NB this does not actually do the backup; the ancient scm-backup machinery still does that

$ENV_HOME/scm/altbackup.sh $HOME/cronlog/altbackup.log dump check_source transfer purge_target

Crontab examples

On the sending source node:

SHELL=/bin/bash
HOME=/home/blyth
ENV_HOME=/home/blyth/env
CRONLOG_DIR=/home/blyth/cronlog
NODE_TAG_OVERRIDE=WW
MAILTO=blyth@hep1.phys.ntu.edu.tw
#
00 13 * * * ( . $ENV_HOME/env.bash ; env- ; python- source ; ssh-- ; $ENV_HOME/scm/altbackup.sh $HOME/cronlog/altbackup.log dump check_source transfer purge_target  ) > $CRONLOG_DIR/altbackup_.log 2>&1

On the receiving target node:

SHELL=/bin/bash
HOME=/home/blyth
ENV_HOME=/home/blyth/env
CRONLOG_DIR=/home/blyth/cronlog
MAILTO=blyth@hep1.phys.ntu.edu.tw
#
30 15 * * * ( . $ENV_HOME/env.bash ; env- ; python- source ; ssh-- ; $ENV_HOME/scm/altbackup.sh $HOME/cronlog/altbackup.log dump check_target ) > $CRONLOG_DIR/altbackup_.log 2>&1

SSH Debugging

The most common usage issue encountered with this script is a bad SSH config preventing automated transfers. The result is typically a hang, with the script waiting for password input, so no transfers are done.

Note that the NODE_TAG present in the crontab environment is crucial for this, as it is used to determine the appropriate envvars for accessing the SSH agent:

[dayabay] /home/blyth > cat .ssh-agent-info-$NODE_TAG
SSH_AUTH_SOCK=/tmp/ssh-EcxfAm4848/agent.4848; export SSH_AUTH_SOCK;
SSH_AGENT_PID=4849; export SSH_AGENT_PID;
#echo Agent pid 4849;

[dayabay] /home/blyth > echo $NODE_TAG
Y2
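
When debugging, it can help to read the agent info file directly and check what it would export. The sketch below parses text in the format shown above; `parse_agent_info` is a hypothetical helper and is not part of the env bash functions.

```python
import re

def parse_agent_info(text):
    """Extract VAR=value assignments from ssh-agent style output, skipping comments."""
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        match = re.match(r"(\w+)=([^;]*);", line)
        if match:
            env[match.group(1)] = match.group(2)
    return env

sample = """SSH_AUTH_SOCK=/tmp/ssh-EcxfAm4848/agent.4848; export SSH_AUTH_SOCK;
SSH_AGENT_PID=4849; export SSH_AGENT_PID;
#echo Agent pid 4849;
"""
info = parse_agent_info(sample)
print(info["SSH_AUTH_SOCK"], info["SSH_AGENT_PID"])
```

Checking whether the `SSH_AUTH_SOCK` path still exists (for example with `os.path.exists`) is one way to tell a stale agent file from a wrong NODE_TAG.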

These get set into the environment by base-env, which is invoked by the sequence of bash functions env-/env-env/base-/base-env:

[dayabay] /home/blyth/cronlog > t base-env
base-env is a function
base-env ()
{
    local dbg=${1:-0};
    local iwd=$(pwd);
    local sshinfo=$(env-home)/base/ssh-infofile.bash;
    elocal-;
    ssh--;
    case $(uname) in
        DebugSkipDarwin)
            ssh--osx-keychain-sock-export
        ;;
        *)
            source $(env-home)/base/ssh-infofile.bash
        ;;
    esac;
    [ -t 0 ] || return;
    [ "$dbg" == "t0fake" ] && echo faked tzero && return;
    clui-
}

Current Observed Timings, May 2013

  1. 10:47 backups started on source node
  2. 12:00 root (Qiumei) controlled scm backup typically completes before noon, as indicated by timestamps on the dna sidecars on source node
  3. 13:00 source node altbackup starts
  4. 13:00 svn transfers started
  5. 13:40~13:50 svn transfers completed
  6. 13:40~13:50 trac transfers started
  7. 14:00~14:20 trac transfers completed, indicated by timestamps on dna sidecars on target node
  8. 14:30 source node altbackup completes
  9. 15:40 target node check starts

Listing the target tarballs on the target node shows the transfer completion times:

[blyth@cms01 ~]$ altbackup.py ls
2013-05-20 11:28:00,183 env.scm.altbackup INFO     /data/env/local/env/home/bin/altbackup.py ls
2013-05-20 11:28:00,184 env.scm.altbackup INFO     interpreted day string None into 2013/05/20
2013-05-20 11:28:00,185 env.scm.altbackup INFO     ================================ ls
2013-05-20 11:28:00,185 env.scm.altbackup INFO     find /data/var/scm/alt.backup/dayabay -name '*.tar.gz' -exec ls -lh {} \;
2013-05-20 11:28:00,231 env.scm.altbackup INFO
-rw-r--r--  1 blyth blyth 2.4G May 17 13:51 /data/var/scm/alt.backup/dayabay/svn/dybsvn/2013/05/17/104702/dybsvn-20550.tar.gz
-rw-r--r--  1 blyth blyth 2.4G May 18 13:37 /data/var/scm/alt.backup/dayabay/svn/dybsvn/2013/05/18/104702/dybsvn-20557.tar.gz
-rw-r--r--  1 blyth blyth 2.4G May 19 13:38 /data/var/scm/alt.backup/dayabay/svn/dybsvn/2013/05/19/104702/dybsvn-20561.tar.gz
-rw-r--r--  1 blyth blyth 1.5G May 17 14:20 /data/var/scm/alt.backup/dayabay/tracs/dybsvn/2013/05/17/104702/dybsvn.tar.gz
-rw-r--r--  1 blyth blyth 1.5G May 18 14:01 /data/var/scm/alt.backup/dayabay/tracs/dybsvn/2013/05/18/104702/dybsvn.tar.gz
-rw-r--r--  1 blyth blyth 1.5G May 19 14:02 /data/var/scm/alt.backup/dayabay/tracs/dybsvn/2013/05/19/104702/dybsvn.tar.gz
-rw-r--r--  1 blyth blyth 7.3K May 17 13:53 /data/var/scm/alt.backup/dayabay/folders/svnsetup/2013/05/17/104702/svnsetup.tar.gz
-rw-r--r--  1 blyth blyth 7.3K May 18 13:38 /data/var/scm/alt.backup/dayabay/folders/svnsetup/2013/05/18/104702/svnsetup.tar.gz
-rw-r--r--  1 blyth blyth 7.3K May 19 13:39 /data/var/scm/alt.backup/dayabay/folders/svnsetup/2013/05/19/104702/svnsetup.tar.gz

Notification

In order to be notified in case of non-zero return codes from the scripts, the MAILTO envvar needs to be set to the recipient email addresses in the crontab.
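
The wrapper altbackup.sh is not reproduced on this page, so the following is only a sketch of the idea of notifying on a non-zero return code. `failure_summary` is a hypothetical name, the summary line merely imitates the FAILURE line quoted in the notification email above, and actually sending the mail is left out.

```python
import subprocess
import time

def failure_summary(cmd, logpath, node):
    """Run cmd; return None on success, else a FAILURE summary line for a mail body."""
    rc = subprocess.call(cmd)
    if rc == 0:
        return None                # nothing to report
    stamp = time.strftime("%a %b %d %H:%M:%S %Z %Y")
    return "=== altbackup_notify: FAILURE %s %s %s" % (stamp, logpath, node)
```

A caller would mail the returned line, together with the tail of the log, to the addresses in MAILTO whenever the result is not None.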