Revision

    $Id: webjob-harvest-df.base,v 1.5 2010/06/28 18:43:26 pab Exp $

Purpose

This recipe demonstrates how to harvest, normalize, and accrue df output locally (i.e., on the client) on a 5-minute basis. Once an hour, this output is transferred to a centralized server where it can be further processed and disseminated.

Motivation

To collect fine-grained output, it may be necessary to run certain jobs on a high-frequency basis, but having all your WebJob clients check in at that same frequency may not be practical -- especially for large client populations. To strike a balance between excessive network load and the benefits of centralized management, a two-stage approach is needed. In stage one, output is accrued locally using local tools and schedulers; in stage two, the accumulated output is periodically transferred to a centralized server where it can be further processed and disseminated. This recipe discusses the WebJob approach to solving this problem.

Requirements

Cooking with this recipe requires an operational WebJob server. If you do not have one, refer to the instructions provided in the README.INSTALL file that comes with the source distribution. The latest source distribution is available here:

    http://sourceforge.net/project/showfiles.php?group_id=40788

Each client must be running UNIX and have basic system utilities and WebJob 1.5.0 or higher installed. The server must be running UNIX and have basic system utilities, Apache, and WebJob 1.5.0 or higher installed.

The commands presented throughout this recipe were designed to be executed within a Bourne shell (i.e., sh or bash).
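The stage-one (local accrual) half of the approach can be reduced to a few lines of shell. The sketch below is only an illustration -- the paths are hypothetical, and the real harvest_df script (Appendix 1) adds locking, normalization, and syntax checking on top of this pattern:

```shell
#!/bin/sh
# Illustration of stage one: accrue timestamped df output to a
# per-host, per-day file. /tmp/df-demo is a hypothetical path; the
# recipe proper uses /var/rsync/df and the harvest_df script.
outdir=/tmp/df-demo/`hostname | cut -d. -f1`
mkdir -p "${outdir}" || exit 2
outfile=${outdir}/`date "+%Y-%m-%d"`.out
df -k | sed '1d' | awk -v stamp="`date '+%Y-%m-%d|%H:%M:%S'`" \
  '{print stamp "|" $1 "|" $2 "|" $3 "|" $4 "|" $5 "|" $6}' >> "${outfile}"
wc -l < "${outfile}"
```

Stage two is then nothing more than an hourly rsync of that directory to the collection server.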
This recipe assumes that you have read and implemented the following recipes:

    http://webjob.sourceforge.net/Files/Recipes/webjob-run-periodic.txt
    http://webjob.sourceforge.net/Files/Recipes/webjob-manage-cronjob.txt
    http://webjob.sourceforge.net/Files/Recipes/webjob-pad-rsync.txt

Time to Implement

Assuming that you have satisfied all the requirements/prerequisites, this recipe should take less than one hour to implement.

Solution

The following steps describe how to implement this recipe.

1. Set WEBJOB_CLIENT and WEBJOB_COMMANDS as appropriate for your server. Next, extract the harvest_df script at the bottom of this recipe, and install it in the appropriate commands directory. If you want this script to be bound to a particular client, set WEBJOB_CLIENT as appropriate before running the following commands. Once the file is in place, set its ownership and mode to 0:0 and 644, respectively.

    # WEBJOB_CLIENT=common
    # WEBJOB_COMMANDS=/var/webjob/profiles/${WEBJOB_CLIENT}/commands
    # sed -e '1,/^--- harvest_df ---$/d; /^--- harvest_df ---$/,$d' webjob-harvest-df.txt > harvest_df
    # cp harvest_df ${WEBJOB_COMMANDS}/
    # chmod 644 ${WEBJOB_COMMANDS}/harvest_df
    # chown 0:0 ${WEBJOB_COMMANDS}/harvest_df

Next, make harvest_df.pad so that harvest_df can be deployed locally on all clients. Make the PaD file by running pad-make-script as follows:

    # pad-make-script -c ${WEBJOB_COMMANDS}/harvest_df > ${WEBJOB_COMMANDS}/harvest_df.pad

Note: If you are using DSV, sign the harvest_df.pad file.

2. Perform this step only if you choose to use rsync (as described in the webjob-pad-rsync.txt recipe) to move df data from WebJob clients to a WebJob server. This step assumes that you have read and implemented the rsync recipe (webjob-pad-rsync.txt). Create a symlink called webjob_rsync_id_df.pad that points to webjob_rsync_id.pad. This step assumes that you defined WEBJOB_COMMANDS in step one.
    # ( cd ${WEBJOB_COMMANDS} && ln -s webjob_rsync_id.pad webjob_rsync_id_df.pad )

Note: If you are using DSV, create a symbolic link to the signature file as well.

    # ( cd ${WEBJOB_COMMANDS} && ln -s webjob_rsync_id.pad.sig webjob_rsync_id_df.pad.sig )

At this point, your commands tree should have, at a minimum, the following files:

    commands
      |
      - ...
      - cronjob_manager
      - harvest_df.pad
      - hourly
      - webjob_rsync_id.pad
      - webjob_rsync_id_df.pad -> webjob_rsync_id.pad
      - ...

Create the following upload directory:

    # mkdir -p /var/rsync.all/df
    # chown rsync:rsync /var/rsync.all/df

3. This step describes the necessary changes to your hourly (or daily) script to ensure that the following occurs on each client:

    - harvest_df is installed
    - harvest_df is kept current
    - harvest_df is executed periodically through cron
    - harvest_df data is rsync'ed back to the WebJob server (optional)

a. Add the following DeployFile job to your hourly (or daily) script to ensure that harvest_df is installed on each client.

    DeployFile "${WEBJOB_HOME}/etc/upload.cfg" "harvest_df.pad" "/usr/local/bin/harvest_df" "755" "root" "wheel"

b. Add the following UpdateFile job to your hourly (or daily) script to ensure that harvest_df is kept current on each client.

    UpdateFile "${WEBJOB_HOME}/etc/upload.cfg" "harvest_df.pad" "/usr/local/bin/harvest_df" "755" "root" "wheel" "6522ea668f691de43f97e88bdf8ebc4d" "md5"

Note: Each time you modify harvest_df, you'll need to create a new PaD file and update the MD5 hash shown above. To obtain the MD5 hash of the script, run webjob as follows:

    # sh harvest_df.pad webjob -h -t md5 %payload

c. Decide what the local cron job will be and how often it should run. For this recipe, we'll use the following crontab entry:

    0,5,10,15,20,25,30,35,40,45,50,55 * * * * [ -x /usr/local/bin/harvest_df ] && /usr/local/bin/harvest_df -as > /dev/null 2>&1

This job runs harvest_df every 5 minutes (if it exists and is executable).
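As an aside, the minute list in that time specification can be generated rather than typed by hand. A small sketch, assuming seq(1) and paste(1) are available (on crons that support step values, '*/5' is an equivalent shorthand):

```shell
# Build the comma-separated list of minutes 0, 5, ..., 55.
minutes=`seq 0 5 55 | paste -s -d, -`
echo "${minutes} * * * *"
# 0,5,10,15,20,25,30,35,40,45,50,55 * * * *
```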
Each time the script runs, it executes the df utility, normalizes the data, and writes its output to a file of the form shown below. Read the documentation header in harvest_df for more detailed usage information.

    /var/rsync/df/<hostname>/<YYYY-MM-DD>.out

Next, add the following oneshot job to execute harvest_df every 5 minutes on all clients.

    ${WEBJOB_HOME}/bin/webjob -e -f ${WEBJOB_HOME}/etc/upload.cfg cronjob_manager --deploy -t '0,5,10,15,20,25,30,35,40,45,50,55 * * * *' -c '[ -x /usr/local/bin/harvest_df ] && /usr/local/bin/harvest_df -as > /dev/null 2>&1'

d. If you are moving harvest_df data using the rsync method, add the following job to your hourly (or daily) script to transfer each client's output to the WebJob server. Note: these scripts (hourly/daily) must be revision 1.4 or higher, or they must be modified to define HOSTNAME. Here's an example job that could be added to the hourly script:

    ${WEBJOB_HOME}/bin/webjob -e -f ${WEBJOB_HOME}/etc/upload.cfg webjob_rsync_id_df.pad \
      rsync -avze \"ssh -i %payload -o BatchMode=yes -o StrictHostKeyChecking=no\" \
      /var/rsync/df/${HOSTNAME} rsync@server:/var/rsync.all/df/

Note: The first rsync operation can fail if the source directory does not yet exist on the client.

4. After verifying that all clients have deployed the harvest_df script and have installed the cron job to run harvest_df, remove those two jobs from your hourly (or daily) script. You can verify completion of the oneshot cronjob_manager job by inspecting the WebJob output file. It should look similar to this:

COMMAND_LINE=[ -x /usr/local/bin/harvest_df ] && /usr/local/bin/harvest_df -as > /dev/null 2>&1
TIME_SPECIFICATION=0,5,10,15,20,25,30,35,40,45,50,55 * * * *
USER=root
--- crontab.bak ---
existing jobs...
--- crontab.bak ---
--- crontab.new ---
existing jobs...
0,5,10,15,20,25,30,35,40,45,50,55 * * * * [ -x /usr/local/bin/harvest_df ] && /usr/local/bin/harvest_df -as > /dev/null 2>&1
--- crontab.new ---

This output consists of three parts: runtime variables, a listing of the original crontab (crontab.bak), and a listing of the new crontab (crontab.new). The job you inserted should show up at the bottom of crontab.new.

Closing Remarks

This recipe has a number of moving parts and dependencies on other recipes. Rather than putting all these pieces in the hourly script, it would be better to create a manager script that handles all the details (see Appendix 2). Then, the manager script could be called from hourly (or daily) in a generic fashion.

There is no requirement for the output to be rsync'ed to the WebJob server -- any server that accepts the rsync credentials could be used. Keep that in mind when designing/customizing your final solution. Among other things, this means that you are not required to open up SSH access from the various clients to the WebJob server.

Credits

This recipe was brought to you by Klayton Monroe and Andy Bair.

References

Appendix 1

--- harvest_df ---
#!/bin/sh
######################################################################
#
# $Id: harvest_df.base,v 1.6 2007/10/08 16:51:28 klm Exp $
#
######################################################################
#
# Copyright 2006-2007 The WebJob Project, All Rights Reserved.
#
######################################################################
#
# NAME
#   harvest_df - harvest, normalize, and print or accrue df data
#
# DESCRIPTION
#   This utility collects, normalizes, and prints or accrues df
#   data. The df data is presented with a pipe (|) field separator
#   in the following format:
#
#     yyyy-mm-dd|hh:mm:ss|filesystem|blocks|used_blocks|free_blocks|capacity|mount_point
#
#   where:
#
#     YYYY        = four-digit year
#     MM          = two-digit month
#     DD          = two-digit day of the month
#     HH          = two-digit hour
#     MM          = two-digit minute
#     SS          = two-digit second
#     filesystem  = Device path, logical name, NFS share, etc.
#     blocks      = Total number of 1K blocks
#     used_blocks = Total number of 1K blocks used
#     free_blocks = Total number of 1K blocks free
#     capacity    = Percentage of the filesystem that is used
#     mount_point = Path name where the filesystem is mounted
#
# OPTIONS
#   -a
#     Specifying the accrue option causes accrual of df data in a
#     file. The default is to not accrue df data.
#
#   -H
#     Specifying the header option causes printing of a header line
#     for the df data. The default is to not print a header line.
#
#   -h
#     Show usage and exit (i.e., help).
#
#   -S
#     Specifies that the time is recorded as the number of seconds
#     since the epoch. This option does not work on all platforms
#     (e.g., Solaris). If it does work, the output format will be:
#
#       seconds|filesystem|blocks|used_blocks|free_blocks|capacity|mount_point
#
#   -s
#     Specifying the silent option disables the printing of df data
#     to standard out. The default is to not run silent. This option
#     only has meaning when the accrue option is specified.
#
#   -d outdir
#     Specifies the base output directory for writing df data files.
#     This option is not mandatory and has a default value of
#     '/var/rsync/df'.
#
#   -u
#     Specifies that the date and time are to be recorded in
#     Coordinated Universal Time (UTC).
#
######################################################################

IFS=' 	
'

PATH=/sbin:/usr/sbin:/usr/local/sbin:/bin:/usr/bin:/usr/local/bin

PROGRAM=`basename $0`

######################################################################
#
# TestPid
#
######################################################################

TestPid()
{
  my_pid="$1"
  my_pid_regexp="^[0-9]+$"

  echo "${my_pid}" | egrep "${my_pid_regexp}" > /dev/null 2>&1
  if [ $? -eq 0 ] ; then
    # The PID is valid.
    return 0;
  fi

  return 1; # The PID is not valid.
}

######################################################################
#
# CreateLockFile
#
######################################################################

CreateLockFile()
{
  my_lock_file="$1"

  # Customize ln(1) options based on the OS.
  case `uname -s` in
  NIKOS) # This OS is so old it doesn't support '-n'.
    ln_options=
    ;;
  *)
    ln_options="-n"
    ;;
  esac

  if [ -z "${my_lock_file}" ] ; then
    return 1 # Rats, we didn't even get to the gate.
  fi

  my_old_umask=`umask`
  umask 022
  my_lock_dir=`dirname "${my_lock_file}"`
  if [ ! -d "${my_lock_dir}" ] ; then
    mkdir -p "${my_lock_dir}"
    if [ $? -ne 0 ] ; then
      return 1 # Rats, we got bushwhacked.
    fi
  fi
  umask 077
  my_temp_file="${my_lock_file}.$$"
  echo $$ | cat - > "${my_temp_file}"
  if [ $? -ne 0 ] ; then
    return 1 # Rats, we didn't even get out of the gate.
  fi
  umask ${my_old_umask}

  ln ${ln_options} "${my_temp_file}" "${my_lock_file}" > /dev/null 2>&1
  if [ $? -eq 0 ] ; then
    rm -f "${my_temp_file}"
    return 0 # Ding ding ding, we have a winner.
  fi

  my_old_pid=`head -1 "${my_lock_file}"`
  TestPid "${my_old_pid}"
  if [ $? -eq 0 ] ; then
    kill -0 ${my_old_pid} > /dev/null 2>&1
    if [ $? -eq 0 ] ; then
      rm -f "${my_temp_file}"
      return 1 # Rats, the lock is in use.
    fi
  fi

  # At this point, the lock is corrupt, stale, or owned by a different
  # user. Attempt to delete it, and go for the gold.
  rm -f "${my_lock_file}"
  ln ${ln_options} "${my_temp_file}" "${my_lock_file}" > /dev/null 2>&1
  if [ $? -eq 0 ] ; then
    rm -f "${my_temp_file}"
    return 0 # Ding ding ding, we have a winner.
  fi

  rm -f "${my_temp_file}"
  return 1 # Rats, someone else got there first.
}

######################################################################
#
# DeleteLockFile
#
######################################################################

DeleteLockFile()
{
  my_lock_file="$1"

  if [ -n "${my_lock_file}" -a -f "${my_lock_file}" ] ; then
    my_old_pid=`head -1 "${my_lock_file}"`
    TestPid "${my_old_pid}"
    if [ $? -eq 0 ] ; then
      if [ ${my_old_pid} -eq $$ ] ; then
        rm -f "${my_lock_file}"
      fi
    fi
  fi

  return 0
}

######################################################################
#
# AddLineItem
#
######################################################################

AddLineItem()
{
  my_body="$1"
  my_line="$2"
  my_count="$3"

  if [ ${my_count} -eq 1 ] ; then
    my_body="${my_line}"
  else
    my_body="${my_body}
${my_line}"
  fi
  echo "${my_body}"
}

######################################################################
#
# Usage
#
######################################################################

Usage()
{
  echo 1>&2
  echo "Usage: ${PROGRAM} [-aHhSsu] [-d outdir]" 1>&2
  echo 1>&2
  exit 1
}

######################################################################
#
# Process command line arguments.
#
######################################################################

target="df"

accrue="0"
base_outdir="/var/rsync/${target}"
print_header="0"
run_silent="0"
use_seconds="0"
utc_option=""

while getopts "ad:HhSsu" OPTION ; do
  case "${OPTION}" in
  a)
    accrue="1"
    ;;
  d)
    base_outdir="${OPTARG}"
    ;;
  H)
    print_header="1"
    ;;
  h)
    Usage
    ;;
  S)
    use_seconds="1"
    ;;
  s)
    run_silent="1"
    ;;
  u)
    utc_option="-u"
    ;;
  *)
    Usage
    ;;
  esac
done

if [ ${OPTIND} -le $# ] ; then
  Usage
fi

######################################################################
#
# Initialize working variables.
#
######################################################################

hostname=`hostname | cut -d. -f1`

blocks_regexp="[0-9]+"
filesystem_regexp="[^|]+"
hms_regexp="[0-9][0-9]:[0-9][0-9]:[0-9][0-9]"
mount_point_regexp="/[^|]*"
percent_regexp="${blocks_regexp}%"
seconds_regexp="[0-9]+"
ymd_regexp="[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]"

date=`date ${utc_option} "+%Y-%m-%d"`

target_header="filesystem|blocks|used_blocks|free_blocks|capacity|mount_point"
target_regexp="${filesystem_regexp}[|]${blocks_regexp}[|]${blocks_regexp}[|]${blocks_regexp}[|]${percent_regexp}[|]${mount_point_regexp}"

if [ ${use_seconds} -eq 1 ] ; then
  seconds=`date ${utc_option} "+%s"`
  header="seconds|${target_header}"
  output_prefix="${seconds}"
  output_regexp="${seconds_regexp}[|]${target_regexp}"
else
  time=`date ${utc_option} "+%H:%M:%S"`
  header="date|time|${target_header}"
  output_prefix="${date}|${time}"
  output_regexp="${ymd_regexp}[|]${hms_regexp}[|]${target_regexp}"
fi

######################################################################
#
# Harvest, preprocess, and check (via regexp) target data.
#
######################################################################

OLD_IFS=${IFS}

# Limit IFS to newlines only inside the for loop.
IFS='
'

GNU_DF_VERSION=`df --version 2> /dev/null`
if [ $? -eq 0 -a -n "${GNU_DF_VERSION}" ] ; then
  target_command=${target_command-"df -k -l --portability"}
else
  target_command=${target_command-"df -k -l"}
fi

count=1
for line in `eval ${target_command} | sed '1d' 2> /dev/null | awk '{print $1"|"$2"|"$3"|"$4"|"$5"|"$6}'` ; do
  output_line="${output_prefix}|${line}"
  echo "${output_line}" | egrep "${output_regexp}" > /dev/null 2>&1
  if [ $? -ne 0 ] ; then
    echo "${PROGRAM}: Error='Output (${output_line}) does not meet basic syntax checks.'" 1>&2
    exit 2
  fi
  output=`AddLineItem "${output}" "${output_line}" "${count}"`
  count=`expr ${count} + 1`
done

IFS=${OLD_IFS}

######################################################################
#
# Write output according to the user's specified options.
#
######################################################################

if [ ${accrue} -eq 1 ] ; then
  out_name=${date}.out
  out_path=${base_outdir}/${hostname}
  out_file=${out_path}/${out_name}
  if [ ! -d ${out_path} ] ; then
    mkdir -p ${out_path} || exit 2
  fi
  lock_file=${out_path}/harvest_${target}.pid
  CreateLockFile "${lock_file}"
  if [ $? -ne 0 ] ; then
    echo "${PROGRAM}: Error='Unable to secure a lock file. Job aborted.'" 1>&2
    exit 2
  fi
  if [ ${run_silent} -eq 1 ] ; then
    echo "${output}" >> ${out_file}
  else
    echo "${output}" | tee -a ${out_file}
  fi
  DeleteLockFile "${lock_file}"
else
  if [ ${print_header} -eq 1 ] ; then
    echo "${header}"
  fi
  echo "${output}"
fi
--- harvest_df ---

Appendix 2

The following script combines all the management elements of this recipe (and then some) into one place. To use this script, read through it and change the variables to suit your environment. At a minimum, you should check/set the RSYNC_SERVER variable. Note that this variable can be set at run time through the '-r' command line option. You also need to ensure that the MD5 hash for harvest_df.pad is correct.

To deploy everything on your clients, use the following oneshot job:

    ${WEBJOB_HOME}/bin/webjob -e -f ${WEBJOB_HOME}/etc/upload.cfg harvest_df_manager -m deploy-all

Use the following job to periodically check/update harvest_df, rsync output, and prune old files:

    ${WEBJOB_HOME}/bin/webjob -e -f ${WEBJOB_HOME}/etc/upload.cfg harvest_df_manager -m periodic -r <rsync-server>

To remove everything from your clients, use the following oneshot job:

    ${WEBJOB_HOME}/bin/webjob -e -f ${WEBJOB_HOME}/etc/upload.cfg harvest_df_manager -m remove-all

To extract this script from the recipe, run the following command:

    $ sed -e '1,/^--- harvest_df_manager ---$/d; /^--- harvest_df_manager ---$/,$d' webjob-harvest-df.txt > harvest_df_manager

Note: If you are using DSV, sign the harvest_df_manager script after installing it on the WebJob server.
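The UpdateFile() logic in the manager script hinges on a hash comparison: the payload is pushed only when the installed file's digest differs from the expected one. A self-contained sketch of that gate, with md5sum(1) standing in for 'webjob -h -t md5' and a hypothetical target path:

```shell
#!/bin/sh
# Hash-gated update (illustration only). md5sum(1) stands in for
# 'webjob -h -t md5'; /tmp/gate-demo.$$ is a hypothetical target.
target=/tmp/gate-demo.$$
echo "version 1" > "${target}"
expected=`md5sum "${target}" | awk '{print $1}'`

# Unchanged content: the hashes match, so no update is needed.
actual=`md5sum "${target}" | awk '{print $1}'`
[ "${actual}" = "${expected}" ] && echo "up to date"

# Changed content: the hashes differ, so an update would be pushed.
echo "version 2" > "${target}"
actual=`md5sum "${target}" | awk '{print $1}'`
[ "${actual}" != "${expected}" ] && echo "update needed"

rm -f "${target}"
```

This is also why the recipe reminds you to refresh the recorded MD5 hash whenever harvest_df changes: a stale expected hash would make every client re-pull the payload on every run.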
--- harvest_df_manager ---
#!/bin/sh
######################################################################
#
# $Id: harvest_df_manager.base,v 1.17 2010/12/10 05:41:12 klm Exp $
#
######################################################################
#
# Copyright 2005-2007 The WebJob Project, All Rights Reserved.
#
######################################################################
#
# Purpose: Manage harvest_df deployments via webjob.
#
######################################################################

IFS=' 	
'

PATH=/sbin:/usr/sbin:/usr/local/sbin:/bin:/usr/bin:/usr/local/bin

PROGRAM=`basename $0`
HOSTNAME=`hostname | awk -F. '{print $1}'`

######################################################################
#
# TestPid
#
######################################################################

TestPid()
{
  my_pid="$1"
  my_pid_regexp="^[0-9]+$"

  echo "${my_pid}" | egrep "${my_pid_regexp}" > /dev/null 2>&1
  if [ $? -eq 0 ] ; then
    # The PID is valid.
    return 0;
  fi

  return 1; # The PID is not valid.
}

######################################################################
#
# CreateLockFile
#
######################################################################

CreateLockFile()
{
  my_lock_file="$1"

  # Customize ln(1) options based on the OS.
  case `uname -s` in
  NIKOS) # This OS is so old it doesn't support '-n'.
    ln_options=
    ;;
  *)
    ln_options="-n"
    ;;
  esac

  if [ -z "${my_lock_file}" ] ; then
    return 1 # Rats, we didn't even get to the gate.
  fi

  my_old_umask=`umask`
  umask 022
  my_lock_dir=`dirname "${my_lock_file}"`
  if [ ! -d "${my_lock_dir}" ] ; then
    mkdir -p "${my_lock_dir}"
    if [ $? -ne 0 ] ; then
      return 1 # Rats, we got bushwhacked.
    fi
  fi
  umask 077
  my_temp_file="${my_lock_file}.$$"
  echo $$ | cat - > "${my_temp_file}"
  if [ $? -ne 0 ] ; then
    return 1 # Rats, we didn't even get out of the gate.
  fi
  umask ${my_old_umask}

  ln ${ln_options} "${my_temp_file}" "${my_lock_file}" > /dev/null 2>&1
  if [ $? -eq 0 ] ; then
    rm -f "${my_temp_file}"
    return 0 # Ding ding ding, we have a winner.
  fi

  my_old_pid=`head -1 "${my_lock_file}"`
  TestPid "${my_old_pid}"
  if [ $? -eq 0 ] ; then
    kill -0 ${my_old_pid} > /dev/null 2>&1
    if [ $? -eq 0 ] ; then
      rm -f "${my_temp_file}"
      return 1 # Rats, the lock is in use.
    fi
  fi

  # At this point, the lock is corrupt, stale, or owned by a different
  # user. Attempt to delete it, and go for the gold.
  rm -f "${my_lock_file}"
  ln ${ln_options} "${my_temp_file}" "${my_lock_file}" > /dev/null 2>&1
  if [ $? -eq 0 ] ; then
    rm -f "${my_temp_file}"
    return 0 # Ding ding ding, we have a winner.
  fi

  rm -f "${my_temp_file}"
  return 1 # Rats, someone else got there first.
}

######################################################################
#
# DeleteLockFile
#
######################################################################

DeleteLockFile()
{
  my_lock_file="$1"

  if [ -n "${my_lock_file}" -a -f "${my_lock_file}" ] ; then
    my_old_pid=`head -1 "${my_lock_file}"`
    TestPid "${my_old_pid}"
    if [ $? -eq 0 ] ; then
      if [ ${my_old_pid} -eq $$ ] ; then
        rm -f "${my_lock_file}"
      fi
    fi
  fi

  return 0
}

######################################################################
#
# DeployFile
#
######################################################################

DeployFile()
{
  MY_CFG_FILE=$1
  MY_PAD_FILE=$2
  MY_TARGET_PATH=$3 # Full path including filename.
  MY_TARGET_MODE=$4
  MY_TARGET_OWNER=$5
  MY_TARGET_GROUP=$6

  MY_TARGET_DIR=`dirname ${MY_TARGET_PATH}`

  MY_MKDIR_CMD="{ umask 022 ; mkdir -p ${MY_TARGET_DIR} ; }"
  MY_CP_CMD="{ cp %payload ${MY_TARGET_PATH} ; }"
  MY_CHMOD_CMD="{ chmod ${MY_TARGET_MODE} ${MY_TARGET_PATH} ; }"
  MY_CHOWN_CMD="{ chown ${MY_TARGET_OWNER}:${MY_TARGET_GROUP} ${MY_TARGET_PATH} ; }"
  MY_RM_CMD="{ rm -f %payload ; }"

  MY_ID=`id | sed 's/^uid=\([0-9]\{1,5\}\)(.*$/\1/;'`
  if [ "${MY_ID}"X = "0"X ] ; then
    MY_PAD_CMD="{ ${MY_MKDIR_CMD} && ${MY_CP_CMD} && ${MY_CHMOD_CMD} && ${MY_CHOWN_CMD} } ; ${MY_RM_CMD}"
  else
    MY_PAD_CMD="{ ${MY_MKDIR_CMD} && ${MY_CP_CMD} && ${MY_CHMOD_CMD} } ; ${MY_RM_CMD}"
  fi

  if [ ! -f ${MY_TARGET_PATH} ] ; then
    webjob -e -f ${MY_CFG_FILE} ${MY_PAD_FILE} ${MY_PAD_CMD}
  fi
}

######################################################################
#
# UpdateFile
#
######################################################################

UpdateFile()
{
  MY_CFG_FILE=$1
  MY_PAD_FILE=$2
  MY_TARGET_PATH=$3 # Full path including filename.
  MY_TARGET_MODE=$4
  MY_TARGET_OWNER=$5
  MY_TARGET_GROUP=$6
  MY_TARGET_HASH=$7
  MY_DIGEST_TYPE=$8 # (MD5|SHA1)

  MY_CP_CMD="{ cp -f %payload ${MY_TARGET_PATH} ; }"
  MY_CHMOD_CMD="{ chmod ${MY_TARGET_MODE} ${MY_TARGET_PATH} ; }"
  MY_CHOWN_CMD="{ chown ${MY_TARGET_OWNER}:${MY_TARGET_GROUP} ${MY_TARGET_PATH} ; }"
  MY_RM_CMD="{ rm -f %payload ; }"

  MY_ID=`id | sed 's/^uid=\([0-9]\{1,5\}\)(.*$/\1/;'`
  if [ "${MY_ID}"X = "0"X ] ; then
    MY_PAD_CMD="{ ${MY_CP_CMD} && ${MY_CHMOD_CMD} && ${MY_CHOWN_CMD} } ; ${MY_RM_CMD}"
  else
    MY_PAD_CMD="{ ${MY_CP_CMD} && ${MY_CHMOD_CMD} } ; ${MY_RM_CMD}"
  fi

  if [ -f ${MY_TARGET_PATH} ] ; then
    MY_ACTUAL_HASH=`webjob -h -t ${MY_DIGEST_TYPE} ${MY_TARGET_PATH}`
    if [ "${MY_ACTUAL_HASH}" != "${MY_TARGET_HASH}" ] ; then
      webjob -e -f ${MY_CFG_FILE} ${MY_PAD_FILE} ${MY_PAD_CMD}
    fi
  fi
}

######################################################################
#
# DeployFiles
#
######################################################################

DeployFiles()
{
  MY_HARVEST_PREFIX=$1

  echo "DeployFiles() ..."

  if [ -z "${MY_HARVEST_PREFIX}" ] ; then
    false
  else
    DeployFile "${WEBJOB_HOME}/etc/upload.cfg" "harvest_df.pad" "${MY_HARVEST_PREFIX}/bin/harvest_df" "755" "root" "wheel"
  fi
}

######################################################################
#
# UpdateFiles
#
######################################################################

UpdateFiles()
{
  MY_HARVEST_PREFIX=$1

  echo "UpdateFiles() ..."

  if [ -z "${MY_HARVEST_PREFIX}" ] ; then
    false
  else
    UpdateFile "${WEBJOB_HOME}/etc/upload.cfg" "harvest_df.pad" "${MY_HARVEST_PREFIX}/bin/harvest_df" "755" "root" "wheel" "6522ea668f691de43f97e88bdf8ebc4d" "md5"
  fi
}

######################################################################
#
# RemoveFiles
#
######################################################################

RemoveFiles()
{
  MY_HARVEST_PREFIX=$1

  echo "RemoveFiles() ..."

  if [ -z "${MY_HARVEST_PREFIX}" ] ; then
    false
  else
    rm -f ${MY_HARVEST_PREFIX}/bin/harvest_df
  fi
}

######################################################################
#
# DeployCronjob
#
######################################################################

DeployCronjob()
{
  MY_COMMAND=$1
  MY_TIME_SPECIFICATION=$2
  MY_COMMAND_LINE=$3
  MY_CHECK_FIRST=${4-0}

  if [ X"${MY_CHECK_FIRST}" != X"0" ] ; then
    MY_CRONTAB_ENTRIES=`crontab -l`
    if [ $? -eq 0 -a -n "${MY_CRONTAB_ENTRIES}" ] ; then
      # Command succeeded, and we have data.
      echo "${MY_CRONTAB_ENTRIES}" | egrep -v "^#" | egrep "${MY_COMMAND}" > /dev/null 2>&1
      if [ $? -eq 0 ] ; then
        # One or more matches were found, so don't deploy.
        echo "DeployCronjob() ... skipped"
        return 0
      fi
    fi
  fi

  echo "DeployCronjob() ..."

  if [ -z "${MY_TIME_SPECIFICATION}" -o -z "${MY_COMMAND_LINE}" ] ; then
    false
  else
    webjob -e -f ${WEBJOB_HOME}/etc/upload.cfg --NoUpload cronjob-manager --deploy -t "${MY_TIME_SPECIFICATION}" -c "${MY_COMMAND_LINE}"
  fi
}

######################################################################
#
# RemoveCronjob
#
######################################################################

RemoveCronjob()
{
  MY_EXPRESSION=$1

  echo "RemoveCronjob() ..."

  if [ -z "${MY_EXPRESSION}" ] ; then
    false
  else
    webjob -e -f ${WEBJOB_HOME}/etc/upload.cfg --NoUpload cronjob-manager --remove -e "${MY_EXPRESSION}"
  fi
}

######################################################################
#
# PruneOutput
#
######################################################################

PruneOutput()
{
  MY_OUTDIR=$1
  MY_PRUNE_FLAG=$2
  MY_PRUNE_DAYS_TO_KEEP=${3-30}

  echo "PruneOutput() ..."

  if [ -z "${MY_OUTDIR}" ] ; then
    false
  else
    if [ -f ${MY_PRUNE_FLAG} ] ; then
      find ${MY_OUTDIR}/${HOSTNAME} -type f -a -mtime +${MY_PRUNE_DAYS_TO_KEEP} -print -exec rm -f {} \;
      rm -f ${MY_PRUNE_FLAG}
    else
      echo "${PROGRAM}: Warning='${MY_PRUNE_FLAG} must exist for pruning to work (check for errors).'" 1>&2
    fi
  fi
}

######################################################################
#
# RsyncOutput
#
######################################################################

RsyncOutput()
{
  MY_RSYNC_SERVER=$1
  MY_RSYNC_DIR=$2
  MY_RSYNC_ALL_DIR=$3
  MY_PRUNE_FLAG=$4

  echo "RsyncOutput() ..."

  if [ -z "${MY_RSYNC_SERVER}" -o -z "${MY_RSYNC_DIR}" ] ; then
    false
  else
    MY_SSH=`ssh -V 2>&1 | egrep "(Sun_|Open)SSH" > /dev/null && echo OpenISH`
    case "${MY_SSH}" in
    OpenISH)
      webjob -e -f ${WEBJOB_HOME}/etc/upload.cfg --NoUpload webjob_rsync_id_df.open.pad \
        rsync -avze \"ssh -i %payload -o BatchMode=yes -o StrictHostKeyChecking=no\" \
        ${MY_RSYNC_DIR}/${HOSTNAME} rsync@${MY_RSYNC_SERVER}:${MY_RSYNC_ALL_DIR}/ \&\& \
        touch ${MY_PRUNE_FLAG} # This signifies that it's ok to prune.
      ;;
    *)
      webjob -e -f ${WEBJOB_HOME}/etc/upload.cfg --NoUpload webjob_rsync_id_df.pad \
        MY_PAD_CWD=\`pwd\` \&\& \
        echo \"IdKey \${MY_PAD_CWD}/%payload\" \> \${MY_PAD_CWD}/%payload.id \&\& \
        ssh-keygen -D \${MY_PAD_CWD}/%payload \&\& \
        rsync -avze \"ssh -i \${MY_PAD_CWD}/%payload.id -o BatchMode=yes -o StrictHostKeyChecking=no\" \
        ${MY_RSYNC_DIR}/${HOSTNAME} rsync@${MY_RSYNC_SERVER}:${MY_RSYNC_ALL_DIR}/ \&\& \
        touch ${MY_PRUNE_FLAG} \; \
        rm -f \${MY_PAD_CWD}/%payload.id \${MY_PAD_CWD}/%payload.pub
      ;;
    esac
  fi
}

######################################################################
#
# RunHarvester
#
######################################################################

RunHarvester()
{
  MY_HARVESTER=$1
  MY_PRINT_HEADER=${2-0}

  echo "RunHarvester() ..."

  if [ "${MY_PRINT_HEADER}"X = "1"X ] ; then
    MY_HARVESTER_OPTS="-H"
  else
    MY_HARVESTER_OPTS=
  fi
  echo "--- HARVEST_BEGIN ---"
  ${MY_HARVESTER} ${MY_HARVESTER_OPTS}
  echo "--- HARVEST_END ---"
}

######################################################################
#
# Usage
#
######################################################################

Usage()
{
  echo 1>&2
  echo "Usage: ${PROGRAM} [-H webjob-home] [-k days] [-l lock-file] [-r rsync-server] -m {deploy-{all|cronjob|files}|periodic|prune-output|reload-cronjob|remove-{all|cronjob|files}|rsync-output|run-harvester|update-files}" 1>&2
  echo 1>&2
  exit 1
}

######################################################################
#
# Main
#
######################################################################

LOCK_FILE=/var/run/harvest_df_manager.pid
PRUNE_DAYS_TO_KEEP=30
PRUNE_DAYS_TO_KEEP_REGEXP="[0-9]+"
RSYNC_SERVER="" # INSERT THE HOSTNAME OR ADDRESS OF YOUR RSYNC SERVER HERE, OR USE '-r' COMMAND LINE OPTION.
RUN_MODE=

while getopts "H:k:l:m:r:" OPTION ; do
  case "${OPTION}" in
  H)
    WEBJOB_HOME="${OPTARG}"
    ;;
  k)
    PRUNE_DAYS_TO_KEEP="${OPTARG}"
    ;;
  l)
    LOCK_FILE="${OPTARG}"
    ;;
  m)
    RUN_MODE="${OPTARG}"
    ;;
  r)
    RSYNC_SERVER="${OPTARG}"
    ;;
  *)
    Usage
    ;;
  esac
done

if [ ${OPTIND} -le $# ] ; then
  Usage
fi

if [ -z "${RUN_MODE}" ] ; then
  Usage
else
  case "${RUN_MODE}" in
  deploy-all\
  |deploy-cronjob\
  |deploy-files\
  |periodic\
  |prune-output\
  |reload-cronjob\
  |remove-all\
  |remove-cronjob\
  |remove-files\
  |rsync-output\
  |run-harvester\
  |update-files\
  )
    : # Run mode is valid.
    ;;
  *)
    echo "${PROGRAM}: Error='The specified run mode (${RUN_MODE}) is not supported.'" 1>&2
    exit 2
    ;;
  esac
fi

if [ -z "${LOCK_FILE}" ] ; then
  echo "${PROGRAM}: Error='The lock file has been reduced to an empty string.'" 1>&2
  exit 2
fi

echo "${PRUNE_DAYS_TO_KEEP}" | egrep "${PRUNE_DAYS_TO_KEEP_REGEXP}" > /dev/null 2>&1
if [ $? -ne 0 ] ; then
  echo "${PROGRAM}: Error='The number of days to keep (${PRUNE_DAYS_TO_KEEP}) does not meet basic syntax checks.'" 1>&2
  exit 2
fi

PATH=${WEBJOB_HOME=/usr/local/webjob}/bin:${PATH} ; export PATH

HARVEST_PREFIX=/usr/local
HARVEST_OUTDIR=/var/rsync/df
HARVEST_COMMAND=${HARVEST_PREFIX}/bin/harvest_df
HARVEST_COMMAND_LINE="[ -x ${HARVEST_COMMAND} ] && ${HARVEST_COMMAND} -as > /dev/null 2>&1"
HARVEST_TIME_SPECIFICATION="0,5,10,15,20,25,30,35,40,45,50,55 * * * *"
PRUNE_FLAG=/var/run/df.prune
RSYNC_ALL_DIR=/var/rsync.all/df
RSYNC_DIR=${HARVEST_OUTDIR}

CreateLockFile "${LOCK_FILE}"
if [ $? -ne 0 ] ; then
  echo "${PROGRAM}: Error='Unable to secure a lock file.'" 1>&2
  exit 2
fi

case "${RUN_MODE}" in
deploy-all)
  DeployFiles "${HARVEST_PREFIX}"
  DeployCronjob "${HARVEST_COMMAND}" "${HARVEST_TIME_SPECIFICATION}" "${HARVEST_COMMAND_LINE}" "1"
  ;;
deploy-cronjob)
  DeployCronjob "${HARVEST_COMMAND}" "${HARVEST_TIME_SPECIFICATION}" "${HARVEST_COMMAND_LINE}" "1"
  ;;
deploy-files)
  DeployFiles "${HARVEST_PREFIX}"
  ;;
periodic)
  DeployFiles "${HARVEST_PREFIX}"
  UpdateFiles "${HARVEST_PREFIX}"
  DeployCronjob "${HARVEST_COMMAND}" "${HARVEST_TIME_SPECIFICATION}" "${HARVEST_COMMAND_LINE}" "1"
  RunHarvester "${HARVEST_COMMAND}" "1"
  if [ -n "${RSYNC_SERVER}" ] ; then
    RsyncOutput "${RSYNC_SERVER}" "${RSYNC_DIR}" "${RSYNC_ALL_DIR}" "${PRUNE_FLAG}"
    PruneOutput "${HARVEST_OUTDIR}" "${PRUNE_FLAG}" "${PRUNE_DAYS_TO_KEEP}"
  fi
  ;;
prune-output)
  touch ${PRUNE_FLAG} # This is a forced prune.
  PruneOutput "${HARVEST_OUTDIR}" "${PRUNE_FLAG}" "${PRUNE_DAYS_TO_KEEP}"
  ;;
reload-cronjob)
  RemoveCronjob "${HARVEST_COMMAND_LINE}"
  DeployCronjob "${HARVEST_COMMAND}" "${HARVEST_TIME_SPECIFICATION}" "${HARVEST_COMMAND_LINE}" "1"
  ;;
remove-all)
  RemoveFiles "${HARVEST_PREFIX}"
  RemoveCronjob "${HARVEST_COMMAND_LINE}"
  ;;
remove-cronjob)
  RemoveCronjob "${HARVEST_COMMAND_LINE}"
  ;;
remove-files)
  RemoveFiles "${HARVEST_PREFIX}"
  ;;
rsync-output)
  if [ -n "${RSYNC_SERVER}" ] ; then
    RsyncOutput "${RSYNC_SERVER}" "${RSYNC_DIR}" "${RSYNC_ALL_DIR}" "${PRUNE_FLAG}"
  else
    echo "${PROGRAM}: Error='The rsync server is not defined. Use the \"-r\" option to specify.'" 1>&2
    exit 2
  fi
  ;;
run-harvester)
  RunHarvester "${HARVEST_COMMAND}" "1"
  ;;
update-files)
  UpdateFiles "${HARVEST_PREFIX}"
  ;;
esac

DeleteLockFile "${LOCK_FILE}"
--- harvest_df_manager ---
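A closing note on CreateLockFile(), which both appendix scripts use to serialize themselves: it relies on ln(1) refusing to link over an existing file, which acts as an atomic test-and-set on a local filesystem. The core of the technique in isolation (a sketch only; the real function also detects and reaps stale or corrupt locks):

```shell
#!/bin/sh
# Atomic lock acquisition via hard links (illustration only).
lock=/tmp/lock-demo.$$
temp=${lock}.tmp
echo $$ > "${temp}"

# First attempt succeeds because the lock does not exist yet.
ln "${temp}" "${lock}" > /dev/null 2>&1 && echo "first: acquired"

# Second attempt fails because ln(1) will not overwrite the lock.
ln "${temp}" "${lock}" > /dev/null 2>&1 || echo "second: busy"

rm -f "${temp}" "${lock}"
```

Because the link either exists or it doesn't, two processes racing to create it cannot both win, which is why the scripts prefer this over a simple "test then create" sequence.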