Revision

    $Id: webjob-harvest-df.base,v 1.5 2010/06/28 18:43:26 pab Exp $

Purpose

This recipe demonstrates how to harvest, normalize, and accrue df output locally (i.e., on the client) on a 5-minute basis. Once an hour, this output is transferred to a centralized server where it can be further processed and disseminated.

Motivation

To collect fine-grained output, it may be necessary to run certain jobs on a high-frequency basis, but having all your WebJob clients check in at that same frequency may not be practical -- especially for large client populations. To strike a balance between excessive network load and the benefits of centralized management, a two-stage approach is needed. In stage one, output is accrued locally using local tools and schedulers; in stage two, the accumulated output is periodically transferred to a centralized server where it can be further processed and disseminated. This recipe discusses the WebJob approach to solving this problem.

Requirements

Cooking with this recipe requires an operational WebJob server. If you do not have one, refer to the instructions provided in the README.INSTALL file that comes with the source distribution. The latest source distribution is available here:

    http://sourceforge.net/project/showfiles.php?group_id=40788

Each client must be running UNIX and have basic system utilities and WebJob 1.5.0 or higher installed. The server must be running UNIX and have basic system utilities, Apache, and WebJob 1.5.0 or higher installed.

The commands presented throughout this recipe were designed to be executed within a Bourne shell (i.e., sh or bash).
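The stage-one (local accrual) half of the approach can be reduced to a few lines of shell. The sketch below is only an illustration -- the paths are hypothetical, and the real harvest_df script (Appendix 1) adds locking, normalization, and syntax checking on top of this pattern:

```shell
#!/bin/sh
# Illustration of stage one: accrue timestamped df output to a
# per-host, per-day file. /tmp/df-demo is a hypothetical path; the
# recipe proper uses /var/rsync/df and the harvest_df script.
outdir=/tmp/df-demo/`hostname | cut -d. -f1`
mkdir -p "${outdir}" || exit 2
outfile=${outdir}/`date "+%Y-%m-%d"`.out
df -k | sed '1d' | awk -v stamp="`date '+%Y-%m-%d|%H:%M:%S'`" \
  '{print stamp "|" $1 "|" $2 "|" $3 "|" $4 "|" $5 "|" $6}' >> "${outfile}"
wc -l < "${outfile}"
```

Stage two is then nothing more than an hourly rsync of that directory to the collection server.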
This recipe assumes that you have read and implemented the following recipes:

    http://webjob.sourceforge.net/Files/Recipes/webjob-run-periodic.txt
    http://webjob.sourceforge.net/Files/Recipes/webjob-manage-cronjob.txt
    http://webjob.sourceforge.net/Files/Recipes/webjob-pad-rsync.txt

Time to Implement

Assuming that you have satisfied all the requirements/prerequisites, this recipe should take less than one hour to implement.

Solution

The following steps describe how to implement this recipe.

1. Set WEBJOB_CLIENT and WEBJOB_COMMANDS as appropriate for your server. Next, extract the harvest_df script at the bottom of this recipe, and install it in the appropriate commands directory. If you want this script to be bound to a particular client, set WEBJOB_CLIENT as appropriate before running the following commands. Once the file is in place, set its ownership and mode to 0:0 and 644, respectively.

    # WEBJOB_CLIENT=common
    # WEBJOB_COMMANDS=/var/webjob/profiles/${WEBJOB_CLIENT}/commands
    # sed -e '1,/^--- harvest_df ---$/d; /^--- harvest_df ---$/,$d' webjob-harvest-df.txt > harvest_df
    # cp harvest_df ${WEBJOB_COMMANDS}/
    # chmod 644 ${WEBJOB_COMMANDS}/harvest_df
    # chown 0:0 ${WEBJOB_COMMANDS}/harvest_df

Next, make harvest_df.pad so that harvest_df can be deployed locally on all clients. Make the PaD file by running pad-make-script as follows:

    # pad-make-script -c ${WEBJOB_COMMANDS}/harvest_df > ${WEBJOB_COMMANDS}/harvest_df.pad

Note: If you are using DSV, sign the harvest_df.pad file.

2. Perform this step only if you choose to use rsync (as described in the webjob-pad-rsync.txt recipe) to move df data from WebJob clients to a WebJob server. This step assumes that you have read and implemented the rsync recipe (webjob-pad-rsync.txt). Create a symlink called webjob_rsync_id_df.pad that points to webjob_rsync_id.pad. This step assumes that you defined WEBJOB_COMMANDS in step one.
    # ( cd ${WEBJOB_COMMANDS} && ln -s webjob_rsync_id.pad webjob_rsync_id_df.pad )

Note: If you are using DSV, create a symbolic link to the signature file as well.

    # ( cd ${WEBJOB_COMMANDS} && ln -s webjob_rsync_id.pad.sig webjob_rsync_id_df.pad.sig )

At this point, your commands tree should have, at a minimum, the following files:

    commands
      |
      - ...
      - cronjob_manager
      - harvest_df.pad
      - hourly
      - webjob_rsync_id.pad
      - webjob_rsync_id_df.pad -> webjob_rsync_id.pad
      - ...

Create the following upload directory:

    # mkdir -p /var/rsync.all/df
    # chown rsync:rsync /var/rsync.all/df

3. This step describes the necessary changes to your hourly (or daily) script to ensure that the following occurs on each client:

    - harvest_df is installed
    - harvest_df is kept current
    - harvest_df is executed periodically through cron
    - harvest_df data is rsync'ed back to the WebJob server (optional)

a. Add the following DeployFile job to your hourly (or daily) script to ensure that harvest_df is installed on each client.

    DeployFile "${WEBJOB_HOME}/etc/upload.cfg" "harvest_df.pad" "/usr/local/bin/harvest_df" "755" "root" "wheel"

b. Add the following UpdateFile job to your hourly (or daily) script to ensure that harvest_df is kept current on each client.

    UpdateFile "${WEBJOB_HOME}/etc/upload.cfg" "harvest_df.pad" "/usr/local/bin/harvest_df" "755" "root" "wheel" "6522ea668f691de43f97e88bdf8ebc4d" "md5"

Note: Each time you modify harvest_df, you'll need to create a new PaD file and update the MD5 hash shown above. To obtain the MD5 hash of the script, run webjob as follows:

    # sh harvest_df.pad webjob -h -t md5 %payload

c. Decide what the local cron job will be and how often it should run. For this recipe, we'll use the following crontab entry:

    0,5,10,15,20,25,30,35,40,45,50,55 * * * * [ -x /usr/local/bin/harvest_df ] && /usr/local/bin/harvest_df -as > /dev/null 2>&1

This job runs harvest_df every 5 minutes (if it exists and is executable).
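As an aside, the minute list in that time specification can be generated rather than typed by hand. A small sketch, assuming seq(1) and paste(1) are available (on crons that support step values, '*/5' is an equivalent shorthand):

```shell
# Build the comma-separated list of minutes 0, 5, ..., 55.
minutes=`seq 0 5 55 | paste -s -d, -`
echo "${minutes} * * * *"
# 0,5,10,15,20,25,30,35,40,45,50,55 * * * *
```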
Each time the script runs, it executes the df utility, normalizes the data, and writes its output to a file of the form shown below. Read the documentation header in harvest_df for more detailed usage information.

    /var/rsync/df/<hostname>/<YYYY-MM-DD>.out

Next, add the following oneshot job to execute harvest_df every 5 minutes on all clients.

    ${WEBJOB_HOME}/bin/webjob -e -f ${WEBJOB_HOME}/etc/upload.cfg cronjob_manager --deploy -t '0,5,10,15,20,25,30,35,40,45,50,55 * * * *' -c '[ -x /usr/local/bin/harvest_df ] && /usr/local/bin/harvest_df -as > /dev/null 2>&1'

d. If you are moving harvest_df data using the rsync method, add the following job to your hourly (or daily) script to transfer each client's output to the WebJob server. Note: these scripts (hourly/daily) must be revision 1.4 or higher, or they must be modified to define HOSTNAME. Here's an example job that could be added to the hourly script:

    ${WEBJOB_HOME}/bin/webjob -e -f ${WEBJOB_HOME}/etc/upload.cfg webjob_rsync_id_df.pad \
      rsync -avze \"ssh -i %payload -o BatchMode=yes -o StrictHostKeyChecking=no\" \
      /var/rsync/df/${HOSTNAME} rsync@server:/var/rsync.all/df/

Note: The first rsync operation can fail if the source directory does not yet exist on the client.

4. After verifying that all clients have deployed the harvest_df script and have installed the cron job to run harvest_df, remove those two jobs from your hourly (or daily) script. You can verify completion of the oneshot cronjob_manager job by inspecting the WebJob output file. It should look similar to this:

COMMAND_LINE=[ -x /usr/local/bin/harvest_df ] && /usr/local/bin/harvest_df -as > /dev/null 2>&1
TIME_SPECIFICATION=0,5,10,15,20,25,30,35,40,45,50,55 * * * *
USER=root
--- crontab.bak ---
existing jobs...
--- crontab.bak ---
--- crontab.new ---
existing jobs...
0,5,10,15,20,25,30,35,40,45,50,55 * * * * [ -x /usr/local/bin/harvest_df ] && /usr/local/bin/harvest_df -as > /dev/null 2>&1
--- crontab.new ---

This output consists of three parts: runtime variables, a listing of the original crontab (crontab.bak), and a listing of the new crontab (crontab.new). The job you inserted should show up at the bottom of crontab.new.

Closing Remarks

This recipe has a number of moving parts and dependencies on other recipes. Rather than putting all these pieces in the hourly script, it would be better to create a manager script that handles all the details (see Appendix 2). Then, the manager script could be called from hourly (or daily) in a generic fashion.

There is no requirement for the output to be rsync'ed to the WebJob server -- any server that accepts the rsync credentials could be used. Keep that in mind when designing/customizing your final solution. Among other things, this means that you are not required to open up SSH access from the various clients to the WebJob server.

Credits

This recipe was brought to you by Klayton Monroe and Andy Bair.

References

Appendix 1

--- harvest_df ---
#!/bin/sh
######################################################################
#
# $Id: harvest_df.base,v 1.6 2007/10/08 16:51:28 klm Exp $
#
######################################################################
#
# Copyright 2006-2007 The WebJob Project, All Rights Reserved.
#
######################################################################
#
# NAME
#   harvest_df - harvest, normalize, and print or accrue df data
#
# DESCRIPTION
#   This utility collects, normalizes, and prints or accrues df
#   data. The df data is presented with a pipe (|) field separator
#   in the following format:
#
#     yyyy-mm-dd|hh:mm:ss|filesystem|blocks|used_blocks|free_blocks|capacity|mount_point
#
#   where:
#
#     YYYY        = four-digit year
#     MM          = two-digit month
#     DD          = two-digit day of the month
#     HH          = two-digit hour
#     MM          = two-digit minute
#     SS          = two-digit second
#     filesystem  = Device path, logical name, NFS share, etc.
#     blocks      = Total number of 1K blocks
#     used_blocks = Total number of 1K blocks used
#     free_blocks = Total number of 1K blocks free
#     capacity    = Percentage of the filesystem that is used
#     mount_point = Path name where the filesystem is mounted
#
# OPTIONS
#   -a
#     Specifying the accrue option causes accrual of df data in a
#     file. The default is to not accrue df data.
#
#   -H
#     Specifying the header option causes printing of a header line
#     for the df data. The default is to not print a header line.
#
#   -h
#     Show usage and exit (i.e., help).
#
#   -S
#     Specifies that the time is recorded as the number of seconds
#     since the epoch. This option does not work on all platforms
#     (e.g., Solaris). If it does work, the output format will be:
#
#       seconds|filesystem|blocks|used_blocks|free_blocks|capacity|mount_point
#
#   -s
#     Specifying the silent option disables the printing of df data
#     to standard out. The default is to not run silent. This option
#     only has meaning when the accrue option is specified.
#
#   -d outdir
#     Specifies the base output directory for writing df data files.
#     This option is not mandatory and has a default value of
#     '/var/rsync/df'.
#
#   -u
#     Specifies that the date and time are to be recorded in
#     Coordinated Universal Time (UTC).
#
######################################################################

IFS=' 	
'

PATH=/sbin:/usr/sbin:/usr/local/sbin:/bin:/usr/bin:/usr/local/bin

PROGRAM=`basename $0`

######################################################################
#
# TestPid
#
######################################################################

TestPid()
{
  my_pid="$1"
  my_pid_regexp="^[0-9]+$"

  echo "${my_pid}" | egrep "${my_pid_regexp}" > /dev/null 2>&1
  if [ $? -eq 0 ] ; then
    # The PID is valid.
    return 0;
  fi

  return 1; # The PID is not valid.
}

######################################################################
#
# CreateLockFile
#
######################################################################

CreateLockFile()
{
  my_lock_file="$1"

  # Customize ln(1) options based on the OS.
  case `uname -s` in
  NIKOS) # This OS is so old it doesn't support '-n'.
    ln_options=
    ;;
  *)
    ln_options="-n"
    ;;
  esac

  if [ -z "${my_lock_file}" ] ; then
    return 1 # Rats, we didn't even get to the gate.
  fi

  my_old_umask=`umask`
  umask 022
  my_lock_dir=`dirname "${my_lock_file}"`
  if [ ! -d "${my_lock_dir}" ] ; then
    mkdir -p "${my_lock_dir}"
    if [ $? -ne 0 ] ; then
      return 1 # Rats, we got bushwhacked.
    fi
  fi
  umask 077
  my_temp_file="${my_lock_file}.$$"
  echo $$ | cat - > "${my_temp_file}"
  if [ $? -ne 0 ] ; then
    return 1 # Rats, we didn't even get out of the gate.
  fi
  umask ${my_old_umask}

  ln ${ln_options} "${my_temp_file}" "${my_lock_file}" > /dev/null 2>&1
  if [ $? -eq 0 ] ; then
    rm -f "${my_temp_file}"
    return 0 # Ding ding ding, we have a winner.
  fi

  my_old_pid=`head -1 "${my_lock_file}"`
  TestPid "${my_old_pid}"
  if [ $? -eq 0 ] ; then
    kill -0 ${my_old_pid} > /dev/null 2>&1
    if [ $? -eq 0 ] ; then
      rm -f "${my_temp_file}"
      return 1 # Rats, the lock is in use.
    fi
  fi

  # At this point, the lock is corrupt, stale, or owned by a different
  # user. Attempt to delete it, and go for the gold.
  rm -f "${my_lock_file}"
  ln ${ln_options} "${my_temp_file}" "${my_lock_file}" > /dev/null 2>&1
  if [ $? -eq 0 ] ; then
    rm -f "${my_temp_file}"
    return 0 # Ding ding ding, we have a winner.
  fi

  rm -f "${my_temp_file}"
  return 1 # Rats, someone else got there first.
}

######################################################################
#
# DeleteLockFile
#
######################################################################

DeleteLockFile()
{
  my_lock_file="$1"

  if [ -n "${my_lock_file}" -a -f "${my_lock_file}" ] ; then
    my_old_pid=`head -1 "${my_lock_file}"`
    TestPid "${my_old_pid}"
    if [ $? -eq 0 ] ; then
      if [ ${my_old_pid} -eq $$ ] ; then
        rm -f "${my_lock_file}"
      fi
    fi
  fi

  return 0
}

######################################################################
#
# AddLineItem
#
######################################################################

AddLineItem()
{
  my_body="$1"
  my_line="$2"
  my_count="$3"

  if [ ${my_count} -eq 1 ] ; then
    my_body="${my_line}"
  else
    my_body="${my_body}
${my_line}"
  fi
  echo "${my_body}"
}

######################################################################
#
# Usage
#
######################################################################

Usage()
{
  echo 1>&2
  echo "Usage: ${PROGRAM} [-aHhSsu] [-d outdir]" 1>&2
  echo 1>&2
  exit 1
}

######################################################################
#
# Process command line arguments.
#
######################################################################

target="df"

accrue="0"
base_outdir="/var/rsync/${target}"
print_header="0"
run_silent="0"
use_seconds="0"
utc_option=""

while getopts "ad:HhSsu" OPTION ; do
  case "${OPTION}" in
  a)
    accrue="1"
    ;;
  d)
    base_outdir="${OPTARG}"
    ;;
  H)
    print_header="1"
    ;;
  h)
    Usage
    ;;
  S)
    use_seconds="1"
    ;;
  s)
    run_silent="1"
    ;;
  u)
    utc_option="-u"
    ;;
  *)
    Usage
    ;;
  esac
done

if [ ${OPTIND} -le $# ] ; then
  Usage
fi

######################################################################
#
# Initialize working variables.
#
######################################################################

hostname=`hostname | cut -d. -f1`

blocks_regexp="[0-9]+"
filesystem_regexp="[^|]+"
hms_regexp="[0-9][0-9]:[0-9][0-9]:[0-9][0-9]"
mount_point_regexp="/[^|]*"
percent_regexp="${blocks_regexp}%"
seconds_regexp="[0-9]+"
ymd_regexp="[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]"

date=`date ${utc_option} "+%Y-%m-%d"`

target_header="filesystem|blocks|used_blocks|free_blocks|capacity|mount_point"
target_regexp="${filesystem_regexp}[|]${blocks_regexp}[|]${blocks_regexp}[|]${blocks_regexp}[|]${percent_regexp}[|]${mount_point_regexp}"

if [ ${use_seconds} -eq 1 ] ; then
  seconds=`date ${utc_option} "+%s"`
  header="seconds|${target_header}"
  output_prefix="${seconds}"
  output_regexp="${seconds_regexp}[|]${target_regexp}"
else
  time=`date ${utc_option} "+%H:%M:%S"`
  header="date|time|${target_header}"
  output_prefix="${date}|${time}"
  output_regexp="${ymd_regexp}[|]${hms_regexp}[|]${target_regexp}"
fi

######################################################################
#
# Harvest, preprocess, and check (via regexp) target data.
#
######################################################################

OLD_IFS=${IFS}

# Limit IFS to newlines only inside the for loop.
IFS='
'

GNU_DF_VERSION=`df --version 2> /dev/null`
if [ $? -eq 0 -a -n "${GNU_DF_VERSION}" ] ; then
  target_command=${target_command-"df -k -l --portability"}
else
  target_command=${target_command-"df -k -l"}
fi

count=1
for line in `eval ${target_command} | sed '1d' 2> /dev/null | awk '{print $1"|"$2"|"$3"|"$4"|"$5"|"$6}'` ; do
  output_line="${output_prefix}|${line}"
  echo "${output_line}" | egrep "${output_regexp}" > /dev/null 2>&1
  if [ $? -ne 0 ] ; then
    echo "${PROGRAM}: Error='Output (${output_line}) does not meet basic syntax checks.'" 1>&2
    exit 2
  fi
  output=`AddLineItem "${output}" "${output_line}" "${count}"`
  count=`expr ${count} + 1`
done

IFS=${OLD_IFS}

######################################################################
#
# Write output according to the user's specified options.
#
######################################################################

if [ ${accrue} -eq 1 ] ; then
  out_name=${date}.out
  out_path=${base_outdir}/${hostname}
  out_file=${out_path}/${out_name}
  if [ ! -d ${out_path} ] ; then
    mkdir -p ${out_path} || exit 2
  fi
  lock_file=${out_path}/harvest_${target}.pid
  CreateLockFile "${lock_file}"
  if [ $? -ne 0 ] ; then
    echo "${PROGRAM}: Error='Unable to secure a lock file. Job aborted.'" 1>&2
    exit 2
  fi
  if [ ${run_silent} -eq 1 ] ; then
    echo "${output}" >> ${out_file}
  else
    echo "${output}" | tee -a ${out_file}
  fi
  DeleteLockFile "${lock_file}"
else
  if [ ${print_header} -eq 1 ] ; then
    echo "${header}"
  fi
  echo "${output}"
fi
--- harvest_df ---

Appendix 2

The following script combines all the management elements of this recipe (and then some) into one place. To use this script, read through it and change the variables to suit your environment. At a minimum, you should check/set the RSYNC_SERVER variable. Note that this variable can be set at run time through the '-r' command line option. You also need to ensure that the MD5 hash for harvest_df.pad is correct.

To deploy everything on your clients, use the following oneshot job:

    ${WEBJOB_HOME}/bin/webjob -e -f ${WEBJOB_HOME}/etc/upload.cfg harvest_df_manager -m deploy-all

Use the following job to periodically check/update harvest_df, rsync output, and prune old files:

    ${WEBJOB_HOME}/bin/webjob -e -f ${WEBJOB_HOME}/etc/upload.cfg harvest_df_manager -m periodic -r <rsync-server>

To remove everything from your clients, use the following oneshot job:

    ${WEBJOB_HOME}/bin/webjob -e -f ${WEBJOB_HOME}/etc/upload.cfg harvest_df_manager -m remove-all

To extract this script from the recipe, run the following command:

    $ sed -e '1,/^--- harvest_df_manager ---$/d; /^--- harvest_df_manager ---$/,$d' webjob-harvest-df.txt > harvest_df_manager

Note: If you are using DSV, sign the harvest_df_manager script after installing it on the WebJob server.
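The UpdateFile() logic in the manager script hinges on a hash comparison: the payload is pushed only when the installed file's digest differs from the expected one. A self-contained sketch of that gate, with md5sum(1) standing in for 'webjob -h -t md5' and a hypothetical target path:

```shell
#!/bin/sh
# Hash-gated update (illustration only). md5sum(1) stands in for
# 'webjob -h -t md5'; /tmp/gate-demo.$$ is a hypothetical target.
target=/tmp/gate-demo.$$
echo "version 1" > "${target}"
expected=`md5sum "${target}" | awk '{print $1}'`

# Unchanged content: the hashes match, so no update is needed.
actual=`md5sum "${target}" | awk '{print $1}'`
[ "${actual}" = "${expected}" ] && echo "up to date"

# Changed content: the hashes differ, so an update would be pushed.
echo "version 2" > "${target}"
actual=`md5sum "${target}" | awk '{print $1}'`
[ "${actual}" != "${expected}" ] && echo "update needed"

rm -f "${target}"
```

This is also why the recipe reminds you to refresh the recorded MD5 hash whenever harvest_df changes: a stale expected hash would make every client re-pull the payload on every run.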
--- harvest_df_manager ---
#!/bin/sh
######################################################################
#
# $Id: harvest_df_manager.base,v 1.17 2010/12/10 05:41:12 klm Exp $
#
######################################################################
#
# Copyright 2005-2007 The WebJob Project, All Rights Reserved.
#
######################################################################
#
# Purpose: Manage harvest_df deployments via webjob.
#
######################################################################

IFS=' 	
'

PATH=/sbin:/usr/sbin:/usr/local/sbin:/bin:/usr/bin:/usr/local/bin

PROGRAM=`basename $0`
HOSTNAME=`hostname | awk -F. '{print $1}'`

######################################################################
#
# TestPid
#
######################################################################

TestPid()
{
  my_pid="$1"
  my_pid_regexp="^[0-9]+$"

  echo "${my_pid}" | egrep "${my_pid_regexp}" > /dev/null 2>&1
  if [ $? -eq 0 ] ; then
    # The PID is valid.
    return 0;
  fi

  return 1; # The PID is not valid.
}

######################################################################
#
# CreateLockFile
#
######################################################################

CreateLockFile()
{
  my_lock_file="$1"

  # Customize ln(1) options based on the OS.
  case `uname -s` in
  NIKOS) # This OS is so old it doesn't support '-n'.
    ln_options=
    ;;
  *)
    ln_options="-n"
    ;;
  esac

  if [ -z "${my_lock_file}" ] ; then
    return 1 # Rats, we didn't even get to the gate.
  fi

  my_old_umask=`umask`
  umask 022
  my_lock_dir=`dirname "${my_lock_file}"`
  if [ ! -d "${my_lock_dir}" ] ; then
    mkdir -p "${my_lock_dir}"
    if [ $? -ne 0 ] ; then
      return 1 # Rats, we got bushwhacked.
    fi
  fi
  umask 077
  my_temp_file="${my_lock_file}.$$"
  echo $$ | cat - > "${my_temp_file}"
  if [ $? -ne 0 ] ; then
    return 1 # Rats, we didn't even get out of the gate.
  fi
  umask ${my_old_umask}

  ln ${ln_options} "${my_temp_file}" "${my_lock_file}" > /dev/null 2>&1
  if [ $? -eq 0 ] ; then
    rm -f "${my_temp_file}"
    return 0 # Ding ding ding, we have a winner.
  fi

  my_old_pid=`head -1 "${my_lock_file}"`
  TestPid "${my_old_pid}"
  if [ $? -eq 0 ] ; then
    kill -0 ${my_old_pid} > /dev/null 2>&1
    if [ $? -eq 0 ] ; then
      rm -f "${my_temp_file}"
      return 1 # Rats, the lock is in use.
    fi
  fi

  # At this point, the lock is corrupt, stale, or owned by a different
  # user. Attempt to delete it, and go for the gold.
  rm -f "${my_lock_file}"
  ln ${ln_options} "${my_temp_file}" "${my_lock_file}" > /dev/null 2>&1
  if [ $? -eq 0 ] ; then
    rm -f "${my_temp_file}"
    return 0 # Ding ding ding, we have a winner.
  fi

  rm -f "${my_temp_file}"
  return 1 # Rats, someone else got there first.
}

######################################################################
#
# DeleteLockFile
#
######################################################################

DeleteLockFile()
{
  my_lock_file="$1"

  if [ -n "${my_lock_file}" -a -f "${my_lock_file}" ] ; then
    my_old_pid=`head -1 "${my_lock_file}"`
    TestPid "${my_old_pid}"
    if [ $? -eq 0 ] ; then
      if [ ${my_old_pid} -eq $$ ] ; then
        rm -f "${my_lock_file}"
      fi
    fi
  fi

  return 0
}

######################################################################
#
# DeployFile
#
######################################################################

DeployFile()
{
  MY_CFG_FILE=$1
  MY_PAD_FILE=$2
  MY_TARGET_PATH=$3 # Full path including filename.
  MY_TARGET_MODE=$4
  MY_TARGET_OWNER=$5
  MY_TARGET_GROUP=$6

  MY_TARGET_DIR=`dirname ${MY_TARGET_PATH}`

  MY_MKDIR_CMD="{ umask 022 ; mkdir -p ${MY_TARGET_DIR} ; }"
  MY_CP_CMD="{ cp %payload ${MY_TARGET_PATH} ; }"
  MY_CHMOD_CMD="{ chmod ${MY_TARGET_MODE} ${MY_TARGET_PATH} ; }"
  MY_CHOWN_CMD="{ chown ${MY_TARGET_OWNER}:${MY_TARGET_GROUP} ${MY_TARGET_PATH} ; }"
  MY_RM_CMD="{ rm -f %payload ; }"

  MY_ID=`id | sed 's/^uid=\([0-9]\{1,5\}\)(.*$/\1/;'`
  if [ "${MY_ID}"X = "0"X ] ; then
    MY_PAD_CMD="{ ${MY_MKDIR_CMD} && ${MY_CP_CMD} && ${MY_CHMOD_CMD} && ${MY_CHOWN_CMD} } ; ${MY_RM_CMD}"
  else
    MY_PAD_CMD="{ ${MY_MKDIR_CMD} && ${MY_CP_CMD} && ${MY_CHMOD_CMD} } ; ${MY_RM_CMD}"
  fi

  if [ ! -f ${MY_TARGET_PATH} ] ; then
    webjob -e -f ${MY_CFG_FILE} ${MY_PAD_FILE} ${MY_PAD_CMD}
  fi
}

######################################################################
#
# UpdateFile
#
######################################################################

UpdateFile()
{
  MY_CFG_FILE=$1
  MY_PAD_FILE=$2
  MY_TARGET_PATH=$3 # Full path including filename.
  MY_TARGET_MODE=$4
  MY_TARGET_OWNER=$5
  MY_TARGET_GROUP=$6
  MY_TARGET_HASH=$7
  MY_DIGEST_TYPE=$8 # (MD5|SHA1)

  MY_CP_CMD="{ cp -f %payload ${MY_TARGET_PATH} ; }"
  MY_CHMOD_CMD="{ chmod ${MY_TARGET_MODE} ${MY_TARGET_PATH} ; }"
  MY_CHOWN_CMD="{ chown ${MY_TARGET_OWNER}:${MY_TARGET_GROUP} ${MY_TARGET_PATH} ; }"
  MY_RM_CMD="{ rm -f %payload ; }"

  MY_ID=`id | sed 's/^uid=\([0-9]\{1,5\}\)(.*$/\1/;'`
  if [ "${MY_ID}"X = "0"X ] ; then
    MY_PAD_CMD="{ ${MY_CP_CMD} && ${MY_CHMOD_CMD} && ${MY_CHOWN_CMD} } ; ${MY_RM_CMD}"
  else
    MY_PAD_CMD="{ ${MY_CP_CMD} && ${MY_CHMOD_CMD} } ; ${MY_RM_CMD}"
  fi

  if [ -f ${MY_TARGET_PATH} ] ; then
    MY_ACTUAL_HASH=`webjob -h -t ${MY_DIGEST_TYPE} ${MY_TARGET_PATH}`
    if [ "${MY_ACTUAL_HASH}" != "${MY_TARGET_HASH}" ] ; then
      webjob -e -f ${MY_CFG_FILE} ${MY_PAD_FILE} ${MY_PAD_CMD}
    fi
  fi
}

######################################################################
#
# DeployFiles
#
######################################################################

DeployFiles()
{
  MY_HARVEST_PREFIX=$1

  echo "DeployFiles() ..."

  if [ -z "${MY_HARVEST_PREFIX}" ] ; then
    false
  else
    DeployFile "${WEBJOB_HOME}/etc/upload.cfg" "harvest_df.pad" "${MY_HARVEST_PREFIX}/bin/harvest_df" "755" "root" "wheel"
  fi
}

######################################################################
#
# UpdateFiles
#
######################################################################

UpdateFiles()
{
  MY_HARVEST_PREFIX=$1

  echo "UpdateFiles() ..."

  if [ -z "${MY_HARVEST_PREFIX}" ] ; then
    false
  else
    UpdateFile "${WEBJOB_HOME}/etc/upload.cfg" "harvest_df.pad" "${MY_HARVEST_PREFIX}/bin/harvest_df" "755" "root" "wheel" "6522ea668f691de43f97e88bdf8ebc4d" "md5"
  fi
}

######################################################################
#
# RemoveFiles
#
######################################################################

RemoveFiles()
{
  MY_HARVEST_PREFIX=$1

  echo "RemoveFiles() ..."

  if [ -z "${MY_HARVEST_PREFIX}" ] ; then
    false
  else
    rm -f ${MY_HARVEST_PREFIX}/bin/harvest_df
  fi
}

######################################################################
#
# DeployCronjob
#
######################################################################

DeployCronjob()
{
  MY_COMMAND=$1
  MY_TIME_SPECIFICATION=$2
  MY_COMMAND_LINE=$3
  MY_CHECK_FIRST=${4-0}

  if [ X"${MY_CHECK_FIRST}" != X"0" ] ; then
    MY_CRONTAB_ENTRIES=`crontab -l`
    if [ $? -eq 0 -a -n "${MY_CRONTAB_ENTRIES}" ] ; then
      # Command succeeded, and we have data.
      echo "${MY_CRONTAB_ENTRIES}" | egrep -v "^#" | egrep "${MY_COMMAND}" > /dev/null 2>&1
      if [ $? -eq 0 ] ; then
        # One or more matches were found, so don't deploy.
        echo "DeployCronjob() ... skipped"
        return 0
      fi
    fi
  fi

  echo "DeployCronjob() ..."

  if [ -z "${MY_TIME_SPECIFICATION}" -o -z "${MY_COMMAND_LINE}" ] ; then
    false
  else
    webjob -e -f ${WEBJOB_HOME}/etc/upload.cfg --NoUpload cronjob-manager --deploy -t "${MY_TIME_SPECIFICATION}" -c "${MY_COMMAND_LINE}"
  fi
}

######################################################################
#
# RemoveCronjob
#
######################################################################

RemoveCronjob()
{
  MY_EXPRESSION=$1

  echo "RemoveCronjob() ..."

  if [ -z "${MY_EXPRESSION}" ] ; then
    false
  else
    webjob -e -f ${WEBJOB_HOME}/etc/upload.cfg --NoUpload cronjob-manager --remove -e "${MY_EXPRESSION}"
  fi
}

######################################################################
#
# PruneOutput
#
######################################################################

PruneOutput()
{
  MY_OUTDIR=$1
  MY_PRUNE_FLAG=$2
  MY_PRUNE_DAYS_TO_KEEP=${3-30}

  echo "PruneOutput() ..."

  if [ -z "${MY_OUTDIR}" ] ; then
    false
  else
    if [ -f ${MY_PRUNE_FLAG} ] ; then
      find ${MY_OUTDIR}/${HOSTNAME} -type f -a -mtime +${MY_PRUNE_DAYS_TO_KEEP} -print -exec rm -f {} \;
      rm -f ${MY_PRUNE_FLAG}
    else
      echo "${PROGRAM}: Warning='${MY_PRUNE_FLAG} must exist for pruning to work (check for errors).'" 1>&2
    fi
  fi
}

######################################################################
#
# RsyncOutput
#
######################################################################

RsyncOutput()
{
  MY_RSYNC_SERVER=$1
  MY_RSYNC_DIR=$2
  MY_RSYNC_ALL_DIR=$3
  MY_PRUNE_FLAG=$4

  echo "RsyncOutput() ..."

  if [ -z "${MY_RSYNC_SERVER}" -o -z "${MY_RSYNC_DIR}" ] ; then
    false
  else
    MY_SSH=`ssh -V 2>&1 | egrep "(Sun_|Open)SSH" > /dev/null && echo OpenISH`
    case "${MY_SSH}" in
    OpenISH)
      webjob -e -f ${WEBJOB_HOME}/etc/upload.cfg --NoUpload webjob_rsync_id_df.open.pad \
        rsync -avze \"ssh -i %payload -o BatchMode=yes -o StrictHostKeyChecking=no\" \
        ${MY_RSYNC_DIR}/${HOSTNAME} rsync@${MY_RSYNC_SERVER}:${MY_RSYNC_ALL_DIR}/ \&\& \
        touch ${MY_PRUNE_FLAG} # This signifies that it's ok to prune.
      ;;
    *)
      webjob -e -f ${WEBJOB_HOME}/etc/upload.cfg --NoUpload webjob_rsync_id_df.pad \
        MY_PAD_CWD=\`pwd\` \&\& \
        echo \"IdKey \${MY_PAD_CWD}/%payload\" \> \${MY_PAD_CWD}/%payload.id \&\& \
        ssh-keygen -D \${MY_PAD_CWD}/%payload \&\& \
        rsync -avze \"ssh -i \${MY_PAD_CWD}/%payload.id -o BatchMode=yes -o StrictHostKeyChecking=no\" \
        ${MY_RSYNC_DIR}/${HOSTNAME} rsync@${MY_RSYNC_SERVER}:${MY_RSYNC_ALL_DIR}/ \&\& \
        touch ${MY_PRUNE_FLAG} \; \
        rm -f \${MY_PAD_CWD}/%payload.id \${MY_PAD_CWD}/%payload.pub
      ;;
    esac
  fi
}

######################################################################
#
# RunHarvester
#
######################################################################

RunHarvester()
{
  MY_HARVESTER=$1
  MY_PRINT_HEADER=${2-0}

  echo "RunHarvester() ..."

  if [ "${MY_PRINT_HEADER}"X = "1"X ] ; then
    MY_HARVESTER_OPTS="-H"
  else
    MY_HARVESTER_OPTS=
  fi
  echo "--- HARVEST_BEGIN ---"
  ${MY_HARVESTER} ${MY_HARVESTER_OPTS}
  echo "--- HARVEST_END ---"
}

######################################################################
#
# Usage
#
######################################################################

Usage()
{
  echo 1>&2
  echo "Usage: ${PROGRAM} [-H webjob-home] [-k days] [-l lock-file] [-r rsync-server] -m {deploy-{all|cronjob|files}|periodic|prune-output|reload-cronjob|remove-{all|cronjob|files}|rsync-output|run-harvester|update-files}" 1>&2
  echo 1>&2
  exit 1
}

######################################################################
#
# Main
#
######################################################################

LOCK_FILE=/var/run/harvest_df_manager.pid
PRUNE_DAYS_TO_KEEP=30
PRUNE_DAYS_TO_KEEP_REGEXP="[0-9]+"
RSYNC_SERVER="" # INSERT THE HOSTNAME OR ADDRESS OF YOUR RSYNC SERVER HERE, OR USE '-r' COMMAND LINE OPTION.
RUN_MODE=

while getopts "H:k:l:m:r:" OPTION ; do
  case "${OPTION}" in
  H)
    WEBJOB_HOME="${OPTARG}"
    ;;
  k)
    PRUNE_DAYS_TO_KEEP="${OPTARG}"
    ;;
  l)
    LOCK_FILE="${OPTARG}"
    ;;
  m)
    RUN_MODE="${OPTARG}"
    ;;
  r)
    RSYNC_SERVER="${OPTARG}"
    ;;
  *)
    Usage
    ;;
  esac
done

if [ ${OPTIND} -le $# ] ; then
  Usage
fi

if [ -z "${RUN_MODE}" ] ; then
  Usage
else
  case "${RUN_MODE}" in
  deploy-all\
  |deploy-cronjob\
  |deploy-files\
  |periodic\
  |prune-output\
  |reload-cronjob\
  |remove-all\
  |remove-cronjob\
  |remove-files\
  |rsync-output\
  |run-harvester\
  |update-files\
  )
    : # Run mode is valid.
    ;;
  *)
    echo "${PROGRAM}: Error='The specified run mode (${RUN_MODE}) is not supported.'" 1>&2
    exit 2
    ;;
  esac
fi

if [ -z "${LOCK_FILE}" ] ; then
  echo "${PROGRAM}: Error='The lock file has been reduced to an empty string.'" 1>&2
  exit 2
fi

echo "${PRUNE_DAYS_TO_KEEP}" | egrep "${PRUNE_DAYS_TO_KEEP_REGEXP}" > /dev/null 2>&1
if [ $? -ne 0 ] ; then
  echo "${PROGRAM}: Error='The number of days to keep (${PRUNE_DAYS_TO_KEEP}) does not meet basic syntax checks.'" 1>&2
  exit 2
fi

PATH=${WEBJOB_HOME=/usr/local/webjob}/bin:${PATH} ; export PATH

HARVEST_PREFIX=/usr/local
HARVEST_OUTDIR=/var/rsync/df
HARVEST_COMMAND=${HARVEST_PREFIX}/bin/harvest_df
HARVEST_COMMAND_LINE="[ -x ${HARVEST_COMMAND} ] && ${HARVEST_COMMAND} -as > /dev/null 2>&1"
HARVEST_TIME_SPECIFICATION="0,5,10,15,20,25,30,35,40,45,50,55 * * * *"
PRUNE_FLAG=/var/run/df.prune
RSYNC_ALL_DIR=/var/rsync.all/df
RSYNC_DIR=${HARVEST_OUTDIR}

CreateLockFile "${LOCK_FILE}"
if [ $? -ne 0 ] ; then
  echo "${PROGRAM}: Error='Unable to secure a lock file.'" 1>&2
  exit 2
fi

case "${RUN_MODE}" in
deploy-all)
  DeployFiles "${HARVEST_PREFIX}"
  DeployCronjob "${HARVEST_COMMAND}" "${HARVEST_TIME_SPECIFICATION}" "${HARVEST_COMMAND_LINE}" "1"
  ;;
deploy-cronjob)
  DeployCronjob "${HARVEST_COMMAND}" "${HARVEST_TIME_SPECIFICATION}" "${HARVEST_COMMAND_LINE}" "1"
  ;;
deploy-files)
  DeployFiles "${HARVEST_PREFIX}"
  ;;
periodic)
  DeployFiles "${HARVEST_PREFIX}"
  UpdateFiles "${HARVEST_PREFIX}"
  DeployCronjob "${HARVEST_COMMAND}" "${HARVEST_TIME_SPECIFICATION}" "${HARVEST_COMMAND_LINE}" "1"
  RunHarvester "${HARVEST_COMMAND}" "1"
  if [ -n "${RSYNC_SERVER}" ] ; then
    RsyncOutput "${RSYNC_SERVER}" "${RSYNC_DIR}" "${RSYNC_ALL_DIR}" "${PRUNE_FLAG}"
    PruneOutput "${HARVEST_OUTDIR}" "${PRUNE_FLAG}" "${PRUNE_DAYS_TO_KEEP}"
  fi
  ;;
prune-output)
  touch ${PRUNE_FLAG} # This is a forced prune.
  PruneOutput "${HARVEST_OUTDIR}" "${PRUNE_FLAG}" "${PRUNE_DAYS_TO_KEEP}"
  ;;
reload-cronjob)
  RemoveCronjob "${HARVEST_COMMAND_LINE}"
  DeployCronjob "${HARVEST_COMMAND}" "${HARVEST_TIME_SPECIFICATION}" "${HARVEST_COMMAND_LINE}" "1"
  ;;
remove-all)
  RemoveFiles "${HARVEST_PREFIX}"
  RemoveCronjob "${HARVEST_COMMAND_LINE}"
  ;;
remove-cronjob)
  RemoveCronjob "${HARVEST_COMMAND_LINE}"
  ;;
remove-files)
  RemoveFiles "${HARVEST_PREFIX}"
  ;;
rsync-output)
  if [ -n "${RSYNC_SERVER}" ] ; then
    RsyncOutput "${RSYNC_SERVER}" "${RSYNC_DIR}" "${RSYNC_ALL_DIR}" "${PRUNE_FLAG}"
  else
    echo "${PROGRAM}: Error='The rsync server is not defined. Use the \"-r\" option to specify.'" 1>&2
    exit 2
  fi
  ;;
run-harvester)
  RunHarvester "${HARVEST_COMMAND}" "1"
  ;;
update-files)
  UpdateFiles "${HARVEST_PREFIX}"
  ;;
esac

DeleteLockFile "${LOCK_FILE}"
--- harvest_df_manager ---
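A closing note on CreateLockFile(), which both appendix scripts use to serialize themselves: it relies on ln(1) refusing to link over an existing file, which acts as an atomic test-and-set on a local filesystem. The core of the technique in isolation (a sketch only; the real function also detects and reaps stale or corrupt locks):

```shell
#!/bin/sh
# Atomic lock acquisition via hard links (illustration only).
lock=/tmp/lock-demo.$$
temp=${lock}.tmp
echo $$ > "${temp}"

# First attempt succeeds because the lock does not exist yet.
ln "${temp}" "${lock}" > /dev/null 2>&1 && echo "first: acquired"

# Second attempt fails because ln(1) will not overwrite the lock.
ln "${temp}" "${lock}" > /dev/null 2>&1 || echo "second: busy"

rm -f "${temp}" "${lock}"
```

Because the link either exists or it doesn't, two processes racing to create it cannot both win, which is why the scripts prefer this over a simple "test then create" sequence.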