Wednesday, April 21, 2010

Parallel execution of commands in Bash

This script runs any command or shell script on several arguments in parallel, so the whole batch finishes faster. Say you have a program called prog that processes some jpg files. Then you run

parallel -j 3 prog *.jpg

Here up to 3 instances of prog run simultaneously, each on one jpg file. Or use
parallel -j 3 "prog -r -A=40" *.jpg
to pass args to prog.
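
The quoted form works because the script expands the command string unquoted, so bash splits it into separate words before running it. A minimal sketch of that step, where show_args is a hypothetical stand-in for prog and not part of the script:

```shell
#!/bin/bash
# show_args is a hypothetical stand-in for prog: it prints each argument in brackets.
show_args() { printf '[%s]' "$@"; echo; }

COMMAND="show_args -r -A=40"
for INS in a.jpg b.jpg; do
  CMD="$COMMAND $INS"   # the same string the script builds when -r is not given
  $CMD                  # unquoted on purpose: bash splits the string into words
done
```

The same splitting applies to the file names, so names containing spaces would be broken into several arguments.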

Furthermore, -r allows more sophisticated commands by replacing every asterisk in the command string with the current argument:
parallel -j 6 -r "convert -scale 50% * small/small_*" *.jpg
That is, for file1.jpg this executes convert -scale 50% file1.jpg small/small_file1.jpg, and likewise for every other jpg file. This is a real-life example for scaling down images by 50% (requires ImageMagick and an existing small/ directory).
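
The substitution itself is a single bash parameter expansion. A minimal sketch of it in isolation, assuming the same quoting the script uses; build_cmd is a hypothetical helper name that does not appear in the script:

```shell
#!/bin/bash
# build_cmd TEMPLATE ARG: replace every literal * in TEMPLATE with ARG.
# build_cmd is a hypothetical name used only for this illustration.
build_cmd() {
  local template=$1 arg=$2
  # Quoting "*" makes bash match a literal asterisk instead of "any string".
  local result=${template//"*"/$arg}
  printf '%s\n' "$result"
}

build_cmd 'convert -scale 50% * small/small_*' 'file1.jpg'
# prints: convert -scale 50% file1.jpg small/small_file1.jpg
```

Note that the command string must be quoted on the command line, otherwise the shell expands the asterisks before the script ever sees them.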
The script can also be easily adapted to handle different jobs: just write your command between # DEFINE COMMAND and # DEFINE COMMAND END.

Here is the parallel program:
#!/bin/bash
NUM=0
QUEUE=""
MAX_NPROC=2 # default
REPLACE_CMD=0 # no replacement by default
USAGE="A simple wrapper for running processes in parallel.
Usage: $(basename "$0") [-h] [-r] [-j nb_jobs] command arg_list
  -h  Shows this help
  -r  Replace asterisk * in the command string with argument
  -j nb_jobs  Set number of simultaneous jobs [2]
 Examples:
  $(basename "$0") somecommand arg1 arg2 arg3
  $(basename "$0") -j 3 \"somecommand -r -p\" arg1 arg2 arg3
  $(basename "$0") -j 6 -r \"convert -scale 50% * small/small_*\" *.jpg"

function queue {
 QUEUE="$QUEUE $1"
 NUM=$(($NUM+1))
}

function regeneratequeue {
 OLDREQUEUE=$QUEUE
 QUEUE=""
 NUM=0
 for PID in $OLDREQUEUE
 do
  if [ -d /proc/$PID  ] ; then
   QUEUE="$QUEUE $PID"
   NUM=$(($NUM+1))
  fi
 done
}

function checkqueue {
 OLDCHQUEUE=$QUEUE
 for PID in $OLDCHQUEUE
 do
  if [ ! -d /proc/$PID ] ; then
   regeneratequeue # at least one PID has finished
   break
  fi
 done
}

# parse command line
if [ $# -eq 0 ]; then #  must be at least one arg
 echo "$USAGE" >&2
 exit 1
fi

while getopts j:rh OPT; do # "j:" expects an argument; "r" and "h" do not
 case $OPT in
  h) echo "$USAGE"
   exit 0 ;;
  j) MAX_NPROC=$OPTARG ;;
  r) REPLACE_CMD=1 ;;
  \?) # getopts has already printed an error message
   echo "$USAGE" >&2
   exit 1 ;;
 esac
done

# Main program
echo "Using $MAX_NPROC parallel processes"
shift $((OPTIND - 1)) # shift input args, ignore processed args
COMMAND=$1
shift

for INS in "$@" # for the rest of the arguments
do
 # DEFINE COMMAND
 if [ $REPLACE_CMD -eq 1 ]; then
  CMD=${COMMAND//"*"/$INS}
 else
  CMD="$COMMAND $INS" #append args
 fi
 echo "Running $CMD" 

 $CMD &

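 # To use redirection or command substitution inside the command string,
 # change the line above from:
 #  $CMD &
 # to:
 #  eval "$CMD &"
 # This allows things like:
 #  par.sh -r 'tr -d " " < * > $(basename * .txt)-stripped.txt' *.txt
 # Without the eval, > and $(basename…) are passed to tr as literal arguments.
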
 # DEFINE COMMAND END

 PID=$!
 queue $PID

 while [ $NUM -ge $MAX_NPROC ]; do
  checkqueue
  sleep 0.4
 done
done
wait # wait for all processes to finish before exit
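
The script detects finished jobs by polling /proc/$PID, which exists on Linux but not on every Unix-like system. As a possible alternative, bash 4.3 and later provide wait -n, which blocks until any one background job exits. The following is only a minimal sketch under that assumption, not a drop-in replacement: run_limited is a hypothetical name, and unlike the script it takes a plain command name rather than a command string with options.

```shell
#!/bin/bash
# run_limited MAX CMD ARG...: run CMD once per ARG, at most MAX at a time.
# Hypothetical helper; assumes bash 4.3 or later for "wait -n".
run_limited() {
  local max=$1 cmd=$2
  shift 2
  local running=0 arg
  for arg in "$@"; do
    if [ "$running" -ge "$max" ]; then
      wait -n                 # block until any one background job exits
      running=$((running - 1))
    fi
    "$cmd" "$arg" &           # quoted, so arguments with spaces stay intact
    running=$((running + 1))
  done
  wait                        # wait for the remaining jobs
}

run_limited 2 echo one two three
```

Because the jobs run concurrently, the order of their output is not guaranteed.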

Source is at 
http://pebblesinthesand.wordpress.com/2008/05/22/a-srcipt-for-running-processes-in-parallel-in-bash/

1 comment:

Ole Tange said...

Or you could install GNU Parallel:

parallel -j 6 convert -scale 50% {} small/small_{} ::: *.jpg

One of the advantages is the {.} construct which removes the extension. Without -j it will run one job per CPU core:

parallel convert -scale 50% {} {.}.png ::: *.jpg
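
For the plain bash script above, the effect of {.} can be approximated with the ${name%.*} parameter expansion, which removes the shortest suffix that starts with a dot. A minimal sketch; strip_ext is a hypothetical helper name used only for illustration:

```shell
#!/bin/bash
# strip_ext NAME: print NAME without its last extension.
# Hypothetical helper, not part of the script or of GNU Parallel.
strip_ext() {
  local name=$1
  printf '%s\n' "${name%.*}"
}

strip_ext photo.jpg        # prints: photo
strip_ext archive.tar.gz   # prints: archive.tar
```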

Watch the intro video to learn more: http://www.youtube.com/watch?v=OpaiGYxkSuQ