Sometimes we need to launch multiple processes in parallel and wait for all of them to complete.
For example, in Maven we may need to run multiple profiles (e.g. test profiles) in parallel and wait for them to complete.
The logic is as follows:
- Launch all the parallel processes.
- Capture the PID of each parallel process (using captured_pid=$!).
- Loop until a predefined maximum monitoring time is reached. In each iteration:
  - check whether each PID is still alive, for example with ps -ef | grep;
  - continue until all the PIDs in the parallel execution have completed.
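The steps above can be sketched in bash as follows. This is a minimal, self-contained sketch rather than the post's launcher. It makes three assumptions. First, `sleep` commands stand in for the mvn processes. Second, POLL_INTERVAL and MAX_CHECKS are illustrative names, set to small values so the sketch finishes in a few seconds. Third, `kill -0` is used as the liveness check instead of `ps -ef | grep`. `kill -0` tests one exact PID but, like ps, it is still fooled if that PID gets reused.

```shell
#!/bin/bash
# Sketch of the launch-and-poll logic, with stand-in jobs instead of mvn.
# POLL_INTERVAL, MAX_CHECKS and the `sleep N` jobs are illustrative assumptions.
POLL_INTERVAL=1   # seconds between checks (the post uses 60)
MAX_CHECKS=10     # stop monitoring after this many checks (the post uses 30)

# 1. Launch all the parallel processes and capture each PID from $!
pids=()
for secs in 1 2 3; do
    sleep "$secs" &
    pids+=("$!")
done

# 2. Poll until every PID has exited or MAX_CHECKS is reached.
#    kill -0 sends no signal; it only tests whether the PID exists, so (like ps)
#    it cannot tell a finished job from a new process that has reused its PID.
still_running=("${pids[@]}")
check=0
while (( ${#still_running[@]} > 0 && check < MAX_CHECKS )); do
    sleep "$POLL_INTERVAL"
    check=$((check + 1))
    remaining=()
    for pid in "${still_running[@]}"; do
        if kill -0 "$pid" 2>/dev/null; then
            remaining+=("$pid")
        else
            echo "PID $pid finished within $((check * POLL_INTERVAL)) seconds"
        fi
    done
    still_running=("${remaining[@]}")
done

# 3. Report the PIDs that were still running when monitoring stopped
for pid in "${still_running[@]}"; do
    echo "PID $pid still running after $((MAX_CHECKS * POLL_INTERVAL)) seconds"
done
```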
A launcher that follows this logic monitors the parallel mvn test -P commands for up to 30 minutes, checking once a minute. It then terminates and lists the PIDs that are still running.
But this approach has a problem. Say three mvn commands are running in parallel with PIDs 123, 234 and 345, and PID 123 completes within a minute. Unix may then reassign PID 123 to some other process.
Unfortunately, if that process is a long-running daemon, the monitoring script keeps seeing PID 123 as alive. It then runs for the full 30 minutes, even if the processes with PIDs 234 and 345 complete within, say, 10 minutes.
To avoid this pitfall, I changed the logic. Every parallel process creates a flat file named after its own PID (the PID the monitoring script captures with $!) and deletes that file when it finishes. The monitoring loop then checks whether the file still exists. This way, PIDs that complete within a single monitoring interval (one minute) are still detected, and the monitoring script is not fooled if such a PID is later reassigned to another process.
To do this, we write a wrapper script, ./mvnWrapper.sh. It takes its arguments from the main monitoring script, creates the flat file, runs mvn and then deletes the file.
For reference, here is the first, ps-based version of the launcher. It runs four test profiles in parallel and stops monitoring after 30 minutes:
#!/bin/bash
# First version: detects completion with ps (bash is needed for arrays and for ((...)) loops)
semantic_no=1
for i in ratingsBAT pollsBAT mbBAT abuseBAT
do
    echo "Launcher: Launching BAT for semantic : $i started at $(date)"
    mvn -Dmaven.test.apiHost.value="$1" -Dmaven.test.apiPort.value="$2" test -P "$i" &
    pidlist[$semantic_no]=$!          # PID of the background mvn process
    semanticlist[$semantic_no]=$i
    pidrunning[$semantic_no]=1
    semantic_no=$((semantic_no + 1))
done
# Check once a minute, for at most 30 minutes
for ((i1 = 0; i1 < 30; i1++))
do
    sleep 60
    for ((j = 1; j < semantic_no; j++))
    do
        if [ "${pidrunning[$j]}" == 1 ]
        then
            # ps -p matches the exact PID; ps -ef | grep 123 would also match PID 1234
            if ! ps -p "${pidlist[$j]}" > /dev/null 2>&1
            then
                # i1 + 1 intervals of 60 seconds have passed at this check
                seconds_for_exec=$(( (i1 + 1) * 60 ))
                echo "Launcher: Process with PID : ${pidlist[$j]} completed execution within $seconds_for_exec seconds -- TIME of COMPLETION : $(date)"
                pidrunning[$j]=0
            fi
        fi
    done
    # completed is set to 1 if any process is still running
    completed=0
    for ((j = 1; j < semantic_no; j++))
    do
        if [ "${pidrunning[$j]}" == 1 ]
        then
            echo "Launcher: PID : ${pidlist[$j]} of semantic ${semanticlist[$j]} is still running ::::"
            completed=1
        fi
    done
    if [ "$completed" == 0 ]
    then
        echo "Completed all the Tests.. TIME : $(date)"
        exit 0
    fi
done
for ((j = 1; j < semantic_no; j++))
do
    if [ "${pidrunning[$j]}" == 1 ]
    then
        echo "Launcher: PID : ${pidlist[$j]} of semantic ${semanticlist[$j]} did not complete at TIME $(date)"
    fi
done
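Because the launcher starts the mvn processes itself, they are children of the launcher shell. An alternative that the post does not use is bash's `wait` builtin. `wait PID` accepts only children of the current shell and returns that child's exit status, even if the child finished before `wait` was called, so a reused PID cannot fool it. A minimal sketch follows; `run_profile` is a hypothetical stand-in for the mvn command, and the profile named "failing" simulates a test failure:

```shell
#!/bin/bash
# Alternative sketch (not the method used in the post): wait on each child directly.
# run_profile is a hypothetical stand-in for: mvn ... test -P "$profile"
run_profile() {
    sleep 1
    # Pretend the profile named "failing" has a test failure (exit status 1)
    [ "$1" != "failing" ]
}

profiles=(profile1 profile2 failing)
pids=()
for profile in "${profiles[@]}"; do
    run_profile "$profile" &
    pids+=("$!")
done

statuses=()
for idx in "${!pids[@]}"; do
    # wait only accepts children of this shell and returns the child's exit status,
    # even if the child exited before wait was called, so PID reuse cannot fool it
    wait "${pids[$idx]}"
    statuses[$idx]=$?
    echo "${profiles[$idx]} (PID ${pids[$idx]}) exited with status ${statuses[$idx]}"
done
```

`wait` has no timeout of its own. If a cap like the post's 30 minutes is required, one option is to start each command under GNU coreutils `timeout` (for example `timeout 30m mvn ...`), which exits with status 124 when the limit is hit. Unlike the post's launcher, this kills the command rather than merely stopping the monitoring.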
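With the marker-file approach, the wrapper and the main monitoring script look like this:
#---- Contents of mvnWrapper.sh -----
#!/bin/bash
# The flat file is named after this script's own PID ($$), which is the PID the
# main script captured with $!. It is created in the current directory.
currentPID=$$
echo "Creating flat file with $currentPID"
touch "$currentPID"
echo "mvn $*"
mvn "$@"        # "$@" passes each argument through intact; $* would re-split them
echo "Deleting flat file with $currentPID"
rm "$currentPID"
#---- End of mvnWrapper.sh -----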
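The main script looks for each flat file under the $! it captured, so it relies on $$ inside mvnWrapper.sh being the same number. This should hold because starting a background command forks a single child, and exec-ing the script keeps that child's PID. The sketch below checks this assumption on a given system. A throwaway script stands in for mvnWrapper.sh, and it is run through bash explicitly in case the temp directory is mounted noexec:

```shell
#!/bin/bash
# Check the assumption mvnWrapper.sh relies on: the $$ a background script sees
# equals the $! that the launching shell captured.
wrapper=$(mktemp)
out=$(mktemp)
printf '#!/bin/bash\necho "$$"\n' > "$wrapper"   # throwaway stand-in for mvnWrapper.sh

bash "$wrapper" > "$out" &
launcher_view=$!
wait "$launcher_view"
wrapper_view=$(cat "$out")

echo "launcher saw \$! = $launcher_view, wrapper saw \$\$ = $wrapper_view"
rm -f "$wrapper" "$out"
```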
#!/bin/bash
# Main Monitoring script (bash is needed for arrays and for ((...)) loops)
semantic_no=1    # array indices start at 1; must be initialised in this script
profileToRun="profile1 profile2"
for i in $profileToRun
do
    echo "Launcher: Launching BAT for semantic : $i started at $(date)"
    # $MINUS_D_OPTIONS is left unquoted so that several -D options split into separate arguments
    ./mvnWrapper.sh -f pom.xml -Dmaven.test.apiHost.value="$1" -Dmaven.test.apiPort.value="$2" $MINUS_D_OPTIONS test -P "$i" &
    captured_pid=$!    # same as $$ inside mvnWrapper.sh, i.e. the name of its flat file
    pidlist[$semantic_no]=$captured_pid
    echo "Captured PID : $captured_pid"
    semanticlist[$semantic_no]=$i
    pidrunning[$semantic_no]=1
    semantic_no=$((semantic_no + 1))
done
# Monitor all the PIDs in the array once a minute, for at most 90 minutes
for ((i1 = 0; i1 < 90; i1++))
do
    sleep 60
    echo "Checking PID status for $((i1 + 1)) time(s).."
    for ((j = 1; j < semantic_no; j++))
    do
        if [ "${pidrunning[$j]}" == 1 ]
        then
            fileName=${pidlist[$j]}
            echo "FileName checked is $fileName"
            # mvnWrapper.sh deletes its flat file when mvn exits, so a missing file means completion
            if [ ! -f "$fileName" ]
            then
                # i1 + 1 intervals of 60 seconds have passed at this check
                seconds_for_exec=$(( (i1 + 1) * 60 ))
                echo "Launcher: Process with PID : ${pidlist[$j]} completed execution within $seconds_for_exec seconds -- TIME of COMPLETION : $(date)"
                pidrunning[$j]=0
            fi
        fi
    done
    # completed is set to 1 if any process is still running
    completed=0
    for ((j = 1; j < semantic_no; j++))
    do
        if [ "${pidrunning[$j]}" == 1 ]
        then
            echo "Launcher: PID : ${pidlist[$j]} of semantic ${semanticlist[$j]} is still running ::::"
            completed=1
        fi
    done
    if [ "$completed" == 0 ]
    then
        echo "Launcher: Completed all the Tests.. TIME : $(date)"
        exit 0
    fi
done
for ((j = 1; j < semantic_no; j++))
do
    if [ "${pidrunning[$j]}" == 1 ]
    then
        echo "Launcher: PID : ${pidlist[$j]} of semantic ${semanticlist[$j]} did not complete at TIME $(date)"
    fi
done
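For experimenting without Maven, the wrapper and the monitoring loop can be condensed into one self-contained sketch. This illustrates the idea under several assumptions and is not the post's scripts:
- `run_with_marker` plays the role of mvnWrapper.sh.
- The flat files go into a temporary directory.
- `sleep` jobs replace mvn.
- The interval is one second.

Because the wrapper here is a function running in a background subshell, it names its file with `$BASHPID` (bash 4 or later); `$$` would give the parent shell's PID.

```shell
#!/bin/bash
# Condensed, Maven-free sketch of the marker-file technique.
# run_with_marker, marker_dir and the `sleep N` jobs are illustrative stand-ins
# for mvnWrapper.sh, the working directory and the mvn commands.
POLL_INTERVAL=1
MAX_CHECKS=10
marker_dir=$(mktemp -d)

# Plays the role of mvnWrapper.sh: create a file named after this process's PID,
# run the command, then delete the file.
run_with_marker() {
    # $BASHPID is the PID of this background subshell; $$ would be the parent's PID here
    local marker="$marker_dir/$BASHPID"
    touch "$marker"
    "$@"
    rm -f "$marker"
}

pids=()
for secs in 1 2; do
    run_with_marker sleep "$secs" &
    pids+=("$!")
done

running=("${pids[@]}")
check=0
while (( ${#running[@]} > 0 && check < MAX_CHECKS )); do
    sleep "$POLL_INTERVAL"    # first check after one interval, by which time the files normally exist
    check=$((check + 1))
    remaining=()
    for pid in "${running[@]}"; do
        # Only the file decides: a finished job's file is gone even if its PID is reused
        if [ -f "$marker_dir/$pid" ]; then
            remaining+=("$pid")
        else
            echo "PID $pid finished within $((check * POLL_INTERVAL)) seconds"
        fi
    done
    running=("${remaining[@]}")
done

for pid in "${running[@]}"; do
    echo "PID $pid did not finish within $((MAX_CHECKS * POLL_INTERVAL)) seconds"
done
rm -rf "$marker_dir"
```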