Deploying a Slurm installation from a job in a Slurm-managed cluster.

I am about to carry out a set of experiments related to workload processing using Slurm. For this purpose, I am using the Marenostrum IV supercomputer (www.bsc.es). However, its resources and jobs are already managed by Slurm, so my aim is to submit jobs that deploy my own Slurm set-up on the nodes that Marenostrum IV's Slurm assigns to my job.
For the sake of clarity, let's use an alias for each Slurm: I'll call Marenostrum's Slurm MNSlurm, and my own set-up SSlurm.

Jobs are submitted to MNSlurm with:

$ sbatch script.sh

So, my script.sh must take care of everything.
We are assuming that SSlurm is already installed.
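
For reference, a user-space build of Slurm typically follows the standard autotools workflow; the sketch below assumes the sources are already unpacked, and the paths simply match the ones used throughout this post:

# Minimal sketch of a user-space Slurm build (source path is an assumption)
cd $HOME/apps/src/slurm
./configure --prefix=$HOME/apps/install/slurm \
            --sysconfdir=$HOME/slurm-confdir   # so the daemons look for slurm.conf there
make -j && make install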

As MNSlurm will assign different nodes to each job, we must generate SSlurm's configuration file on the fly. However, we only need to add the information about the nodes; the rest of the configuration can be taken from a template file.
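
A minimal sketch of what slurm.conf.template might contain is shown below; the parameter values (ports, paths, plugins) are assumptions and depend on the actual installation. Note that it intentionally omits the ControlMachine, NodeName and PartitionName lines, which script.sh appends:

# slurm.conf.template (sketch; values are assumptions)
ClusterName=sslurm
SlurmUser=myuser
AuthType=auth/munge
SlurmctldPort=7008
SlurmdPort=7009
StateSaveLocation=/tmp/sslurm-state
SlurmdSpoolDir=/tmp/sslurm-spool
ProctrackType=proctrack/pgid
SelectType=select/cons_res
SelectTypeParameters=CR_Core
SchedulerType=sched/backfill
# ControlMachine, NodeName and PartitionName entries are appended by script.sh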

MNSlurm job

In the following, I'll explain my script step by step.

#!/bin/bash
#SBATCH --time=02:00:00
#SBATCH --nodes=8
#SBATCH --exclusive
# Paths of the SSlurm installation and of its configuration directory
SLURM_BIN=$HOME/apps/install/slurm/bin
SLURM_SBIN=$HOME/apps/install/slurm/sbin
SLURM_CONF_DIR=$HOME/slurm-confdir
SLURM_CONF_FILE=$SLURM_CONF_DIR/slurm.conf

# Start from the template; node and partition lines will be appended below
cp $SLURM_CONF_DIR/slurm.conf.template $SLURM_CONF_FILE

In this first excerpt, we set up the MNSlurm job, define the paths of the SSlurm installation, and finally copy the configuration template to a configuration file that is ready to be filled out. (The SSlurm daemons are assumed to find this slurm.conf, for instance through the sysconfdir they were built with or through the SLURM_CONF environment variable.)


nodes=""
NODELIST="$(scontrol show hostname $SLURM_JOB_NODELIST | paste -d -s)"
for node in $NODELIST; do
if [ "$node" == "$(hostname)" ]; then
echo "ControlMachine=$(hostname)" >> $SLURM_CONF_FILE
else
echo "NodeName=$node CPUs=48 CoresPerSocket=24 ThreadsPerCore=1 State=Idle Port=7009" >> $SLURM_CONF_FILE
nodes="$node,$nodes"
fi
done;
echo "PartitionName=test Nodes=$(echo $nodes | sed 's/.$//') Default=YES MaxTime=INFINITE State=UP" >> $SLURM_CONF_FILE

Then, we get the node list from the MNSlurm job and start adding lines to the configuration file. With the last line, we define the SSlurm partition, which contains all the nodes except one (I've decided to keep one node aside for management, i.e. to run the controller only).
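
For illustration, assuming a hypothetical four-node allocation with hostnames node01 to node04, where node01 is the one running the script, the fragment appended to slurm.conf would look like this:

# Fragment appended by the loop above (hostnames are hypothetical)
ControlMachine=node01
NodeName=node02 CPUs=48 CoresPerSocket=24 ThreadsPerCore=1 State=Idle Port=7009
NodeName=node03 CPUs=48 CoresPerSocket=24 ThreadsPerCore=1 State=Idle Port=7009
NodeName=node04 CPUs=48 CoresPerSocket=24 ThreadsPerCore=1 State=Idle Port=7009
PartitionName=test Nodes=node04,node03,node02 Default=YES MaxTime=INFINITE State=UP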


NNODES=$(($SLURM_NNODES-1))
NODELIST="$(scontrol show hostname $SLURM_JOB_NODELIST | paste -d, -s)"
#mpiexec -n $NNODES --hosts=$NODELIST hostname
# Start the SSlurm controller on this node and the slurmd daemons via mpiexec
$SLURM_SBIN/slurmctld -cDv &
mpiexec -n $NNODES --hosts $NODELIST $SLURM_SBIN/slurmd -cDv &

Once configured, we have to launch the SSlurm daemons: slurmctld on the current node (the controller) and the slurmd daemons on the compute nodes. Notice that we have not added the SSlurm installation to $PATH, so we must use the full path when executing the SSlurm binaries.
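
Before submitting anything, it can be useful to check that the daemons are up and the nodes have registered; a possible (optional) check, assuming a few seconds are enough for registration:

# Optional sanity check (the 10-second wait is an assumption)
sleep 10
$SLURM_BIN/sinfo
$SLURM_BIN/scontrol show nodes | grep -E "NodeName|State"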


$SLURM_BIN/sbatch -N$NNODES ./job.sh &

The next step is to submit jobs to SSlurm (remember that this script runs under MNSlurm). With that command, I'm asking SSlurm for all the nodes (except the controller) in order to execute job.sh (explained later).
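
Since the point of the exercise is workload processing, the same mechanism can be used to submit a whole batch of smaller jobs instead of a single large one; a possible sketch, where the job count and sizes are assumptions:

# Sketch: submit a workload of smaller jobs to SSlurm (count and size are assumptions)
for i in $(seq 1 20); do
    $SLURM_BIN/sbatch -N2 ./job.sh
done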


# Poll the SSlurm queue until it is empty (squeue always prints a header line)
aux=$( $SLURM_BIN/squeue | wc -l )
while [ $aux -gt 1 ]; do
    aux=$( $SLURM_BIN/squeue | wc -l )
    echo "$(($aux-1)) jobs remaining..."
    sleep 10
done
echo "The End!"

Of course, we cannot exit the MNSlurm script without waiting until the whole workload in SSlurm has been processed. That is why, in this excerpt, we periodically check the SSlurm queue until it has been drained.
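
Optionally, once the queue is drained, the SSlurm daemons can be shut down cleanly before the MNSlurm job ends; a minimal sketch, assuming we prefer an orderly termination over letting MNSlurm kill the processes:

# Optional cleanup: ask the SSlurm controller to terminate itself and all slurmd daemons
$SLURM_BIN/scontrol shutdown
wait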

SSlurm job

This script is submitted to SSlurm through sbatch, and sbatch relies on srun to launch the applications, so we have to prepend the SSlurm installation's bin directory to $PATH.


#!/bin/bash
export PATH=$HOME/apps/install/slurm/bin:$PATH
NODELIST="$(scontrol show hostname $SLURM_JOB_NODELIST | paste -d, -s)"
mpiexec -n $SLURM_JOB_NUM_NODES -hosts $NODELIST ./exec

As shown in the code, we can use the environment variables set by the SSlurm job to launch our application.
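
For reference, these are some of the variables SSlurm sets inside job.sh; the values in the comments are just illustrative:

# Some SSlurm-provided variables usable inside job.sh (values are illustrative)
echo "$SLURM_JOB_ID"           # e.g. 1
echo "$SLURM_JOB_NUM_NODES"    # e.g. 7
echo "$SLURM_JOB_NODELIST"     # e.g. node[02-08]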
