
Deploying a Slurm installation from a job in a Slurm-managed cluster

I am about to carry out a set of experiments related to workload processing using Slurm. For this reason, I am using the MareNostrum IV supercomputer. However, its resources and jobs are already managed by Slurm, so I aim to submit a job that deploys my own Slurm set-up on the nodes that MareNostrum IV's Slurm has assigned to it.
For the sake of clarity, let's use an alias for each Slurm: I'll call MareNostrum's Slurm MNSlurm, and my own set-up SSlurm.

Jobs are submitted to MNSlurm with:

$ sbatch

So, my job script must take care of everything.
We are assuming that SSlurm is already installed.

As MNSlurm will assign different nodes to each job, we must generate the SSlurm's configuration file on-the-fly. Nevertheless, we only need to add information about nodes, the rest of the configuration can be taken from a template file.
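For reference, the template only needs the static part of the configuration; everything node-specific is appended on the fly. A minimal slurm.conf.template could look like the sketch below (the cluster name, paths and ports are illustrative, not taken from my actual set-up; only SlurmdPort must match the Port used in the generated NodeName lines):

```
# slurm.conf.template -- static part of the SSlurm configuration (illustrative values)
ClusterName=sslurm
SlurmctldPort=7008
SlurmdPort=7009
StateSaveLocation=/tmp/sslurm/state
SlurmdSpoolDir=/tmp/sslurm/spool
SchedulerType=sched/backfill
SelectType=select/linear
# ControlMachine, NodeName and PartitionName lines are appended by the job script
```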

MNSlurm job

In the following, I'll explain my script step by step.

#SBATCH --time=02:00:00
#SBATCH --nodes=8
#SBATCH --exclusive
# SSlurm paths (adjust to your installation)
export SLURM_CONF_DIR=$HOME/apps/install/slurm/etc
export SLURM_CONF_FILE=$SLURM_CONF_DIR/slurm.conf
export SLURM_BIN=$HOME/apps/install/slurm/bin
export SLURM_SBIN=$HOME/apps/install/slurm/sbin
cp $SLURM_CONF_DIR/slurm.conf.template $SLURM_CONF_FILE

In this first excerpt, we set up the MNSlurm job, define the paths of the SSlurm installation and, finally, copy the configuration template as a configuration file ready to be completed.

NODELIST="$(scontrol show hostname $SLURM_JOB_NODELIST | paste -s -d " ")"
nodes=""
for node in $NODELIST; do
    if [ "$node" == "$(hostname)" ]; then
        echo "ControlMachine=$(hostname)" >> $SLURM_CONF_FILE
    else
        nodes="$nodes$node,"
    fi
    echo "NodeName=$node CPUs=48 CoresPerSocket=24 ThreadsPerCore=1 State=Idle Port=7009" >> $SLURM_CONF_FILE
done
echo "PartitionName=test Nodes=$(echo $nodes | sed 's/.$//') Default=YES MaxTime=INFINITE State=UP" >> $SLURM_CONF_FILE

Then, we get the node list from the MNSlurm job and start adding lines to the configuration file: the node running the script becomes the SSlurm controller (ControlMachine), and each node gets its own NodeName entry. With the last line, we define the SSlurm partition containing all the nodes except one, since I've decided to keep one extra node for management; the sed call simply strips the trailing comma from the accumulated node list.
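The sed substitution in the PartitionName line only removes the trailing comma left over by the accumulation. A quick standalone check, with made-up node names:

```shell
# Hypothetical node list, as accumulated one node at a time with "$nodes$node,"
nodes="node01,node02,node03,"
# 's/.$//' deletes the last character of the line, i.e. the trailing comma
echo "$nodes" | sed 's/.$//'
# -> node01,node02,node03
```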

NODELIST="$(scontrol show hostname $SLURM_JOB_NODELIST | paste -d, -s)"
NNODES=$SLURM_JOB_NUM_NODES
#mpiexec -n $NNODES --hosts=$NODELIST hostname
$SLURM_SBIN/slurmctld -cDv &
mpiexec -n $NNODES --hosts $NODELIST $SLURM_SBIN/slurmd -cDv &

Once configured, we have to launch the SSlurm daemons: slurmctld on the controller node and, through mpiexec, one slurmd per node. Notice that we have not added the path of the SSlurm installation to $PATH, so we must use the full path when executing the SSlurm binaries.

$SLURM_BIN/sbatch -N$NNODES ./ &

The next step is to submit jobs to SSlurm (remember that this script is for MNSlurm). With that command, I'm asking SSlurm for all the nodes (except the controller) in order to execute the SSlurm job script (explained later).

aux=$( $SLURM_BIN/squeue | wc -l )
while [ $aux -gt 1 ]; do
    aux=$( $SLURM_BIN/squeue | wc -l )
    echo "$aux jobs remaining..."
    sleep 10
done
echo "The End!"

Of course, we cannot exit from the MNSlurm script before the whole workload in SSlurm has been processed. That is why, in this excerpt, we check the SSlurm queue periodically until it has been drained: squeue always prints a header line, so more than one line of output means there are still jobs queued or running.

SSlurm job

This script is submitted to SSlurm through sbatch, and sbatch uses srun to launch the applications, so we have to add the path of the SSlurm installation to $PATH.

export PATH=$HOME/apps/install/slurm/bin:$PATH
NODELIST="$(scontrol show hostname $SLURM_JOB_NODELIST | paste -d, -s)"
mpiexec -n $SLURM_JOB_NUM_NODES -hosts $NODELIST ./exec

As shown in the code, we can use the environment variables of the SSlurm job to execute our application.