6. Integration with Slurm
Preview Release
This is an early release of the Slurm plugin for the IPU. As such the software is subject to change without notice.
The Slurm plugin is available on request from Graphcore support.
This section describes the integration of the V-IPU with Slurm. Slurm is a popular open-source cluster management and job scheduling system. The integration is provided through a custom V-IPU resource selection plugin for Slurm systems.
For more details about Slurm and its architecture, please refer to the Slurm website.
A Slurm plugin is a dynamically linked code object providing customized implementation of well-defined Slurm APIs. Slurm plugins are loaded at runtime by the Slurm libraries and the customized API callbacks are called at appropriate stages.
Resource selection plugins are a type of Slurm plugin that implements the Slurm resource/node selection APIs. These APIs provide rich interfaces for customized selection of nodes for jobs, for performing any tasks needed to prepare job runs (such as partition creation in our case), and for appropriate clean-up at job termination (such as partition deletion in our case).
6.1. Configuring Slurm to use V-IPU select plugin
Note
This document assumes that you have access to pre-compiled Slurm binaries with the V-IPU plugin support or you have already patched and recompiled your Slurm installation with the V-IPU support.
To enable V-IPU resource selection in Slurm, you need to set SelectType to select/vipu in the Slurm configuration. The V-IPU Slurm plugin is a layered plugin, which means it can enable V-IPU support for existing resource selection plugins. Options pertaining to the selected secondary resource selection plugin can be specified under SelectTypeParameters.
You must also set PropagateResourceLimitsExcept to MEMLOCK. This prevents host memory limits from being propagated to the job, which could cause failures when initialising the IPU.
The following is an example of the Slurm configuration enabling the V-IPU resource selection plugin layered on top of a consumable resource allocation plugin (select/other_cons_tres) with the CPU as a consumable resource:
SelectType=select/vipu
SelectTypeParameters=other_cons_tres,CR_CPU
PropagateResourceLimitsExcept=MEMLOCK
For the SelectTypeParameters supported by each of the existing resource selection plugins, refer to the Slurm documentation.
6.2. Configuration parameters
Configuration parameters for the V-IPU resource selection plugin are set in separate configuration files that must be stored in the same directory as slurm.conf. The default configuration file is named vipu.conf. In addition, administrators can configure additional GRES models for the V-IPU, representing different V-IPU clusters. The configuration file for an additional GRES model is named after the model: for instance, a GRES model pod1 needs a corresponding configuration file named pod1.conf in the Slurm configuration directory.
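As a sketch, assuming the Slurm configuration directory is /etc/slurm (adjust for your installation) and one additional GRES model pod1 is defined, the directory would contain:

/etc/slurm/slurm.conf
/etc/slurm/vipu.conf
/etc/slurm/pod1.conf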
The following configuration options are supported:
ApiHost: The host name or IP address of the V-IPU controller.
ApiPort: The port number of the V-IPU controller. The default value is 8090.
IpuofDir: The directory where IPUoF configuration files for user jobs will be stored.
MaxIpusPerJob: The maximum number of IPUs allowed per job. This should not exceed the size of the Pod. The default value is 256.
ApiTimeout: The timeout in seconds for the V-IPU client. The default value is 50.
ForceDeletePartition: Set to 1 to force deletion of the partition in case of failures. The default value is 0.
UseReconfigPartition: Set to 1 to specify that reconfigurable partitions should be created. The default value is 0.
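As an illustration, a vipu.conf that sets every supported option might look like the following; the host name and directory are assumptions, so substitute values for your installation:

ApiHost=vipu-ctrl.example.com
ApiPort=8090
IpuofDir=/home/ipuof
MaxIpusPerJob=64
ApiTimeout=50
ForceDeletePartition=0
UseReconfigPartition=0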
In addition, slurm.conf should contain the following configuration option to allow sharing of the IPUoF configuration files needed by the Graphcore Poplar SDK:
VipuIpuofDir: Path to shared storage location writable by scheduler, and readable by all nodes and user accounts.
6.3. The V-IPU GRES plugin
To enable the V-IPU GRES plugin, add vipu to the list of GRES types defined for the Slurm cluster:
GresTypes=vipu
In addition, for each node that can access a V-IPU resource, the following node GRES configuration must be added:
Gres=vipu:<GRES_MODEL>:no_consume:<max partition size>
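For example, for a node that can access a GRES model named pod64 with a maximum partition size of 64 IPUs (both values are illustrative), the node definition would include:

Gres=vipu:pod64:no_consume:64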
6.4. An example Slurm Controller configuration
Note
Note that the following settings override or take precedence over any values configured in your existing slurm.conf configuration file.
In the following, we outline an example of using the V-IPU plugin to configure a Slurm cluster containing a single IPU-POD64, with four compute nodes that have shared access to a directory /home/ipuof. The GRES model is named pod64 and a V-IPU Controller is running on the first node, using the default port without mTLS. Node names are assumed to be ipu-pod64-001 through ipu-pod64-004.
At the end of slurm.conf, add the following line:

Include v-ipu-plugin.conf

Create a file called v-ipu-plugin.conf in the same directory as slurm.conf containing the following parameters:

SelectType=select/vipu
SelectTypeParameters=other_cons_tres,CR_CPU
PropagateResourceLimitsExcept=MEMLOCK
VipuIpuofDir=/home/ipuof
GresTypes=vipu
NodeName=ipu-pod64-001 State=UNKNOWN Gres=vipu:pod64:no_consume:64 CPUs=96 Boards=1 SocketsPerBoard=2 CoresPerSocket=24 ThreadsPerCore=2 RealMemory=760000 TmpDisk=4760000
NodeName=ipu-pod64-002 State=UNKNOWN Gres=vipu:pod64:no_consume:64 CPUs=96 Boards=1 SocketsPerBoard=2 CoresPerSocket=24 ThreadsPerCore=2 RealMemory=760000 TmpDisk=4760000
NodeName=ipu-pod64-003 State=UNKNOWN Gres=vipu:pod64:no_consume:64 CPUs=96 Boards=1 SocketsPerBoard=2 CoresPerSocket=24 ThreadsPerCore=2 RealMemory=760000 TmpDisk=4760000
NodeName=ipu-pod64-004 State=UNKNOWN Gres=vipu:pod64:no_consume:64 CPUs=96 Boards=1 SocketsPerBoard=2 CoresPerSocket=24 ThreadsPerCore=2 RealMemory=760000 TmpDisk=4760000
PartitionName=v-ipu Nodes=ipu-pod64-00[1-4] Default=NO MaxTime=INFINITE State=UP
Create a file called vipu.conf in the same directory as slurm.conf containing the following parameters:

ApiHost=ipu-pod64-001
ApiPort=8090
IpuofDir=/home/ipuof
MaxIpusPerJob=64
Create a symbolic link to the vipu.conf file called pod64.conf in the same directory as slurm.conf.
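After updating the configuration, you can check that the controller has picked up the plugin and the GRES definitions. The following commands are a sketch using the example names above:

scontrol reconfigure
scontrol show config | grep -i SelectType
sinfo --Node --Format=NodeList,Gres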
6.5. Job submission and parameters
The V-IPU resource selection plugin supports the following options:
--ipus: Number of IPUs requested for the job.
-n/--ntasks: Number of tasks for the job. This corresponds to the number of GCDs requested for the job partition.
--num-replicas: Number of model replicas for the job.
These parameters can be configured in both sbatch and srun scripts, as well as provided on the command line:
$ sbatch --ipus=2 --ntasks=1 --num-replicas=1 myjob.batch
Optional: if a V-IPU GRES has been configured, you can add the following option to the job definition to select a particular GRES model for the V-IPU:

--gres=vipu:<type name>
You can configure the GRES model parameter in both sbatch and srun scripts, as well as on the command line.
Assuming the desired GRES model is pod64, the command looks like this:
$ sbatch --ipus=2 --ntasks=1 --num-replicas=1 --gres=vipu:pod64 myjob.batch
6.5.1. Job script examples
The following is an example of a single-GCD job script:
#!/bin/bash
#SBATCH --job-name single-gcd-job
#SBATCH --ipus 2
#SBATCH -n 1
#SBATCH --time=00:30:00
srun <ipu_program>
wait
You can configure a multi-GCD job in the same way; the number of GCDs requested is indicated by the number of tasks:
#!/bin/bash
#SBATCH --job-name multi-gcd-job
#SBATCH --ipus 2
#SBATCH -n 2
#SBATCH --time=00:30:00
srun <ipu_program>
wait