When enabling shared resources in Slurm as per the article here, you may see the following error in /var/log/slurmctld on the head node:
we don't have select plugin type 102
Checking through the logs you may also see:
error: Incomplete job record
fatal: Incomplete job state save file, start with '-i' to ignore this
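To confirm that the head node log really contains these messages, a small search helper can be used. This is only a sketch: the function name find_state_errors is made up for illustration, and the default log path is the /var/log/slurmctld location mentioned in this article, which may differ on your system.

```shell
# Hypothetical helper (not part of Slurm or Bright): print every log line
# that matches one of the error signatures described in this article.
# The log file defaults to /var/log/slurmctld; pass another path if needed.
find_state_errors() {
    log_file="${1:-/var/log/slurmctld}"
    grep -E "select plugin type|Incomplete job (record|state save file)" "$log_file"
}
```

Running find_state_errors with no argument searches the default log. If it prints nothing, the problem is probably not the one described here.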
Occasionally, when enabling shared resources in Slurm, the job state save file becomes incomplete. To work around this issue, perform the following steps.
First, stop slurmctld in Bright:
# cmsh
% device use master
% services
% stop slurm
% quit
Next, make sure SelectType and SelectTypeParameters are set in slurm.conf to the values you want slurmctld to use.
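Before restarting, it can help to print the active SelectType and SelectTypeParameters lines and check them by eye. The helper below is a sketch under assumptions: the name show_select_settings is made up, and the path to slurm.conf must be passed in because this article does not state where the file lives on your cluster.

```shell
# Hypothetical helper: show the uncommented SelectType and
# SelectTypeParameters lines from the slurm.conf given as the first argument.
# Commented-out lines (starting with '#') are not matched.
show_select_settings() {
    conf_file="$1"
    grep -Ei '^[[:space:]]*SelectType(Parameters)?[[:space:]]*=' "$conf_file"
}
```

For a shared-resource setup the output might look like SelectType=select/cons_res followed by SelectTypeParameters=CR_Core. Those values are only an example; use whatever the shared resources article prescribes for your site.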
Then, start slurmctld by running the following command on your head node:
# /cm/shared/apps/slurm/current/sbin/slurmctld -i
That will tell slurmctld to start while ignoring the incomplete job state save file error.
After that, kill the process for slurmctld:
# killall slurmctld
Then, start slurmctld from Bright again:
# cmsh
% device use master
% services
% start slurm
slurmctld should now start properly using your desired slurm.conf settings.
You may also need to run the scontrol reconfigure command once slurmctld is started to notify the compute nodes.
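The final reconfigure and a quick check of the running configuration can be combined as sketched below. The function name verify_select_plugin is made up for illustration. It assumes scontrol is in the PATH of the shell you run it from, which may require loading the Slurm environment first.

```shell
# Hypothetical helper: ask slurmctld to re-read its configuration, then
# print the SelectType settings that the running controller reports.
verify_select_plugin() {
    # Stop early if scontrol is not available in this shell.
    if ! command -v scontrol >/dev/null 2>&1; then
        echo "scontrol not found in PATH" >&2
        return 127
    fi
    # Notify the daemons of the new configuration.
    scontrol reconfigure || return 1
    # Show SelectType and SelectTypeParameters as seen by slurmctld.
    scontrol show config | grep -i '^SelectType'
}
```

Call verify_select_plugin once slurmctld is running again. The values printed should match what you set in slurm.conf.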