SLURM
To list a user's queued jobs
squeue -u username
To show job details
scontrol show jobid 12345
To show partitions
sinfo
Submit job
sbatch --partition=sb --time=10-00:00:00 --chdir=/home/centos/stuff --output=output.txt --gres=gpu:1 --error=error.txt -w sb010 basic.sh
The basic.sh script must use absolute paths; otherwise it won't find its content, because --chdir makes the job run in the new working directory.
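A minimal sketch of what basic.sh could look like, assuming a Python entry point (run.py is a hypothetical name); the #SBATCH directives mirror the command-line flags above, and command-line flags take precedence when both are given:

#!/bin/bash
#SBATCH --partition=sb
#SBATCH --time=10-00:00:00
#SBATCH --gres=gpu:1
#SBATCH --output=output.txt
#SBATCH --error=error.txt
# --chdir puts the job in /home/centos/stuff, so refer to files by absolute path.
python /home/centos/stuff/run.py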
Update job time
scontrol update jobid=3 TimeLimit=10-00:00:00
sinfo and nodes - Node States
idle - all cores are available on the compute node; no jobs are running on it
mix - at least one core is available; the compute node has one or more jobs running on it
alloc - all cores on the compute node are assigned to jobs
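To see nodes in a given state, sinfo can filter with --states or show one line per node; a quick sketch:

sinfo --states=idle    # list only idle nodes
sinfo -N -l            # long output, one line per node with its state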
squeue - Job States
R - Job is running on compute nodes
PD - Job is pending, waiting for resources on compute nodes
CG - Job is completing
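squeue can filter on these states with -t; for example, to see only a user's pending jobs:

squeue -u username -t PD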
scontrol show node
This will show each node's details, such as CPU and RAM.
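Passing a node name narrows the output to a single node (sb010 is the node used in the sbatch example above):

scontrol show node sb010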
scancel 10
This will cancel job 10 itself, along with any array tasks within the job.
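Individual array tasks can also be cancelled with jobid_taskid notation, and -u cancels everything a user owns:

scancel 10_3          # cancel only task 3 of array job 10
scancel -u username   # cancel all of a user's jobs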
scontrol update nodename=worker005 state=drain reason="disabling it"
This will let the node finish its current jobs; afterwards it won't take on any new ones.
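Once maintenance is finished, the node can be returned to service by setting its state back:

scontrol update nodename=worker005 state=resume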
Environment variables
$SLURM_JOB_ID ID of job allocation
$SLURM_SUBMIT_DIR Directory where the job was submitted
$SLURM_JOB_NODELIST List of nodes allocated to the job
$SLURM_NTASKS Total number of tasks in the job (set by --ntasks)
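A minimal sketch of a batch script that prints these variables (env_demo.txt is a hypothetical output file):

#!/bin/bash
#SBATCH --ntasks=4
#SBATCH --output=env_demo.txt
echo "Job ID:          $SLURM_JOB_ID"
echo "Submit dir:      $SLURM_SUBMIT_DIR"
echo "Allocated nodes: $SLURM_JOB_NODELIST"
echo "Task count:      $SLURM_NTASKS"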