TOOLs: Using a SLURM cluster

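The cluster systems I have used so far include PBS/qsub, whose commands felt simple and easy to pick up. I now work on SLURM; the commonly used SLURM commands are listed below. Remember to replace username with your own account name.

1. Common commands for viewing, cancelling, and running jobs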

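sinfo #show the cluster's nodes and partitions
squeue -u username #show your jobs that are queued or running
scontrol show job JOBID #show the status of the job with the given job ID
scancel JOBID #cancel the job with the given job ID
sbatch test.s #submit the file test.s as a batch job
scontrol show node #show hardware information for all nodes
scontrol show node node02 #show hardware information for the node named node02
smap #show running jobs graphically (not available in newer SLURM releases)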
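
To tie these commands together, here is a minimal sketch of a submit-then-check cycle. parse_job_id is a hypothetical helper name of my own, not a SLURM command, and it assumes sbatch prints its usual confirmation line of the form "Submitted batch job 12345".

```shell
#!/bin/bash
# Hypothetical helper: extract the numeric job ID from sbatch's
# confirmation line, e.g. "Submitted batch job 12345" -> "12345".
parse_job_id() {
    # The job ID is the 4th whitespace-separated field of that line.
    echo "$1" | awk '{print $4}'
}

# Demonstration on a sample confirmation line (no cluster needed):
parse_job_id "Submitted batch job 12345"

# On a real cluster (requires SLURM; test.s is the batch script from this post):
#   out=$(sbatch test.s)
#   jobid=$(parse_job_id "$out")
#   squeue -j "$jobid"     # check the job's state
#   scancel "$jobid"       # cancel it if needed
```

Keeping the job ID in a variable avoids copying it by hand from the squeue listing.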

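Job state codes: PD = pending (queued); R = running; S = suspended; CG = completing (the job is finishing up).

The center's servers: node01 is the large node with 80 cores and 500 GB of memory; node02 through node09 are small nodes with 56 cores and 120 GB of memory each. If a job requests more CPUs than a node has in total, it can still be submitted normally but will never run.

2. Contents and parameters of test.s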

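#!/bin/bash

#SBATCH -J hisat2 #job name
#SBATCH -N 1 #number of nodes
#SBATCH -n 4 #number of tasks (one core per task by default)
#SBATCH -t 192:00:00  #maximum run time
##SBATCH --partition=compute  #partition to run in; partitions differ in hardware (the center has two: compute, the small nodes node02-09, and big, the large node node01; the * that sinfo prints after compute only marks the default partition). Remove one leading # to enable this line
# set batch script's standard output (sbatch does not expand ~ here, so use an absolute path;
# this assumes the home directory is /share/home/chaimao, as in the cd below)
#SBATCH --output=/share/home/chaimao/human/hg38/index/hisat2.out

echo " my job id is $SLURM_JOB_ID " | tee ~/human/hg38/index/hisat2/hisat2.log
echo "nodes allocated: $SLURM_JOB_NODELIST" | tee -a ~/human/hg38/index/hisat2/hisat2.log

echo "begin time is $(date)" | tee -a ~/human/hg38/index/hisat2/hisat2.log
id=$SLURM_JOB_ID #job ID (the PBS-only $PBS_JOBID is not set under SLURM)
NP=$SLURM_NTASKS #number of tasks (replaces counting the lines of the PBS-only $PBS_NODEFILE)

cd /share/home/chaimao/human/hg38/index/
# run the application
# note: with -n 4, srun starts hisat2.sh once per task (four copies); for one multithreaded run use -n 1 with -c 4
srun sh hisat2.sh
echo "end time is $(date)" | tee -a ~/human/hg38/index/hisat2/hisat2.log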
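
Since a job that asks for more CPUs than any single node has is accepted but never starts, it may help to compare the request against the node sizes before submitting. The sketch below is only an illustration: fits_on_some_node is a hypothetical helper of my own, and the sample numbers are the 80-core and 56-core nodes described in this post.

```shell
#!/bin/bash
# Hypothetical helper: reads per-node CPU counts (one number per line) on
# stdin and succeeds if at least one node has at least $1 CPUs.
fits_on_some_node() {
    awk -v need="$1" '$1 + 0 >= need + 0 { ok = 1 } END { exit (ok ? 0 : 1) }'
}

# Example with the node sizes from this post (one 80-core node, 56-core small nodes):
if printf '80\n56\n56\n' | fits_on_some_node 64; then
    echo "64 CPUs: fits on at least one node"
fi
if ! printf '80\n56\n56\n' | fits_on_some_node 96; then
    echo "96 CPUs: no single node is big enough"
fi

# On a real cluster, feed it live data instead (requires SLURM):
#   sinfo -N -h -o "%c" | fits_on_some_node 4
```

Here sinfo -N lists one line per node, -h drops the header, and the %c format field prints each node's CPU count.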

##SLURM resource manager commands
#how to submit: sbatch run.sh
srun sh test.sh #run the given sh script
srun python test.py #run a Python program
srun Rscript test.R  #run an R program
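
One detail when copying a partition name from sinfo into --partition: sinfo appends a * to the default partition, and that * is not part of the name. A small sketch of stripping it follows; strip_default_marker is a hypothetical helper name of my own, and compute and big are the partition names used on the server described in this post.

```shell
#!/bin/bash
# Hypothetical helper: remove the trailing "*" that sinfo appends to the
# default partition's name; other names pass through unchanged.
strip_default_marker() {
    printf '%s\n' "${1%\*}"
}

strip_default_marker 'compute*'   # prints: compute
strip_default_marker 'big'        # prints: big

# On a real cluster (requires SLURM), list the usable partition names:
#   sinfo -h -o "%P" | while read -r p; do strip_default_marker "$p"; done
```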

Note: look up the partition names on your own server; the sinfo command lists them.

Reference 1: Comparison of PBS and SLURM commands