First, query the state of the nodes in the partition:
sinfo -Nel
Output:
Tue Feb 25 23:10:37 2025
NODELIST   NODES PARTITION   STATE     CPUS S:C:T  MEMORY TMP_DISK WEIGHT AVAIL_FE REASON
a02r01n04      1 xhacexclu24 allocated  128 8:16:1 513500        0      1   (null) none
a03r03n02      1 xhacexclu24 mixed      128 8:16:1 513500        0      1   (null) none
a04r05n04      1 xhacexclu24 mixed      128 8:16:1 513500        0      1   (null) none
a06r02n04      1 xhacexclu24 allocated  128 8:16:1 513500        0      1   (null) none
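In the STATE column, allocated means the node is fully occupied and mixed means it is partially occupied. Next, query a partially occupied node to find out how many of its cores are free: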
scontrol show node a03r03n02
Output:
NodeName=a03r03n02 Arch=x86_64 CoresPerSocket=16
   CPUAlloc=96 CPUEfctv=128 CPUTot=128 CPULoad=32.90
   AvailableFeatures=(null)
   ActiveFeatures=(null)
   Gres=(null)
   NodeAddr=10.21.3.18 NodeHostName=a03r03n02 Version=22.05.8-2.2.1-74-20250110
   OS=Linux 3.10.0-957.el7.x86_64 #1 SMP Thu Nov 8 23:39:32 UTC 2018
   RealMemory=513500 AllocMem=367872 FreeMem=303418 Sockets=8 Boards=1
   MemSpecLimit=10240
   State=MIXED ThreadsPerCore=1 TmpDisk=0 Weight=1 Owner=N/A MCS_label=N/A
   Partitions=xhacexclu24
   BootTime=2023-11-18T20:33:44 SlurmdStartTime=2025-02-10T14:09:36
   LastBusyTime=2025-02-24T10:54:37
   CfgTRES=cpu=128,mem=513500M,billing=128
   AllocTRES=cpu=96,mem=367872M
   CapWatts=n/a
   CurrentWatts=0 AveWatts=0
   ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s
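The node has 96 allocated cores (CPUAlloc) out of 128 in total (CPUTot), so 128 - 96 = 32 cores are free, and we can request those 32 cores as needed. The same calculation applies to the other partially occupied nodes.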
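
The subtraction can also be scripted instead of done by hand. The following is only a minimal sketch: the function name free_cpus is made up for illustration, and it assumes the scontrol output contains the CPUAlloc and CPUTot fields in the KEY=VALUE form shown above.

```shell
# Hypothetical helper (not part of Slurm): reads `scontrol show node` output
# on stdin and prints CPUTot - CPUAlloc, i.e. the number of unallocated CPUs.
free_cpus() {
    # Put every KEY=VALUE token on its own line, then pick out the two fields.
    tr -s '[:space:]' '\n' | awk -F= '
        $1 == "CPUAlloc" { alloc = $2 }
        $1 == "CPUTot"   { total = $2 }
        END {
            if (total == "") exit 1   # CPUTot not found: input was not node info
            print total - alloc
        }'
}

# Intended use on a login node (commented out because it needs a live cluster):
# scontrol show node a03r03n02 | free_cpus
```

With the output shown above this gives 32. On this node ThreadsPerCore=1, so the CPU count and the core count coincide; with hyper-threading enabled the number would count hardware threads rather than physical cores.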
