ACCRE C7 Cluster Quick and Dirty Status
Report generated at Wed Apr 2 08:26:01 PM CDT 2025
Problem Nodes
HOSTNAMES       STATE     AVAIL_FEATURES          TIMESTAMP            USER      REASON
cn390           down*     sandybridge             2025-01-26T08:52:18  slurm     Not responding
cn392           down*     sandybridge             2025-01-27T22:16:57  slurm     Not responding
cn421           down*     sandybridge             2025-03-21T05:50:19  slurm     Not responding
cn430           drained   sandybridge             2025-03-31T14:00:11  slurm     Prolog error
cn486           drained   sandybridge             2025-03-31T14:00:11  slurm     Prolog error
cn912           down*     sandybridge             2025-02-28T14:51:14  slurm     Not responding
cn1083          drained   sandybridge             2025-03-31T14:00:11  slurm     Prolog error
cn1091          drained   sandybridge             2025-03-31T14:00:11  slurm     Prolog error
cn1092          down*     sandybridge             2025-03-06T20:58:51  slurm     Not responding
cn1096          down*     sandybridge             2025-03-06T14:23:48  slurm     Not responding
cn1325          drained   haswell                 2025-03-08T08:45:02  appelte1  Nobody - RT90957 - memory issues, instability
cn1329          drained   haswell                 2025-03-08T08:47:01  appelte1  Nobody - RT90958 - memory issues, instability
cn1332          down*     haswell                 2025-03-17T14:57:37  slurm     Not responding
cn1350          drained   haswell                 2025-03-08T08:52:43  appelte1  Nobody - RT90959 - memory issues, instability
cn1356          down*     haswell                 2025-03-14T21:23:43  slurm     Not responding
cn1368          down*     haswell                 2025-03-17T14:40:56  slurm     Not responding
cn1376          drained   haswell                 2025-03-08T08:54:25  appelte1  Nobody - RT90960 - memory issues, instability
cn1385          drained   haswell                 2025-03-08T08:56:26  appelte1  Nobody - RT90961 - memory issues, instability
gpu0014         draining  broadwell,pascal,p3584  2025-03-31T16:38:53  slurm     Prolog error
gpu0015         drained   broadwell,pascal,p3584  2025-04-01T11:21:22  slurm     Prolog error
gpu0018         drained   broadwell,pascal,p3584  2025-03-31T16:41:03  slurm     Prolog error
gpu0019         draining  broadwell,pascal,p3584  2025-04-01T11:24:22  slurm     Prolog error
gpu0021         draining  broadwell,pascal,p3584  2025-04-01T11:27:17  slurm     Prolog error
gpu0026         draining  broadwell,pascal,p3840  2025-04-01T11:29:23  slurm     Prolog error
gpu0027         draining  broadwell,pascal,p3840  2025-04-01T11:31:30  slurm     Prolog error
gpu0030         drained   broadwell,pascal,p3840  2025-03-06T15:43:40  root      Samuel - RT90936 - Bad gpu
gpu0038         drained   skylake,turing,csbtmp   2025-02-17T12:18:42  slurm     Nobody - RT90396 - GPU0 in error state : Not respond
gpu0045         drained   skylake,turing,csbtmp   2025-02-17T13:00:08  slurm     gres/gpu count reported lower than configured (3 < 4
gpu0048         drained   skylake,turing,csbtmp   2025-04-01T06:08:57  root      Kill task failed
gpu0080         drained*  icelake,a6000x4,csbtmp  2024-04-25T13:03:59  root      Melo is using this machine for testing
p-matheny-lab-  down*     zen                     2025-03-01T15:59:51  slurm     Not responding
Queue Summary (Production)
GROUP USER ACTIVE_JOBS ACTIVE_CORES PENDING_JOBS PENDING_CORES
-----------------------------------------------------------------------------------------
accre 0 0 1 2
appelte1 0 0 1 2
-----------------------------------------------------------------------------------------
accre_guests 0 0 2 36
senthia 0 0 2 36
-----------------------------------------------------------------------------------------
anderson_mri 45 180 155 155
xul13 45 180 155 155
-----------------------------------------------------------------------------------------
behringer_lab 0 0 1 8
haleof 0 0 1 8
-----------------------------------------------------------------------------------------
booth_lab 2 2 0 0
comptoab 2 2 0 0
-----------------------------------------------------------------------------------------
brg_cores 13 76 0 0
desilvt 12 60 0 0
kandelr 1 16 0 0
-----------------------------------------------------------------------------------------
caldwell_lab 0 0 1 16
humphrjm 0 0 1 16
-----------------------------------------------------------------------------------------
calipari_lab 0 0 1 18
barthb1 0 0 1 18
-----------------------------------------------------------------------------------------
candelaria_group 0 0 2 40
hatche 0 0 2 40
-----------------------------------------------------------------------------------------
capra_lab_csb 2 2 0 0
mothcw 2 2 0 0
-----------------------------------------------------------------------------------------
cms 0 0 5 5
meloam 0 0 5 5
-----------------------------------------------------------------------------------------
cmsadmin 0 0 1 1
autocms 0 0 1 1
-----------------------------------------------------------------------------------------
cms_lowprio 51 81 184 460
cmslocal 12 21 13 40
cmspilot 39 60 171 420
-----------------------------------------------------------------------------------------
coxlab 7 21 11 11
evansp1 7 21 11 11
-----------------------------------------------------------------------------------------
davis_lab 1 1 0 0
tsail2 1 1 0 0
-----------------------------------------------------------------------------------------
edwards_lab 1 1 0 0
parkerac 1 1 0 0
-----------------------------------------------------------------------------------------
g_gamazon_lab 1 4 1 12
kimn13 0 0 1 12
salerl1 1 4 0 0
-----------------------------------------------------------------------------------------
g_giri_group 16 50 71 71
basnettb 16 50 71 71
-----------------------------------------------------------------------------------------
h_fabbrilab 1 10 0 0
yec2 1 10 0 0
-----------------------------------------------------------------------------------------
hodges_lab 1 6 0 0
aganve 1 6 0 0
-----------------------------------------------------------------------------------------
h_vmac 0 0 990 990
suny36 0 0 990 990
-----------------------------------------------------------------------------------------
isde-rer 2 6 0 0
vielmej 2 6 0 0
-----------------------------------------------------------------------------------------
jswhep 1 8 0 0
atehort 1 8 0 0
-----------------------------------------------------------------------------------------
l3_manzanas_group 0 0 1 13
manzand 0 0 1 13
-----------------------------------------------------------------------------------------
l3_precision_nutriti 2 13 0 0
baghem1 2 13 0 0
-----------------------------------------------------------------------------------------
l3_wilkey_lab 0 0 10 160
starlii 0 0 10 160
-----------------------------------------------------------------------------------------
lola 0 0 1 8
lifferjt 0 0 1 8
-----------------------------------------------------------------------------------------
maiziezhou_lab 1 34 6 244
chowx 1 34 2 14
fernamm1 0 0 1 50
xiem6 0 0 3 180
-----------------------------------------------------------------------------------------
nbody 1 32 120 300
ligo 0 0 120 300
smitm77 1 32 0 0
-----------------------------------------------------------------------------------------
palmeri_lab 200 210 0 0
bahgg 200 210 0 0
-----------------------------------------------------------------------------------------
p_collins_lab 0 0 1 8
chencl1 0 0 1 8
-----------------------------------------------------------------------------------------
p_matheny_lab 1 2 0 0
koolajd1 1 2 0 0
-----------------------------------------------------------------------------------------
p_neuert_lab 0 0 1 4
hughesjj 0 0 1 4
-----------------------------------------------------------------------------------------
rer 2 32 0 0
hum6 2 32 0 0
-----------------------------------------------------------------------------------------
richmond_lab 1 6 0 0
blackjb2 1 6 0 0
-----------------------------------------------------------------------------------------
r_isde 1 16 0 0
trippej1 1 16 0 0
-----------------------------------------------------------------------------------------
rokaslab 21 131 0 0
borrag 1 1 0 0
lint8 10 50 0 0
rangem1 10 80 0 0
-----------------------------------------------------------------------------------------
ruderferlab 1 10 0 0
yec2 1 10 0 0
-----------------------------------------------------------------------------------------
sbcs 10 60 41 41
guoz18 8 24 0 0
jiag 1 26 41 41
nguyensm 1 10 0 0
-----------------------------------------------------------------------------------------
taylor_group 2 12 0 0
milesmt 1 8 0 0
schultls 1 4 0 0
-----------------------------------------------------------------------------------------
tk_lab 0 0 1 80
yoonh14 0 0 1 80
-----------------------------------------------------------------------------------------
tong_lab 1 16 0 0
lutherzr 1 16 0 0
-----------------------------------------------------------------------------------------
vgi 2 14 0 0
nagait 1 4 0 0
yec2 1 10 0 0
-----------------------------------------------------------------------------------------
walker_lab 10 22 0 0
deanrt 9 18 0 0
guox11 1 4 0 0
-----------------------------------------------------------------------------------------
wankowicz_lab 1 1 0 0
wankows 1 1 0 0
-----------------------------------------------------------------------------------------
yang_lab_csb 108 123 0 0
shaoq1 107 107 0 0
zhangsw 1 16 0 0
-----------------------------------------------------------------------------------------
Totals: 509 1182 1608 2683
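
Note on reading the table above: after each dashed separator, the first row is the group subtotal and the rows beneath it are that group's users, so each group row should equal the column-wise sum of its user rows, and the Totals row should equal the sum of the group rows. The following is a minimal Python sketch of that consistency check, assuming the flattened layout shown here (whitespace-separated fields, four trailing integer columns). The function names are hypothetical and are not part of whatever script generates this report.

```python
def parse_queue_summary(text):
    """Split a flattened queue summary into group blocks and the Totals row."""
    groups, totals, current = [], None, None
    expect_group = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("GROUP"):
            continue  # skip blank lines and the column header
        if set(line) == {"-"}:
            expect_group = True  # the next data row is a group subtotal
            continue
        fields = line.split()
        name = fields[0]
        counts = tuple(int(x) for x in fields[-4:])
        if name == "Totals:":
            totals = counts
        elif expect_group:
            current = {"group": name, "counts": counts, "users": []}
            groups.append(current)
            expect_group = False
        else:
            current["users"].append((name, counts))
    return groups, totals


def column_sums(rows):
    """Column-wise sum of 4-tuples; all zeros for an empty list."""
    rows = list(rows)
    return tuple(map(sum, zip(*rows))) if rows else (0, 0, 0, 0)


def check_queue_summary(text):
    """Return the names of groups (or 'Totals:') whose rows do not add up."""
    groups, totals = parse_queue_summary(text)
    problems = [
        g["group"]
        for g in groups
        if column_sums(c for _, c in g["users"]) != g["counts"]
    ]
    if totals is not None and column_sums(g["counts"] for g in groups) != totals:
        problems.append("Totals:")
    return problems


# Excerpt copied from the Production summary above (no Totals row included).
sample = """\
GROUP USER ACTIVE_JOBS ACTIVE_CORES PENDING_JOBS PENDING_CORES
-----------------------------------------
brg_cores 13 76 0 0
desilvt 12 60 0 0
kandelr 1 16 0 0
-----------------------------------------
sbcs 10 60 41 41
guoz18 8 24 0 0
jiag 1 26 41 41
nguyensm 1 10 0 0
-----------------------------------------
"""
print(check_queue_summary(sample))
```

An empty list means every group subtotal in the excerpt matches its user rows. The sketch relies on row position alone to tell groups from users, which is the only cue left in this flattened layout.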
Queue Summary (Pascal)
GROUP USER ACTIVE_JOBS ACTIVE_GPUS PENDING_JOBS PENDING_GPUS
-----------------------------------------------------------------------------------------
Totals: 0 0 0 0
Queue Summary (Turing)
GROUP USER ACTIVE_JOBS ACTIVE_GPUS PENDING_JOBS PENDING_GPUS
-----------------------------------------------------------------------------------------
Totals: 0 0 0 0
Queue Summary (A6000x4)
GROUP USER ACTIVE_JOBS ACTIVE_GPUS PENDING_JOBS PENDING_GPUS
-----------------------------------------------------------------------------------------
Totals: 0 0 0 0
Queue Summary (A6000x2)
GROUP USER ACTIVE_JOBS ACTIVE_GPUS PENDING_JOBS PENDING_GPUS
-----------------------------------------------------------------------------------------
Totals: 0 0 0 0
Partition Summary
PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
production* up 14-00:00:0 3 down* cn[1332,1356,1368]
production* up 14-00:00:0 5 drain cn[1325,1329,1350,1376,1385]
production* up 14-00:00:0 47 mix cn[1301,1306,1311,1317,1321-1322,1324,1326,1331,1333,1335-1336,1338,1340,1342-1343,1347,1349,1351-1353,1357,1359-1363,1365,1369,1371-1372,1374-1375,1378,1380,1382,1389-1392,1394-1398,1701,1704]
production* up 14-00:00:0 31 alloc cn[1300,1302,1307,1315,1318,1320,1323,1328,1330,1334,1337,1339,1341,1344-1346,1348,1354-1355,1358,1366-1367,1370,1373,1379,1381,1383-1384,1387-1388,1393]
nogpfs up 14-00:00:0 6 down* cn[390,392,421,912,1092,1096]
nogpfs up 14-00:00:0 4 drain cn[430,486,1083,1091]
nogpfs up 14-00:00:0 139 alloc cn[303-308,311,313,315,317-318,320,322-324,326-333,335-338,340,347-348,351,353,355-360,362-365,367,369-370,372-380,384-385,388-389,394,398-401,403-405,407,411,413-420,423,425,427,429,431-432,435-437,439-443,445-446,448-452,455,460,463-466,468-472,474-477,479,481-485,491,495-496,499,1081-1082,1085-1087,1089-1090,1094-1095,1122-1123,1125-1126,1128-1129,1132,1134]
debug up infinite 2 mix gpu[0022,0059]
debug up infinite 1 alloc gpu0006
debug up infinite 3 idle cn[371,1101],gpu0046
sam up 2-02:00:00 2 idle vm-cms-sam-pri,vm-cms-sam-sec
maxwell up 5-00:00:00 1 mix gpu0004
maxwell up 5-00:00:00 2 idle gpu[0002,0012]
pascal up 5-00:00:00 5 drng gpu[0014,0019,0021,0026-0027]
pascal up 5-00:00:00 3 drain gpu[0015,0018,0030]
pascal up 5-00:00:00 9 mix gpu[0013,0017,0020,0022-0023,0025,0031,0033-0034]
turing up 5-00:00:00 3 drain gpu[0038,0045,0048]
turing up 5-00:00:00 1 mix gpu0035
turing up 5-00:00:00 1 alloc gpu0049
a6000x2 up 5-00:00:00 1 mix gpu0003
a6000x2 up 5-00:00:00 4 alloc gpu[0006-0008,0010]
a6000x2 up 5-00:00:00 1 idle gpu0005
a4000x4 up 14-00:00:0 0 n/a
a6000x4 up 14-00:00:0 1 drain* gpu0080
a6000x4 up 14-00:00:0 4 mix gpu[0059,0077,0079,0082]
a6000x4 up 14-00:00:0 2 idle gpu[0078,0081]
a100 up 14-00:00:0 0 n/a
a100x8 up 14-00:00:0 0 n/a
a4000x8 up 14-00:00:0 0 n/a
cgw-vm-qa-flatearth1 up infinite 1 idle vm-qa-flatearth1
cgw-djroomba up infinite 1 idle djroomba
cgw-cqs1 up infinite 1 idle cqs1
cgw-cqs3 up infinite 1 idle cqs3
cgw-rocksteady up infinite 1 idle rocksteady
cgw-tbi01 up infinite 1 idle tbi01
cgw-capra1 up infinite 1 idle capra1
cgw-horus up infinite 1 idle horus
cgw-dsi-gw up infinite 1 idle dsi-gw
cgw-maizie up infinite 1 idle maizie
cgw-maizie2 up infinite 1 idle maizie2
cgw-maizie3 up infinite 1 idle maizie3
cgw-hgen01 up infinite 1 idle hgen01
cgw-p-matheny-lab-server1 up infinite 1 down* p-matheny-lab-server1
cgw-sideshowbob up infinite 1 idle sideshowbob
cgw-platypus up infinite 1 mix platypus
cgw-hanuman up infinite 1 idle hanuman
cgw-lego up infinite 1 idle lego
cgw-badger up infinite 1 idle badger
cgw-candelaria01 up infinite 1 mix candelaria01
cgw-holowatyj01 up infinite 1 idle holowatyj01
cgw-cartailler01 up infinite 1 idle cartailler01
cgw-gamazon01 up infinite 1 idle gamazon01
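
Note on the NODELIST column above: it uses Slurm's compressed hostlist notation, where `cn[1321-1322,1324]` stands for cn1321, cn1322 and cn1324. On a cluster login node, `scontrol show hostnames` performs this expansion. For processing a saved copy of this report elsewhere, the following Python sketch does the same for the simple forms that appear here (a prefix followed by one bracketed list of numbers and ranges). The function name is hypothetical, and names with text after the closing bracket are not handled.

```python
import re


def expand_hostlist(expr):
    """Expand e.g. 'cn[371,1101],gpu0046' into individual hostnames."""
    hosts = []
    # Each part is a name, optionally followed by one [...] group;
    # commas inside the brackets stay with their part.
    for part in re.findall(r"[^,\[]+(?:\[[^\]]*\])?", expr):
        match = re.fullmatch(r"([^\[]+)\[([^\]]*)\]", part)
        if match is None:
            hosts.append(part)  # plain hostname, nothing to expand
            continue
        prefix, items = match.groups()
        for item in items.split(","):
            lo, _, hi = item.partition("-")
            hi = hi or lo          # a single number is a range of one
            width = len(lo)        # preserve zero padding such as 0026
            hosts.extend(
                f"{prefix}{n:0{width}d}" for n in range(int(lo), int(hi) + 1)
            )
    return hosts


print(expand_hostlist("cn[371,1101],gpu0046"))
# ['cn371', 'cn1101', 'gpu0046']
print(len(expand_hostlist("gpu[0014,0019,0021,0026-0027]")))
# 5
```

The second call reproduces the NODES count of 5 that the report lists for the draining pascal nodes.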
Queue Summary (All Partitions)
GROUP USER ACTIVE_JOBS ACTIVE_CORES PENDING_JOBS PENDING_CORES
-----------------------------------------------------------------------------------------
accre 0 0 1 2
appelte1 0 0 1 2
-----------------------------------------------------------------------------------------
accre_guests 0 0 2 36
senthia 0 0 2 36
-----------------------------------------------------------------------------------------
anderson_mri 45 180 155 155
xul13 45 180 155 155
-----------------------------------------------------------------------------------------
behringer_lab 0 0 1 8
haleof 0 0 1 8
-----------------------------------------------------------------------------------------
bme3890 0 0 3 17
909065 0 0 1 15
dhattk 0 0 2 2
-----------------------------------------------------------------------------------------
booth_lab 2 2 0 0
comptoab 2 2 0 0
-----------------------------------------------------------------------------------------
brg_cores 13 76 0 0
desilvt 12 60 0 0
kandelr 1 16 0 0
-----------------------------------------------------------------------------------------
caldwell_lab 0 0 1 16
humphrjm 0 0 1 16
-----------------------------------------------------------------------------------------
calipari_lab 0 0 1 18
barthb1 0 0 1 18
-----------------------------------------------------------------------------------------
candelaria_group 0 0 2 40
hatche 0 0 2 40
-----------------------------------------------------------------------------------------
capra_lab_csb 2 2 0 0
mothcw 2 2 0 0
-----------------------------------------------------------------------------------------
cgw_candelaria01 2 8 0 0
mcgilldg 1 4 0 0
quintedc 1 4 0 0
-----------------------------------------------------------------------------------------
cgw_djroomba 12 12 0 0
mcphauna 12 12 0 0
-----------------------------------------------------------------------------------------
cgw_maizie 3 72 0 0
wangh67 3 72 0 0
-----------------------------------------------------------------------------------------
cgw_platypus 7 148 0 0
mohamb2 1 32 0 0
rubinom 1 16 0 0
sardarn 5 100 0 0
-----------------------------------------------------------------------------------------
cms 139 1668 39 39
cmslocal 32 384 16 16
cmspilot 107 1284 18 18
meloam 0 0 5 5
-----------------------------------------------------------------------------------------
cmsadmin 0 0 1 1
autocms 0 0 1 1
-----------------------------------------------------------------------------------------
cms_lowprio 51 81 184 460
cmslocal 12 21 13 40
cmspilot 39 60 171 420
-----------------------------------------------------------------------------------------
coxlab 7 21 11 11
evansp1 7 21 11 11
-----------------------------------------------------------------------------------------
cs3892-oguz_acc 0 0 1 1
914505 0 0 1 1
-----------------------------------------------------------------------------------------
csb_gpu_acc 55 122 661 3128
bisigp1 3 12 0 0
cunnik8 24 24 50 50
ger1 0 0 20 20
guox11 7 42 492 2952
howardvr 0 0 1 8
marinot 1 16 0 0
ranx 1 9 0 0
shaoq1 19 19 98 98
-----------------------------------------------------------------------------------------
davis_lab 1 1 0 0
tsail2 1 1 0 0
-----------------------------------------------------------------------------------------
edwards_lab 1 1 0 0
parkerac 1 1 0 0
-----------------------------------------------------------------------------------------
g_gamazon_lab 1 4 1 12
kimn13 0 0 1 12
salerl1 1 4 0 0
-----------------------------------------------------------------------------------------
g_giri_group 16 50 71 71
basnettb 16 50 71 71
-----------------------------------------------------------------------------------------
h_fabbrilab 1 10 0 0
yec2 1 10 0 0
-----------------------------------------------------------------------------------------
hodges_lab 1 6 0 0
aganve 1 6 0 0
-----------------------------------------------------------------------------------------
h_vmac 0 0 990 990
suny36 0 0 990 990
-----------------------------------------------------------------------------------------
isde-rer 2 6 0 0
vielmej 2 6 0 0
-----------------------------------------------------------------------------------------
jswhep 1 8 0 0
atehort 1 8 0 0
-----------------------------------------------------------------------------------------
l3_manzanas_group 0 0 1 13
manzand 0 0 1 13
-----------------------------------------------------------------------------------------
l3_precision_nutriti 2 13 0 0
baghem1 2 13 0 0
-----------------------------------------------------------------------------------------
l3_wilkey_lab 0 0 10 160
starlii 0 0 10 160
-----------------------------------------------------------------------------------------
lola 0 0 1 8
lifferjt 0 0 1 8
-----------------------------------------------------------------------------------------
maiziezhou_lab 1 34 6 244
chowx 1 34 2 14
fernamm1 0 0 1 50
xiem6 0 0 3 180
-----------------------------------------------------------------------------------------
nbody 1 32 120 300
ligo 0 0 120 300
smitm77 1 32 0 0
-----------------------------------------------------------------------------------------
nbody_acc 2 48 1 16
khanfm 2 48 1 16
-----------------------------------------------------------------------------------------
neurogroup_acc 1 6 0 0
songrw 1 6 0 0
-----------------------------------------------------------------------------------------
palmeri_lab 200 210 0 0
bahgg 200 210 0 0
-----------------------------------------------------------------------------------------
p_collins_lab 0 0 1 8
chencl1 0 0 1 8
-----------------------------------------------------------------------------------------
p_dsi 0 0 1 4
malikm2 0 0 1 4
-----------------------------------------------------------------------------------------
p_dsi_acc 0 0 2 12
srikas2 0 0 2 12
-----------------------------------------------------------------------------------------
p_matheny_lab 1 2 0 0
koolajd1 1 2 0 0
-----------------------------------------------------------------------------------------
p_neuert_lab 0 0 1 4
hughesjj 0 0 1 4
-----------------------------------------------------------------------------------------
rer 2 32 0 0
hum6 2 32 0 0
-----------------------------------------------------------------------------------------
richmond_lab 1 6 0 0
blackjb2 1 6 0 0
-----------------------------------------------------------------------------------------
r_isde 1 16 0 0
trippej1 1 16 0 0
-----------------------------------------------------------------------------------------
rokaslab 21 131 0 0
borrag 1 1 0 0
lint8 10 50 0 0
rangem1 10 80 0 0
-----------------------------------------------------------------------------------------
rubinov_lab_acc 0 0 1 1
mohamb2 0 0 1 1
-----------------------------------------------------------------------------------------
ruderferlab 1 10 0 0
yec2 1 10 0 0
-----------------------------------------------------------------------------------------
sbcs 10 60 41 41
guoz18 8 24 0 0
jiag 1 26 41 41
nguyensm 1 10 0 0
-----------------------------------------------------------------------------------------
taylor_group 2 12 0 0
milesmt 1 8 0 0
schultls 1 4 0 0
-----------------------------------------------------------------------------------------
tk_lab 0 0 1 80
yoonh14 0 0 1 80
-----------------------------------------------------------------------------------------
tong_lab 1 16 0 0
lutherzr 1 16 0 0
-----------------------------------------------------------------------------------------
vgi 2 14 0 0
nagait 1 4 0 0
yec2 1 10 0 0
-----------------------------------------------------------------------------------------
walker_lab 10 22 0 0
deanrt 9 18 0 0
guox11 1 4 0 0
-----------------------------------------------------------------------------------------
wankowicz_lab 1 1 0 0
wankows 1 1 0 0
-----------------------------------------------------------------------------------------
yang_lab_csb 108 123 0 0
shaoq1 107 107 0 0
zhangsw 1 16 0 0
-----------------------------------------------------------------------------------------
Totals: 730 3266 2312 5896