ACCRE R9 Cluster Quick and Dirty Status
Report generated at Wed May 13 08:23:01 PM CDT 2026
Problem Nodes
HOSTNAMES  STATE   TIMESTAMP            REASON                COMMENT
cn1225     down*   2026-05-11T07:43:54  Not responding        (null)
cn1484     down    2026-05-11T07:44:49  Node unexpectedly re  Troy - RT97473 - network down
cn1526     drain*  2026-05-11T16:19:57  Troy - RT99047 - rep  Troy - RT99047 - replace cmos battery
cn1540     drain*  2026-05-08T09:56:14  Low RealMemory (repo  (null)
cn1608     down*   2026-05-12T02:57:31  Not responding        (null)
cn1609     drain   2026-05-11T17:05:56  Prolog error          (null)
cn1630     drain   2026-05-12T05:23:39  Prolog error : Not r  (null)
cn1704     drain   2026-05-11T07:47:15  Prolog error : Not r  Scott - RT98513 - draining to use connection to test cn1705
cn1705     down    2026-05-11T07:50:19  Node unexpectedly re  nobody - RT98513 - Packet loss, possible bad cable
cn1708     drng    2026-05-11T20:17:23  Prolog error          (null)
gpu0200    inval   2026-05-11T07:47:47  gres/gpu count repor  Troy - RT97880 - ipmi unreachable, node unresponsive
gpu0201    inval   2026-05-11T07:48:15  gres/gpu count repor  Nobody - RT98050 - Dropping 4th GPU
gpu0202    inval   2026-05-11T07:48:11  gres/gpu count repor  Nobody - RT97910 - RMA JBOX caddy
gpu0203    inval   2026-05-11T07:48:05  gres/gpu count repor  Thomas - RTNA - Move to dev?
Queue Summary (Batch)
GROUP USER ACTIVE_JOBS ACTIVE_CORES PENDING_JOBS PENDING_CORES
-----------------------------------------------------------------------------------------
accre_guests 0 0 1 100
haojz 0 0 1 100
-----------------------------------------------------------------------------------------
atate_lab 0 0 1 24
raya13 0 0 1 24
-----------------------------------------------------------------------------------------
beam_lab 27 448 251 4048
marshazm 1 32 2 64
zhuj29 26 416 249 3984
-----------------------------------------------------------------------------------------
bias_group 1 10 0 0
biasds 1 10 0 0
-----------------------------------------------------------------------------------------
booth_lab 2 16 100 798
mathura 1 8 100 798
vessa 1 8 0 0
-----------------------------------------------------------------------------------------
brg_cores 1 16 0 0
kandelr 1 16 0 0
-----------------------------------------------------------------------------------------
cgg 2 12 1 64
liy110 0 0 1 64
sandk13 1 4 0 0
songl8 1 8 0 0
-----------------------------------------------------------------------------------------
cms 239 4562 525 1364
cmslocal 142 2013 225 570
cmspilot 97 2549 300 794
-----------------------------------------------------------------------------------------
coxlab 1 3 0 0
blostf1 1 3 0 0
-----------------------------------------------------------------------------------------
cqs_si 0 0 4 8
chenarsw 0 0 4 8
-----------------------------------------------------------------------------------------
das_lab 1 1 0 0
shiltmh1 1 1 0 0
-----------------------------------------------------------------------------------------
davis_lab 0 0 1 16
bluejor 0 0 1 16
-----------------------------------------------------------------------------------------
escudero_lab 1 24 0 0
seifis1 1 24 0 0
-----------------------------------------------------------------------------------------
feng_lab 0 0 4 16
jiangl1 0 0 4 16
-----------------------------------------------------------------------------------------
g_gamazon_lab 1 4 0 0
salerl1 1 4 0 0
-----------------------------------------------------------------------------------------
goldring_group 0 0 2 20
mcgrawke 0 0 2 20
-----------------------------------------------------------------------------------------
haslag_group 3 40 222 3552
haslagph 3 40 222 3552
-----------------------------------------------------------------------------------------
h_biostat_kang 493 493 361 361
yanb1 493 493 361 361
-----------------------------------------------------------------------------------------
h_biostat_student 8 85 1 4
blackmh2 0 0 1 4
koy2 5 75 0 0
liuk20 1 8 0 0
namy1 1 1 0 0
yangc16 1 1 0 0
-----------------------------------------------------------------------------------------
h_cqs 2 39 2 15
shengq1 1 7 2 15
wangy89 1 32 0 0
-----------------------------------------------------------------------------------------
h_vmac 0 0 5460 5461
yangy48 0 0 5459 5459
zhanm32 0 0 1 2
-----------------------------------------------------------------------------------------
h_vmac_imaging 0 0 210 210
jackb13 0 0 210 210
-----------------------------------------------------------------------------------------
h_vuiis 1 8 0 0
viswam1 1 8 0 0
-----------------------------------------------------------------------------------------
isde-rer 0 0 3 24
champaca 0 0 3 24
-----------------------------------------------------------------------------------------
kaczkurkin_lab 1 20 0 0
abbasia 1 20 0 0
-----------------------------------------------------------------------------------------
l3_aboud_lab 1 64 0 0
hongm1 1 64 0 0
-----------------------------------------------------------------------------------------
l3_jasper_lab 1 2 0 0
hattleee 1 2 0 0
-----------------------------------------------------------------------------------------
l3_precision_nutrition_lab 1 2 0 0
baghem1 1 2 0 0
-----------------------------------------------------------------------------------------
l3_runnoe_group 2 8 0 0
kaldorme 2 8 0 0
-----------------------------------------------------------------------------------------
l3_watts_lab 1 30 0 0
rosena 1 30 0 0
-----------------------------------------------------------------------------------------
l3_wilkey_lab 1 8 0 0
rubina4 1 8 0 0
-----------------------------------------------------------------------------------------
lea_lab 53 212 13 56
arneram 0 0 1 2
brassl1 0 0 1 1
petersrm 53 212 9 36
songm6 0 0 1 1
wilsorm5 0 0 1 16
-----------------------------------------------------------------------------------------
maha 0 0 1 1
wardbm1 0 0 1 1
-----------------------------------------------------------------------------------------
mahmoud_group 0 0 1 128
amaraii 0 0 1 128
-----------------------------------------------------------------------------------------
mchaourab 0 0 114 114
kaot1 0 0 114 114
-----------------------------------------------------------------------------------------
moro_lab 1 2 0 0
moroa 1 2 0 0
-----------------------------------------------------------------------------------------
nasa_imqcam 23 736 181 5792
fangc7 23 736 181 5792
-----------------------------------------------------------------------------------------
nbody 9 157 63 694
ligo 7 28 59 182
smitm77 2 129 4 512
-----------------------------------------------------------------------------------------
ng_lab 1 8 0 0
kimj119 1 8 0 0
-----------------------------------------------------------------------------------------
p_csb_meiler 650 1694 61192 142633
arifovl 1 1 0 0
huntek1 516 516 14617 14617
moreljl 17 17 36049 36049
mothcw 0 0 1000 1000
tydingcw 116 1160 9049 90490
yange8 0 0 477 477
-----------------------------------------------------------------------------------------
p_dsi 0 0 2 4
yangi1 0 0 2 4
-----------------------------------------------------------------------------------------
p_englot_group 0 0 1 24
redaa1 0 0 1 24
-----------------------------------------------------------------------------------------
p_masi 80 320 0 0
lorenzas 80 320 0 0
-----------------------------------------------------------------------------------------
p_matheny_lab 8 38 185 925
koolajd1 8 38 185 925
-----------------------------------------------------------------------------------------
p_meiler 0 0 1 3
yange8 0 0 1 3
-----------------------------------------------------------------------------------------
rer 2 40 7 46
cantrekb 0 0 2 6
hum6 1 16 0 0
karomnj 0 0 5 40
wonge7 1 24 0 0
-----------------------------------------------------------------------------------------
r_isde 1 4 0 0
trippej1 1 4 0 0
-----------------------------------------------------------------------------------------
rke_group 2 12 0 0
sleethmr 1 4 0 0
yangz31 1 8 0 0
-----------------------------------------------------------------------------------------
rokaslab 81 96 348 348
copea1 80 80 348 348
riedlio 1 16 0 0
-----------------------------------------------------------------------------------------
rubinov_lab 0 0 1 4
rubinom 0 0 1 4
-----------------------------------------------------------------------------------------
ruderferlab 100 100 457 470
palmesa3 100 100 456 464
paull2 0 0 1 6
-----------------------------------------------------------------------------------------
sbcs 37 37 704 1406
liq17 37 37 1 1
lyul1 0 0 701 1402
xus15 0 0 2 3
-----------------------------------------------------------------------------------------
taylor_group 1 3 0 0
petrop3 1 3 0 0
-----------------------------------------------------------------------------------------
vgi 2 10 1 5
parkj71 1 6 1 5
salerl1 1 4 0 0
-----------------------------------------------------------------------------------------
walker_lab 2 18 0 0
davishl4 1 16 0 0
deanrt 1 2 0 0
-----------------------------------------------------------------------------------------
wankowicz_lab 1553 1553 9710 9710
wankows 1553 1553 9710 9710
-----------------------------------------------------------------------------------------
williams_roberson_lab 1 1 0 0
yeohb1 1 1 0 0
-----------------------------------------------------------------------------------------
womelsdorf_lab 1 10 0 0
gerritcg 1 10 0 0
-----------------------------------------------------------------------------------------
yang_lab_csb 2 36 0 0
zhengm9 2 36 0 0
-----------------------------------------------------------------------------------------
Totals: 3401 10982 80131 178448
Queue Summary (Batch GPU)
GROUP USER ACTIVE_JOBS ACTIVE_GPUS PENDING_JOBS PENDING_GPUS
-----------------------------------------------------------------------------------------
accre_guests_acc 3 3 0 0
liy110 1 1 0 0
whitejt2 2 2 0 0
-----------------------------------------------------------------------------------------
csb_gpu_acc 6 8 0 0
arifovl 1 1 0 0
dongj11 1 1 0 0
karadim 1 1 0 0
lybrantp 1 1 0 0
walkeas2 2 4 0 0
-----------------------------------------------------------------------------------------
h_oguz_lab_acc 2 2 1 1
wanj119 2 2 1 1
-----------------------------------------------------------------------------------------
maple_lab_acc 0 0 1 1
lif12 0 0 1 1
-----------------------------------------------------------------------------------------
mccabe_gpu_acc 0 0 1 1
jonec68 0 0 1 1
-----------------------------------------------------------------------------------------
mchaourab_acc 0 0 114 114
kaot1 0 0 114 114
-----------------------------------------------------------------------------------------
p_dsi_acc 0 0 10 10
yangi1 0 0 10 10
-----------------------------------------------------------------------------------------
p_meiler_acc 1 4 0 0
agarwm5 1 4 0 0
-----------------------------------------------------------------------------------------
psychology_gpu_acc 0 0 3 3
gerritcg 0 0 3 3
-----------------------------------------------------------------------------------------
taylor_group_acc 2 2 0 0
laaln1 2 2 0 0
-----------------------------------------------------------------------------------------
Totals: 14 19 130 130
Queue Summary (interactive)
GROUP USER ACTIVE_JOBS ACTIVE_CORES PENDING_JOBS PENDING_CORES
-----------------------------------------------------------------------------------------
candelaria_group_int 1 4 0 0
shimozkk 1 4 0 0
-----------------------------------------------------------------------------------------
cgg_int 1 8 0 0
molinp2 1 8 0 0
-----------------------------------------------------------------------------------------
h_cutting_lab_int 1 2 0 0
huertan 1 2 0 0
-----------------------------------------------------------------------------------------
l3_precision_nutrition_lab_int 1 128 0 0
baghem1 1 128 0 0
-----------------------------------------------------------------------------------------
maiziezhou_lab_int 12 60 78 390
tangk10 12 60 78 390
-----------------------------------------------------------------------------------------
rubinov_lab_int 3 16 0 0
rubinom 2 6 0 0
sardarn 1 10 0 0
-----------------------------------------------------------------------------------------
vgi_int 1 2 0 0
shellejp 1 2 0 0
-----------------------------------------------------------------------------------------
Totals: 20 220 78 390
Queue Summary (interactive_gpu)
GROUP USER ACTIVE_JOBS ACTIVE_GPUS PENDING_JOBS PENDING_GPUS
-----------------------------------------------------------------------------------------
dsi_dgx_iacc 6 9 0 0
criswea 1 1 0 0
mohamb2 1 4 0 0
samkn 1 1 0 0
schultls 2 2 0 0
wut18 1 1 0 0
-----------------------------------------------------------------------------------------
p_matheny_lab_iacc 1 2 0 0
koolajd1 1 2 0 0
-----------------------------------------------------------------------------------------
Totals: 7 11 0 0
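Note on the Totals lines: in the queue summaries above, each Totals line appears to be the column-wise sum of the group rows, where a group row is the first row after each dashed separator and the rows beneath it are that group's users. The sketch below re-derives the totals for the interactive_gpu table under that assumption. The function name summarize and the variable SAMPLE are hypothetical; they are not part of whatever tool generates this report.

```python
# Sketch only: assumes the first row after each dashed separator is a group row
# and that every data row ends in four integer columns.
SAMPLE = """\
GROUP USER ACTIVE_JOBS ACTIVE_GPUS PENDING_JOBS PENDING_GPUS
-----------------------------------------------------------------------------------------
dsi_dgx_iacc 6 9 0 0
criswea 1 1 0 0
mohamb2 1 4 0 0
samkn 1 1 0 0
schultls 2 2 0 0
wut18 1 1 0 0
-----------------------------------------------------------------------------------------
p_matheny_lab_iacc 1 2 0 0
koolajd1 1 2 0 0
-----------------------------------------------------------------------------------------
Totals: 7 11 0 0
"""


def summarize(table_text):
    """Return (computed, reported) totals for one flattened queue summary table."""
    computed = [0, 0, 0, 0]
    reported = None
    next_is_group = False
    for raw in table_text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if set(line) == {"-"}:
            # A dashed separator means the next data row is a group row.
            next_is_group = True
            continue
        fields = line.split()
        if fields[0] == "Totals:":
            reported = [int(x) for x in fields[1:5]]
            next_is_group = False
            continue
        if next_is_group:
            counts = [int(x) for x in fields[1:5]]
            computed = [a + b for a, b in zip(computed, counts)]
            next_is_group = False
        # User rows are skipped; they are already counted in their group row.
    return computed, reported


computed, reported = summarize(SAMPLE)
print(computed, reported)
```

User rows are deliberately ignored because adding them to the group rows would count every job twice.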
Partition Summary
PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
interactive up 14-00:00:0 7 mix cn[1287,1301,1328-1329,1812-1814]
interactive up 14-00:00:0 2 alloc cn[1302,1800]
interactive up 14-00:00:0 21 idle cn[1322-1326,1330,1707,1801-1811,1815-1817]
batch* up 14-00:00:0 5 mix- cn[1357,1366,1401,1568,1570]
batch* up 14-00:00:0 2 drain* cn[1526,1540]
batch* up 14-00:00:0 2 down* cn[1225,1608]
batch* up 14-00:00:0 1 drng cn1708
batch* up 14-00:00:0 3 drain cn[1609,1630,1704]
batch* up 14-00:00:0 128 mix cn[1202-1203,1205-1206,1208-1211,1213,1216-1218,1220-1224,1226-1228,1230,1232-1233,1235,1237-1238,1242,1262,1264-1265,1272,1280-1281,1283,1288,1295,1297-1299,1361,1364-1365,1378,1380-1385,1387,1397,1399,1443,1454-1456,1468,1470-1477,1481,1485-1486,1488-1492,1495-1496,1498,1503,1505,1508-1509,1512-1513,1525,1527,1529,1531-1532,1535-1538,1543-1544,1546,1548,1551,1562,1565,1571,1573-1574,1576-1578,1582-1583,1586,1588,1592-1593,1598-1600,1603,1613,1617-1618,1621,1627-1629,1633,1700-1703,1709,2000]
batch* up 14-00:00:0 247 alloc cn[1204,1207,1212,1215,1219,1229,1231,1234,1236,1239-1241,1257-1261,1266-1271,1273-1275,1277-1279,1282,1284-1286,1289-1294,1296,1303-1318,1320-1321,1327,1331-1354,1358-1360,1362-1363,1367-1369,1371-1377,1379,1388-1396,1398,1400,1402-1411,1414-1427,1430-1432,1434-1439,1441-1442,1445-1450,1452-1453,1457-1458,1460-1464,1466-1467,1469,1478-1480,1482-1483,1487,1493-1494,1497,1499-1502,1504,1506-1507,1510-1511,1514-1520,1522-1524,1528,1530,1533-1534,1545,1547,1549-1550,1552-1559,1561,1563-1564,1566-1567,1569,1575,1579-1581,1584-1585,1587,1589,1594-1597,1601-1602,1604-1607,1610,1612,1614-1616,1619-1620,1622-1626,1631-1632,1706,1710]
batch* up 14-00:00:0 2 down cn[1484,1705]
batch_gpu up 14-00:00:0 4 inval gpu[0200-0203]
batch_gpu up 14-00:00:0 9 mix gpu[0059,0062,0068,0081,0300,0303],gracehopper[01-02],hgx03
batch_gpu up 14-00:00:0 31 idle gpu[0063-0067,0069-0080,0082,0084-0085,0208,0301-0302,0304-0310],hgx02
interactive_gpu up 14-00:00:0 3 mix dgx[01,03],gpu0207
interactive_gpu up 14-00:00:0 4 idle dgx04,gpu[0058,0060-0061]
sam up 2-02:00:00 1 alloc cms-sam-01
sam up 2-02:00:00 1 idle cms-sam-02
reserved inact infinite 0 n/a
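Note on NODELIST notation: the NODELIST column uses a compressed bracket form, for example cn[1287,1301,1328-1329,1812-1814], and the NODES column gives the number of hosts that form expands to. On a Slurm system, scontrol show hostnames performs this expansion. The sketch below is a simplified stand-in for reading the report offline; the function name expand_hostlist is hypothetical, and it only handles a single bracket group at the end of each name, which is the only form that appears in this report.

```python
import re


def expand_hostlist(nodelist):
    """Expand e.g. 'cn[1287,1328-1329],hgx03' into individual host names.

    Simplified sketch: supports 'prefix[a,b-c,...]' and plain names only.
    """
    hosts = []
    # Split on commas that sit outside brackets by matching whole entries.
    for part in re.findall(r"[^,\[]+(?:\[[^\]]*\])?", nodelist):
        m = re.fullmatch(r"([^\[]+)\[([^\]]+)\]", part)
        if not m:
            hosts.append(part)  # plain host name such as 'hgx03'
            continue
        prefix, body = m.groups()
        for item in body.split(","):
            lo, _, hi = item.partition("-")
            hi = hi or lo  # a single number is a range of length one
            width = len(lo)  # keep zero padding, e.g. '0059'
            for n in range(int(lo), int(hi) + 1):
                hosts.append(f"{prefix}{n:0{width}d}")
    return hosts


interactive_mix = expand_hostlist("cn[1287,1301,1328-1329,1812-1814]")
print(len(interactive_mix), interactive_mix)
```

The expanded length can be compared against the NODES column as a quick consistency check on a partition row.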