[00:01:47] (Merged) jenkins-bot: Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - https://gerrit.wikimedia.org/r/1071334 (owner: TrainBranchBot) [00:03:14] PROBLEM - PyBal backends health check on lvs1020 is CRITICAL: PYBAL CRITICAL - CRITICAL - mw-wikifunctions_4451: Servers parse1011.eqiad.wmnet, kubernetes1025.eqiad.wmnet, mw1367.eqiad.wmnet, mw1442.eqiad.wmnet, mw1386.eqiad.wmnet, mw1462.eqiad.wmnet, mw1484.eqiad.wmnet, kubernetes1030.eqiad.wmnet, mw1393.eqiad.wmnet, mw1488.eqiad.wmnet, mw1454.eqiad.wmnet, parse1010.eqiad.wmnet, wikikube-worker1003.eqiad.wmnet, kubernetes1017.eqiad.wmnet [00:03:14] .eqiad.wmnet, kubernetes1012.eqiad.wmnet, wikikube-worker1009.eqiad.wmnet, mw1483.eqiad.wmnet, mw1419.eqiad.wmnet, kubernetes1059.eqiad.wmnet, mw1469.eqiad.wmnet, wikikube-worker1021.eqiad.wmnet, kubernetes1005.eqiad.wmnet, mw1486.eqiad.wmnet, kubernetes1058.eqiad.wmnet, mw1356.eqiad.wmnet, wikikube-worker1001.eqiad.wmnet, kubernetes1018.eqiad.wmnet, mw1458.eqiad.wmnet, mw1371.eqiad.wmnet, mw1468.eqiad.wmnet, wikikube-worker1010.eqiad.wmn [00:03:14] rnetes1015.eqiad.wmnet, kubernetes1008.eqiad.wmnet, kubernetes1031.eqiad.wmnet, mw1439.eqiad.wmnet, parse1021.eqiad.wmnet, parse1003.eqiad.wmnet, mw1431.eqiad.wmnet, mw1355.eqiad.wmnet, https://wikitech.wikimedia.org/wiki/PyBal [00:03:16] PROBLEM - PyBal backends health check on lvs1019 is CRITICAL: PYBAL CRITICAL - CRITICAL - mw-wikifunctions_4451: Servers kubernetes1010.eqiad.wmnet, mw1433.eqiad.wmnet, mw1380.eqiad.wmnet, mw1367.eqiad.wmnet, mw1442.eqiad.wmnet, mw1388.eqiad.wmnet, mw1480.eqiad.wmnet, kubernetes1030.eqiad.wmnet, kubernetes1038.eqiad.wmnet, mw1424.eqiad.wmnet, mw1488.eqiad.wmnet, mw1454.eqiad.wmnet, parse1010.eqiad.wmnet, parse1005.eqiad.wmnet, mw1408.eqia [00:03:16] mw1370.eqiad.wmnet, mw1389.eqiad.wmnet, kubernetes1017.eqiad.wmnet, kubernetes1014.eqiad.wmnet, wikikube-worker1009.eqiad.wmnet, kubernetes1018.eqiad.wmnet, mw1369.eqiad.wmnet, kubernetes1059.eqiad.wmnet, 
mw1469.eqiad.wmnet, mw1394.eqiad.wmnet, kubernetes1005.eqiad.wmnet, kubernetes1058.eqiad.wmnet, mw1360.eqiad.wmnet, mw1356.eqiad.wmnet, mw1458.eqiad.wmnet, mw1453.eqiad.wmnet, wikikube-worker1024.eqiad.wmnet, mw1468.eqiad.wmnet, kuberne [00:03:16] eqiad.wmnet, wikikube-worker1010.eqiad.wmnet, kubernetes1019.eqiad.wmnet, kubernetes1031.eqiad.wmnet, kubernetes1024.eqiad.wmnet, mw1464.eqiad.wmnet, parse1019.eqiad.wmnet, mw1381.eqiad https://wikitech.wikimedia.org/wiki/PyBal [00:04:16] RECOVERY - PyBal backends health check on lvs1019 is OK: PYBAL OK - All pools are healthy https://wikitech.wikimedia.org/wiki/PyBal [00:06:14] RECOVERY - PyBal backends health check on lvs1020 is OK: PYBAL OK - All pools are healthy https://wikitech.wikimedia.org/wiki/PyBal [00:09:14] PROBLEM - PyBal backends health check on lvs1020 is CRITICAL: PYBAL CRITICAL - CRITICAL - mw-wikifunctions_4451: Servers parse1013.eqiad.wmnet, mw1380.eqiad.wmnet, mw1419.eqiad.wmnet, mw1442.eqiad.wmnet, mw1415.eqiad.wmnet, mw1484.eqiad.wmnet, mw1405.eqiad.wmnet, mw1399.eqiad.wmnet, mw1424.eqiad.wmnet, mw1370.eqiad.wmnet, mw1395.eqiad.wmnet, kubernetes1033.eqiad.wmnet, kubernetes1014.eqiad.wmnet, wikikube-worker1009.eqiad.wmnet, mw1483.eq [00:09:14] t, mw1469.eqiad.wmnet, mw1360.eqiad.wmnet, parse1001.eqiad.wmnet, wikikube-worker1024.eqiad.wmnet, kubernetes1008.eqiad.wmnet, kubernetes1031.eqiad.wmnet, mw1452.eqiad.wmnet, parse1006.eqiad.wmnet, parse1003.eqiad.wmnet, mw1472.eqiad.wmnet, mw1451.eqiad.wmnet, mw1379.eqiad.wmnet, mw1416.eqiad.wmnet, wikikube-worker1002.eqiad.wmnet, parse1014.eqiad.wmnet, parse1007.eqiad.wmnet, mw1374.eqiad.wmnet, wikikube-worker1013.eqiad.wmnet, mw1439.eq [00:09:14] t, wikikube-worker1032.eqiad.wmnet, mw1482.eqiad.wmnet, wikikube-worker1012.eqiad.wmnet, kubernetes1016.eqiad.wmnet, mw1371.eqiad.wmnet, mw1461.eqiad.wmnet, wikikube-worker1018.eqiad.wm https://wikitech.wikimedia.org/wiki/PyBal [00:09:16] PROBLEM - PyBal backends health check on lvs1019 is CRITICAL: PYBAL CRITICAL - 
CRITICAL - mw-wikifunctions_4451: Servers kubernetes1010.eqiad.wmnet, parse1011.eqiad.wmnet, kubernetes1025.eqiad.wmnet, mw1442.eqiad.wmnet, mw1386.eqiad.wmnet, mw1470.eqiad.wmnet, mw1388.eqiad.wmnet, kubernetes1030.eqiad.wmnet, mw1424.eqiad.wmnet, parse1010.eqiad.wmnet, mw1408.eqiad.wmnet, kubernetes1014.eqiad.wmnet, mw1483.eqiad.wmnet, mw1369.eqiad.wmnet, mw1 [00:09:16] d.wmnet, mw1394.eqiad.wmnet, mw1360.eqiad.wmnet, mw1458.eqiad.wmnet, parse1012.eqiad.wmnet, mw1468.eqiad.wmnet, kubernetes1019.eqiad.wmnet, mw1464.eqiad.wmnet, mw1381.eqiad.wmnet, mw1431.eqiad.wmnet, mw1355.eqiad.wmnet, mw1472.eqiad.wmnet, wikikube-worker1011.eqiad.wmnet, kubernetes1036.eqiad.wmnet, mw1416.eqiad.wmnet, mw1354.eqiad.wmnet, wikikube-worker1007.eqiad.wmnet, parse1014.eqiad.wmnet, parse1007.eqiad.wmnet, kubernetes1022.eqiad.w [00:09:16] 1478.eqiad.wmnet, kubernetes1037.eqiad.wmnet, mw1384.eqiad.wmnet, wikikube-worker1032.eqiad.wmnet, mw1390.eqiad.wmnet, mw1476.eqiad.wmnet, mw1449.eqiad.wmnet, mw1495.eqiad.wmnet, mw1371 https://wikitech.wikimedia.org/wiki/PyBal [00:12:14] RECOVERY - PyBal backends health check on lvs1020 is OK: PYBAL OK - All pools are healthy https://wikitech.wikimedia.org/wiki/PyBal [00:12:16] RECOVERY - PyBal backends health check on lvs1019 is OK: PYBAL OK - All pools are healthy https://wikitech.wikimedia.org/wiki/PyBal [00:22:14] PROBLEM - PyBal backends health check on lvs1020 is CRITICAL: PYBAL CRITICAL - CRITICAL - mw-wikifunctions_4451: Servers parse1011.eqiad.wmnet, mw1380.eqiad.wmnet, mw1442.eqiad.wmnet, mw1434.eqiad.wmnet, wikikube-worker1029.eqiad.wmnet, mw1386.eqiad.wmnet, mw1479.eqiad.wmnet, mw1462.eqiad.wmnet, mw1388.eqiad.wmnet, mw1480.eqiad.wmnet, parse1009.eqiad.wmnet, mw1484.eqiad.wmnet, mw1405.eqiad.wmnet, mw1424.eqiad.wmnet, mw1393.eqiad.wmnet, mw [00:22:14] ad.wmnet, mw1454.eqiad.wmnet, parse1010.eqiad.wmnet, mw1408.eqiad.wmnet, mw1389.eqiad.wmnet, kubernetes1017.eqiad.wmnet, mw1425.eqiad.wmnet, mw1395.eqiad.wmnet, mw1466.eqiad.wmnet, 
kubernetes1018.eqiad.wmnet, mw1369.eqiad.wmnet, mw1394.eqiad.wmnet, kubernetes1005.eqiad.wmnet, mw1486.eqiad.wmnet, kubernetes1058.eqiad.wmnet, wikikube-worker1001.eqiad.wmnet, parse1012.eqiad.wmnet, kubernetes1028.eqiad.wmnet, kubernetes1015.eqiad.wmnet, kuber [00:22:14] 9.eqiad.wmnet, mw1464.eqiad.wmnet, parse1019.eqiad.wmnet, parse1003.eqiad.wmnet, mw1352.eqiad.wmnet, parse1006.eqiad.wmnet, wikikube-worker1028.eqiad.wmnet, mw1472.eqiad.wmnet, parse102 https://wikitech.wikimedia.org/wiki/PyBal [00:22:16] PROBLEM - PyBal backends health check on lvs1019 is CRITICAL: PYBAL CRITICAL - CRITICAL - mw-wikifunctions_4451: Servers parse1011.eqiad.wmnet, parse1013.eqiad.wmnet, mw1380.eqiad.wmnet, mw1367.eqiad.wmnet, mw1442.eqiad.wmnet, mw1470.eqiad.wmnet, mw1462.eqiad.wmnet, mw1484.eqiad.wmnet, mw1405.eqiad.wmnet, mw1399.eqiad.wmnet, kubernetes1012.eqiad.wmnet, mw1454.eqiad.wmnet, mw1408.eqiad.wmnet, mw1370.eqiad.wmnet, mw1389.eqiad.wmnet, mw1425. [00:22:16] net, mw1395.eqiad.wmnet, kubernetes1033.eqiad.wmnet, kubernetes1018.eqiad.wmnet, kubernetes1059.eqiad.wmnet, mw1469.eqiad.wmnet, wikikube-worker1021.eqiad.wmnet, kubernetes1058.eqiad.wmnet, mw1360.eqiad.wmnet, wikikube-worker1001.eqiad.wmnet, mw1483.eqiad.wmnet, parse1001.eqiad.wmnet, parse1012.eqiad.wmnet, mw1453.eqiad.wmnet, wikikube-worker1024.eqiad.wmnet, kubernetes1028.eqiad.wmnet, wikikube-worker1010.eqiad.wmnet, kubernetes1015.eqia [00:22:16] kubernetes1008.eqiad.wmnet, kubernetes1024.eqiad.wmnet, mw1452.eqiad.wmnet, mw1464.eqiad.wmnet, parse1019.eqiad.wmnet, parse1021.eqiad.wmnet, wikikube-worker1028.eqiad.wmnet, mw1431.eq https://wikitech.wikimedia.org/wiki/PyBal [00:24:14] RECOVERY - PyBal backends health check on lvs1020 is OK: PYBAL OK - All pools are healthy https://wikitech.wikimedia.org/wiki/PyBal [00:24:16] RECOVERY - PyBal backends health check on lvs1019 is OK: PYBAL OK - All pools are healthy https://wikitech.wikimedia.org/wiki/PyBal [00:51:57] FIRING: RoutinatorRTRConnections: Important drop of 
Routinator RTR connections on rpki2002:9556 - https://wikitech.wikimedia.org/wiki/RPKI#RTR_Connections_drop - https://grafana.wikimedia.org/d/UwUa77GZk/rpki - https://alerts.wikimedia.org/?q=alertname%3DRoutinatorRTRConnections [01:39:14] PROBLEM - PyBal backends health check on lvs1020 is CRITICAL: PYBAL CRITICAL - CRITICAL - mw-wikifunctions_4451: Servers kubernetes1010.eqiad.wmnet, mw1433.eqiad.wmnet, kubernetes1025.eqiad.wmnet, mw1419.eqiad.wmnet, mw1442.eqiad.wmnet, wikikube-worker1029.eqiad.wmnet, mw1386.eqiad.wmnet, mw1479.eqiad.wmnet, mw1430.eqiad.wmnet, parse1009.eqiad.wmnet, kubernetes1030.eqiad.wmnet, mw1435.eqiad.wmnet, parse1010.eqiad.wmnet, wikikube-worker100 [01:39:14] wmnet, mw1425.eqiad.wmnet, kubernetes1012.eqiad.wmnet, mw1466.eqiad.wmnet, wikikube-worker1009.eqiad.wmnet, mw1483.eqiad.wmnet, mw1469.eqiad.wmnet, kubernetes1005.eqiad.wmnet, mw1486.eqiad.wmnet, mw1371.eqiad.wmnet, parse1012.eqiad.wmnet, mw1453.eqiad.wmnet, wikikube-worker1024.eqiad.wmnet, parse1006.eqiad.wmnet, kubernetes1015.eqiad.wmnet, kubernetes1008.eqiad.wmnet, kubernetes1019.eqiad.wmnet, mw1464.eqiad.wmnet, parse1019.eqiad.wmnet, [01:39:14] es1056.eqiad.wmnet, mw1352.eqiad.wmnet, mw1441.eqiad.wmnet, mw1431.eqiad.wmnet, mw1355.eqiad.wmnet, parse1022.eqiad.wmnet, mw1376.eqiad.wmnet, kubernetes1039.eqiad.wmnet, mw1409.eqiad.w https://wikitech.wikimedia.org/wiki/PyBal [01:39:18] PROBLEM - PyBal backends health check on lvs1019 is CRITICAL: PYBAL CRITICAL - CRITICAL - mw-wikifunctions_4451: Servers kubernetes1010.eqiad.wmnet, mw1419.eqiad.wmnet, mw1442.eqiad.wmnet, mw1479.eqiad.wmnet, mw1462.eqiad.wmnet, mw1415.eqiad.wmnet, mw1388.eqiad.wmnet, mw1480.eqiad.wmnet, mw1405.eqiad.wmnet, parse1021.eqiad.wmnet, mw1488.eqiad.wmnet, mw1454.eqiad.wmnet, parse1010.eqiad.wmnet, parse1005.eqiad.wmnet, mw1389.eqiad.wmnet, mw13 [01:39:18] .wmnet, kubernetes1033.eqiad.wmnet, kubernetes1014.eqiad.wmnet, mw1466.eqiad.wmnet, mw1483.eqiad.wmnet, mw1369.eqiad.wmnet, kubernetes1059.eqiad.wmnet, 
mw1469.eqiad.wmnet, mw1394.eqiad.wmnet, kubernetes1058.eqiad.wmnet, mw1360.eqiad.wmnet, kubernetes1018.eqiad.wmnet, mw1458.eqiad.wmnet, parse1001.eqiad.wmnet, mw1468.eqiad.wmnet, kubernetes1028.eqiad.wmnet, kubernetes1015.eqiad.wmnet, kubernetes1024.eqiad.wmnet, parse1019.eqiad.wmnet, mw13 [01:39:18] .wmnet, mw1391.eqiad.wmnet, kubernetes1056.eqiad.wmnet, mw1352.eqiad.wmnet, mw1441.eqiad.wmnet, mw1431.eqiad.wmnet, mw1355.eqiad.wmnet, mw1472.eqiad.wmnet, parse1022.eqiad.wmnet, mw1376 https://wikitech.wikimedia.org/wiki/PyBal [01:50:14] RECOVERY - PyBal backends health check on lvs1020 is OK: PYBAL OK - All pools are healthy https://wikitech.wikimedia.org/wiki/PyBal [01:53:14] PROBLEM - PyBal backends health check on lvs1020 is CRITICAL: PYBAL CRITICAL - CRITICAL - mw-wikifunctions_4451: Servers parse1011.eqiad.wmnet, mw1380.eqiad.wmnet, mw1419.eqiad.wmnet, mw1434.eqiad.wmnet, wikikube-worker1029.eqiad.wmnet, mw1470.eqiad.wmnet, mw1480.eqiad.wmnet, parse1009.eqiad.wmnet, mw1484.eqiad.wmnet, mw1393.eqiad.wmnet, mw1488.eqiad.wmnet, mw1370.eqiad.wmnet, mw1389.eqiad.wmnet, mw1465.eqiad.wmnet, kubernetes1005.eqiad.w [01:53:14] 1486.eqiad.wmnet, kubernetes1058.eqiad.wmnet, wikikube-worker1001.eqiad.wmnet, mw1371.eqiad.wmnet, mw1453.eqiad.wmnet, mw1352.eqiad.wmnet, mw1472.eqiad.wmnet, mw1451.eqiad.wmnet, kubernetes1039.eqiad.wmnet, mw1409.eqiad.wmnet, mw1383.eqiad.wmnet, kubernetes1057.eqiad.wmnet, mw1416.eqiad.wmnet, kubernetes1023.eqiad.wmnet, wikikube-worker1002.eqiad.wmnet, mw1354.eqiad.wmnet, wikikube-worker1007.eqiad.wmnet, parse1014.eqiad.wmnet, mw1374.eqi [01:53:14] , wikikube-worker1013.eqiad.wmnet, mw1439.eqiad.wmnet, mw1432.eqiad.wmnet, kubernetes1022.eqiad.wmnet, mw1387.eqiad.wmnet, kubernetes1040.eqiad.wmnet, kubernetes1006.eqiad.wmnet, mw1449 https://wikitech.wikimedia.org/wiki/PyBal [01:56:14] RECOVERY - PyBal backends health check on lvs1020 is OK: PYBAL OK - All pools are healthy https://wikitech.wikimedia.org/wiki/PyBal [01:56:18] RECOVERY - PyBal backends 
health check on lvs1019 is OK: PYBAL OK - All pools are healthy https://wikitech.wikimedia.org/wiki/PyBal [02:20:57] FIRING: [2x] RoutinatorRsyncErrors: Routinator rsync fetching issue in codfw - https://wikitech.wikimedia.org/wiki/RPKI#RSYNC_status - https://grafana.wikimedia.org/d/UwUa77GZk/rpki - https://alerts.wikimedia.org/?q=alertname%3DRoutinatorRsyncErrors [02:36:13] FIRING: JobUnavailable: Reduced availability for job sidekiq in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable [03:00:43] RESOLVED: JobUnavailable: Reduced availability for job sidekiq in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable [03:01:10] PROBLEM - Host an-worker1168 is DOWN: PING CRITICAL - Packet loss = 100% [03:06:42] RECOVERY - Host an-worker1168 is UP: PING OK - Packet loss = 0%, RTA = 0.28 ms [03:12:14] PROBLEM - PyBal backends health check on lvs1020 is CRITICAL: PYBAL CRITICAL - CRITICAL - mw-wikifunctions_4451: Servers parse1011.eqiad.wmnet, mw1433.eqiad.wmnet, kubernetes1025.eqiad.wmnet, mw1367.eqiad.wmnet, mw1442.eqiad.wmnet, wikikube-worker1029.eqiad.wmnet, mw1479.eqiad.wmnet, kubernetes1023.eqiad.wmnet, mw1462.eqiad.wmnet, mw1430.eqiad.wmnet, mw1415.eqiad.wmnet, mw1480.eqiad.wmnet, mw1399.eqiad.wmnet, kubernetes1038.eqiad.wmnet, m [03:12:14] iad.wmnet, mw1395.eqiad.wmnet, mw1488.eqiad.wmnet, parse1010.eqiad.wmnet, mw1408.eqiad.wmnet, kubernetes1012.eqiad.wmnet, mw1466.eqiad.wmnet, wikikube-worker1009.eqiad.wmnet, mw1369.eqiad.wmnet, mw1419.eqiad.wmnet, kubernetes1059.eqiad.wmnet, kubernetes1005.eqiad.wmnet, mw1486.eqiad.wmnet, wikikube-worker1001.eqiad.wmnet, mw1458.eqiad.wmnet, mw1371.eqiad.wmnet, mw1453.eqiad.wmnet, wikikube-worker1024.eqiad.wmnet, 
kubernetes1028.eqiad.wmne [03:12:14] netes1019.eqiad.wmnet, kubernetes1024.eqiad.wmnet, kubernetes1062.eqiad.wmnet, mw1381.eqiad.wmnet, mw1391.eqiad.wmnet, mw1441.eqiad.wmnet, parse1006.eqiad.wmnet, mw1355.eqiad.wmnet, par https://wikitech.wikimedia.org/wiki/PyBal [03:12:18] PROBLEM - PyBal backends health check on lvs1019 is CRITICAL: PYBAL CRITICAL - CRITICAL - mw-wikifunctions_4451: Servers parse1013.eqiad.wmnet, mw1380.eqiad.wmnet, mw1442.eqiad.wmnet, mw1434.eqiad.wmnet, mw1386.eqiad.wmnet, mw1479.eqiad.wmnet, mw1462.eqiad.wmnet, mw1430.eqiad.wmnet, mw1388.eqiad.wmnet, mw1399.eqiad.wmnet, mw1393.eqiad.wmnet, mw1454.eqiad.wmnet, parse1010.eqiad.wmnet, mw1408.eqiad.wmnet, mw1389.eqiad.wmnet, mw1425.eqiad.wm [03:12:18] ikube-worker1009.eqiad.wmnet, kubernetes1018.eqiad.wmnet, mw1369.eqiad.wmnet, kubernetes1059.eqiad.wmnet, kubernetes1005.eqiad.wmnet, mw1486.eqiad.wmnet, kubernetes1058.eqiad.wmnet, mw1360.eqiad.wmnet, mw1356.eqiad.wmnet, wikikube-worker1001.eqiad.wmnet, mw1371.eqiad.wmnet, mw1453.eqiad.wmnet, wikikube-worker1024.eqiad.wmnet, wikikube-worker1010.eqiad.wmnet, kubernetes1008.eqiad.wmnet, kubernetes1019.eqiad.wmnet, kubernetes1031.eqiad.wmne [03:12:18] netes1024.eqiad.wmnet, mw1439.eqiad.wmnet, mw1391.eqiad.wmnet, kubernetes1056.eqiad.wmnet, mw1352.eqiad.wmnet, mw1441.eqiad.wmnet, parse1006.eqiad.wmnet, wikikube-worker1028.eqiad.wmnet https://wikitech.wikimedia.org/wiki/PyBal [03:16:14] RECOVERY - PyBal backends health check on lvs1020 is OK: PYBAL OK - All pools are healthy https://wikitech.wikimedia.org/wiki/PyBal [03:16:18] RECOVERY - PyBal backends health check on lvs1019 is OK: PYBAL OK - All pools are healthy https://wikitech.wikimedia.org/wiki/PyBal [03:27:14] PROBLEM - PyBal backends health check on lvs1020 is CRITICAL: PYBAL CRITICAL - CRITICAL - mw-wikifunctions_4451: Servers parse1013.eqiad.wmnet, mw1380.eqiad.wmnet, kubernetes1025.eqiad.wmnet, mw1367.eqiad.wmnet, mw1442.eqiad.wmnet, wikikube-worker1029.eqiad.wmnet, mw1386.eqiad.wmnet, 
mw1433.eqiad.wmnet, mw1470.eqiad.wmnet, mw1430.eqiad.wmnet, mw1415.eqiad.wmnet, mw1388.eqiad.wmnet, mw1480.eqiad.wmnet, parse1009.eqiad.wmnet, mw1405.eqiad.w [03:27:14] 1399.eqiad.wmnet, mw1391.eqiad.wmnet, mw1435.eqiad.wmnet, kubernetes1012.eqiad.wmnet, mw1488.eqiad.wmnet, mw1454.eqiad.wmnet, parse1005.eqiad.wmnet, wikikube-worker1003.eqiad.wmnet, kubernetes1017.eqiad.wmnet, mw1395.eqiad.wmnet, mw1466.eqiad.wmnet, kubernetes1018.eqiad.wmnet, mw1369.eqiad.wmnet, mw1419.eqiad.wmnet, kubernetes1059.eqiad.wmnet, mw1469.eqiad.wmnet, kubernetes1005.eqiad.wmnet, kubernetes1058.eqiad.wmnet, mw1360.eqiad.wmnet, [03:27:14] qiad.wmnet, wikikube-worker1001.eqiad.wmnet, mw1458.eqiad.wmnet, mw1371.eqiad.wmnet, parse1012.eqiad.wmnet, mw1453.eqiad.wmnet, wikikube-worker1024.eqiad.wmnet, mw1468.eqiad.wmnet, pars https://wikitech.wikimedia.org/wiki/PyBal [03:27:18] PROBLEM - PyBal backends health check on lvs1019 is CRITICAL: PYBAL CRITICAL - CRITICAL - mw-wikifunctions_4451: Servers kubernetes1010.eqiad.wmnet, parse1011.eqiad.wmnet, mw1433.eqiad.wmnet, mw1380.eqiad.wmnet, mw1442.eqiad.wmnet, mw1479.eqiad.wmnet, kubernetes1023.eqiad.wmnet, mw1462.eqiad.wmnet, mw1484.eqiad.wmnet, mw1405.eqiad.wmnet, kubernetes1038.eqiad.wmnet, mw1435.eqiad.wmnet, mw1424.eqiad.wmnet, kubernetes1012.eqiad.wmnet, mw1454 [03:27:18] mnet, parse1010.eqiad.wmnet, parse1005.eqiad.wmnet, mw1408.eqiad.wmnet, mw1370.eqiad.wmnet, mw1389.eqiad.wmnet, mw1395.eqiad.wmnet, kubernetes1014.eqiad.wmnet, wikikube-worker1009.eqiad.wmnet, mw1369.eqiad.wmnet, wikikube-worker1021.eqiad.wmnet, mw1360.eqiad.wmnet, wikikube-worker1001.eqiad.wmnet, mw1371.eqiad.wmnet, mw1453.eqiad.wmnet, wikikube-worker1024.eqiad.wmnet, kubernetes1028.eqiad.wmnet, wikikube-worker1010.eqiad.wmnet, kubernete [03:27:18] iad.wmnet, mw1439.eqiad.wmnet, mw1464.eqiad.wmnet, parse1019.eqiad.wmnet, mw1381.eqiad.wmnet, parse1003.eqiad.wmnet, mw1382.eqiad.wmnet, mw1431.eqiad.wmnet, wikikube-worker1028.eqiad.wm https://wikitech.wikimedia.org/wiki/PyBal 
[03:29:14] RECOVERY - PyBal backends health check on lvs1020 is OK: PYBAL OK - All pools are healthy https://wikitech.wikimedia.org/wiki/PyBal [03:29:15] FIRING: MediaWikiHighErrorRate: Elevated rate of MediaWiki errors - kube-mw-web - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://grafana.wikimedia.org/d/000000438/mediawiki-exceptions-alerts?panelId=18&fullscreen&orgId=1&var-datasource=codfw%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate [03:29:18] RECOVERY - PyBal backends health check on lvs1019 is OK: PYBAL OK - All pools are healthy https://wikitech.wikimedia.org/wiki/PyBal [03:33:14] PROBLEM - PyBal backends health check on lvs1020 is CRITICAL: PYBAL CRITICAL - CRITICAL - mw-wikifunctions_4451: Servers mw1433.eqiad.wmnet, kubernetes1025.eqiad.wmnet, mw1367.eqiad.wmnet, mw1434.eqiad.wmnet, wikikube-worker1029.eqiad.wmnet, mw1386.eqiad.wmnet, mw1430.eqiad.wmnet, mw1484.eqiad.wmnet, kubernetes1030.eqiad.wmnet, mw1488.eqiad.wmnet, parse1005.eqiad.wmnet, mw1408.eqiad.wmnet, mw1389.eqiad.wmnet, kubernetes1012.eqiad.wmnet, k [03:33:14] s1033.eqiad.wmnet, kubernetes1014.eqiad.wmnet, mw1483.eqiad.wmnet, mw1369.eqiad.wmnet, kubernetes1059.eqiad.wmnet, kubernetes1005.eqiad.wmnet, mw1486.eqiad.wmnet, mw1360.eqiad.wmnet, parse1012.eqiad.wmnet, mw1453.eqiad.wmnet, kubernetes1028.eqiad.wmnet, wikikube-worker1010.eqiad.wmnet, kubernetes1008.eqiad.wmnet, kubernetes1031.eqiad.wmnet, kubernetes1062.eqiad.wmnet, parse1019.eqiad.wmnet, mw1381.eqiad.wmnet, mw1391.eqiad.wmnet, kubernet [03:33:14] qiad.wmnet, mw1355.eqiad.wmnet, parse1003.eqiad.wmnet, parse1022.eqiad.wmnet, wikikube-worker1031.eqiad.wmnet, kubernetes1039.eqiad.wmnet, kubernetes1026.eqiad.wmnet, kubernetes1036.eqi https://wikitech.wikimedia.org/wiki/PyBal [03:33:18] PROBLEM - PyBal backends health check on lvs1019 is CRITICAL: PYBAL CRITICAL - CRITICAL - mw-wikifunctions_4451: Servers kubernetes1025.eqiad.wmnet, mw1419.eqiad.wmnet, mw1442.eqiad.wmnet, 
mw1386.eqiad.wmnet, mw1479.eqiad.wmnet, mw1470.eqiad.wmnet, mw1430.eqiad.wmnet, mw1405.eqiad.wmnet, mw1424.eqiad.wmnet, mw1488.eqiad.wmnet, mw1454.eqiad.wmnet, parse1005.eqiad.wmnet, wikikube-worker1003.eqiad.wmnet, mw1370.eqiad.wmnet, mw1389.eqiad.wmne [03:33:18] netes1017.eqiad.wmnet, mw1465.eqiad.wmnet, kubernetes1033.eqiad.wmnet, kubernetes1014.eqiad.wmnet, mw1483.eqiad.wmnet, mw1369.eqiad.wmnet, mw1367.eqiad.wmnet, mw1469.eqiad.wmnet, kubernetes1058.eqiad.wmnet, wikikube-worker1001.eqiad.wmnet, mw1458.eqiad.wmnet, mw1371.eqiad.wmnet, wikikube-worker1024.eqiad.wmnet, kubernetes1028.eqiad.wmnet, wikikube-worker1010.eqiad.wmnet, kubernetes1019.eqiad.wmnet, kubernetes1031.eqiad.wmnet, kubernetes10 [03:33:18] .wmnet, mw1439.eqiad.wmnet, parse1019.eqiad.wmnet, mw1381.eqiad.wmnet, parse1021.eqiad.wmnet, kubernetes1056.eqiad.wmnet, mw1441.eqiad.wmnet, parse1006.eqiad.wmnet, parse1022.eqiad.wmne https://wikitech.wikimedia.org/wiki/PyBal [03:34:15] RESOLVED: [2x] MediaWikiHighErrorRate: Elevated rate of MediaWiki errors - kube-mw-web - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate [03:35:15] FIRING: [2x] MediaWikiHighErrorRate: Elevated rate of MediaWiki errors - kube-mw-web - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate [03:36:25] FIRING: [2x] SystemdUnitFailed: httpbb_kubernetes_mw-wikifunctions_hourly.service on cumin1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [03:37:04] PROBLEM - Check unit status of httpbb_kubernetes_mw-wikifunctions_hourly on cumin1002 is CRITICAL: CRITICAL: Status of the systemd unit httpbb_kubernetes_mw-wikifunctions_hourly https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state [03:41:14] RECOVERY - PyBal 
backends health check on lvs1020 is OK: PYBAL OK - All pools are healthy https://wikitech.wikimedia.org/wiki/PyBal [03:41:46] PROBLEM - Check unit status of httpbb_kubernetes_mw-wikifunctions_hourly on cumin2002 is CRITICAL: CRITICAL: Status of the systemd unit httpbb_kubernetes_mw-wikifunctions_hourly https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state [03:42:18] RECOVERY - PyBal backends health check on lvs1019 is OK: PYBAL OK - All pools are healthy https://wikitech.wikimedia.org/wiki/PyBal [03:49:45] FIRING: [2x] MediaWikiHighErrorRate: Elevated rate of MediaWiki errors - kube-mw-web - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate [03:54:45] FIRING: [2x] MediaWikiHighErrorRate: Elevated rate of MediaWiki errors - kube-mw-web - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate [04:31:46] RECOVERY - Check unit status of httpbb_kubernetes_mw-wikifunctions_hourly on cumin2002 is OK: OK: Status of the systemd unit httpbb_kubernetes_mw-wikifunctions_hourly https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state [04:36:25] RESOLVED: [2x] SystemdUnitFailed: httpbb_kubernetes_mw-wikifunctions_hourly.service on cumin1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [04:37:04] RECOVERY - Check unit status of httpbb_kubernetes_mw-wikifunctions_hourly on cumin1002 is OK: OK: Status of the systemd unit httpbb_kubernetes_mw-wikifunctions_hourly https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state [04:51:57] FIRING: RoutinatorRTRConnections: Important drop of Routinator RTR connections on rpki2002:9556 - https://wikitech.wikimedia.org/wiki/RPKI#RTR_Connections_drop - https://grafana.wikimedia.org/d/UwUa77GZk/rpki - 
https://alerts.wikimedia.org/?q=alertname%3DRoutinatorRTRConnections [05:14:12] RECOVERY - Backup freshness on backup1001 is OK: Fresh: 139 jobs https://wikitech.wikimedia.org/wiki/Bacula%23Monitoring [05:26:56] PROBLEM - BGP status on cr1-eqiad is CRITICAL: BGP CRITICAL - AS64605/IPv4: Active - Anycast, AS64605/IPv6: Active - Anycast https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status [05:29:44] PROBLEM - Juniper virtual chassis ports on asw2-d-eqiad is CRITICAL: CRIT: Down: 2 Unknown: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23VCP_status [06:04:21] FIRING: PoolcounterFullQueues: Full queues for poolcounter1004:9106 poolcounter - https://www.mediawiki.org/wiki/PoolCounter#Request_tracing_in_production - https://grafana.wikimedia.org/d/aIcYxuxZk/poolcounter?orgId=1&viewPanel=6&from=now-1h&to=now&var-dc=eqiad%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DPoolcounterFullQueues [06:09:21] RESOLVED: PoolcounterFullQueues: Full queues for poolcounter1004:9106 poolcounter - https://www.mediawiki.org/wiki/PoolCounter#Request_tracing_in_production - https://grafana.wikimedia.org/d/aIcYxuxZk/poolcounter?orgId=1&viewPanel=6&from=now-1h&to=now&var-dc=eqiad%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DPoolcounterFullQueues [06:20:06] PROBLEM - BGP status on cr1-eqiad is CRITICAL: BGP CRITICAL - AS64605/IPv4: Active - Anycast https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status [06:20:48] RECOVERY - Juniper virtual chassis ports on asw2-d-eqiad is OK: OK: UP: 20 https://wikitech.wikimedia.org/wiki/Network_monitoring%23VCP_status [06:20:57] FIRING: [2x] RoutinatorRsyncErrors: Routinator rsync fetching issue in codfw - https://wikitech.wikimedia.org/wiki/RPKI#RSYNC_status - https://grafana.wikimedia.org/d/UwUa77GZk/rpki - https://alerts.wikimedia.org/?q=alertname%3DRoutinatorRsyncErrors [06:24:45] RESOLVED: [2x] MediaWikiHighErrorRate: Elevated rate of MediaWiki errors - kube-mw-web - 
https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate [06:27:15] FIRING: MediaWikiHighErrorRate: Elevated rate of MediaWiki errors - kube-mw-web - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://grafana.wikimedia.org/d/000000438/mediawiki-exceptions-alerts?panelId=18&fullscreen&orgId=1&var-datasource=codfw%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate [06:32:15] FIRING: [2x] MediaWikiHighErrorRate: Elevated rate of MediaWiki errors - kube-mw-web - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate [07:00:04] Deploy window No deploys all day! See Deployments/Emergencies if things are broken. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20240908T0700) [07:07:15] RESOLVED: [2x] MediaWikiHighErrorRate: Elevated rate of MediaWiki errors - kube-mw-web - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiHighErrorRate [08:32:46] PROBLEM - Check unit status of httpbb_kubernetes_mw-wikifunctions_hourly on cumin2002 is CRITICAL: CRITICAL: Status of the systemd unit httpbb_kubernetes_mw-wikifunctions_hourly https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state [08:36:25] FIRING: [2x] SystemdUnitFailed: httpbb_kubernetes_mw-wikifunctions_hourly.service on cumin1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [08:38:04] PROBLEM - Check unit status of httpbb_kubernetes_mw-wikifunctions_hourly on cumin1002 is CRITICAL: CRITICAL: Status of the systemd unit httpbb_kubernetes_mw-wikifunctions_hourly https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state [08:51:57] FIRING: RoutinatorRTRConnections: 
Important drop of Routinator RTR connections on rpki2002:9556 - https://wikitech.wikimedia.org/wiki/RPKI#RTR_Connections_drop - https://grafana.wikimedia.org/d/UwUa77GZk/rpki - https://alerts.wikimedia.org/?q=alertname%3DRoutinatorRTRConnections [09:16:10] PROBLEM - Backup freshness on backup1001 is CRITICAL: Stale: 1 (gerrit1003), Fresh: 138 jobs https://wikitech.wikimedia.org/wiki/Bacula%23Monitoring [09:32:46] RECOVERY - Check unit status of httpbb_kubernetes_mw-wikifunctions_hourly on cumin2002 is OK: OK: Status of the systemd unit httpbb_kubernetes_mw-wikifunctions_hourly https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state [09:36:25] RESOLVED: [2x] SystemdUnitFailed: httpbb_kubernetes_mw-wikifunctions_hourly.service on cumin1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [09:38:04] RECOVERY - Check unit status of httpbb_kubernetes_mw-wikifunctions_hourly on cumin1002 is OK: OK: Status of the systemd unit httpbb_kubernetes_mw-wikifunctions_hourly https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state [09:58:04] PROBLEM - PyBal backends health check on lvs2014 is CRITICAL: PYBAL CRITICAL - CRITICAL - mw-wikifunctions_4451: Servers mw2424.codfw.wmnet, mw2396.codfw.wmnet, wikikube-worker2063.codfw.wmnet, wikikube-worker2102.codfw.wmnet, wikikube-worker2081.codfw.wmnet, mw2375.codfw.wmnet, wikikube-worker2026.codfw.wmnet, mw2368.codfw.wmnet, wikikube-worker2091.codfw.wmnet, wikikube-worker2076.codfw.wmnet, wikikube-worker2083.codfw.wmnet, parse2004. 
[09:58:04] net, wikikube-worker2044.codfw.wmnet, wikikube-worker2031.codfw.wmnet, wikikube-worker2027.codfw.wmnet, wikikube-worker2030.codfw.wmnet, mw2313.codfw.wmnet, wikikube-worker2055.codfw.wmnet, kubernetes2013.codfw.wmnet, mw2397.codfw.wmnet, mw2413.codfw.wmnet, mw2356.codfw.wmnet, wikikube-worker2014.codfw.wmnet, mw2304.codfw.wmnet, wikikube-worker2018.codfw.wmnet, wikikube-worker2013.codfw.wmnet, mw2390.codfw.wmnet, parse2008.codfw.wmnet, mw [09:58:04] fw.wmnet, wikikube-worker2035.codfw.wmnet, wikikube-worker2007.codfw.wmnet, mw2442.codfw.wmnet, wikikube-worker2088.codfw.wmnet, mw2414.codfw.wmnet, wikikube-worker2012.codfw.wmnet, mw2 https://wikitech.wikimedia.org/wiki/PyBal [10:00:04] RECOVERY - PyBal backends health check on lvs2014 is OK: PYBAL OK - All pools are healthy https://wikitech.wikimedia.org/wiki/PyBal [10:19:20] PROBLEM - PyBal backends health check on lvs1020 is CRITICAL: PYBAL CRITICAL - CRITICAL - mw-wikifunctions_4451: Servers kubernetes1010.eqiad.wmnet, parse1011.eqiad.wmnet, mw1433.eqiad.wmnet, kubernetes1025.eqiad.wmnet, mw1419.eqiad.wmnet, mw1442.eqiad.wmnet, mw1434.eqiad.wmnet, parse1013.eqiad.wmnet, mw1415.eqiad.wmnet, mw1480.eqiad.wmnet, parse1009.eqiad.wmnet, mw1399.eqiad.wmnet, mw1463.eqiad.wmnet, mw1435.eqiad.wmnet, mw1424.eqiad.wmn [10:19:20] 93.eqiad.wmnet, mw1488.eqiad.wmnet, mw1454.eqiad.wmnet, parse1010.eqiad.wmnet, parse1005.eqiad.wmnet, mw1408.eqiad.wmnet, mw1370.eqiad.wmnet, kubernetes1017.eqiad.wmnet, mw1425.eqiad.wmnet, mw1465.eqiad.wmnet, kubernetes1014.eqiad.wmnet, kubernetes1018.eqiad.wmnet, mw1469.eqiad.wmnet, mw1394.eqiad.wmnet, kubernetes1058.eqiad.wmnet, mw1360.eqiad.wmnet, mw1356.eqiad.wmnet, mw1458.eqiad.wmnet, parse1012.eqiad.wmnet, mw1468.eqiad.wmnet, wikik [10:19:20] er1010.eqiad.wmnet, kubernetes1015.eqiad.wmnet, kubernetes1019.eqiad.wmnet, mw1439.eqiad.wmnet, mw1464.eqiad.wmnet, mw1352.eqiad.wmnet, mw1441.eqiad.wmnet, parse1006.eqiad.wmnet, mw1355 https://wikitech.wikimedia.org/wiki/PyBal [10:19:20] 
PROBLEM - PyBal backends health check on lvs1019 is CRITICAL: PYBAL CRITICAL - CRITICAL - mw-wikifunctions_4451: Servers parse1011.eqiad.wmnet, mw1433.eqiad.wmnet, mw1367.eqiad.wmnet, wikikube-worker1029.eqiad.wmnet, parse1013.eqiad.wmnet, mw1479.eqiad.wmnet, mw1415.eqiad.wmnet, parse1009.eqiad.wmnet, mw1405.eqiad.wmnet, mw1399.eqiad.wmnet, kubernetes1038.eqiad.wmnet, mw1393.eqiad.wmnet, mw1454.eqiad.wmnet, mw1408.eqiad.wmnet, kubernetes1 [10:19:20] d.wmnet, mw1465.eqiad.wmnet, kubernetes1033.eqiad.wmnet, wikikube-worker1009.eqiad.wmnet, kubernetes1018.eqiad.wmnet, mw1369.eqiad.wmnet, mw1419.eqiad.wmnet, kubernetes1059.eqiad.wmnet, mw1469.eqiad.wmnet, mw1486.eqiad.wmnet, mw1458.eqiad.wmnet, mw1371.eqiad.wmnet, parse1012.eqiad.wmnet, mw1453.eqiad.wmnet, kubernetes1015.eqiad.wmnet, kubernetes1019.eqiad.wmnet, mw1381.eqiad.wmnet, mw1391.eqiad.wmnet, parse1003.eqiad.wmnet, kubernetes1056 [10:19:21] mnet, mw1352.eqiad.wmnet, mw1441.eqiad.wmnet, parse1018.eqiad.wmnet, mw1431.eqiad.wmnet, mw1355.eqiad.wmnet, wikikube-worker1028.eqiad.wmnet, kubernetes1039.eqiad.wmnet, kubernetes1035. 
https://wikitech.wikimedia.org/wiki/PyBal [10:20:57] FIRING: [2x] RoutinatorRsyncErrors: Routinator rsync fetching issue in codfw - https://wikitech.wikimedia.org/wiki/RPKI#RSYNC_status - https://grafana.wikimedia.org/d/UwUa77GZk/rpki - https://alerts.wikimedia.org/?q=alertname%3DRoutinatorRsyncErrors [10:33:46] PROBLEM - Check unit status of httpbb_kubernetes_mw-wikifunctions_hourly on cumin2002 is CRITICAL: CRITICAL: Status of the systemd unit httpbb_kubernetes_mw-wikifunctions_hourly https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state [10:36:25] FIRING: [2x] SystemdUnitFailed: httpbb_kubernetes_mw-wikifunctions_hourly.service on cumin1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [10:39:04] PROBLEM - Check unit status of httpbb_kubernetes_mw-wikifunctions_hourly on cumin1002 is CRITICAL: CRITICAL: Status of the systemd unit httpbb_kubernetes_mw-wikifunctions_hourly https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state [11:18:44] FIRING: KubernetesDeploymentUnavailableReplicas: ... [11:18:44] Deployment mw-wikifunctions.eqiad.main in mw-wikifunctions at eqiad has persistently unavailable replicas - https://wikitech.wikimedia.org/wiki/Kubernetes/Troubleshooting#Troubleshooting_a_deployment - https://grafana.wikimedia.org/d/a260da06-259a-4ee4-9540-5cab01a246c8/kubernetes-deployment-details?var-site=eqiad&var-cluster=k8s&var-namespace=mw-wikifunctions&var-deployment=mw-wikifunctions.eqiad.main - ... 
[11:18:44] https://alerts.wikimedia.org/?q=alertname%3DKubernetesDeploymentUnavailableReplicas [11:21:20] RECOVERY - PyBal backends health check on lvs1020 is OK: PYBAL OK - All pools are healthy https://wikitech.wikimedia.org/wiki/PyBal [11:21:20] RECOVERY - PyBal backends health check on lvs1019 is OK: PYBAL OK - All pools are healthy https://wikitech.wikimedia.org/wiki/PyBal [11:23:44] RESOLVED: KubernetesDeploymentUnavailableReplicas: ... [11:23:44] Deployment mw-wikifunctions.eqiad.main in mw-wikifunctions at eqiad has persistently unavailable replicas - https://wikitech.wikimedia.org/wiki/Kubernetes/Troubleshooting#Troubleshooting_a_deployment - https://grafana.wikimedia.org/d/a260da06-259a-4ee4-9540-5cab01a246c8/kubernetes-deployment-details?var-site=eqiad&var-cluster=k8s&var-namespace=mw-wikifunctions&var-deployment=mw-wikifunctions.eqiad.main - ... [11:23:44] https://alerts.wikimedia.org/?q=alertname%3DKubernetesDeploymentUnavailableReplicas [11:33:46] RECOVERY - Check unit status of httpbb_kubernetes_mw-wikifunctions_hourly on cumin2002 is OK: OK: Status of the systemd unit httpbb_kubernetes_mw-wikifunctions_hourly https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state [11:36:25] RESOLVED: [2x] SystemdUnitFailed: httpbb_kubernetes_mw-wikifunctions_hourly.service on cumin1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [11:39:04] RECOVERY - Check unit status of httpbb_kubernetes_mw-wikifunctions_hourly on cumin1002 is OK: OK: Status of the systemd unit httpbb_kubernetes_mw-wikifunctions_hourly https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state [12:19:22] PROBLEM - PyBal backends health check on lvs1020 is CRITICAL: PYBAL CRITICAL - CRITICAL - mw-wikifunctions_4451: Servers kubernetes1010.eqiad.wmnet, parse1013.eqiad.wmnet, mw1380.eqiad.wmnet, kubernetes1025.eqiad.wmnet, 
mw1419.eqiad.wmnet, mw1434.eqiad.wmnet, wikikube-worker1029.eqiad.wmnet, mw1386.eqiad.wmnet, mw1433.eqiad.wmnet, mw1470.eqiad.wmnet, mw1415.eqiad.wmnet, mw1480.eqiad.wmnet, mw1463.eqiad.wmnet, mw1488.eqiad.wmnet, mw1370.eq [12:19:22] t, mw1425.eqiad.wmnet, mw1395.eqiad.wmnet, kubernetes1014.eqiad.wmnet, wikikube-worker1009.eqiad.wmnet, mw1483.eqiad.wmnet, mw1486.eqiad.wmnet, wikikube-worker1001.eqiad.wmnet, kubernetes1018.eqiad.wmnet, parse1012.eqiad.wmnet, mw1453.eqiad.wmnet, kubernetes1028.eqiad.wmnet, kubernetes1015.eqiad.wmnet, kubernetes1031.eqiad.wmnet, kubernetes1024.eqiad.wmnet, kubernetes1062.eqiad.wmnet, parse1019.eqiad.wmnet, mw1391.eqiad.wmnet, kubernetes1 [12:19:22] d.wmnet, mw1352.eqiad.wmnet, mw1441.eqiad.wmnet, parse1006.eqiad.wmnet, mw1355.eqiad.wmnet, mw1472.eqiad.wmnet, parse1022.eqiad.wmnet, wikikube-worker1031.eqiad.wmnet, wikikube-worker10 https://wikitech.wikimedia.org/wiki/PyBal [12:19:24] PROBLEM - PyBal backends health check on lvs1019 is CRITICAL: PYBAL CRITICAL - CRITICAL - mw-wikifunctions_4451: Servers mw1433.eqiad.wmnet, mw1380.eqiad.wmnet, kubernetes1025.eqiad.wmnet, mw1367.eqiad.wmnet, mw1442.eqiad.wmnet, wikikube-worker1029.eqiad.wmnet, mw1386.eqiad.wmnet, mw1462.eqiad.wmnet, mw1430.eqiad.wmnet, mw1388.eqiad.wmnet, parse1009.eqiad.wmnet, mw1484.eqiad.wmnet, parse1021.eqiad.wmnet, mw1393.eqiad.wmnet, mw1488.eqiad.w [12:19:24] 1454.eqiad.wmnet, parse1005.eqiad.wmnet, wikikube-worker1003.eqiad.wmnet, mw1389.eqiad.wmnet, mw1425.eqiad.wmnet, mw1395.eqiad.wmnet, mw1465.eqiad.wmnet, kubernetes1033.eqiad.wmnet, wikikube-worker1009.eqiad.wmnet, kubernetes1018.eqiad.wmnet, kubernetes1059.eqiad.wmnet, mw1469.eqiad.wmnet, mw1394.eqiad.wmnet, mw1360.eqiad.wmnet, wikikube-worker1001.eqiad.wmnet, mw1483.eqiad.wmnet, mw1458.eqiad.wmnet, mw1371.eqiad.wmnet, mw1453.eqiad.wmnet [12:19:24] be-worker1024.eqiad.wmnet, kubernetes1015.eqiad.wmnet, kubernetes1019.eqiad.wmnet, kubernetes1031.eqiad.wmnet, kubernetes1024.eqiad.wmnet, mw1464.eqiad.wmnet, 
parse1019.eqiad.wmnet, mw1 https://wikitech.wikimedia.org/wiki/PyBal [12:34:46] PROBLEM - Check unit status of httpbb_kubernetes_mw-wikifunctions_hourly on cumin2002 is CRITICAL: CRITICAL: Status of the systemd unit httpbb_kubernetes_mw-wikifunctions_hourly https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state [12:36:25] FIRING: [2x] SystemdUnitFailed: httpbb_kubernetes_mw-wikifunctions_hourly.service on cumin1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [12:40:04] PROBLEM - Check unit status of httpbb_kubernetes_mw-wikifunctions_hourly on cumin1002 is CRITICAL: CRITICAL: Status of the systemd unit httpbb_kubernetes_mw-wikifunctions_hourly https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state [12:51:57] FIRING: RoutinatorRTRConnections: Important drop of Routinator RTR connections on rpki2002:9556 - https://wikitech.wikimedia.org/wiki/RPKI#RTR_Connections_drop - https://grafana.wikimedia.org/d/UwUa77GZk/rpki - https://alerts.wikimedia.org/?q=alertname%3DRoutinatorRTRConnections [13:20:22] RECOVERY - PyBal backends health check on lvs1020 is OK: PYBAL OK - All pools are healthy https://wikitech.wikimedia.org/wiki/PyBal [13:20:24] RECOVERY - PyBal backends health check on lvs1019 is OK: PYBAL OK - All pools are healthy https://wikitech.wikimedia.org/wiki/PyBal [13:34:46] RECOVERY - Check unit status of httpbb_kubernetes_mw-wikifunctions_hourly on cumin2002 is OK: OK: Status of the systemd unit httpbb_kubernetes_mw-wikifunctions_hourly https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state [13:36:25] RESOLVED: [2x] SystemdUnitFailed: httpbb_kubernetes_mw-wikifunctions_hourly.service on cumin1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - 
https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [13:40:04] RECOVERY - Check unit status of httpbb_kubernetes_mw-wikifunctions_hourly on cumin1002 is OK: OK: Status of the systemd unit httpbb_kubernetes_mw-wikifunctions_hourly https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state [14:19:24] PROBLEM - PyBal backends health check on lvs1020 is CRITICAL: PYBAL CRITICAL - CRITICAL - mw-wikifunctions_4451: Servers kubernetes1010.eqiad.wmnet, parse1011.eqiad.wmnet, parse1013.eqiad.wmnet, mw1380.eqiad.wmnet, mw1367.eqiad.wmnet, mw1442.eqiad.wmnet, mw1434.eqiad.wmnet, wikikube-worker1029.eqiad.wmnet, mw1470.eqiad.wmnet, mw1415.eqiad.wmnet, mw1388.eqiad.wmnet, parse1009.eqiad.wmnet, mw1484.eqiad.wmnet, kubernetes1030.eqiad.wmnet, par [14:19:24] qiad.wmnet, mw1435.eqiad.wmnet, mw1424.eqiad.wmnet, mw1395.eqiad.wmnet, mw1488.eqiad.wmnet, mw1454.eqiad.wmnet, parse1010.eqiad.wmnet, parse1005.eqiad.wmnet, mw1408.eqiad.wmnet, mw1370.eqiad.wmnet, mw1389.eqiad.wmnet, kubernetes1017.eqiad.wmnet, mw1425.eqiad.wmnet, kubernetes1012.eqiad.wmnet, mw1465.eqiad.wmnet, kubernetes1014.eqiad.wmnet, wikikube-worker1009.eqiad.wmnet, kubernetes1018.eqiad.wmnet, mw1369.eqiad.wmnet, kubernetes1005.eqia [14:19:24] mw1360.eqiad.wmnet, mw1458.eqiad.wmnet, mw1468.eqiad.wmnet, mw1431.eqiad.wmnet, kubernetes1028.eqiad.wmnet, kubernetes1015.eqiad.wmnet, kubernetes1008.eqiad.wmnet, mw1439.eqiad.wmnet, https://wikitech.wikimedia.org/wiki/PyBal [14:19:26] PROBLEM - PyBal backends health check on lvs1019 is CRITICAL: PYBAL CRITICAL - CRITICAL - mw-wikifunctions_4451: Servers kubernetes1010.eqiad.wmnet, parse1011.eqiad.wmnet, parse1013.eqiad.wmnet, mw1380.eqiad.wmnet, mw1419.eqiad.wmnet, mw1434.eqiad.wmnet, wikikube-worker1029.eqiad.wmnet, mw1479.eqiad.wmnet, mw1462.eqiad.wmnet, mw1415.eqiad.wmnet, mw1388.eqiad.wmnet, parse1009.eqiad.wmnet, mw1405.eqiad.wmnet, mw1393.eqiad.wmnet, mw1488.eqia [14:19:26] mw1454.eqiad.wmnet, wikikube-worker1003.eqiad.wmnet, mw1395.eqiad.wmnet, 
mw1465.eqiad.wmnet, kubernetes1033.eqiad.wmnet, mw1466.eqiad.wmnet, wikikube-worker1009.eqiad.wmnet, kubernetes1018.eqiad.wmnet, mw1367.eqiad.wmnet, mw1486.eqiad.wmnet, mw1360.eqiad.wmnet, mw1356.eqiad.wmnet, parse1012.eqiad.wmnet, mw1453.eqiad.wmnet, kubernetes1028.eqiad.wmnet, kubernetes1019.eqiad.wmnet, kubernetes1031.eqiad.wmnet, kubernetes1062.eqiad.wmnet, mw14 [14:19:26] .wmnet, parse1021.eqiad.wmnet, wikikube-worker1028.eqiad.wmnet, kubernetes1056.eqiad.wmnet, mw1441.eqiad.wmnet, mw1431.eqiad.wmnet, mw1355.eqiad.wmnet, parse1003.eqiad.wmnet, mw1472.eqi https://wikitech.wikimedia.org/wiki/PyBal [14:20:57] FIRING: [2x] RoutinatorRsyncErrors: Routinator rsync fetching issue in codfw - https://wikitech.wikimedia.org/wiki/RPKI#RSYNC_status - https://grafana.wikimedia.org/d/UwUa77GZk/rpki - https://alerts.wikimedia.org/?q=alertname%3DRoutinatorRsyncErrors [14:35:46] PROBLEM - Check unit status of httpbb_kubernetes_mw-wikifunctions_hourly on cumin2002 is CRITICAL: CRITICAL: Status of the systemd unit httpbb_kubernetes_mw-wikifunctions_hourly https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state [14:36:13] FIRING: JobUnavailable: Reduced availability for job sidekiq in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable [14:36:25] FIRING: [2x] SystemdUnitFailed: httpbb_kubernetes_mw-wikifunctions_hourly.service on cumin1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [14:41:04] PROBLEM - Check unit status of httpbb_kubernetes_mw-wikifunctions_hourly on cumin1002 is CRITICAL: CRITICAL: Status of the systemd unit httpbb_kubernetes_mw-wikifunctions_hourly https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state [14:54:16] PROBLEM - PyBal 
backends health check on lvs2014 is CRITICAL: PYBAL CRITICAL - CRITICAL - mw-wikifunctions_4451: Servers kubernetes2046.codfw.wmnet, mw2396.codfw.wmnet, wikikube-worker2033.codfw.wmnet, wikikube-worker2086.codfw.wmnet, wikikube-worker2063.codfw.wmnet, parse2006.codfw.wmnet, wikikube-worker2102.codfw.wmnet, mw2375.codfw.wmnet, wikikube-worker2026.codfw.wmnet, wikikube-worker2036.codfw.wmnet, parse2009.codfw.wmnet, mw2370.co [14:54:16] t, wikikube-worker2084.codfw.wmnet, wikikube-worker2099.codfw.wmnet, mw2443.codfw.wmnet, kubernetes2048.codfw.wmnet, wikikube-worker2091.codfw.wmnet, wikikube-worker2076.codfw.wmnet, kubernetes2059.codfw.wmnet, wikikube-worker2071.codfw.wmnet, parse2004.codfw.wmnet, wikikube-worker2044.codfw.wmnet, mw2431.codfw.wmnet, mw2427.codfw.wmnet, kubernetes2042.codfw.wmnet, wikikube-worker2043.codfw.wmnet, kubernetes2006.codfw.wmnet, wikikube-work [14:54:16] odfw.wmnet, wikikube-worker2008.codfw.wmnet, mw2352.codfw.wmnet, wikikube-worker2041.codfw.wmnet, mw2359.codfw.wmnet, wikikube-worker2002.codfw.wmnet, mw2313.codfw.wmnet, wikikube-worke https://wikitech.wikimedia.org/wiki/PyBal [14:54:16] PROBLEM - PyBal backends health check on lvs2013 is CRITICAL: PYBAL CRITICAL - CRITICAL - mw-wikifunctions_4451: Servers wikikube-worker2021.codfw.wmnet, mw2396.codfw.wmnet, parse2001.codfw.wmnet, wikikube-worker2033.codfw.wmnet, parse2017.codfw.wmnet, kubernetes2056.codfw.wmnet, wikikube-worker2063.codfw.wmnet, wikikube-worker2102.codfw.wmnet, wikikube-worker2017.codfw.wmnet, mw2375.codfw.wmnet, wikikube-worker2026.codfw.wmnet, mw2447.co [14:54:16] t, mw2370.codfw.wmnet, wikikube-worker2099.codfw.wmnet, kubernetes2014.codfw.wmnet, mw2443.codfw.wmnet, kubernetes2048.codfw.wmnet, parse2003.codfw.wmnet, kubernetes2059.codfw.wmnet, mw2315.codfw.wmnet, wikikube-worker2071.codfw.wmnet, wikikube-worker2044.codfw.wmnet, mw2431.codfw.wmnet, mw2427.codfw.wmnet, wikikube-worker2027.codfw.wmnet, kubernetes2042.codfw.wmnet, wikikube-worker2096.codfw.wmnet, 
wikikube-worker2065.codfw.wmnet, wikiku [14:54:16] r2060.codfw.wmnet, mw2371.codfw.wmnet, wikikube-worker2002.codfw.wmnet, wikikube-worker2090.codfw.wmnet, mw2302.codfw.wmnet, parse2013.codfw.wmnet, kubernetes2039.codfw.wmnet, wikikube- https://wikitech.wikimedia.org/wiki/PyBal [14:57:16] RECOVERY - PyBal backends health check on lvs2014 is OK: PYBAL OK - All pools are healthy https://wikitech.wikimedia.org/wiki/PyBal [14:57:16] RECOVERY - PyBal backends health check on lvs2013 is OK: PYBAL OK - All pools are healthy https://wikitech.wikimedia.org/wiki/PyBal [15:01:13] RESOLVED: JobUnavailable: Reduced availability for job sidekiq in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable [15:04:32] PROBLEM - BGP status on cr1-eqiad is CRITICAL: BGP CRITICAL - AS64605/IPv6: Active - Anycast, AS64605/IPv4: Active - Anycast https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status [15:06:40] PROBLEM - Juniper virtual chassis ports on asw2-d-eqiad is CRITICAL: CRIT: Down: 2 Unknown: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23VCP_status [15:20:28] RECOVERY - PyBal backends health check on lvs1020 is OK: PYBAL OK - All pools are healthy https://wikitech.wikimedia.org/wiki/PyBal [15:20:28] RECOVERY - PyBal backends health check on lvs1019 is OK: PYBAL OK - All pools are healthy https://wikitech.wikimedia.org/wiki/PyBal [15:21:42] RECOVERY - Juniper virtual chassis ports on asw2-d-eqiad is OK: OK: UP: 20 https://wikitech.wikimedia.org/wiki/Network_monitoring%23VCP_status [15:35:46] RECOVERY - Check unit status of httpbb_kubernetes_mw-wikifunctions_hourly on cumin2002 is OK: OK: Status of the systemd unit httpbb_kubernetes_mw-wikifunctions_hourly https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state [15:36:25] RESOLVED: [2x] SystemdUnitFailed: 
httpbb_kubernetes_mw-wikifunctions_hourly.service on cumin1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [15:41:04] RECOVERY - Check unit status of httpbb_kubernetes_mw-wikifunctions_hourly on cumin1002 is OK: OK: Status of the systemd unit httpbb_kubernetes_mw-wikifunctions_hourly https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state [16:16:18] PROBLEM - PyBal backends health check on lvs2014 is CRITICAL: PYBAL CRITICAL - CRITICAL - mw-wikifunctions_4451: Servers mw2313.codfw.wmnet, mw2394.codfw.wmnet, mw2314.codfw.wmnet, wikikube-worker2098.codfw.wmnet, kubernetes2013.codfw.wmnet, wikikube-worker2101.codfw.wmnet, wikikube-worker2013.codfw.wmnet, mw2371.codfw.wmnet, mw2374.codfw.wmnet, kubernetes2053.codfw.wmnet, wikikube-worker2053.codfw.wmnet, parse2015.codfw.wmnet are marked [16:16:18] pooled https://wikitech.wikimedia.org/wiki/PyBal [16:16:18] PROBLEM - PyBal backends health check on lvs2013 is CRITICAL: PYBAL CRITICAL - CRITICAL - mw-wikifunctions_4451: Servers wikikube-worker2033.codfw.wmnet, wikikube-worker2036.codfw.wmnet, mw2338.codfw.wmnet, mw2443.codfw.wmnet, parse2003.codfw.wmnet, wikikube-worker2089.codfw.wmnet, kubernetes2039.codfw.wmnet, wikikube-worker2050.codfw.wmnet, mw2440.codfw.wmnet, mw2444.codfw.wmnet, wikikube-worker2028.codfw.wmnet, wikikube-worker2013.codfw [16:16:18] mw2416.codfw.wmnet, wikikube-worker2085.codfw.wmnet, wikikube-worker2074.codfw.wmnet, wikikube-worker2042.codfw.wmnet, wikikube-worker2039.codfw.wmnet, kubernetes2053.codfw.wmnet, mw2412.codfw.wmnet, mw2282.codfw.wmnet, wikikube-worker2038.codfw.wmnet, wikikube-worker2067.codfw.wmnet, wikikube-worker2011.codfw.wmnet, mw2303.codfw.wmnet are marked down but pooled https://wikitech.wikimedia.org/wiki/PyBal [16:17:18] RECOVERY - PyBal backends health check on lvs2013 is OK: PYBAL OK - All pools are healthy 
https://wikitech.wikimedia.org/wiki/PyBal [16:18:18] RECOVERY - PyBal backends health check on lvs2014 is OK: PYBAL OK - All pools are healthy https://wikitech.wikimedia.org/wiki/PyBal [16:24:28] PROBLEM - PyBal backends health check on lvs1020 is CRITICAL: PYBAL CRITICAL - CRITICAL - mw-wikifunctions_4451: Servers kubernetes1010.eqiad.wmnet, parse1011.eqiad.wmnet, mw1433.eqiad.wmnet, mw1380.eqiad.wmnet, kubernetes1025.eqiad.wmnet, mw1367.eqiad.wmnet, mw1434.eqiad.wmnet, wikikube-worker1029.eqiad.wmnet, wikikube-worker1012.eqiad.wmnet, mw1462.eqiad.wmnet, mw1430.eqiad.wmnet, mw1388.eqiad.wmnet, mw1484.eqiad.wmnet, mw1405.eqiad.wmn [16:24:28] rnetes1030.eqiad.wmnet, mw1391.eqiad.wmnet, mw1435.eqiad.wmnet, mw1424.eqiad.wmnet, kubernetes1012.eqiad.wmnet, parse1005.eqiad.wmnet, wikikube-worker1003.eqiad.wmnet, mw1370.eqiad.wmnet, mw1425.eqiad.wmnet, mw1395.eqiad.wmnet, mw1465.eqiad.wmnet, wikikube-worker1009.eqiad.wmnet, mw1483.eqiad.wmnet, kubernetes1059.eqiad.wmnet, kubernetes1005.eqiad.wmnet, mw1486.eqiad.wmnet, kubernetes1058.eqiad.wmnet, kubernetes1038.eqiad.wmnet, mw1360.eq [16:24:28] t, mw1356.eqiad.wmnet, wikikube-worker1001.eqiad.wmnet, kubernetes1018.eqiad.wmnet, mw1458.eqiad.wmnet, parse1001.eqiad.wmnet, parse1012.eqiad.wmnet, wikikube-worker1024.eqiad.wmnet, ku https://wikitech.wikimedia.org/wiki/PyBal [16:24:28] PROBLEM - PyBal backends health check on lvs1019 is CRITICAL: PYBAL CRITICAL - CRITICAL - mw-wikifunctions_4451: Servers kubernetes1010.eqiad.wmnet, mw1433.eqiad.wmnet, kubernetes1025.eqiad.wmnet, mw1367.eqiad.wmnet, mw1442.eqiad.wmnet, mw1386.eqiad.wmnet, parse1013.eqiad.wmnet, kubernetes1023.eqiad.wmnet, mw1430.eqiad.wmnet, mw1415.eqiad.wmnet, mw1480.eqiad.wmnet, mw1405.eqiad.wmnet, mw1399.eqiad.wmnet, mw1435.eqiad.wmnet, mw1424.eqiad.w [16:24:28] 1393.eqiad.wmnet, mw1488.eqiad.wmnet, mw1454.eqiad.wmnet, parse1010.eqiad.wmnet, parse1005.eqiad.wmnet, mw1370.eqiad.wmnet, kubernetes1017.eqiad.wmnet, mw1425.eqiad.wmnet, mw1395.eqiad.wmnet, 
mw1465.eqiad.wmnet, wikikube-worker1009.eqiad.wmnet, mw1483.eqiad.wmnet, mw1419.eqiad.wmnet, kubernetes1059.eqiad.wmnet, mw1469.eqiad.wmnet, kubernetes1005.eqiad.wmnet, wikikube-worker1001.eqiad.wmnet, mw1458.eqiad.wmnet, mw1371.eqiad.wmnet, wikikube [16:24:28] 024.eqiad.wmnet, mw1468.eqiad.wmnet, kubernetes1015.eqiad.wmnet, kubernetes1019.eqiad.wmnet, kubernetes1062.eqiad.wmnet, mw1464.eqiad.wmnet, parse1019.eqiad.wmnet, mw1381.eqiad.wmnet, p https://wikitech.wikimedia.org/wiki/PyBal [16:24:54] <_Gerges> Hi [16:27:28] RECOVERY - PyBal backends health check on lvs1020 is OK: PYBAL OK - All pools are healthy https://wikitech.wikimedia.org/wiki/PyBal [16:27:28] RECOVERY - PyBal backends health check on lvs1019 is OK: PYBAL OK - All pools are healthy https://wikitech.wikimedia.org/wiki/PyBal [16:28:18] PROBLEM - PyBal backends health check on lvs2014 is CRITICAL: PYBAL CRITICAL - CRITICAL - mw-wikifunctions_4451: Servers wikikube-worker2021.codfw.wmnet, mw2396.codfw.wmnet, parse2017.codfw.wmnet, parse2006.codfw.wmnet, wikikube-worker2102.codfw.wmnet, wikikube-worker2081.codfw.wmnet, mw2375.codfw.wmnet, wikikube-worker2026.codfw.wmnet, kubernetes2024.codfw.wmnet, kubernetes2052.codfw.wmnet, wikikube-worker2099.codfw.wmnet, mw2443.codfw.w [16:28:18] kikube-worker2076.codfw.wmnet, parse2018.codfw.wmnet, wikikube-worker2083.codfw.wmnet, parse2004.codfw.wmnet, kubernetes2050.codfw.wmnet, wikikube-worker2010.codfw.wmnet, wikikube-worker2031.codfw.wmnet, wikikube-worker2022.codfw.wmnet, mw2427.codfw.wmnet, mw2440.codfw.wmnet, parse2020.codfw.wmnet, wikikube-worker2030.codfw.wmnet, wikikube-worker2023.codfw.wmnet, mw2398.codfw.wmnet, wikikube-worker2002.codfw.wmnet, mw2302.codfw.wmnet, par [16:28:18] odfw.wmnet, kubernetes2039.codfw.wmnet, kubernetes2016.codfw.wmnet, parse2012.codfw.wmnet, wikikube-worker2045.codfw.wmnet, mw2413.codfw.wmnet, kubernetes2042.codfw.wmnet, wikikube-work https://wikitech.wikimedia.org/wiki/PyBal [16:28:18] PROBLEM - PyBal backends health check on 
lvs2013 is CRITICAL: PYBAL CRITICAL - CRITICAL - mw-wikifunctions_4451: Servers mw2424.codfw.wmnet, wikikube-worker2021.codfw.wmnet, kubernetes2050.codfw.wmnet, kubernetes2056.codfw.wmnet, wikikube-worker2102.codfw.wmnet, mw2375.codfw.wmnet, mw2338.codfw.wmnet, wikikube-worker2099.codfw.wmnet, kubernetes2014.codfw.wmnet, mw2443.codfw.wmnet, wikikube-worker2040.codfw.wmnet, wikikube-worker2083.codfw [16:28:18] wikikube-worker2071.codfw.wmnet, parse2004.codfw.wmnet, wikikube-worker2044.codfw.wmnet, wikikube-worker2091.codfw.wmnet, mw2431.codfw.wmnet, mw2351.codfw.wmnet, mw2427.codfw.wmnet, parse2020.codfw.wmnet, wikikube-worker2027.codfw.wmnet, wikikube-worker2030.codfw.wmnet, kubernetes2006.codfw.wmnet, mw2398.codfw.wmnet, wikikube-worker2041.codfw.wmnet, mw2302.codfw.wmnet, wikikube-worker2096.codfw.wmnet, wikikube-worker2055.codfw.wmnet, kube [16:28:18] 39.codfw.wmnet, wikikube-worker2062.codfw.wmnet, mw2353.codfw.wmnet, mw2449.codfw.wmnet, wikikube-worker2045.codfw.wmnet, wikikube-worker2050.codfw.wmnet, mw2314.codfw.wmnet, wikikube-w https://wikitech.wikimedia.org/wiki/PyBal [16:29:18] RECOVERY - PyBal backends health check on lvs2014 is OK: PYBAL OK - All pools are healthy https://wikitech.wikimedia.org/wiki/PyBal [16:29:20] RECOVERY - PyBal backends health check on lvs2013 is OK: PYBAL OK - All pools are healthy https://wikitech.wikimedia.org/wiki/PyBal [16:31:28] PROBLEM - PyBal backends health check on lvs1020 is CRITICAL: PYBAL CRITICAL - CRITICAL - mw-wikifunctions_4451: Servers mw1433.eqiad.wmnet, mw1380.eqiad.wmnet, mw1434.eqiad.wmnet, kubernetes1023.eqiad.wmnet, mw1462.eqiad.wmnet, mw1388.eqiad.wmnet, mw1405.eqiad.wmnet, kubernetes1030.eqiad.wmnet, mw1389.eqiad.wmnet, mw1425.eqiad.wmnet, kubernetes1033.eqiad.wmnet, mw1371.eqiad.wmnet, parse1012.eqiad.wmnet, mw1453.eqiad.wmnet, wikikube-worke [16:31:28] iad.wmnet, mw1431.eqiad.wmnet, kubernetes1028.eqiad.wmnet, wikikube-worker1010.eqiad.wmnet, mw1464.eqiad.wmnet, mw1381.eqiad.wmnet, 
parse1021.eqiad.wmnet, kubernetes1006.eqiad.wmnet, mw1441.eqiad.wmnet, parse1018.eqiad.wmnet, mw1376.eqiad.wmnet, wikikube-worker1011.eqiad.wmnet, mw1451.eqiad.wmnet, kubernetes1035.eqiad.wmnet, mw1409.eqiad.wmnet, mw1383.eqiad.wmnet, mw1392.eqiad.wmnet, mw1416.eqiad.wmnet, mw1354.eqiad.wmnet, wikikube-worker [16:31:28] ad.wmnet, mw1374.eqiad.wmnet, wikikube-worker1013.eqiad.wmnet, mw1387.eqiad.wmnet, kubernetes1021.eqiad.wmnet, mw1476.eqiad.wmnet, mw1449.eqiad.wmnet, kubernetes1016.eqiad.wmnet, parse1 https://wikitech.wikimedia.org/wiki/PyBal [16:31:28] PROBLEM - PyBal backends health check on lvs1019 is CRITICAL: PYBAL CRITICAL - CRITICAL - mw-wikifunctions_4451: Servers parse1013.eqiad.wmnet, mw1479.eqiad.wmnet, mw1415.eqiad.wmnet, mw1405.eqiad.wmnet, mw1399.eqiad.wmnet, mw1435.eqiad.wmnet, parse1010.eqiad.wmnet, mw1370.eqiad.wmnet, kubernetes1012.eqiad.wmnet, mw1465.eqiad.wmnet, kubernetes1033.eqiad.wmnet, kubernetes1014.eqiad.wmnet, mw1466.eqiad.wmnet, wikikube-worker1009.eqiad.wmnet [16:31:28] .eqiad.wmnet, kubernetes1059.eqiad.wmnet, kubernetes1005.eqiad.wmnet, kubernetes1058.eqiad.wmnet, mw1356.eqiad.wmnet, mw1458.eqiad.wmnet, parse1012.eqiad.wmnet, mw1468.eqiad.wmnet, kubernetes1028.eqiad.wmnet, wikikube-worker1010.eqiad.wmnet, kubernetes1019.eqiad.wmnet, kubernetes1031.eqiad.wmnet, kubernetes1024.eqiad.wmnet, mw1439.eqiad.wmnet, mw1381.eqiad.wmnet, mw1352.eqiad.wmnet, mw1431.eqiad.wmnet, parse1003.eqiad.wmnet, mw1376.eqiad. 
[16:31:28] ikikube-worker1011.eqiad.wmnet, kubernetes1039.eqiad.wmnet, mw1379.eqiad.wmnet, kubernetes1026.eqiad.wmnet, mw1409.eqiad.wmnet, mw1383.eqiad.wmnet, mw1387.eqiad.wmnet, mw1416.eqiad.wmne https://wikitech.wikimedia.org/wiki/PyBal [16:36:25] FIRING: SystemdUnitFailed: httpbb_kubernetes_mw-wikifunctions_hourly.service on cumin2002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [16:36:46] PROBLEM - Check unit status of httpbb_kubernetes_mw-wikifunctions_hourly on cumin2002 is CRITICAL: CRITICAL: Status of the systemd unit httpbb_kubernetes_mw-wikifunctions_hourly https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state [16:41:25] FIRING: [2x] SystemdUnitFailed: httpbb_kubernetes_mw-wikifunctions_hourly.service on cumin1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [16:42:04] PROBLEM - Check unit status of httpbb_kubernetes_mw-wikifunctions_hourly on cumin1002 is CRITICAL: CRITICAL: Status of the systemd unit httpbb_kubernetes_mw-wikifunctions_hourly https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state [16:43:42] <_Gerges> Hi, is it possible to create a new namespace with editinterface protection in a Wikimedia project?
[16:51:57] FIRING: RoutinatorRTRConnections: Important drop of Routinator RTR connections on rpki2002:9556 - https://wikitech.wikimedia.org/wiki/RPKI#RTR_Connections_drop - https://grafana.wikimedia.org/d/UwUa77GZk/rpki - https://alerts.wikimedia.org/?q=alertname%3DRoutinatorRTRConnections [16:53:20] PROBLEM - PyBal backends health check on lvs2013 is CRITICAL: PYBAL CRITICAL - CRITICAL - mw-wikifunctions_4451: Servers mw2424.codfw.wmnet, kubernetes2046.codfw.wmnet, mw2396.codfw.wmnet, wikikube-worker2033.codfw.wmnet, kubernetes2056.codfw.wmnet, parse2006.codfw.wmnet, wikikube-worker2017.codfw.wmnet, mw2375.codfw.wmnet, mw2427.codfw.wmnet, wikikube-worker2036.codfw.wmnet, parse2009.codfw.wmnet, mw2368.codfw.wmnet, kubernetes2014.codfw [16:53:20] mw2443.codfw.wmnet, kubernetes2048.codfw.wmnet, wikikube-worker2091.codfw.wmnet, wikikube-worker2076.codfw.wmnet, kubernetes2059.codfw.wmnet, wikikube-worker2040.codfw.wmnet, parse2018.codfw.wmnet, wikikube-worker2083.codfw.wmnet, mw2315.codfw.wmnet, wikikube-worker2071.codfw.wmnet, parse2004.codfw.wmnet, kubernetes2050.codfw.wmnet, wikikube-worker2010.codfw.wmnet, mw2431.codfw.wmnet, mw2351.codfw.wmnet, wikikube-worker2086.codfw.wmnet, k [16:53:20] s2022.codfw.wmnet, parse2020.codfw.wmnet, kubernetes2006.codfw.wmnet, wikikube-worker2097.codfw.wmnet, wikikube-worker2023.codfw.wmnet, mw2398.codfw.wmnet, wikikube-worker2041.codfw.wmn https://wikitech.wikimedia.org/wiki/PyBal [16:53:20] PROBLEM - PyBal backends health check on lvs2014 is CRITICAL: PYBAL CRITICAL - CRITICAL - mw-wikifunctions_4451: Servers kubernetes2046.codfw.wmnet, wikikube-worker2021.codfw.wmnet, wikikube-worker2033.codfw.wmnet, parse2017.codfw.wmnet, kubernetes2056.codfw.wmnet, wikikube-worker2081.codfw.wmnet, wikikube-worker2017.codfw.wmnet, parse2013.codfw.wmnet, mw2375.codfw.wmnet, wikikube-worker2026.codfw.wmnet, kubernetes2024.codfw.wmnet, mw2447 [16:53:20] mnet, mw2370.codfw.wmnet, wikikube-worker2099.codfw.wmnet, mw2443.codfw.wmnet, 
kubernetes2048.codfw.wmnet, kubernetes2059.codfw.wmnet, parse2018.codfw.wmnet, mw2315.codfw.wmnet, parse2004.codfw.wmnet, wikikube-worker2010.codfw.wmnet, mw2431.codfw.wmnet, wikikube-worker2086.codfw.wmnet, parse2020.codfw.wmnet, mw2425.codfw.wmnet, wikikube-worker2030.codfw.wmnet, kubernetes2006.codfw.wmnet, wikikube-worker2060.codfw.wmnet, wikikube-worker202 [16:53:20] wmnet, mw2359.codfw.wmnet, wikikube-worker2002.codfw.wmnet, wikikube-worker2090.codfw.wmnet, wikikube-worker2055.codfw.wmnet, wikikube-worker2089.codfw.wmnet, kubernetes2039.codfw.wmnet https://wikitech.wikimedia.org/wiki/PyBal [16:55:20] RECOVERY - PyBal backends health check on lvs2013 is OK: PYBAL OK - All pools are healthy https://wikitech.wikimedia.org/wiki/PyBal [16:55:20] RECOVERY - PyBal backends health check on lvs2014 is OK: PYBAL OK - All pools are healthy https://wikitech.wikimedia.org/wiki/PyBal [16:56:28] RECOVERY - PyBal backends health check on lvs1020 is OK: PYBAL OK - All pools are healthy https://wikitech.wikimedia.org/wiki/PyBal [16:56:30] RECOVERY - PyBal backends health check on lvs1019 is OK: PYBAL OK - All pools are healthy https://wikitech.wikimedia.org/wiki/PyBal [17:01:28] PROBLEM - PyBal backends health check on lvs1020 is CRITICAL: PYBAL CRITICAL - CRITICAL - mw-wikifunctions_4451: Servers kubernetes1010.eqiad.wmnet, parse1011.eqiad.wmnet, parse1013.eqiad.wmnet, mw1380.eqiad.wmnet, mw1367.eqiad.wmnet, mw1386.eqiad.wmnet, mw1470.eqiad.wmnet, mw1462.eqiad.wmnet, mw1388.eqiad.wmnet, mw1480.eqiad.wmnet, parse1009.eqiad.wmnet, mw1484.eqiad.wmnet, kubernetes1030.eqiad.wmnet, parse1021.eqiad.wmnet, kubernetes101 [17:01:28] wmnet, wikikube-worker1003.eqiad.wmnet, mw1425.eqiad.wmnet, mw1395.eqiad.wmnet, kubernetes1033.eqiad.wmnet, mw1466.eqiad.wmnet, kubernetes1018.eqiad.wmnet, mw1419.eqiad.wmnet, mw1469.eqiad.wmnet, kubernetes1005.eqiad.wmnet, mw1360.eqiad.wmnet, wikikube-worker1001.eqiad.wmnet, mw1458.eqiad.wmnet, parse1012.eqiad.wmnet, wikikube-worker1024.eqiad.wmnet, 
mw1468.eqiad.wmnet, wikikube-worker1010.eqiad.wmnet, kubernetes1015.eqiad.wmnet, kubernet [17:01:28] qiad.wmnet, kubernetes1031.eqiad.wmnet, mw1464.eqiad.wmnet, mw1391.eqiad.wmnet, parse1003.eqiad.wmnet, kubernetes1056.eqiad.wmnet, mw1431.eqiad.wmnet, mw1355.eqiad.wmnet, wikikube-worke https://wikitech.wikimedia.org/wiki/PyBal [17:01:28] PROBLEM - PyBal backends health check on lvs1019 is CRITICAL: PYBAL CRITICAL - CRITICAL - mw-wikifunctions_4451: Servers mw1442.eqiad.wmnet, mw1434.eqiad.wmnet, mw1386.eqiad.wmnet, mw1479.eqiad.wmnet, mw1470.eqiad.wmnet, mw1462.eqiad.wmnet, mw1415.eqiad.wmnet, mw1388.eqiad.wmnet, mw1395.eqiad.wmnet, mw1488.eqiad.wmnet, mw1425.eqiad.wmnet, kubernetes1012.eqiad.wmnet, mw1465.eqiad.wmnet, kubernetes1033.eqiad.wmnet, kubernetes1014.eqiad.wmne [17:01:28] 6.eqiad.wmnet, wikikube-worker1009.eqiad.wmnet, mw1369.eqiad.wmnet, kubernetes1059.eqiad.wmnet, mw1394.eqiad.wmnet, kubernetes1058.eqiad.wmnet, mw1356.eqiad.wmnet, wikikube-worker1001.eqiad.wmnet, mw1453.eqiad.wmnet, wikikube-worker1024.eqiad.wmnet, mw1468.eqiad.wmnet, kubernetes1015.eqiad.wmnet, kubernetes1019.eqiad.wmnet, parse1019.eqiad.wmnet, mw1381.eqiad.wmnet, mw1391.eqiad.wmnet, parse1018.eqiad.wmnet, mw1431.eqiad.wmnet, mw1355.eqi [17:01:28] , mw1451.eqiad.wmnet, mw1487.eqiad.wmnet, mw1379.eqiad.wmnet, kubernetes1026.eqiad.wmnet, kubernetes1057.eqiad.wmnet, mw1387.eqiad.wmnet, kubernetes1054.eqiad.wmnet, mw1354.eqiad.wmnet, https://wikitech.wikimedia.org/wiki/PyBal [17:12:28] RECOVERY - PyBal backends health check on lvs1020 is OK: PYBAL OK - All pools are healthy https://wikitech.wikimedia.org/wiki/PyBal [17:12:28] RECOVERY - PyBal backends health check on lvs1019 is OK: PYBAL OK - All pools are healthy https://wikitech.wikimedia.org/wiki/PyBal [17:15:28] PROBLEM - PyBal backends health check on lvs1020 is CRITICAL: PYBAL CRITICAL - CRITICAL - mw-wikifunctions_4451: Servers kubernetes1010.eqiad.wmnet, parse1011.eqiad.wmnet, parse1013.eqiad.wmnet, mw1380.eqiad.wmnet, 
mw1442.eqiad.wmnet, mw1434.eqiad.wmnet, mw1470.eqiad.wmnet, mw1462.eqiad.wmnet, mw1430.eqiad.wmnet, mw1415.eqiad.wmnet, mw1480.eqiad.wmnet, parse1009.eqiad.wmnet, kubernetes1030.eqiad.wmnet, parse1021.eqiad.wmnet, mw1435.eqiad. [17:15:28] w1393.eqiad.wmnet, mw1488.eqiad.wmnet, mw1454.eqiad.wmnet, parse1010.eqiad.wmnet, parse1005.eqiad.wmnet, mw1408.eqiad.wmnet, mw1370.eqiad.wmnet, mw1389.eqiad.wmnet, kubernetes1017.eqiad.wmnet, kubernetes1012.eqiad.wmnet, kubernetes1033.eqiad.wmnet, kubernetes1014.eqiad.wmnet, mw1466.eqiad.wmnet, wikikube-worker1009.eqiad.wmnet, mw1483.eqiad.wmnet, kubernetes1059.eqiad.wmnet, mw1469.eqiad.wmnet, wikikube-worker1021.eqiad.wmnet, kubernetes1 [17:15:28] d.wmnet, kubernetes1038.eqiad.wmnet, mw1360.eqiad.wmnet, mw1356.eqiad.wmnet, mw1458.eqiad.wmnet, mw1371.eqiad.wmnet, parse1012.eqiad.wmnet, mw1468.eqiad.wmnet, parse1006.eqiad.wmnet, ku https://wikitech.wikimedia.org/wiki/PyBal [17:15:28] PROBLEM - PyBal backends health check on lvs1019 is CRITICAL: PYBAL CRITICAL - CRITICAL - mw-wikifunctions_4451: Servers kubernetes1010.eqiad.wmnet, parse1011.eqiad.wmnet, parse1013.eqiad.wmnet, mw1380.eqiad.wmnet, kubernetes1025.eqiad.wmnet, mw1419.eqiad.wmnet, mw1442.eqiad.wmnet, mw1434.eqiad.wmnet, wikikube-worker1029.eqiad.wmnet, mw1433.eqiad.wmnet, mw1479.eqiad.wmnet, kubernetes1023.eqiad.wmnet, mw1430.eqiad.wmnet, mw1480.eqiad.wmnet [17:15:28] .eqiad.wmnet, mw1405.eqiad.wmnet, mw1391.eqiad.wmnet, mw1435.eqiad.wmnet, mw1424.eqiad.wmnet, mw1393.eqiad.wmnet, mw1488.eqiad.wmnet, mw1454.eqiad.wmnet, parse1005.eqiad.wmnet, mw1408.eqiad.wmnet, mw1370.eqiad.wmnet, mw1425.eqiad.wmnet, kubernetes1012.eqiad.wmnet, mw1465.eqiad.wmnet, mw1466.eqiad.wmnet, kubernetes1018.eqiad.wmnet, kubernetes1059.eqiad.wmnet, mw1469.eqiad.wmnet, kubernetes1005.eqiad.wmnet, mw1486.eqiad.wmnet, kubernetes105 [17:15:28] wmnet, kubernetes1038.eqiad.wmnet, mw1360.eqiad.wmnet, mw1356.eqiad.wmnet, wikikube-worker1001.eqiad.wmnet, mw1483.eqiad.wmnet, mw1458.eqiad.wmnet, 
mw1371.eqiad.wmnet, wikikube-worker10 https://wikitech.wikimedia.org/wiki/PyBal [18:10:44] FIRING: KubernetesDeploymentUnavailableReplicas: ... [18:10:44] Deployment mw-wikifunctions.eqiad.main in mw-wikifunctions at eqiad has persistently unavailable replicas - https://wikitech.wikimedia.org/wiki/Kubernetes/Troubleshooting#Troubleshooting_a_deployment - https://grafana.wikimedia.org/d/a260da06-259a-4ee4-9540-5cab01a246c8/kubernetes-deployment-details?var-site=eqiad&var-cluster=k8s&var-namespace=mw-wikifunctions&var-deployment=mw-wikifunctions.eqiad.main - ... [18:10:44] https://alerts.wikimedia.org/?q=alertname%3DKubernetesDeploymentUnavailableReplicas [18:20:57] FIRING: [2x] RoutinatorRsyncErrors: Routinator rsync fetching issue in codfw - https://wikitech.wikimedia.org/wiki/RPKI#RSYNC_status - https://grafana.wikimedia.org/d/UwUa77GZk/rpki - https://alerts.wikimedia.org/?q=alertname%3DRoutinatorRsyncErrors [18:32:04] RECOVERY - Check unit status of httpbb_kubernetes_mw-wikifunctions_hourly on cumin1002 is OK: OK: Status of the systemd unit httpbb_kubernetes_mw-wikifunctions_hourly https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state [18:35:30] RECOVERY - PyBal backends health check on lvs1020 is OK: PYBAL OK - All pools are healthy https://wikitech.wikimedia.org/wiki/PyBal [18:36:30] RECOVERY - PyBal backends health check on lvs1019 is OK: PYBAL OK - All pools are healthy https://wikitech.wikimedia.org/wiki/PyBal [18:40:44] RESOLVED: KubernetesDeploymentUnavailableReplicas: ... [18:40:44] Deployment mw-wikifunctions.eqiad.main in mw-wikifunctions at eqiad has persistently unavailable replicas - https://wikitech.wikimedia.org/wiki/Kubernetes/Troubleshooting#Troubleshooting_a_deployment - https://grafana.wikimedia.org/d/a260da06-259a-4ee4-9540-5cab01a246c8/kubernetes-deployment-details?var-site=eqiad&var-cluster=k8s&var-namespace=mw-wikifunctions&var-deployment=mw-wikifunctions.eqiad.main - ... 
[18:40:44] https://alerts.wikimedia.org/?q=alertname%3DKubernetesDeploymentUnavailableReplicas [18:43:04] PROBLEM - Check unit status of httpbb_kubernetes_mw-wikifunctions_hourly on cumin1002 is CRITICAL: CRITICAL: Status of the systemd unit httpbb_kubernetes_mw-wikifunctions_hourly https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state [19:33:04] RECOVERY - Check unit status of httpbb_kubernetes_mw-wikifunctions_hourly on cumin1002 is OK: OK: Status of the systemd unit httpbb_kubernetes_mw-wikifunctions_hourly https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state [19:36:25] RESOLVED: [2x] SystemdUnitFailed: httpbb_kubernetes_mw-wikifunctions_hourly.service on cumin1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [19:36:46] RECOVERY - Check unit status of httpbb_kubernetes_mw-wikifunctions_hourly on cumin2002 is OK: OK: Status of the systemd unit httpbb_kubernetes_mw-wikifunctions_hourly https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state [20:02:53] <_Gerges> jouncebot: next [20:02:53] In 10 hour(s) and 57 minute(s): UTC morning backport window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20240909T0700) [20:03:45] FIRING: PHPFPMTooBusy: Not enough idle PHP-FPM workers for Mediawiki mw-api-ext at eqiad: 22.03% idle - https://bit.ly/wmf-fpmsat - https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=84&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-api-ext&var-container_name=All - https://alerts.wikimedia.org/?q=alertname%3DPHPFPMTooBusy [20:08:45] RESOLVED: PHPFPMTooBusy: Not enough idle PHP-FPM workers for Mediawiki mw-api-ext at eqiad: 24.7% idle - https://bit.ly/wmf-fpmsat - 
https://grafana.wikimedia.org/d/U7JT--knk/mw-on-k8s?orgId=1&viewPanel=84&var-dc=eqiad%20prometheus/k8s&var-service=mediawiki&var-namespace=mw-api-ext&var-container_name=All - https://alerts.wikimedia.org/?q=alertname%3DPHPFPMTooBusy [20:20:30] PROBLEM - PyBal backends health check on lvs1020 is CRITICAL: PYBAL CRITICAL - CRITICAL - mw-wikifunctions_4451: Servers kubernetes1025.eqiad.wmnet, mw1434.eqiad.wmnet, wikikube-worker1029.eqiad.wmnet, mw1479.eqiad.wmnet, mw1462.eqiad.wmnet, mw1430.eqiad.wmnet, mw1480.eqiad.wmnet, mw1484.eqiad.wmnet, mw1399.eqiad.wmnet, mw1435.eqiad.wmnet, mw1424.eqiad.wmnet, mw1393.eqiad.wmnet, mw1488.eqiad.wmnet, mw1389.eqiad.wmnet, kubernetes1012.eqiad [20:20:30] kubernetes1014.eqiad.wmnet, mw1466.eqiad.wmnet, mw1483.eqiad.wmnet, mw1419.eqiad.wmnet, wikikube-worker1021.eqiad.wmnet, kubernetes1005.eqiad.wmnet, mw1486.eqiad.wmnet, mw1458.eqiad.wmnet, mw1371.eqiad.wmnet, wikikube-worker1024.eqiad.wmnet, mw1468.eqiad.wmnet, kubernetes1028.eqiad.wmnet, kubernetes1019.eqiad.wmnet, kubernetes1024.eqiad.wmnet, parse1019.eqiad.wmnet, mw1441.eqiad.wmnet, mw1431.eqiad.wmnet, mw1472.eqiad.wmnet, wikikube-work [20:20:30] qiad.wmnet, mw1451.eqiad.wmnet, kubernetes1039.eqiad.wmnet, mw1392.eqiad.wmnet, wikikube-worker1002.eqiad.wmnet, wikikube-worker1007.eqiad.wmnet, parse1014.eqiad.wmnet, wikikube-worker1 https://wikitech.wikimedia.org/wiki/PyBal [20:20:32] PROBLEM - PyBal backends health check on lvs1019 is CRITICAL: PYBAL CRITICAL - CRITICAL - mw-wikifunctions_4451: Servers parse1011.eqiad.wmnet, mw1442.eqiad.wmnet, wikikube-worker1029.eqiad.wmnet, kubernetes1023.eqiad.wmnet, mw1484.eqiad.wmnet, mw1405.eqiad.wmnet, kubernetes1030.eqiad.wmnet, parse1005.eqiad.wmnet, mw1370.eqiad.wmnet, mw1389.eqiad.wmnet, mw1395.eqiad.wmnet, kubernetes1014.eqiad.wmnet, wikikube-worker1009.eqiad.wmnet, mw136 [20:20:32] wmnet, mw1469.eqiad.wmnet, kubernetes1005.eqiad.wmnet, kubernetes1058.eqiad.wmnet, wikikube-worker1001.eqiad.wmnet, parse1012.eqiad.wmnet, 
mw1453.eqiad.wmnet, wikikube-worker1024.eqiad.wmnet, parse1006.eqiad.wmnet, kubernetes1015.eqiad.wmnet, mw1381.eqiad.wmnet, mw1391.eqiad.wmnet, kubernetes1056.eqiad.wmnet, mw1352.eqiad.wmnet, mw1441.eqiad.wmnet, mw1431.eqiad.wmnet, parse1003.eqiad.wmnet, mw1376.eqiad.wmnet, wikikube-worker1011.eqiad.wm [20:20:32] 451.eqiad.wmnet, mw1409.eqiad.wmnet, mw1479.eqiad.wmnet, mw1368.eqiad.wmnet, parse1014.eqiad.wmnet, mw1457.eqiad.wmnet, wikikube-worker1020.eqiad.wmnet, wikikube-worker1022.eqiad.wmnet, https://wikitech.wikimedia.org/wiki/PyBal [20:23:30] RECOVERY - PyBal backends health check on lvs1020 is OK: PYBAL OK - All pools are healthy https://wikitech.wikimedia.org/wiki/PyBal [20:23:34] RECOVERY - PyBal backends health check on lvs1019 is OK: PYBAL OK - All pools are healthy https://wikitech.wikimedia.org/wiki/PyBal [20:51:57] FIRING: RoutinatorRTRConnections: Important drop of Routinator RTR connections on rpki2002:9556 - https://wikitech.wikimedia.org/wiki/RPKI#RTR_Connections_drop - https://grafana.wikimedia.org/d/UwUa77GZk/rpki - https://alerts.wikimedia.org/?q=alertname%3DRoutinatorRTRConnections [20:54:30] PROBLEM - PyBal backends health check on lvs1020 is CRITICAL: PYBAL CRITICAL - CRITICAL - mw-wikifunctions_4451: Servers parse1011.eqiad.wmnet, mw1442.eqiad.wmnet, mw1479.eqiad.wmnet, mw1388.eqiad.wmnet, mw1405.eqiad.wmnet, kubernetes1030.eqiad.wmnet, parse1021.eqiad.wmnet, mw1435.eqiad.wmnet, mw1424.eqiad.wmnet, mw1393.eqiad.wmnet, mw1488.eqiad.wmnet, mw1454.eqiad.wmnet, mw1408.eqiad.wmnet, mw1370.eqiad.wmnet, mw1465.eqiad.wmnet, wikikub [20:54:30] 1009.eqiad.wmnet, mw1369.eqiad.wmnet, mw1419.eqiad.wmnet, mw1486.eqiad.wmnet, kubernetes1058.eqiad.wmnet, parse1001.eqiad.wmnet, parse1012.eqiad.wmnet, mw1453.eqiad.wmnet, kubernetes1015.eqiad.wmnet, kubernetes1008.eqiad.wmnet, kubernetes1019.eqiad.wmnet, kubernetes1031.eqiad.wmnet, kubernetes1024.eqiad.wmnet, mw1464.eqiad.wmnet, mw1381.eqiad.wmnet, mw1391.eqiad.wmnet, kubernetes1056.eqiad.wmnet, 
mw1441.eqiad.wmnet, parse1003.eqiad.wmnet, [20:54:30] 22.eqiad.wmnet, kubernetes1039.eqiad.wmnet, kubernetes1035.eqiad.wmnet, mw1409.eqiad.wmnet, kubernetes1057.eqiad.wmnet, parse1007.eqiad.wmnet, mw1475.eqiad.wmnet, mw1439.eqiad.wmnet, ku https://wikitech.wikimedia.org/wiki/PyBal [20:55:32] PROBLEM - PyBal backends health check on lvs1019 is CRITICAL: PYBAL CRITICAL - CRITICAL - mw-wikifunctions_4451: Servers parse1011.eqiad.wmnet, parse1013.eqiad.wmnet, kubernetes1025.eqiad.wmnet, mw1419.eqiad.wmnet, mw1442.eqiad.wmnet, mw1434.eqiad.wmnet, wikikube-worker1029.eqiad.wmnet, mw1386.eqiad.wmnet, mw1479.eqiad.wmnet, mw1470.eqiad.wmnet, mw1462.eqiad.wmnet, mw1430.eqiad.wmnet, mw1415.eqiad.wmnet, mw1388.eqiad.wmnet, mw1480.eqiad.w [20:55:32] 1405.eqiad.wmnet, mw1393.eqiad.wmnet, parse1010.eqiad.wmnet, mw1408.eqiad.wmnet, mw1370.eqiad.wmnet, mw1465.eqiad.wmnet, kubernetes1033.eqiad.wmnet, kubernetes1014.eqiad.wmnet, mw1466.eqiad.wmnet, kubernetes1018.eqiad.wmnet, mw1369.eqiad.wmnet, kubernetes1059.eqiad.wmnet, mw1469.eqiad.wmnet, mw1394.eqiad.wmnet, kubernetes1005.eqiad.wmnet, mw1486.eqiad.wmnet, kubernetes1058.eqiad.wmnet, mw1360.eqiad.wmnet, mw1356.eqiad.wmnet, wikikube-work [20:55:32] qiad.wmnet, mw1371.eqiad.wmnet, parse1012.eqiad.wmnet, wikikube-worker1024.eqiad.wmnet, mw1468.eqiad.wmnet, kubernetes1031.eqiad.wmnet, mw1439.eqiad.wmnet, parse1021.eqiad.wmnet, mw1352 https://wikitech.wikimedia.org/wiki/PyBal [21:01:30] RECOVERY - PyBal backends health check on lvs1020 is OK: PYBAL OK - All pools are healthy https://wikitech.wikimedia.org/wiki/PyBal [21:01:34] RECOVERY - PyBal backends health check on lvs1019 is OK: PYBAL OK - All pools are healthy https://wikitech.wikimedia.org/wiki/PyBal [21:47:36] PROBLEM - PyBal backends health check on lvs2014 is CRITICAL: PYBAL CRITICAL - CRITICAL - mw-wikifunctions_4451: Servers wikikube-worker2021.codfw.wmnet, parse2017.codfw.wmnet, kubernetes2024.codfw.wmnet, mw2447.codfw.wmnet, wikikube-worker2084.codfw.wmnet, 
mw2443.codfw.wmnet, kubernetes2048.codfw.wmnet, mw2315.codfw.wmnet, wikikube-worker2071.codfw.wmnet, wikikube-worker2044.codfw.wmnet, mw2351.codfw.wmnet, wikikube-worker2022.codfw.wmne [21:47:36] netes2052.codfw.wmnet, wikikube-worker2097.codfw.wmnet, parse2013.codfw.wmnet, wikikube-worker2062.codfw.wmnet, wikikube-worker2045.codfw.wmnet, mw2356.codfw.wmnet, mw2314.codfw.wmnet, wikikube-worker2059.codfw.wmnet, wikikube-worker2098.codfw.wmnet, mw2451.codfw.wmnet, parse2012.codfw.wmnet, mw2399.codfw.wmnet, wikikube-worker2048.codfw.wmnet, kubernetes2044.codfw.wmnet, wikikube-worker2073.codfw.wmnet, mw2301.codfw.wmnet, parse2015.codf [21:47:36] wikikube-worker2049.codfw.wmnet, wikikube-worker2031.codfw.wmnet, wikikube-worker2003.codfw.wmnet, wikikube-worker2100.codfw.wmnet, wikikube-worker2085.codfw.wmnet, wikikube-worker2034 https://wikitech.wikimedia.org/wiki/PyBal [21:47:42] PROBLEM - PyBal backends health check on lvs2013 is CRITICAL: PYBAL CRITICAL - CRITICAL - mw-wikifunctions_4451: Servers kubernetes2056.codfw.wmnet, wikikube-worker2036.codfw.wmnet, mw2443.codfw.wmnet, kubernetes2059.codfw.wmnet, kubernetes2050.codfw.wmnet, wikikube-worker2007.codfw.wmnet, wikikube-worker2065.codfw.wmnet, mw2302.codfw.wmnet, wikikube-worker2055.codfw.wmnet, wikikube-worker2062.codfw.wmnet, mw2353.codfw.wmnet, mw2449.codfw [21:47:42] wikikube-worker2045.codfw.wmnet, mw2397.codfw.wmnet, mw2413.codfw.wmnet, mw2314.codfw.wmnet, wikikube-worker2059.codfw.wmnet, mw2440.codfw.wmnet, kubernetes2042.codfw.wmnet, mw2451.codfw.wmnet, mw2304.codfw.wmnet, kubernetes2036.codfw.wmnet, wikikube-worker2101.codfw.wmnet, kubernetes2051.codfw.wmnet, mw2301.codfw.wmnet, parse2014.codfw.wmnet, wikikube-worker2066.codfw.wmnet, wikikube-worker2088.codfw.wmnet, wikikube-worker2094.codfw.wmne [21:47:42] 4.codfw.wmnet, mw2450.codfw.wmnet, wikikube-worker2100.codfw.wmnet, wikikube-worker2085.codfw.wmnet, wikikube-worker2034.codfw.wmnet, mw2437.codfw.wmnet, mw2373.codfw.wmnet, mw2305.codf 
https://wikitech.wikimedia.org/wiki/PyBal [21:48:36] RECOVERY - PyBal backends health check on lvs2014 is OK: PYBAL OK - All pools are healthy https://wikitech.wikimedia.org/wiki/PyBal [21:48:40] RECOVERY - PyBal backends health check on lvs2013 is OK: PYBAL OK - All pools are healthy https://wikitech.wikimedia.org/wiki/PyBal [22:20:32] PROBLEM - PyBal backends health check on lvs1020 is CRITICAL: PYBAL CRITICAL - CRITICAL - mw-wikifunctions_4451: Servers kubernetes1010.eqiad.wmnet, parse1011.eqiad.wmnet, kubernetes1025.eqiad.wmnet, wikikube-worker1029.eqiad.wmnet, kubernetes1023.eqiad.wmnet, mw1430.eqiad.wmnet, mw1415.eqiad.wmnet, mw1388.eqiad.wmnet, parse1009.eqiad.wmnet, mw1399.eqiad.wmnet, kubernetes1038.eqiad.wmnet, mw1424.eqiad.wmnet, mw1393.eqiad.wmnet, mw1488.eqi [22:20:32] , wikikube-worker1003.eqiad.wmnet, mw1391.eqiad.wmnet, mw1370.eqiad.wmnet, kubernetes1017.eqiad.wmnet, mw1395.eqiad.wmnet, mw1465.eqiad.wmnet, kubernetes1014.eqiad.wmnet, mw1466.eqiad.wmnet, kubernetes1018.eqiad.wmnet, mw1469.eqiad.wmnet, mw1394.eqiad.wmnet, kubernetes1058.eqiad.wmnet, mw1360.eqiad.wmnet, mw1356.eqiad.wmnet, wikikube-worker1001.eqiad.wmnet, mw1371.eqiad.wmnet, mw1453.eqiad.wmnet, wikikube-worker1024.eqiad.wmnet, mw1468.eq [22:20:32] t, kubernetes1028.eqiad.wmnet, kubernetes1015.eqiad.wmnet, kubernetes1024.eqiad.wmnet, mw1464.eqiad.wmnet, parse1019.eqiad.wmnet, mw1381.eqiad.wmnet, parse1021.eqiad.wmnet, parse1006.eq https://wikitech.wikimedia.org/wiki/PyBal [22:20:34] PROBLEM - PyBal backends health check on lvs1019 is CRITICAL: PYBAL CRITICAL - CRITICAL - mw-wikifunctions_4451: Servers kubernetes1010.eqiad.wmnet, parse1011.eqiad.wmnet, mw1380.eqiad.wmnet, mw1419.eqiad.wmnet, mw1442.eqiad.wmnet, wikikube-worker1029.eqiad.wmnet, mw1386.eqiad.wmnet, mw1462.eqiad.wmnet, mw1415.eqiad.wmnet, mw1388.eqiad.wmnet, parse1009.eqiad.wmnet, mw1405.eqiad.wmnet, mw1399.eqiad.wmnet, kubernetes1038.eqiad.wmnet, mw1435 [22:20:34] mnet, mw1424.eqiad.wmnet, parse1010.eqiad.wmnet, 
mw1408.eqiad.wmnet, kubernetes1012.eqiad.wmnet, mw1465.eqiad.wmnet, kubernetes1033.eqiad.wmnet, kubernetes1014.eqiad.wmnet, mw1466.eqiad.wmnet, wikikube-worker1009.eqiad.wmnet, mw1369.eqiad.wmnet, mw1469.eqiad.wmnet, kubernetes1005.eqiad.wmnet, mw1360.eqiad.wmnet, mw1356.eqiad.wmnet, wikikube-worker1001.eqiad.wmnet, mw1458.eqiad.wmnet, mw1371.eqiad.wmnet, kubernetes1028.eqiad.wmnet, kuberne [22:20:34] eqiad.wmnet, kubernetes1019.eqiad.wmnet, kubernetes1031.eqiad.wmnet, kubernetes1024.eqiad.wmnet, mw1464.eqiad.wmnet, mw1381.eqiad.wmnet, mw1352.eqiad.wmnet, mw1441.eqiad.wmnet, mw1431.e https://wikitech.wikimedia.org/wiki/PyBal [22:20:57] FIRING: [2x] RoutinatorRsyncErrors: Routinator rsync fetching issue in codfw - https://wikitech.wikimedia.org/wiki/RPKI#RSYNC_status - https://grafana.wikimedia.org/d/UwUa77GZk/rpki - https://alerts.wikimedia.org/?q=alertname%3DRoutinatorRsyncErrors [22:28:14] PROBLEM - Host an-worker1168 is DOWN: PING CRITICAL - Packet loss = 100% [22:28:34] RECOVERY - PyBal backends health check on lvs1019 is OK: PYBAL OK - All pools are healthy https://wikitech.wikimedia.org/wiki/PyBal [22:30:32] RECOVERY - PyBal backends health check on lvs1020 is OK: PYBAL OK - All pools are healthy https://wikitech.wikimedia.org/wiki/PyBal [22:32:46] RECOVERY - Host an-worker1168 is UP: PING OK - Packet loss = 0%, RTA = 0.29 ms [22:34:04] PROBLEM - Check unit status of httpbb_kubernetes_mw-wikifunctions_hourly on cumin1002 is CRITICAL: CRITICAL: Status of the systemd unit httpbb_kubernetes_mw-wikifunctions_hourly https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state [22:36:25] FIRING: [2x] SystemdUnitFailed: httpbb_kubernetes_mw-wikifunctions_hourly.service on cumin1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [22:37:32] PROBLEM - PyBal backends health check on lvs1020 is CRITICAL: PYBAL CRITICAL 
- CRITICAL - mw-wikifunctions_4451: Servers kubernetes1010.eqiad.wmnet, parse1011.eqiad.wmnet, mw1433.eqiad.wmnet, kubernetes1025.eqiad.wmnet, mw1419.eqiad.wmnet, mw1434.eqiad.wmnet, wikikube-worker1029.eqiad.wmnet, mw1386.eqiad.wmnet, parse1013.eqiad.wmnet, mw1479.eqiad.wmnet, mw1470.eqiad.wmnet, mw1415.eqiad.wmnet, mw1480.eqiad.wmnet, mw1484.eqiad.wmnet, kubern [22:37:32] .eqiad.wmnet, mw1424.eqiad.wmnet, mw1393.eqiad.wmnet, mw1488.eqiad.wmnet, mw1454.eqiad.wmnet, mw1408.eqiad.wmnet, mw1370.eqiad.wmnet, kubernetes1017.eqiad.wmnet, mw1425.eqiad.wmnet, mw1395.eqiad.wmnet, kubernetes1014.eqiad.wmnet, mw1483.eqiad.wmnet, mw1469.eqiad.wmnet, kubernetes1058.eqiad.wmnet, mw1360.eqiad.wmnet, mw1356.eqiad.wmnet, mw1458.eqiad.wmnet, mw1371.eqiad.wmnet, wikikube-worker1010.eqiad.wmnet, kubernetes1015.eqiad.wmnet, kub [22:37:32] 008.eqiad.wmnet, kubernetes1019.eqiad.wmnet, kubernetes1031.eqiad.wmnet, kubernetes1024.eqiad.wmnet, kubernetes1062.eqiad.wmnet, mw1464.eqiad.wmnet, parse1019.eqiad.wmnet, mw1381.eqiad. 
https://wikitech.wikimedia.org/wiki/PyBal [22:37:36] PROBLEM - PyBal backends health check on lvs1019 is CRITICAL: PYBAL CRITICAL - CRITICAL - mw-wikifunctions_4451: Servers kubernetes1010.eqiad.wmnet, mw1433.eqiad.wmnet, mw1380.eqiad.wmnet, mw1419.eqiad.wmnet, mw1442.eqiad.wmnet, wikikube-worker1029.eqiad.wmnet, mw1386.eqiad.wmnet, mw1479.eqiad.wmnet, mw1470.eqiad.wmnet, mw1462.eqiad.wmnet, mw1415.eqiad.wmnet, mw1480.eqiad.wmnet, parse1009.eqiad.wmnet, mw1405.eqiad.wmnet, parse1021.eqiad.w [22:37:36] 1424.eqiad.wmnet, mw1393.eqiad.wmnet, mw1454.eqiad.wmnet, mw1408.eqiad.wmnet, mw1465.eqiad.wmnet, kubernetes1033.eqiad.wmnet, wikikube-worker1009.eqiad.wmnet, mw1483.eqiad.wmnet, mw1369.eqiad.wmnet, mw1367.eqiad.wmnet, kubernetes1059.eqiad.wmnet, mw1394.eqiad.wmnet, mw1486.eqiad.wmnet, mw1360.eqiad.wmnet, mw1356.eqiad.wmnet, wikikube-worker1001.eqiad.wmnet, kubernetes1018.eqiad.wmnet, mw1458.eqiad.wmnet, parse1001.eqiad.wmnet, mw1453.eqia [22:37:36] mw1468.eqiad.wmnet, wikikube-worker1010.eqiad.wmnet, kubernetes1008.eqiad.wmnet, kubernetes1031.eqiad.wmnet, mw1464.eqiad.wmnet, parse1019.eqiad.wmnet, mw1391.eqiad.wmnet, parse1003.eq https://wikitech.wikimedia.org/wiki/PyBal [22:37:46] PROBLEM - Check unit status of httpbb_kubernetes_mw-wikifunctions_hourly on cumin2002 is CRITICAL: CRITICAL: Status of the systemd unit httpbb_kubernetes_mw-wikifunctions_hourly https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state [22:39:32] RECOVERY - PyBal backends health check on lvs1020 is OK: PYBAL OK - All pools are healthy https://wikitech.wikimedia.org/wiki/PyBal [22:39:36] RECOVERY - PyBal backends health check on lvs1019 is OK: PYBAL OK - All pools are healthy https://wikitech.wikimedia.org/wiki/PyBal [22:52:36] PROBLEM - PyBal backends health check on lvs1019 is CRITICAL: PYBAL CRITICAL - CRITICAL - mw-wikifunctions_4451: Servers parse1011.eqiad.wmnet, mw1419.eqiad.wmnet, mw1442.eqiad.wmnet, wikikube-worker1029.eqiad.wmnet, mw1470.eqiad.wmnet, parse1005.eqiad.wmnet, 
mw1408.eqiad.wmnet, mw1370.eqiad.wmnet, mw1389.eqiad.wmnet, mw1425.eqiad.wmnet, mw1465.eqiad.wmnet, kubernetes1018.eqiad.wmnet, parse1001.eqiad.wmnet, mw1468.eqiad.wmnet, parse1006.e [22:52:36] et, kubernetes1024.eqiad.wmnet, parse1019.eqiad.wmnet, mw1381.eqiad.wmnet, parse1021.eqiad.wmnet, mw1431.eqiad.wmnet, parse1022.eqiad.wmnet, wikikube-worker1011.eqiad.wmnet, mw1379.eqiad.wmnet, kubernetes1026.eqiad.wmnet, mw1409.eqiad.wmnet, mw1392.eqiad.wmnet, mw1375.eqiad.wmnet, kubernetes1057.eqiad.wmnet, mw1368.eqiad.wmnet, wikikube-worker1002.eqiad.wmnet, parse1007.eqiad.wmnet, wikikube-worker1020.eqiad.wmnet, mw1374.eqiad.wmnet, wik [22:52:36] rker1013.eqiad.wmnet, mw1476.eqiad.wmnet, mw1449.eqiad.wmnet, mw1495.eqiad.wmnet, kubernetes1016.eqiad.wmnet, parse1024.eqiad.wmnet, wikikube-worker1017.eqiad.wmnet, mw1357.eqiad.wmnet, https://wikitech.wikimedia.org/wiki/PyBal [22:53:36] RECOVERY - PyBal backends health check on lvs1019 is OK: PYBAL OK - All pools are healthy https://wikitech.wikimedia.org/wiki/PyBal [23:31:33] SRE, Data-Engineering, MediaWiki-extensions-CentralNotice, MediaWiki-extensions-EventLogging, Traffic: Eventlogging should transparently split large event payloads - https://phabricator.wikimedia.org/T114078#10128526 (jeremyb) [23:34:04] RECOVERY - Check unit status of httpbb_kubernetes_mw-wikifunctions_hourly on cumin1002 is OK: OK: Status of the systemd unit httpbb_kubernetes_mw-wikifunctions_hourly https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state [23:36:25] RESOLVED: [2x] SystemdUnitFailed: httpbb_kubernetes_mw-wikifunctions_hourly.service on cumin1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed [23:37:46] RECOVERY - Check unit status of httpbb_kubernetes_mw-wikifunctions_hourly on cumin2002 is OK: OK: Status of the systemd unit httpbb_kubernetes_mw-wikifunctions_hourly 
https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state [23:38:09] (PS1) TrainBranchBot: Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - https://gerrit.wikimedia.org/r/1071363 [23:38:10] (CR) TrainBranchBot: [C:+2] Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - https://gerrit.wikimedia.org/r/1071363 (owner: TrainBranchBot)