[00:02:40] <icinga-wm>	 PROBLEM - Hadoop NodeManager on an-worker1120 is CRITICAL: PROCS CRITICAL: 0 processes with command name java, args org.apache.hadoop.yarn.server.nodemanager.NodeManager https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hadoop/Alerts%23Yarn_Nodemanager_process
[00:03:32] <icinga-wm>	 PROBLEM - SSH on puppetserver1002 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/SSH/monitoring
[00:03:55] <jinxer-wm>	 RESOLVED: SystemdUnitFailed: dump_cloud_ip_ranges.service on puppetserver1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[00:03:55] <jinxer-wm>	 RESOLVED: SystemdUnitFailed: dump_ip_reputation.service on puppetserver1001:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[00:05:32] <icinga-wm>	 RECOVERY - Hadoop NodeManager on an-worker1092 is OK: PROCS OK: 1 process with command name java, args org.apache.hadoop.yarn.server.nodemanager.NodeManager https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hadoop/Alerts%23Yarn_Nodemanager_process
[00:05:53] <wikibugs>	 (Merged) jenkins-bot: Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - https://gerrit.wikimedia.org/r/1072635 (owner: TrainBranchBot)
[00:08:44] <wikibugs>	 (PS1) TrainBranchBot: Branch commit for wmf/next [core] (wmf/next) - https://gerrit.wikimedia.org/r/1072637
[00:08:44] <wikibugs>	 (CR) TrainBranchBot: [C:+2] Branch commit for wmf/next [core] (wmf/next) - https://gerrit.wikimedia.org/r/1072637 (owner: TrainBranchBot)
[00:09:06] <jinxer-wm>	 FIRING: MediaWikiLoginFailures: Elevated MediaWiki centrallogin failures (centralauth_error_nologinattempt) - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://grafana.wikimedia.org/d/000000438/mediawiki-exceptions-alerts?viewPanel=3 - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLoginFailures
[00:14:06] <jinxer-wm>	 RESOLVED: MediaWikiLoginFailures: Elevated MediaWiki centrallogin failures (centralauth_error_nologinattempt) - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://grafana.wikimedia.org/d/000000438/mediawiki-exceptions-alerts?viewPanel=3 - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLoginFailures
[00:14:40] <icinga-wm>	 RECOVERY - Hadoop NodeManager on an-worker1120 is OK: PROCS OK: 1 process with command name java, args org.apache.hadoop.yarn.server.nodemanager.NodeManager https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hadoop/Alerts%23Yarn_Nodemanager_process
[00:20:38] <icinga-wm>	 PROBLEM - Hadoop NodeManager on an-worker1171 is CRITICAL: PROCS CRITICAL: 0 processes with command name java, args org.apache.hadoop.yarn.server.nodemanager.NodeManager https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hadoop/Alerts%23Yarn_Nodemanager_process
[00:29:48] <wikibugs>	 (CR) JHathaway: [V:+1] "PCC SUCCESS (CORE_DIFF 24 NOOP 2): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet7-compiler-node" [puppet] - https://gerrit.wikimedia.org/r/1072593 (https://phabricator.wikimedia.org/T372667) (owner: JHathaway)
[00:35:34] <icinga-wm>	 RECOVERY - Hadoop NodeManager on an-worker1140 is OK: PROCS OK: 1 process with command name java, args org.apache.hadoop.yarn.server.nodemanager.NodeManager https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hadoop/Alerts%23Yarn_Nodemanager_process
[00:38:53] <wikibugs>	 (Merged) jenkins-bot: Branch commit for wmf/next [core] (wmf/next) - https://gerrit.wikimedia.org/r/1072637 (owner: TrainBranchBot)
[00:44:40] <icinga-wm>	 RECOVERY - Hadoop NodeManager on an-worker1171 is OK: PROCS OK: 1 process with command name java, args org.apache.hadoop.yarn.server.nodemanager.NodeManager https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hadoop/Alerts%23Yarn_Nodemanager_process
[00:46:30] <icinga-wm>	 RECOVERY - SSH on puppetserver1002 is OK: SSH OK - OpenSSH_9.2p1 Debian-2+deb12u3 (protocol 2.0) https://wikitech.wikimedia.org/wiki/SSH/monitoring
[00:53:25] <jinxer-wm>	 FIRING: SystemdUnitFailed: dump_ip_reputation.service on puppetserver1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[00:54:56] <jinxer-wm>	 FIRING: RoutinatorRTRConnections: Important drop of Routinator RTR connections on rpki2002:9556 - https://wikitech.wikimedia.org/wiki/RPKI#RTR_Connections_drop - https://grafana.wikimedia.org/d/UwUa77GZk/rpki - https://alerts.wikimedia.org/?q=alertname%3DRoutinatorRTRConnections
[00:55:10] <wikibugs>	 ops-eqiad, SRE, DC-Ops: PDU sensor over limit - https://phabricator.wikimedia.org/T374573#10142991 (phaultfinder)
[01:00:24] <wikibugs>	 (CR) JHathaway: [V:+1] "PCC SUCCESS (CORE_DIFF 24 NOOP 2): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet7-compiler-node" [puppet] - https://gerrit.wikimedia.org/r/1072593 (https://phabricator.wikimedia.org/T372667) (owner: JHathaway)
[01:18:06] <jinxer-wm>	 FIRING: MediaWikiLoginFailures: Elevated MediaWiki centrallogin failures (centralauth_error_nologinattempt) - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://grafana.wikimedia.org/d/000000438/mediawiki-exceptions-alerts?viewPanel=3 - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLoginFailures
[01:23:06] <jinxer-wm>	 RESOLVED: MediaWikiLoginFailures: Elevated MediaWiki centrallogin failures (centralauth_error_nologinattempt) - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://grafana.wikimedia.org/d/000000438/mediawiki-exceptions-alerts?viewPanel=3 - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLoginFailures
[01:47:42] <icinga-wm>	 PROBLEM - Hadoop NodeManager on an-worker1158 is CRITICAL: PROCS CRITICAL: 0 processes with command name java, args org.apache.hadoop.yarn.server.nodemanager.NodeManager https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hadoop/Alerts%23Yarn_Nodemanager_process
[01:54:42] <jinxer-wm>	 FIRING: [4x] SwiftObjectCountSiteDisparity: MediaWiki swift object counts site diffs - https://wikitech.wikimedia.org/wiki/Swift/How_To - https://grafana.wikimedia.org/d/OPgmB1Eiz/swift - https://alerts.wikimedia.org/?q=alertname%3DSwiftObjectCountSiteDisparity
[01:57:46] <icinga-wm>	 PROBLEM - Hadoop NodeManager on an-worker1166 is CRITICAL: PROCS CRITICAL: 0 processes with command name java, args org.apache.hadoop.yarn.server.nodemanager.NodeManager https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hadoop/Alerts%23Yarn_Nodemanager_process
[02:01:44] <icinga-wm>	 RECOVERY - Hadoop NodeManager on an-worker1166 is OK: PROCS OK: 1 process with command name java, args org.apache.hadoop.yarn.server.nodemanager.NodeManager https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hadoop/Alerts%23Yarn_Nodemanager_process
[02:15:42] <icinga-wm>	 RECOVERY - Hadoop NodeManager on an-worker1158 is OK: PROCS OK: 1 process with command name java, args org.apache.hadoop.yarn.server.nodemanager.NodeManager https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hadoop/Alerts%23Yarn_Nodemanager_process
[02:23:56] <jinxer-wm>	 FIRING: [2x] RoutinatorRsyncErrors: Routinator rsync fetching issue in codfw - https://wikitech.wikimedia.org/wiki/RPKI#RSYNC_status - https://grafana.wikimedia.org/d/UwUa77GZk/rpki - https://alerts.wikimedia.org/?q=alertname%3DRoutinatorRsyncErrors
[02:39:12] <jinxer-wm>	 FIRING: JobUnavailable: Reduced availability for job sidekiq in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[02:59:12] <jinxer-wm>	 RESOLVED: JobUnavailable: Reduced availability for job sidekiq in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[03:54:14] <wikibugs>	 (PS4) Ebrahim: Remove ProofreadPage dark mode namespaces exception [mediawiki-config] - https://gerrit.wikimedia.org/r/1072600
[03:55:26] <wikibugs>	 (PS5) Ebrahim: Remove ProofreadPage dark mode namespaces exception [mediawiki-config] - https://gerrit.wikimedia.org/r/1072600
[03:55:55] <wikibugs>	 (PS6) Ebrahim: Remove ProofreadPage dark mode namespaces exception [mediawiki-config] - https://gerrit.wikimedia.org/r/1072600
[03:56:39] <wikibugs>	 (PS7) Ebrahim: Remove ProofreadPage dark mode namespaces exception [mediawiki-config] - https://gerrit.wikimedia.org/r/1072600
[04:20:25] <wikibugs>	 (CR) Ebrahim: "Asked on Meta:Babel, https://meta.wikimedia.org/wiki/Meta:Babel#Enable_the_dark_mode_for_Grants,_Research_and_Iberocoop_namespaces" [mediawiki-config] - https://gerrit.wikimedia.org/r/1072623 (owner: Ebrahim)
[04:28:06] <jinxer-wm>	 FIRING: MediaWikiLoginFailures: Elevated MediaWiki centrallogin failures (centralauth_error_badtoken) - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://grafana.wikimedia.org/d/000000438/mediawiki-exceptions-alerts?viewPanel=3 - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLoginFailures
[04:33:06] <jinxer-wm>	 RESOLVED: MediaWikiLoginFailures: Elevated MediaWiki centrallogin failures (centralauth_error_badtoken) - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://grafana.wikimedia.org/d/000000438/mediawiki-exceptions-alerts?viewPanel=3 - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLoginFailures
[04:45:10] <wikibugs>	 ops-eqiad, SRE, DC-Ops: PDU sensor over limit - https://phabricator.wikimedia.org/T374573#10143166 (phaultfinder)
[04:53:25] <jinxer-wm>	 FIRING: SystemdUnitFailed: dump_ip_reputation.service on puppetserver1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[04:54:56] <jinxer-wm>	 FIRING: RoutinatorRTRConnections: Important drop of Routinator RTR connections on rpki2002:9556 - https://wikitech.wikimedia.org/wiki/RPKI#RTR_Connections_drop - https://grafana.wikimedia.org/d/UwUa77GZk/rpki - https://alerts.wikimedia.org/?q=alertname%3DRoutinatorRTRConnections
[05:20:13] <wikibugs>	 ops-eqiad, SRE, DC-Ops: PDU sensor over limit - https://phabricator.wikimedia.org/T374573#10143172 (phaultfinder)
[05:44:06] <jinxer-wm>	 FIRING: MediaWikiLoginFailures: Elevated MediaWiki centrallogin failures (centralauth_error_badtoken) - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://grafana.wikimedia.org/d/000000438/mediawiki-exceptions-alerts?viewPanel=3 - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLoginFailures
[05:49:06] <jinxer-wm>	 RESOLVED: MediaWikiLoginFailures: Elevated MediaWiki centrallogin failures (centralauth_error_badtoken) - https://wikitech.wikimedia.org/wiki/Application_servers/Runbook - https://grafana.wikimedia.org/d/000000438/mediawiki-exceptions-alerts?viewPanel=3 - https://alerts.wikimedia.org/?q=alertname%3DMediaWikiLoginFailures
[05:54:42] <jinxer-wm>	 FIRING: [4x] SwiftObjectCountSiteDisparity: MediaWiki swift object counts site diffs - https://wikitech.wikimedia.org/wiki/Swift/How_To - https://grafana.wikimedia.org/d/OPgmB1Eiz/swift - https://alerts.wikimedia.org/?q=alertname%3DSwiftObjectCountSiteDisparity
[06:02:01] <wikibugs>	 (CR) Muehlenhoff: Allow users to see rejected requests for permissions. (1 comment) [software/bitu] - https://gerrit.wikimedia.org/r/1072552 (owner: Slyngshede)
[06:04:21] <jinxer-wm>	 FIRING: PoolcounterFullQueues: Full queues for poolcounter1004:9106 poolcounter - https://www.mediawiki.org/wiki/PoolCounter#Request_tracing_in_production - https://grafana.wikimedia.org/d/aIcYxuxZk/poolcounter?orgId=1&viewPanel=6&from=now-1h&to=now&var-dc=eqiad%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DPoolcounterFullQueues
[06:06:43] <wikibugs>	 (CR) Muehlenhoff: "Looks good from a technical perspective, but a few comments inline which are more from a process angle." [software/bitu] - https://gerrit.wikimedia.org/r/1072552 (owner: Slyngshede)
[06:09:21] <jinxer-wm>	 RESOLVED: PoolcounterFullQueues: Full queues for poolcounter1004:9106 poolcounter - https://www.mediawiki.org/wiki/PoolCounter#Request_tracing_in_production - https://grafana.wikimedia.org/d/aIcYxuxZk/poolcounter?orgId=1&viewPanel=6&from=now-1h&to=now&var-dc=eqiad%20prometheus/ops - https://alerts.wikimedia.org/?q=alertname%3DPoolcounterFullQueues
[06:23:56] <jinxer-wm>	 FIRING: [2x] RoutinatorRsyncErrors: Routinator rsync fetching issue in codfw - https://wikitech.wikimedia.org/wiki/RPKI#RSYNC_status - https://grafana.wikimedia.org/d/UwUa77GZk/rpki - https://alerts.wikimedia.org/?q=alertname%3DRoutinatorRsyncErrors
[06:26:50] <wikibugs>	 (PS1) Stevemunene: Add new an worker keytabs [labs/private] - https://gerrit.wikimedia.org/r/1072655 (https://phabricator.wikimedia.org/T353788)
[06:31:54] <wikibugs>	 (CR) Filippo Giunchedi: [C:+1] puppet8: account for unknown probe types [puppet] - https://gerrit.wikimedia.org/r/1072303 (https://phabricator.wikimedia.org/T372664) (owner: JHathaway)
[06:32:50] <wikibugs>	 (CR) Filippo Giunchedi: "LGTM, commit message needs adjusting tho" [puppet] - https://gerrit.wikimedia.org/r/1072276 (owner: Ssingh)
[06:34:13] <wikibugs>	 (CR) Filippo Giunchedi: [C:+1] alert: Failover from alert2002 to alert1002 [puppet] - https://gerrit.wikimedia.org/r/1071701 (https://phabricator.wikimedia.org/T372418) (owner: Andrea Denisse)
[06:34:54] <wikibugs>	 (CR) Muehlenhoff: Menu: Add menu entry for managers to view pending permission requests. (1 comment) [software/bitu] - https://gerrit.wikimedia.org/r/1072547 (owner: Slyngshede)
[06:34:55] <wikibugs>	 (CR) Filippo Giunchedi: "This will need rebasing on current dns repo (e.g. alerts CNAME alert2002 not alert1001)" [dns] - https://gerrit.wikimedia.org/r/1063078 (https://phabricator.wikimedia.org/T372418) (owner: Andrea Denisse)
[06:38:34] <wikibugs>	 (CR) Muehlenhoff: Permission validation: Handle validation for manager approvals better. (2 comments) [software/bitu] - https://gerrit.wikimedia.org/r/1072528 (owner: Slyngshede)
[06:43:48] <wikibugs>	 (PS1) Filippo Giunchedi: team-sre: tweak MediaWikiLoginFailures threshold [alerts] - https://gerrit.wikimedia.org/r/1072657 (https://phabricator.wikimedia.org/T350597)
[06:48:00] <wikibugs>	 (CR) Slyngshede: Menu: Add menu entry for managers to view pending permission requests. (1 comment) [software/bitu] - https://gerrit.wikimedia.org/r/1072547 (owner: Slyngshede)
[06:53:43] <wikibugs>	 (CR) Muehlenhoff: Menu: Add menu entry for managers to view pending permission requests. (1 comment) [software/bitu] - https://gerrit.wikimedia.org/r/1072547 (owner: Slyngshede)
[06:54:52] <jayme>	 !log evacuating leadership for all partitions assigned to broker id 2005 on kafka-main-codfw - T363210
[06:54:55] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:54:56] <stashbot>	 T363210: kafka-main200[6789] and kafka-main2010 implementation tracking - https://phabricator.wikimedia.org/T363210
[06:56:20] <logmsgbot>	 !log jayme@cumin1002 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on kafka-main[2005,2010].codfw.wmnet with reason: Hardware refresh
[06:56:38] <logmsgbot>	 !log jayme@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on kafka-main[2005,2010].codfw.wmnet with reason: Hardware refresh
[07:00:05] <jouncebot>	 Deploy window No deploys all day! See Deployments/Emergencies if things are broken. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20240913T0700)
[07:02:03] <wikibugs>	 (PS1) Stevemunene: hdfs: Add new worker hosts to net_topology [puppet] - https://gerrit.wikimedia.org/r/1072660 (https://phabricator.wikimedia.org/T353788)
[07:02:05] <wikibugs>	 (PS1) Stevemunene: hdfs: Assign the worker role to new hadoop workers [puppet] - https://gerrit.wikimedia.org/r/1072661 (https://phabricator.wikimedia.org/T353788)
[07:03:11] <wikibugs>	 (Abandoned) Stevemunene: trafficserver: add airflow-analytics-test discovery record [puppet] - https://gerrit.wikimedia.org/r/1057830 (https://phabricator.wikimedia.org/T371210) (owner: Stevemunene)
[07:09:58] <wikibugs>	 (PS1) JMeybohm: kafka-main: Replace kafka-main2005 with kafka-main2010 [puppet] - https://gerrit.wikimedia.org/r/1072662 (https://phabricator.wikimedia.org/T363210)
[07:13:20] <wikibugs>	 (PS1) JMeybohm: Replace kafka-main2005 with kafka-main2010 [deployment-charts] - https://gerrit.wikimedia.org/r/1072663 (https://phabricator.wikimedia.org/T363210)
[07:23:46] <wikibugs>	 (CR) JMeybohm: [C:+2] kafka-main: Replace kafka-main2005 with kafka-main2010 [puppet] - https://gerrit.wikimedia.org/r/1072662 (https://phabricator.wikimedia.org/T363210) (owner: JMeybohm)
[07:27:33] <logmsgbot>	 !log jayme@cumin1002 START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling restart_daemons on A:kafka-main-codfw
[07:31:54] <wikibugs>	 (PS1) Muehlenhoff: envoy: Add support for passing an array of sets to the firewall service [puppet] - https://gerrit.wikimedia.org/r/1072690
[07:32:46] <logmsgbot>	 !log jayme@deploy1003 helmfile [eqiad] START helmfile.d/admin 'apply'.
[07:32:53] <logmsgbot>	 !log jayme@deploy1003 helmfile [eqiad] DONE helmfile.d/admin 'apply'.
[07:32:55] <logmsgbot>	 !log jayme@deploy1003 helmfile [codfw] START helmfile.d/admin 'apply'.
[07:33:18] <logmsgbot>	 !log jayme@deploy1003 helmfile [codfw] DONE helmfile.d/admin 'apply'.
[07:33:19] <logmsgbot>	 !log jayme@deploy1003 helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
[07:33:31] <logmsgbot>	 !log jayme@deploy1003 helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
[07:33:32] <logmsgbot>	 !log jayme@deploy1003 helmfile [staging-codfw] START helmfile.d/admin 'apply'.
[07:34:03] <logmsgbot>	 !log jayme@deploy1003 helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
[07:34:05] <logmsgbot>	 !log jayme@deploy1003 helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
[07:34:19] <logmsgbot>	 !log jayme@deploy1003 helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
[07:34:21] <logmsgbot>	 !log jayme@deploy1003 helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
[07:34:31] <wikibugs>	 (CR) CI reject: [V:-1] envoy: Add support for passing an array of sets to the firewall service [puppet] - https://gerrit.wikimedia.org/r/1072690 (owner: Muehlenhoff)
[07:34:54] <logmsgbot>	 !log jayme@deploy1003 helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
[07:34:56] <logmsgbot>	 !log jayme@deploy1003 helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
[07:35:30] <logmsgbot>	 !log jayme@deploy1003 helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
[07:35:31] <logmsgbot>	 !log jayme@deploy1003 helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
[07:35:44] <logmsgbot>	 !log jayme@deploy1003 helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
[07:35:45] <logmsgbot>	 !log jayme@deploy1003 helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'.
[07:35:56] <logmsgbot>	 !log jayme@deploy1003 helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'.
[07:36:15] <wikibugs>	 ops-codfw, DC-Ops, decommission-hardware, serviceops: decommission kafka-main2005.codfw.wmnet - https://phabricator.wikimedia.org/T374688 (JMeybohm) NEW
[07:36:28] <wikibugs>	 ops-codfw, DC-Ops, decommission-hardware, serviceops: decommission kafka-main2005.codfw.wmnet - https://phabricator.wikimedia.org/T374688#10143297 (JMeybohm)
[07:39:19] <wikibugs>	 (PS1) JMeybohm: Decom kafka-main2005 [puppet] - https://gerrit.wikimedia.org/r/1072695 (https://phabricator.wikimedia.org/T374688)
[07:39:39] <wikibugs>	 (CR) Jelto: "one question in line" [puppet] - https://gerrit.wikimedia.org/r/1072662 (https://phabricator.wikimedia.org/T363210) (owner: JMeybohm)
[07:42:11] <wikibugs>	 SRE, Infrastructure-Foundations, vm-requests: eqiad: 2 VM request for poolcounter - https://phabricator.wikimedia.org/T374629#10143306 (elukey)
[07:43:09] <wikibugs>	 (CR) JMeybohm: [C:+2] kafka-main: Replace kafka-main2005 with kafka-main2010 (1 comment) [puppet] - https://gerrit.wikimedia.org/r/1072662 (https://phabricator.wikimedia.org/T363210) (owner: JMeybohm)
[07:43:22] <wikibugs>	 SRE, Infrastructure-Foundations, vm-requests: eqiad: 2 VM request for poolcounter - https://phabricator.wikimedia.org/T374629#10143308 (elukey) ` +-------+-------+-----------+----------+-----------+---------+-----------+ | Group | Nodes | Instances |  MFree   | MFree avg |  DFree  | DFree avg | +----...
[07:43:34] <wikibugs>	 (PS2) Muehlenhoff: envoy: Add support for passing an array of sets to the firewall service [puppet] - https://gerrit.wikimedia.org/r/1072690
[07:44:42] <wikibugs>	 (PS1) Elukey: Add configuration for poolcounter100[6,7] [puppet] - https://gerrit.wikimedia.org/r/1072696 (https://phabricator.wikimedia.org/T374629)
[07:45:35] <wikibugs>	 (CR) Elukey: [C:+2] sre.hosts.provision: improve Supermicro's bios settings [cookbooks] - https://gerrit.wikimedia.org/r/1071553 (https://phabricator.wikimedia.org/T365372) (owner: Elukey)
[07:45:40] <wikibugs>	 (CR) Muehlenhoff: [C:+1] "LGTM" [puppet] - https://gerrit.wikimedia.org/r/1072696 (https://phabricator.wikimedia.org/T374629) (owner: Elukey)
[07:45:42] <wikibugs>	 (CR) Elukey: [C:+2] sre.hosts.provision: refactor _config_dell_pxe() [cookbooks] - https://gerrit.wikimedia.org/r/1072553 (https://phabricator.wikimedia.org/T365372) (owner: Elukey)
[07:46:11] <wikibugs>	 SRE, Infrastructure-Foundations, vm-requests, Patch-For-Review: eqiad: 2 VM request for poolcounter - https://phabricator.wikimedia.org/T374629#10143312 (MoritzMuehlenhoff) +1
[07:46:23] <wikibugs>	 (CR) Elukey: [C:+2] Add configuration for poolcounter100[6,7] [puppet] - https://gerrit.wikimedia.org/r/1072696 (https://phabricator.wikimedia.org/T374629) (owner: Elukey)
[07:46:40] <logmsgbot>	 !log jayme@cumin1002 END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling restart_daemons on A:kafka-main-codfw
[07:46:44] <wikibugs>	 (CR) Jelto: [C:+1] "lgtm" [deployment-charts] - https://gerrit.wikimedia.org/r/1072663 (https://phabricator.wikimedia.org/T363210) (owner: JMeybohm)
[07:47:29] <logmsgbot>	 !log elukey@cumin1002 START - Cookbook sre.ganeti.makevm for new host poolcounter1006.eqiad.wmnet
[07:47:30] <logmsgbot>	 !log elukey@cumin1002 START - Cookbook sre.dns.netbox
[07:50:25] <wikibugs>	 (CR) Jelto: [C:+1] "lgtm" [puppet] - https://gerrit.wikimedia.org/r/1072695 (https://phabricator.wikimedia.org/T374688) (owner: JMeybohm)
[07:50:43] <logmsgbot>	 !log elukey@cumin1002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM poolcounter1006.eqiad.wmnet - elukey@cumin1002"
[07:50:47] <logmsgbot>	 !log elukey@cumin1002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM poolcounter1006.eqiad.wmnet - elukey@cumin1002"
[07:50:47] <logmsgbot>	 !log elukey@cumin1002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[07:50:47] <logmsgbot>	 !log elukey@cumin1002 START - Cookbook sre.dns.wipe-cache poolcounter1006.eqiad.wmnet on all recursors
[07:50:50] <logmsgbot>	 !log elukey@cumin1002 END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) poolcounter1006.eqiad.wmnet on all recursors
[07:51:16] <logmsgbot>	 !log elukey@cumin1002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM poolcounter1006.eqiad.wmnet - elukey@cumin1002"
[07:51:21] <logmsgbot>	 !log elukey@cumin1002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM poolcounter1006.eqiad.wmnet - elukey@cumin1002"
[07:52:13] <moritzm>	 !log installing nano updates from Bookworm point release
[07:52:15] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:52:38] <wikibugs>	 (CR) Muehlenhoff: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1072690 (owner: Muehlenhoff)
[07:53:06] <wikibugs>	 (CR) Jelto: kafka-main: Replace kafka-main2005 with kafka-main2010 (1 comment) [puppet] - https://gerrit.wikimedia.org/r/1072662 (https://phabricator.wikimedia.org/T363210) (owner: JMeybohm)
[07:53:18] <logmsgbot>	 !log elukey@cumin1002 START - Cookbook sre.hosts.reimage for host poolcounter1006.eqiad.wmnet with OS bookworm
[07:53:29] <wikibugs>	 (CR) Jelto: [C:+1] kafka-main: Replace kafka-main2005 with kafka-main2010 [puppet] - https://gerrit.wikimedia.org/r/1072662 (https://phabricator.wikimedia.org/T363210) (owner: JMeybohm)
[07:55:37] <wikibugs>	 (CR) Vgutierrez: [C:+1] cache:haproxy: hardcode $schema field [puppet] - https://gerrit.wikimedia.org/r/1072577 (https://phabricator.wikimedia.org/T370668) (owner: Fabfur)
[07:56:41] <wikibugs>	 (PS1) Fabfur: hiera: testing haproxykafka on cp4037 [puppet] - https://gerrit.wikimedia.org/r/1072697 (https://phabricator.wikimedia.org/T374473)
[07:58:12] <wikibugs>	 (CR) Fabfur: [C:+2] cache:haproxy: hardcode $schema field [puppet] - https://gerrit.wikimedia.org/r/1072577 (https://phabricator.wikimedia.org/T370668) (owner: Fabfur)
[08:01:53] <wikibugs>	 SRE, Infrastructure-Foundations: Integrate Bookworm 12.6 point update - https://phabricator.wikimedia.org/T374536#10143324 (MoritzMuehlenhoff)
[08:02:25] <logmsgbot>	 !log elukey@cumin1002 START - Cookbook sre.hosts.downtime for 2:00:00 on poolcounter1006.eqiad.wmnet with reason: host reimage
[08:02:43] <wikibugs>	 (CR) Vgutierrez: "no longer relevant for the latest PS" [puppet] - https://gerrit.wikimedia.org/r/1072590 (https://phabricator.wikimedia.org/T370837) (owner: BCornwall)
[08:05:23] <logmsgbot>	 !log elukey@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on poolcounter1006.eqiad.wmnet with reason: host reimage
[08:06:34] <wikibugs>	 (CR) JMeybohm: [C:-1] "Feel free to ignore the nits ofc., but the CHANGELOG format should follow the rest of the modules" [deployment-charts] - https://gerrit.wikimedia.org/r/1072502 (owner: Effie Mouzeli)
[08:11:44] <wikibugs>	 (CR) Elukey: [V:+2 C:+2] spark: force a rebuild to pick up OS package upgrades [docker-images/production-images] - https://gerrit.wikimedia.org/r/1072150 (https://phabricator.wikimedia.org/T371874) (owner: Elukey)
[08:12:13] <wikibugs>	 (PS1) Jcrespo: mariadb: Reenable notifications for db2184 [puppet] - https://gerrit.wikimedia.org/r/1072699 (https://phabricator.wikimedia.org/T335640)
[08:13:10] <wikibugs>	 (CR) Elukey: [C:+2] blubber: force rebuild to pick up git upgrades [software/tegola] (wmf/v0.19.x) - https://gerrit.wikimedia.org/r/1071802 (https://phabricator.wikimedia.org/T373976) (owner: Elukey)
[08:14:16] <wikibugs>	 (CR) Jcrespo: [C:+2] mariadb: Reenable notifications for db2184 [puppet] - https://gerrit.wikimedia.org/r/1072699 (https://phabricator.wikimedia.org/T335640) (owner: Jcrespo)
[08:14:20] <wikibugs>	 (Merged) jenkins-bot: blubber: force rebuild to pick up git upgrades [software/tegola] (wmf/v0.19.x) - https://gerrit.wikimedia.org/r/1071802 (https://phabricator.wikimedia.org/T373976) (owner: Elukey)
[08:15:55] <wikibugs>	 (CR) Elukey: [C:+1] puppetmaster::frontend|backend: Read the puppet-merge server from Hiera [puppet] - https://gerrit.wikimedia.org/r/1072543 (https://phabricator.wikimedia.org/T374443) (owner: Muehlenhoff)
[08:16:25] <logmsgbot>	 !log jmm@cumin2002 START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: gerrit1004.wikimedia.org
[08:16:25] <logmsgbot>	 !log jmm@cumin2002 END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: gerrit1004.wikimedia.org
[08:16:39] <wikibugs>	 (CR) Elukey: [C:+1] puppetserver: Pass the value of puppet_merge_server [puppet] - https://gerrit.wikimedia.org/r/1072494 (https://phabricator.wikimedia.org/T374443) (owner: Muehlenhoff)
[08:17:43] <wikibugs>	 (CR) Elukey: [C:+1] Do not use a login shell when dropping privileges [software/debmonitor-client] (debian) - https://gerrit.wikimedia.org/r/1060789 (https://phabricator.wikimedia.org/T216832) (owner: Hashar)
[08:18:06] <logmsgbot>	 !log elukey@cumin1002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host poolcounter1006.eqiad.wmnet with OS bookworm
[08:18:06] <logmsgbot>	 !log elukey@cumin1002 END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host poolcounter1006.eqiad.wmnet
[08:18:56] <wikibugs>	 (CR) Elukey: [C:+1] test-cookbook: read spicerack config with sudo [puppet] - https://gerrit.wikimedia.org/r/1071810 (owner: Volans)
[08:19:26] <logmsgbot>	 !log elukey@cumin1002 START - Cookbook sre.ganeti.makevm for new host poolcounter1007.eqiad.wmnet
[08:19:27] <logmsgbot>	 !log elukey@cumin1002 START - Cookbook sre.dns.netbox
[08:21:29] <wikibugs>	 (PS2) Fabfur: hiera: testing haproxykafka on cp4037 [puppet] - https://gerrit.wikimedia.org/r/1072697 (https://phabricator.wikimedia.org/T374473)
[08:25:17] <logmsgbot>	 !log elukey@cumin1002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM poolcounter1007.eqiad.wmnet - elukey@cumin1002"
[08:27:13] <moritzm>	 !log remove djangorestframework 3.14.0-2+wmf12u1 from apt.wikimedia.org, the bug fixed in that custom build has been integrated into Debian Bookworm via a point update and is no longer needed
[08:27:15] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:27:53] <wikibugs>	 (CR) Fabfur: [C:+2] hiera: testing haproxykafka on cp4037 [puppet] - https://gerrit.wikimedia.org/r/1072697 (https://phabricator.wikimedia.org/T374473) (owner: Fabfur)
[08:28:05] <logmsgbot>	 !log elukey@cumin1002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM poolcounter1007.eqiad.wmnet - elukey@cumin1002"
[08:28:05] <logmsgbot>	 !log elukey@cumin1002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[08:28:05] <logmsgbot>	 !log elukey@cumin1002 START - Cookbook sre.dns.wipe-cache poolcounter1007.eqiad.wmnet on all recursors
[08:28:08] <logmsgbot>	 !log elukey@cumin1002 END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) poolcounter1007.eqiad.wmnet on all recursors
[08:28:35] <logmsgbot>	 !log elukey@cumin1002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM poolcounter1007.eqiad.wmnet - elukey@cumin1002"
[08:28:40] <logmsgbot>	 !log elukey@cumin1002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM poolcounter1007.eqiad.wmnet - elukey@cumin1002"
[08:29:51] <moritzm>	 !log rolling out djangorestbase update from Bookworm point release (replacing our previous bespoke build)
[08:29:53] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:30:04] <logmsgbot>	 !log klausman@cumin1002 START - Cookbook sre.hosts.reimage for host ml-lab1001.eqiad.wmnet with OS bookworm
[08:30:12] <wikibugs>	 10ops-eqiad, 06SRE, 06DC-Ops, 06Machine-Learning-Team: Q1:rack/setup/install ml-serve1009-1011 (3x), ml-lab1001-1002 (2x), dse-k8s-worker1009 (1x) - https://phabricator.wikimedia.org/T372432#10143366 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by klausman@cumin1002 for host ml-l...
[08:32:09] <logmsgbot>	 !log elukey@cumin1002 START - Cookbook sre.hosts.reimage for host poolcounter1007.eqiad.wmnet with OS bookworm
[08:35:33] <logmsgbot>	 !log fabfur@cumin1002 conftool action : set/pooled=no; selector: name=cp4037.ulsfo.wmnet
[08:37:19] <wikibugs>	 06SRE, 06Infrastructure-Foundations: Integrate Bookworm 12.6 point update - https://phabricator.wikimedia.org/T374536#10143379 (10MoritzMuehlenhoff)
[08:39:12] <logmsgbot>	 !log fabfur@cumin1002 conftool action : set/pooled=yes; selector: name=cp4037.ulsfo.wmnet
[08:40:06] <wikibugs>	 10ops-eqiad, 06SRE, 06DC-Ops: PDU sensor over limit - https://phabricator.wikimedia.org/T374573#10143389 (10phaultfinder)
[08:40:11] <wikibugs>	 (03PS1) 10Klausman: preseed: Add missing wildcard for ml-lab partman recipe [puppet] - 10https://gerrit.wikimedia.org/r/1072703
[08:42:07] <wikibugs>	 (03CR) 10Klausman: [C:03+2] preseed: Add missing wildcard for ml-lab partman recipe [puppet] - 10https://gerrit.wikimedia.org/r/1072703 (owner: 10Klausman)
[08:42:42] <logmsgbot>	 !log elukey@cumin1002 START - Cookbook sre.hosts.downtime for 2:00:00 on poolcounter1007.eqiad.wmnet with reason: host reimage
[08:43:25] <logmsgbot>	 !log klausman@cumin1002 END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ml-lab1001.eqiad.wmnet with OS bookworm
[08:43:36] <wikibugs>	 10ops-eqiad, 06SRE, 06DC-Ops, 06Machine-Learning-Team: Q1:rack/setup/install ml-serve1009-1011 (3x), ml-lab1001-1002 (2x), dse-k8s-worker1009 (1x) - https://phabricator.wikimedia.org/T372432#10143390 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by klausman@cumin1002 for host ml-lab10...
[08:45:10] <logmsgbot>	 !log elukey@cumin1002 START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm
[08:46:09] <logmsgbot>	 !log elukey@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on poolcounter1007.eqiad.wmnet with reason: host reimage
[08:47:33] <logmsgbot>	 !log klausman@cumin1002 START - Cookbook sre.hosts.reimage for host ml-lab1001.eqiad.wmnet with OS bookworm
[08:47:47] <wikibugs>	 10ops-eqiad, 06SRE, 06DC-Ops, 06Machine-Learning-Team: Q1:rack/setup/install ml-serve1009-1011 (3x), ml-lab1001-1002 (2x), dse-k8s-worker1009 (1x) - https://phabricator.wikimedia.org/T372432#10143392 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by klausman@cumin1002 for host ml-l...
[08:48:11] <logmsgbot>	 !log elukey@cumin1002 END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest2001.codfw.wmnet with OS bookworm
[08:53:25] <jinxer-wm>	 FIRING: SystemdUnitFailed: dump_ip_reputation.service on puppetserver1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[08:54:56] <jinxer-wm>	 FIRING: RoutinatorRTRConnections: Important drop of Routinator RTR connections on rpki2002:9556 - https://wikitech.wikimedia.org/wiki/RPKI#RTR_Connections_drop - https://grafana.wikimedia.org/d/UwUa77GZk/rpki - https://alerts.wikimedia.org/?q=alertname%3DRoutinatorRTRConnections
[08:59:33] <wikibugs>	 06SRE, 10SRE-Access-Requests, 10Continuous-Integration-Infrastructure, 10LDAP-Access-Requests: Requesting access to `contint-admins`, `contint-docker`, LDAP `ciadmin` for 'Arthur taylor' - https://phabricator.wikimedia.org/T373969#10143404 (10hashar) 05Stalled→03Open
[09:00:11] <wikibugs>	 10ops-eqiad, 06SRE, 06DC-Ops: PDU sensor over limit - https://phabricator.wikimedia.org/T374573#10143405 (10phaultfinder)
[09:01:19] <logmsgbot>	 !log elukey@cumin1002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host poolcounter1007.eqiad.wmnet with OS bookworm
[09:01:19] <logmsgbot>	 !log elukey@cumin1002 END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host poolcounter1007.eqiad.wmnet
[09:02:28] <logmsgbot>	 !log btullis@deploy1003 helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
[09:02:47] <logmsgbot>	 !log btullis@deploy1003 helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
[09:06:46] <wikibugs>	 (03CR) 10Btullis: [C:03+1] "Looks good to me." [deployment-charts] - 10https://gerrit.wikimedia.org/r/1072243 (https://phabricator.wikimedia.org/T373195) (owner: 10Bking)
[09:07:36] <wikibugs>	 (03CR) 10JMeybohm: [C:03+2] Replace kafka-main2005 with kafka-main2010 [deployment-charts] - 10https://gerrit.wikimedia.org/r/1072663 (https://phabricator.wikimedia.org/T363210) (owner: 10JMeybohm)
[09:08:48] <wikibugs>	 10ops-codfw, 06SRE, 10SRE-swift-storage, 06collaboration-services, and 3 others: Migrate servers in codfw racks D1 & D2 from asw to lsw - https://phabricator.wikimedia.org/T373102#10143412 (10cmooney) 05Open→03Resolved a:03cmooney
[09:09:20] <wikibugs>	 (03Merged) 10jenkins-bot: Replace kafka-main2005 with kafka-main2010 [deployment-charts] - 10https://gerrit.wikimedia.org/r/1072663 (https://phabricator.wikimedia.org/T363210) (owner: 10JMeybohm)
[09:09:56] <logmsgbot>	 !log klausman@cumin1002 END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ml-lab1001.eqiad.wmnet with OS bookworm
[09:10:05] <wikibugs>	 10ops-eqiad, 06SRE, 06DC-Ops, 06Machine-Learning-Team: Q1:rack/setup/install ml-serve1009-1011 (3x), ml-lab1001-1002 (2x), dse-k8s-worker1009 (1x) - https://phabricator.wikimedia.org/T372432#10143433 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by klausman@cumin1002 for host ml-lab10...
[09:12:37] <logmsgbot>	 !log klausman@cumin1002 START - Cookbook sre.hosts.reimage for host ml-lab1001.eqiad.wmnet with OS bookworm
[09:12:49] <wikibugs>	 10ops-eqiad, 06SRE, 06DC-Ops, 06Machine-Learning-Team: Q1:rack/setup/install ml-serve1009-1011 (3x), ml-lab1001-1002 (2x), dse-k8s-worker1009 (1x) - https://phabricator.wikimedia.org/T372432#10143460 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by klausman@cumin1002 for host ml-l...
[09:14:57] <jayme>	 !log restoring leadership for all partitions assigned to broker id 2005 on kafka-main-codfw - T363210
[09:15:00] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:15:01] <stashbot>	 T363210: kafka-main200[6789] and kafka-main2010 implementation tracking - https://phabricator.wikimedia.org/T363210
[09:15:18] <logmsgbot>	 !log jayme@cumin1002 START - Cookbook sre.hosts.remove-downtime for kafka-main2010.codfw.wmnet
[09:15:19] <logmsgbot>	 !log jayme@cumin1002 END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for kafka-main2010.codfw.wmnet
[09:19:15] <logmsgbot>	 !log jayme@deploy1003 helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
[09:20:08] <logmsgbot>	 !log jayme@deploy1003 helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
[09:20:09] <logmsgbot>	 !log jayme@deploy1003 helmfile [codfw] START helmfile.d/services/changeprop: apply
[09:20:23] <logmsgbot>	 !log jayme@cumin1002 START - Cookbook sre.hosts.decommission for hosts kafka-main2005.codfw.wmnet
[09:20:26] <logmsgbot>	 !log jayme@deploy1003 helmfile [codfw] DONE helmfile.d/services/changeprop: apply
[09:20:28] <logmsgbot>	 !log jayme@deploy1003 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
[09:20:57] <logmsgbot>	 !log akosiaris@cumin1002 START - Cookbook sre.k8s.pool-depool-node depool for host kubernetes2059.codfw.wmnet
[09:21:05] <logmsgbot>	 !log jayme@deploy1003 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
[09:21:07] <logmsgbot>	 !log jayme@deploy1003 helmfile [codfw] START helmfile.d/services/eventgate-main: apply
[09:21:31] <logmsgbot>	 !log akosiaris@cumin1002 END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host kubernetes2059.codfw.wmnet
[09:21:36] <logmsgbot>	 !log akosiaris@cumin1002 START - Cookbook sre.k8s.pool-depool-node depool for host kubernetes2060.codfw.wmnet
[09:21:52] <logmsgbot>	 !log jayme@deploy1003 helmfile [codfw] DONE helmfile.d/services/eventgate-main: apply
[09:21:53] <logmsgbot>	 !log jayme@deploy1003 helmfile [codfw] START helmfile.d/services/eventstreams: apply
[09:22:10] <logmsgbot>	 !log akosiaris@cumin1002 END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host kubernetes2060.codfw.wmnet
[09:22:15] <logmsgbot>	 !log akosiaris@cumin1002 START - Cookbook sre.k8s.pool-depool-node depool for host mw2301.codfw.wmnet
[09:22:49] <logmsgbot>	 !log jayme@deploy1003 helmfile [codfw] DONE helmfile.d/services/eventstreams: apply
[09:22:49] <logmsgbot>	 !log akosiaris@cumin1002 END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host mw2301.codfw.wmnet
[09:22:50] <logmsgbot>	 !log jayme@deploy1003 helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
[09:22:54] <logmsgbot>	 !log akosiaris@cumin1002 START - Cookbook sre.k8s.pool-depool-node depool for host mw2302.codfw.wmnet
[09:23:03] <logmsgbot>	 !log jayme@deploy1003 helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
[09:23:04] <logmsgbot>	 !log jayme@deploy1003 helmfile [codfw] START helmfile.d/services/rdf-streaming-updater: apply
[09:23:23] <logmsgbot>	 !log jayme@deploy1003 helmfile [codfw] DONE helmfile.d/services/rdf-streaming-updater: apply
[09:23:33] <logmsgbot>	 !log akosiaris@cumin1002 END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host mw2302.codfw.wmnet
[09:23:37] <logmsgbot>	 !log akosiaris@cumin1002 START - Cookbook sre.k8s.pool-depool-node depool for host mw2303.codfw.wmnet
[09:24:11] <logmsgbot>	 !log akosiaris@cumin1002 END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host mw2303.codfw.wmnet
[09:24:16] <logmsgbot>	 !log akosiaris@cumin1002 START - Cookbook sre.k8s.pool-depool-node depool for host mw2304.codfw.wmnet
[09:24:38] <wikibugs>	 (03CR) 10DCausse: flink-app: customize calico label selector (032 comments) [deployment-charts] - 10https://gerrit.wikimedia.org/r/1072236 (https://phabricator.wikimedia.org/T373195) (owner: 10Bking)
[09:24:51] <logmsgbot>	 !log akosiaris@cumin1002 END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host mw2304.codfw.wmnet
[09:24:51] <logmsgbot>	 !log klausman@cumin1002 START - Cookbook sre.hosts.downtime for 2:00:00 on ml-lab1001.eqiad.wmnet with reason: host reimage
[09:24:54] <wikibugs>	 (03PS1) 10Alexandros Kosiaris: kubernetes20[59-60], mw230[1-5] -> wikikube-worker21[14-20] [puppet] - 10https://gerrit.wikimedia.org/r/1072712 (https://phabricator.wikimedia.org/T372878)
[09:24:56] <logmsgbot>	 !log akosiaris@cumin1002 START - Cookbook sre.k8s.pool-depool-node depool for host mw2305.codfw.wmnet
[09:25:29] <logmsgbot>	 !log akosiaris@cumin1002 END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host mw2305.codfw.wmnet
[09:25:43] <jinxer-wm>	 FIRING: ElevatedMaxLagWDQS: WDQS lag is above 10 minutes - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Runbook - https://grafana.wikimedia.org/d/000000489/wikidata-query-service - https://alerts.wikimedia.org/?q=alertname%3DElevatedMaxLagWDQS
[09:27:24] <logmsgbot>	 !log jayme@cumin1002 START - Cookbook sre.dns.netbox
[09:28:12] <logmsgbot>	 !log klausman@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-lab1001.eqiad.wmnet with reason: host reimage
[09:28:47] <wikibugs>	 (03CR) 10Alexandros Kosiaris: [C:03+2] kubernetes20[59-60], mw230[1-5] -> wikikube-worker21[14-20] [puppet] - 10https://gerrit.wikimedia.org/r/1072712 (https://phabricator.wikimedia.org/T372878) (owner: 10Alexandros Kosiaris)
[09:28:56] <jinxer-wm>	 FIRING: [2x] RdfStreamingUpdaterFlinkJobUnstable: WCQS_Streaming_Updater in codfw (k8s) is unstable - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Streaming_Updater  - https://alerts.wikimedia.org/?q=alertname%3DRdfStreamingUpdaterFlinkJobUnstable
[09:30:43] <jinxer-wm>	 RESOLVED: ElevatedMaxLagWDQS: WDQS lag is above 10 minutes - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Runbook - https://grafana.wikimedia.org/d/000000489/wikidata-query-service - https://alerts.wikimedia.org/?q=alertname%3DElevatedMaxLagWDQS
[09:33:56] <jinxer-wm>	 RESOLVED: [2x] RdfStreamingUpdaterFlinkJobUnstable: WCQS_Streaming_Updater in codfw (k8s) is unstable - https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Streaming_Updater  - https://alerts.wikimedia.org/?q=alertname%3DRdfStreamingUpdaterFlinkJobUnstable
[09:34:23] <logmsgbot>	 !log jayme@cumin1002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: kafka-main2005.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jayme@cumin1002"
[09:34:40] <logmsgbot>	 !log jayme@cumin1002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: kafka-main2005.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jayme@cumin1002"
[09:34:40] <logmsgbot>	 !log jayme@cumin1002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[09:34:40] <logmsgbot>	 !log jayme@cumin1002 END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts kafka-main2005.codfw.wmnet
[09:34:54] <wikibugs>	 10ops-codfw, 06SRE, 06DC-Ops, 10decommission-hardware, and 2 others: decommission kafka-main2005.codfw.wmnet - https://phabricator.wikimedia.org/T374688#10143508 (10ops-monitoring-bot) cookbooks.sre.hosts.decommission executed by jayme@cumin1002 for hosts: `kafka-main2005.codfw.wmnet` - kafka-main2005.codf...
[09:36:57] <wikibugs>	 (03CR) 10Vgutierrez: hiera: let purged use closest cluster on codfw, ulsfo and eqsin [puppet] - 10https://gerrit.wikimedia.org/r/1071844 (https://phabricator.wikimedia.org/T363210) (owner: 10Vgutierrez)
[09:37:35] <wikibugs>	 10ops-codfw, 06SRE, 06DC-Ops, 10decommission-hardware, and 2 others: decommission kafka-main2005.codfw.wmnet - https://phabricator.wikimedia.org/T374688#10143510 (10JMeybohm) a:05JMeybohm→03None
[09:37:56] <wikibugs>	 (03CR) 10JMeybohm: [C:03+1] "kafka-main-codfw is done, this can be merged now" [puppet] - 10https://gerrit.wikimedia.org/r/1071844 (https://phabricator.wikimedia.org/T363210) (owner: 10Vgutierrez)
[09:38:22] <wikibugs>	 (03CR) 10Vgutierrez: [C:03+2] "Thanks!" [puppet] - 10https://gerrit.wikimedia.org/r/1071844 (https://phabricator.wikimedia.org/T363210) (owner: 10Vgutierrez)
[09:38:53] <wikibugs>	 (03PS1) 10Fabfur: hiera: enable haproxykafka on cp3066 for testing [puppet] - 10https://gerrit.wikimedia.org/r/1072714 (https://phabricator.wikimedia.org/T374473)
[09:39:51] <wikibugs>	 (03PS2) 10Fabfur: hiera: enable haproxykafka on cp3066 for testing [puppet] - 10https://gerrit.wikimedia.org/r/1072714 (https://phabricator.wikimedia.org/T374473)
[09:41:09] <logmsgbot>	 !log klausman@cumin1002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - klausman@cumin1002"
[09:41:37] <logmsgbot>	 !log klausman@cumin1002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - klausman@cumin1002"
[09:41:38] <logmsgbot>	 !log klausman@cumin1002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ml-lab1001.eqiad.wmnet with OS bookworm
[09:41:47] <wikibugs>	 10ops-eqiad, 06SRE, 06DC-Ops, 06Machine-Learning-Team: Q1:rack/setup/install ml-serve1009-1011 (3x), ml-lab1001-1002 (2x), dse-k8s-worker1009 (1x) - https://phabricator.wikimedia.org/T372432#10143519 (10ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by klausman@cumin1002 for host ml-lab10...
[09:42:36] <wikibugs>	 10ops-eqiad, 06SRE, 06DC-Ops, 06Machine-Learning-Team: Q1:rack/setup/install ml-serve1009-1011 (3x), ml-lab1001-1002 (2x), dse-k8s-worker1009 (1x) - https://phabricator.wikimedia.org/T372432#10143533 (10klausman)
[09:46:58] <wikibugs>	 (03CR) 10Ilias Sarantopoulos: "@tklausmann@wikimedia.org can you merge please? I don't have +2 on this repo. Thanks!" [puppet] - 10https://gerrit.wikimedia.org/r/1063213 (https://phabricator.wikimedia.org/T360455) (owner: 10Ilias Sarantopoulos)
[09:54:20] <wikibugs>	 06SRE, 06Infrastructure-Foundations, 10vm-requests: eqiad: 2 VM request for poolcounter - https://phabricator.wikimedia.org/T374629#10143560 (10elukey) 05Open→03Resolved a:03elukey
[09:54:42] <jinxer-wm>	 FIRING: [4x] SwiftObjectCountSiteDisparity: MediaWiki swift object counts site diffs - https://wikitech.wikimedia.org/wiki/Swift/How_To - https://grafana.wikimedia.org/d/OPgmB1Eiz/swift - https://alerts.wikimedia.org/?q=alertname%3DSwiftObjectCountSiteDisparity
[09:55:00] <wikibugs>	 06SRE, 06Infrastructure-Foundations: Integrate Bookworm 12.6 point update - https://phabricator.wikimedia.org/T374536#10143564 (10MoritzMuehlenhoff)
[09:55:44] <icinga-wm>	 PROBLEM - Swift https backend on ms-fe2011 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Swift
[09:56:22] <icinga-wm>	 PROBLEM - Swift https frontend on ms-fe2013 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 260 bytes in 1.104 second response time https://wikitech.wikimedia.org/wiki/Swift
[09:56:34] <icinga-wm>	 RECOVERY - Swift https backend on ms-fe2011 is OK: HTTP OK: HTTP/1.1 200 OK - 506 bytes in 0.079 second response time https://wikitech.wikimedia.org/wiki/Swift
[09:57:20] <icinga-wm>	 RECOVERY - Swift https frontend on ms-fe2013 is OK: HTTP OK: HTTP/1.1 200 OK - 295 bytes in 0.096 second response time https://wikitech.wikimedia.org/wiki/Swift
[09:57:54] <wikibugs>	 (03PS1) 10Elukey: services: update thumbor-eqiad to poolcounter1006 [deployment-charts] - 10https://gerrit.wikimedia.org/r/1072716 (https://phabricator.wikimedia.org/T332015)
[09:57:56] <wikibugs>	 (03PS1) 10Elukey: services: add new poolcounter nodes to MW configs [deployment-charts] - 10https://gerrit.wikimedia.org/r/1072717 (https://phabricator.wikimedia.org/T332015)
[09:58:00] <jinxer-wm>	 FIRING: Primary inbound port utilisation over 80%  #page: Alert for device cr4-ulsfo.wikimedia.org - Primary inbound port utilisation over 80%  #page   - https://alerts.wikimedia.org/?q=alertname%3DPrimary+inbound+port+utilisation+over+80%25++%23page
[09:58:00] <jinxer-wm>	 FIRING: Primary outbound port utilisation over 80%  #page: Alert for device cr1-codfw.wikimedia.org - Primary outbound port utilisation over 80%  #page   - https://alerts.wikimedia.org/?q=alertname%3DPrimary+outbound+port+utilisation+over+80%25++%23page
[09:58:16] <vgutierrez>	 !incidents
[09:58:17] <sirenbot>	 5164 (UNACKED)  Primary inbound port utilisation over 80%  (paged) global noc (cr4-ulsfo.wikimedia.org)
[09:58:17] <sirenbot>	 5165 (UNACKED)  Primary outbound port utilisation over 80%  (paged) global noc (cr1-codfw.wikimedia.org)
[09:58:22] <vgutierrez>	 !ack 5164
[09:58:23] <sirenbot>	 5164 (ACKED)  Primary inbound port utilisation over 80%  (paged) global noc (cr4-ulsfo.wikimedia.org)
[09:58:23] <vgutierrez>	 !ack 5165
[09:58:24] <sirenbot>	 5165 (ACKED)  Primary outbound port utilisation over 80%  (paged) global noc (cr1-codfw.wikimedia.org)
[09:58:27] <jayme>	 thanks
[09:58:37] <vgutierrez>	 I'm wondering if we triggered that with kafka :_)
[10:00:27] <jayme>	 it's already down again, isn't it?
[10:02:26] <wikibugs>	 (03PS1) 10Vgutierrez: Revert "hiera: let purged use closest cluster on codfw, ulsfo and eqsin" [puppet] - 10https://gerrit.wikimedia.org/r/1072718
[10:03:00] <jinxer-wm>	 RESOLVED: Primary inbound port utilisation over 80%  #page: Device cr4-ulsfo.wikimedia.org recovered from Primary inbound port utilisation over 80%  #page   - https://alerts.wikimedia.org/?q=alertname%3DPrimary+inbound+port+utilisation+over+80%25++%23page
[10:03:00] <jinxer-wm>	 RESOLVED: Primary outbound port utilisation over 80%  #page: Device cr1-codfw.wikimedia.org recovered from Primary outbound port utilisation over 80%  #page   - https://alerts.wikimedia.org/?q=alertname%3DPrimary+outbound+port+utilisation+over+80%25++%23page
[10:03:15] <wikibugs>	 (03CR) 10Vgutierrez: [C:03+2] Revert "hiera: let purged use closest cluster on codfw, ulsfo and eqsin" [puppet] - 10https://gerrit.wikimedia.org/r/1072718 (owner: 10Vgutierrez)
[10:04:28] <wikibugs>	 (03CR) 10Elukey: Swap poolcounter2003 with poolcounter2005 (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1072206 (https://phabricator.wikimedia.org/T332015) (owner: 10Elukey)
[10:07:19] <wikibugs>	 (03PS2) 10Elukey: Update the Debian changelog to build on Bookworm [debs/chartmuseum] - 10https://gerrit.wikimedia.org/r/1071561 (https://phabricator.wikimedia.org/T331969)
[10:08:01] <icinga-wm>	 PROBLEM - MariaDB Replica Lag: s8 #page on db1172 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 86326.51 seconds https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica
[10:08:09] <vgutierrez>	 !incidents
[10:08:10] <sirenbot>	 5166 (UNACKED)  db1172 (paged)/MariaDB Replica Lag: s8 (paged)
[10:08:10] <sirenbot>	 5165 (RESOLVED)  Primary outbound port utilisation over 80%  (paged) global noc (cr1-codfw.wikimedia.org)
[10:08:10] <sirenbot>	 5164 (RESOLVED)  Primary inbound port utilisation over 80%  (paged) global noc (cr4-ulsfo.wikimedia.org)
[10:08:12] <vgutierrez>	 !ack 5166
[10:08:13] <sirenbot>	 5166 (ACKED)  db1172 (paged)/MariaDB Replica Lag: s8 (paged)
[10:09:03] <jynus>	 that must be a schema change that went over downtime
[10:09:20] <jynus>	 Amir1 ^
[10:09:48] <jynus>	 let me double check it is depooled
[10:10:20] <jayme>	 it's pooled
[10:10:28] <jynus>	 pooled?
[10:10:28] <jayme>	 for apu
[10:10:31] <jayme>	 *api
[10:10:45] <jayme>	 but not pooled in general ... whatever that means :)
[10:11:13] <jayme>	 sections.s8.groups.api.pooled: true
[10:11:19] <jynus>	 yeah, I believe that overrides it
[10:11:25] <jynus>	 let me double check the generated config
[10:11:49] <jynus>	 one is the normal state and the other is the global temporary status
[10:12:20] <jynus>	 yeah, no reference of it at eqiad.json
[10:12:40] <jynus>	 otherwise it would show errors on mw about the db not being available, or on the probe
[10:12:52] <jayme>	 so "sections.s8.pooled: false" overrides the groups.api one
[10:12:59] <jynus>	 yeah
[10:13:08] <wikibugs>	 (03PS1) 10Fabfur: prometheus: enable haproxykafka scraping [puppet] - 10https://gerrit.wikimedia.org/r/1072719 (https://phabricator.wikimedia.org/T374696)
[10:13:29] <wikibugs>	 (03CR) 10CI reject: [V:04-1] prometheus: enable haproxykafka scraping [puppet] - 10https://gerrit.wikimedia.org/r/1072719 (https://phabricator.wikimedia.org/T374696) (owner: 10Fabfur)
[10:13:47] <jynus>	 jayme: after all, the documentation says: to depool a host, set it as depooled, otherwise it would be hell to depool
[10:13:50] <wikibugs>	 06SRE, 06Infrastructure-Foundations, 10netops: Cloud IPv6 subnets - https://phabricator.wikimedia.org/T187929#10143598 (10cmooney) Talking about this again I'm ok with the revised plan, with allocations similar to our POP sites.  So for instnace for codfw we can probably move ahead on this basis:  * 2a02:ec8...
[10:13:53] <wikibugs>	 (03CR) 10Clément Goubert: [C:03+1] team-sre: tweak MediaWikiLoginFailures threshold [alerts] - 10https://gerrit.wikimedia.org/r/1072657 (https://phabricator.wikimedia.org/T350597) (owner: 10Filippo Giunchedi)
[10:14:09] <wikibugs>	 (03CR) 10CI reject: [V:04-1] Update the Debian changelog to build on Bookworm [debs/chartmuseum] - 10https://gerrit.wikimedia.org/r/1071561 (https://phabricator.wikimedia.org/T331969) (owner: 10Elukey)
[10:14:53] <jayme>	 jynus: okidoke. So rn I just downtime it and open a ticket for DBAs to inspect?
[10:15:06] <jynus>	 jayme: vgutierrez I will downtime the host until monday, then pull Amir's ears
[10:15:14] <jynus>	 I will handle it, no worries
[10:15:19] <vgutierrez>	 thx jynus 
[10:15:21] <jayme>	 sweet, thanks!
[10:18:14] <logmsgbot>	 !log jynus@cumin1002 START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on db1172.eqiad.wmnet with reason: ongoing schema change
[10:18:30] <wikibugs>	 (03CR) 10Elukey: "The lintian errors are:" [debs/chartmuseum] - 10https://gerrit.wikimedia.org/r/1071561 (https://phabricator.wikimedia.org/T331969) (owner: 10Elukey)
[10:18:30] <logmsgbot>	 !log jynus@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on db1172.eqiad.wmnet with reason: ongoing schema change
[10:19:19] <jynus>	 actually, that may be arnaudb, not amir, according to https://wikitech.wikimedia.org/wiki/Map_of_database_maintenance
[10:20:08] <wikibugs>	 (03CR) 10Muehlenhoff: Update the Debian changelog to build on Bookworm (031 comment) [debs/chartmuseum] - 10https://gerrit.wikimedia.org/r/1071561 (https://phabricator.wikimedia.org/T331969) (owner: 10Elukey)
[10:23:21] <wikibugs>	 (03CR) 10Elukey: Update the Debian changelog to build on Bookworm (031 comment) [debs/chartmuseum] - 10https://gerrit.wikimedia.org/r/1071561 (https://phabricator.wikimedia.org/T331969) (owner: 10Elukey)
[10:23:30] <wikibugs>	 (03PS3) 10Elukey: Update the Debian changelog to build on Bookworm [debs/chartmuseum] - 10https://gerrit.wikimedia.org/r/1071561 (https://phabricator.wikimedia.org/T331969)
[10:23:56] <jinxer-wm>	 FIRING: [2x] RoutinatorRsyncErrors: Routinator rsync fetching issue in codfw - https://wikitech.wikimedia.org/wiki/RPKI#RSYNC_status - https://grafana.wikimedia.org/d/UwUa77GZk/rpki - https://alerts.wikimedia.org/?q=alertname%3DRoutinatorRsyncErrors
[10:25:18] <jynus>	 jayme, vgutierrez I will be around for some time still, but I am not sure the dbas are, today. Can you pass on to the Americas timezone what happened? I cannot be 100% sure it won't happen again on another host until automation/run changes
[10:25:57] <xSavitar>	 !log T12345 Ran mwscript extensions/CentralAuth/maintenance/fixStuckGlobalRename.php --wiki=commonswiki --logwiki=metawiki 'IloveuFlyTek' 'Theology1937' --ignorestatus
[10:25:58] <vgutierrez>	 jynus: will do
[10:26:00] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:26:01] <stashbot>	 T12345: Create "annotation" namespace on Hebrew Wikisource - https://phabricator.wikimedia.org/T12345
[10:26:30] <jynus>	 vgutierrez: basically, if not pooled, as was the case here, ack/downtime and don't worry
[10:28:21] <jinxer-wm>	 FIRING: [2x] ProbeDown: Service kubestagemaster1003:6443 has failed probes (http_staging_eqiad_kube_apiserver_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#kubestagemaster1003:6443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[10:29:14] <xSavitar>	 !log T374684 Ran mwscript extensions/CentralAuth/maintenance/fixStuckGlobalRename.php --wiki=commonswiki --logwiki=metawiki 'IloveuFlyTek' 'Theology1937' --ignorestatus
[10:29:18] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:29:18] <wikibugs>	 (03CR) 10Muehlenhoff: [C:03+1] "Looks good, you can ignore the CI test." [debs/chartmuseum] - 10https://gerrit.wikimedia.org/r/1071561 (https://phabricator.wikimedia.org/T331969) (owner: 10Elukey)
[10:29:18] <stashbot>	 T374684: Unblock stuck global rename of IloveuFlyTek, Iosonopony, Mohamadanisahmad5, Monty.ch - https://phabricator.wikimedia.org/T374684
[10:29:53] <xSavitar>	 !log T374684 Ran mwscript extensions/CentralAuth/maintenance/fixStuckGlobalRename.php --wiki=metawiki --logwiki=metawiki 'Iosonopony' 'L.Sala'
[10:29:56] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:30:08] <wikibugs>	 (03CR) 10CI reject: [V:04-1] Update the Debian changelog to build on Bookworm [debs/chartmuseum] - 10https://gerrit.wikimedia.org/r/1071561 (https://phabricator.wikimedia.org/T331969) (owner: 10Elukey)
[10:30:36] <jinxer-wm>	 RESOLVED: [2x] ProbeDown: Service kubestagemaster1003:6443 has failed probes (http_staging_eqiad_kube_apiserver_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#kubestagemaster1003:6443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/custom&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[10:30:59] <wikibugs>	 (03CR) 10Muehlenhoff: [C:03+1] "Looks good" [deployment-charts] - 10https://gerrit.wikimedia.org/r/1072716 (https://phabricator.wikimedia.org/T332015) (owner: 10Elukey)
[10:32:10] <wikibugs>	 (03CR) 10Muehlenhoff: "(The failing nodes in PCC fail for unrelated reasons to this change)" [puppet] - 10https://gerrit.wikimedia.org/r/1072690 (owner: 10Muehlenhoff)
[10:32:13] <wikibugs>	 (03PS1) 10Vgutierrez: hiera: Switch purged@cp2037 back to main-codfw [puppet] - 10https://gerrit.wikimedia.org/r/1072720 (https://phabricator.wikimedia.org/T363210)
[10:33:18] <xSavitar>	 !log T374684 Ran mwscript extensions/CentralAuth/maintenance/fixStuckGlobalRename.php --wiki=enwiki --logwiki=metawiki 'Mohamadanisahmad5' 'Vanished user a53a2dd4f79a7bde25cf2ea2b2a309cb'
[10:33:21] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:33:59] <wikibugs>	 (03CR) 10Vgutierrez: [V:03+1] "PCC SUCCESS (CORE_DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet7-compiler-node/3975/co" [puppet] - 10https://gerrit.wikimedia.org/r/1072720 (https://phabricator.wikimedia.org/T363210) (owner: 10Vgutierrez)
[10:35:00] <wikibugs>	 (03CR) 10JMeybohm: [C:03+1] hiera: Switch purged@cp2037 back to main-codfw [puppet] - 10https://gerrit.wikimedia.org/r/1072720 (https://phabricator.wikimedia.org/T363210) (owner: 10Vgutierrez)
[10:36:40] <jinxer-wm>	 FIRING: KubernetesRsyslogDown: rsyslog on mw2302:9105 is missing kubernetes logs - https://wikitech.wikimedia.org/wiki/Kubernetes/Logging#Common_issues - https://grafana.wikimedia.org/d/OagQjQmnk?var-server=mw2302 - https://alerts.wikimedia.org/?q=alertname%3DKubernetesRsyslogDown
[10:37:03] <xSavitar>	 !log T374684 Ran mwscript extensions/CentralAuth/maintenance/fixStuckGlobalRename.php --wiki=enwiki --logwiki=metawiki 'Monty.ch' 'MajorFault'
[10:37:06] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:37:07] <stashbot>	 T374684: Unblock stuck global rename of IloveuFlyTek, Iosonopony, Mohamadanisahmad5, Monty.ch - https://phabricator.wikimedia.org/T374684
[10:37:44] <wikibugs>	 (03CR) 10Hnowlan: [C:03+1] services: update thumbor-eqiad to poolcounter1006 [deployment-charts] - 10https://gerrit.wikimedia.org/r/1072716 (https://phabricator.wikimedia.org/T332015) (owner: 10Elukey)
[10:40:52] <wikibugs>	 (03CR) 10Vgutierrez: [V:03+1 C:03+2] hiera: Switch purged@cp2037 back to main-codfw [puppet] - 10https://gerrit.wikimedia.org/r/1072720 (https://phabricator.wikimedia.org/T363210) (owner: 10Vgutierrez)
[10:44:24] <logmsgbot>	 !log arnaudb@cumin1002 START - Cookbook sre.hosts.downtime for 4 days, 0:00:00 on db1172.eqiad.wmnet with reason: Depooled recovering replag
[10:44:28] <logmsgbot>	 !log arnaudb@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4 days, 0:00:00 on db1172.eqiad.wmnet with reason: Depooled recovering replag
[10:45:26] <wikibugs>	 (PS2) Fabfur: prometheus: enable haproxykafka scraping [puppet] - https://gerrit.wikimedia.org/r/1072719 (https://phabricator.wikimedia.org/T374696)
[10:45:49] <wikibugs>	 (CR) CI reject: [V:-1] prometheus: enable haproxykafka scraping [puppet] - https://gerrit.wikimedia.org/r/1072719 (https://phabricator.wikimedia.org/T374696) (owner: Fabfur)
[10:49:50] <wikibugs>	 (PS3) Fabfur: prometheus: enable haproxykafka scraping [puppet] - https://gerrit.wikimedia.org/r/1072719 (https://phabricator.wikimedia.org/T374696)
[10:53:30] <wikibugs>	 (PS1) Btullis: Add ORKG triplestore to WDQS federation allowlist [puppet] - https://gerrit.wikimedia.org/r/1072723 (https://phabricator.wikimedia.org/T366485)
[10:54:07] <wikibugs>	 (PS4) Fabfur: prometheus: enable haproxykafka scraping [puppet] - https://gerrit.wikimedia.org/r/1072719 (https://phabricator.wikimedia.org/T374696)
[10:56:02] <wikibugs>	 (CR) Btullis: [V:+1] "PCC SUCCESS (CORE_DIFF 5): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet7-compiler-node/3976/co" [puppet] - https://gerrit.wikimedia.org/r/1072723 (https://phabricator.wikimedia.org/T366485) (owner: Btullis)
[10:56:45] <wikibugs>	 (CR) CI reject: [V:-1] prometheus: enable haproxykafka scraping [puppet] - https://gerrit.wikimedia.org/r/1072719 (https://phabricator.wikimedia.org/T374696) (owner: Fabfur)
[10:57:56] <Amir1>	 jynus: thanks. I'll take care of it
[10:58:01] <Amir1>	 sorry for the mess
[11:00:05] <jouncebot>	 Deploy window No deploys all day! See Deployments/Emergencies if things are broken. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20240913T0700)
[11:00:05] <jouncebot>	 eoghan, jelto, arnoldokoth, and mutante: I, the Bot under the Fountain, call upon thee, The Deployer, to do GitLab version upgrades deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20240913T1100).
[11:00:06] <wikibugs>	 ops-eqiad, SRE, DC-Ops: PDU sensor over limit - https://phabricator.wikimedia.org/T374573#10143766 (phaultfinder)
[11:00:31] <wikibugs>	 (PS5) Fabfur: prometheus: enable haproxykafka scraping [puppet] - https://gerrit.wikimedia.org/r/1072719 (https://phabricator.wikimedia.org/T374696)
[11:01:23] <wikibugs>	 (CR) Clément Goubert: [C:+1] services: add new poolcounter nodes to MW configs [deployment-charts] - https://gerrit.wikimedia.org/r/1072717 (https://phabricator.wikimedia.org/T332015) (owner: Elukey)
[11:01:51] <wikibugs>	 (PS8) Slyngshede: Permission validation: Handle validation for manager approvals better. [software/bitu] - https://gerrit.wikimedia.org/r/1072528
[11:02:11] <wikibugs>	 (CR) Clément Goubert: [C:+1] Swap poolcounter2003 with poolcounter2005 [mediawiki-config] - https://gerrit.wikimedia.org/r/1072206 (https://phabricator.wikimedia.org/T332015) (owner: Elukey)
[11:03:37] <wikibugs>	 (CR) Fabfur: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1072719 (https://phabricator.wikimedia.org/T374696) (owner: Fabfur)
[11:04:05] <wikibugs>	 SRE, Wikimedia-Mailing-lists, Upstream: "list has X moderation requests waiting" email should provide a link - https://phabricator.wikimedia.org/T374694#10143781 (Ladsgroup)
[11:05:10] <wikibugs>	 (CR) Ladsgroup: "Legal gave their seal of approval" [puppet] - https://gerrit.wikimedia.org/r/1072265 (owner: Varnent)
[11:05:15] <wikibugs>	 (PS2) Varnent: Updated license information from CC 3.0 to CC 4.0 per request from Legal. [puppet] - https://gerrit.wikimedia.org/r/1072265
[11:05:19] <wikibugs>	 (CR) Ladsgroup: [V:+2 C:+2] Updated license information from CC 3.0 to CC 4.0 per request from Legal. [puppet] - https://gerrit.wikimedia.org/r/1072265 (owner: Varnent)
[11:06:56] <wikibugs>	 (CR) Fabfur: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1072714 (https://phabricator.wikimedia.org/T374473) (owner: Fabfur)
[11:15:12] <wikibugs>	 ops-eqiad, SRE, DC-Ops: PDU sensor over limit - https://phabricator.wikimedia.org/T374573#10143808 (phaultfinder)
[11:16:56] <wikibugs>	 (CR) Muehlenhoff: P:idp More precise base_dn for user lookup (1 comment) [puppet] - https://gerrit.wikimedia.org/r/1060396 (https://phabricator.wikimedia.org/T371930) (owner: Slyngshede)
[11:25:13] <wikibugs>	 ops-eqiad, SRE, DC-Ops: PDU sensor over limit - https://phabricator.wikimedia.org/T374573#10143827 (phaultfinder)
[11:28:26] <wikibugs>	 (PS1) Muehlenhoff: Enable profile::auto_restarts::service for prometheus::pushgateway [puppet] - https://gerrit.wikimedia.org/r/1072733 (https://phabricator.wikimedia.org/T135991)
[11:30:37] <wikibugs>	 (PS1) Btullis: Update the URL of the WikiPathways SPARQL endpoint to use HTTPS [puppet] - https://gerrit.wikimedia.org/r/1072734 (https://phabricator.wikimedia.org/T364448)
[11:32:18] <wikibugs>	 (CR) Muehlenhoff: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/1072733 (https://phabricator.wikimedia.org/T135991) (owner: Muehlenhoff)
[11:33:07] <wikibugs>	 (CR) Btullis: [V:+1] "PCC SUCCESS (CORE_DIFF 5): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet7-compiler-node/3977/co" [puppet] - https://gerrit.wikimedia.org/r/1072734 (https://phabricator.wikimedia.org/T364448) (owner: Btullis)
[11:34:10] <wikibugs>	 (PS2) Slyngshede: Menu: Add menu entry for managers to view pending permission requests. [software/bitu] - https://gerrit.wikimedia.org/r/1072547
[11:41:35] <wikibugs>	 (CR) Slyngshede: Permission validation: Handle validation for manager approvals better. (2 comments) [software/bitu] - https://gerrit.wikimedia.org/r/1072528 (owner: Slyngshede)
[11:44:02] <wikibugs>	 (PS9) Slyngshede: Permission validation: Handle validation for manager approvals better. [software/bitu] - https://gerrit.wikimedia.org/r/1072528
[11:47:00] <logmsgbot>	 !log akosiaris@cumin1002 START - Cookbook sre.hosts.rename from kubernetes2059 to wikikube-worker2114
[11:47:23] <logmsgbot>	 !log akosiaris@cumin1002 START - Cookbook sre.dns.netbox
[11:50:39] <logmsgbot>	 !log akosiaris@cumin1002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming kubernetes2059 to wikikube-worker2114 - akosiaris@cumin1002"
[11:52:12] <logmsgbot>	 !log akosiaris@cumin1002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming kubernetes2059 to wikikube-worker2114 - akosiaris@cumin1002"
[11:52:12] <logmsgbot>	 !log akosiaris@cumin1002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[11:52:13] <logmsgbot>	 !log akosiaris@cumin1002 START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2114
[11:53:04] <logmsgbot>	 !log akosiaris@cumin1002 END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2114
[11:53:43] <logmsgbot>	 !log akosiaris@cumin1002 END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from kubernetes2059 to wikikube-worker2114
[11:54:20] <logmsgbot>	 !log akosiaris@cumin1002 START - Cookbook sre.hosts.rename from kubernetes2060 to wikikube-worker2115
[11:54:43] <logmsgbot>	 !log akosiaris@cumin1002 START - Cookbook sre.dns.netbox
[11:59:51] <logmsgbot>	 !log akosiaris@cumin1002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming kubernetes2060 to wikikube-worker2115 - akosiaris@cumin1002"
[12:05:17] <logmsgbot>	 !log akosiaris@cumin1002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming kubernetes2060 to wikikube-worker2115 - akosiaris@cumin1002"
[12:05:18] <logmsgbot>	 !log akosiaris@cumin1002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[12:05:18] <logmsgbot>	 !log akosiaris@cumin1002 START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2115
[12:05:32] <logmsgbot>	 !log akosiaris@cumin1002 END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2115
[12:06:11] <logmsgbot>	 !log akosiaris@cumin1002 END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from kubernetes2060 to wikikube-worker2115
[12:07:15] <logmsgbot>	 !log akosiaris@cumin1002 START - Cookbook sre.hosts.rename from mw2301 to wikikube-worker2116
[12:07:26] <logmsgbot>	 !log akosiaris@cumin1002 START - Cookbook sre.dns.netbox
[12:07:42] <wikibugs>	 SRE, Infrastructure-Foundations: Integrate Bookworm 12.6 point update - https://phabricator.wikimedia.org/T374536#10143968 (MoritzMuehlenhoff)
[12:09:21] <wikibugs>	 (PS2) Slyngshede: Allow users to see log entires made by managers. [software/bitu] - https://gerrit.wikimedia.org/r/1072552
[12:11:33] <logmsgbot>	 !log akosiaris@cumin1002 START - Cookbook sre.k8s.renumber-node Renumbering for host wikikube-worker2114.codfw.wmnet
[12:11:43] <logmsgbot>	 !log akosiaris@cumin1002 START - Cookbook sre.k8s.renumber-node Renumbering for host wikikube-worker2115.codfw.wmnet
[12:11:59] <logmsgbot>	 !log akosiaris@cumin1002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2301 to wikikube-worker2116 - akosiaris@cumin1002"
[12:12:39] <logmsgbot>	 !log akosiaris@cumin1002 START - Cookbook sre.hosts.reimage for host wikikube-worker2114.codfw.wmnet with OS bullseye
[12:12:43] <logmsgbot>	 !log akosiaris@cumin1002 START - Cookbook sre.hosts.reimage for host wikikube-worker2115.codfw.wmnet with OS bullseye
[12:12:49] <logmsgbot>	 !log akosiaris@cumin1002 START - Cookbook sre.hosts.move-vlan for host wikikube-worker2114
[12:12:51] <logmsgbot>	 !log akosiaris@cumin1002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2301 to wikikube-worker2116 - akosiaris@cumin1002"
[12:12:52] <logmsgbot>	 !log akosiaris@cumin1002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[12:12:52] <logmsgbot>	 !log akosiaris@cumin1002 START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2116
[12:13:05] <logmsgbot>	 !log akosiaris@cumin1002 END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2116
[12:13:43] <logmsgbot>	 !log akosiaris@cumin1002 END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw2301 to wikikube-worker2116
[12:15:11] <logmsgbot>	 !log akosiaris@cumin1002 START - Cookbook sre.dns.netbox
[12:15:26] <wikibugs>	 (CR) Slyngshede: Allow users to see log entires made by managers. (2 comments) [software/bitu] - https://gerrit.wikimedia.org/r/1072552 (owner: Slyngshede)
[12:15:42] <logmsgbot>	 !log akosiaris@cumin1002 START - Cookbook sre.hosts.rename from mw2303 to wikikube-worker2118
[12:16:16] <icinga-wm>	 PROBLEM - Postgres Replication Lag on maps1008 is CRITICAL: POSTGRES_HOT_STANDBY_DELAY CRITICAL: DB template1 (host:localhost) 1562942216 and 4009 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[12:17:16] <icinga-wm>	 PROBLEM - Postgres Replication Lag on maps1006 is CRITICAL: POSTGRES_HOT_STANDBY_DELAY CRITICAL: DB template1 (host:localhost) 1510545320 and 4069 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[12:18:16] <icinga-wm>	 RECOVERY - Postgres Replication Lag on maps1006 is OK: POSTGRES_HOT_STANDBY_DELAY OK: DB template1 (host:localhost) 6120 and 343 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[12:18:16] <icinga-wm>	 RECOVERY - Postgres Replication Lag on maps1008 is OK: POSTGRES_HOT_STANDBY_DELAY OK: DB template1 (host:localhost) 0 and 343 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[12:18:21] <jinxer-wm>	 FIRING: ProbeDown: Service mw-wikifunctions:4451 has failed probes (http_mw-wikifunctions_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#mw-wikifunctions:4451 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/service&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[12:18:21] <logmsgbot>	 !log akosiaris@cumin1002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker2114 - akosiaris@cumin1002"
[12:19:57] <jinxer-wm>	 FIRING: ProbeDown: Service mw-wikifunctions:4451 has failed probes (http_mw-wikifunctions_ip4) #page - https://wikitech.wikimedia.org/wiki/Runbook#mw-wikifunctions:4451 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/service&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[12:20:07] <vgutierrez>	 !incidents
[12:20:08] <sirenbot>	 5166 (ACKED)  db1172 (paged)/MariaDB Replica Lag: s8 (paged)
[12:20:08] <sirenbot>	 5167 (UNACKED)  ProbeDown sre (10.2.2.88 ip4 mw-wikifunctions:4451 probes/service http_mw-wikifunctions_ip4 eqiad)
[12:20:08] <sirenbot>	 5165 (RESOLVED)  Primary outbound port utilisation over 80%  (paged) global noc (cr1-codfw.wikimedia.org)
[12:20:08] <sirenbot>	 5164 (RESOLVED)  Primary inbound port utilisation over 80%  (paged) global noc (cr4-ulsfo.wikimedia.org)
[12:20:12] <jynus>	 :-(
[12:20:12] <vgutierrez>	 !ack 5167
[12:20:12] <sirenbot>	 5167 (ACKED)  ProbeDown sre (10.2.2.88 ip4 mw-wikifunctions:4451 probes/service http_mw-wikifunctions_ip4 eqiad)
[12:20:24] <jayme>	 ah..I was wondering already
[12:20:40] <logmsgbot>	 !log akosiaris@cumin1002 START - Cookbook sre.dns.netbox
[12:20:48] <vgutierrez>	 neverending fun with wikifunctions
[12:22:14] <logmsgbot>	 !log akosiaris@cumin1002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker2114 - akosiaris@cumin1002"
[12:22:15] <logmsgbot>	 !log akosiaris@cumin1002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[12:22:15] <logmsgbot>	 !log akosiaris@cumin1002 START - Cookbook sre.dns.wipe-cache wikikube-worker2114.codfw.wmnet 102.0.192.10.in-addr.arpa 2.0.1.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
[12:22:18] <logmsgbot>	 !log akosiaris@cumin1002 END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker2114.codfw.wmnet 102.0.192.10.in-addr.arpa 2.0.1.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
[12:22:19] <logmsgbot>	 !log akosiaris@cumin1002 START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2114
[12:23:01] <logmsgbot>	 !log akosiaris@cumin1002 END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2114
[12:23:01] <logmsgbot>	 !log akosiaris@cumin1002 END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker2114
[12:24:00] <logmsgbot>	 !log akosiaris@cumin1002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2303 to wikikube-worker2118 - akosiaris@cumin1002"
[12:24:04] <logmsgbot>	 !log akosiaris@cumin1002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2303 to wikikube-worker2118 - akosiaris@cumin1002"
[12:24:04] <logmsgbot>	 !log akosiaris@cumin1002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[12:24:06] <logmsgbot>	 !log akosiaris@cumin1002 START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2118
[12:24:13] <logmsgbot>	 !log akosiaris@cumin1002 START - Cookbook sre.hosts.move-vlan for host wikikube-worker2115
[12:24:21] <logmsgbot>	 !log akosiaris@cumin1002 END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2118
[12:24:22] <wikibugs>	 (CR) Vgutierrez: [C:-1] "current code doesn't enable scraping given that haproxykafka ensure parameter is never set to present" [puppet] - https://gerrit.wikimedia.org/r/1072719 (https://phabricator.wikimedia.org/T374696) (owner: Fabfur)
[12:24:43] <jinxer-wm>	 RESOLVED: ProbeDown: Service mw-wikifunctions:4451 has failed probes (http_mw-wikifunctions_ip4) - https://wikitech.wikimedia.org/wiki/Runbook#mw-wikifunctions:4451 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/service&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[12:24:57] <jinxer-wm>	 RESOLVED: ProbeDown: Service mw-wikifunctions:4451 has failed probes (http_mw-wikifunctions_ip4) #page - https://wikitech.wikimedia.org/wiki/Runbook#mw-wikifunctions:4451 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/service&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[12:25:00] <logmsgbot>	 !log akosiaris@cumin1002 END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw2303 to wikikube-worker2118
[12:25:26] <logmsgbot>	 !log akosiaris@cumin1002 START - Cookbook sre.hosts.rename from mw2304 to wikikube-worker2119
[12:25:51] <logmsgbot>	 !log akosiaris@cumin1002 START - Cookbook sre.dns.netbox
[12:26:02] <icinga-wm>	 PROBLEM - BGP status on cr2-codfw is CRITICAL: BGP CRITICAL - AS64602/IPv6: Connect - kubernetes-codfw, AS64602/IPv4: Connect - kubernetes-codfw https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[12:26:02] <icinga-wm>	 PROBLEM - BGP status on cr1-codfw is CRITICAL: BGP CRITICAL - AS64602/IPv4: Connect - kubernetes-codfw, AS64602/IPv6: Connect - kubernetes-codfw https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[12:26:36] <logmsgbot>	 !log akosiaris@cumin1002 START - Cookbook sre.k8s.renumber-node Renumbering for host wikikube-worker2116.codfw.wmnet
[12:26:58] <logmsgbot>	 !log akosiaris@cumin1002 START - Cookbook sre.hosts.reimage for host wikikube-worker2116.codfw.wmnet with OS bullseye
[12:27:29] <wikibugs>	 (CR) Filippo Giunchedi: [C:+2] team-sre: tweak MediaWikiLoginFailures threshold [alerts] - https://gerrit.wikimedia.org/r/1072657 (https://phabricator.wikimedia.org/T350597) (owner: Filippo Giunchedi)
[12:27:55] <wikibugs>	 (CR) Filippo Giunchedi: [C:+1] Enable profile::auto_restarts::service for prometheus::pushgateway [puppet] - https://gerrit.wikimedia.org/r/1072733 (https://phabricator.wikimedia.org/T135991) (owner: Muehlenhoff)
[12:29:14] <logmsgbot>	 !log akosiaris@cumin1002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2304 to wikikube-worker2119 - akosiaris@cumin1002"
[12:29:44] <logmsgbot>	 !log akosiaris@cumin1002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2304 to wikikube-worker2119 - akosiaris@cumin1002"
[12:29:44] <logmsgbot>	 !log akosiaris@cumin1002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[12:29:45] <logmsgbot>	 !log akosiaris@cumin1002 START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2119
[12:29:57] <logmsgbot>	 !log akosiaris@cumin1002 END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2119
[12:30:36] <logmsgbot>	 !log akosiaris@cumin1002 END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw2304 to wikikube-worker2119
[12:30:54] <logmsgbot>	 !log akosiaris@cumin1002 START - Cookbook sre.dns.netbox
[12:31:11] <logmsgbot>	 !log akosiaris@cumin1002 START - Cookbook sre.hosts.rename from mw2305 to wikikube-worker2120
[12:31:57] <logmsgbot>	 !log akosiaris@cumin1002 START - Cookbook sre.dns.wipe-cache wikikube-worker2117.codfw.wmnet on all recursors
[12:32:00] <logmsgbot>	 !log akosiaris@cumin1002 END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker2117.codfw.wmnet on all recursors
[12:33:50] <wikibugs>	 Puppet, SRE, Infrastructure-Foundations, Keyholder: keyholder-proxy doesn't restart on config change - https://phabricator.wikimedia.org/T374711 (fgiunchedi) NEW
[12:34:10] <jinxer-wm>	 RESOLVED: KubernetesRsyslogDown: rsyslog on mw2302:9105 is missing kubernetes logs - https://wikitech.wikimedia.org/wiki/Kubernetes/Logging#Common_issues - https://grafana.wikimedia.org/d/OagQjQmnk?var-server=mw2302 - https://alerts.wikimedia.org/?q=alertname%3DKubernetesRsyslogDown
[12:35:07] <wikibugs>	 (PS1) Muehlenhoff: Remove old parsoid certs [puppet] - https://gerrit.wikimedia.org/r/1072737 (https://phabricator.wikimedia.org/T359387)
[12:35:24] <wikibugs>	 (PS1) Muehlenhoff: labs-private: Remove parsoid stub secrets [labs/private] - https://gerrit.wikimedia.org/r/1072738 (https://phabricator.wikimedia.org/T357750)
[12:35:27] <jinxer-wm>	 FIRING: [2x] ProbeDown: Service mw-wikifunctions:4451 has failed probes (http_mw-wikifunctions_ip4) #page  - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/service&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[12:35:35] <wikibugs>	 (PS2) Muehlenhoff: Remove old parsoid certs [puppet] - https://gerrit.wikimedia.org/r/1072737 (https://phabricator.wikimedia.org/T359387)
[12:35:38] <jayme>	 !incidents
[12:35:39] <sirenbot>	 5166 (ACKED)  db1172 (paged)/MariaDB Replica Lag: s8 (paged)
[12:35:39] <sirenbot>	 5168 (UNACKED)  ProbeDown sre (ip4 probes/service eqiad)
[12:35:39] <sirenbot>	 5167 (RESOLVED)  ProbeDown sre (10.2.2.88 ip4 mw-wikifunctions:4451 probes/service http_mw-wikifunctions_ip4 eqiad)
[12:35:39] <sirenbot>	 5165 (RESOLVED)  Primary outbound port utilisation over 80%  (paged) global noc (cr1-codfw.wikimedia.org)
[12:35:39] <sirenbot>	 5164 (RESOLVED)  Primary inbound port utilisation over 80%  (paged) global noc (cr4-ulsfo.wikimedia.org)
[12:35:44] <logmsgbot>	 !log akosiaris@cumin1002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker2115 - akosiaris@cumin1002"
[12:35:48] <jayme>	 !ack 5168
[12:35:48] <sirenbot>	 5168 (ACKED)  ProbeDown sre (ip4 probes/service eqiad)
[12:35:59] <logmsgbot>	 !log akosiaris@cumin1002 START - Cookbook sre.dns.netbox
[12:36:04] <icinga-wm>	 PROBLEM - PyBal backends health check on lvs1020 is CRITICAL: PYBAL CRITICAL - CRITICAL - thanos-query_443: Servers titan1002.eqiad.wmnet are marked down but pooled: thanos-web_443: Servers titan1002.eqiad.wmnet are marked down but pooled https://wikitech.wikimedia.org/wiki/PyBal
[12:36:04] <icinga-wm>	 PROBLEM - PyBal backends health check on lvs1019 is CRITICAL: PYBAL CRITICAL - CRITICAL - thanos-query_443: Servers titan1002.eqiad.wmnet are marked down but pooled: thanos-web_443: Servers titan1002.eqiad.wmnet are marked down but pooled https://wikitech.wikimedia.org/wiki/PyBal
[12:36:05] <logmsgbot>	 !log akosiaris@cumin1002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker2115 - akosiaris@cumin1002"
[12:36:05] <logmsgbot>	 !log akosiaris@cumin1002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[12:36:05] <logmsgbot>	 !log akosiaris@cumin1002 START - Cookbook sre.dns.wipe-cache wikikube-worker2115.codfw.wmnet 124.0.192.10.in-addr.arpa 4.2.1.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
[12:36:08] <logmsgbot>	 !log akosiaris@cumin1002 END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker2115.codfw.wmnet 124.0.192.10.in-addr.arpa 4.2.1.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
[12:36:09] <logmsgbot>	 !log akosiaris@cumin1002 START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2115
[12:36:29] <godog>	 mmhh thanos' unhappy too
[12:36:35] <godog>	 I'm taking a look
[12:37:00] <godog>	 jayme: ^ FYI
[12:37:12] <jayme>	 saw, thanks
[12:37:17] <logmsgbot>	 !log akosiaris@cumin1002 END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2115
[12:37:17] <logmsgbot>	 !log akosiaris@cumin1002 END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker2115
[12:37:47] <jayme>	 so...given that wikifunctions is known to be broken - how do we feel about making it non-paging until it's fixed?
[12:38:07] <jayme>	 there is really no reason to pull anyone out of the weekend when nothing can be done
[12:38:29] <logmsgbot>	 !log akosiaris@cumin1002 START - Cookbook sre.hosts.move-vlan for host wikikube-worker2116
[12:38:41] <godog>	 +1
[12:38:51] <bblack>	 is it completely broken, or just can't handle load or what?
[12:39:05] <jayme>	 more like completely broken
[12:39:12] <jinxer-wm>	 FIRING: [4x] JobUnavailable: Reduced availability for job thanos-query in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[12:39:33] <bblack>	 if it's timing out or not taking conns or something, we might want to disable it somewhere in traffic if we're leaving it dead for a while
[12:39:35] <wikibugs>	 (PS6) Fabfur: cache:haproxykafka: first stub classes to allow prometheus scraping [puppet] - https://gerrit.wikimedia.org/r/1072719 (https://phabricator.wikimedia.org/T374696)
[12:39:37] <logmsgbot>	 !log akosiaris@cumin1002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2305 to wikikube-worker2120 - akosiaris@cumin1002"
[12:39:41] <wikibugs>	 (CR) Fabfur: cache:haproxykafka: first stub classes to allow prometheus scraping (3 comments) [puppet] - https://gerrit.wikimedia.org/r/1072719 (https://phabricator.wikimedia.org/T374696) (owner: Fabfur)
[12:39:41] <logmsgbot>	 !log akosiaris@cumin1002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2305 to wikikube-worker2120 - akosiaris@cumin1002"
[12:39:41] <logmsgbot>	 !log akosiaris@cumin1002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[12:39:42] <logmsgbot>	 !log akosiaris@cumin1002 START - Cookbook sre.dns.wipe-cache wikikube-worker2118.codfw.wmnet on all recursors
[12:39:42] <logmsgbot>	 !log akosiaris@cumin1002 START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2120
[12:39:42] <bblack>	 so the impact doesn't spread through cache clusters
[12:39:45] <logmsgbot>	 !log akosiaris@cumin1002 END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker2118.codfw.wmnet on all recursors
[12:39:55] <logmsgbot>	 !log akosiaris@cumin1002 START - Cookbook sre.dns.netbox
[12:40:05] <logmsgbot>	 !log akosiaris@cumin1002 START - Cookbook sre.k8s.renumber-node Renumbering for host wikikube-worker2118.codfw.wmnet
[12:40:19] <jayme>	 bblack: IIUC there are a bunch of URLs that lead to some infinite loop that is killed after 60s, which saturates workers
[12:40:28] <logmsgbot>	 !log akosiaris@cumin1002 END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2120
[12:40:38] <bblack>	 yeah that sounds like a good reason to disable it
[12:40:47] <bblack>	 those 60s tie up cp-node threads, too
[12:40:51] <jayme>	 but it's an isolated mw instance, so the worker saturation does not spread to actual wikis
[12:41:07] <logmsgbot>	 !log akosiaris@cumin1002 END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw2305 to wikikube-worker2120
[12:41:13] <logmsgbot>	 !log akosiaris@cumin1002 START - Cookbook sre.hosts.reimage for host wikikube-worker2118.codfw.wmnet with OS bullseye
[12:41:14] <jayme>	 indeed, might still be a problem for cp nodes
[12:41:30] <godog>	 !log bounce thanos-query-frontend on titan eqiad
[12:41:32] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:41:49] <jayme>	 but we have not seen that yet
[12:42:01] <wikibugs>	 (PS7) Vgutierrez: sre.cdn: Add transfer-purged-positions cookbook [cookbooks] - https://gerrit.wikimedia.org/r/1072736
[12:43:59] <wikibugs>	 (PS1) Hashar: rdbms: only count replication sources toward "masterConns" in getServerConnection() [core] (wmf/1.43.0-wmf.22) - https://gerrit.wikimedia.org/r/1072739 (https://phabricator.wikimedia.org/T374534)
[12:43:59] <bblack>	 or maybe we can just turn down the timeout for the mw-wikifunctions backend to reduce the impact there
[12:44:04] <icinga-wm>	 RECOVERY - PyBal backends health check on lvs1020 is OK: PYBAL OK - All pools are healthy https://wikitech.wikimedia.org/wiki/PyBal
[12:44:04] <icinga-wm>	 RECOVERY - PyBal backends health check on lvs1019 is OK: PYBAL OK - All pools are healthy https://wikitech.wikimedia.org/wiki/PyBal
[12:44:07] <logmsgbot>	 !log akosiaris@cumin1002 START - Cookbook sre.hosts.rename from mw2302 to wikikube-worker2117
[12:44:12] <jinxer-wm>	 FIRING: [4x] JobUnavailable: Reduced availability for job thanos-query in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[12:44:15] <wikibugs>	 SRE, Infrastructure-Foundations, netops: netbox: create IPv6 entries for Cloud VPS - https://phabricator.wikimedia.org/T374712 (aborrero) NEW
[12:44:59] <logmsgbot>	 !log akosiaris@cumin1002 START - Cookbook sre.dns.netbox
[12:45:06] <wikibugs>	 ops-eqiad, SRE, DC-Ops: PDU sensor over limit - https://phabricator.wikimedia.org/T374573#10144154 (phaultfinder)
[12:45:07] <logmsgbot>	 !log akosiaris@cumin1002 START - Cookbook sre.k8s.renumber-node Renumbering for host wikikube-worker2119.codfw.wmnet
[12:45:10] <logmsgbot>	 !log akosiaris@cumin1002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker2116 - akosiaris@cumin1002"
[12:45:25] <logmsgbot>	 !log akosiaris@cumin1002 START - Cookbook sre.hosts.reimage for host wikikube-worker2119.codfw.wmnet with OS bullseye
[12:45:27] <jinxer-wm>	 RESOLVED: ProbeDown: Service thanos-query:443 has failed probes (http_thanos-query_ip4) #page - https://wikitech.wikimedia.org/wiki/Runbook#thanos-query:443 - https://grafana.wikimedia.org/d/O0nHhdhnz/network-probes-overview?var-job=probes/service&var-module=All - https://alerts.wikimedia.org/?q=alertname%3DProbeDown
[12:45:36] <jayme>	 bblack: given that cp didn't raise any issues up until now we might be okay without...do you know what the timeout is currently? With >60s at least users get some kind of proper error message
[12:45:56] <wikibugs>	 (CR) Hashar: [C:+2] "Amir suggested to backport it immediately in the interest of cutting the log spam in `rdbms` :)" [core] (wmf/1.43.0-wmf.22) - https://gerrit.wikimedia.org/r/1072739 (https://phabricator.wikimedia.org/T374534) (owner: Hashar)
[12:46:09] <wikibugs>	 (PS8) Vgutierrez: sre.cdn: Add transfer-purged-positions cookbook [cookbooks] - https://gerrit.wikimedia.org/r/1072736
[12:46:15] <wikibugs>	 (PS1) JMeybohm: Disable paging for mw-wikifunctions [puppet] - https://gerrit.wikimedia.org/r/1072740 (https://phabricator.wikimedia.org/T374231)
[12:46:48] <bblack>	 jayme: yeah we might be ok, so long as the traffic to those slow requests remains stable
[12:47:12] <bblack>	 I guess if not, someone could use standard requestctl to shut it off, vs figuring out some tricky timeout thing today.
[12:47:52] <wikibugs>	 SRE, cloud-services-team, Infrastructure-Foundations, netops: cloudsw: codfw: enable IPv6 - https://phabricator.wikimedia.org/T374713 (aborrero) NEW
[12:48:13] <jayme>	 right... also we've been in the "ballpark" of 5 rps for it - I don't think it's expected to rise
[12:48:21] <wikibugs>	 06SRE, 06cloud-services-team, 06Infrastructure-Foundations, 10netops: openstack: verify security groups settings for IPv6 - https://phabricator.wikimedia.org/T374714 (10aborrero) 03NEW
[12:49:00] <logmsgbot>	 !log akosiaris@cumin1002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker2116 - akosiaris@cumin1002"
[12:49:00] <logmsgbot>	 !log akosiaris@cumin1002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[12:49:00] <logmsgbot>	 !log akosiaris@cumin1002 START - Cookbook sre.dns.wipe-cache wikikube-worker2116.codfw.wmnet 171.0.192.10.in-addr.arpa 1.7.1.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
[12:49:03] <logmsgbot>	 !log akosiaris@cumin1002 END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker2116.codfw.wmnet 171.0.192.10.in-addr.arpa 1.7.1.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
[12:49:04] <logmsgbot>	 !log akosiaris@cumin1002 START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2116
[12:49:12] <jinxer-wm>	 RESOLVED: [4x] JobUnavailable: Reduced availability for job thanos-query in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[12:49:32] <logmsgbot>	 !log akosiaris@cumin1002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2302 to wikikube-worker2117 - akosiaris@cumin1002"
[12:49:36] <logmsgbot>	 !log akosiaris@cumin1002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2302 to wikikube-worker2117 - akosiaris@cumin1002"
[12:49:36] <logmsgbot>	 !log akosiaris@cumin1002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[12:49:36] <logmsgbot>	 !log akosiaris@cumin1002 START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2117
[12:49:47] <wikibugs>	 06SRE, 06cloud-services-team, 06Infrastructure-Foundations, 10netops: openstack: work out IPv6 and designate integration - https://phabricator.wikimedia.org/T374715 (10aborrero) 03NEW
[12:50:25] <wikibugs>	 06SRE, 06cloud-services-team, 06Infrastructure-Foundations, 10netops: openstack: work out IPv6 and designate integration - https://phabricator.wikimedia.org/T374715#10144206 (10aborrero)
[12:50:28] <wikibugs>	 06SRE, 06cloud-services-team, 06Infrastructure-Foundations, 10netops: openstack: verify security groups settings for IPv6 - https://phabricator.wikimedia.org/T374714#10144207 (10aborrero)
[12:50:33] <wikibugs>	 06SRE, 06cloud-services-team, 06Infrastructure-Foundations, 10netops: CloudVPS: IPv6 early PoC - https://phabricator.wikimedia.org/T245495#10144208 (10aborrero)
[12:50:52] <wikibugs>	 (03CR) 10Clément Goubert: [C:03+1] Disable paging for mw-wikifunctions [puppet] - 10https://gerrit.wikimedia.org/r/1072740 (https://phabricator.wikimedia.org/T374231) (owner: 10JMeybohm)
[12:51:45] <logmsgbot>	 !log akosiaris@cumin1002 END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2117
[12:51:58] <logmsgbot>	 !log akosiaris@cumin1002 END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2116
[12:51:58] <logmsgbot>	 !log akosiaris@cumin1002 END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker2116
[12:52:13] <logmsgbot>	 !log akosiaris@cumin1002 START - Cookbook sre.hosts.move-vlan for host wikikube-worker2119
[12:52:20] <logmsgbot>	 !log akosiaris@cumin1002 START - Cookbook sre.k8s.renumber-node Renumbering for host wikikube-worker2120.codfw.wmnet
[12:52:24] <logmsgbot>	 !log akosiaris@cumin1002 END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw2302 to wikikube-worker2117
[12:52:35] <logmsgbot>	 !log akosiaris@cumin1002 START - Cookbook sre.hosts.reimage for host wikikube-worker2120.codfw.wmnet with OS bullseye
[12:52:52] <logmsgbot>	 !log akosiaris@cumin1002 START - Cookbook sre.dns.netbox
[12:53:17] <logmsgbot>	 !log akosiaris@cumin1002 START - Cookbook sre.k8s.renumber-node Renumbering for host wikikube-worker2117.codfw.wmnet
[12:53:25] <jinxer-wm>	 FIRING: SystemdUnitFailed: dump_ip_reputation.service on puppetserver1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[12:53:37] <logmsgbot>	 !log akosiaris@cumin1002 END (FAIL) - Cookbook sre.k8s.renumber-node (exit_code=99) Renumbering for host wikikube-worker2117.codfw.wmnet
[12:53:58] <logmsgbot>	 !log akosiaris@cumin1002 START - Cookbook sre.dns.wipe-cache wikikube-worker2117.codfw.wmnet on all recursors
[12:54:01] <logmsgbot>	 !log akosiaris@cumin1002 END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker2117.codfw.wmnet on all recursors
[12:54:10] <logmsgbot>	 !log akosiaris@cumin1002 START - Cookbook sre.k8s.renumber-node Renumbering for host wikikube-worker2117.codfw.wmnet
[12:54:33] <logmsgbot>	 !log akosiaris@cumin1002 START - Cookbook sre.hosts.reimage for host wikikube-worker2117.codfw.wmnet with OS bullseye
[12:54:56] <jinxer-wm>	 FIRING: RoutinatorRTRConnections: Important drop of Routinator RTR connections on rpki2002:9556 - https://wikitech.wikimedia.org/wiki/RPKI#RTR_Connections_drop - https://grafana.wikimedia.org/d/UwUa77GZk/rpki - https://alerts.wikimedia.org/?q=alertname%3DRoutinatorRTRConnections
[12:55:00] <wikibugs>	 06SRE, 06cloud-services-team, 06Infrastructure-Foundations, 10netops: CloudVPS: IPv6 in codfw1dev - https://phabricator.wikimedia.org/T245495#10144214 (10aborrero)
[12:55:16] <wikibugs>	 (03CR) 10Fabfur: "Don't know spicerack kafka APIs but aside from the two minor observations on docstrings looks good!" [cookbooks] - 10https://gerrit.wikimedia.org/r/1072736 (owner: 10Vgutierrez)
[12:55:29] <wikibugs>	 06SRE, 06cloud-services-team, 06Infrastructure-Foundations, 10netops: CloudVPS: IPv6 in codfw1dev - https://phabricator.wikimedia.org/T245495#10144223 (10aborrero)
[12:56:05] <wikibugs>	 06SRE, 06cloud-services-team, 06Infrastructure-Foundations, 10netops: cloudgw: add support and enable IPv6 - https://phabricator.wikimedia.org/T374716 (10aborrero) 03NEW
[12:56:19] <wikibugs>	 06SRE, 06cloud-services-team, 06Infrastructure-Foundations, 10netops: cloudsw: codfw: enable IPv6 - https://phabricator.wikimedia.org/T374713#10144237 (10aborrero)
[12:56:22] <wikibugs>	 06SRE, 06cloud-services-team, 06Infrastructure-Foundations, 10netops: cloudgw: add support and enable IPv6 - https://phabricator.wikimedia.org/T374716#10144238 (10aborrero)
[12:56:28] <wikibugs>	 06SRE, 06cloud-services-team, 06Infrastructure-Foundations, 10netops: CloudVPS: IPv6 in codfw1dev - https://phabricator.wikimedia.org/T245495#10144239 (10aborrero)
[12:56:34] <wikibugs>	 (03CR) 10JMeybohm: [C:03+2] Disable paging for mw-wikifunctions [puppet] - 10https://gerrit.wikimedia.org/r/1072740 (https://phabricator.wikimedia.org/T374231) (owner: 10JMeybohm)
[12:57:11] <wikibugs>	 06SRE, 06cloud-services-team, 06Infrastructure-Foundations, 10netops: CloudVPS: IPv6 in codfw1dev - https://phabricator.wikimedia.org/T245495#10144245 (10aborrero)
[12:57:19] <logmsgbot>	 !log akosiaris@cumin1002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker2119 - akosiaris@cumin1002"
[12:57:23] <logmsgbot>	 !log akosiaris@cumin1002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker2119 - akosiaris@cumin1002"
[12:57:23] <logmsgbot>	 !log akosiaris@cumin1002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[12:57:23] <logmsgbot>	 !log akosiaris@cumin1002 START - Cookbook sre.dns.wipe-cache wikikube-worker2119.codfw.wmnet 174.0.192.10.in-addr.arpa 4.7.1.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
[12:57:27] <logmsgbot>	 !log akosiaris@cumin1002 END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker2119.codfw.wmnet 174.0.192.10.in-addr.arpa 4.7.1.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
[12:57:27] <logmsgbot>	 !log akosiaris@cumin1002 START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2119
[12:57:33] <wikibugs>	 06SRE, 06cloud-services-team, 06Infrastructure-Foundations, 10netops: CloudVPS: IPv6 in codfw1dev - https://phabricator.wikimedia.org/T245495#10144241 (10aborrero)
[12:57:35] <wikibugs>	 06SRE, 06cloud-services-team, 06Infrastructure-Foundations, 10netops: CloudVPS: IPv6 in codfw1dev - https://phabricator.wikimedia.org/T245495#10144246 (10aborrero)
[13:00:06] <logmsgbot>	 !log aqu@deploy1003 Started deploy [airflow-dags/analytics_test@5315c8d]: Test Refine through Airflow
[13:00:14] <wikibugs>	 (03PS9) 10Vgutierrez: sre.cdn: Add transfer-purged-positions cookbook [cookbooks] - 10https://gerrit.wikimedia.org/r/1072736
[13:00:19] <wikibugs>	 (03CR) 10CI reject: [V:04-1] sre.cdn: Add transfer-purged-positions cookbook [cookbooks] - 10https://gerrit.wikimedia.org/r/1072736 (owner: 10Vgutierrez)
[13:00:36] <logmsgbot>	 !log aqu@deploy1003 Finished deploy [airflow-dags/analytics_test@5315c8d]: Test Refine through Airflow (duration: 00m 31s)
[13:01:26] <logmsgbot>	 !log akosiaris@cumin1002 END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2119
[13:01:26] <logmsgbot>	 !log akosiaris@cumin1002 END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker2119
[13:01:31] <logmsgbot>	 !log akosiaris@cumin1002 START - Cookbook sre.hosts.move-vlan for host wikikube-worker2120
[13:01:45] <logmsgbot>	 !log akosiaris@cumin1002 START - Cookbook sre.dns.netbox
[13:01:47] <wikibugs>	 (03PS10) 10Vgutierrez: sre.cdn: Add transfer-purged-positions cookbook [cookbooks] - 10https://gerrit.wikimedia.org/r/1072736
[13:02:02] <wikibugs>	 (03CR) 10Vgutierrez: sre.cdn: Add transfer-purged-positions cookbook (032 comments) [cookbooks] - 10https://gerrit.wikimedia.org/r/1072736 (owner: 10Vgutierrez)
[13:03:36] <wikibugs>	 (03CR) 10Vgutierrez: cache:haproxykafka: first stub classes to allow prometheus scraping (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/1072719 (https://phabricator.wikimedia.org/T374696) (owner: 10Fabfur)
[13:05:19] <wikibugs>	 (03PS1) 10Muehlenhoff: deployment servers: Avoid Ferm-specific syntax [puppet] - 10https://gerrit.wikimedia.org/r/1072744
[13:05:28] <logmsgbot>	 !log akosiaris@cumin1002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker2120 - akosiaris@cumin1002"
[13:05:33] <logmsgbot>	 !log akosiaris@cumin1002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker2120 - akosiaris@cumin1002"
[13:05:33] <logmsgbot>	 !log akosiaris@cumin1002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[13:05:33] <logmsgbot>	 !log akosiaris@cumin1002 START - Cookbook sre.dns.wipe-cache wikikube-worker2120.codfw.wmnet 175.0.192.10.in-addr.arpa 5.7.1.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
[13:05:36] <logmsgbot>	 !log akosiaris@cumin1002 END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker2120.codfw.wmnet 175.0.192.10.in-addr.arpa 5.7.1.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
[13:05:37] <logmsgbot>	 !log akosiaris@cumin1002 START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2120
[13:09:42] <wikibugs>	 (03CR) 10Fabfur: sre.cdn: Add transfer-purged-positions cookbook (031 comment) [cookbooks] - 10https://gerrit.wikimedia.org/r/1072736 (owner: 10Vgutierrez)
[13:10:04] <wikibugs>	 (03PS4) 10Ssingh: P:ntp and nagios_core: add new command ntp_check_peer_and_stratum [puppet] - 10https://gerrit.wikimedia.org/r/1072276
[13:10:41] <logmsgbot>	 !log akosiaris@cumin1002 END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2118.codfw.wmnet with OS bullseye
[13:10:42] <logmsgbot>	 !log akosiaris@cumin1002 END (FAIL) - Cookbook sre.k8s.renumber-node (exit_code=99) Renumbering for host wikikube-worker2118.codfw.wmnet
[13:10:55] <wikibugs>	 (03CR) 10Ssingh: P:ntp and nagios_core: add new command ntp_check_peer_and_stratum (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/1072276 (owner: 10Ssingh)
[13:11:01] <logmsgbot>	 !log akosiaris@cumin1002 END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2120
[13:11:01] <logmsgbot>	 !log akosiaris@cumin1002 END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker2120
[13:11:01] <wikibugs>	 (03CR) 10Ssingh: [V:03+1] "PCC SUCCESS (CORE_DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet7-compiler-node/3978/co" [puppet] - 10https://gerrit.wikimedia.org/r/1072276 (owner: 10Ssingh)
[13:11:33] <wikibugs>	 (03CR) 10Muehlenhoff: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/1072744 (owner: 10Muehlenhoff)
[13:11:49] <logmsgbot>	 !log akosiaris@cumin1002 START - Cookbook sre.k8s.renumber-node Renumbering for host wikikube-worker2118.codfw.wmnet
[13:12:06] <logmsgbot>	 !log akosiaris@cumin1002 START - Cookbook sre.hosts.reimage for host wikikube-worker2118.codfw.wmnet with OS bullseye
[13:12:13] <logmsgbot>	 !log akosiaris@cumin1002 START - Cookbook sre.hosts.move-vlan for host wikikube-worker2117
[13:12:21] <wikibugs>	 (03CR) 10Ssingh: "https://puppet-compiler.wmflabs.org/output/1072276/3979/dns1004.wikimedia.org/index.html" [puppet] - 10https://gerrit.wikimedia.org/r/1072276 (owner: 10Ssingh)
[13:15:13] <wikibugs>	 (03Merged) 10jenkins-bot: rdbms: only count replication sources toward "masterConns" in getServerConnection() [core] (wmf/1.43.0-wmf.22) - 10https://gerrit.wikimedia.org/r/1072739 (https://phabricator.wikimedia.org/T374534) (owner: 10Hashar)
[13:15:23] <wikibugs>	 (03CR) 10CI reject: [V:04-1] sre.cdn: Add transfer-purged-positions cookbook [cookbooks] - 10https://gerrit.wikimedia.org/r/1072736 (owner: 10Vgutierrez)
[13:16:53] <logmsgbot>	 !log hashar@deploy1003 Started scap sync-world: Backport for [[gerrit:1072739|rdbms: only count replication sources toward "masterConns" in getServerConnection() (T374534)]]
[13:16:57] <stashbot>	 T374534: Lots of "Expectation (masterConns <= 0) by ApiMain::setRequestExpectations not met" involving external store (2024-09-05) - https://phabricator.wikimedia.org/T374534
[13:17:12] <logmsgbot>	 !log akosiaris@cumin1002 START - Cookbook sre.dns.netbox
[13:17:20] <wikibugs>	 (03PS2) 10Muehlenhoff: deployment servers: Avoid Ferm-specific syntax [puppet] - 10https://gerrit.wikimedia.org/r/1072744
[13:17:53] <wikibugs>	 (03PS11) 10Vgutierrez: sre.cdn: Add transfer-purged-positions cookbook [cookbooks] - 10https://gerrit.wikimedia.org/r/1072736
[13:20:06] <wikibugs>	 10ops-eqiad, 06SRE, 06DC-Ops: PDU sensor over limit - https://phabricator.wikimedia.org/T374573#10144279 (10phaultfinder)
[13:20:10] <logmsgbot>	 !log hashar@deploy1003 hashar: Backport for [[gerrit:1072739|rdbms: only count replication sources toward "masterConns" in getServerConnection() (T374534)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
[13:21:03] <logmsgbot>	 !log akosiaris@cumin1002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker2117 - akosiaris@cumin1002"
[13:21:07] <logmsgbot>	 !log akosiaris@cumin1002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker2117 - akosiaris@cumin1002"
[13:21:07] <logmsgbot>	 !log akosiaris@cumin1002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[13:21:07] <logmsgbot>	 !log akosiaris@cumin1002 START - Cookbook sre.dns.wipe-cache wikikube-worker2117.codfw.wmnet 172.0.192.10.in-addr.arpa 2.7.1.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
[13:21:10] <logmsgbot>	 !log akosiaris@cumin1002 END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker2117.codfw.wmnet 172.0.192.10.in-addr.arpa 2.7.1.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
[13:21:12] <logmsgbot>	 !log akosiaris@cumin1002 START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2117
[13:22:10] <logmsgbot>	 !log akosiaris@cumin1002 END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2117
[13:22:11] <logmsgbot>	 !log akosiaris@cumin1002 END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker2117
[13:22:16] <wikibugs>	 (03CR) 10Bking: flink-app: customize calico label selector (032 comments) [deployment-charts] - 10https://gerrit.wikimedia.org/r/1072236 (https://phabricator.wikimedia.org/T373195) (owner: 10Bking)
[13:22:17] <logmsgbot>	 !log akosiaris@cumin1002 START - Cookbook sre.hosts.move-vlan for host wikikube-worker2118
[13:22:47] <logmsgbot>	 !log hashar@deploy1003 hashar: Continuing with sync
[13:27:28] <logmsgbot>	 !log hashar@deploy1003 Finished scap sync-world: Backport for [[gerrit:1072739|rdbms: only count replication sources toward "masterConns" in getServerConnection() (T374534)]] (duration: 10m 34s)
[13:27:32] <stashbot>	 T374534: Lots of "Expectation (masterConns <= 0) by ApiMain::setRequestExpectations not met" involving external store (2024-09-05) - https://phabricator.wikimedia.org/T374534
[13:27:39] <wikibugs>	 (03CR) 10Muehlenhoff: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/1072744 (owner: 10Muehlenhoff)
[13:28:53] <wikibugs>	 (03PS7) 10Fabfur: cache:haproxykafka: first stub classes to allow prometheus scraping [puppet] - 10https://gerrit.wikimedia.org/r/1072719 (https://phabricator.wikimedia.org/T374696)
[13:29:06] <wikibugs>	 (03CR) 10Fabfur: cache:haproxykafka: first stub classes to allow prometheus scraping (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/1072719 (https://phabricator.wikimedia.org/T374696) (owner: 10Fabfur)
[13:33:50] <wikibugs>	 (03CR) 10Vgutierrez: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/1072719 (https://phabricator.wikimedia.org/T374696) (owner: 10Fabfur)
[13:33:52] <wikibugs>	 06SRE, 06Infrastructure-Foundations: Integrate Bookworm 12.6 point update - https://phabricator.wikimedia.org/T374536#10144303 (10MoritzMuehlenhoff)
[13:34:17] <wikibugs>	 (03CR) 10Ssingh: [C:03+1] sre.cdn: Add transfer-purged-positions cookbook (031 comment) [cookbooks] - 10https://gerrit.wikimedia.org/r/1072736 (owner: 10Vgutierrez)
[13:37:24] <logmsgbot>	 !log akosiaris@cumin1002 START - Cookbook sre.dns.netbox
[13:38:21] <wikibugs>	 (03CR) 10Ladsgroup: [V:03+2 C:03+2] "Acknowledged" [puppet] - 10https://gerrit.wikimedia.org/r/1072265 (owner: 10Varnent)
[13:39:09] <wikibugs>	 (03PS12) 10Vgutierrez: sre.cdn: Add transfer-purged-positions cookbook [cookbooks] - 10https://gerrit.wikimedia.org/r/1072736
[13:39:34] <wikibugs>	 (03CR) 10Vgutierrez: sre.cdn: Add transfer-purged-positions cookbook (031 comment) [cookbooks] - 10https://gerrit.wikimedia.org/r/1072736 (owner: 10Vgutierrez)
[13:39:47] <wikibugs>	 (03PS1) 10Muehlenhoff: wmcs::novaproxy: Avoid Ferm-specific syntax [puppet] - 10https://gerrit.wikimedia.org/r/1072751
[13:39:55] <wikibugs>	 (03CR) 10Ssingh: [C:03+1] sre.cdn: Add transfer-purged-positions cookbook [cookbooks] - 10https://gerrit.wikimedia.org/r/1072736 (owner: 10Vgutierrez)
[13:40:44] <logmsgbot>	 !log akosiaris@cumin1002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker2118 - akosiaris@cumin1002"
[13:40:49] <logmsgbot>	 !log akosiaris@cumin1002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker2118 - akosiaris@cumin1002"
[13:40:49] <logmsgbot>	 !log akosiaris@cumin1002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[13:40:49] <logmsgbot>	 !log akosiaris@cumin1002 START - Cookbook sre.dns.wipe-cache wikikube-worker2118.codfw.wmnet 173.0.192.10.in-addr.arpa 3.7.1.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
[13:40:52] <logmsgbot>	 !log akosiaris@cumin1002 END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker2118.codfw.wmnet 173.0.192.10.in-addr.arpa 3.7.1.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
[13:40:53] <logmsgbot>	 !log akosiaris@cumin1002 START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2118
[13:42:02] <wikibugs>	 (03PS1) 10Alexandros Kosiaris: Add wikikube-worker2117-2120 to site.pp [puppet] - 10https://gerrit.wikimedia.org/r/1072752
[13:42:03] <logmsgbot>	 !log akosiaris@cumin1002 END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2114.codfw.wmnet with OS bullseye
[13:42:04] <logmsgbot>	 !log akosiaris@cumin1002 END (FAIL) - Cookbook sre.k8s.renumber-node (exit_code=99) Renumbering for host wikikube-worker2114.codfw.wmnet
[13:44:26] <wikibugs>	 (03CR) 10Alexandros Kosiaris: [C:03+2] Add wikikube-worker2117-2120 to site.pp [puppet] - 10https://gerrit.wikimedia.org/r/1072752 (owner: 10Alexandros Kosiaris)
[13:48:34] <wikibugs>	 (03PS1) 10Vgutierrez: hiera: switch purged@codfw,ulsfo,eqsin back to codfw kafka cluster [puppet] - 10https://gerrit.wikimedia.org/r/1072753 (https://phabricator.wikimedia.org/T363210)
[13:50:28] <wikibugs>	 (03CR) 10Vgutierrez: [C:04-1] cache:haproxykafka: first stub classes to allow prometheus scraping (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/1072719 (https://phabricator.wikimedia.org/T374696) (owner: 10Fabfur)
[13:52:07] <wikibugs>	 (03CR) 10Muehlenhoff: [C:03+1] "I made https://gerrit.wikimedia.org/r/c/operations/puppet/+/1072690 for this" [puppet] - 10https://gerrit.wikimedia.org/r/1071925 (https://phabricator.wikimedia.org/T370677) (owner: 10Dzahn)
[13:52:29] <logmsgbot>	 !log akosiaris@cumin1002 END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2118
[13:52:29] <logmsgbot>	 !log akosiaris@cumin1002 END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker2118
[13:52:39] <wikibugs>	 (03CR) 10JHathaway: [C:03+2] puppet8: account for unknown probe types [puppet] - 10https://gerrit.wikimedia.org/r/1072303 (https://phabricator.wikimedia.org/T372664) (owner: 10JHathaway)
[13:52:44] <logmsgbot>	 !log akosiaris@cumin1002 START - Cookbook sre.k8s.renumber-node Renumbering for host wikikube-worker2114.codfw.wmnet
[13:53:08] <wikibugs>	 (03CR) 10Vgutierrez: [V:03+1] "PCC SUCCESS (CORE_DIFF 1 NOOP 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet7-compiler-node/" [puppet] - 10https://gerrit.wikimedia.org/r/1072753 (https://phabricator.wikimedia.org/T363210) (owner: 10Vgutierrez)
[13:53:14] <logmsgbot>	 !log akosiaris@cumin1002 START - Cookbook sre.hosts.reimage for host wikikube-worker2114.codfw.wmnet with OS bullseye
[13:53:39] <logmsgbot>	 !log akosiaris@cumin1002 END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2114.codfw.wmnet with OS bullseye
[13:53:40] <logmsgbot>	 !log akosiaris@cumin1002 END (FAIL) - Cookbook sre.k8s.renumber-node (exit_code=99) Renumbering for host wikikube-worker2114.codfw.wmnet
[13:54:42] <jinxer-wm>	 FIRING: [4x] SwiftObjectCountSiteDisparity: MediaWiki swift object counts site diffs - https://wikitech.wikimedia.org/wiki/Swift/How_To - https://grafana.wikimedia.org/d/OPgmB1Eiz/swift - https://alerts.wikimedia.org/?q=alertname%3DSwiftObjectCountSiteDisparity
[13:55:43] <logmsgbot>	 !log akosiaris@cumin1002 END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2115.codfw.wmnet with OS bullseye
[13:55:43] <logmsgbot>	 !log akosiaris@cumin1002 END (FAIL) - Cookbook sre.k8s.renumber-node (exit_code=99) Renumbering for host wikikube-worker2115.codfw.wmnet
[13:57:27] <wikibugs>	 (03CR) 10Vgutierrez: [C:03+2] sre.cdn: Add transfer-purged-positions cookbook [cookbooks] - 10https://gerrit.wikimedia.org/r/1072736 (owner: 10Vgutierrez)
[13:57:36] <wikibugs>	 (03PS1) 10Muehlenhoff: Revert "Temporarily disable stunnel for the Puppet 7 migration of deployment hosts" [puppet] - 10https://gerrit.wikimedia.org/r/1072754 (https://phabricator.wikimedia.org/T349619)
[13:58:32] <logmsgbot>	 !log akosiaris@cumin1002 START - Cookbook sre.k8s.renumber-node Renumbering for host wikikube-worker2114.codfw.wmnet
[13:59:00] <logmsgbot>	 !log akosiaris@cumin1002 START - Cookbook sre.hosts.reimage for host wikikube-worker2114.codfw.wmnet with OS bullseye
[13:59:25] <wikibugs>	 (03PS8) 10Fabfur: cache:haproxykafka: first stub classes to allow prometheus scraping [puppet] - 10https://gerrit.wikimedia.org/r/1072719 (https://phabricator.wikimedia.org/T374696)
[13:59:40] <wikibugs>	 (03CR) 10Fabfur: cache:haproxykafka: first stub classes to allow prometheus scraping (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/1072719 (https://phabricator.wikimedia.org/T374696) (owner: 10Fabfur)
[13:59:47] <wikibugs>	 (03CR) 10CI reject: [V:04-1] cache:haproxykafka: first stub classes to allow prometheus scraping [puppet] - 10https://gerrit.wikimedia.org/r/1072719 (https://phabricator.wikimedia.org/T374696) (owner: 10Fabfur)
[14:00:05] <wikibugs>	 (03CR) 10Ssingh: [C:03+1] "Site list looks good, PCC looks good." [puppet] - 10https://gerrit.wikimedia.org/r/1072753 (https://phabricator.wikimedia.org/T363210) (owner: 10Vgutierrez)
[14:00:16] <logmsgbot>	 !log akosiaris@cumin1002 START - Cookbook sre.k8s.renumber-node Renumbering for host wikikube-worker2115.codfw.wmnet
[14:00:24] <logmsgbot>	 !log akosiaris@cumin1002 END (FAIL) - Cookbook sre.k8s.renumber-node (exit_code=99) Renumbering for host wikikube-worker2115.codfw.wmnet
[14:00:56] <logmsgbot>	 !log akosiaris@cumin1002 START - Cookbook sre.k8s.renumber-node Renumbering for host wikikube-worker2115.codfw.wmnet
[14:01:19] <logmsgbot>	 !log akosiaris@cumin1002 START - Cookbook sre.hosts.reimage for host wikikube-worker2115.codfw.wmnet with OS bullseye
[14:01:21] <logmsgbot>	 !log akosiaris@cumin1002 END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2114.codfw.wmnet with OS bullseye
[14:01:22] <logmsgbot>	 !log akosiaris@cumin1002 END (FAIL) - Cookbook sre.k8s.renumber-node (exit_code=99) Renumbering for host wikikube-worker2114.codfw.wmnet
[14:01:26] <wikibugs>	 (03CR) 10Ssingh: [C:03+1] "https://puppet-compiler.wmflabs.org/output/1072753/3981/cp4052.ulsfo.wmnet/index.html" [puppet] - 10https://gerrit.wikimedia.org/r/1072753 (https://phabricator.wikimedia.org/T363210) (owner: 10Vgutierrez)
[14:01:30] <logmsgbot>	 !log akosiaris@cumin1002 END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2115.codfw.wmnet with OS bullseye
[14:01:30] <logmsgbot>	 !log akosiaris@cumin1002 END (FAIL) - Cookbook sre.k8s.renumber-node (exit_code=99) Renumbering for host wikikube-worker2115.codfw.wmnet
[14:02:28] <wikibugs>	 (03CR) 10Muehlenhoff: [C:03+1] "Looks good to me!" [puppet] - 10https://gerrit.wikimedia.org/r/1070681 (https://phabricator.wikimedia.org/T352245) (owner: 10Scott French)
[14:03:13] <wikibugs>	 (03PS1) 10FNegri: R:wmcs::db::wikireplicas remove access from cloudcumin [puppet] - 10https://gerrit.wikimedia.org/r/1072755 (https://phabricator.wikimedia.org/T344599)
[14:04:08] <wikibugs>	 (03CR) 10Alexandros Kosiaris: [C:03+1] "Makes sense to me." [puppet] - 10https://gerrit.wikimedia.org/r/1072740 (https://phabricator.wikimedia.org/T374231) (owner: 10JMeybohm)
[14:05:38] <logmsgbot>	 !log akosiaris@cumin1002 START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2120.codfw.wmnet with reason: host reimage
[14:06:08] <wikibugs>	 (03CR) 10Muehlenhoff: [C:03+1] P:etcd::tlsproxy: add support for PKI certs (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/1070681 (https://phabricator.wikimedia.org/T352245) (owner: 10Scott French)
[14:07:32] <jinxer-wm>	 FIRING: KubernetesCalicoDown: wikikube-worker2114.codfw.wmnet is not running calico-node Pod - https://wikitech.wikimedia.org/wiki/Calico#Operations - https://grafana.wikimedia.org/d/G8zPL7-Wz/?var-dc=codfw%20prometheus%2Fk8s&var-instance=wikikube-worker2114.codfw.wmnet - https://alerts.wikimedia.org/?q=alertname%3DKubernetesCalicoDown
[14:08:11] <wikibugs>	 (03CR) 10FNegri: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/1072755 (https://phabricator.wikimedia.org/T344599) (owner: 10FNegri)
[14:08:39] <wikibugs>	 10ops-codfw, 06DC-Ops, 10Prod-Kubernetes, 06serviceops: Degraded RAID on wikikube-worker2092 - https://phabricator.wikimedia.org/T374409#10144395 (10Jhancock.wm) or the delivery gets messed up. will update when I have it in hand.
[14:08:48] <logmsgbot>	 !log akosiaris@cumin1002 END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2116.codfw.wmnet with OS bullseye
[14:08:49] <logmsgbot>	 !log akosiaris@cumin1002 END (FAIL) - Cookbook sre.k8s.renumber-node (exit_code=99) Renumbering for host wikikube-worker2116.codfw.wmnet
[14:09:02] <logmsgbot>	 !log akosiaris@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2120.codfw.wmnet with reason: host reimage
[14:09:08] <logmsgbot>	 !log akosiaris@cumin1002 START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2118.codfw.wmnet with reason: host reimage
[14:10:32] <wikibugs>	 (03CR) 10Vgutierrez: [V:03+1 C:03+2] hiera: switch purged@codfw,ulsfo,eqsin back to codfw kafka cluster [puppet] - 10https://gerrit.wikimedia.org/r/1072753 (https://phabricator.wikimedia.org/T363210) (owner: 10Vgutierrez)
[14:11:48] <logmsgbot>	 !log akosiaris@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2118.codfw.wmnet with reason: host reimage
[14:11:59] <wikibugs>	 10ops-codfw, 06SRE, 06DC-Ops: Degraded RAID on cloudvirt2004-dev - https://phabricator.wikimedia.org/T374422#10144401 (10Jhancock.wm) shipping has gone awry. will update when it's in hand
[14:15:28] <wikibugs>	 (03PS9) 10Fabfur: cache:haproxykafka: first stub classes to allow prometheus scraping [puppet] - 10https://gerrit.wikimedia.org/r/1072719 (https://phabricator.wikimedia.org/T374696)
[14:17:29] <logmsgbot>	 !log vgutierrez@cumin1002 START - Cookbook sre.cdn.transfer-purged-positions rolling custom on P{cp2036*} and A:cp
[14:18:20] <logmsgbot>	 !log akosiaris@cumin1002 END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2119.codfw.wmnet with OS bullseye
[14:18:20] <logmsgbot>	 !log akosiaris@cumin1002 END (FAIL) - Cookbook sre.k8s.renumber-node (exit_code=99) Renumbering for host wikikube-worker2119.codfw.wmnet
[14:19:13] <logmsgbot>	 !log vgutierrez@cumin1002 END (PASS) - Cookbook sre.cdn.transfer-purged-positions (exit_code=0) rolling custom on P{cp2036*} and A:cp
[14:19:28] <logmsgbot>	 !log ladsgroup@cumin1002 START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on db1172.eqiad.wmnet with reason: Schema change (T367856)
[14:19:32] <stashbot>	 T367856: Cleanup revision table schema - https://phabricator.wikimedia.org/T367856
[14:19:32] <logmsgbot>	 !log ladsgroup@cumin1002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on db1172.eqiad.wmnet with reason: Schema change (T367856)
[14:20:49] <wikibugs>	 (03CR) 10JHathaway: [C:03+2] puppet8: add explicit typecast [puppet] - 10https://gerrit.wikimedia.org/r/1072301 (https://phabricator.wikimedia.org/T372664) (owner: 10JHathaway)
[14:20:50] <wikibugs>	 (03CR) 10Filippo Giunchedi: [C:03+1] "LGTM!" [puppet] - 10https://gerrit.wikimedia.org/r/1072276 (owner: 10Ssingh)
[14:21:50] <wikibugs>	 06SRE, 06Infrastructure-Foundations, 10Puppet-Infrastructure, 13Patch-For-Review: Strict mode enabled by default - https://phabricator.wikimedia.org/T372664#10144446 (10jhathaway)
[14:23:14] <wikibugs>	 (03CR) 10Filippo Giunchedi: [C:03+1] "LGTM re: prometheus bits" [puppet] - 10https://gerrit.wikimedia.org/r/1072719 (https://phabricator.wikimedia.org/T374696) (owner: 10Fabfur)
[14:23:56] <jinxer-wm>	 FIRING: [2x] RoutinatorRsyncErrors: Routinator rsync fetching issue in codfw - https://wikitech.wikimedia.org/wiki/RPKI#RSYNC_status - https://grafana.wikimedia.org/d/UwUa77GZk/rpki - https://alerts.wikimedia.org/?q=alertname%3DRoutinatorRsyncErrors
[14:25:05] <wikibugs>	 10ops-eqiad, 06SRE, 06DC-Ops: PDU sensor over limit - https://phabricator.wikimedia.org/T374573#10144482 (10phaultfinder)
[14:30:02] <logmsgbot>	 !log akosiaris@cumin1002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2118.codfw.wmnet with OS bullseye
[14:31:07] <wikibugs>	 06SRE, 10SRE-Access-Requests, 10Continuous-Integration-Infrastructure, 10LDAP-Access-Requests: Requesting access to `contint-admins`, `contint-docker`, LDAP `ciadmin` for 'Arthur taylor' - https://phabricator.wikimedia.org/T373969#10144491 (10Ladsgroup)
[14:31:42] <wikibugs>	 06SRE, 10SRE-Access-Requests, 10Continuous-Integration-Infrastructure, 10LDAP-Access-Requests: Requesting access to `contint-admins`, `contint-docker`, LDAP `ciadmin` for 'Arthur taylor' - https://phabricator.wikimedia.org/T373969#10144496 (10Ladsgroup)
[14:31:53] <wikibugs>	 (03CR) 10Ssingh: [C:03+2] P:ntp and nagios_core: add new command ntp_check_peer_and_stratum [puppet] - 10https://gerrit.wikimedia.org/r/1072276 (owner: 10Ssingh)
[14:32:49] <logmsgbot>	 !log vgutierrez@cumin1002 START - Cookbook sre.cdn.transfer-purged-positions rolling custom on P{cp2035*} and A:cp
[14:33:17] <akosiaris>	 !log homer cr*codfw* commit 'T372878'
[14:33:20] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:33:20] <stashbot>	 T372878: Re-IP wikikube servers in codfw row A/B moving to per-rack subnets - https://phabricator.wikimedia.org/T372878
[14:33:24] <akosiaris>	 !log homer lsw1-a6-codfw* commit 'T372878'
[14:33:27] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:34:42] <logmsgbot>	 !log vgutierrez@cumin1002 END (PASS) - Cookbook sre.cdn.transfer-purged-positions (exit_code=0) rolling custom on P{cp2035*} and A:cp
[14:35:34] <wikibugs>	 10ops-codfw, 06DC-Ops, 10Prod-Kubernetes, 06serviceops: Degraded RAID on wikikube-worker2092 - https://phabricator.wikimedia.org/T374409#10144500 (10Clement_Goubert) Logistics... Thanks for the update!
[14:35:38] <wikibugs>	 06SRE, 10SRE-Access-Requests, 10Continuous-Integration-Infrastructure, 10LDAP-Access-Requests: Requesting access to `contint-admins`, `contint-docker`, LDAP `ciadmin` for 'Arthur taylor' - https://phabricator.wikimedia.org/T373969#10144503 (10Ladsgroup) The ssh key you provided here is different the existi...
[14:39:12] <jinxer-wm>	 FIRING: JobUnavailable: Reduced availability for job sidekiq in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[14:39:16] <logmsgbot>	 !log akosiaris@cumin1002 END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2117.codfw.wmnet with OS bullseye
[14:39:17] <logmsgbot>	 !log akosiaris@cumin1002 END (FAIL) - Cookbook sre.k8s.renumber-node (exit_code=99) Renumbering for host wikikube-worker2117.codfw.wmnet
[14:40:46] <logmsgbot>	 !log akosiaris@cumin1002 START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker2118.codfw.wmnet
[14:41:25] <icinga-wm>	 RECOVERY - BGP status on cr1-codfw is OK: BGP OK - up: 301, down: 0, shutdown: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[14:41:39] <logmsgbot>	 !log akosiaris@cumin1002 END (FAIL) - Cookbook sre.k8s.pool-depool-node (exit_code=99) pool for host wikikube-worker2118.codfw.wmnet
[14:41:40] <logmsgbot>	 !log akosiaris@cumin1002 END (FAIL) - Cookbook sre.k8s.renumber-node (exit_code=99) Renumbering for host wikikube-worker2118.codfw.wmnet
[14:42:36] <wikibugs>	 (03CR) 10Vgutierrez: [C:03+1] cache:haproxykafka: first stub classes to allow prometheus scraping [puppet] - 10https://gerrit.wikimedia.org/r/1072719 (https://phabricator.wikimedia.org/T374696) (owner: 10Fabfur)
[14:43:22] <wikibugs>	 (03PS1) 10Ladsgroup: admin: Add Cyndywikime to ldap only users [puppet] - 10https://gerrit.wikimedia.org/r/1072758 (https://phabricator.wikimedia.org/T374595)
[14:44:11] <wikibugs>	 (03CR) 10CI reject: [V:04-1] admin: Add Cyndywikime to ldap only users [puppet] - 10https://gerrit.wikimedia.org/r/1072758 (https://phabricator.wikimedia.org/T374595) (owner: 10Ladsgroup)
[14:44:55] <logmsgbot>	 !log akosiaris@cumin1002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2120.codfw.wmnet with OS bullseye
[14:47:29] <icinga-wm>	 RECOVERY - BGP status on cr2-codfw is OK: BGP OK - up: 383, down: 0, shutdown: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[14:48:12] <logmsgbot>	 !log hnowlan@deploy1003 helmfile [eqiad] START helmfile.d/services/shellbox-video: apply
[14:48:40] <logmsgbot>	 !log hnowlan@deploy1003 helmfile [eqiad] DONE helmfile.d/services/shellbox-video: apply
[14:50:29] <wikibugs>	 (03PS1) 10DCausse: cirrus-streaming-updater: test resolve_canonical_bootstrap_servers_only [deployment-charts] - 10https://gerrit.wikimedia.org/r/1072759
[14:50:59] <wikibugs>	 (03PS2) 10Ladsgroup: admin: Add Cyndywikime to ldap only users [puppet] - 10https://gerrit.wikimedia.org/r/1072758 (https://phabricator.wikimedia.org/T374595)
[14:51:15] <wikibugs>	 (03CR) 10Alexandros Kosiaris: [C:03+2] Remove old parsoid certs [puppet] - 10https://gerrit.wikimedia.org/r/1072737 (https://phabricator.wikimedia.org/T359387) (owner: 10Muehlenhoff)
[14:51:29] <wikibugs>	 (03CR) 10Alexandros Kosiaris: [C:03+2] labs-private: Remove parsoid stub secrets [labs/private] - 10https://gerrit.wikimedia.org/r/1072738 (https://phabricator.wikimedia.org/T357750) (owner: 10Muehlenhoff)
[14:51:31] <wikibugs>	 (03CR) 10Alexandros Kosiaris: [V:03+2 C:03+2] labs-private: Remove parsoid stub secrets [labs/private] - 10https://gerrit.wikimedia.org/r/1072738 (https://phabricator.wikimedia.org/T357750) (owner: 10Muehlenhoff)
[14:51:36] <wikibugs>	 (03CR) 10Fabfur: [C:03+2] cache:haproxykafka: first stub classes to allow prometheus scraping [puppet] - 10https://gerrit.wikimedia.org/r/1072719 (https://phabricator.wikimedia.org/T374696) (owner: 10Fabfur)
[14:51:49] <wikibugs>	 (03CR) 10CI reject: [V:04-1] admin: Add Cyndywikime to ldap only users [puppet] - 10https://gerrit.wikimedia.org/r/1072758 (https://phabricator.wikimedia.org/T374595) (owner: 10Ladsgroup)
[14:53:34] <wikibugs>	 (03Abandoned) 10Ladsgroup: admin: Add Cyndywikime to ldap only users [puppet] - 10https://gerrit.wikimedia.org/r/1072758 (https://phabricator.wikimedia.org/T374595) (owner: 10Ladsgroup)
[14:54:56] <wikibugs>	 (03CR) 10Cwhite: "Seems like there's a local statsite instance currently in use.  Any objections to using it rather than the main one?" [puppet] - 10https://gerrit.wikimedia.org/r/1072632 (https://phabricator.wikimedia.org/T233089) (owner: 10Cwhite)
[14:57:03] <wikibugs>	 (03CR) 10Cwhite: "We'll need to coordinate a zuul restart to activate this.  Rollback is a revert of this patch.  Does someone from releng want to be involv" [puppet] - 10https://gerrit.wikimedia.org/r/1072633 (https://phabricator.wikimedia.org/T233089) (owner: 10Cwhite)
[14:58:00] <wikibugs>	 06SRE, 10LDAP-Access-Requests, 13Patch-For-Review: LDAP access to the wmf group for Cyndywikime - https://phabricator.wikimedia.org/T374595#10144551 (10Ladsgroup) You seem to be already in wmf ldap group? https://ldap.toolforge.org/user/cyndywikime
[15:01:10] <wikibugs>	 (03PS1) 10Scott French: kubernetes: re-name / IP mw231[345] [puppet] - 10https://gerrit.wikimedia.org/r/1072762 (https://phabricator.wikimedia.org/T372878)
[15:02:32] <wikibugs>	 (03CR) 10Hnowlan: [C:03+1] kubernetes: re-name / IP mw231[345] [puppet] - 10https://gerrit.wikimedia.org/r/1072762 (https://phabricator.wikimedia.org/T372878) (owner: 10Scott French)
[15:02:32] <jinxer-wm>	 FIRING: [3x] KubernetesCalicoDown: wikikube-worker2114.codfw.wmnet is not running calico-node Pod - https://wikitech.wikimedia.org/wiki/Calico#Operations  - https://alerts.wikimedia.org/?q=alertname%3DKubernetesCalicoDown
[15:04:12] <jinxer-wm>	 RESOLVED: JobUnavailable: Reduced availability for job sidekiq in ops@eqiad - https://wikitech.wikimedia.org/wiki/Prometheus#Prometheus_job_unavailable - https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets - https://alerts.wikimedia.org/?q=alertname%3DJobUnavailable
[15:05:47] <wikibugs>	 10ops-eqiad, 06SRE, 06DC-Ops, 06Machine-Learning-Team: Q1:rack/setup/install ml-serve1009-1011 (3x), ml-lab1001-1002 (2x), dse-k8s-worker1009 (1x) - https://phabricator.wikimedia.org/T372432#10144567 (10Jclark-ctr) 05Open→03Resolved
[15:12:43] <akosiaris>	 !log homer lsw1-a6-codfw* commit T372878
[15:12:46] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:12:47] <stashbot>	 T372878: Re-IP wikikube servers in codfw row A/B moving to per-rack subnets - https://phabricator.wikimedia.org/T372878
[15:14:03] <logmsgbot>	 !log swfrench@cumin2002 START - Cookbook sre.k8s.pool-depool-node depool for host mw2313.codfw.wmnet
[15:14:40] <logmsgbot>	 !log swfrench@cumin2002 END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host mw2313.codfw.wmnet
[15:15:06] <icinga-wm>	 PROBLEM - BGP status on lsw1-a6-codfw.mgmt is CRITICAL: BGP CRITICAL - AS64602/IPv6: Connect - kubernetes-codfw, AS64602/IPv6: Connect - kubernetes-codfw, AS64602/IPv4: Active - kubernetes-codfw, AS64602/IPv6: Connect - kubernetes-codfw, AS64602/IPv4: Active - kubernetes-codfw, AS64602/IPv4: Active - kubernetes-codfw https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[15:15:08] <logmsgbot>	 !log swfrench@cumin2002 START - Cookbook sre.k8s.pool-depool-node depool for host mw2314.codfw.wmnet
[15:15:41] <logmsgbot>	 !log swfrench@cumin2002 END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host mw2314.codfw.wmnet
[15:16:04] <logmsgbot>	 !log swfrench@cumin2002 START - Cookbook sre.k8s.pool-depool-node depool for host mw2315.codfw.wmnet
[15:16:37] <logmsgbot>	 !log swfrench@cumin2002 END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host mw2315.codfw.wmnet
[15:16:49] <wikibugs>	 (03PS1) 10Vgutierrez: sre.cdn.transfer-purged-positions: Do not use transfer_consumer_position [cookbooks] - 10https://gerrit.wikimedia.org/r/1072763
[15:17:18] <logmsgbot>	 !log akosiaris@cumin1002 START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker2120.codfw.wmnet
[15:17:20] <logmsgbot>	 !log akosiaris@cumin1002 END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker2120.codfw.wmnet
[15:17:21] <logmsgbot>	 !log akosiaris@cumin1002 END (FAIL) - Cookbook sre.k8s.renumber-node (exit_code=1) Renumbering for host wikikube-worker2120.codfw.wmnet
[15:17:32] <jinxer-wm>	 FIRING: [3x] KubernetesCalicoDown: wikikube-worker2114.codfw.wmnet is not running calico-node Pod - https://wikitech.wikimedia.org/wiki/Calico#Operations  - https://alerts.wikimedia.org/?q=alertname%3DKubernetesCalicoDown
[15:17:41] <wikibugs>	 (03CR) 10Scott French: [C:03+2] kubernetes: re-name / IP mw231[345] [puppet] - 10https://gerrit.wikimedia.org/r/1072762 (https://phabricator.wikimedia.org/T372878) (owner: 10Scott French)
[15:19:47] <wikibugs>	 (03PS2) 10Vgutierrez: sre.cdn.transfer-purged-positions: Do not use transfer_consumer_position [cookbooks] - 10https://gerrit.wikimedia.org/r/1072763
[15:20:06] <icinga-wm>	 RECOVERY - BGP status on lsw1-a6-codfw.mgmt is OK: BGP OK - up: 36, down: 0, shutdown: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[15:21:53] <wikibugs>	 (03PS3) 10Vgutierrez: sre.cdn.transfer-purged-positions: Do not use transfer_consumer_position [cookbooks] - 10https://gerrit.wikimedia.org/r/1072763
[15:22:11] <logmsgbot>	 !log swfrench@cumin2002 START - Cookbook sre.hosts.rename from mw2313 to wikikube-worker2121
[15:22:17] <logmsgbot>	 !log akosiaris@cumin1002 START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker2014.codfw.wmnet
[15:22:19] <logmsgbot>	 !log akosiaris@cumin1002 END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker2014.codfw.wmnet
[15:22:32] <logmsgbot>	 !log swfrench@cumin2002 START - Cookbook sre.dns.netbox
[15:22:32] <jinxer-wm>	 RESOLVED: [3x] KubernetesCalicoDown: wikikube-worker2114.codfw.wmnet is not running calico-node Pod - https://wikitech.wikimedia.org/wiki/Calico#Operations  - https://alerts.wikimedia.org/?q=alertname%3DKubernetesCalicoDown
[15:23:07] <logmsgbot>	 !log akosiaris@cumin1002 START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker2111.codfw.wmnet
[15:23:07] <logmsgbot>	 !log akosiaris@cumin1002 END (ERROR) - Cookbook sre.k8s.pool-depool-node (exit_code=97) pool for host wikikube-worker2111.codfw.wmnet
[15:23:16] <logmsgbot>	 !log akosiaris@cumin1002 START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker2114.codfw.wmnet
[15:23:18] <logmsgbot>	 !log akosiaris@cumin1002 END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker2114.codfw.wmnet
[15:23:23] <logmsgbot>	 !log akosiaris@cumin1002 START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker2115.codfw.wmnet
[15:23:25] <logmsgbot>	 !log akosiaris@cumin1002 END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker2115.codfw.wmnet
[15:23:30] <logmsgbot>	 !log akosiaris@cumin1002 START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker2116.codfw.wmnet
[15:23:32] <logmsgbot>	 !log akosiaris@cumin1002 END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker2116.codfw.wmnet
[15:23:37] <logmsgbot>	 !log akosiaris@cumin1002 START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker2117.codfw.wmnet
[15:23:39] <logmsgbot>	 !log akosiaris@cumin1002 END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker2117.codfw.wmnet
[15:23:44] <logmsgbot>	 !log akosiaris@cumin1002 START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker2118.codfw.wmnet
[15:23:45] <logmsgbot>	 !log akosiaris@cumin1002 END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker2118.codfw.wmnet
[15:23:50] <logmsgbot>	 !log akosiaris@cumin1002 START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker2119.codfw.wmnet
[15:23:52] <logmsgbot>	 !log akosiaris@cumin1002 END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker2119.codfw.wmnet
[15:23:53] <icinga-wm>	 PROBLEM - BGP status on cr1-codfw is CRITICAL: BGP CRITICAL - AS64602/IPv6: Active - kubernetes-codfw, AS64602/IPv6: Active - kubernetes-codfw, AS64602/IPv4: Active - kubernetes-codfw, AS64602/IPv6: Active - kubernetes-codfw, AS64602/IPv4: Active - kubernetes-codfw, AS64602/IPv4: Active - kubernetes-codfw https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[15:23:57] <logmsgbot>	 !log akosiaris@cumin1002 START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker2120.codfw.wmnet
[15:23:59] <logmsgbot>	 !log akosiaris@cumin1002 END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker2120.codfw.wmnet
[15:24:26] <icinga-wm>	 PROBLEM - BGP status on cr2-codfw is CRITICAL: BGP CRITICAL - AS64602/IPv6: Active - kubernetes-codfw, AS64602/IPv4: Active - kubernetes-codfw, AS64602/IPv6: Active - kubernetes-codfw, AS64602/IPv4: Active - kubernetes-codfw, AS64602/IPv4: Active - kubernetes-codfw, AS64602/IPv6: Active - kubernetes-codfw https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[15:25:07] <wikibugs>	 10ops-eqiad, 06SRE, 06DC-Ops: PDU sensor over limit - https://phabricator.wikimedia.org/T374573#10144586 (10phaultfinder)
[15:26:08] <logmsgbot>	 !log swfrench@cumin2002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2313 to wikikube-worker2121 - swfrench@cumin2002"
[15:26:31] <logmsgbot>	 !log swfrench@cumin2002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2313 to wikikube-worker2121 - swfrench@cumin2002"
[15:26:32] <logmsgbot>	 !log swfrench@cumin2002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[15:26:33] <logmsgbot>	 !log swfrench@cumin2002 START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2121
[15:26:45] <logmsgbot>	 !log swfrench@cumin2002 END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2121
[15:27:25] <logmsgbot>	 !log swfrench@cumin2002 END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw2313 to wikikube-worker2121
[15:28:12] <logmsgbot>	 !log swfrench@cumin2002 START - Cookbook sre.hosts.rename from mw2314 to wikikube-worker2122
[15:28:34] <logmsgbot>	 !log swfrench@cumin2002 START - Cookbook sre.dns.netbox
[15:28:40] <wikibugs>	 (03CR) 10JHathaway: [C:03+2] puppet8: replace to_pson with to_json [puppet] - 10https://gerrit.wikimedia.org/r/1071962 (https://phabricator.wikimedia.org/T372667) (owner: 10JHathaway)
[15:31:53] <wikibugs>	 (03PS4) 10Vgutierrez: sre.cdn.transfer-purged-positions: Do not use transfer_consumer_position [cookbooks] - 10https://gerrit.wikimedia.org/r/1072763
[15:32:04] <logmsgbot>	 !log swfrench@cumin2002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2314 to wikikube-worker2122 - swfrench@cumin2002"
[15:32:33] <logmsgbot>	 !log swfrench@cumin2002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2314 to wikikube-worker2122 - swfrench@cumin2002"
[15:32:33] <logmsgbot>	 !log swfrench@cumin2002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[15:32:35] <logmsgbot>	 !log swfrench@cumin2002 START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2122
[15:32:40] <jinxer-wm>	 FIRING: KubernetesRsyslogDown: rsyslog on mw2315:9105 is missing kubernetes logs - https://wikitech.wikimedia.org/wiki/Kubernetes/Logging#Common_issues - https://grafana.wikimedia.org/d/OagQjQmnk?var-server=mw2315 - https://alerts.wikimedia.org/?q=alertname%3DKubernetesRsyslogDown
[15:32:54] <logmsgbot>	 !log swfrench@cumin2002 END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2122
[15:33:30] <wikibugs>	 (03CR) 10JHathaway: [V:03+1 C:03+2] puppet8: ensure kerberos keytab type is binary [puppet] - 10https://gerrit.wikimedia.org/r/1072593 (https://phabricator.wikimedia.org/T372667) (owner: 10JHathaway)
[15:33:34] <logmsgbot>	 !log swfrench@cumin2002 END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw2314 to wikikube-worker2122
[15:34:10] <logmsgbot>	 !log swfrench@cumin2002 START - Cookbook sre.hosts.rename from mw2315 to wikikube-worker2123
[15:34:31] <logmsgbot>	 !log swfrench@cumin2002 START - Cookbook sre.dns.netbox
[15:37:46] <wikibugs>	 (03Abandoned) 10JHathaway: Revert "P:tlsproxy::instance: Drop numa_networking global" [puppet] - 10https://gerrit.wikimedia.org/r/1072290 (owner: 10JHathaway)
[15:37:53] <wikibugs>	 (03PS1) 10Scott French: mw-debug: add initial "next" release (attempt 2) [deployment-charts] - 10https://gerrit.wikimedia.org/r/1072764 (https://phabricator.wikimedia.org/T372604)
[15:38:06] <logmsgbot>	 !log swfrench@cumin2002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2315 to wikikube-worker2123 - swfrench@cumin2002"
[15:38:09] <wikibugs>	 (03Abandoned) 10JHathaway: WIP: test pcc do not merge [puppet] - 10https://gerrit.wikimedia.org/r/1057967 (owner: 10JHathaway)
[15:38:15] <wikibugs>	 (03CR) 10Filippo Giunchedi: [C:03+1] zuul: set statsd-exporter to relay to local statsite instance [puppet] - 10https://gerrit.wikimedia.org/r/1072632 (https://phabricator.wikimedia.org/T233089) (owner: 10Cwhite)
[15:38:23] <wikibugs>	 (03CR) 10Filippo Giunchedi: [C:03+1] zuul: send stats to prometheus-statsd-exporter [puppet] - 10https://gerrit.wikimedia.org/r/1072633 (https://phabricator.wikimedia.org/T233089) (owner: 10Cwhite)
[15:38:42] <logmsgbot>	 !log swfrench@cumin2002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2315 to wikikube-worker2123 - swfrench@cumin2002"
[15:38:42] <logmsgbot>	 !log swfrench@cumin2002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[15:38:43] <logmsgbot>	 !log swfrench@cumin2002 START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2123
[15:38:54] <logmsgbot>	 !log swfrench@cumin2002 END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2123
[15:39:34] <logmsgbot>	 !log swfrench@cumin2002 END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw2315 to wikikube-worker2123
[15:40:36] <wikibugs>	 06SRE, 06Infrastructure-Foundations, 10Puppet-Infrastructure, 13Patch-For-Review: Drop PSON support - https://phabricator.wikimedia.org/T372667#10144667 (10jhathaway)
[15:43:33] <wikibugs>	 (03CR) 10CI reject: [V:04-1] sre.cdn.transfer-purged-positions: Do not use transfer_consumer_position [cookbooks] - 10https://gerrit.wikimedia.org/r/1072763 (owner: 10Vgutierrez)
[15:45:35] <wikibugs>	 (03PS5) 10Vgutierrez: sre.cdn.transfer-purged-positions: Do not use transfer_consumer_position [cookbooks] - 10https://gerrit.wikimedia.org/r/1072763
[15:46:18] <logmsgbot>	 !log swfrench@cumin2002 START - Cookbook sre.dns.wipe-cache wikikube-worker2121.codfw.wmnet wikikube-worker2122.codfw.wmnet wikikube-worker2123.codfw.wmnet on all recursors
[15:46:21] <logmsgbot>	 !log swfrench@cumin2002 END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker2121.codfw.wmnet wikikube-worker2122.codfw.wmnet wikikube-worker2123.codfw.wmnet on all recursors
[15:48:22] <wikibugs>	 (03CR) 10Scott French: "Alexandros, since you kindly reviewed the original patch, if you could take a look at this attempt #2, that would be greatly appreciated!" [deployment-charts] - 10https://gerrit.wikimedia.org/r/1072764 (https://phabricator.wikimedia.org/T372604) (owner: 10Scott French)
[15:49:25] <wikibugs>	 10ops-codfw, 06SRE, 06DC-Ops, 06Infrastructure-Foundations: Q4:rack/setup/install sretest2001 - https://phabricator.wikimedia.org/T365167#10144676 (10Jhancock.wm) This is where to find the settings in the bios. {F57505755}  once in the bios the ports will be labeled as such (they aren't intuitively named)...
[15:49:31] <wikibugs>	 (03CR) 10Ebernhardson: [C:03+1] cirrus-streaming-updater: test resolve_canonical_bootstrap_servers_only [deployment-charts] - 10https://gerrit.wikimedia.org/r/1072759 (owner: 10DCausse)
[15:50:06] <logmsgbot>	 !log swfrench@cumin2002 START - Cookbook sre.k8s.renumber-node Renumbering for host wikikube-worker2121.codfw.wmnet
[15:50:36] <logmsgbot>	 !log swfrench@cumin2002 START - Cookbook sre.hosts.reimage for host wikikube-worker2121.codfw.wmnet with OS bullseye
[15:50:48] <logmsgbot>	 !log swfrench@cumin2002 START - Cookbook sre.hosts.move-vlan for host wikikube-worker2121
[15:51:05] <logmsgbot>	 !log swfrench@cumin2002 START - Cookbook sre.dns.netbox
[15:51:17] <wikibugs>	 (03PS1) 10JHathaway: puppet8: ensure gpg keyring type is binary [puppet] - 10https://gerrit.wikimedia.org/r/1072768 (https://phabricator.wikimedia.org/T372667)
[15:51:29] <wikibugs>	 (03CR) 10JHathaway: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/1072768 (https://phabricator.wikimedia.org/T372667) (owner: 10JHathaway)
[15:52:28] <wikibugs>	 (03CR) 10Bking: [C:03+2] wdqs: do not add categories on main and scholarly endpoints [puppet] - 10https://gerrit.wikimedia.org/r/1070958 (https://phabricator.wikimedia.org/T374009) (owner: 10DCausse)
[15:53:05] <wikibugs>	 (03PS6) 10Vgutierrez: sre.cdn.transfer-purged-positions: Do not use transfer_consumer_position [cookbooks] - 10https://gerrit.wikimedia.org/r/1072763
[15:53:21] <wikibugs>	 (03CR) 10Bking: [C:03+2] wdqs: fix CATEGORY_ENDPOINT env var [puppet] - 10https://gerrit.wikimedia.org/r/1071877 (https://phabricator.wikimedia.org/T374016) (owner: 10DCausse)
[15:54:08] <wikibugs>	 (03CR) 10DCausse: [C:03+2] cirrus-streaming-updater: test resolve_canonical_bootstrap_servers_only [deployment-charts] - 10https://gerrit.wikimedia.org/r/1072759 (owner: 10DCausse)
[15:54:10] <wikibugs>	 (03PS1) 10JHathaway: puppet8: ensure kerberos keytab type is binary [puppet] - 10https://gerrit.wikimedia.org/r/1072769 (https://phabricator.wikimedia.org/T372667)
[15:54:32] <wikibugs>	 (03CR) 10JHathaway: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/1072769 (https://phabricator.wikimedia.org/T372667) (owner: 10JHathaway)
[15:55:11] <logmsgbot>	 !log swfrench@cumin2002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker2121 - swfrench@cumin2002"
[15:55:16] <logmsgbot>	 !log swfrench@cumin2002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker2121 - swfrench@cumin2002"
[15:55:16] <wikibugs>	 (03Merged) 10jenkins-bot: cirrus-streaming-updater: test resolve_canonical_bootstrap_servers_only [deployment-charts] - 10https://gerrit.wikimedia.org/r/1072759 (owner: 10DCausse)
[15:55:16] <logmsgbot>	 !log swfrench@cumin2002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[15:55:17] <logmsgbot>	 !log swfrench@cumin2002 START - Cookbook sre.dns.wipe-cache wikikube-worker2121.codfw.wmnet 162.16.192.10.in-addr.arpa 2.6.1.0.6.1.0.0.2.9.1.0.0.1.0.0.2.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
[15:55:20] <logmsgbot>	 !log swfrench@cumin2002 END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker2121.codfw.wmnet 162.16.192.10.in-addr.arpa 2.6.1.0.6.1.0.0.2.9.1.0.0.1.0.0.2.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
[15:55:21] <logmsgbot>	 !log swfrench@cumin2002 START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2121
[15:55:28] <wikibugs>	 (03CR) 10Ssingh: [C:03+1] sre.cdn.transfer-purged-positions: Do not use transfer_consumer_position [cookbooks] - 10https://gerrit.wikimedia.org/r/1072763 (owner: 10Vgutierrez)
[15:55:35] <logmsgbot>	 !log swfrench@cumin2002 END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2121
[15:55:36] <logmsgbot>	 !log swfrench@cumin2002 END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker2121
[15:56:40] <logmsgbot>	 !log dcausse@deploy1003 helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
[15:56:47] <logmsgbot>	 !log swfrench@cumin2002 START - Cookbook sre.k8s.renumber-node Renumbering for host wikikube-worker2122.codfw.wmnet
[15:57:18] <logmsgbot>	 !log swfrench@cumin2002 START - Cookbook sre.hosts.reimage for host wikikube-worker2122.codfw.wmnet with OS bullseye
[15:57:26] <wikibugs>	 10ops-eqiad, 06SRE, 06collaboration-services, 06DC-Ops: reimage gerrit1004.wikimedia.org as phab1005.eqiad.wmnet - https://phabricator.wikimedia.org/T372817#10144691 (10Dzahn) I ran the decom cookbook (without and with --force) but it errors out with   ` spicerack.netbox.NetboxHostNotFoundError: gerrit...
[15:57:28] <logmsgbot>	 !log dcausse@deploy1003 helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
[15:57:30] <logmsgbot>	 !log swfrench@cumin2002 START - Cookbook sre.hosts.move-vlan for host wikikube-worker2122
[15:57:35] <wikibugs>	 (03PS1) 10JHathaway: puppet8: ensure dns cookie type is binary [puppet] - 10https://gerrit.wikimedia.org/r/1072770 (https://phabricator.wikimedia.org/T372667)
[15:57:45] <wikibugs>	 (03CR) 10JHathaway: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/1072770 (https://phabricator.wikimedia.org/T372667) (owner: 10JHathaway)
[15:57:59] <logmsgbot>	 !log swfrench@cumin2002 START - Cookbook sre.dns.netbox
[16:01:04] <wikibugs>	 (03PS1) 10DCausse: Revert "cirrus-streaming-updater: test resolve_canonical_bootstrap_servers_only" [deployment-charts] - 10https://gerrit.wikimedia.org/r/1072771
[16:01:44] <logmsgbot>	 !log swfrench@cumin2002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker2122 - swfrench@cumin2002"
[16:01:50] <logmsgbot>	 !log swfrench@cumin2002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker2122 - swfrench@cumin2002"
[16:01:50] <logmsgbot>	 !log swfrench@cumin2002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[16:01:51] <logmsgbot>	 !log swfrench@cumin2002 START - Cookbook sre.dns.wipe-cache wikikube-worker2122.codfw.wmnet 163.16.192.10.in-addr.arpa 3.6.1.0.6.1.0.0.2.9.1.0.0.1.0.0.2.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
[16:01:54] <logmsgbot>	 !log swfrench@cumin2002 END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker2122.codfw.wmnet 163.16.192.10.in-addr.arpa 3.6.1.0.6.1.0.0.2.9.1.0.0.1.0.0.2.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
[16:01:55] <logmsgbot>	 !log swfrench@cumin2002 START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2122
[16:02:05] <logmsgbot>	 !log swfrench@cumin2002 END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2122
[16:02:05] <logmsgbot>	 !log swfrench@cumin2002 END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker2122
[16:02:23] <wikibugs>	 (03CR) 10DCausse: [C:03+2] Revert "cirrus-streaming-updater: test resolve_canonical_bootstrap_servers_only" [deployment-charts] - 10https://gerrit.wikimedia.org/r/1072771 (owner: 10DCausse)
[16:02:27] <wikibugs>	 (03PS8) 10Bking: wdqs: common module and profile should not define categories_endpoint [puppet] - 10https://gerrit.wikimedia.org/r/1070956 (https://phabricator.wikimedia.org/T374009) (owner: 10DCausse)
[16:02:30] <wikibugs>	 (03CR) 10Bking: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/1070956 (https://phabricator.wikimedia.org/T374009) (owner: 10DCausse)
[16:02:41] <wikibugs>	 (03CR) 10CI reject: [V:04-1] wdqs: common module and profile should not define categories_endpoint [puppet] - 10https://gerrit.wikimedia.org/r/1070956 (https://phabricator.wikimedia.org/T374009) (owner: 10DCausse)
[16:03:05] <logmsgbot>	 !log swfrench@cumin2002 START - Cookbook sre.k8s.renumber-node Renumbering for host wikikube-worker2123.codfw.wmnet
[16:03:23] <wikibugs>	 (03Merged) 10jenkins-bot: Revert "cirrus-streaming-updater: test resolve_canonical_bootstrap_servers_only" [deployment-charts] - 10https://gerrit.wikimedia.org/r/1072771 (owner: 10DCausse)
[16:03:32] <logmsgbot>	 !log swfrench@cumin2002 START - Cookbook sre.hosts.reimage for host wikikube-worker2123.codfw.wmnet with OS bullseye
[16:03:43] <logmsgbot>	 !log swfrench@cumin2002 START - Cookbook sre.hosts.move-vlan for host wikikube-worker2123
[16:05:29] <wikibugs>	 (03PS9) 10Bking: wdqs: common module and profile should not define categories_endpoint [puppet] - 10https://gerrit.wikimedia.org/r/1070956 (https://phabricator.wikimedia.org/T374009) (owner: 10DCausse)
[16:05:35] <wikibugs>	 (03PS1) 10JHathaway: puppet8: ensure java ssh key type is binary [puppet] - 10https://gerrit.wikimedia.org/r/1072773 (https://phabricator.wikimedia.org/T372667)
[16:05:40] <wikibugs>	 (03CR) 10Bking: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/1070956 (https://phabricator.wikimedia.org/T374009) (owner: 10DCausse)
[16:05:47] <wikibugs>	 (03CR) 10JHathaway: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/1072773 (https://phabricator.wikimedia.org/T372667) (owner: 10JHathaway)
[16:05:56] <logmsgbot>	 !log swfrench@cumin2002 START - Cookbook sre.dns.netbox
[16:06:02] <logmsgbot>	 !log dcausse@deploy1003 helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
[16:06:08] <logmsgbot>	 !log dcausse@deploy1003 helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
[16:07:18] <wikibugs>	 (03CR) 10Vgutierrez: [C:03+2] sre.cdn.transfer-purged-positions: Do not use transfer_consumer_position [cookbooks] - 10https://gerrit.wikimedia.org/r/1072763 (owner: 10Vgutierrez)
[16:07:31] <dduvall>	 !log performing friday deployment of jenkins-deploy (releases server) to fix broken job (see https://gitlab.wikimedia.org/repos/releng/jenkins-deploy/-/merge_requests/81)
[16:07:33] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:07:42] <logmsgbot>	 !log dduvall@deploy1003 Started deploy [releng/jenkins-deploy@e8b4e0b] (releasing): (no justification provided)
[16:08:25] <logmsgbot>	 !log dduvall@deploy1003 Finished deploy [releng/jenkins-deploy@e8b4e0b] (releasing): (no justification provided) (duration: 00m 43s)
[16:08:58] <wikibugs>	 (03CR) 10Bking: [C:03+2] wdqs: common module and profile should not define categories_endpoint [puppet] - 10https://gerrit.wikimedia.org/r/1070956 (https://phabricator.wikimedia.org/T374009) (owner: 10DCausse)
[16:09:49] <logmsgbot>	 !log swfrench@cumin2002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker2123 - swfrench@cumin2002"
[16:09:53] <logmsgbot>	 !log swfrench@cumin2002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker2123 - swfrench@cumin2002"
[16:09:54] <logmsgbot>	 !log swfrench@cumin2002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[16:09:54] <logmsgbot>	 !log swfrench@cumin2002 START - Cookbook sre.dns.wipe-cache wikikube-worker2123.codfw.wmnet 164.16.192.10.in-addr.arpa 4.6.1.0.6.1.0.0.2.9.1.0.0.1.0.0.2.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
[16:09:57] <logmsgbot>	 !log swfrench@cumin2002 END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker2123.codfw.wmnet 164.16.192.10.in-addr.arpa 4.6.1.0.6.1.0.0.2.9.1.0.0.1.0.0.2.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
[16:09:58] <logmsgbot>	 !log swfrench@cumin2002 START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2123
[16:10:12] <wikibugs>	 (03PS7) 10DCausse: wdqs: do not add categories on main and scholarly endpoints [puppet] - 10https://gerrit.wikimedia.org/r/1070958 (https://phabricator.wikimedia.org/T374009)
[16:10:30] <logmsgbot>	 !log swfrench@cumin2002 END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2123
[16:10:31] <logmsgbot>	 !log swfrench@cumin2002 END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker2123
[16:12:07] <wikibugs>	 (03CR) 10Bking: wdqs: do not add categories on main and scholarly endpoints [puppet] - 10https://gerrit.wikimedia.org/r/1070958 (https://phabricator.wikimedia.org/T374009) (owner: 10DCausse)
[16:12:12] <wikibugs>	 (03CR) 10Bking: [V:03+2 C:03+2] wdqs: do not add categories on main and scholarly endpoints [puppet] - 10https://gerrit.wikimedia.org/r/1070958 (https://phabricator.wikimedia.org/T374009) (owner: 10DCausse)
[16:12:37] <logmsgbot>	 !log swfrench@cumin2002 START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2121.codfw.wmnet with reason: host reimage
[16:13:02] <wikibugs>	 06SRE, 06Infrastructure-Foundations, 10Puppet-Infrastructure, 13Patch-For-Review: Drop PSON support - https://phabricator.wikimedia.org/T372667#10144748 (10jhathaway)
[16:13:24] <logmsgbot>	 !log vgutierrez@cumin1002 START - Cookbook sre.cdn.transfer-purged-positions rolling custom on P{cp2027*} and A:cp
[16:15:24] <wikibugs>	 (03PS1) 10JHathaway: puppet8: ensure java keystore type is binary [puppet] - 10https://gerrit.wikimedia.org/r/1072777 (https://phabricator.wikimedia.org/T372667)
[16:15:32] <logmsgbot>	 !log vgutierrez@cumin1002 END (PASS) - Cookbook sre.cdn.transfer-purged-positions (exit_code=0) rolling custom on P{cp2027*} and A:cp
[16:15:35] <wikibugs>	 (03CR) 10JHathaway: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/1072777 (https://phabricator.wikimedia.org/T372667) (owner: 10JHathaway)
[16:16:00] <logmsgbot>	 !log swfrench@cumin2002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2121.codfw.wmnet with reason: host reimage
[16:16:59] <wikibugs>	 10ops-codfw, 06SRE, 06DC-Ops, 06Infrastructure-Foundations: Q4:rack/setup/install sretest2001 - https://phabricator.wikimedia.org/T365167#10144759 (10Jhancock.wm) more exposition! in the case of this particular configuration, these are the names of the ports on the server in the Advanced option menu. The o...
[16:18:29] <logmsgbot>	 !log vgutierrez@cumin1002 START - Cookbook sre.cdn.transfer-purged-positions rolling custom on P{cp[2028-2034,2038-2042].codfw.wmnet,cp[5017,5019-5020,5023,5027-5028,5030].eqsin.wmnet,cp[4038-4052].ulsfo.wmnet} and A:cp
[16:18:47] <logmsgbot>	 !log swfrench@cumin2002 START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2122.codfw.wmnet with reason: host reimage
[16:21:58] <wikibugs>	 (03CR) 10JHathaway: [C:03+2] puppet8: ensure gpg keyring type is binary [puppet] - 10https://gerrit.wikimedia.org/r/1072768 (https://phabricator.wikimedia.org/T372667) (owner: 10JHathaway)
[16:23:16] <logmsgbot>	 !log swfrench@cumin2002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2122.codfw.wmnet with reason: host reimage
[16:25:09] <wikibugs>	 (03CR) 10JHathaway: [C:03+2] puppet8: ensure kerberos keytab type is binary [puppet] - 10https://gerrit.wikimedia.org/r/1072769 (https://phabricator.wikimedia.org/T372667) (owner: 10JHathaway)
[16:27:24] <logmsgbot>	 !log swfrench@cumin2002 START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2123.codfw.wmnet with reason: host reimage
[16:27:58] <sukhe>	 SAL is down it seems, the web interface
[16:29:38] <Lucas_WMDE>	 aw, sounds like https://wikitech.wikimedia.org/w/index.php?title=Nova_Resource:Tools.sal/SAL&diff=prev&oldid=2225689 didn’t work then 😔
[16:29:39] * Lucas_WMDE looks
[16:30:34] <Lucas_WMDE>	 hm, kubectl get events says “Container webservice failed liveness probe, will be restarted” 4m41s ago at least
[16:30:40] <Lucas_WMDE>	 and now another one 3s ago
[16:31:09] <logmsgbot>	 !log swfrench@cumin2002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2123.codfw.wmnet with reason: host reimage
[16:32:23] <Lucas_WMDE>	 sukhe: better now?
[16:32:53] <sukhe>	 thanks Lucas_WMDE <3
[16:35:18] <Lucas_WMDE>	 and now for the actual reason I came back into this channel :D
[16:35:24] <Lucas_WMDE>	 I don’t even remember what the context for https://bash.toolforge.org/quip/HptC3JEBFFSCpsJzSng3 was
[16:35:33] <logmsgbot>	 !log swfrench@cumin2002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2121.codfw.wmnet with OS bullseye
[16:35:53] <Lucas_WMDE>	 on the off chance that whoever quipped it is around… would you mind pinging me in future? I find it odd to only discover these via the list of new quips later 😅
[16:36:20] <sukhe>	 usually we take permission for sharing anything there (I wasn't the one who added it, just remarking)
[16:36:21] <Lucas_WMDE>	 (likewise https://bash.toolforge.org/quip/Sj3KapEBKFqumxvtIHYX, though I think I happened to see that one within a day of writing it so I still remembered the context ^^)
[16:37:12] <wikibugs>	 (03CR) 10JHathaway: [C:03+2] puppet8: ensure dns cookie type is binary [puppet] - 10https://gerrit.wikimedia.org/r/1072770 (https://phabricator.wikimedia.org/T372667) (owner: 10JHathaway)
[16:38:54] <swfrench-wmf>	 !log running homer lsw1-b3-codfw* commit 'T372878'
[16:38:58] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:38:59] <stashbot>	 T372878: Re-IP wikikube servers in codfw row A/B moving to per-rack subnets - https://phabricator.wikimedia.org/T372878
[16:41:09] <icinga-wm>	 PROBLEM - BGP status on lsw1-b3-codfw.mgmt is CRITICAL: BGP CRITICAL - AS64602/IPv6: Connect - kubernetes-codfw, AS64602/IPv4: Active - kubernetes-codfw https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[16:41:41] <swfrench-wmf>	 ^ expected - waiting on host reboot for session to come up
[16:42:01] <wikibugs>	 (03CR) 10JHathaway: [C:03+2] puppet8: ensure java ssh key type is binary [puppet] - 10https://gerrit.wikimedia.org/r/1072773 (https://phabricator.wikimedia.org/T372667) (owner: 10JHathaway)
[16:43:09] <icinga-wm>	 RECOVERY - BGP status on lsw1-b3-codfw.mgmt is OK: BGP OK - up: 34, down: 0, shutdown: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[16:43:22] <logmsgbot>	 !log swfrench@cumin2002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2122.codfw.wmnet with OS bullseye
[16:43:56] <Lucas_WMDE>	 sukhe: I was actually wondering about that and submitted https://github.com/bd808/quips/pull/31/files to document it in the quips tool itself, so feel free to reply there if you like (it sounds like what I inferred / guessed isn’t necessarily what other people think)
[16:46:09] <icinga-wm>	 PROBLEM - BGP status on lsw1-b3-codfw.mgmt is CRITICAL: BGP CRITICAL - AS64602/IPv4: Connect - kubernetes-codfw, AS64602/IPv6: Connect - kubernetes-codfw https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[16:46:33] <wikibugs>	 10ops-eqiad, 06SRE, 06collaboration-services, 06DC-Ops: reimage gerrit1004.wikimedia.org as phab1005.eqiad.wmnet - https://phabricator.wikimedia.org/T372817#10144861 (10Dzahn) ` [puppetserver1001:~] $ sudo puppet node clean gerrit1004.wikimedia.org Notice: Certificate for gerrit1004.wikimedia.org has b...
[16:46:45] <logmsgbot>	 !log swfrench@cumin2002 START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker2121.codfw.wmnet
[16:46:47] <logmsgbot>	 !log swfrench@cumin2002 END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker2121.codfw.wmnet
[16:46:48] <logmsgbot>	 !log swfrench@cumin2002 END (PASS) - Cookbook sre.k8s.renumber-node (exit_code=0) Renumbering for host wikikube-worker2121.codfw.wmnet
[16:47:03] <wikibugs>	 (03CR) 10JHathaway: [C:03+2] puppet8: ensure java keystore type is binary [puppet] - 10https://gerrit.wikimedia.org/r/1072777 (https://phabricator.wikimedia.org/T372667) (owner: 10JHathaway)
[16:48:09] <icinga-wm>	 RECOVERY - BGP status on lsw1-b3-codfw.mgmt is OK: BGP OK - up: 34, down: 0, shutdown: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[16:50:01] <wikibugs>	 06SRE, 06Infrastructure-Foundations, 10Puppet-Infrastructure, 13Patch-For-Review: Drop PSON support - https://phabricator.wikimedia.org/T372667#10144879 (10jhathaway)
[16:50:57] <logmsgbot>	 !log swfrench@cumin2002 END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2123.codfw.wmnet with OS bullseye
[16:52:32] <logmsgbot>	 !log swfrench@cumin2002 START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker2122.codfw.wmnet
[16:52:34] <logmsgbot>	 !log swfrench@cumin2002 END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker2122.codfw.wmnet
[16:52:35] <logmsgbot>	 !log swfrench@cumin2002 END (PASS) - Cookbook sre.k8s.renumber-node (exit_code=0) Renumbering for host wikikube-worker2122.codfw.wmnet
[16:53:25] <jinxer-wm>	 FIRING: SystemdUnitFailed: wdqs-categories.service on wdqs2021:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[16:53:25] <jinxer-wm>	 FIRING: SystemdUnitFailed: dump_ip_reputation.service on puppetserver1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[16:54:56] <jinxer-wm>	 FIRING: RoutinatorRTRConnections: Important drop of Routinator RTR connections on rpki2002:9556 - https://wikitech.wikimedia.org/wiki/RPKI#RTR_Connections_drop - https://grafana.wikimedia.org/d/UwUa77GZk/rpki - https://alerts.wikimedia.org/?q=alertname%3DRoutinatorRTRConnections
[16:55:07] <wikibugs>	 10ops-eqiad, 06SRE, 06DC-Ops: PDU sensor over limit - https://phabricator.wikimedia.org/T374573#10144911 (10phaultfinder)
[16:55:41] <logmsgbot>	 !log bking@cumin2002 START - Cookbook sre.hosts.downtime for 3:00:00 on wdqs[2021-2024].codfw.wmnet with reason: T373791
[16:55:45] <stashbot>	 T373791: Transfer a sane journal (subgraph:main) to wdqs2021 from wdqs2022 - https://phabricator.wikimedia.org/T373791
[16:55:57] <logmsgbot>	 !log bking@cumin2002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on wdqs[2021-2024].codfw.wmnet with reason: T373791
[16:56:14] <logmsgbot>	 !log bking@cumin2002 START - Cookbook sre.hosts.downtime for 3:00:00 on wdqs[1021-1024].eqiad.wmnet with reason: T373935
[16:56:18] <stashbot>	 T373935: WDQS graph split: cleanup monitoring/alerting now that we are in production - https://phabricator.wikimedia.org/T373935
[16:56:31] <logmsgbot>	 !log bking@cumin2002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on wdqs[1021-1024].eqiad.wmnet with reason: T373935
[16:57:49] <swfrench-wmf>	 !log running homer cr*codfw* commit 'T372878'
[16:57:52] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:57:53] <stashbot>	 T372878: Re-IP wikikube servers in codfw row A/B moving to per-rack subnets - https://phabricator.wikimedia.org/T372878
[16:59:03] <icinga-wm>	 RECOVERY - BGP status on cr1-codfw is OK: BGP OK - up: 295, down: 0, shutdown: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[16:59:07] <logmsgbot>	 !log swfrench@cumin2002 START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker2123.codfw.wmnet
[16:59:09] <logmsgbot>	 !log swfrench@cumin2002 END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker2123.codfw.wmnet
[16:59:10] <logmsgbot>	 !log swfrench@cumin2002 END (PASS) - Cookbook sre.k8s.renumber-node (exit_code=0) Renumbering for host wikikube-worker2123.codfw.wmnet
[17:02:31] <wikibugs>	 (03PS1) 10Ahmon Dancy: gitlab-settings: v1.7.0 for bugfix [puppet] - 10https://gerrit.wikimedia.org/r/1072785
[17:03:05] <wikibugs>	 (03CR) 10CI reject: [V:04-1] gitlab-settings: v1.7.0 for bugfix [puppet] - 10https://gerrit.wikimedia.org/r/1072785 (owner: 10Ahmon Dancy)
[17:03:37] <wikibugs>	 (03PS2) 10Ahmon Dancy: gitlab-settings: v1.7.0 for bugfix [puppet] - 10https://gerrit.wikimedia.org/r/1072785
[17:06:01] <icinga-wm>	 RECOVERY - BGP status on cr2-codfw is OK: BGP OK - up: 377, down: 0, shutdown: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[17:10:06] <wikibugs>	 10ops-eqiad, 06SRE, 06DC-Ops: PDU sensor over limit - https://phabricator.wikimedia.org/T374573#10144964 (10phaultfinder)
[17:11:06] <wikibugs>	 10ops-codfw, 06DC-Ops, 10Prod-Kubernetes, 06serviceops, 07Kubernetes: Relabel codfw kubernetes nodes - https://phabricator.wikimedia.org/T374733 (10Scott_French) 03NEW
[17:11:11] <wikibugs>	 (03PS1) 10Ahmon Dancy: gitlab: Sync people/wmde GitLab group w/ wmde LDAP group [puppet] - 10https://gerrit.wikimedia.org/r/1072786
[17:21:50] <wikibugs>	 (03PS2) 10Alexandros Kosiaris: mediawiki-image-download: Drop to 5% [puppet] - 10https://gerrit.wikimedia.org/r/1070550 (https://phabricator.wikimedia.org/T366778)
[17:29:43] <wikibugs>	 10ops-codfw, 06SRE, 06DC-Ops, 10Prod-Kubernetes, and 2 others: Relabel codfw kubernetes nodes - https://phabricator.wikimedia.org/T374733#10145018 (10akosiaris)
[17:30:16] <wikibugs>	 (03CR) 10Ssingh: "For some reason, PCC is running on old cp hosts (decommissioned for more than a year). Beyond that, I am still checking why this a NOOP an" [puppet] - 10https://gerrit.wikimedia.org/r/1072566 (https://phabricator.wikimedia.org/T350008) (owner: 10JHathaway)
[17:38:38] <logmsgbot>	 !log vgutierrez@cumin1002 END (PASS) - Cookbook sre.cdn.transfer-purged-positions (exit_code=0) rolling custom on P{cp[2028-2034,2038-2042].codfw.wmnet,cp[5017,5019-5020,5023,5027-5028,5030].eqsin.wmnet,cp[4038-4052].ulsfo.wmnet} and A:cp
[17:38:40] <wikibugs>	 06SRE, 06Data-Engineering, 10Observability-Logging, 10Wikimedia-Logstash, 10Event-Platform: Integrate Event Platform and ECS logs - https://phabricator.wikimedia.org/T291645#10145027 (10EBernhardson) This would have been useful to debug T374662, aggregating the times out of elasticsearch is a bit hard as...
[17:38:45] <vgutierrez>	 \o/
[17:42:20] <wikibugs>	 06SRE, 06Data-Engineering, 10Observability-Logging, 10Wikimedia-Logstash, 10Event-Platform: Integrate Event Platform and ECS logs - https://phabricator.wikimedia.org/T291645#10145036 (10CDanis) Similar but different: {T304373}
[17:49:12] <wikibugs>	 06SRE, 10SRE-Access-Requests, 10LDAP-Access-Requests: Vacation coverage for Katie Francis - https://phabricator.wikimedia.org/T374673#10145270 (10Dzahn) a:05Dzahn→03None
[17:52:38] <wikibugs>	 (03CR) 10JHathaway: "it may be an issue with the regex Hosts line? I'll file a task to look into it." [puppet] - 10https://gerrit.wikimedia.org/r/1072566 (https://phabricator.wikimedia.org/T350008) (owner: 10JHathaway)
[17:53:49] <wikibugs>	 (03CR) 10Dzahn: [C:03+2] gitlab-settings: v1.7.0 for bugfix [puppet] - 10https://gerrit.wikimedia.org/r/1072785 (owner: 10Ahmon Dancy)
[17:54:42] <jinxer-wm>	 FIRING: [4x] SwiftObjectCountSiteDisparity: MediaWiki swift object counts site diffs - https://wikitech.wikimedia.org/wiki/Swift/How_To - https://grafana.wikimedia.org/d/OPgmB1Eiz/swift - https://alerts.wikimedia.org/?q=alertname%3DSwiftObjectCountSiteDisparity
[17:55:14] <wikibugs>	 (03CR) 10Dzahn: [C:03+2] gitlab: Sync people/wmde GitLab group w/ wmde LDAP group [puppet] - 10https://gerrit.wikimedia.org/r/1072786 (owner: 10Ahmon Dancy)
[17:55:51] <dancy>	 Thanks mutante!
[17:56:29] <mutante>	 yw 
[17:59:10] <wikibugs>	 (03CR) 10Scott French: [C:03+1] services: add new poolcounter nodes to MW configs [deployment-charts] - 10https://gerrit.wikimedia.org/r/1072717 (https://phabricator.wikimedia.org/T332015) (owner: 10Elukey)
[18:00:42] <wikibugs>	 (03CR) 10Scott French: [C:03+1] Swap poolcounter2003 with poolcounter2005 (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1072206 (https://phabricator.wikimedia.org/T332015) (owner: 10Elukey)
[18:10:06] <wikibugs>	 10ops-eqiad, 06SRE, 06DC-Ops: PDU sensor over limit - https://phabricator.wikimedia.org/T374573#10145290 (10phaultfinder)
[18:10:40] <wikibugs>	 (03CR) 10Jforrester: Improve $wgFooterIcons override, remove $wmgWikimediaIcon (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1071712 (owner: 10Bartosz Dziewoński)
[18:11:46] <wikibugs>	 (03PS4) 10Ssingh: haproxy: re-add numa support [puppet] - 10https://gerrit.wikimedia.org/r/1072566 (https://phabricator.wikimedia.org/T350008) (owner: 10JHathaway)
[18:12:54] <wikibugs>	 (03CR) 10Ssingh: [V:03+1] "PCC SUCCESS (NOOP 2): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet7-compiler-node/3986/console" [puppet] - 10https://gerrit.wikimedia.org/r/1072566 (https://phabricator.wikimedia.org/T350008) (owner: 10JHathaway)
[18:16:03] <wikibugs>	 (03PS3) 10Jasmine: icinga: add jasmine to icinga authorizations [puppet] - 10https://gerrit.wikimedia.org/r/1071964
[18:22:54] <wikibugs>	 (03CR) 10Jasmine: "thanks!" [puppet] - 10https://gerrit.wikimedia.org/r/1071964 (owner: 10Jasmine)
[18:23:29] <wikibugs>	 (03PS5) 10Ssingh: haproxy: re-add numa support [puppet] - 10https://gerrit.wikimedia.org/r/1072566 (https://phabricator.wikimedia.org/T350008) (owner: 10JHathaway)
[18:23:50] <wikibugs>	 (03CR) 10Scott French: "Thanks, Moritz! I'll keep you posted on an ETA for when this is happening." [puppet] - 10https://gerrit.wikimedia.org/r/1070681 (https://phabricator.wikimedia.org/T352245) (owner: 10Scott French)
[18:23:56] <jinxer-wm>	 FIRING: [2x] RoutinatorRsyncErrors: Routinator rsync fetching issue in codfw - https://wikitech.wikimedia.org/wiki/RPKI#RSYNC_status - https://grafana.wikimedia.org/d/UwUa77GZk/rpki - https://alerts.wikimedia.org/?q=alertname%3DRoutinatorRsyncErrors
[18:24:18] <wikibugs>	 (03CR) 10RLazarus: [C:03+2] icinga: add jasmine to icinga authorizations [puppet] - 10https://gerrit.wikimedia.org/r/1071964 (owner: 10Jasmine)
[18:24:41] <wikibugs>	 (03CR) 10Ssingh: [V:03+1] "PCC SUCCESS (DIFF 2): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/label=puppet7-compiler-node/3987/console" [puppet] - 10https://gerrit.wikimedia.org/r/1072566 (https://phabricator.wikimedia.org/T350008) (owner: 10JHathaway)
[18:27:50] <wikibugs>	 (03CR) 10Ssingh: [V:03+1] "Hi Jesse: After looking at this a bit more deeply, we got lucky here when the cp-specific block in realm.pp was removed. That would have r" [puppet] - 10https://gerrit.wikimedia.org/r/1072566 (https://phabricator.wikimedia.org/T350008) (owner: 10JHathaway)
[18:28:53] <wikibugs>	 (03PS6) 10Ssingh: haproxy: re-add numa support [puppet] - 10https://gerrit.wikimedia.org/r/1072566 (https://phabricator.wikimedia.org/T350008) (owner: 10JHathaway)
[18:28:54] <wikibugs>	 (03CR) 10Ssingh: "Commit message updated." [puppet] - 10https://gerrit.wikimedia.org/r/1072566 (https://phabricator.wikimedia.org/T350008) (owner: 10JHathaway)
[18:56:32] <wikibugs>	 (03CR) 10JHathaway: [C:03+1] "makes sense, thanks for the careful review and updated patch" [puppet] - 10https://gerrit.wikimedia.org/r/1072566 (https://phabricator.wikimedia.org/T350008) (owner: 10JHathaway)
[18:57:00] <wikibugs>	 (03CR) 10JHathaway: [C:03+1] "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/1072566 (https://phabricator.wikimedia.org/T350008) (owner: 10JHathaway)
[19:05:02] <wikibugs>	 (03PS1) 10Scott French: [DNM] service: move mwdebug-next to lvs_setup [puppet] - 10https://gerrit.wikimedia.org/r/1072796 (https://phabricator.wikimedia.org/T372604)
[19:06:00] <wikibugs>	 (03CR) 10Ssingh: "Will merge Monday morning, to be extra sure (even if it is a NOOP)" [puppet] - 10https://gerrit.wikimedia.org/r/1072566 (https://phabricator.wikimedia.org/T350008) (owner: 10JHathaway)
[19:07:55] <wikibugs>	 (03PS1) 10JHathaway: puppetserver: remove empty hiera data files [puppet] - 10https://gerrit.wikimedia.org/r/1072797
[19:08:37] <wikibugs>	 (03PS1) 10Scott French: [DNM] service: move mwdebug-next to production [puppet] - 10https://gerrit.wikimedia.org/r/1072798 (https://phabricator.wikimedia.org/T372604)
[19:16:49] <wikibugs>	 (03CR) 10JHathaway: [C:03+2] puppetserver: remove empty hiera data files [puppet] - 10https://gerrit.wikimedia.org/r/1072797 (owner: 10JHathaway)
[19:39:56] <icinga-wm>	 PROBLEM - BGP status on cr4-ulsfo is CRITICAL: BGP CRITICAL - AS64605/IPv6: Active - Anycast, AS64605/IPv4: Idle - Anycast https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[19:45:52] <icinga-wm>	 PROBLEM - BGP status on cr3-ulsfo is CRITICAL: BGP CRITICAL - AS64605/IPv6: Active - Anycast, AS64605/IPv4: Active - Anycast https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[19:45:56] <icinga-wm>	 PROBLEM - BGP status on cr4-ulsfo is CRITICAL: BGP CRITICAL - AS64605/IPv4: Active - Anycast, AS64605/IPv6: Active - Anycast https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[19:50:08] <wikibugs>	 10ops-eqiad, 06SRE, 06DC-Ops: PDU sensor over limit - https://phabricator.wikimedia.org/T374573#10145515 (10phaultfinder)
[19:52:28] <logmsgbot>	 !log bking@cumin2002 START - Cookbook sre.hosts.remove-downtime for wdqs[1021-1024].eqiad.wmnet
[19:52:30] <logmsgbot>	 !log bking@cumin2002 END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for wdqs[1021-1024].eqiad.wmnet
[19:52:38] <logmsgbot>	 !log bking@cumin2002 START - Cookbook sre.hosts.remove-downtime for wdqs[2021-2024].codfw.wmnet
[19:52:40] <logmsgbot>	 !log bking@cumin2002 END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for wdqs[2021-2024].codfw.wmnet
[20:00:55] <jinxer-wm>	 RESOLVED: SystemdUnitFailed: wdqs-categories.service on wdqs2021:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[20:09:20] <wikibugs>	 06SRE, 06Infrastructure-Foundations, 10Puppet-Infrastructure: Drop PSON support - https://phabricator.wikimedia.org/T372667#10145556 (10jhathaway)
[20:25:13] <wikibugs>	 10ops-eqiad, 06SRE, 06DC-Ops: PDU sensor over limit - https://phabricator.wikimedia.org/T374573#10145582 (10phaultfinder)
[20:27:28] <wikibugs>	 (03CR) 10Jdlrobson: "Thank you!" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1072623 (owner: 10Ebrahim)
[20:40:18] <wikibugs>	 (03CR) 10Bartosz Dziewoński: Improve $wgFooterIcons override, remove $wmgWikimediaIcon (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/1071712 (owner: 10Bartosz Dziewoński)
[20:53:25] <jinxer-wm>	 FIRING: SystemdUnitFailed: dump_ip_reputation.service on puppetserver1002:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[20:54:56] <jinxer-wm>	 FIRING: RoutinatorRTRConnections: Important drop of Routinator RTR connections on rpki2002:9556 - https://wikitech.wikimedia.org/wiki/RPKI#RTR_Connections_drop - https://grafana.wikimedia.org/d/UwUa77GZk/rpki - https://alerts.wikimedia.org/?q=alertname%3DRoutinatorRTRConnections
[20:59:31] <wikibugs>	 (03CR) 10Cwhite: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/1072632 (https://phabricator.wikimedia.org/T233089) (owner: 10Cwhite)
[21:03:57] <wikibugs>	 (03PS1) 10Dwisehaupt: frack: remove fraban2001 from dns for decommissioning [dns] - 10https://gerrit.wikimedia.org/r/1072812 (https://phabricator.wikimedia.org/T374741)
[21:06:14] <wikibugs>	 (03CR) 10Cwhite: [C:03+2] zuul: set statsd-exporter to relay to local statsite instance [puppet] - 10https://gerrit.wikimedia.org/r/1072632 (https://phabricator.wikimedia.org/T233089) (owner: 10Cwhite)
[21:07:22] <wikibugs>	 06SRE, 06Infrastructure-Foundations, 10Puppet-Infrastructure: Drop PSON support - https://phabricator.wikimedia.org/T372667#10145782 (10jhathaway)
[21:08:23] <wikibugs>	 (03PS1) 10Dwisehaupt: icinga: remove frban2001 for decommissioning [puppet] - 10https://gerrit.wikimedia.org/r/1072813 (https://phabricator.wikimedia.org/T374741)
[21:11:18] <logmsgbot>	 !log dwisehaupt@cumin1002 START - Cookbook sre.dns.netbox
[21:11:31] <wikibugs>	 06SRE, 06Infrastructure-Foundations, 10Puppet-Infrastructure: Drop PSON support - https://phabricator.wikimedia.org/T372667#10145786 (10jhathaway) 05Open→03Resolved a:03jhathaway All known uses of pson have been removed. However, since we cannot disable support on 7.23, I don't think there is anyth...
[21:14:41] <logmsgbot>	 !log dwisehaupt@cumin1002 START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: decommissioning frban2001 - dwisehaupt@cumin1002"
[21:14:45] <logmsgbot>	 !log dwisehaupt@cumin1002 END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: decommissioning frban2001 - dwisehaupt@cumin1002"
[21:14:46] <logmsgbot>	 !log dwisehaupt@cumin1002 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[21:21:56] <wikibugs>	 (03CR) 10Dzahn: [C:03+2] vrts: switch inactive host vrts2001 to nftables as firewall provider [puppet] - 10https://gerrit.wikimedia.org/r/1072313 (https://phabricator.wikimedia.org/T370677) (owner: 10Dzahn)
[21:24:44] <logmsgbot>	 !log dzahn@cumin2002 START - Cookbook sre.hosts.downtime for 0:10:00 on vrts2001.codfw.wmnet with reason: nftables migration
[21:24:47] <logmsgbot>	 !log dzahn@cumin2002 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on vrts2001.codfw.wmnet with reason: nftables migration
[21:25:51] <wikibugs>	 (03CR) 10Dzahn: [C:03+2] "looked good, rebooted" [puppet] - 10https://gerrit.wikimedia.org/r/1072313 (https://phabricator.wikimedia.org/T370677) (owner: 10Dzahn)
[21:54:42] <jinxer-wm>	 FIRING: [4x] SwiftObjectCountSiteDisparity: MediaWiki swift object counts site diffs - https://wikitech.wikimedia.org/wiki/Swift/How_To - https://grafana.wikimedia.org/d/OPgmB1Eiz/swift - https://alerts.wikimedia.org/?q=alertname%3DSwiftObjectCountSiteDisparity
[22:23:56] <jinxer-wm>	 FIRING: [2x] RoutinatorRsyncErrors: Routinator rsync fetching issue in codfw - https://wikitech.wikimedia.org/wiki/RPKI#RSYNC_status - https://grafana.wikimedia.org/d/UwUa77GZk/rpki - https://alerts.wikimedia.org/?q=alertname%3DRoutinatorRsyncErrors
[23:38:34] <wikibugs>	 (03PS1) 10TrainBranchBot: Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - 10https://gerrit.wikimedia.org/r/1072823
[23:38:34] <wikibugs>	 (03CR) 10TrainBranchBot: [C:03+2] Branch commit for wmf/branch_cut_pretest [core] (wmf/branch_cut_pretest) - 10https://gerrit.wikimedia.org/r/1072823 (owner: 10TrainBranchBot)